917 resultados para Data Generation


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The use of Mahalanobis squared distance–based novelty detection in statistical damage identification has become increasingly popular in recent years. The merit of the Mahalanobis squared distance–based method is that it is simple and requires low computational effort to enable the use of a higher dimensional damage-sensitive feature, which is generally more sensitive to structural changes. Mahalanobis squared distance–based damage identification is also believed to be one of the most suitable methods for modern sensing systems such as wireless sensors. Although possessing such advantages, this method is rather strict with the input requirement as it assumes the training data to be multivariate normal, which is not always available particularly at an early monitoring stage. As a consequence, it may result in an ill-conditioned training model with erroneous novelty detection and damage identification outcomes. To date, there appears to be no study on how to systematically cope with such practical issues especially in the context of a statistical damage identification problem. To address this need, this article proposes a controlled data generation scheme, which is based upon the Monte Carlo simulation methodology with the addition of several controlling and evaluation tools to assess the condition of output data. By evaluating the convergence of the data condition indices, the proposed scheme is able to determine the optimal setups for the data generation process and subsequently avoid unnecessarily excessive data. The efficacy of this scheme is demonstrated via applications to a benchmark structure data in the field.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This article presents the field applications and validations for the controlled Monte Carlo data generation scheme. This scheme was previously derived to assist the Mahalanobis squared distance–based damage identification method to cope with data-shortage problems which often cause inadequate data multinormality and unreliable identification outcome. To do so, real-vibration datasets from two actual civil engineering structures with such data (and identification) problems are selected as the test objects which are then shown to be in need of enhancement to consolidate their conditions. By utilizing the robust probability measures of the data condition indices in controlled Monte Carlo data generation and statistical sensitivity analysis of the Mahalanobis squared distance computational system, well-conditioned synthetic data generated by an optimal controlled Monte Carlo data generation configurations can be unbiasedly evaluated against those generated by other set-ups and against the original data. The analysis results reconfirm that controlled Monte Carlo data generation is able to overcome the shortage of observations, improve the data multinormality and enhance the reliability of the Mahalanobis squared distance–based damage identification method particularly with respect to false-positive errors. The results also highlight the dynamic structure of controlled Monte Carlo data generation that makes this scheme well adaptive to any type of input data with any (original) distributional condition.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Linked Data is the key paradigm of the Semantic Web, a new generation of the World Wide Web that promises to bring meaning (semantics) to data. A large number of both public and private organizations have published their data following the Linked Data principles, or have done so with data from other organizations. To this extent, since the generation and publication of Linked Data are intensive engineering processes that require high attention in order to achieve high quality, and since experience has shown that existing general guidelines are not always sufficient to be applied to every domain, this paper presents a set of guidelines for generating and publishing Linked Data in the context of energy consumption in buildings (one aspect of Building Information Models). These guidelines offer a comprehensive description of the tasks to perform, including a list of steps, tools that help in achieving the task, various alternatives for performing the task, and best practices and recommendations. Furthermore, this paper presents a complete example on the generation and publication of Linked Data about energy consumption in buildings, following the presented guidelines, in which the energy consumption data of council sites (e.g., buildings and lights) belonging to the Leeds City Council jurisdiction have been generated and published as Linked Data.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The dynamic interaction between building systems and external climate is extremely complex, involving a large number of difficult-to-predict variables. In order to study the impact of global warming on the built environment, the use of building simulation techniques together with forecast weather data are often necessary. Since all building simulation programs require hourly meteorological input data for their thermal comfort and energy evaluation, the provision of suitable weather data becomes critical. Based on a review of the existing weather data generation models, this paper presents an effective method to generate approximate future hourly weather data suitable for the study of the impact of global warming. Depending on the level of information available for the prediction of future weather condition, it is shown that either the method of retaining to current level, constant offset method or diurnal modelling method may be used to generate the future hourly variation of an individual weather parameter. An example of the application of this method to the different global warming scenarios in Australia is presented. Since there is no reliable projection of possible change in air humidity, solar radiation or wind characters, as a first approximation, these parameters have been assumed to remain at the current level. A sensitivity test of their impact on the building energy performance shows that there is generally a good linear relationship between building cooling load and the changes of weather variables of solar radiation, relative humidity or wind speed.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

A large number of methods have been published that aim to evaluate various components of multi-view geometry systems. Most of these have focused on the feature extraction, description and matching stages (the visual front end), since geometry computation can be evaluated through simulation. Many data sets are constrained to small scale scenes or planar scenes that are not challenging to new algorithms, or require special equipment. This paper presents a method for automatically generating geometry ground truth and challenging test cases from high spatio-temporal resolution video. The objective of the system is to enable data collection at any physical scale, in any location and in various parts of the electromagnetic spectrum. The data generation process consists of collecting high resolution video, computing accurate sparse 3D reconstruction, video frame culling and down sampling, and test case selection. The evaluation process consists of applying a test 2-view geometry method to every test case and comparing the results to the ground truth. This system facilitates the evaluation of the whole geometry computation process or any part thereof against data compatible with a realistic application. A collection of example data sets and evaluations is included to demonstrate the range of applications of the proposed system.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

During the last decades there has been a global shift in forest management from a focus solely on timber management to ecosystem management that endorses all aspects of forest functions: ecological, economic and social. This has resulted in a shift in paradigm from sustained yield to sustained diversity of values, goods and benefits obtained at the same time, introducing new temporal and spatial scales into forest resource management. The purpose of the present dissertation was to develop methods that would enable spatial and temporal scales to be introduced into the storage, processing, access and utilization of forest resource data. The methods developed are based on a conceptual view of a forest as a hierarchically nested collection of objects that can have a dynamically changing set of attributes. The temporal aspect of the methods consists of lifetime management for the objects and their attributes and of a temporal succession linking the objects together. Development of the forest resource data processing method concentrated on the extensibility and configurability of the data content and model calculations, allowing for a diverse set of processing operations to be executed using the same framework. The contribution of this dissertation to the utilisation of multi-scale forest resource data lies in the development of a reference data generation method to support forest inventory methods in approaching single-tree resolution.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Next-generation sequencing (NGS) technologies have become the standard for data generation in studies of population genomics, as the 1000 Genomes Project (1000G). However, these techniques are known to be problematic when applied to highly polymorphic genomic regions, such as the human leukocyte antigen (HLA) genes. Because accurate genotype calls and allele frequency estimations are crucial to population genomics analyses, it is important to assess the reliability of NGS data. Here, we evaluate the reliability of genotype calls and allele frequency estimates of the single-nucleotide polymorphisms (SNPs) reported by 1000G (phase I) at five HLA genes (HLA-A, -B, -C, -DRB1, and -DQB1). We take advantage of the availability of HLA Sanger sequencing of 930 of the 1092 1000G samples and use this as a gold standard to benchmark the 1000G data. We document that 18.6% of SNP genotype calls in HLA genes are incorrect and that allele frequencies are estimated with an error greater than ±0.1 at approximately 25% of the SNPs in HLA genes. We found a bias toward overestimation of reference allele frequency for the 1000G data, indicating mapping bias is an important cause of error in frequency estimation in this dataset. We provide a list of sites that have poor allele frequency estimates and discuss the outcomes of including those sites in different kinds of analyses. Because the HLA region is the most polymorphic in the human genome, our results provide insights into the challenges of using of NGS data at other genomic regions of high diversity.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

HLA-E is a non-classical Human Leucocyte Antigen class I gene with immunomodulatory properties. Whereas HLA-E expression usually occurs at low levels, it is widely distributed amongst human tissues, has the ability to bind self and non-self antigens and to interact with NK cells and T lymphocytes, being important for immunosurveillance and also for fighting against infections. HLA-E is usually the most conserved locus among all class I genes. However, most of the previous studies evaluating HLA-E variability sequenced only a few exons or genotyped known polymorphisms. Here we report a strategy to evaluate HLA-E variability by next-generation sequencing (NGS) that might be used to other HLA loci and present the HLA-E haplotype diversity considering the segment encoding the entire HLA-E mRNA (including 5'UTR, introns and the 3'UTR) in two African population samples, Susu from Guinea-Conakry and Lobi from Burkina Faso. Our results indicate that (a) the HLA-E gene is indeed conserved, encoding mainly two different protein molecules; (b) Africans do present several unknown HLA-E alleles presenting synonymous mutations; (c) the HLA-E 3'UTR is quite polymorphic and (d) haplotypes in the HLA-E 3'UTR are in close association with HLA-E coding alleles. NGS has proved to be an important tool on data generation for future studies evaluating variability in non-classical MHC genes.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

High-throughput assays, such as yeast two-hybrid system, have generated a huge amount of protein-protein interaction (PPI) data in the past decade. This tremendously increases the need for developing reliable methods to systematically and automatically suggest protein functions and relationships between them. With the available PPI data, it is now possible to study the functions and relationships in the context of a large-scale network. To data, several network-based schemes have been provided to effectively annotate protein functions on a large scale. However, due to those inherent noises in high-throughput data generation, new methods and algorithms should be developed to increase the reliability of functional annotations. Previous work in a yeast PPI network (Samanta and Liang, 2003) has shown that the local connection topology, particularly for two proteins sharing an unusually large number of neighbors, can predict functional associations between proteins, and hence suggest their functions. One advantage of the work is that their algorithm is not sensitive to noises (false positives) in high-throughput PPI data. In this study, we improved their prediction scheme by developing a new algorithm and new methods which we applied on a human PPI network to make a genome-wide functional inference. We used the new algorithm to measure and reduce the influence of hub proteins on detecting functionally associated proteins. We used the annotations of the Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) as independent and unbiased benchmarks to evaluate our algorithms and methods within the human PPI network. We showed that, compared with the previous work from Samanta and Liang, our algorithm and methods developed in this study improved the overall quality of functional inferences for human proteins. By applying the algorithms to the human PPI network, we obtained 4,233 significant functional associations among 1,754 proteins. Further comparisons of their KEGG and GO annotations allowed us to assign 466 KEGG pathway annotations to 274 proteins and 123 GO annotations to 114 proteins with estimated false discovery rates of <21% for KEGG and <30% for GO. We clustered 1,729 proteins by their functional associations and made pathway analysis to identify several subclusters that are highly enriched in certain signaling pathways. Particularly, we performed a detailed analysis on a subcluster enriched in the transforming growth factor β signaling pathway (P<10-50) which is important in cell proliferation and tumorigenesis. Analysis of another four subclusters also suggested potential new players in six signaling pathways worthy of further experimental investigations. Our study gives clear insight into the common neighbor-based prediction scheme and provides a reliable method for large-scale functional annotations in this post-genomic era.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Purpose – Linked data is gaining great interest in the cultural heritage domain as a new way for publishing, sharing and consuming data. The paper aims to provide a detailed method and MARiMbA a tool for publishing linked data out of library catalogues in the MARC 21 format, along with their application to the catalogue of the National Library of Spain in the datos.bne.es project. Design/methodology/approach – First, the background of the case study is introduced. Second, the method and process of its application are described. Third, each of the activities and tasks are defined and a discussion of their application to the case study is provided. Findings – The paper shows that the FRBR model can be applied to MARC 21 records following linked data best practices, librarians can successfully participate in the process of linked data generation following a systematic method, and data sources quality can be improved as a result of the process. Originality/value – The paper proposes a detailed method for publishing and linking linked data from MARC 21 records, provides practical examples, and discusses the main issues found in the application to a real case. Also, it proposes the integration of a data curation activity and the participation of librarians in the linked data generation process.