894 resultados para data integration


Relevância:

70.00% 70.00%

Publicador:

Resumo:

The spatial error structure of daily precipitation derived from the latest version 7 (v7) tropical rainfall measuring mission (TRMM) level 2 data products are studied through comparison with the Asian precipitation highly resolved observational data integration toward evaluation of the water resources (APHRODITE) data over a subtropical region of the Indian subcontinent for the seasonal rainfall over 6 years from June 2002 to September 2007. The data products examined include v7 data from the TRMM radiometer Microwave Imager (TMI) and radar precipitation radar (PR), namely, 2A12, 2A25, and 2B31 (combined data from PR and TMI). The spatial distribution of uncertainty from these data products were quantified based on performance metrics derived from the contingency table. For the seasonal daily precipitation over a subtropical basin in India, the data product of 2A12 showed greater skill in detecting and quantifying the volume of rainfall when compared with the 2A25 and 2B31 data products. Error characterization using various error models revealed that random errors from multiplicative error models were homoscedastic and that they better represented rainfall estimates from 2A12 algorithm. Error decomposition techniques performed to disentangle systematic and random errors verify that the multiplicative error model representing rainfall from 2A12 algorithm successfully estimated a greater percentage of systematic error than 2A25 or 2B31 algorithms. Results verify that although the radiometer derived 2A12 rainfall data is known to suffer from many sources of uncertainties, spatial analysis over the case study region of India testifies that the 2A12 rainfall estimates are in a very good agreement with the reference estimates for the data period considered.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The Architecture, Engineering, Construction and Facilities Management (AEC/FM) industry is rapidly becoming a multidisciplinary, multinational and multi-billion dollar economy, involving large numbers of actors working concurrently at different locations and using heterogeneous software and hardware technologies. Since the beginning of the last decade, a great deal of effort has been spent within the field of construction IT in order to integrate data and information from most computer tools used to carry out engineering projects. For this purpose, a number of integration models have been developed, like web-centric systems and construction project modeling, a useful approach in representing construction projects and integrating data from various civil engineering applications. In the modern, distributed and dynamic construction environment it is important to retrieve and exchange information from different sources and in different data formats in order to improve the processes supported by these systems. Previous research demonstrated that a major hurdle in AEC/FM data integration in such systems is caused by its variety of data types and that a significant part of the data is stored in semi-structured or unstructured formats. Therefore, new integrative approaches are needed to handle non-structured data types like images and text files. This research is focused on the integration of construction site images. These images are a significant part of the construction documentation with thousands stored in site photographs logs of large scale projects. However, locating and identifying such data needed for the important decision making processes is a very hard and time-consuming task, while so far, there are no automated methods for associating them with other related objects. Therefore, automated methods for the integration of construction images are important for construction information management. During this research, processes for retrieval, classification, and integration of construction images in AEC/FM model based systems have been explored. Specifically, a combination of techniques from the areas of image and video processing, computer vision, information retrieval, statistics and content-based image and video retrieval have been deployed in order to develop a methodology for the retrieval of related construction site image data from components of a project model. This method has been tested on available construction site images from a variety of sources like past and current building construction and transportation projects and is able to automatically classify, store, integrate and retrieve image data files in inter-organizational systems so as to allow their usage in project management related tasks.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Identifying protein-protein interactions is crucial for understanding cellular functions. Genomic data provides opportunities and challenges in identifying these interactions. We uncover the rules for predicting protein-protein interactions using a frequent pattern tree (FPT) approach modified to generate a minimum set of rules (mFPT), with rule attributes constructed from the interaction features of the yeast genomic data. The mFPT prediction accuracy is benchmarked against other commonly used methods such as Bayesian networks and logistic regressions under various statistical measures. Our study indicates that mFPT outranks other methods in predicting the protein-protein interactions for the database used. We predict a new protein-protein interaction complex whose biological function is related to premRNA splicing and new protein-protein interactions within existing complexes based on the rules generated.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The application of semantic technologies to the integration of biological data and the interoperability of bioinformatics analysis and visualization tools has been the common theme of a series of annual BioHackathons hosted in Japan for the past five years. Here we provide a review of the activities and outcomes from the BioHackathons held in 2011 in Kyoto and 2012 in Toyama. In order to efficiently implement semantic technologies in the life sciences, participants formed various sub-groups and worked on the following topics: Resource Description Framework (RDF) models for specific domains, text mining of the literature, ontology development, essential metadata for biological databases, platforms to enable efficient Semantic Web technology development and interoperability, and the development of applications for Semantic Web data. In this review, we briefly introduce the themes covered by these sub-groups. The observations made, conclusions drawn, and software development projects that emerged from these activities are discussed.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between regulatory proteins, such as transcription factors (TF) and nucleosomes, and the genome. Different high-throughput techniques have been invented to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, a single experimental technique often only provides partial and noisy information about the whole picture of protein-DNA interactions. Therefore, the overarching goal of this dissertation is to provide computational developments for jointly modeling different experimental datasets to achieve a holistic inference on the protein-DNA interaction landscape.

We first present a computational framework that can incorporate the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo method to maximize the function. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. Our framework is flexible and can easily incorporate other data sources. To demonstrate this flexibility, we use prior distributions to integrate experimentally measured protein concentrations.

We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has only been widely used to identify DNase hypersensitive sites, which tend to be open chromatin regulatory regions devoid of nucleosomes. We reveal for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach utilizes several effective strategies to extract nucleosome positioning signals from the noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps. We also show that the oscillatory DNase I digestion pattern is useful in revealing the nucleosome rotational context around TF binding sites.

Finally, we present a state-space model (SSM) for jointly modeling different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape. We also provide an efficient expectation-maximization algorithm to learn model parameters from data. We first show in simulation studies that the SSM can effectively recover underlying true protein binding configurations. We then apply the SSM to model real genomic data (both DNase-seq and MNase-seq data). Through incrementally increasing the types of genomic data in the SSM, we show that different data types can contribute complementary information for the inference of protein binding landscape and that the most accurate inference comes from modeling all available datasets.

This dissertation provides a foundation for future research by taking a step toward the genome-wide inference of protein-DNA interaction landscape through data integration.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

A rapidly increasing number of Web databases are now become accessible via
their HTML form-based query interfaces. Query result pages are dynamically generated
in response to user queries, which encode structured data and are displayed for human
use. Query result pages usually contain other types of information in addition to query
results, e.g., advertisements, navigation bar etc. The problem of extracting structured data
from query result pages is critical for web data integration applications, such as comparison
shopping, meta-search engines etc, and has been intensively studied. A number of approaches
have been proposed. As the structures of Web pages become more and more complex, the
existing approaches start to fail, and most of them do not remove irrelevant contents which
may a®ect the accuracy of data record extraction. We propose an automated approach for
Web data extraction. First, it makes use of visual features and query terms to identify data
sections and extracts data records in these sections. We also represent several content and
visual features of visual blocks in a data section, and use them to ¯lter out noisy blocks.
Second, it measures similarity between data items in di®erent data records based on their
visual and content features, and aligns them into di®erent groups so that the data in the
same group have the same semantics. The results of our experiments with a large set of
Web query result pages in di®erent domains show that our proposed approaches are highly
e®ective.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Background Major Depressive Disorder (MDD) is among the most prevalent and disabling medical conditions worldwide. Identification of clinical and biological markers (“biomarkers”) of treatment response could personalize clinical decisions and lead to better outcomes. This paper describes the aims, design, and methods of a discovery study of biomarkers in antidepressant treatment response, conducted by the Canadian Biomarker Integration Network in Depression (CAN-BIND). The CAN-BIND research program investigates and identifies biomarkers that help to predict outcomes in patients with MDD treated with antidepressant medication. The primary objective of this initial study (known as CAN-BIND-1) is to identify individual and integrated neuroimaging, electrophysiological, molecular, and clinical predictors of response to sequential antidepressant monotherapy and adjunctive therapy in MDD. Methods CAN-BIND-1 is a multisite initiative involving 6 academic health centres working collaboratively with other universities and research centres. In the 16-week protocol, patients with MDD are treated with a first-line antidepressant (escitalopram 10–20 mg/d) that, if clinically warranted after eight weeks, is augmented with an evidence-based, add-on medication (aripiprazole 2–10 mg/d). Comprehensive datasets are obtained using clinical rating scales; behavioural, dimensional, and functioning/quality of life measures; neurocognitive testing; genomic, genetic, and proteomic profiling from blood samples; combined structural and functional magnetic resonance imaging; and electroencephalography. De-identified data from all sites are aggregated within a secure neuroinformatics platform for data integration, management, storage, and analyses. Statistical analyses will include multivariate and machine-learning techniques to identify predictors, moderators, and mediators of treatment response. Discussion From June 2013 to February 2015, a cohort of 134 participants (85 outpatients with MDD and 49 healthy participants) has been evaluated at baseline. The clinical characteristics of this cohort are similar to other studies of MDD. Recruitment at all sites is ongoing to a target sample of 290 participants. CAN-BIND will identify biomarkers of treatment response in MDD through extensive clinical, molecular, and imaging assessments, in order to improve treatment practice and clinical outcomes. It will also create an innovative, robust platform and database for future research.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In this research the 3DVAR data assimilation scheme is implemented in the numerical model DIVAST in order to optimize the performance of the numerical model by selecting an appropriate turbulence scheme and tuning its parameters. Two turbulence closure schemes: the Prandtl mixing length model and the two-equation k-ε model were incorporated into DIVAST and examined with respect to their universality of application, complexity of solutions, computational efficiency and numerical stability. A square harbour with one symmetrical entrance subject to tide-induced flows was selected to investigate the structure of turbulent flows. The experimental part of the research was conducted in a tidal basin. A significant advantage of such laboratory experiment is a fully controlled environment where domain setup and forcing are user-defined. The research shows that the Prandtl mixing length model and the two-equation k-ε model, with default parameterization predefined according to literature recommendations, overestimate eddy viscosity which in turn results in a significant underestimation of velocity magnitudes in the harbour. The data assimilation of the model-predicted velocity and laboratory observations significantly improves model predictions for both turbulence models by adjusting modelled flows in the harbour to match de-errored observations. 3DVAR allows also to identify and quantify shortcomings of the numerical model. Such comprehensive analysis gives an optimal solution based on which numerical model parameters can be estimated. The process of turbulence model optimization by reparameterization and tuning towards optimal state led to new constants that may be potentially applied to complex turbulent flows, such as rapidly developing flows or recirculating flows.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

P>In livestock genetic resource conservation, decision making about conservation priorities is based on the simultaneous analysis of several different criteria that may contribute to long-term sustainable breeding conditions, such as genetic and demographic characteristics, environmental conditions, and role of the breed in the local or regional economy. Here we address methods to integrate different data sets and highlight problems related to interdisciplinary comparisons. Data integration is based on the use of geographic coordinates and Geographic Information Systems (GIS). In addition to technical problems related to projection systems, GIS have to face the challenging issue of the non homogeneous scale of their data sets. We give examples of the successful use of GIS for data integration and examine the risk of obtaining biased results when integrating datasets that have been captured at different scales.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The present study introduces a multi-agent architecture designed for doing automation process of data integration and intelligent data analysis. Different from other approaches the multi-agent architecture was designed using a multi-agent based methodology. Tropos, an agent based methodology was used for design. Based on the proposed architecture, we describe a Web based application where the agents are responsible to analyse petroleum well drilling data to identify possible abnormalities occurrence. The intelligent data analysis methods used was the Neural Network.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This paper presents a method for indirect orientation of aerial images using ground control lines extracted from airborne Laser system (ALS) data. This data integration strategy has shown good potential in the automation of photogrammetric tasks, including the indirect orientation of images. The most important characteristic of the proposed approach is that the exterior orientation parameters (EOP) of a single or multiple images can be automatically computed with a space resection procedure from data derived from different sensors. The suggested method works as follows. Firstly, the straight lines are automatically extracted in the digital aerial image (s) and in the intensity image derived from an ALS data-set (S). Then, correspondence between s and S is automatically determined. A line-based coplanarity model that establishes the relationship between straight lines in the object and in the image space is used to estimate the EOP with the iterated extended Kalman filtering (IEKF). Implementation and testing of the method have employed data from different sensors. Experiments were conducted to assess the proposed method and the results obtained showed that the estimation of the EOP is function of ALS positional accuracy.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Several countries have acquired, over the past decades, large amounts of area covering Airborne Electromagnetic data. Contribution of airborne geophysics has dramatically increased for both groundwater resource mapping and management proving how those systems are appropriate for large-scale and efficient groundwater surveying. We start with processing and inversion of two AEM dataset from two different systems collected over the Spiritwood Valley Aquifer area, Manitoba, Canada respectively, the AeroTEM III (commissioned by the Geological Survey of Canada in 2010) and the “Full waveform VTEM” dataset, collected and tested over the same survey area, during the fall 2011. We demonstrate that in the presence of multiple datasets, either AEM and ground data, due processing, inversion, post-processing, data integration and data calibration is the proper approach capable of providing reliable and consistent resistivity models. Our approach can be of interest to many end users, ranging from Geological Surveys, Universities to Private Companies, which are often proprietary of large geophysical databases to be interpreted for geological and\or hydrogeological purposes. In this study we deeply investigate the role of integration of several complimentary types of geophysical data collected over the same survey area. We show that data integration can improve inversions, reduce ambiguity and deliver high resolution results. We further attempt to use the final, most reliable output resistivity models as a solid basis for building a knowledge-driven 3D geological voxel-based model. A voxel approach allows a quantitative understanding of the hydrogeological setting of the area, and it can be further used to estimate the aquifers volumes (i.e. potential amount of groundwater resources) as well as hydrogeological flow model prediction. In addition, we investigated the impact of an AEM dataset towards hydrogeological mapping and 3D hydrogeological modeling, comparing it to having only a ground based TEM dataset and\or to having only boreholes data.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The current state of health and biomedicine includes an enormity of heterogeneous data ‘silos’, collected for different purposes and represented differently, that are presently impossible to share or analyze in toto. The greatest challenge for large-scale and meaningful analyses of health-related data is to achieve a uniform data representation for data extracted from heterogeneous source representations. Based upon an analysis and categorization of heterogeneities, a process for achieving comparable data content by using a uniform terminological representation is developed. This process addresses the types of representational heterogeneities that commonly arise in healthcare data integration problems. Specifically, this process uses a reference terminology, and associated "maps" to transform heterogeneous data to a standard representation for comparability and secondary use. The capture of quality and precision of the “maps” between local terms and reference terminology concepts enhances the meaning of the aggregated data, empowering end users with better-informed queries for subsequent analyses. A data integration case study in the domain of pediatric asthma illustrates the development and use of a reference terminology for creating comparable data from heterogeneous source representations. The contribution of this research is a generalized process for the integration of data from heterogeneous source representations, and this process can be applied and extended to other problems where heterogeneous data needs to be merged.