13 results for integrating data
Abstract:
In this paper, we present a Bayesian approach to estimate a chromosome and a disorder network from the Online Mendelian Inheritance in Man (OMIM) database. In contrast to other approaches, we obtain statistical rather than deterministic networks, enabling parametric control of the uncertainty in the underlying disorder-disease gene associations contained in OMIM, on which the networks are based. From a structural investigation of the chromosome network, we identify three chromosome subgroups that reflect architectural differences in chromosome-disorder associations that are predictively exploitable for a functional analysis of diseases.
Abstract:
Background: Popular approaches in human tissue-based biomarker discovery include tissue microarrays (TMAs) and DNA Microarrays (DMAs) for protein and gene expression profiling respectively. The data generated by these analytic platforms, together with associated image, clinical and pathological data currently reside on widely different information platforms, making searching and cross-platform analysis difficult. Consequently, there is a strong need to develop a single coherent database capable of correlating all available data types.
Method: This study presents TMAX, a database system to facilitate biomarker discovery tasks. TMAX organises a variety of biomarker discovery-related data into the database. Both TMA and DMA experimental data are integrated in TMAX and connected through common DNA/protein biomarkers. Patient clinical data (including tissue pathological data), computer assisted tissue image and associated analytic data are also included in TMAX to enable the truly high throughput processing of ultra-large digital slides for both TMAs and whole slide tissue digital slides. A comprehensive web front-end was built with embedded XML parser software and predefined SQL queries to enable rapid data exchange in the form of standard XML files.
Results & Conclusion: TMAX represents one of the first attempts to integrate TMA data with public gene expression experiment data. Experiments suggest that TMAX is robust in managing large quantities of data from different sources (clinical, TMA, DMA and image analysis). Its web front-end is user friendly, easy to use, and most importantly allows the rapid and easy data exchange of biomarker discovery related data. In conclusion, TMAX is a robust biomarker discovery data repository and research tool, which opens up the opportunities for biomarker discovery and further integromics research.
Abstract:
When studying heterogeneous aquifer systems, especially at regional scale, a degree of generalization is anticipated. This can be due to sparse sampling regimes, complex depositional environments or lack of accessibility to measure the subsurface. This can lead to an inaccurate conceptualization which can be detrimental when applied to groundwater flow models. It is important that numerical models are based on observed and accurate geological information and do not rely on the distribution of artificial aquifer properties. This can still be problematic as data will be modelled at a different scale from that at which they were collected. It is proposed here that integrating geophysics and upscaling techniques can assist in building a more realistic and deterministic groundwater flow model. In this study, the sedimentary aquifer of the Lagan Valley in Northern Ireland is chosen due to intruding sub-vertical dolerite dykes. These dykes are of a lower permeability than the sandstone aquifer. The use of airborne magnetics allows the delineation of heterogeneities, confirmed by field analysis. Permeability measured at the field scale is then upscaled to different levels using a correlation with the geophysical data, creating equivalent parameters that can be directly imported into numerical groundwater flow models. These parameters include directional equivalent permeabilities and anisotropy. Several stages of upscaling are modelled using finite elements. Initial modelling is providing promising results, especially at the intermediate scale, suggesting an accurate distribution of aquifer properties. This deterministic methodology is being expanded to include stochastic methods of obtaining heterogeneity location based on airborne geophysical data. This is done through the Direct Sampling method of Multiple-Point Statistics (MPS). This method uses the magnetics as a training image to computationally determine a probabilistic occurrence of heterogeneity. There is also a need to apply the method to alternate geological contexts where the heterogeneity is of a higher permeability than the host rock.
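The idea of directional equivalent permeabilities can be illustrated with the classical layered-medium result: flow parallel to the layers sees the thickness-weighted arithmetic mean, and flow across them sees the thickness-weighted harmonic mean. The sketch below is not the upscaling scheme used in the study; the function name, the thicknesses and the permeability values are all hypothetical and chosen only to show how a thin low-permeability dyke creates strong anisotropy.

```python
def equivalent_permeabilities(thicknesses, permeabilities):
    """Return (k_parallel, k_perpendicular, anisotropy ratio) for a stack of layers.

    Illustrative only: assumes perfectly parallel, laterally continuous layers.
    """
    total = sum(thicknesses)
    # Flow parallel to the layers: thickness-weighted arithmetic mean.
    k_par = sum(t * k for t, k in zip(thicknesses, permeabilities)) / total
    # Flow across the layers: thickness-weighted harmonic mean.
    k_perp = total / sum(t / k for t, k in zip(thicknesses, permeabilities))
    return k_par, k_perp, k_par / k_perp


# Hypothetical values: 95 m of sandstone (k = 1.0) cut by a 5 m dyke (k = 0.01),
# in arbitrary but consistent permeability units.
k_par, k_perp, ratio = equivalent_permeabilities([95.0, 5.0], [1.0, 0.01])
print(f"parallel={k_par:.4f}  across={k_perp:.4f}  anisotropy={ratio:.2f}")
```

In this toy case the dyke occupies only 5% of the section but reduces the across-dyke permeability to roughly a sixth of the sandstone value, while the along-dyke value barely changes.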
Integrating Multiple Point Statistics with Aerial Geophysical Data to assist Groundwater Flow Models
Abstract:
The process of accounting for heterogeneity has made significant advances in statistical research, primarily in the framework of stochastic analysis and the development of multiple-point statistics (MPS). Among MPS techniques, the direct sampling (DS) method is tested to determine its ability to delineate heterogeneity from aerial magnetics data in a regional sandstone aquifer intruded by low-permeability volcanic dykes in Northern Ireland, UK. The use of two two-dimensional bivariate training images aids in creating spatial probability distributions of heterogeneities of hydrogeological interest, despite relatively ‘noisy’ magnetics data (i.e. including hydrogeologically irrelevant urban noise and regional geologic effects). These distributions are incorporated into a hierarchy system where previously published density function and upscaling methods are applied to derive regional distributions of equivalent hydraulic conductivity tensor K. Several K models, as determined by several stochastic realisations of MPS dyke locations, are computed within groundwater flow models and evaluated by comparing modelled heads with field observations. Results show a significant improvement in model calibration when compared to a simplistic homogeneous and isotropic aquifer model that does not account for the dyke occurrence evidenced by airborne magnetic data. The best model is obtained when normal and reverse polarity dykes are computed separately within MPS simulations and when a probability threshold of 0.7 is applied. The presented stochastic approach also provides improvement when compared to a previously published deterministic anisotropic model based on the unprocessed (i.e. noisy) airborne magnetics. This demonstrates the potential of coupling MPS to airborne geophysical data for regional groundwater modelling.
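Applying a probability threshold to a set of stochastic realisations can be sketched as follows. This assumes each MPS realisation is a binary grid (1 = dyke, 0 = host rock) and that the probability map is the cell-wise mean over realisations; the function name, the toy grid and the use of `>=` rather than `>` are assumptions, not details taken from the paper.

```python
import numpy as np


def dyke_mask(realisations, threshold=0.7):
    """Cell-wise dyke probability and thresholded mask.

    realisations: array of shape (n_realisations, ny, nx) holding 0/1 indicators.
    """
    probability = np.asarray(realisations, dtype=float).mean(axis=0)
    return probability, probability >= threshold


# Toy data: 10 realisations of a 1 x 3 grid in which the three cells are
# simulated as dyke in 9, 8 and 2 realisations respectively.
counts = [9, 8, 2]
realisations = np.zeros((10, 1, 3), dtype=int)
for cell, n in enumerate(counts):
    realisations[:n, 0, cell] = 1

probability, mask = dyke_mask(realisations)
print(probability)
print(mask)
```

Only cells whose simulated dyke frequency reaches the threshold are carried forward as heterogeneities into the flow model.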
Abstract:
The census and similar sources of data have been published for two centuries so the information that they contain should provide an unparalleled insight into the changing population of Britain over this time period. To date, however, the seemingly trivial problem of changes in boundaries has seriously hampered the use of these sources as they make it impossible to create long run time series of spatially detailed data. The paper reviews methodologies that attempt to resolve this problem by using geographical information systems and areal interpolation to allow the reallocation of data from one set of administrative units onto another. This makes it possible to examine change over time for a standard geography and thus it becomes possible to unlock the spatial detail and the temporal depth that are held in the census and in related sources.
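The simplest form of areal interpolation is areal weighting, in which each source unit's count is shared among target units in proportion to the area of overlap. The sketch below is a minimal illustration, assuming the population is evenly spread within each source unit and that the overlap areas have already been computed in a GIS; the function name and all figures are hypothetical, and the paper reviews more refined methods than this one.

```python
def areal_weighting(source_counts, overlap_areas):
    """Reallocate counts from source units onto target units.

    overlap_areas[i][j] is the area shared by source unit i and target unit j.
    Assumes every source unit is fully covered by the target units.
    """
    n_targets = len(overlap_areas[0])
    targets = [0.0] * n_targets
    for count, row in zip(source_counts, overlap_areas):
        source_area = sum(row)
        for j, area in enumerate(row):
            # Share of this source unit's count that falls in target unit j.
            targets[j] += count * area / source_area
    return targets


# Hypothetical example: two historical parishes reallocated onto two modern districts.
estimates = areal_weighting([1000, 400], [[3.0, 1.0], [0.0, 2.0]])
print(estimates)  # [750.0, 650.0]
```

The total population is preserved (1400 in both geographies), which is a useful sanity check on any reallocation.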
Abstract:
The advent of next generation sequencing technologies (NGS) has expanded the area of genomic research, offering high coverage and increased sensitivity over older microarray platforms. Although the current cost of next generation sequencing still exceeds that of microarray approaches, the rapid advances in NGS will likely make it the platform of choice for future research in differential gene expression. Connectivity mapping is a procedure for examining the connections among diseases, genes and drugs by differential gene expression initially based on microarray technology, with which a large collection of compound-induced reference gene expression profiles have been accumulated. In this work, we aim to test the feasibility of incorporating NGS RNA-Seq data into the current connectivity mapping framework by utilizing the microarray based reference profiles and the construction of a differentially expressed gene signature from an NGS dataset. This would allow for the establishment of connections between the NGS gene signature and those microarray reference profiles, alleviating the cost of re-creating drug profiles with NGS technology. We examined the connectivity mapping approach on a publicly available NGS dataset with androgen stimulation of LNCaP cells in order to extract candidate compounds that could inhibit the proliferative phenotype of LNCaP cells and to elucidate their potential in a laboratory setting. In addition, we also analyzed an independent microarray dataset of similar experimental settings. We found a high level of concordance between the top compounds identified using the gene signatures from the two datasets. The nicotine derivative cotinine was returned as the top candidate among the overlapping compounds with potential to suppress this proliferative phenotype. Subsequent lab experiments validated this connectivity mapping hit, showing that cotinine inhibits cell proliferation in an androgen dependent manner. Thus the results in this study suggest a promising prospect of integrating NGS data with connectivity mapping. © 2013 McArt et al.
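To make the idea of a connection between a gene signature and a reference profile concrete, the sketch below scores a signature against a profile of signed ranks. It is a simplified illustration and not necessarily the scoring scheme used in the study; the function name, gene labels and rank values are hypothetical.

```python
def connection_score(signature, reference_ranks):
    """Score a gene signature against one reference profile.

    signature: gene -> +1 (up-regulated) or -1 (down-regulated).
    reference_ranks: gene -> signed rank in the compound-induced profile
        (large positive = strongly up, large negative = strongly down).
    Returns a value in [-1, 1]; negative values suggest the compound
    reverses the signature.
    """
    shared = [g for g in signature if g in reference_ranks]
    raw = sum(signature[g] * reference_ranks[g] for g in shared)
    # Normalise by the best score achievable with this many genes.
    top = sorted((abs(r) for r in reference_ranks.values()), reverse=True)
    max_score = sum(top[:len(shared)])
    return raw / max_score if max_score else 0.0


# Hypothetical reference profile and two opposite signatures.
reference = {"A": 4, "B": -3, "C": 2, "D": -1}
same_direction = connection_score({"A": 1, "B": -1}, reference)
reversed_direction = connection_score({"A": -1, "B": 1}, reference)
print(same_direction, reversed_direction)  # 1.0 -1.0
```

Under this convention, a compound whose profile scores strongly negative against a disease signature is a candidate for suppressing the corresponding phenotype.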
Abstract:
Soil carbon stores are a major component of the annual returns required by EU governments to the Intergovernmental Panel on Climate Change. Peat has a high proportion of soil carbon due to the relatively high carbon density of peat and organic-rich soils. For this reason it has become increasingly important to measure and model soil carbon stores and changes in peat stocks to facilitate the management of carbon changes over time. The approach investigated in this research evaluates the use of airborne geophysical (radiometric) data to estimate peat thickness using the attenuation of bedrock geology radioactivity by superficial peat cover. Remotely sensed radiometric data are validated with ground peat depth measurements combined with non-invasive geophysical surveys. Two field-based case studies exemplify and validate the results. Variography and kriging are used to predict peat thickness from point measurements of peat depth and airborne radiometric data and provide an estimate of uncertainty in the predictions. Cokriging, by assessing the degree of spatial correlation between recent remote sensed geophysical monitoring and previous peat depth models, is used to examine changes in peat stocks over time. The significance of the coregionalisation is that the spatial cross correlation between the remote and ground based data can be used to update the model of peat depth. The result is that by integrating remotely sensed data with ground geophysics, the need is reduced for extensive ground-based monitoring and invasive peat depth measurements. The overall goal is to provide robust estimates of peat thickness to improve estimates of carbon stocks. The implications from the research have a broader significance that promotes a reduction in the need for damaging onsite peat thickness measurement and an increase in the use of remote sensed data for carbon stock estimations.
Abstract:
The majority of the kinetic models employed in catalytic after-treatment of exhaust emissions use a global kinetic approach owing to its simplicity, because one expression can account for all the steps in a reaction. The major drawback of this approach is the limited predictive capability of the models. The intrinsic kinetic approach offers much more information about the processes occurring within the catalytic converter; however, it is significantly more complex and time consuming to develop. In the present work, a methodology is reported that yields a model combining the simplicity of the global kinetic approach with the accuracy of the intrinsic kinetic approach. To assess the performance of this new approach, the oxidation of carbon monoxide in the presence of nitric oxide as well as a driving cycle were investigated. Carbon monoxide oxidation with oxygen was modelled using the intrinsic kinetic approach, while the global kinetic approach was used for the carbon monoxide + nitric oxide reaction (and all remaining reactions for the driving cycle). The comparison of the model results for the dual intrinsic + global kinetic approach with the experimental data obtained for both the reactor and the driving cycle indicates that the dual approach is promising, with results significantly better than those obtained with only the global kinetics approach.
Abstract:
Identifying groundwater contributions to baseflow forms an essential part of surface water body characterisation. The Gortinlieve catchment (5 km²) comprises a headwater stream network of the Carrigans River, itself a tributary of the River Foyle, NW Ireland. The bedrock comprises poorly productive metasediments that are characterised by fracture porosity. We present the findings of a multi-disciplinary study that integrates new hydrochemical and mineralogical investigations with existing hydraulic, geophysical and structural data to identify the scales of groundwater flow and the nature of groundwater/bedrock interaction (chemical denudation). At the catchment scale, the development of deep weathering profiles is controlled by NE-SW regional scale fracture zones associated with mountain building during the Grampian orogeny. In-situ chemical denudation of mineral phases is controlled by micro- to meso-scale fractures related to Alpine compression during Palaeocene to Oligocene times. The alteration of primary muscovite, chlorite (clinochlore) and albite along the surfaces of these small-scale fractures has resulted in the precipitation of illite, montmorillonite and illite/montmorillonite clay admixtures. The interconnected but discontinuous nature of these small-scale structures highlights the role of larger scale faults and fissures in the supply and transportation of weathering solutions to/from the sites of mineral weathering. The dissolution of primary mineral phases releases the major ions Mg, Ca and HCO3 that are shown to subsequently form the chemical makeup of groundwaters. Borehole groundwater and stream baseflow hydrochemical data are used to constrain the depths of groundwater flow pathways influencing the chemistry of surface waters throughout the stream profile. The results show that it is predominantly the lower part of the catchment, which receives inputs from catchment/regional scale groundwater flow, that contributes to the maintenance of annual baseflow levels. This study identifies the importance of deep groundwater in maintaining annual baseflow levels in poorly productive bedrock systems.
Abstract:
Master data management (MDM) integrates data from multiple structured data sources and builds a consolidated 360-degree view of business entities such as customers and products. Today’s MDM systems are not prepared to integrate information from unstructured data sources, such as news reports, emails, call-center transcripts, and chat logs. However, those unstructured data sources may contain valuable information about the same entities known to MDM from the structured data sources. Integrating information from unstructured data into MDM is challenging as textual references to existing MDM entities are often incomplete and imprecise and the additional entity information extracted from text should not impact the trustworthiness of MDM data.
In this paper, we present an architecture for making MDM text-aware and showcase its implementation as IBM InfoSphere MDM Extension for Unstructured Text Correlation, an add-on to IBM InfoSphere Master Data Management Standard Edition. We highlight how MDM benefits from additional evidence found in documents when doing entity resolution and relationship discovery. We experimentally demonstrate the feasibility of integrating information from unstructured data sources into MDM.
Abstract:
Aggression occurs when individuals compete over limiting resources. While theoretical studies have long placed a strong emphasis on context-specificity of aggression, there is increasing recognition that consistent behavioural differences exist among individuals, and that aggressiveness may be an important component of individual personality. Though empirical studies tend to focus on one aspect or the other, we suggest there is merit in modelling both within- and among-individual variation in agonistic behaviour simultaneously. Here, we demonstrate how this can be achieved using multivariate linear mixed effect models. Using data from repeated mirror trials and dyadic interactions of male green swordtails, Xiphophorus helleri, we show repeatable components of (co)variation in a suite of agonistic behaviours that is broadly consistent with a major axis of variation in aggressiveness. We also show that observed focal behaviour is dependent on opponent effects, which can themselves be repeatable but were more generally found to be context specific. In particular, our models show that within-individual variation in agonistic behaviour is explained, at least in part, by the relative size of a live opponent as predicted by contest theory. Finally, we suggest several additional applications of the multivariate models demonstrated here. These include testing the recently queried functional equivalence of alternative experimental approaches (e.g., mirror trials, dyadic interaction tests) for assaying individual aggressiveness. © 2011 Wilson et al.
Abstract:
Repositories containing high quality human biospecimens linked with robust and relevant clinical and pathological information are required for the discovery and validation of biomarkers for disease diagnosis, progression and response to treatment. Current molecular based discovery projects using either low or high throughput technologies rely heavily on ready access to such sample collections. It is imperative that modern biobanks align with molecular diagnostic pathology practices not only to provide the type of samples needed for discovery projects but also to ensure requirements for ongoing sample collections and the future needs of researchers are adequately addressed. Biobanks within comprehensive molecular pathology programmes are perfectly positioned to offer more than just tumour derived biospecimens; for example, they have the ability to facilitate researchers gaining access to sample metadata such as digitised scans of tissue samples annotated prior to macrodissection for molecular diagnostics or pseudoanonymised clinical outcome data or research results retrieved from other users utilising the same or overlapping cohorts of samples. Furthermore, biobanks can work with molecular diagnostic laboratories to develop standardized methodologies for the acquisition and storage of samples required for new approaches to research such as ‘liquid biopsies’ which will ultimately feed into the test validations required in large prospective clinical studies in order to implement liquid biopsy approaches for routine clinical practice. We draw on our experience in Northern Ireland to discuss how this harmonised approach of biobanks working synergistically with molecular pathology programmes is key for the future success of precision medicine.