106 resultados para Inference.
Resumo:
BACKGROUND: Urothelial pathogenesis is a complex process driven by an underlying network of interconnected genes. The identification of novel genomic target regions and gene targets that drive urothelial carcinogenesis is crucial in order to improve our current limited understanding of urothelial cancer (UC) on the molecular level. The inference of genome-wide gene regulatory networks (GRN) from large-scale gene expression data provides a promising approach for a detailed investigation of the underlying network structure associated to urothelial carcinogenesis.
METHODS: In our study we inferred and compared three GRNs by the application of the BC3Net inference algorithm to large-scale transitional cell carcinoma gene expression data sets from Illumina RNAseq (179 samples), Illumina Bead arrays (165 samples) and Affymetrix Oligo microarrays (188 samples). We investigated the structural and functional properties of GRNs for the identification of molecular targets associated to urothelial cancer.
RESULTS: We found that the urothelial cancer (UC) GRNs show a significant enrichment of subnetworks that are associated with known cancer hallmarks including cell cycle, immune response, signaling, differentiation and translation. Interestingly, the most prominent subnetworks of co-located genes were found on chromosome regions 5q31.3 (RNAseq), 8q24.3 (Oligo) and 1q23.3 (Bead), which all represent known genomic regions frequently deregulated or aberated in urothelial cancer and other cancer types. Furthermore, the identified hub genes of the individual GRNs, e.g., HID1/DMC1 (tumor development), RNF17/TDRD4 (cancer antigen) and CYP4A11 (angiogenesis/ metastasis) are known cancer associated markers. The GRNs were highly dataset specific on the interaction level between individual genes, but showed large similarities on the biological function level represented by subnetworks. Remarkably, the RNAseq UC GRN showed twice the proportion of significant functional subnetworks. Based on our analysis of inferential and experimental networks the Bead UC GRN showed the lowest performance compared to the RNAseq and Oligo UC GRNs.
CONCLUSION: To our knowledge, this is the first study investigating genome-scale UC GRNs. RNAseq based gene expression data is the data platform of choice for a GRN inference. Our study offers new avenues for the identification of novel putative diagnostic targets for subsequent studies in bladder tumors.
Resumo:
This study explored the validity of using critical thinking tests to predict final psychology degree marks over and above that already predicted by traditional admission exams (A-levels). Participants were a longitudinal sample of 109 psychology students from a university in the United Kingdom. The outcome measures were: total degree marks; and end of year marks. The predictor measures were: university admission exam results (A-levels); critical thinking test scores (skills & dispositions); and non-verbal intelligence scores. Hierarchical regressions showed A-levels significantly predicted 10% of the final degree score and the 11-item measure of ‘Inference skills’ from the California Critical Thinking Skills Test significantly predicted an additional 6% of degree outcome variance. The findings from this study should inform decisions about the precise measurement constructs included in aptitude tests used in the higher education admission process.
Resumo:
This paper proposes an efficient learning mechanism to build fuzzy rule-based systems through the construction of sparse least-squares support vector machines (LS-SVMs). In addition to the significantly reduced computational complexity in model training, the resultant LS-SVM-based fuzzy system is sparser while offers satisfactory generalization capability over unseen data. It is well known that the LS-SVMs have their computational advantage over conventional SVMs in the model training process; however, the model sparseness is lost, which is the main drawback of LS-SVMs. This is an open problem for the LS-SVMs. To tackle the nonsparseness issue, a new regression alternative to the Lagrangian solution for the LS-SVM is first presented. A novel efficient learning mechanism is then proposed in this paper to extract a sparse set of support vectors for generating fuzzy IF-THEN rules. This novel mechanism works in a stepwise subset selection manner, including a forward expansion phase and a backward exclusion phase in each selection step. The implementation of the algorithm is computationally very efficient due to the introduction of a few key techniques to avoid the matrix inverse operations to accelerate the training process. The computational efficiency is also confirmed by detailed computational complexity analysis. As a result, the proposed approach is not only able to achieve the sparseness of the resultant LS-SVM-based fuzzy systems but significantly reduces the amount of computational effort in model training as well. Three experimental examples are presented to demonstrate the effectiveness and efficiency of the proposed learning mechanism and the sparseness of the obtained LS-SVM-based fuzzy systems, in comparison with other SVM-based learning techniques.
Resumo:
The assimilation of discrete higher fidelity data points with model predictions can be used to achieve a reduction in the uncertainty of the model input parameters which generate accurate predictions. The problem investigated here involves the prediction of limit-cycle oscillations using a High-Dimensional Harmonic Balance method (HDHB). The efficiency of the HDHB method is exploited to enable calibration of structural input parameters using a Bayesian inference technique. Markov-chain Monte Carlo is employed to sample the posterior distributions. Parameter estimation is carried out on both a pitch/plunge aerofoil and Goland wing configuration. In both cases significant refinement was achieved in the distribution of possible structural parameters allowing better predictions of their
true deterministic values.
Resumo:
The accumulation of biogenic greenhouse gases (methane, carbon dioxide) in organic sediments is an important factor in the redevelopment and risk management of many brownfield sites. Good practice with brownfield site characterization requires the identification of free-gas phases and pathways that allow its migration and release at the ground surface. Gas pockets trapped in the subsurface have contrasting properties with the surrounding porous media that favor their detection using geophysical methods. We have developed a case study in which pockets of gas were intercepted with multilevel monitoring wells, and their lateral continuity was monitored over time using resistivity. We have developed a novel interpretation procedure based on Archie’s law to evaluate changes in water and gas content with respect to a mean background medium. We have used induced polarization data to account for errors in applying Archie’s law due to the contribution of surface conductivity effects. Mosaics defined by changes in water saturation allowed the recognition of gas migration and groundwater infiltration routes and the association of gas and groundwater fluxes. The inference on flux patterns was analyzed by taking into account pressure measurements in trapped gas reservoirs and by metagenomic analysis of the microbiological content, which was retrieved from suspended sediments in groundwater sampled in multilevel monitoring wells. A conceptual model combining physical and microbiological subsurface processes suggested that biogas trapped at depth may have the ability to quickly travel to the surface.
Resumo:
Introduction: Individuals carrying pathogenic mutations in the BRCA1 and BRCA2 genes have a high lifetime risk of breast cancer. BRCA1 and BRCA2 are involved in DNA double-strand break repair, DNA alterations that can be caused by exposure to reactive oxygen species, a main source of which are mitochondria. Mitochondrial genome variations affect electron transport chain efficiency and reactive oxygen species production. Individuals with different mitochondrial haplogroups differ in their metabolism and sensitivity to oxidative stress. Variability in mitochondrial genetic background can alter reactive oxygen species production, leading to cancer risk. In the present study, we tested the hypothesis that mitochondrial haplogroups modify breast cancer risk in BRCA1/2 mutation carriers.
Methods: We genotyped 22,214 (11,421 affected, 10,793 unaffected) mutation carriers belonging to the Consortium of Investigators of Modifiers of BRCA1/2 for 129 mitochondrial polymorphisms using the iCOGS array. Haplogroup inference and association detection were performed using a phylogenetic approach. ALTree was applied to explore the reference mitochondrial evolutionary tree and detect subclades enriched in affected or unaffected individuals.
Results: We discovered that subclade T1a1 was depleted in affected BRCA2 mutation carriers compared with the rest of clade T (hazard ratio (HR) = 0.55; 95% confidence interval (CI), 0.34 to 0.88; P = 0.01). Compared with the most frequent haplogroup in the general population (that is, H and T clades), the T1a1 haplogroup has a HR of 0.62 (95% CI, 0.40 to 0.95; P = 0.03). We also identified three potential susceptibility loci, including G13708A/rs28359178, which has demonstrated an inverse association with familial breast cancer risk.
Conclusions: This study illustrates how original approaches such as the phylogeny-based method we used can empower classical molecular epidemiological studies aimed at identifying association or risk modification effects.
Resumo:
Background: Selection bias in HIV prevalence estimates occurs if non-participation in testing is correlated with HIV status. Longitudinal data suggests that individuals who know or suspect they are HIV positive are less likely to participate in testing in HIV surveys, in which case methods to correct for missing data which are based on imputation and observed characteristics will produce biased results. Methods: The identity of the HIV survey interviewer is typically associated with HIV testing participation, but is unlikely to be correlated with HIV status. Interviewer identity can thus be used as a selection variable allowing estimation of Heckman-type selection models. These models produce asymptotically unbiased HIV prevalence estimates, even when non-participation is correlated with unobserved characteristics, such as knowledge of HIV status. We introduce a new random effects method to these selection models which overcomes non-convergence caused by collinearity, small sample bias, and incorrect inference in existing approaches. Our method is easy to implement in standard statistical software, and allows the construction of bootstrapped standard errors which adjust for the fact that the relationship between testing and HIV status is uncertain and needs to be estimated. Results: Using nationally representative data from the Demographic and Health Surveys, we illustrate our approach with new point estimates and confidence intervals (CI) for HIV prevalence among men in Ghana (2003) and Zambia (2007). In Ghana, we find little evidence of selection bias as our selection model gives an HIV prevalence estimate of 1.4% (95% CI 1.2% – 1.6%), compared to 1.6% among those with a valid HIV test. In Zambia, our selection model gives an HIV prevalence estimate of 16.3% (95% CI 11.0% - 18.4%), compared to 12.1% among those with a valid HIV test. Therefore, those who decline to test in Zambia are found to be more likely to be HIV positive. Conclusions: Our approach corrects for selection bias in HIV prevalence estimates, is possible to implement even when HIV prevalence or non-participation is very high or very low, and provides a practical solution to account for both sampling and parameter uncertainty in the estimation of confidence intervals. The wide confidence intervals estimated in an example with high HIV prevalence indicate that it is difficult to correct statistically for the bias that may occur when a large proportion of people refuse to test.
Resumo:
The ejected mass distribution of Type Ia supernovae (SNe Ia) directly probes progenitor evolutionary history and explosion mechanisms, with implications for their use as cosmological probes. Although the Chandrasekhar mass is a natural mass scale for the explosion of white dwarfs as SNe Ia, models allowing SNe Ia to explode at other masses have attracted much recent attention. Using an empirical relation between the ejected mass and the light-curve width, we derive ejected masses Mej and 56Ni masses MNi for a sample of 337 SNe Ia with redshifts z <0.7 used in recent cosmological analyses. We use hierarchical Bayesian inference to reconstruct the joint Mej-MNi distribution, accounting for measurement errors. The inferred marginal distribution of Mej has a long tail towards sub-Chandrasekhar masses, but cuts off sharply above 1.4 M⊙. Our results imply that 25-50 per cent of normal SNe Ia are inconsistent with Chandrasekhar-mass explosions, with almost all of these being sub-Chandrasekhar mass; super-Chandrasekhar-mass explosions make up no more than 1 per cent of all spectroscopically normal SNe Ia. We interpret the SN Ia width-luminosity relation as an underlying relation between Mej and MNi, and show that the inferred relation is not naturally explained by the predictions of any single known explosion mechanism.
Resumo:
Introduction
Mild cognitive impairment (MCI) has clinical value in its ability to predict later dementia. A better understanding of cognitive profiles can further help delineate who is most at risk of conversion to dementia. We aimed to (1) examine to what extent the usual MCI subtyping using core criteria corresponds to empirically defined clusters of patients (latent profile analysis [LPA] of continuous neuropsychological data) and (2) compare the two methods of subtyping memory clinic participants in their prediction of conversion to dementia.
Methods
Memory clinic participants (MCI, n = 139) and age-matched controls (n = 98) were recruited. Participants had a full cognitive assessment, and results were grouped (1) according to traditional MCI subtypes and (2) using LPA. MCI participants were followed over approximately 2 years after their initial assessment to monitor for conversion to dementia.
Results
Groups were well matched for age and education. Controls performed significantly better than MCI participants on all cognitive measures. With the traditional analysis, most MCI participants were in the amnestic multidomain subgroup (46.8%) and this group was most at risk of conversion to dementia (63%). From the LPA, a three-profile solution fit the data best. Profile 3 was the largest group (40.3%), the most cognitively impaired, and most at risk of conversion to dementia (68% of the group).
Discussion
LPA provides a useful adjunct in delineating MCI participants most at risk of conversion to dementia and adds confidence to standard categories of clinical inference.
Resumo:
Generative algorithms for random graphs have yielded insights into the structure and evolution of real-world networks. Most networks exhibit a well-known set of properties, such as heavy-tailed degree distributions, clustering and community formation. Usually, random graph models consider only structural information, but many real-world networks also have labelled vertices and weighted edges. In this paper, we present a generative model for random graphs with discrete vertex labels and numeric edge weights. The weights are represented as a set of Beta Mixture Models (BMMs) with an arbitrary number of mixtures, which are learned from real-world networks. We propose a Bayesian Variational Inference (VI) approach, which yields an accurate estimation while keeping computation times tractable. We compare our approach to state-of-the-art random labelled graph generators and an earlier approach based on Gaussian Mixture Models (GMMs). Our results allow us to draw conclusions about the contribution of vertex labels and edge weights to graph structure.
Resumo:
The spectral sensitivity of visual pigments in vertebrate eyes is optimized for specific light conditions. One of such pigments, rhodopsin (RH1), mediates dim-light vision. Amino acid replacements at tuning sites may alter spectral sensitivity, providing a mechanism to adapt to ambient light conditions and depth of habitat in fish. Here we present a first investigation of RH1 gene polymorphism among two ecotypes of Atlantic cod in Icelandic waters, which experience divergent light environments throughout the year due to alternative foraging behaviour. We identified one synonymous single nucleotide polymorphism (SNP) in the RH1 protein coding region and one in the 3' untranslated region (3'-UTR) that are strongly divergent between these two ecotypes. Moreover, these polymorphisms coincided with the well-known panthophysin (Pan I) polymorphism that differentiates coastal and frontal (migratory) populations of Atlantic cod. While the RH1 SNPs do not provide direct inference for a specific molecular mechanism, their association with this dim-sensitive pigment indicates the involvement of the visual system in local adaptation of Atlantic cod.
Resumo:
Hidden Markov models (HMMs) are widely used probabilistic models of sequential data. As with other probabilistic models, they require the specification of local conditional probability distributions, whose assessment can be too difficult and error-prone, especially when data are scarce or costly to acquire. The imprecise HMM (iHMM) generalizes HMMs by allowing the quantification to be done by sets of, instead of single, probability distributions. iHMMs have the ability to suspend judgment when there is not enough statistical evidence, and can serve as a sensitivity analysis tool for standard non-stationary HMMs. In this paper, we consider iHMMs under the strong independence interpretation, for which we develop efficient inference algorithms to address standard HMM usage such as the computation of likelihoods and most probable explanations, as well as performing filtering and predictive inference. Experiments with real data show that iHMMs produce more reliable inferences without compromising the computational efficiency.
Resumo:
Learning Bayesian networks with bounded tree-width has attracted much attention recently, because low tree-width allows exact inference to be performed efficiently. Some existing methods [12, 14] tackle the problem by using k-trees to learn the optimal Bayesian network with tree-width up to k. In this paper, we propose a sampling method to efficiently find representative k-trees by introducing an Informative score function to characterize the quality of a k-tree. The proposed algorithm can efficiently learn a Bayesian network with tree-width at most k. Experiment results indicate that our approach is comparable with exact methods, but is much more computationally efficient.
Resumo:
Bounding the tree-width of a Bayesian network can reduce the chance of overfitting, and allows exact inference to be performed efficiently. Several existing algorithms tackle the problem of learning bounded tree-width Bayesian networks by learning from k-trees as super-structures, but they do not scale to large domains and/or large tree-width. We propose a guided search algorithm to find k-trees with maximum Informative scores, which is a measure of quality for the k-tree in yielding good Bayesian networks. The algorithm achieves close to optimal performance compared to exact solutions in small domains, and can discover better networks than existing approximate methods can in large domains. It also provides an optimal elimination order of variables that guarantees small complexity for later runs of exact inference. Comparisons with well-known approaches in terms of learning and inference accuracy illustrate its capabilities.
Resumo:
The goal of this contribution is to discuss local computation in credal networks — graphical models that can represent imprecise and indeterminate probability values. We analyze the inference problem in credal networks, discuss how inference algorithms can benefit from local computation, and suggest that local computation can be particularly important in approximate inference algorithms.