917 results for INFERENCE
Abstract:
Credal nets are probabilistic graphical models which extend Bayesian nets to cope with sets of distributions. This feature makes the model particularly suited for the implementation of classifiers and knowledge-based systems. When working with sets of (instead of single) probability distributions, the identification of the optimal option can be based on different criteria, some of which may lead to multiple choices. Yet, most of the inference algorithms for credal nets are designed to compute only the bounds of the posterior probabilities. This prevents some of the existing criteria from being used. To overcome this limitation, we present two simple transformations for credal nets which make it possible to compute decisions based on the maximality and E-admissibility criteria without any modification of the inference algorithms. We also prove that these decision problems have the same complexity as standard inference, being NP^PP-hard for general credal nets and NP-hard for polytrees.
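As a hedged illustration of the maximality criterion mentioned above (a generic sketch, not the paper's transformations): when a credal set is given by finitely many extreme points, dominance can be checked exactly on those points, because expected utility is linear in the distribution. The utilities and distributions below are invented.

```python
# Deciding under maximality for a credal set given by its extreme points.
import numpy as np

def maximal_options(U, extreme_points):
    """U[a, s]: utility of option a in state s.
    extreme_points: probability vectors whose convex hull is the credal set.
    Option b dominates a under maximality iff E_p[U_b] > E_p[U_a] for every
    p in the credal set; by linearity it suffices to check the extreme points."""
    P = np.asarray(extreme_points)       # (k, n_states)
    E = U @ P.T                          # E[a, j]: expected utility of a under p_j
    n = U.shape[0]
    dominated = {a for a in range(n)
                 for b in range(n)
                 if b != a and np.all(E[b] > E[a])}
    return [a for a in range(n) if a not in dominated]

# Three options over three states; two extreme points of the credal set.
U = np.array([[1.0, 0.0, 0.5],
              [0.4, 0.6, 0.4],
              [0.0, 1.0, 0.2]])
creds = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2]]
print(maximal_options(U, creds))   # every option no other option dominates
```

Note that maximality typically returns a set of admissible options rather than a single choice, which is exactly why algorithms returning only posterior bounds are insufficient.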
Abstract:
This paper addresses the estimation of the parameters of a Bayesian network from incomplete data. The task is usually tackled by running the Expectation-Maximization (EM) algorithm several times in order to obtain a high log-likelihood estimate. We argue that choosing the maximum log-likelihood estimate (as well as the maximum penalized log-likelihood and the maximum a posteriori estimate) has severe drawbacks, being affected by both overfitting and model uncertainty. Two ideas are discussed to overcome these issues: a maximum entropy approach and a Bayesian model averaging approach. Both ideas can be easily applied on top of EM, while the entropy idea can also be implemented in a more sophisticated way, through a dedicated non-linear solver. A vast set of experiments shows that these ideas produce significantly better estimates and inferences than the traditional and widely used maximum (penalized) log-likelihood and maximum a posteriori estimates. In particular, if EM is adopted as the optimization engine, the model averaging approach is the best performing one; its performance is matched by the entropy approach when implemented using the non-linear solver. The results suggest that the applicability of these ideas is immediate (they are easy to implement and to integrate into currently available inference engines) and that they constitute a better way to learn Bayesian network parameters.
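A minimal sketch of the "averaging on top of EM restarts" idea, under assumptions of my own: each restart k returns a parameter vector theta_k and its log-likelihood ll_k, and the runs are combined with likelihood-based weights instead of keeping only the single best run. (Proper Bayesian model averaging averages inferences, not raw parameters; averaging parameters directly is a simplification for illustration, and the paper's exact weighting scheme may differ.)

```python
# Combine EM restarts by a softmax of their log-likelihoods.
import numpy as np

def average_em_runs(thetas, loglikes):
    """thetas: (k, d) per-run parameter estimates.
    loglikes: (k,) corresponding log-likelihoods."""
    ll = np.asarray(loglikes, dtype=float)
    w = np.exp(ll - ll.max())        # stabilised exponentiation; posterior-like weights
    w /= w.sum()
    return w @ np.asarray(thetas)    # weighted average of the estimates

thetas = np.array([[0.70, 0.30], [0.68, 0.32], [0.55, 0.45]])
loglikes = [-102.1, -102.3, -110.9]
print(average_em_runs(thetas, loglikes))  # dominated by the two best runs
```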
Abstract:
This paper investigates a representation language with flexibility inspired by probabilistic logic and compactness inspired by relational Bayesian networks. The goal is to handle propositional and first-order constructs together with precise, imprecise, indeterminate and qualitative probabilistic assessments. The paper shows how this can be achieved through the theory of credal networks. New exact and approximate inference algorithms based on multilinear programming and iterated/loopy propagation of interval probabilities are presented; their performance is shown empirically to be superior to that of existing algorithms.
Abstract:
Enhanced Indispensability Arguments (EIA) claim that Scientific Realists are committed to the existence of mathematical entities due to their reliance on Inference to the Best Explanation (IBE). Our central question concerns this purported parity of reasoning: do people who defend the EIA make an appropriate use of the resources of Scientific Realism (in particular, IBE) to achieve platonism? (§2) We argue that just because a variety of different inferential strategies can be employed by Scientific Realists does not mean that ontological conclusions concerning which things we should be Scientific Realists about are arrived at by any inferential route which eschews causes (§3), nor is there any direct pressure for Scientific Realists to change their inferential methods (§4). We suggest that in order to maintain inferential parity with Scientific Realism, proponents of EIA need to give details about how and in what way the presence of mathematical entities directly contributes to explanations (§5).
Abstract:
We describe an apparatus designed to make non-demolition measurements on a Bose-Einstein condensate (BEC) trapped in a double-well potential. In addition to the bosonic gas and the trap, the apparatus contains an optical cavity. We show how the interaction between the light and the atoms, under appropriate conditions, can allow for a weakly disturbing yet highly precise measurement of the population imbalance between the two wells and its variance. We show that the setting is well suited for the implementation of quantum-limited estimation strategies, based on measurements performed on the cavity field, for the inference of the key parameters defining the evolution of the atomic system. This would enable de facto Hamiltonian diagnosis via a highly controllable quantum probe.
Abstract:
Mollusks are the most morphologically disparate living animal phylum: they have diversified into all habitats and have a deep fossil record. The monophyly and identity of their eight living classes are undisputed, but relationships between these groups and patterns of their early radiation have remained elusive. Arguments about traditional morphological phylogeny focus on a small number of topological concepts but often without regard to proximity of the individual classes. In contrast, molecular studies have proposed a number of radically different, inherently contradictory, and controversial sister relationships. Here, we assembled a dataset of 42 unique published trees describing molluscan interrelationships. We used these data to ask several questions about the state of resolution of molluscan phylogeny compared to a null model of the variation possible in random trees constructed from a monophyletic assemblage of eight terminals. Although 27 different unique trees have been proposed from morphological inference, the majority of these are not statistically different from each other. Within the available molecular topologies, only four studies to date have included the deep-sea class Monoplacophora, but 36.4% of all trees are not significantly different. We also present supertrees derived from two data partitions and three methods, including all available molecular molluscan phylogenies, which will form the basis for future hypothesis testing. The supertrees presented here were not constructed to provide yet another hypothesis of molluscan relationships, but rather to algorithmically evaluate the relationships present in the disparate published topologies. Based on the totality of available evidence, certain patterns of relatedness among constituent taxa become clear. The internodal distance is consistently short between a few taxon pairs, particularly supporting the relatedness of Monoplacophora and the chitons, Polyplacophora. Other taxon pairs are rarely or never found in close proximity, such as the vermiform Caudofoveata and Bivalvia. Our results have specific utility for guiding constructive research planning in order to better test relationships in Mollusca as well as other problematic groups. Taxa with consistently proximate relationships should be the focus of a combined approach in a concerted assessment of potential genetic and anatomical homology, while unequivocally distant taxa will make the most constructive choices for exemplar selection in higher-level phylogenomic analyses.
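For a sense of scale of the null model of random topologies on eight terminals (an illustrative calculation, not taken from the paper), the space of possible labelled binary trees follows the standard double-factorial formulas:

```python
# Count labelled binary tree topologies on n leaves.
from math import prod

def rooted_binary_trees(n):
    return prod(range(1, 2 * n - 2, 2))    # (2n - 3)!!

def unrooted_binary_trees(n):
    return prod(range(1, 2 * n - 4, 2))    # (2n - 5)!!

print(rooted_binary_trees(8))    # 135135
print(unrooted_binary_trees(8))  # 10395
```

With only 10,395 unrooted topologies for eight classes, comparing 42 published trees against random draws from this space is computationally trivial, which is what makes the null-model comparison feasible.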
Abstract:
Background: Pedigree reconstruction using genetic analysis provides a useful means to estimate fundamental population biology parameters relating to population demography, trait heritability and individual fitness when combined with other sources of data. However, there remain limitations to pedigree reconstruction in wild populations, particularly in systems where parent-offspring relationships cannot be directly observed, there is incomplete sampling of individuals, or molecular parentage inference relies on low-quality DNA from archived material. While much can still be inferred from incomplete or sparse pedigrees, it is crucial to evaluate the quality and power of the available genetic information prior to testing specific biological hypotheses. Here, we used microsatellite markers to reconstruct a multi-generation pedigree of wild Atlantic salmon (Salmo salar L.) using archived scale samples collected with a total trapping system within a river over a 10-year period. Using a simulation-based approach, we determined the optimal microsatellite marker number for accurate parentage assignment, and evaluated the power of the resulting partial pedigree to investigate important evolutionary and quantitative genetic characteristics of salmon in the system.
Results: We show that at least 20 microsatellites (average 12 alleles/locus) are required to maximise parentage assignment and to improve the power to estimate reproductive success and heritability in this study system. We also show that 1.5-fold differences can be detected between groups simulated to have differing reproductive success, and that it is possible to detect moderate heritability values for continuous traits (h² ≈ 0.40) with more than 80% power when using 28 moderately to highly polymorphic markers.
Conclusion: The methodologies and workflow described provide a robust approach for evaluating archived samples for pedigree-based research, even where only a proportion of the total population is sampled. The results demonstrate the feasibility of pedigree-based studies to address challenging ecological and evolutionary questions in free-living populations, where genealogies can be traced only using molecular tools, and show that significant increases in pedigree assignment power can be achieved by using higher numbers of markers.
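A toy Monte Carlo in the spirit of the marker-number evaluation (not the authors' simulation framework, and with hypothetical allele counts and Dirichlet-drawn frequencies): it estimates how often a random unrelated adult remains genotype-compatible with an offspring across L loci, which shrinks rapidly as loci are added.

```python
# Rough parentage non-exclusion rate versus number of microsatellite loci.
import numpy as np

rng = np.random.default_rng(0)

def non_exclusion_rate(n_loci=20, n_alleles=12, trials=2000):
    # one allele-frequency vector per locus (hypothetical values)
    freqs = rng.dirichlet(np.ones(n_alleles), size=n_loci)
    hits = 0
    for _ in range(trials):
        compatible = True
        for f in freqs:
            child = rng.choice(n_alleles, size=2, p=f)   # offspring genotype
            adult = rng.choice(n_alleles, size=2, p=f)   # unrelated candidate parent
            if not set(child) & set(adult):              # no shared allele -> excluded
                compatible = False
                break
        hits += compatible
    return hits / trials

for n_loci in (5, 10, 20):
    print(n_loci, non_exclusion_rate(n_loci=n_loci))
```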
Abstract:
BACKGROUND: Urothelial pathogenesis is a complex process driven by an underlying network of interconnected genes. The identification of novel genomic target regions and gene targets that drive urothelial carcinogenesis is crucial in order to improve our currently limited understanding of urothelial cancer (UC) on the molecular level. The inference of genome-wide gene regulatory networks (GRN) from large-scale gene expression data provides a promising approach for a detailed investigation of the underlying network structure associated with urothelial carcinogenesis.
METHODS: In our study we inferred and compared three GRNs by applying the BC3Net inference algorithm to large-scale transitional cell carcinoma gene expression data sets from Illumina RNAseq (179 samples), Illumina Bead arrays (165 samples) and Affymetrix Oligo microarrays (188 samples). We investigated the structural and functional properties of the GRNs for the identification of molecular targets associated with urothelial cancer.
RESULTS: We found that the urothelial cancer (UC) GRNs show a significant enrichment of subnetworks that are associated with known cancer hallmarks including cell cycle, immune response, signaling, differentiation and translation. Interestingly, the most prominent subnetworks of co-located genes were found on chromosome regions 5q31.3 (RNAseq), 8q24.3 (Oligo) and 1q23.3 (Bead), all of which are genomic regions known to be frequently deregulated or aberrant in urothelial and other cancer types. Furthermore, the identified hub genes of the individual GRNs, e.g., HID1/DMC1 (tumor development), RNF17/TDRD4 (cancer antigen) and CYP4A11 (angiogenesis/metastasis), are known cancer-associated markers. The GRNs were highly dataset-specific at the level of interactions between individual genes, but showed large similarities at the level of biological function represented by subnetworks. Remarkably, the RNAseq UC GRN showed twice the proportion of significant functional subnetworks. Based on our analysis of inferential and experimental networks, the Bead UC GRN showed the lowest performance compared with the RNAseq and Oligo UC GRNs.
CONCLUSION: To our knowledge, this is the first study investigating genome-scale UC GRNs. RNAseq-based gene expression data is the platform of choice for GRN inference. Our study offers new avenues for the identification of novel putative diagnostic targets for subsequent studies in bladder tumors.
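BC3Net itself is distributed as an R package (to my understanding, a bagged variant of the C3Net mutual-information algorithm); the sketch below is only a generic mutual-information relevance network in Python, illustrating the broad idea of expression-based GRN inference rather than the BC3Net method. Gene names and data are synthetic, and the binning and threshold are arbitrary choices.

```python
# Rank gene pairs by mutual information on binned expression values.
import numpy as np
from sklearn.metrics import mutual_info_score

def relevance_network(X, gene_names, n_bins=8, keep_top=0.2):
    """X: (samples, genes) expression matrix. Returns the highest-scoring
    gene pairs by pairwise mutual information on discretized profiles."""
    B = np.stack([np.digitize(x, np.histogram_bin_edges(x, n_bins)[1:-1])
                  for x in X.T])                      # one binned row per gene
    n = B.shape[0]
    scores = [(mutual_info_score(B[i], B[j]), gene_names[i], gene_names[j])
              for i in range(n) for j in range(i + 1, n)]
    scores.sort(reverse=True)
    return scores[: max(1, int(keep_top * len(scores)))]

rng = np.random.default_rng(1)
X = rng.normal(size=(180, 6))
X[:, 1] = X[:, 0] + 0.3 * rng.normal(size=180)        # plant one co-expressed pair
print(relevance_network(X, [f"g{i}" for i in range(6)])[:3])
```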
Abstract:
This study explored the validity of using critical thinking tests to predict final psychology degree marks over and above that already predicted by traditional admission exams (A-levels). Participants were a longitudinal sample of 109 psychology students from a university in the United Kingdom. The outcome measures were total degree marks and end-of-year marks. The predictor measures were university admission exam results (A-levels), critical thinking test scores (skills and dispositions), and non-verbal intelligence scores. Hierarchical regressions showed that A-levels significantly predicted 10% of the variance in final degree marks, and that the 11-item ‘Inference skills’ measure from the California Critical Thinking Skills Test significantly predicted an additional 6% of degree outcome variance. The findings from this study should inform decisions about the precise measurement constructs included in aptitude tests used in the higher education admission process.
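A sketch of the hierarchical-regression logic described above, on synthetic data (the variable names and effect sizes are invented, not the study's): enter A-level points first, then add the inference-skills score and read off the incremental R².

```python
# Two-step hierarchical regression with incremental R^2.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 109
a_levels = rng.normal(size=n)
inference = 0.3 * a_levels + rng.normal(size=n)     # partly overlapping predictor
degree = 0.32 * a_levels + 0.25 * inference + rng.normal(size=n)

step1 = sm.OLS(degree, sm.add_constant(a_levels)).fit()
X2 = sm.add_constant(np.column_stack([a_levels, inference]))
step2 = sm.OLS(degree, X2).fit()
print(f"step 1 R^2 = {step1.rsquared:.3f}")
print(f"delta R^2 from inference skills = {step2.rsquared - step1.rsquared:.3f}")
```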
Abstract:
This paper proposes an efficient learning mechanism to build fuzzy rule-based systems through the construction of sparse least-squares support vector machines (LS-SVMs). In addition to significantly reduced computational complexity in model training, the resultant LS-SVM-based fuzzy system is sparser while offering satisfactory generalization capability over unseen data. It is well known that LS-SVMs have a computational advantage over conventional SVMs in the model training process; however, model sparseness is lost, which is the main drawback of LS-SVMs and remains an open problem. To tackle the non-sparseness issue, a new regression alternative to the Lagrangian solution for the LS-SVM is first presented. A novel efficient learning mechanism is then proposed to extract a sparse set of support vectors for generating fuzzy IF-THEN rules. This mechanism works in a stepwise subset selection manner, including a forward expansion phase and a backward exclusion phase in each selection step. The implementation of the algorithm is computationally very efficient due to a few key techniques that avoid matrix inverse operations and accelerate the training process. The computational efficiency is also confirmed by a detailed computational complexity analysis. As a result, the proposed approach not only achieves sparseness in the resultant LS-SVM-based fuzzy systems but also significantly reduces the amount of computational effort in model training. Three experimental examples are presented to demonstrate the effectiveness and efficiency of the proposed learning mechanism and the sparseness of the obtained LS-SVM-based fuzzy systems, in comparison with other SVM-based learning techniques.
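For context, here is a minimal sketch of the standard dense LS-SVM regression solution whose non-sparseness motivates the paper (every training point receives a nonzero coefficient); the paper's stepwise forward/backward selection is not reproduced. The kernel, hyperparameters and data below are arbitrary.

```python
# Dense LS-SVM regression via its KKT linear system.
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve the LS-SVM KKT system
       [ 0   1^T         ] [ b     ]   [ 0 ]
       [ 1   K + I/gamma ] [ alpha ] = [ y ]"""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return sol[0], sol[1:]           # b, alpha: one alpha per sample -> non-sparse

def lssvm_predict(Xq, X, b, alpha, sigma=1.0):
    return rbf_kernel(Xq, X, sigma) @ alpha + b

X = np.linspace(-3, 3, 60)[:, None]
y = np.sin(X[:, 0]) + 0.1 * np.random.default_rng(0).normal(size=60)
b, alpha = lssvm_fit(X, y)
print(np.abs(lssvm_predict(X, X, b, alpha) - y).mean())  # small training error
```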
Abstract:
The assimilation of discrete higher-fidelity data points with model predictions can be used to reduce the uncertainty of the model input parameters which generate accurate predictions. The problem investigated here involves the prediction of limit-cycle oscillations using a High-Dimensional Harmonic Balance (HDHB) method. The efficiency of the HDHB method is exploited to enable calibration of structural input parameters using a Bayesian inference technique. Markov-chain Monte Carlo is employed to sample the posterior distributions. Parameter estimation is carried out on both a pitch/plunge aerofoil and a Goland wing configuration. In both cases significant refinement was achieved in the distribution of possible structural parameters, allowing better predictions of their true deterministic values.
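A generic sketch of the calibration loop described above, with a cheap stand-in forward model in place of the HDHB solver (the model, prior bounds and noise level are all invented); it only illustrates how random-walk Metropolis refines the distribution of structural parameters around values consistent with the data.

```python
# Random-walk Metropolis calibration against a stand-in forward model.
import numpy as np

rng = np.random.default_rng(3)
theta_true = np.array([1.2, 0.8])                  # "structural parameters"

def forward(theta):                                # stand-in for the HDHB solver
    return np.array([theta[0] + theta[1], theta[0] * theta[1], theta[0] - 2 * theta[1]])

data = forward(theta_true) + 0.05 * rng.normal(size=3)
sigma = 0.05

def log_post(theta):
    if np.any(theta <= 0) or np.any(theta > 5):    # flat prior on (0, 5]^2
        return -np.inf
    r = (data - forward(theta)) / sigma
    return -0.5 * np.dot(r, r)

theta, lp = np.array([2.0, 2.0]), log_post(np.array([2.0, 2.0]))
samples = []
for _ in range(20_000):
    prop = theta + 0.05 * rng.normal(size=2)       # random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:       # Metropolis accept/reject
        theta, lp = prop, lp_prop
    samples.append(theta)
post = np.array(samples[5000:])                    # drop burn-in
print(post.mean(axis=0), post.std(axis=0))         # refined parameter distribution
```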
Abstract:
The accumulation of biogenic greenhouse gases (methane, carbon dioxide) in organic sediments is an important factor in the redevelopment and risk management of many brownfield sites. Good practice in brownfield site characterization requires the identification of free-gas phases and of the pathways that allow gas migration and release at the ground surface. Gas pockets trapped in the subsurface have properties that contrast with those of the surrounding porous media, which favors their detection using geophysical methods. We present a case study in which pockets of gas were intercepted with multilevel monitoring wells and their lateral continuity was monitored over time using resistivity. We developed a novel interpretation procedure based on Archie's law to evaluate changes in water and gas content with respect to a mean background medium, and used induced polarization data to account for errors in applying Archie's law due to the contribution of surface conductivity effects. Mosaics defined by changes in water saturation allowed the recognition of gas migration and groundwater infiltration routes and the association of gas and groundwater fluxes. The inferred flux patterns were analyzed by taking into account pressure measurements in trapped gas reservoirs and a metagenomic analysis of the microbiological content retrieved from suspended sediments in groundwater sampled in the multilevel monitoring wells. A conceptual model combining physical and microbiological subsurface processes suggested that biogas trapped at depth may be able to travel quickly to the surface.
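The textbook form of Archie's law underlying the interpretation step can be inverted for water saturation as below; the study's background-referencing and surface-conductivity correction are not reproduced, and the parameter values are generic illustrations rather than site values.

```python
# Invert Archie's law for water saturation (gas saturation = 1 - Sw).
def archie_saturation(rho_t, rho_w, phi, a=1.0, m=2.0, n=2.0):
    """Archie's law: rho_t = a * rho_w * phi**(-m) * Sw**(-n).
    rho_t: bulk resistivity, rho_w: pore-water resistivity, phi: porosity;
    a, m, n: tortuosity, cementation and saturation exponents."""
    return (a * rho_w / (phi ** m * rho_t)) ** (1.0 / n)

sw = archie_saturation(rho_t=330.0, rho_w=20.0, phi=0.35)  # ohm.m, ohm.m, porosity
print(f"Sw = {sw:.2f}, gas saturation = {1 - sw:.2f}")      # ~0.70 / ~0.30
```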
Abstract:
Introduction: Individuals carrying pathogenic mutations in the BRCA1 and BRCA2 genes have a high lifetime risk of breast cancer. BRCA1 and BRCA2 are involved in the repair of DNA double-strand breaks, DNA alterations that can be caused by exposure to reactive oxygen species, for which mitochondria are a major source. Mitochondrial genome variations affect electron transport chain efficiency and reactive oxygen species production. Individuals with different mitochondrial haplogroups differ in their metabolism and sensitivity to oxidative stress. Variability in mitochondrial genetic background can therefore alter reactive oxygen species production and, in turn, cancer risk. In the present study, we tested the hypothesis that mitochondrial haplogroups modify breast cancer risk in BRCA1/2 mutation carriers.
Methods: We genotyped 22,214 (11,421 affected, 10,793 unaffected) mutation carriers belonging to the Consortium of Investigators of Modifiers of BRCA1/2 for 129 mitochondrial polymorphisms using the iCOGS array. Haplogroup inference and association detection were performed using a phylogenetic approach. ALTree was applied to explore the reference mitochondrial evolutionary tree and detect subclades enriched in affected or unaffected individuals.
Results: We discovered that subclade T1a1 was depleted in affected BRCA2 mutation carriers compared with the rest of clade T (hazard ratio (HR) = 0.55; 95% confidence interval (CI), 0.34 to 0.88; P = 0.01). Compared with the most frequent haplogroups in the general population (that is, the H and T clades), the T1a1 haplogroup had an HR of 0.62 (95% CI, 0.40 to 0.95; P = 0.03). We also identified three potential susceptibility loci, including G13708A/rs28359178, which has demonstrated an inverse association with familial breast cancer risk.
Conclusions: This study illustrates how original approaches such as the phylogeny-based method we used can empower classical molecular epidemiological studies aimed at identifying association or risk modification effects.
Abstract:
Background: Selection bias in HIV prevalence estimates occurs if non-participation in testing is correlated with HIV status. Longitudinal data suggest that individuals who know or suspect they are HIV positive are less likely to participate in testing in HIV surveys, in which case methods to correct for missing data which are based on imputation and observed characteristics will produce biased results.
Methods: The identity of the HIV survey interviewer is typically associated with HIV testing participation, but is unlikely to be correlated with HIV status. Interviewer identity can thus be used as a selection variable allowing estimation of Heckman-type selection models. These models produce asymptotically unbiased HIV prevalence estimates, even when non-participation is correlated with unobserved characteristics, such as knowledge of HIV status. We introduce a new random effects method to these selection models which overcomes non-convergence caused by collinearity, small sample bias, and incorrect inference in existing approaches. Our method is easy to implement in standard statistical software, and allows the construction of bootstrapped standard errors which adjust for the fact that the relationship between testing and HIV status is uncertain and needs to be estimated.
Results: Using nationally representative data from the Demographic and Health Surveys, we illustrate our approach with new point estimates and confidence intervals (CI) for HIV prevalence among men in Ghana (2003) and Zambia (2007). In Ghana, we find little evidence of selection bias, as our selection model gives an HIV prevalence estimate of 1.4% (95% CI 1.2%–1.6%), compared to 1.6% among those with a valid HIV test. In Zambia, our selection model gives an HIV prevalence estimate of 16.3% (95% CI 11.0%–18.4%), compared to 12.1% among those with a valid HIV test; those who decline to test in Zambia are therefore more likely to be HIV positive.
Conclusions: Our approach corrects for selection bias in HIV prevalence estimates, is possible to implement even when HIV prevalence or non-participation is very high or very low, and provides a practical solution to account for both sampling and parameter uncertainty in the estimation of confidence intervals. The wide confidence intervals estimated in an example with high HIV prevalence indicate that it is difficult to correct statistically for the bias that may occur when a large proportion of people refuse to test.
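A sketch of the classic two-step Heckman correction using interviewer dummies as the selection variable, on simulated data. This is only illustrative of the identification strategy: the paper develops a random-effects maximum-likelihood variant suited to a binary HIV outcome, whereas the outcome here is continuous and the two-step estimator is the textbook version.

```python
# Two-step Heckman selection correction with interviewer-dummy instruments.
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(7)
n, n_interviewers = 4000, 40
iv = rng.integers(0, n_interviewers, n)            # interviewer assignment
Z = np.eye(n_interviewers)[iv]                     # interviewer dummies
u = rng.normal(size=n)                             # unobservable (e.g. knows own status)
tested = (rng.normal(size=n_interviewers)[iv] - 0.8 * u + rng.normal(size=n)) > 0
y = 1.0 + 0.9 * u + rng.normal(size=n)             # outcome correlated with selection

# Step 1: probit of participation on the selection variable
probit = sm.Probit(tested.astype(float), sm.add_constant(Z[:, :-1])).fit(disp=0)
xb = probit.fittedvalues                           # linear predictor
mills = norm.pdf(xb) / norm.cdf(xb)                # inverse Mills ratio

# Step 2: outcome regression on participants, augmented with the Mills ratio
ols = sm.OLS(y[tested], sm.add_constant(mills[tested])).fit()
print(ols.params)   # nonzero Mills coefficient signals selection on unobservables
print(f"naive mean = {y[tested].mean():.3f}, corrected intercept = {ols.params[0]:.3f}")
# The corrected intercept should move toward the population mean of 1.0.
```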
Abstract:
The ejected mass distribution of Type Ia supernovae (SNe Ia) directly probes progenitor evolutionary history and explosion mechanisms, with implications for their use as cosmological probes. Although the Chandrasekhar mass is a natural mass scale for the explosion of white dwarfs as SNe Ia, models allowing SNe Ia to explode at other masses have attracted much recent attention. Using an empirical relation between the ejected mass and the light-curve width, we derive ejected masses M_ej and 56Ni masses M_Ni for a sample of 337 SNe Ia with redshifts z < 0.7 used in recent cosmological analyses. We use hierarchical Bayesian inference to reconstruct the joint M_ej–M_Ni distribution, accounting for measurement errors. The inferred marginal distribution of M_ej has a long tail towards sub-Chandrasekhar masses, but cuts off sharply above 1.4 M⊙. Our results imply that 25–50 per cent of normal SNe Ia are inconsistent with Chandrasekhar-mass explosions, with almost all of these being sub-Chandrasekhar mass; super-Chandrasekhar-mass explosions make up no more than 1 per cent of all spectroscopically normal SNe Ia. We interpret the SN Ia width-luminosity relation as an underlying relation between M_ej and M_Ni, and show that the inferred relation is not naturally explained by the predictions of any single known explosion mechanism.
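A toy version of the hierarchical measurement-error idea (not the paper's analysis, which uses a more flexible population model): if the true (M_ej, M_Ni) pairs are modelled as draws from a population Gaussian and the per-object noise is known, the latent values can be marginalised analytically, y_i ~ N(mu, C + S_i), and the population parameters sampled by Metropolis. All numbers below are invented.

```python
# Hierarchical Gaussian measurement-error model with analytic marginalisation.
import numpy as np

rng = np.random.default_rng(11)
n = 337
mu_true = np.array([1.2, 0.55])                      # population mean (Mej, MNi)
C_true = np.array([[0.04, 0.02], [0.02, 0.03]])      # population covariance
truth = rng.multivariate_normal(mu_true, C_true, size=n)
S = np.full((n, 2), 0.1) ** 2                        # known measurement variances
obs = truth + np.sqrt(S) * rng.normal(size=(n, 2))   # noisy (Mej, MNi) estimates

def log_like(mu, c11, c22, c12):
    """Marginal likelihood: obs_i ~ N(mu, C + diag(S_i)), 2x2 done in closed form."""
    if c11 <= 0 or c22 <= 0 or c12 ** 2 >= c11 * c22:
        return -np.inf
    v11, v22 = c11 + S[:, 0], c22 + S[:, 1]
    det = v11 * v22 - c12 ** 2
    r = obs - mu
    quad = (r[:, 0] ** 2 * v22 - 2 * c12 * r[:, 0] * r[:, 1] + r[:, 1] ** 2 * v11) / det
    return -0.5 * np.sum(np.log(det) + quad)

theta = np.array([1.0, 0.5, 0.05, 0.05, 0.0])        # mu1, mu2, c11, c22, c12
lp = log_like(theta[:2], *theta[2:])
chain = []
for _ in range(4000):                                # random-walk Metropolis
    prop = theta + rng.normal(size=5) * [0.01, 0.01, 0.005, 0.005, 0.003]
    lp_prop = log_like(prop[:2], *prop[2:])
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    chain.append(theta.copy())
print(np.mean(chain[1000:], axis=0))                 # roughly recovers mu and C
```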