888 results for likelihood-based inference
Abstract:
Objective: To determine whether poverty and unemployment increase the likelihood of, or delay recovery from, common mental disorders, and whether these associations could be explained by subjective financial strain.
Abstract:
A maximum likelihood estimator based on the coalescent for unequal migration rates and different subpopulation sizes is developed. The method uses a Markov chain Monte Carlo approach to investigate possible genealogies with branch lengths and with migration events. Properties of the new method are shown by using simulated data from a four-population n-island model and a source–sink population model. Our estimation method as coded in migrate is tested against genetree; both programs deliver a very similar likelihood surface. The algorithm converges to the estimates fairly quickly, even when the Markov chain is started from unfavorable parameters. The method was used to estimate gene flow in the Nile valley by using mtDNA data from three human populations.
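As an illustration of the MCMC ingredient only, here is a minimal random-walk Metropolis sampler in Python. It is not the coalescent machinery of migrate, whose likelihood averages over sampled genealogies with migration events; the toy exponential likelihood, the proposal width, and the "rate" parameter m are assumptions made to keep the sketch self-contained:

```python
import numpy as np

rng = np.random.default_rng(1)

def log_likelihood(m, data):
    # Toy stand-in: in migrate this quantity is averaged over sampled
    # genealogies; here we score exponential waiting times whose rate is
    # the hypothetical parameter m (illustrative only).
    return np.sum(np.log(m) - m * data) if m > 0 else -np.inf

data = rng.exponential(scale=1 / 2.5, size=200)  # synthetic data, true m = 2.5

m, chain = 1.0, []
for _ in range(10_000):
    proposal = m + 0.2 * rng.standard_normal()     # random-walk proposal
    log_ratio = log_likelihood(proposal, data) - log_likelihood(m, data)
    if np.log(rng.random()) < log_ratio:           # Metropolis accept/reject
        m = proposal
    chain.append(m)

print("estimated m:", np.mean(chain[2000:]))       # discard burn-in
```

With a flat prior this accept/reject loop explores the likelihood surface directly; the coalescent method replaces the toy likelihood with one evaluated over genealogies with branch lengths and migration events.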
Abstract:
Phylogenetic analyses are increasingly used in attempts to clarify transmission patterns of human immunodeficiency virus type 1 (HIV-1), but there is a continuing discussion about their validity because convergent evolution and transmission of minor HIV variants may obscure epidemiological patterns. Here we have studied a unique HIV-1 transmission cluster consisting of nine infected individuals, for whom the time and direction of each virus transmission was exactly known. Most of the transmissions occurred between 1981 and 1983, and a total of 13 blood samples were obtained approximately 2-12 years later. The p17 gag and env V3 regions of the HIV-1 genome were directly sequenced from uncultured lymphocytes. A true phylogenetic tree was constructed based on the knowledge about when the transmissions had occurred and when the samples were obtained. This complex, known HIV-1 transmission history was compared with reconstructed molecular trees, which were calculated from the DNA sequences by several commonly used phylogenetic inference methods [Fitch-Margoliash, neighbor-joining, minimum-evolution, maximum-likelihood, maximum-parsimony, unweighted pair group method using arithmetic averages (UPGMA), and a Fitch-Margoliash method assuming a molecular clock (KITSCH)]. A majority of the reconstructed trees were good estimates of the true phylogeny; 12 of 13 taxa were correctly positioned in the most accurate trees. The choice of gene fragment was found to be more important than the choice of phylogenetic method and substitution model. However, methods that are sensitive to unequal rates of change performed more poorly (such as UPGMA and KITSCH, which assume a constant molecular clock). The rapidly evolving V3 fragment gave better reconstructions than p17, but a combined data set of both p17 and V3 performed best. The accuracy of the phylogenetic methods justifies their use in HIV-1 research and argues against convergent evolution and selective transmission of certain virus variants.
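For readers who want to try one of the tested reconstruction methods, a minimal neighbor-joining run with Biopython (assumed installed); the taxon names and pairwise distances below are invented purely to show the workflow, not the actual HIV-1 sequence data:

```python
# Requires Biopython; names and distances are illustrative assumptions.
from Bio import Phylo
from Bio.Phylo.TreeConstruction import DistanceMatrix, DistanceTreeConstructor

names = ["patient_A", "patient_B", "patient_C", "patient_D"]
dm = DistanceMatrix(names, [[0],
                            [0.10, 0],
                            [0.15, 0.12, 0],
                            [0.30, 0.28, 0.25, 0]])  # lower-triangular distances

tree = DistanceTreeConstructor().nj(dm)  # neighbor-joining, one tested method
Phylo.draw_ascii(tree)
```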
Abstract:
The genes for the protein synthesis elongation factors Tu (EF-Tu) and G (EF-G) are the products of an ancient gene duplication, which appears to predate the divergence of all extant organismal lineages. Thus, it should be possible to root a universal phylogeny based on either protein using the second protein as an outgroup. This approach was originally taken independently with two separate gene duplication pairs, (i) the regulatory and catalytic subunits of the proton ATPases and (ii) the protein synthesis elongation factors EF-Tu and EF-G. Questions about the orthology of the ATPase genes have obscured the former results, and the elongation factor data have been criticized for inadequate taxonomic representation and alignment errors. We have expanded the latter analysis using a broad representation of taxa from all three domains of life. All phylogenetic methods used strongly place the root of the universal tree between two highly distinct groups, the archaeons/eukaryotes and the eubacteria. We also find that a combined data set of EF-Tu and EF-G sequences favors placement of the eukaryotes within the Archaea, as the sister group to the Crenarchaeota. This relationship is supported by bootstrap values of 60-89% with various distance and maximum likelihood methods, while unweighted parsimony gives 58% support for archaeal monophyly.
Abstract:
The origin of land vertebrates was one of the major transitions in the history of vertebrates. Yet, despite many studies based on either morphology or molecules, the phylogenetic relationships among tetrapods and the other two living groups of lobe-finned fishes, the coelacanth and the lungfishes, are still unresolved and debated. Knowledge of the relationships among these lineages, which originated back in the Devonian, has profound implications for the reconstruction of the evolutionary scenario of the conquest of land. We collected the largest molecular data set on this issue so far, about 3,500 base pairs of the large 28S nuclear ribosomal gene from seven species. All phylogenetic analyses (maximum parsimony, neighbor-joining, and maximum likelihood) point toward the hypothesis that lungfishes and coelacanths form a monophyletic group and are equally closely related to land vertebrates. This evolutionary hypothesis complicates the identification of morphological or physiological preadaptations that might have permitted the common ancestor of tetrapods to colonize land, because the reconstruction of its ancestral conditions is hindered by the difficulty of separating uniquely derived characters from shared derived characters in the coelacanth/lungfish and tetrapod lineages. This molecular phylogeny aids in the reconstruction of morphological evolutionary steps by providing a framework; however, only paleontological evidence can determine the sequence of morphological acquisitions that allowed lobe-finned fishes to colonize land.
Abstract:
In this work we propose the use of a Bayesian method to estimate the memory parameter of a long-memory stochastic process when its likelihood function is intractable or unavailable. The approach provides an approximation to the posterior distribution over the memory and other parameters, and is based on a simple application of the method known as approximate Bayesian computation (ABC). Some popular estimators of the memory parameter are reviewed and compared with this approach. Our proposal makes it feasible to solve complex problems from a Bayesian standpoint and, although approximate, performs very satisfactorily when compared with classical methods.
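A minimal ABC rejection sketch in Python, assuming a uniform prior and a single autocorrelation summary statistic; the AR(1) surrogate simulator is only a stand-in so the example stays self-contained (a real long-memory application would simulate, say, an ARFIMA(0, d, 0) process):

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate(phi, n=500):
    # Surrogate simulator: an AR(1) series standing in for a long-memory
    # process; replace with a proper ARFIMA simulator in practice.
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

def summary(x):
    # Lag-1 autocorrelation as a (crude) summary statistic.
    return np.corrcoef(x[:-1], x[1:])[0, 1]

observed = simulate(0.7)              # pretend this is the observed series
s_obs = summary(observed)

accepted = []
for _ in range(5000):
    phi = rng.uniform(0, 0.99)        # draw a candidate from the prior
    if abs(summary(simulate(phi)) - s_obs) < 0.02:   # tolerance epsilon
        accepted.append(phi)          # keep draws whose summaries match

print("approximate posterior mean:", np.mean(accepted))
```

The accepted draws approximate the posterior; the quality of the approximation depends on the summary statistic and the tolerance.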
Abstract:
Many applications, including object reconstruction, robot guidance, and scene mapping, require the registration of multiple views of a scene to generate a complete geometric and appearance model of it. In real situations, the transformations between views are unknown and it is necessary to apply expert inference to estimate them. In the last few years, the emergence of low-cost depth-sensing cameras has strengthened research on this topic, motivating a plethora of new applications. Although these cameras have enough resolution and accuracy for many applications, some situations cannot be solved with general state-of-the-art registration methods due to the signal-to-noise ratio (SNR) and the resolution of the data provided. The problem of working with low-SNR data may, in general terms, appear in any 3D system, so novel solutions are needed in this respect. In this paper, we propose a method, μ-MAR, able to both coarsely and finely register sets of 3D points, provided by low-cost depth-sensing cameras (although it is not restricted to these sensors), into a common coordinate system. The method overcomes the noisy-data problem by means of a model-based solution to multiplane registration. Specifically, it iteratively registers 3D markers composed of multiple planes extracted from points in multiple views of the scene. As the markers and the object of interest are static in the scene, the transformations obtained for the markers are applied to the object in order to reconstruct it. Experiments have been performed using synthetic and real data. The synthetic data allow qualitative and quantitative evaluation by means of visual inspection and the Hausdorff distance, respectively. The real-data experiments show the performance of the proposal using data acquired by a Primesense Carmine RGB-D sensor. The method has been compared to several state-of-the-art methods. The results show the good performance of μ-MAR in registering objects with high accuracy in the presence of noisy data, outperforming existing methods.
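The coarse-to-fine registration described here ultimately rests on rigid alignment of corresponding 3D points. A minimal SVD-based (Kabsch) alignment sketch in Python follows; it is not the μ-MAR algorithm itself, and the synthetic views are assumptions:

```python
import numpy as np

def rigid_align(P, Q):
    """Least-squares rotation R and translation t mapping point set P onto Q
    (Kabsch/SVD). The basic step a multi-view registration pipeline builds
    on, not the mu-MAR algorithm itself."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)       # centroids
    H = (P - cP).T @ (Q - cQ)                     # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t

rng = np.random.default_rng(3)
P = rng.standard_normal((100, 3))                 # points seen from one view
theta = 0.4
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1]])
Q = P @ R_true.T + np.array([0.5, -0.2, 1.0])     # same points, another view
R, t = rigid_align(P, Q)
print("alignment residual:", np.linalg.norm(P @ R.T + t - Q))
```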
Abstract:
We consider the problem of assessing the number of clusters in a limited number of tissue samples containing gene expression levels for possibly several thousand genes. It is proposed to use a normal mixture model-based approach to the clustering of the tissue samples. One advantage of this approach is that the question of the number of clusters in the data can be formulated in terms of a test on the smallest number of components in the mixture model compatible with the data. This test can be carried out on the basis of the likelihood ratio test statistic, using resampling to assess its null distribution. The effectiveness of this approach is demonstrated on simulated data and on some microarray datasets, as considered previously in the bioinformatics literature.
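A hedged sketch of the resampling idea: a parametric-bootstrap likelihood-ratio test for k versus k+1 normal mixture components, using scikit-learn rather than the authors' implementation, on toy data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
# Toy "expression" data: two well-separated groups of tissue samples.
X = np.vstack([rng.normal(0, 1, (40, 5)), rng.normal(3, 1, (40, 5))])

def lrt_stat(X, k):
    # -2 log likelihood ratio comparing k vs. k+1 mixture components.
    ll_k  = GaussianMixture(k,     n_init=3, random_state=0).fit(X).score(X)
    ll_k1 = GaussianMixture(k + 1, n_init=3, random_state=0).fit(X).score(X)
    return 2 * len(X) * (ll_k1 - ll_k)

k = 1
observed = lrt_stat(X, k)

# Parametric bootstrap: resample from the k-component fit to approximate
# the null distribution of the test statistic.
null_fit = GaussianMixture(k, random_state=0).fit(X)
null_stats = []
for _ in range(50):                    # small B to keep the sketch fast
    Xb, _ = null_fit.sample(len(X))
    null_stats.append(lrt_stat(Xb, k))

p_value = np.mean(np.array(null_stats) >= observed)
print(f"LRT for k={k} vs {k+1}: stat={observed:.1f}, bootstrap p={p_value:.2f}")
```

Resampling is needed because the usual chi-squared asymptotics for the likelihood ratio statistic break down at the boundary of the mixture parameter space.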
Abstract:
Mixture models implemented via the expectation-maximization (EM) algorithm are being increasingly used in a wide range of problems in pattern recognition, such as image segmentation. However, the EM algorithm requires considerable computational time when applied to huge data sets such as a three-dimensional magnetic resonance (MR) image of over 10 million voxels. Recently, it was shown that a sparse, incremental version of the EM algorithm could improve its rate of convergence. In this paper, we show how this modified EM algorithm can be sped up further by adopting a multiresolution kd-tree structure in performing the E-step. The proposed algorithm outperforms some other variants of the EM algorithm for segmenting MR images of the human brain.
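For reference, here are the plain E- and M-steps that such kd-tree methods accelerate, written out for a one-dimensional two-component Gaussian mixture on synthetic "voxel intensities" (an illustrative sketch, not the paper's algorithm):

```python
import numpy as np

rng = np.random.default_rng(5)
# Toy 1-D intensities: two tissue classes standing in for MR voxel values.
x = np.concatenate([rng.normal(2, 0.5, 300), rng.normal(5, 0.8, 200)])

# Initial guesses for weights, means, and standard deviations.
w, mu, sd = np.array([0.5, 0.5]), np.array([1.0, 6.0]), np.array([1.0, 1.0])

for _ in range(100):
    # E-step: responsibility of each component for each voxel; this per-point
    # computation is what a multiresolution kd-tree lets one approximate in
    # bulk over groups of similar voxels.
    dens = w * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) \
             / (sd * np.sqrt(2 * np.pi))
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the soft assignments.
    n_k = r.sum(axis=0)
    w, mu = n_k / len(x), (r * x[:, None]).sum(axis=0) / n_k
    sd = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / n_k)

print("means:", mu, "stds:", sd, "weights:", w)
```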
Abstract:
Objective: Inpatient length of stay (LOS) is an important measure of hospital activity, health care resource consumption, and patient acuity. This research work aims to develop an incremental expectation-maximization (EM) based learning approach on a mixture-of-experts (ME) system for on-line prediction of LOS. The use of a batch-mode learning process in most existing artificial neural networks to predict LOS is unrealistic, as the data become available over time and their patterns change dynamically. In contrast, an on-line process is capable of providing an output whenever a new datum becomes available. This on-the-spot information is therefore more useful and practical for making decisions, especially when one deals with a tremendous amount of data. Methods and material: The proposed approach is illustrated using a real example of gastroenteritis LOS data. The data set was extracted from a retrospective cohort study on all infants born in 1995-1997 and their subsequent admissions for gastroenteritis. The total number of admissions in this data set was n = 692. Linked hospitalization records of the cohort were retrieved retrospectively to derive the outcome measure, patient demographics, and associated co-morbidity information. A comparative study of the incremental learning and batch-mode learning algorithms is considered. The performances of the learning algorithms are compared based on the mean absolute difference (MAD) between the predictions and the actual LOS, and on the proportion of predictions with MAD < 1 day (Prop(MAD < 1)). The significance of the comparison is assessed through a regression analysis. Results: The incremental learning algorithm provides better on-line prediction of LOS when the system has gained sufficient training from more examples (MAD = 1.77 days and Prop(MAD < 1) = 54.3%), compared to batch-mode learning. The regression analysis indicates a significant decrease of MAD (p-value = 0.063) and a significant (p-value = 0.044) increase of Prop(MAD < 1).
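The two evaluation metrics are straightforward to compute; a small Python sketch with hypothetical predictions (all numbers are made up for illustration):

```python
import numpy as np

def mad(pred, actual):
    # Mean absolute difference between predicted and actual LOS, in days.
    return np.mean(np.abs(np.asarray(pred) - np.asarray(actual)))

def prop_mad_lt1(pred, actual):
    # Proportion of predictions within one day of the actual LOS.
    return np.mean(np.abs(np.asarray(pred) - np.asarray(actual)) < 1)

actual = [2, 3, 5, 1, 4, 2]            # hypothetical actual stays (days)
pred   = [2.4, 3.9, 4.2, 1.1, 6.0, 2.2]  # hypothetical on-line predictions
print(f"MAD = {mad(pred, actual):.2f} days, "
      f"Prop(MAD < 1) = {prop_mad_lt1(pred, actual):.1%}")
```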
Abstract:
We have developed an alignment-free method that calculates phylogenetic distances using a maximum-likelihood approach for a model of sequence change on patterns that are discovered in unaligned sequences. To evaluate the phylogenetic accuracy of our method, and to conduct a comprehensive comparison of existing alignment-free methods (freely available as Python package decaf+py at http://www.bioinformatics.org.au), we have created a data set of reference trees covering a wide range of phylogenetic distances. Amino acid sequences were evolved along the trees and input to the tested methods; from their calculated distances we inferred trees whose topologies we compared to the reference trees. We find our pattern-based method statistically superior to all other tested alignment-free methods. We also demonstrate the general advantage of alignment-free methods over an approach based on automated alignments when sequences violate the assumption of collinearity. Similarly, we compare methods on empirical data from an existing alignment benchmark set that we used to derive reference distances and trees. Our pattern-based approach yields distances that show a linear relationship to reference distances over a substantially longer range than other alignment-free methods. The pattern-based approach outperforms the other alignment-free methods, and its phylogenetic accuracy is statistically indistinguishable from that of alignment-based distances.
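The method itself is distributed as decaf+py; as a hedged, self-contained illustration of the alignment-free idea, here is a much simpler k-mer frequency distance (not the paper's maximum-likelihood pattern distance):

```python
from collections import Counter
from itertools import product
import math

def kmer_profile(seq, k=2):
    # Normalised k-mer frequency vector: an alignment-free representation.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    alphabet = "ACDEFGHIKLMNPQRSTVWY"  # the 20 amino acids
    return [counts.get("".join(p), 0) / total
            for p in product(alphabet, repeat=k)]

def kmer_distance(a, b, k=2):
    # Euclidean distance between k-mer profiles; a crude stand-in for the
    # paper's model-based pattern distance.
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))

print(kmer_distance("MKVLAAGLLVA", "MKVLSAGLIVA"))  # similar sequences
print(kmer_distance("MKVLAAGLLVA", "GGGPPPWWWYY"))  # dissimilar sequences
```

No alignment is ever computed, which is why such distances remain usable when sequences violate collinearity.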
Abstract:
Time-course experiments with microarrays are often used to study dynamic biological systems and genetic regulatory networks (GRNs) that model how genes influence each other in cell-level development of organisms. Inference for GRNs provides important insights into fundamental biological processes such as growth and is useful in disease diagnosis and genomic drug design. Due to the experimental design, multilevel data hierarchies are often present in time-course gene expression data. Most existing methods, however, ignore the dependency of the expression measurements over time and the correlation among gene expression profiles. Such independence assumptions conflict with the nature of regulatory interactions, can result in overlooking important subject effects, and can lead to spurious inference for regulatory networks or mechanisms. In this paper, a multilevel mixed-effects model is adopted to incorporate data hierarchies in the analysis of time-course data, where temporal and subject effects are both assumed to be random. The method starts with the clustering of genes by fitting the mixture model within the multilevel random-effects model framework using the expectation-maximization (EM) algorithm. The network of regulatory interactions is then determined by searching for regulatory control elements (activators and inhibitors) shared by the clusters of co-expressed genes, based on a time-lagged correlation coefficient measure. The method is applied to two real time-course datasets from the budding yeast (Saccharomyces cerevisiae) genome. It is shown that the proposed method provides clusters of cell-cycle-regulated genes that are supported by existing gene function annotations, and hence enables inference on regulatory interactions for the genetic network.
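A minimal sketch of the time-lagged correlation measurement on synthetic expression profiles; the lag-1 activation pattern, the profile shapes, and all values are assumptions for illustration:

```python
import numpy as np

def lagged_corr(regulator, target, lag=1):
    """Pearson correlation between a putative regulator's expression profile
    and a target's profile shifted by `lag` time points. A large positive
    value suggests activation, a large negative value inhibition
    (an illustrative heuristic)."""
    if lag > 0:
        r, t = regulator[:-lag], target[lag:]
    else:
        r, t = regulator, target
    return np.corrcoef(r, t)[0, 1]

# Synthetic profiles over 12 time points: the target follows the regulator
# one time point later, plus noise.
rng = np.random.default_rng(6)
reg = np.sin(np.linspace(0, 3 * np.pi, 12))
tgt = np.roll(reg, 1) + 0.1 * rng.standard_normal(12)

print("lag-0 correlation:", lagged_corr(reg, tgt, lag=0))
print("lag-1 correlation:", lagged_corr(reg, tgt, lag=1))
```

The lag-1 correlation dominates the lag-0 one here, which is the kind of signal used to propose a regulator-target link between clusters.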
Abstract:
An efficient new Bayesian inference technique is employed to study critical properties of the Ising linear perceptron and for signal detection in code division multiple access (CDMA). The approach is based on a recently introduced message passing technique for densely connected systems. Here we study both critical and non-critical regimes. Results obtained in the non-critical regime give rise to a highly efficient signal detection algorithm in the context of CDMA, while in the critical regime one observes a first-order transition line that ends in a continuous phase transition point. Finite size effects are also studied.
Abstract:
An improved inference method for densely connected systems is presented. The approach is based on passing condensed messages between variables, representing macroscopic averages of microscopic messages. We extend previous work that showed promising results in cases where the solution space is contiguous to cases where fragmentation occurs. We apply the method to the signal detection problem of Code Division Multiple Access (CDMA) to demonstrate its potential. A highly efficient practical algorithm is also derived on the basis of insight gained from the analysis.
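As a self-contained illustration of the CDMA detection setting (not the condensed-message-passing algorithm of these two papers), here is a naive mean-field soft interference-cancellation detector in Python; the spreading codes, system load, and noise level are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
K, P, sigma = 20, 40, 0.3                               # users, chips, noise
S = rng.choice([-1.0, 1.0], size=(P, K)) / np.sqrt(P)   # spreading codes
b = rng.choice([-1.0, 1.0], size=K)                     # transmitted bits
y = S @ b + sigma * rng.standard_normal(P)              # received signal

R = S.T @ S                                  # code cross-correlations
z = S.T @ y                                  # matched-filter outputs
m = np.zeros(K)                              # soft estimates of the bits
for _ in range(100):
    # Subtract the estimated multi-user interference, then take a soft
    # (tanh) decision; damping stabilises the fixed-point iteration.
    m_new = np.tanh((z - (R - np.diag(np.diag(R))) @ m) / sigma**2)
    m = 0.5 * m + 0.5 * m_new

print("bit errors:", int(np.sum(np.sign(m) != b)))
```

The condensed-message approach of the papers refines exactly this kind of iteration, replacing the naive mean-field update with macroscopic averages of microscopic messages.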