26 results for Expectation Maximization
in University of Queensland eSpace - Australia
Abstract:
When examining a rock mass, joint sets and their orientations can play a significant role in how the rock mass will behave. To identify joint sets present in the rock mass, the orientation of individual fracture planes can be measured on exposed rock faces and the resulting data can be examined for heterogeneity. In this article, the expectation-maximization algorithm is used to fit mixtures of Kent component distributions to the fracture data to aid in the identification of joint sets. An additional uniform component is also included in the model to accommodate the noise present in the data.
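The Kent distribution is defined on the sphere and is not available in common scientific libraries, so the sketch below substitutes 1-D Gaussian components to illustrate the same mechanism: an EM fit of parametric components plus a flat uniform component that absorbs noise. All numbers and names are illustrative assumptions, not taken from the article.

```python
# Minimal sketch: EM for a mixture of parametric components plus a uniform
# "noise" component (1-D Gaussians standing in for Kent distributions).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(5, 1, 300),
                    rng.uniform(-5, 10, 60)])      # two "joint sets" plus noise

K = 2                                              # number of joint sets
w = np.full(K + 1, 1.0 / (K + 1))                  # last weight: uniform noise
mu = np.array([-1.0, 6.0])                         # rough initial centres
sd = np.array([1.0, 1.0])
u = 1.0 / (x.max() - x.min())                      # uniform noise density

for _ in range(100):
    # E-step: posterior probability of each component for every observation
    dens = np.column_stack([norm.pdf(x, mu[k], sd[k]) for k in range(K)]
                           + [np.full_like(x, u)])
    resp = w * dens
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: update weights and the Gaussian parameters
    w = resp.mean(axis=0)
    for k in range(K):
        r = resp[:, k]
        mu[k] = (r * x).sum() / r.sum()
        sd[k] = np.sqrt((r * (x - mu[k]) ** 2).sum() / r.sum())

print(np.round(mu, 2), np.round(w, 2))
```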
Abstract:
We present a novel maximum-likelihood-based algorithm for estimating the distribution of alignment scores from the scores of unrelated sequences in a database search. Using a new method for measuring the accuracy of p-values, we show that our maximum-likelihood-based algorithm is more accurate than existing regression-based and lookup table methods. We explore a more sophisticated way of modeling and estimating the score distributions (using a two-component mixture model and expectation maximization), but conclude that this does not improve significantly over simply ignoring scores with small E-values during estimation. Finally, we measure the classification accuracy of p-values estimated in different ways and observe that inaccurate p-values can, somewhat paradoxically, lead to higher classification accuracy. We explain this paradox and argue that statistical accuracy, not classification accuracy, should be the primary criterion in comparisons of similarity search methods that return p-values that adjust for target sequence length.
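As a rough illustration of the first step, the null score distribution in sequence database search is commonly modeled with an extreme value (Gumbel) distribution; the sketch below fits one by maximum likelihood with scipy and converts a score to a p-value. The distributional choice and all numbers are assumptions for illustration, not the paper's exact estimator.

```python
# Fit a Gumbel null distribution to scores of unrelated sequences by ML,
# then read off a p-value for a query score from the survival function.
import numpy as np
from scipy.stats import gumbel_r

rng = np.random.default_rng(1)
null_scores = gumbel_r.rvs(loc=20.0, scale=4.0, size=5000, random_state=rng)

loc, scale = gumbel_r.fit(null_scores)   # maximum-likelihood estimates
p_value = gumbel_r.sf(38.0, loc, scale)  # P(score >= 38 under the null model)
print(f"loc={loc:.2f} scale={scale:.2f} p={p_value:.2e}")
```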
Abstract:
Mixture models implemented via the expectation-maximization (EM) algorithm are being increasingly used in a wide range of pattern recognition problems such as image segmentation. However, the EM algorithm requires considerable computational time when applied to huge data sets such as a three-dimensional magnetic resonance (MR) image of over 10 million voxels. Recently, it was shown that a sparse, incremental version of the EM algorithm could improve its rate of convergence. In this paper, we show how this modified EM algorithm can be sped up further by adopting a multiresolution kd-tree structure in performing the E-step. The proposed algorithm outperforms some other variants of the EM algorithm for segmenting MR images of the human brain.
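The sketch below illustrates the kd-tree idea in a simplified, fixed-depth form that is an assumption of this sketch rather than the paper's adaptive algorithm: voxel intensities are partitioned once by recursive median splits, and EM then runs on leaf means weighted by leaf counts, so each E-step touches a few hundred nodes instead of millions of voxels (within-leaf spread is ignored for brevity).

```python
import numpy as np
from scipy.stats import norm

def leaves(x, depth):
    # recursive median split; each leaf summarised by (mean, count)
    if depth == 0 or len(x) < 2:
        return [(x.mean(), len(x))]
    m = np.median(x)
    lo, hi = x[x <= m], x[x > m]
    if len(lo) == 0 or len(hi) == 0:
        return [(x.mean(), len(x))]
    return leaves(lo, depth - 1) + leaves(hi, depth - 1)

rng = np.random.default_rng(2)
voxels = np.concatenate([rng.normal(40, 5, 50000), rng.normal(80, 8, 50000)])
centers, counts = map(np.array, zip(*leaves(voxels, depth=8)))  # ~256 leaves

w, mu, sd = np.array([0.5, 0.5]), np.array([30.0, 90.0]), np.array([10.0, 10.0])
for _ in range(50):
    dens = np.column_stack([norm.pdf(centers, mu[k], sd[k]) for k in range(2)])
    resp = w * dens
    resp /= resp.sum(axis=1, keepdims=True)
    nk = (resp * counts[:, None]).sum(axis=0)        # leaf counts weight the stats
    w = nk / counts.sum()
    mu = (resp * counts[:, None] * centers[:, None]).sum(axis=0) / nk
    sd = np.sqrt((resp * counts[:, None] * (centers[:, None] - mu) ** 2).sum(axis=0) / nk)

print(np.round(mu, 1), np.round(w, 2))
```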
Abstract:
The expectation-maximization (EM) algorithm has been of considerable interest in recent years as the basis for various algorithms in application areas of neural networks such as pattern recognition. However, there exist some misconceptions concerning its application to neural networks. In this paper, we clarify these misconceptions and consider how the EM algorithm can be adopted to train multilayer perceptron (MLP) and mixture of experts (ME) networks in applications to multiclass classification. We identify some situations where the application of the EM algorithm to train MLP networks may be of limited value and discuss some ways of handling the difficulties. For ME networks, it is reported in the literature that networks trained by the EM algorithm, using the iteratively reweighted least squares (IRLS) algorithm in the inner loop of the M-step, often performed poorly in multiclass classification. However, we found that the convergence of the IRLS algorithm is stable and that the log likelihood is monotonically increasing when a learning rate smaller than one is adopted. We also propose the use of an expectation-conditional maximization (ECM) algorithm to train ME networks; its performance is demonstrated to be superior to the IRLS algorithm on some simulated and real data sets.
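As a minimal illustration of the inner-loop behaviour the abstract discusses, the sketch below runs IRLS for a plain binary logistic regression with a learning rate below one; the mixture-of-experts gating and multiclass structure are omitted, and all data are synthetic.

```python
# IRLS (Newton steps on the weighted normal equations) with a damping
# learning rate, the stabilising device mentioned in the abstract.
import numpy as np

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
beta_true = np.array([-0.5, 2.0, -1.0])
y = rng.random(200) < 1 / (1 + np.exp(-X @ beta_true))

beta, lr = np.zeros(3), 0.5                     # learning rate < 1
for _ in range(50):
    p = 1 / (1 + np.exp(-X @ beta))
    W = p * (1 - p)                             # IRLS weights
    H = X.T @ (W[:, None] * X)                  # weighted information matrix
    beta += lr * np.linalg.solve(H, X.T @ (y - p))

print(np.round(beta, 2))
```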
Abstract:
QTL detection experiments in livestock species commonly use the half-sib design. Each male is mated to a number of females, each female producing a limited number of progeny. Analysis consists of attempting to detect associations between phenotype and genotype measured on the progeny. When family sizes are limiting, experimenters may wish to incorporate as much information as possible into a single analysis. However, combining information across sires is problematic because of incomplete linkage disequilibrium between the markers and the QTL in the population. This study describes formulae for obtaining maximum likelihood estimates (MLEs) via the expectation-maximization (EM) algorithm for use in a multiple-trait, multiple-family analysis. A model specifying a QTL with only two alleles, and a common within-sire error variance, is assumed. Compared to single-family analyses, power can be improved up to fourfold with multi-family analyses. The accuracy and precision of QTL location estimates are also substantially improved. With small family sizes, the multi-family, multi-trait analyses substantially reduce, but do not totally remove, biases in QTL effect estimates. In situations where multiple QTL alleles are segregating, the multi-family analysis will average out the effects of the different QTL alleles.
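A hedged sketch of the mixture structure only: within each half-sib family the progeny phenotypes form a 50:50 mixture around two QTL-genotype means, with the error variance pooled across sires. Marker genotypes, linkage, and the multi-trait extension in the paper's formulae are omitted, and all symbols are illustrative.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
F, n = 5, 40                                  # sire families, progeny each
mu_true = rng.normal(10, 1, F)
y = np.array([rng.normal(mu_true[f] + rng.choice([-1.0, 1.0], n), 1.0)
              for f in range(F)])             # true QTL effect a = 1

mu, a, sigma = y.mean(axis=1), 0.5, 2.0
for _ in range(200):
    # E-step: P(progeny inherited the "+" sire allele | phenotype)
    d1 = norm.pdf(y, (mu + a)[:, None], sigma)
    d0 = norm.pdf(y, (mu - a)[:, None], sigma)
    r = d1 / (d1 + d0)
    # M-step: family means, QTL effect, common within-sire variance
    mu = (y - a * (2 * r - 1)).mean(axis=1)
    a = ((2 * r - 1) * (y - mu[:, None])).mean()
    sigma = np.sqrt(np.mean(r * (y - mu[:, None] - a) ** 2
                            + (1 - r) * (y - mu[:, None] + a) ** 2))

print(round(a, 2), round(sigma, 2))
```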
Abstract:
Objective: Inpatient length of stay (LOS) is an important measure of hospital activity, health care resource consumption, and patient acuity. This research aims at developing an incremental expectation-maximization (EM) based learning approach on a mixture of experts (ME) system for on-line prediction of LOS. The use of a batch-mode learning process in most existing artificial neural networks to predict LOS is unrealistic, as the data become available over time and their patterns change dynamically. In contrast, an on-line process is capable of providing an output whenever a new datum becomes available. This on-the-spot information is therefore more useful and practical for making decisions, especially when one deals with a tremendous amount of data. Methods and material: The proposed approach is illustrated using a real example of gastroenteritis LOS data. The data set was extracted from a retrospective cohort study of all infants born in 1995-1997 and their subsequent admissions for gastroenteritis. The total number of admissions in this data set was n = 692. Linked hospitalization records of the cohort were retrieved retrospectively to derive the outcome measure, patient demographics, and associated co-morbidities information. A comparative study of the incremental learning and the batch-mode learning algorithms is considered. The performances of the learning algorithms are compared based on the mean absolute difference (MAD) between the predictions and the actual LOS, and the proportion of predictions with MAD < 1 day (Prop(MAD < 1)). The significance of the comparison is assessed through a regression analysis. Results: The incremental learning algorithm provides better on-line prediction of LOS when the system has gained sufficient training from more examples (MAD = 1.77 days and Prop(MAD < 1) = 54.3%), compared to that using batch-mode learning. The regression analysis indicates a significant decrease of MAD (p-value = 0.063) and a significant (p-value = 0.044) increase of Prop(MAD < 1).
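The contrast the abstract draws can be illustrated with a stepwise (on-line) EM for a simple two-component Gaussian mixture, in which each arriving datum updates running sufficient statistics with a decaying step size, so an estimate is available at every point of the stream. The ME architecture and the LOS data are not reproduced; everything below is a synthetic illustration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
stream = rng.choice([0, 1], 2000)
x = np.where(stream == 0, rng.normal(2, 1, 2000), rng.normal(7, 1, 2000))

w, mu, sd = np.array([0.5, 0.5]), np.array([0.0, 10.0]), np.array([2.0, 2.0])
S0, S1, S2 = w.copy(), mu * w, (sd**2 + mu**2) * w      # sufficient statistics
for t, xi in enumerate(x, start=1):
    r = w * norm.pdf(xi, mu, sd)
    r /= r.sum()                                        # E-step for one datum
    eta = (t + 2) ** -0.7                               # decaying step size
    S0 = (1 - eta) * S0 + eta * r
    S1 = (1 - eta) * S1 + eta * r * xi
    S2 = (1 - eta) * S2 + eta * r * xi**2
    w, mu = S0 / S0.sum(), S1 / S0                      # M-step from the stats
    sd = np.sqrt(np.maximum(S2 / S0 - mu**2, 1e-6))

print(np.round(mu, 2), np.round(w, 2))
```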
Abstract:
Count data with excess zeros relative to a Poisson distribution are common in many biomedical applications. A popular approach to the analysis of such data is to use a zero-inflated Poisson (ZIP) regression model. Often, because of the hierarchical study design or the data collection procedure, zero-inflation and lack of independence may occur simultaneously, which render the standard ZIP model inadequate. To account for the preponderance of zero counts and the inherent correlation of observations, a class of multi-level ZIP regression models with random effects is presented. Model fitting is facilitated using an expectation-maximization algorithm, whereas variance components are estimated via residual maximum likelihood estimating equations. A score test for zero-inflation is also presented. The multi-level ZIP model is then generalized to cope with a more complex correlation structure. Application to the analysis of correlated count data from a longitudinal infant feeding study illustrates the usefulness of the approach.
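Stripped of covariates and random effects, the EM scheme for the basic ZIP model is compact enough to sketch: the E-step computes, for each observed zero, the posterior probability that it is a structural zero; the M-step updates the mixing weight and the Poisson mean. The numbers are synthetic and the multi-level structure of the paper is omitted.

```python
import numpy as np

rng = np.random.default_rng(6)
y = np.where(rng.random(5000) < 0.3, 0, rng.poisson(2.5, 5000))  # 30% inflated

pi, lam = 0.5, 1.0
for _ in range(200):
    # E-step: P(a zero is structural rather than a Poisson zero)
    z = np.where(y == 0, pi / (pi + (1 - pi) * np.exp(-lam)), 0.0)
    # M-step: mixing weight and Poisson mean of the count component
    pi = z.mean()
    lam = ((1 - z) * y).sum() / (1 - z).sum()

print(round(pi, 3), round(lam, 3))
```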
Abstract:
Time-course experiments with microarrays are often used to study dynamic biological systems and genetic regulatory networks (GRNs) that model how genes influence each other in cell-level development of organisms. The inference for GRNs provides important insights into fundamental biological processes such as growth and is useful in disease diagnosis and genomic drug design. Due to the experimental design, multilevel data hierarchies are often present in time-course gene expression data. Most existing methods, however, ignore the dependency of the expression measurements over time and the correlation among gene expression profiles. Such independence assumptions conflict with the nature of regulatory interactions, can result in certain important subject effects being overlooked, and can lead to spurious inference for regulatory networks or mechanisms. In this paper, a multilevel mixed-effects model is adopted to incorporate data hierarchies in the analysis of time-course data, where temporal and subject effects are both assumed to be random. The method starts with the clustering of genes by fitting the mixture model within the multilevel random-effects model framework using the expectation-maximization (EM) algorithm. The network of regulatory interactions is then determined by searching for regulatory control elements (activators and inhibitors) shared by the clusters of co-expressed genes, based on a time-lagged correlation coefficient measure. The method is applied to two real time-course datasets from the budding yeast (Saccharomyces cerevisiae) genome. It is shown that the proposed method provides clusters of cell-cycle-regulated genes that are supported by existing gene function annotations, and hence enables inference on regulatory interactions for the genetic network.
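The final scoring step is simple enough to sketch directly: a candidate regulator cluster is related to a target cluster by correlating the regulator's mean profile, shifted in time, with the target's. The function name, the profiles, and the lag of one time point below are illustrative assumptions.

```python
import numpy as np

def lagged_corr(regulator, target, lag=1):
    # correlate regulator at time t with target at time t + lag
    return np.corrcoef(regulator[:-lag], target[lag:])[0, 1]

t = np.linspace(0, 4 * np.pi, 40)
regulator_profile = np.sin(t)            # mean expression of one cluster
target_profile = np.sin(t - 0.3)         # a cluster that responds slightly later
print(round(lagged_corr(regulator_profile, target_profile, lag=1), 3))
```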
Abstract:
We consider a mixture model approach to the regression analysis of competing-risks data. Attention is focused on inference concerning the effects of factors on both the probability of occurrence and the hazard rate conditional on each of the failure types. These two quantities are specified in the mixture model using the logistic model and the proportional hazards model, respectively. We propose a semi-parametric mixture method to estimate the logistic and regression coefficients jointly, whereby the component-baseline hazard functions are completely unspecified. Estimation is by maximum likelihood on the basis of the full likelihood, implemented via an expectation-conditional maximization (ECM) algorithm. Simulation studies are performed to compare the performance of the proposed semi-parametric method with a fully parametric mixture approach. The results show that when the component-baseline hazard is monotonic increasing, the semi-parametric and fully parametric mixture approaches are comparable for mildly and moderately censored samples. When the component-baseline hazard is not monotonic increasing, the semi-parametric method consistently provides less biased estimates than a fully parametric approach and is comparable in efficiency in the estimation of the parameters for all levels of censoring. The methods are illustrated using a real data set of prostate cancer patients treated with different dosages of the drug diethylstilbestrol.
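In the notation commonly used for such mixture models (the symbols below are illustrative rather than copied from the paper), the two sub-models combine as

```latex
% logistic mixing proportions, proportional-hazards components,
% and the resulting mixture density for g failure types
\pi_i(\mathbf{x}) =
  \frac{\exp(\boldsymbol{\beta}_i^{\top}\mathbf{x})}
       {\sum_{j=1}^{g}\exp(\boldsymbol{\beta}_j^{\top}\mathbf{x})},
\qquad
\lambda_i(t \mid \mathbf{x}) = \lambda_{0i}(t)\,\exp(\boldsymbol{\gamma}_i^{\top}\mathbf{x}),
\qquad
f(t \mid \mathbf{x}) = \sum_{i=1}^{g}\pi_i(\mathbf{x})\,f_i(t \mid \mathbf{x}),
```

with the component-baseline hazards \lambda_{0i}(t) left completely unspecified in the semi-parametric method.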
Abstract:
Motivation: The clustering of gene profiles across some experimental conditions of interest contributes significantly to the elucidation of unknown gene function, the validation of gene discoveries and the interpretation of biological processes. However, this clustering problem is not straightforward as the profiles of the genes are not all independently distributed and the expression levels may have been obtained from an experimental design involving replicated arrays. Ignoring the dependence between the gene profiles and the structure of the replicated data can result in important sources of variability in the experiments being overlooked in the analysis, with the consequent possibility of misleading inferences being made. We propose a random-effects model that provides a unified approach to the clustering of genes with correlated expression levels measured in a wide variety of experimental situations. Our model is an extension of the normal mixture model to account for the correlations between the gene profiles and to enable covariate information to be incorporated into the clustering process. Hence the model is applicable to longitudinal studies with or without replication, for example, time-course experiments by using time as a covariate, and to cross-sectional experiments by using categorical covariates to represent the different experimental classes. Results: We show that our random-effects model can be fitted by maximum likelihood via the EM algorithm for which the E (expectation) and M (maximization) steps can be implemented in closed form. Hence our model can be fitted deterministically without the need for time-consuming Monte Carlo approximations. The effectiveness of our model-based procedure for the clustering of correlated gene profiles is demonstrated on three real datasets, representing typical microarray experimental designs, covering time-course, repeated-measurement and cross-sectional data. In these examples, relevant clusters of the genes are obtained, which are supported by existing gene-function annotation. A synthetic dataset is considered too.
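For contrast with the random-effects extension the abstract proposes, the sketch below fits a plain normal mixture by EM to synthetic gene profiles, treating the profiles as independent; sklearn is used only as a convenient EM implementation, not as the authors' software.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)
timepoints = 8
profiles = np.vstack([rng.normal(m, 0.5, (100, timepoints))
                      for m in (np.sin(np.linspace(0, np.pi, timepoints)),
                                np.linspace(-1, 1, timepoints))])

# plain normal mixture fitted by EM; profiles treated as independent vectors
gmm = GaussianMixture(n_components=2, covariance_type="diag", random_state=0)
labels = gmm.fit_predict(profiles)
print(np.bincount(labels))
```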
Abstract:
Cost functions dual to stochastic production technologies are derived and their properties are discussed. These cost functions are shown to be consistent with expected-utility maximization without placing serious structural restrictions on the underlying technology.
Abstract:
DNA replication fork arrest during the termination phase of chromosome replication in Bacillus subtilis is brought about by the replication terminator protein (RTP) bound to specific DNA terminator sequences (Ter sites) distributed throughout the terminus region. An attractive suggestion by others was that crucial to the functioning of the RTP-Ter complex is a specific interaction between RTP positioned on the DNA and the helicase associated with the approaching replication fork. In support of this was the behaviour of two site-directed mutants of RTP. They appeared to bind Ter DNA normally but were ineffective in fork arrest as ascertained by in vitro Escherichia coli DnaB helicase and replication assays. We describe here a system for assessing the fork-arrest behaviour of RTP mutants in a bona fide in vivo assay in B. subtilis. One of the previously studied mutants, RTP.Y33N, was non-functional in fork arrest in vivo, as predicted. But through extensive analyses, this RTP mutant was shown to be severely defective in binding to Ter DNA, contrary to expectation. Taken in conjunction with recent findings on the other mutant (RTP.E30K), it is concluded that there is as yet no substantive evidence from the behaviour of RTP mutants to support the RTP-helicase interaction model for fork arrest. In an extension of the present work on RTP.Y33N, we determined the dissociation rates of complexes formed by wild-type (wt) RTP and another RTP mutant with various terminator sequences. The functional wtRTP-TerI complex was quite stable (half-life of 182 minutes), reminiscent of the great stability of the E. coli Tus-Ter complex. More significant were the exceptional stabilities of complexes comprising wtRTP and an RTP double-mutant (E39K.R42Q) bound to some particular terminator sequences. From the measurement of in vivo fork-arrest activities of the various complexes, it is concluded that the stability (half-life) of the whole RTP-Ter complex is not the overriding determinant of arrest, and that the RTP-Ter complex must be actively disrupted, or RTP removed, by the action of the approaching replication fork.
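The reported half-lives translate directly into first-order dissociation rate constants through k_off = ln(2) / t_half; for the wtRTP-TerI half-life of 182 minutes quoted above:

```python
# convert a complex half-life into a first-order dissociation rate constant
import math

t_half = 182.0                             # minutes, from the abstract
k_off = math.log(2) / t_half
print(f"k_off = {k_off:.2e} per minute")   # ~3.8e-03 min^-1
```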
Abstract:
A chance constrained programming model is developed to assist Queensland barley growers in making varietal and agronomic decisions in the face of changing product demands and volatile production conditions. Although unsuitable or overlooked in many risk programming applications, the chance constrained programming approach aptly captures the single-stage decision problem faced by barley growers of whether to plant lower-yielding but potentially higher-priced malting varieties, given a particular expectation of meeting malting grade standards. Different expectations greatly affect the optimal mix of malting and feed barley activities. The analysis highlights the suitability of chance constrained programming to this specific class of farm decision problem.
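The core of a chance constrained model can be sketched in a few lines: under a normality assumption, a requirement that a grade standard be met with a given probability reduces to a deterministic quantile constraint. All figures below (protein limits, forecasts) are invented for illustration and are not from the study.

```python
# deterministic equivalent of a chance constraint under normality
from scipy.stats import norm

alpha = 0.7                          # grower's expectation of making grade
mu_protein, sd_protein = 10.5, 1.2   # forecast grain protein, % (illustrative)
max_protein = 12.0                   # malting grade ceiling, % (illustrative)

# P(protein <= max) >= alpha  <=>  mu + z_alpha * sd <= max
z = norm.ppf(alpha)
feasible = mu_protein + z * sd_protein <= max_protein
print(f"plant malting variety: {feasible}")
```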
Abstract:
Diverse self-incompatibility (SI) mechanisms permit flowering plants to inhibit fertilization by pollen that express specificities in common with the pistil. Characteristic of at least two model systems is greatly reduced recombination across large genomic tracts surrounding the S-locus, which regulates SI. In three angiosperm families, including the Solanaceae, the gene that controls the expression of gametophytic SI in the pistil encodes a ribonuclease (S-RNase). The gene that controls pollen SI expression is currently unknown, although several candidates have recently been proposed. Although each candidate shows a high level of polymorphism and complete allelic disequilibrium with the S-RNase gene, such properties may merely reflect tight linkage to the S-locus, irrespective of any functional role in SI. We analyzed the magnitude and nature of nucleotide variation, with the objective of distinguishing likely candidates for regulators of SI from other genes embedded in the S-locus region. We studied the S-RNase gene of the Solanaceae and 48A, a candidate for the pollen gene in this system, and we also conducted a parallel analysis of the regulators of sporophytic SI in Brassica, a system in which both the pistil and pollen genes are known. Although the pattern of variation shown by the pollen gene of the Brassica system is consistent with its role as a determinant of pollen specificity, that of 48A departs from expectation. Our analysis further suggests that recombination between 48A and S-RNase may have occurred during the interval spanned by the gene genealogy, another indication that 48A may not regulate SI expression in pollen.
Abstract:
Quantum mechanics has been formulated in phase space, with the Wigner function as the representative of the quantum density operator, and classical mechanics has been formulated in Hilbert space, with the Groenewold operator as the representative of the classical Liouville density function. Semiclassical approximations to the quantum evolution of the Wigner function have been defined, enabling the quantum evolution to be approached from a classical starting point. Now analogous semiquantum approximations to the classical evolution of the Groenewold operator are defined, enabling the classical evolution to be approached from a quantum starting point. Simple nonlinear systems with one degree of freedom are considered, whose Hamiltonians are polynomials in the Hamiltonian of the simple harmonic oscillator. The behavior of expectation values of simple observables and of eigenvalues of the Groenewold operator is calculated numerically and compared for the various semiclassical and semiquantum approximations.
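As a minimal numerical companion (an illustrative special case, not the paper's semiquantum propagation scheme), the sketch below evaluates expectation values from the Wigner function of the harmonic oscillator ground state on a phase-space grid, with hbar = m = omega = 1.

```python
import numpy as np

q = np.linspace(-6, 6, 400)
p = np.linspace(-6, 6, 400)
Q, P = np.meshgrid(q, p)
W = np.exp(-(Q**2 + P**2)) / np.pi           # ground-state Wigner function

dq, dp = q[1] - q[0], p[1] - p[0]
norm_ = W.sum() * dq * dp                    # normalisation, should be ~1
exp_q = (W * Q).sum() * dq * dp              # <q> = 0 by symmetry
exp_q2 = (W * Q**2).sum() * dq * dp          # <q^2> = 1/2 for the ground state
print(round(norm_, 4), round(exp_q, 4), round(exp_q2, 4))
```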