995 resultados para mixture distributions


Relevância:

100.00% 100.00%

Publicador:

Resumo:

A popular way to account for unobserved heterogeneity is to assume that the data are drawn from a finite mixture distribution. A barrier to using finite mixture models is that parameters that could previously be estimated in stages must now be estimated jointly: using mixture distributions destroys any additive separability of the log-likelihood function. We show, however, that an extension of the EM algorithm reintroduces additive separability, thus allowing one to estimate parameters sequentially during each maximization step. In establishing this result, we develop a broad class of estimators for mixture models. Returning to the likelihood problem, we show that, relative to full information maximum likelihood, our sequential estimator can generate large computational savings with little loss of efficiency.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This article is about modeling count data with zero truncation. A parametric count density family is considered. The truncated mixture of densities from this family is different from the mixture of truncated densities from the same family. Whereas the former model is more natural to formulate and to interpret, the latter model is theoretically easier to treat. It is shown that for any mixing distribution leading to a truncated mixture, a (usually different) mixing distribution can be found so. that the associated mixture of truncated densities equals the truncated mixture, and vice versa. This implies that the likelihood surfaces for both situations agree, and in this sense both models are equivalent. Zero-truncated count data models are used frequently in the capture-recapture setting to estimate population size, and it can be shown that the two Horvitz-Thompson estimators, associated with the two models, agree. In particular, it is possible to achieve strong results for mixtures of truncated Poisson densities, including reliable, global construction of the unique NPMLE (nonparametric maximum likelihood estimator) of the mixing distribution, implying a unique estimator for the population size. The benefit of these results lies in the fact that it is valid to work with the mixture of truncated count densities, which is less appealing for the practitioner but theoretically easier. Mixtures of truncated count densities form a convex linear model, for which a developed theory exists, including global maximum likelihood theory as well as algorithmic approaches. Once the problem has been solved in this class, it might readily be transformed back to the original problem by means of an explicitly given mapping. Applications of these ideas are given, particularly in the case of the truncated Poisson family.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The aim of this paper is to establish some mixture distributions that arise in stochastic processes. Some basic functions associated with the probability mass function of the mixture distributions, such as k-th moments, characteristic function and factorial moments are computed. Further we obtain a three-term recurrence relation for each established mixture distribution.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

A two-component mixture regression model that allows simultaneously for heterogeneity and dependency among observations is proposed. By specifying random effects explicitly in the linear predictor of the mixture probability and the mixture components, parameter estimation is achieved by maximising the corresponding best linear unbiased prediction type log-likelihood. Approximate residual maximum likelihood estimates are obtained via an EM algorithm in the manner of generalised linear mixed model (GLMM). The method can be extended to a g-component mixture regression model with the component density from the exponential family, leading to the development of the class of finite mixture GLMM. For illustration, the method is applied to analyse neonatal length of stay (LOS). It is shown that identification of pertinent factors that influence hospital LOS can provide important information for health care planning and resource allocation. (C) 2002 Elsevier Science B.V. All rights reserved.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The modelling of inpatient length of stay (LOS) has important implications in health care studies. Finite mixture distributions are usually used to model the heterogeneous LOS distribution, due to a certain proportion of patients sustaining-a longer stay. However, the morbidity data are collected from hospitals, observations clustered within the same hospital are often correlated. The generalized linear mixed model approach is adopted to accommodate the inherent correlation via unobservable random effects. An EM algorithm is developed to obtain residual maximum quasi-likelihood estimation. The proposed hierarchical mixture regression approach enables the identification and assessment of factors influencing the long-stay proportion and the LOS for the long-stay patient subgroup. A neonatal LOS data set is used for illustration, (C) 2003 Elsevier Science Ltd. All rights reserved.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We investigate the utility to computational Bayesian analyses of a particular family of recursive marginal likelihood estimators characterized by the (equivalent) algorithms known as "biased sampling" or "reverse logistic regression" in the statistics literature and "the density of states" in physics. Through a pair of numerical examples (including mixture modeling of the well-known galaxy dataset) we highlight the remarkable diversity of sampling schemes amenable to such recursive normalization, as well as the notable efficiency of the resulting pseudo-mixture distributions for gauging prior-sensitivity in the Bayesian model selection context. Our key theoretical contributions are to introduce a novel heuristic ("thermodynamic integration via importance sampling") for qualifying the role of the bridging sequence in this procedure, and to reveal various connections between these recursive estimators and the nested sampling technique.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper presents a novel Second Order Cone Programming (SOCP) formulation for large scale binary classification tasks. Assuming that the class conditional densities are mixture distributions, where each component of the mixture has a spherical covariance, the second order statistics of the components can be estimated efficiently using clustering algorithms like BIRCH. For each cluster, the second order moments are used to derive a second order cone constraint via a Chebyshev-Cantelli inequality. This constraint ensures that any data point in the cluster is classified correctly with a high probability. This leads to a large margin SOCP formulation whose size depends on the number of clusters rather than the number of training data points. Hence, the proposed formulation scales well for large datasets when compared to the state-of-the-art classifiers, Support Vector Machines (SVMs). Experiments on real world and synthetic datasets show that the proposed algorithm outperforms SVM solvers in terms of training time and achieves similar accuracies.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This thesis entitled Reliability Modelling and Analysis in Discrete time Some Concepts and Models Useful in the Analysis of discrete life time data.The present study consists of five chapters. In Chapter II we take up the derivation of some general results useful in reliability modelling that involves two component mixtures. Expression for the failure rate, mean residual life and second moment of residual life of the mixture distributions in terms of the corresponding quantities in the component distributions are investigated. Some applications of these results are also pointed out. The role of the geometric,Waring and negative hypergeometric distributions as models of life lengths in the discrete time domain has been discussed already. While describing various reliability characteristics, it was found that they can be often considered as a class. The applicability of these models in single populations naturally extends to the case of populations composed of sub-populations making mixtures of these distributions worth investigating. Accordingly the general properties, various reliability characteristics and characterizations of these models are discussed in chapter III. Inference of parameters in mixture distribution is usually a difficult problem because the mass function of the mixture is a linear function of the component masses that makes manipulation of the likelihood equations, leastsquare function etc and the resulting computations.very difficult. We show that one of our characterizations help in inferring the parameters of the geometric mixture without involving computational hazards. As mentioned in the review of results in the previous sections, partial moments were not studied extensively in literature especially in the case of discrete distributions. Chapters IV and V deal with descending and ascending partial factorial moments. Apart from studying their properties, we prove characterizations of distributions by functional forms of partial moments and establish recurrence relations between successive moments for some well known families. It is further demonstrated that partial moments are equally efficient and convenient compared to many of the conventional tools to resolve practical problems in reliability modelling and analysis. The study concludes by indicating some new problems that surfaced during the course of the present investigation which could be the subject for a future work in this area.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In this paper, we examine approaches to estimate a Bayesian mixture model at both single and multiple time points for a sample of actual and simulated aerosol particle size distribution (PSD) data. For estimation of a mixture model at a single time point, we use Reversible Jump Markov Chain Monte Carlo (RJMCMC) to estimate mixture model parameters including the number of components which is assumed to be unknown. We compare the results of this approach to a commonly used estimation method in the aerosol physics literature. As PSD data is often measured over time, often at small time intervals, we also examine the use of an informative prior for estimation of the mixture parameters which takes into account the correlated nature of the parameters. The Bayesian mixture model offers a promising approach, providing advantages both in estimation and inference.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

When estimating parameters that constitute a discrete probability distribution {pj}, it is difficult to determine how constraints should be made to guarantee that the estimated parameters { pˆj} constitute a probability distribution (i.e., pˆj>0, Σ pˆj =1). For age distributions estimated from mixtures of length-at-age distributions, the EM (expectationmaximization) algorithm (Hasselblad, 1966; Hoenig and Heisey, 1987; Kimura and Chikuni, 1987), restricted least squares (Clark, 1981), and weak quasisolutions (Troynikov, 2004) have all been used. Each of these methods appears to guarantee that the estimated distribution will be a true probability distribution with all categories greater than or equal to zero and with individual probabilities that sum to one. In addition, all these methods appear to provide a theoretical basis for solutions that will be either maximum-likelihood estimates or at least convergent to a probability distribut