950 results for Finite mixture modelling
Abstract:
When applying multivariate analysis techniques in information systems and social science disciplines, such as management information systems (MIS) and marketing, the assumption that the empirical data originate from a single homogeneous population is often unrealistic. When applying a causal modeling approach, such as partial least squares (PLS) path modeling, segmentation is a key issue in coping with the problem of heterogeneity in estimated cause-and-effect relationships. This chapter presents a new PLS path modeling approach that classifies units on the basis of the heterogeneity of the estimates in the inner model. If unobserved heterogeneity significantly affects the estimated path model relationships at the aggregate data level, the methodology allows homogeneous groups of observations to be formed that exhibit distinctive path model estimates. The approach thus provides differentiated analytical outcomes that permit a more precise interpretation of each segment. An application to a large data set from the American Customer Satisfaction Index (ACSI) substantiates the methodology's effectiveness in evaluating PLS path modeling results.
Abstract:
Abstract not available
Abstract:
The modelling of inpatient length of stay (LOS) has important implications in health care studies. Finite mixture distributions are usually used to model the heterogeneous LOS distribution, because a certain proportion of patients sustain a longer stay. However, because the morbidity data are collected from hospitals, observations clustered within the same hospital are often correlated. The generalized linear mixed model approach is adopted to accommodate this inherent correlation via unobservable random effects. An EM algorithm is developed to obtain residual maximum quasi-likelihood estimates. The proposed hierarchical mixture regression approach enables the identification and assessment of factors influencing the long-stay proportion and the LOS for the long-stay patient subgroup. A neonatal LOS data set is used for illustration. (C) 2003 Elsevier Science Ltd. All rights reserved.
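The role of the mixture in this abstract can be illustrated with a deliberately simplified sketch: a two-component exponential mixture fitted by EM, where one component stands for ordinary stays and the other for the long-stay subgroup, so that the fitted mixing weight plays the part of the long-stay proportion. The exponential components, the function name `fit_exponential_mixture`, and the toy data are assumptions for illustration only. The sketch omits covariates and the hospital-level random effects of the generalized linear mixed model, and it is not the residual maximum quasi-likelihood procedure described above.

```python
import math


def fit_exponential_mixture(los, n_iter=200):
    """Fit a two-component exponential mixture to positive length-of-stay values by EM.

    Returns (pi_long, mean_short, mean_long). Simplifying assumption: no
    covariates and no hospital random effects.
    """
    xs = sorted(los)
    n = len(xs)
    half = n // 2
    # Crude initialisation from the lower and upper halves of the sorted data.
    mean_short = sum(xs[:half]) / half
    mean_long = sum(xs[half:]) / (n - half)
    pi_long = 0.5
    for _ in range(n_iter):
        # E-step: posterior probability that each stay belongs to the long-stay
        # component, computed on the log scale to avoid underflow for long stays.
        resp = []
        for x in xs:
            log_short = math.log(1 - pi_long) - x / mean_short - math.log(mean_short)
            log_long = math.log(pi_long) - x / mean_long - math.log(mean_long)
            d = log_short - log_long
            if d > 0:
                e = math.exp(-d)
                resp.append(e / (1 + e))
            else:
                resp.append(1 / (1 + math.exp(d)))
        # M-step: mixing proportion and responsibility-weighted component means.
        w_long = sum(resp)
        pi_long = w_long / n
        mean_long = sum(r * x for r, x in zip(resp, xs)) / w_long
        mean_short = sum((1 - r) * x for r, x in zip(resp, xs)) / (n - w_long)
    return pi_long, mean_short, mean_long


# Hypothetical toy data: mostly short stays plus a few long ones (days).
stays = [1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 40, 45, 50, 55, 60]
print(fit_exponential_mixture(stays))
```

Because exponential components overlap near zero, some short stays keep a small long-stay responsibility, so the fitted proportion need not equal the fraction of visibly long stays.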
Abstract:
Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicates the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover ground truth. An application to real data from official statistics shows its usefulness.
Abstract:
Thesis presented in partial fulfilment of the requirements for the degree of PhD in Statistics and Information Management at the Instituto Superior de Estatística e Gestão de Informação, Universidade Nova de Lisboa
Abstract:
In a recent paper, Bermúdez [2009] used bivariate Poisson regression models for ratemaking in car insurance and included zero-inflated models to account for the excess of zeros and the overdispersion in the data set. In the present paper, we revisit this model in order to consider alternatives. We propose a two-component finite mixture of bivariate Poisson regression models to demonstrate that the overdispersion in the data requires more structure if it is to be taken into account, and that a simple zero-inflated bivariate Poisson model does not suffice. At the same time, we show that a finite mixture of bivariate Poisson regression models embraces zero-inflated bivariate Poisson regression models as a special case. Additionally, we describe a model in which the mixing proportions depend on covariates, to model how each individual comes to belong to a particular cluster. Finally, an EM algorithm is provided so that the models are easy to fit. These models are applied to the same automobile insurance claims data set as used in Bermúdez [2009], and it is shown that the modelling of the data set can be improved considerably.
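The special-case claim can be checked numerically with a small sketch. It assumes the common trivariate-reduction form of the bivariate Poisson distribution (X = Y1 + Y0, Y = Y2 + Y0 with independent Poisson terms) and evaluates probability mass functions only: there are no covariates, no covariate-dependent mixing proportions, and no EM fitting. The function names are hypothetical. A zero-inflated bivariate Poisson is reproduced by a two-component mixture whose first component has all rates equal to zero, that is, a point mass at (0, 0).

```python
import math


def bivariate_poisson_pmf(x, y, lam1, lam2, lam0):
    """P(X=x, Y=y) for X = Y1 + Y0, Y = Y2 + Y0 with independent Poisson Y0, Y1, Y2.

    Rates of zero are allowed (0 ** 0 == 1 in Python), which gives a point mass.
    """
    total = 0.0
    for k in range(min(x, y) + 1):  # k = value of the shared term Y0
        total += (lam0 ** k / math.factorial(k)
                  * lam1 ** (x - k) / math.factorial(x - k)
                  * lam2 ** (y - k) / math.factorial(y - k))
    return math.exp(-(lam1 + lam2 + lam0)) * total


def mixture_pmf(x, y, weights, params):
    """Finite mixture of bivariate Poisson components; params holds (lam1, lam2, lam0) tuples."""
    return sum(w * bivariate_poisson_pmf(x, y, *p) for w, p in zip(weights, params))


def zero_inflated_pmf(x, y, p_zero, lam1, lam2, lam0):
    """Zero-inflated bivariate Poisson: extra mass p_zero placed at (0, 0)."""
    point_mass = 1.0 if (x, y) == (0, 0) else 0.0
    return p_zero * point_mass + (1 - p_zero) * bivariate_poisson_pmf(x, y, lam1, lam2, lam0)


# The zero-inflated model equals a mixture whose first component has all rates zero.
zip_value = zero_inflated_pmf(0, 0, 0.3, 1.2, 0.8, 0.3)
mix_value = mixture_pmf(0, 0, [0.3, 0.7], [(0.0, 0.0, 0.0), (1.2, 0.8, 0.3)])
print(zip_value, mix_value)
```

A general mixture frees the rates of that first component, which is one way to read the abstract's point that the mixture adds structure beyond zero inflation.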
Abstract:
A combination of modelling and analysis techniques was used to design a six-component force balance. The balance was designed specifically for the measurement of impulsive aerodynamic forces and moments characteristic of hypervelocity shock tunnel testing using the stress wave force measurement technique. Aerodynamic modelling was used to estimate the magnitude and distribution of forces, and finite element modelling to determine the mechanical response of proposed balance designs. Simulation of balance performance was based on aerodynamic loads and mechanical responses using convolution techniques. Deconvolution was then used to assess balance performance and to guide further design modifications leading to the final balance design. (C) 2001 Elsevier Science Ltd. All rights reserved.
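The convolution step and its inverse can be sketched in discrete time. The sketch assumes a linear, time-invariant balance whose sampled impulse response `g` is known and has `g[0] != 0`, so the load history can be recovered exactly by forward substitution; the function names and signals are hypothetical. Real stress-wave balance signals include transit delays and noise, for which practical deconvolution generally needs more robust schemes than this exact inversion.

```python
def convolve(g, u):
    """Causal discrete convolution y[i] = sum_k g[k] * u[i - k], truncated to len(u) samples."""
    return [sum(g[k] * u[i - k] for k in range(min(i + 1, len(g)))) for i in range(len(u))]


def deconvolve(g, y):
    """Recover u from y = convolve(g, u) by forward substitution; requires g[0] != 0."""
    u = []
    for i in range(len(y)):
        # Contribution of already-recovered samples u[0..i-1] to y[i].
        acc = sum(g[k] * u[i - k] for k in range(1, min(i + 1, len(g))))
        u.append((y[i] - acc) / g[0])
    return u


# Hypothetical impulse response and a step-like load history.
g = [0.5, 0.3, -0.1, 0.05]
u = [0.0, 1.0, 1.0, 1.0, 1.0, 0.5, 0.0, 0.0]
y = convolve(g, u)        # simulated balance output
u_rec = deconvolve(g, y)  # recovered load
print(y)
print(u_rec)
```

Comparing the recovered load with the applied one is the kind of check the abstract describes for assessing a candidate design.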
Abstract:
When the data consist of certain attributes measured on the same set of items in different situations, they can be described as a three-mode three-way array. A mixture likelihood approach can be implemented to cluster the items (i.e., one of the modes) on the basis of both of the other modes simultaneously (i.e., the attributes measured in different situations). In this paper, it is shown that this approach can be extended to handle three-mode three-way arrays where some of the data values are missing at random in the sense of Little and Rubin (1987). The methodology is illustrated by clustering the genotypes in a three-way soybean data set where various attributes were measured on genotypes grown in several environments.
Abstract:
We focus on mixtures of factor analyzers from the perspective of a method for model-based density estimation from high-dimensional data, and hence for the clustering of such data. This approach enables a normal mixture model to be fitted to a sample of n data points of dimension p, where p is large relative to n. The number of free parameters is controlled through the dimension of the latent factor space. Working in this reduced space allows each component covariance matrix to be modeled with a complexity lying between that of the isotropic and full covariance structure models. We illustrate the use of mixtures of factor analyzers in a practical example that considers the clustering of cell lines on the basis of gene expressions from microarray experiments. (C) 2002 Elsevier Science B.V. All rights reserved.
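The claim that the factor dimension controls the number of free parameters can be made concrete by counting them. The sketch assumes the usual factor-analytic covariance, Sigma_i = Lambda_i Lambda_i^T + D_i with a p x q loading matrix Lambda_i and a diagonal D_i, which has pq + p - q(q - 1)/2 free parameters after allowing for rotational indeterminacy of the loadings, compared with p(p + 1)/2 for a full covariance matrix and 1 for an isotropic one. For a g-component mixture, g - 1 mixing proportions and g mean vectors are added. The function names are illustrative.

```python
def covariance_params(p, structure, q=None):
    """Free parameters in one p x p component covariance matrix."""
    if structure == "isotropic":
        return 1                          # sigma^2 * I
    if structure == "full":
        return p * (p + 1) // 2           # unrestricted symmetric matrix
    if structure == "factor":
        # Loadings (p*q) + diagonal uniquenesses (p) - rotational freedom q(q-1)/2.
        return p * q + p - q * (q - 1) // 2
    raise ValueError(f"unknown structure: {structure}")


def mixture_params(g, p, structure, q=None):
    """Mixing proportions + component means + component covariances."""
    return (g - 1) + g * p + g * covariance_params(p, structure, q)


# Example: p = 100 variables, q = 2 factors, g = 3 components.
for s in ("isotropic", "factor", "full"):
    print(s, mixture_params(3, 100, s, q=2))
```

With p = 100, a two-factor covariance needs 299 parameters per component against 5050 for a full matrix, which is why the approach remains feasible when p is large relative to n.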
Abstract:
Dissertation presented at the Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, for the degree of Master in Mechanical Engineering, specialization in Design and Production
Abstract:
Affiliation: Institut de recherche en immunologie et en cancérologie, Université de Montréal
Abstract:
The extensive shoreline deposits of Lake Chilwa, southern Malawi, a shallow water body today covering 600 km² of a basin of 7500 km², are investigated for their record of late Quaternary highstands. OSL dating, applied to 36 samples from five sediment cores from the northern and western marginal sand ridges, reveals a highstand record spanning 44 ka. Using two different grouping methods, highstand phases are identified at 43.7–33.3 ka, 26.2–21.0 ka and 17.9–12.0 ka (total error method) or 38.4–35.5 ka, 24.3–22.3 ka, 16.2–15.1 ka and 13.5–12.7 ka (Finite Mixture Model age components), with two further discrete events recorded at 11.01 ± 0.76 ka and 8.52 ± 0.56 ka. The highstands are comparable in timing to wet phases in other basins in East and southern Africa, demonstrating wet conditions in the region before the LGM, which was dry, and a wet Lateglacial, which commenced earlier in the southern than in the northern hemisphere in East Africa. We find no evidence that wet phases are insolation-driven, but analysis of the dataset and GCM modelling experiments suggests that Heinrich events may be associated with enhanced monsoon activity in East Africa, both in timing and as a possible causal mechanism.
Abstract:
Clustering methods are increasingly being applied to residential smart meter data, providing a number of important opportunities for distribution network operators (DNOs) to manage and plan low voltage networks. Clustering has a number of potential advantages for DNOs, including identifying suitable candidates for demand response and improving energy profile modelling. However, due to the high stochasticity and irregularity of household-level demand, detailed analytics are required to define appropriate attributes to cluster. In this paper, we present an in-depth analysis of customer smart meter data to better understand peak demand and the major sources of variability in customer behaviour. We find four key time periods in which the data should be analysed and use them to form relevant attributes for our clustering. We present a finite mixture model based clustering in which we discover 10 distinct behaviour groups describing customers based on their demand and its variability. Finally, using an existing bootstrapping technique, we show that the clustering is reliable. To the authors' knowledge, this is the first time in the power systems literature that the sample robustness of the clustering has been tested.
Abstract:
The aim of this paper is to present a method for simulating warpage in 7xxx series aluminium alloy plates. The simulation was performed with the finite element software MSC.Patran and MSC.Marc. The analysis also addresses the influence of the residual stresses induced in the raw material during the rolling process on the warpage of primary aeronautical parts fabricated by machining (milling) at Embraer. The residual stress in the aluminium plate was determined with the Layer Removal Test, and the numerical algorithm known as the Modified Flavenot Method was used to convert layer removal and beam deflection into stress levels. With this information on the level and profile of the residual stresses, it becomes possible, in the stage preceding manufacturing, to incorporate these values into the finite-element approach for modelling part warpage. Based on this warpage parameter, parts can be manufactured with low relative vulnerability, favouring competitiveness and price. © 2007 American Institute of Physics.