90 resultados para 230204 Applied Statistics
Finite mixture regression model with random effects: application to neonatal hospital length of stay
Resumo:
A two-component mixture regression model that allows simultaneously for heterogeneity and dependency among observations is proposed. By specifying random effects explicitly in the linear predictor of the mixture probability and the mixture components, parameter estimation is achieved by maximising the corresponding best linear unbiased prediction type log-likelihood. Approximate residual maximum likelihood estimates are obtained via an EM algorithm in the manner of generalised linear mixed model (GLMM). The method can be extended to a g-component mixture regression model with the component density from the exponential family, leading to the development of the class of finite mixture GLMM. For illustration, the method is applied to analyse neonatal length of stay (LOS). It is shown that identification of pertinent factors that influence hospital LOS can provide important information for health care planning and resource allocation. (C) 2002 Elsevier Science B.V. All rights reserved.
Resumo:
The modelling of inpatient length of stay (LOS) has important implications in health care studies. Finite mixture distributions are usually used to model the heterogeneous LOS distribution, due to a certain proportion of patients sustaining-a longer stay. However, the morbidity data are collected from hospitals, observations clustered within the same hospital are often correlated. The generalized linear mixed model approach is adopted to accommodate the inherent correlation via unobservable random effects. An EM algorithm is developed to obtain residual maximum quasi-likelihood estimation. The proposed hierarchical mixture regression approach enables the identification and assessment of factors influencing the long-stay proportion and the LOS for the long-stay patient subgroup. A neonatal LOS data set is used for illustration, (C) 2003 Elsevier Science Ltd. All rights reserved.
Resumo:
Australian sugar-producing regions have differed in terms of the extent and rate of incorporation of new technology into harvesting systems. The Mackay sugar industry has lagged behind most other sugar-producing regions in this regard. The reasons for this are addressed by invoking an evolutionary economics perspective. The development of harvesting systems, and the role of technology in shaping them, is mapped and interpreted using the concept of path dependency. Key events in the evolution of harvesting systems are identified, which show how the past has shaped the regional development of harvesting systems. From an evolutionary economics perspective, the outcomes observed are the end result of a specific history.
Resumo:
We consider the problem of assessing the number of clusters in a limited number of tissue samples containing gene expressions for possibly several thousands of genes. It is proposed to use a normal mixture model-based approach to the clustering of the tissue samples. One advantage of this approach is that the question on the number of clusters in the data can be formulated in terms of a test on the smallest number of components in the mixture model compatible with the data. This test can be carried out on the basis of the likelihood ratio test statistic, using resampling to assess its null distribution. The effectiveness of this approach is demonstrated on simulated data and on some microarray datasets, as considered previously in the bioinformatics literature. (C) 2004 Elsevier Inc. All rights reserved.
Resumo:
Mixture models implemented via the expectation-maximization (EM) algorithm are being increasingly used in a wide range of problems in pattern recognition such as image segmentation. However, the EM algorithm requires considerable computational time in its application to huge data sets such as a three-dimensional magnetic resonance (MR) image of over 10 million voxels. Recently, it was shown that a sparse, incremental version of the EM algorithm could improve its rate of convergence. In this paper, we show how this modified EM algorithm can be speeded up further by adopting a multiresolution kd-tree structure in performing the E-step. The proposed algorithm outperforms some other variants of the EM algorithm for segmenting MR images of the human brain. (C) 2004 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
Resumo:
The expectation-maximization (EM) algorithm has been of considerable interest in recent years as the basis for various algorithms in application areas of neural networks such as pattern recognition. However, there exists some misconceptions concerning its application to neural networks. In this paper, we clarify these misconceptions and consider how the EM algorithm can be adopted to train multilayer perceptron (MLP) and mixture of experts (ME) networks in applications to multiclass classification. We identify some situations where the application of the EM algorithm to train MLP networks may be of limited value and discuss some ways of handling the difficulties. For ME networks, it is reported in the literature that networks trained by the EM algorithm using iteratively reweighted least squares (IRLS) algorithm in the inner loop of the M-step, often performed poorly in multiclass classification. However, we found that the convergence of the IRLS algorithm is stable and that the log likelihood is monotonic increasing when a learning rate smaller than one is adopted. Also, we propose the use of an expectation-conditional maximization (ECM) algorithm to train ME networks. Its performance is demonstrated to be superior to the IRLS algorithm on some simulated and real data sets.
Resumo:
Cluster analysis via a finite mixture model approach is considered. With this approach to clustering, the data can be partitioned into a specified number of clusters g by first fitting a mixture model with g components. An outright clustering of the data is then obtained by assigning an observation to the component to which it has the highest estimated posterior probability of belonging; that is, the ith cluster consists of those observations assigned to the ith component (i = 1,..., g). The focus is on the use of mixtures of normal components for the cluster analysis of data that can be regarded as being continuous. But attention is also given to the case of mixed data, where the observations consist of both continuous and discrete variables.
Resumo:
Factorial experiments with spatially arranged units occur in many situations, particularly in agricultural field trials. The design of such experiments when observations are spatially correlated is investigated in this paper. We show that having a large number of within-factor level changes in rows and columns is important for efficient and robust designs, and demonstrate how designs with these properties can be constructed. (C) 2003 Elsevier B.V. All rights reserved.
Resumo:
The paper presents a framework for small area population estimation that enables users to select a method that is fit for the purpose. The adjustments to input data that are needed before use are outlined, with emphasis on developing consistent time series of inputs. We show how geographical harmonization of small areas, which is crucial to comparisons over time, can be achieved. For two study regions, the East of England and Yorkshire and the Humber, the differences in output and consequences of adopting different methods are illustrated. The paper concludes with a discussion of how data, on stream since 1998, might be included in future small area estimates.
Resumo:
Progress in bean breeding programs requires the exploitation of genetic variation that is present among races or through introgression across gene pools of Phaseolus vulgaris L. Of the two major common bean gene pools, the Andean gene pool seems to have a narrow genetic base, with about 10% of the accessions in the CIAT core collection presenting evidence of introgression. The objective of this study was to quantify the degree of spontaneous introgression in a sample of common bean landraces from the Andean gene pool. The effects of introgression on morphological, economic and nutritional attributes were also investigated. Homogeneity analysis was performed on molecular marker data from 426 Andean-type accessions from the primary centres of origin of the CIAT common bean core collection and two check varieties. Quantitative attribute diversity for 15 traits was studied based on the groups found from the cluster analysis of marker prevalence indices computed for each accession. The two-group summary consisted of one group of 58 accessions (14%) with low prevalence indices and another group of 370 accessions (86%) with high prevalence indices. The smaller group occupied the outlying area of points displayed from homogeneity analysis, yet their geographic origin was widely distributed over the Andean region. This group was regarded as introgressed, since its accessions displayed traits that are associated with the Middle American gene pool: high resistance to Andean disease isolates but low resistance to Middle American disease isolates, low seed weight and high scores for all nutrient elements. Genotypes generated by spontaneous introgression can be helpful for breeders to overcome the difficulties in transferring traits between gene pools.
Resumo:
In broader catchment scale investigations, there is a need to understand and ultimately exploit the spatial variation of agricultural crops for an improved economic return. In many instances, this spatial variation is temporally unstable and may be different for various crop attributes and crop species. In the Australian sugar industry, the opportunity arose to evaluate the performance of 231 farms in the Tully Mill area in far north Queensland using production information on cane yield (t/ha) and CCS ( a fresh weight measure of sucrose content in the cane) accumulated over a 12-year period. Such an arrangement of data can be expressed as a 3-way array where a farm x attribute x year matrix can be evaluated and interactions considered. Two multivariate techniques, the 3-way mixture method of clustering and the 3-mode principal component analysis, were employed to identify meaningful relationships between farms that performed similarly for both cane yield and CCS. In this context, farm has a spatial component and the aim of this analysis was to determine if systematic patterns in farm performance expressed by cane yield and CCS persisted over time. There was no spatial relationship between cane yield and CCS. However, the analysis revealed that the relationship between farms was remarkably stable from one year to the next for both attributes and there was some spatial aggregation of farm performance in parts of the mill area. This finding is important, since temporally consistent spatial variation may be exploited to improve regional production. Alternatively, the putative causes of the spatial variation may be explored to enhance the understanding of sugarcane production in the wet tropics of Australia.
Resumo:
When studying genotype X environment interaction in multi-environment trials, plant breeders and geneticists often consider one of the effects, environments or genotypes, to be fixed and the other to be random. However, there are two main formulations for variance component estimation for the mixed model situation, referred to as the unconstrained-parameters (UP) and constrained-parameters (CP) formulations. These formulations give different estimates of genetic correlation and heritability as well as different tests of significance for the random effects factor. The definition of main effects and interactions and the consequences of such definitions should be clearly understood, and the selected formulation should be consistent for both fixed and random effects. A discussion of the practical outcomes of using the two formulations in the analysis of balanced data from multi-environment trials is presented. It is recommended that the CP formulation be used because of the meaning of its parameters and the corresponding variance components. When managed (fixed) environments are considered, users will have more confidence in prediction for them but will not be overconfident in prediction in the target (random) environments. Genetic gain (predicted response to selection in the target environments from the managed environments) is independent of formulation.