90 results for Applied Statistics
Finite mixture regression model with random effects: application to neonatal hospital length of stay
Abstract:
A two-component mixture regression model that allows simultaneously for heterogeneity and dependency among observations is proposed. By specifying random effects explicitly in the linear predictor of the mixture probability and the mixture components, parameter estimation is achieved by maximising the corresponding best linear unbiased prediction type log-likelihood. Approximate residual maximum likelihood estimates are obtained via an EM algorithm in the manner of a generalised linear mixed model (GLMM). The method can be extended to a g-component mixture regression model with the component density from the exponential family, leading to the development of the class of finite mixture GLMMs. For illustration, the method is applied to analyse neonatal length of stay (LOS). It is shown that identification of pertinent factors that influence hospital LOS can provide important information for health care planning and resource allocation. (C) 2002 Elsevier Science B.V. All rights reserved.
Abstract:
The modelling of inpatient length of stay (LOS) has important implications in health care studies. Finite mixture distributions are usually used to model the heterogeneous LOS distribution, because a certain proportion of patients sustain a longer stay. However, because the morbidity data are collected from hospitals, observations clustered within the same hospital are often correlated. The generalized linear mixed model approach is adopted to accommodate the inherent correlation via unobservable random effects. An EM algorithm is developed to obtain residual maximum quasi-likelihood estimates. The proposed hierarchical mixture regression approach enables the identification and assessment of factors influencing the long-stay proportion and the LOS for the long-stay patient subgroup. A neonatal LOS data set is used for illustration. (C) 2003 Elsevier Science Ltd. All rights reserved.
Abstract:
Let K(r, s, t) denote the complete tripartite graph with partite sets of size r, s and t, where r ≤ s ≤ t. Let D be the graph consisting of a triangle with an edge attached. We show that K(r, s, t) may be decomposed into copies of D if and only if 4 divides rs + st + rt and t ≤ 3rs/(r + s).
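The stated condition is easy to check numerically. The following is a minimal sketch, not taken from the paper: the function name `decomposes_into_d` is hypothetical, and the inequality is rewritten as t(r + s) ≤ 3rs so that only integer arithmetic is needed. Since D has four edges, the divisibility condition simply says that the edge count rs + st + rt of K(r, s, t) is a multiple of 4.

```python
def decomposes_into_d(r, s, t):
    """Check the condition quoted in the abstract for K(r, s, t).

    Returns True when 4 divides rs + st + rt and t <= 3rs/(r + s).
    The function name is illustrative; it does not come from the paper.
    """
    if not (1 <= r <= s <= t):
        raise ValueError("expected positive integers with r <= s <= t")
    edge_count = r * s + s * t + r * t
    # t <= 3rs/(r + s) rewritten without division to stay in integers.
    size_bound_holds = t * (r + s) <= 3 * r * s
    return edge_count % 4 == 0 and size_bound_holds
```

For example, K(2, 2, 2) has 12 edges and satisfies the bound, whereas K(2, 2, 4) has 20 edges but violates the bound because 4 · 4 > 3 · 4.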
Abstract:
In this note we first introduce balanced critical sets and near balanced critical sets in Latin squares. Then we prove that there exist balanced critical sets in the back circulant Latin squares of order 3n for n even. Using this result we decompose the back circulant Latin squares of order 3n, n even, into three isotopic and disjoint balanced critical sets each of size 3n. We also find near balanced critical sets in the back circulant Latin squares of order 3n for n odd. Finally, we examine representatives of each main class of Latin squares of order up to six in order to determine which main classes contain balanced or near balanced critical sets.
Abstract:
Australian sugar-producing regions have differed in terms of the extent and rate of incorporation of new technology into harvesting systems. The Mackay sugar industry has lagged behind most other sugar-producing regions in this regard. The reasons for this are addressed by invoking an evolutionary economics perspective. The development of harvesting systems, and the role of technology in shaping them, is mapped and interpreted using the concept of path dependency. Key events in the evolution of harvesting systems are identified, which show how the past has shaped the regional development of harvesting systems. From an evolutionary economics perspective, the outcomes observed are the end result of a specific history.
Abstract:
We consider the problem of assessing the number of clusters in a limited number of tissue samples containing gene expressions for possibly several thousands of genes. It is proposed to use a normal mixture model-based approach to the clustering of the tissue samples. One advantage of this approach is that the question of the number of clusters in the data can be formulated in terms of a test on the smallest number of components in the mixture model compatible with the data. This test can be carried out on the basis of the likelihood ratio test statistic, using resampling to assess its null distribution. The effectiveness of this approach is demonstrated on simulated data and on some microarray datasets, as considered previously in the bioinformatics literature. (C) 2004 Elsevier Inc. All rights reserved.
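The test described above can be written out in standard notation. The abstract gives no formulas, so the symbols below are assumptions: g_0 and g_1 are the numbers of components under the null and alternative hypotheses, \hat{\Psi}_g is the maximum likelihood estimate for the g-component model, and B is the number of resampled data sets generated from the fitted g_0-component model.

```latex
% Likelihood ratio test statistic for H_0: g = g_0 versus H_1: g = g_1 (g_1 > g_0)
-2 \log \lambda = 2 \left\{ \log L(\hat{\Psi}_{g_1}) - \log L(\hat{\Psi}_{g_0}) \right\}

% One common resampling-based P-value, where -2\log\lambda^{(b)} is the statistic
% recomputed on the b-th sample generated under the fitted null model
P = \frac{1 + \#\left\{ b : -2\log\lambda^{(b)} \ge -2\log\lambda \right\}}{B + 1}
```

Resampling is used because the usual chi-squared approximation for the null distribution of the statistic does not hold when testing the number of mixture components.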
Abstract:
Mixture models implemented via the expectation-maximization (EM) algorithm are being increasingly used in a wide range of problems in pattern recognition such as image segmentation. However, the EM algorithm requires considerable computational time in its application to huge data sets such as a three-dimensional magnetic resonance (MR) image of over 10 million voxels. Recently, it was shown that a sparse, incremental version of the EM algorithm could improve its rate of convergence. In this paper, we show how this modified EM algorithm can be sped up further by adopting a multiresolution kd-tree structure in performing the E-step. The proposed algorithm outperforms some other variants of the EM algorithm for segmenting MR images of the human brain. (C) 2004 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
Abstract:
The expectation-maximization (EM) algorithm has been of considerable interest in recent years as the basis for various algorithms in application areas of neural networks such as pattern recognition. However, there exist some misconceptions concerning its application to neural networks. In this paper, we clarify these misconceptions and consider how the EM algorithm can be adopted to train multilayer perceptron (MLP) and mixture of experts (ME) networks in applications to multiclass classification. We identify some situations where the application of the EM algorithm to train MLP networks may be of limited value and discuss some ways of handling the difficulties. For ME networks, it is reported in the literature that networks trained by the EM algorithm using the iteratively reweighted least squares (IRLS) algorithm in the inner loop of the M-step often performed poorly in multiclass classification. However, we found that the convergence of the IRLS algorithm is stable and that the log likelihood is monotonically increasing when a learning rate smaller than one is adopted. Also, we propose the use of an expectation-conditional maximization (ECM) algorithm to train ME networks. Its performance is demonstrated to be superior to the IRLS algorithm on some simulated and real data sets.
Abstract:
Cluster analysis via a finite mixture model approach is considered. With this approach to clustering, the data can be partitioned into a specified number of clusters g by first fitting a mixture model with g components. An outright clustering of the data is then obtained by assigning an observation to the component to which it has the highest estimated posterior probability of belonging; that is, the ith cluster consists of those observations assigned to the ith component (i = 1,..., g). The focus is on the use of mixtures of normal components for the cluster analysis of data that can be regarded as being continuous. But attention is also given to the case of mixed data, where the observations consist of both continuous and discrete variables.
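The assignment rule described above can be illustrated with a small example. This is a minimal sketch under stated assumptions, not the paper's implementation: it is restricted to g = 2 univariate normal components, uses a plain EM fit with a naive initialisation (component means at the sample minimum and maximum), and all function names are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def posterior_probs(data, props, means, variances):
    """E-step: posterior probability of each component for each observation."""
    post = []
    for x in data:
        weighted = [p * normal_pdf(x, m, v) for p, m, v in zip(props, means, variances)]
        total = sum(weighted)
        post.append([w / total for w in weighted])
    return post

def fit_two_component_mixture(data, n_iter=100, var_floor=1e-6):
    """Fit a two-component normal mixture by EM (illustrative initialisation)."""
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in data) / n, var_floor)
    props = [0.5, 0.5]
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]
    for _ in range(n_iter):
        post = posterior_probs(data, props, means, variances)
        # M-step: update proportions, means and variances from the posteriors.
        for i in range(2):
            n_i = sum(p[i] for p in post)
            props[i] = n_i / n
            means[i] = sum(p[i] * x for p, x in zip(post, data)) / n_i
            weighted_ss = sum(p[i] * (x - means[i]) ** 2 for p, x in zip(post, data))
            variances[i] = max(weighted_ss / n_i, var_floor)
    return props, means, variances

def assign_clusters(data, props, means, variances):
    """Outright clustering: pick the component with the highest posterior."""
    post = posterior_probs(data, props, means, variances)
    return [max(range(len(p)), key=lambda i: p[i]) for p in post]
```

With two well-separated groups such as values near 0 and values near 5, the fitted components settle on the two groups and `assign_clusters` returns one label per observation. The variance floor is a pragmatic guard against degenerate components and is a choice of this sketch only.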
Abstract:
Factorial experiments with spatially arranged units occur in many situations, particularly in agricultural field trials. The design of such experiments when observations are spatially correlated is investigated in this paper. We show that having a large number of within-factor level changes in rows and columns is important for efficient and robust designs, and demonstrate how designs with these properties can be constructed. (C) 2003 Elsevier B.V. All rights reserved.
Abstract:
A generic method for the estimation of parameters for Stochastic Ordinary Differential Equations (SODEs) is introduced and developed. This algorithm, called the GePERs method, utilises a genetic optimisation algorithm to minimise a stochastic objective function based on the Kolmogorov-Smirnov (KS) statistic. Numerical simulations are utilised to form the KS statistic. Some of the factors that improve the precision of the estimates are also examined. This method is used to estimate parameters of diffusion equations and jump-diffusion equations. It is also applied to the problem of model selection for the Queensland electricity market. (C) 2003 Elsevier B.V. All rights reserved.
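A simulation-based KS objective of the kind described can be sketched as follows. This is an illustration only and not the GePERs method itself: the function names, the `simulate(theta, rng)` callback and the `n_sim` and `seed` parameters are assumptions, and the genetic optimiser that would minimise the objective is not shown.

```python
from bisect import bisect_right
import random

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov distance: max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    distance = 0.0
    # The maximum gap between two empirical CDFs occurs at a data point.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance

def ks_objective(theta, observed, simulate, n_sim=500, seed=None):
    """Stochastic objective: KS distance between observed and simulated data.

    `simulate(theta, rng)` is a hypothetical user-supplied function that
    returns one simulated observation for the candidate parameter `theta`.
    """
    rng = random.Random(seed)
    simulated = [simulate(theta, rng) for _ in range(n_sim)]
    return ks_statistic(observed, simulated)
```

Because the simulated sample changes from call to call unless `seed` is fixed, the objective is noisy, which is one reason a derivative-free optimiser is a natural fit for this kind of problem.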
Abstract:
The paper presents a framework for small area population estimation that enables users to select a method that is fit for the purpose. The adjustments to input data that are needed before use are outlined, with emphasis on developing consistent time series of inputs. We show how geographical harmonization of small areas, which is crucial to comparisons over time, can be achieved. For two study regions, the East of England and Yorkshire and the Humber, the differences in output and consequences of adopting different methods are illustrated. The paper concludes with a discussion of how data, on stream since 1998, might be included in future small area estimates.
Abstract:
Progress in bean breeding programs requires the exploitation of genetic variation that is present among races or through introgression across gene pools of Phaseolus vulgaris L. Of the two major common bean gene pools, the Andean gene pool seems to have a narrow genetic base, with about 10% of the accessions in the CIAT core collection presenting evidence of introgression. The objective of this study was to quantify the degree of spontaneous introgression in a sample of common bean landraces from the Andean gene pool. The effects of introgression on morphological, economic and nutritional attributes were also investigated. Homogeneity analysis was performed on molecular marker data from 426 Andean-type accessions from the primary centres of origin of the CIAT common bean core collection and two check varieties. Quantitative attribute diversity for 15 traits was studied based on the groups found from the cluster analysis of marker prevalence indices computed for each accession. The two-group summary consisted of one group of 58 accessions (14%) with low prevalence indices and another group of 370 accessions (86%) with high prevalence indices. The smaller group occupied the outlying area of points displayed from homogeneity analysis, yet their geographic origin was widely distributed over the Andean region. This group was regarded as introgressed, since its accessions displayed traits that are associated with the Middle American gene pool: high resistance to Andean disease isolates but low resistance to Middle American disease isolates, low seed weight and high scores for all nutrient elements. Genotypes generated by spontaneous introgression can be helpful for breeders to overcome the difficulties in transferring traits between gene pools.