998 resultados para 230204 Applied Statistics
Resumo:
When the data consist of certain attributes measured on the same set of items in different situations, they would be described as a three-mode three-way array. A mixture likelihood approach can be implemented to cluster the items (i.e., one of the modes) on the basis of both of the other modes simultaneously (i.e,, the attributes measured in different situations). In this paper, it is shown that this approach can be extended to handle three-mode three-way arrays where some of the data values are missing at random in the sense of Little and Rubin (1987). The methodology is illustrated by clustering the genotypes in a three-way soybean data set where various attributes were measured on genotypes grown in several environments.
Resumo:
1. There are a variety of methods that could be used to increase the efficiency of the design of experiments. However, it is only recently that such methods have been considered in the design of clinical pharmacology trials. 2. Two such methods, termed data-dependent (e.g. simulation) and data-independent (e.g. analytical evaluation of the information in a particular design), are becoming increasingly used as efficient methods for designing clinical trials. These two design methods have tended to be viewed as competitive, although a complementary role in design is proposed here. 3. The impetus for the use of these two methods has been the need for a more fully integrated approach to the drug development process that specifically allows for sequential development (i.e. where the results of early phase studies influence later-phase studies). 4. The present article briefly presents the background and theory that underpins both the data-dependent and -independent methods with the use of illustrative examples from the literature. In addition, the potential advantages and disadvantages of each method are discussed.
Resumo:
In this study we present a novel automated strategy for predicting infarct evolution, based on MR diffusion and perfusion images acquired in the acute stage of stroke. The validity of this methodology was tested on novel patient data including data acquired from an independent stroke clinic. Regions-of-interest (ROIs) defining the initial diffusion lesion and tissue with abnormal hemodynamic function as defined by the mean transit time (MTT) abnormality were automatically extracted from DWI/PI maps. Quantitative measures of cerebral blood flow (CBF) and volume (CBV) along with ratio measures defined relative to the contralateral hemisphere (r(a)CBF and r(a)CBV) were calculated for the MTT ROIs. A parametric normal classifier algorithm incorporating these measures was used to predict infarct growth. The mean r(a)CBF and r(a)CBV values for eventually infarcted MTT tissue were 0.70 +/-0.19 and 1.20 +/-0.36. For recovered tissue the mean values were 0.99 +/-0.25 and 1.87 +/-0.71, respectively. There was a significant difference between these two regions for both measures (P
Resumo:
We present the conditional quantum dynamics of an electron tunneling between two quantum dots subject to a measurement using a low transparency point contact or tunnel junction. The double dot system forms a single qubit and the measurement corresponds to a continuous in time readout of the occupancy of the quantum dot. We illustrate the difference between conditional and unconditional dynamics of the qubit. The conditional dynamics is discussed in two regimes depending on the rate of tunneling through the point contact: quantum jumps, in which individual electron tunneling current events can be distinguished, and a diffusive dynamics in which individual events are ignored, and the time-averaged current is considered as a continuous diffusive variable. We include the effect of inefficient measurement and the influence of the relative phase between the two tunneling amplitudes of the double dot/point contact system.
Resumo:
In many occupational safety interventions, the objective is to reduce the injury incidence as well as the mean claims cost once injury has occurred. The claims cost data within a period typically contain a large proportion of zero observations (no claim). The distribution thus comprises a point mass at 0 mixed with a non-degenerate parametric component. Essentially, the likelihood function can be factorized into two orthogonal components. These two components relate respectively to the effect of covariates on the incidence of claims and the magnitude of claims, given that claims are made. Furthermore, the longitudinal nature of the intervention inherently imposes some correlation among the observations. This paper introduces a zero-augmented gamma random effects model for analysing longitudinal data with many zeros. Adopting the generalized linear mixed model (GLMM) approach reduces the original problem to the fitting of two independent GLMMs. The method is applied to evaluate the effectiveness of a workplace risk assessment teams program, trialled within the cleaning services of a Western Australian public hospital.
Resumo:
Binning and truncation of data are common in data analysis and machine learning. This paper addresses the problem of fitting mixture densities to multivariate binned and truncated data. The EM approach proposed by McLachlan and Jones (Biometrics, 44: 2, 571-578, 1988) for the univariate case is generalized to multivariate measurements. The multivariate solution requires the evaluation of multidimensional integrals over each bin at each iteration of the EM procedure. Naive implementation of the procedure can lead to computationally inefficient results. To reduce the computational cost a number of straightforward numerical techniques are proposed. Results on simulated data indicate that the proposed methods can achieve significant computational gains with no loss in the accuracy of the final parameter estimates. Furthermore, experimental results suggest that with a sufficient number of bins and data points it is possible to estimate the true underlying density almost as well as if the data were not binned. The paper concludes with a brief description of an application of this approach to diagnosis of iron deficiency anemia, in the context of binned and truncated bivariate measurements of volume and hemoglobin concentration from an individual's red blood cells.
Resumo:
Motivation: This paper introduces the software EMMIX-GENE that has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic used in conjunction with a threshold on the size of a cluster allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so the use of mixtures of factor analyzers is exploited to reduce effectively the dimension of the feature space of genes. Results: The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes are able to be selected that reveal interesting clusterings of the tissues that are either consistent with the external classification of the tissues or with background and biological knowledge of these sets.
Resumo:
This paper studies the role of income distribution and technology transfer in the process of economic development. A novel aspect of the model is that the composition of human capital as well as the level affect economic growth. Utilizing an overlapping-generations model in which income distribution changes endogenously, we present an economic explanation for why some countries could not start modern economic growth; why some countries took off but have apparently stopped growing after some time; and why some countries have successfully developed and continue to grow
Resumo:
In this paper, we consider testing for additivity in a class of nonparametric stochastic regression models. Two test statistics are constructed and their asymptotic distributions are established. We also conduct a small sample study for one of the test statistics through a simulated example. (C) 2002 Elsevier Science (USA).
Resumo:
This article develops a weighted least squares version of Levene's test of homogeneity of variance for a general design, available both for univariate and multivariate situations. When the design is balanced, the univariate and two common multivariate test statistics turn out to be proportional to the corresponding ordinary least squares test statistics obtained from an analysis of variance of the absolute values of the standardized mean-based residuals from the original analysis of the data. The constant of proportionality is simply a design-dependent multiplier (which does not necessarily tend to unity). Explicit results are presented for randomized block and Latin square designs and are illustrated for factorial treatment designs and split-plot experiments. The distribution of the univariate test statistic is close to a standard F-distribution, although it can be slightly underdispersed. For a complex design, the test assesses homogeneity of variance across blocks, treatments, or treatment factors and offers an objective interpretation of residual plot.
Resumo:
We focus on mixtures of factor analyzers from the perspective of a method for model-based density estimation from high-dimensional data, and hence for the clustering of such data. This approach enables a normal mixture model to be fitted to a sample of n data points of dimension p, where p is large relative to n. The number of free parameters is controlled through the dimension of the latent factor space. By working in this reduced space, it allows a model for each component-covariance matrix with complexity lying between that of the isotropic and full covariance structure models. We shall illustrate the use of mixtures of factor analyzers in a practical example that considers the clustering of cell lines on the basis of gene expressions from microarray experiments. (C) 2002 Elsevier Science B.V. All rights reserved.