Digital Library

90 results for Applied Statistics

Robust cluster analysis via mixture models

Relevance: 60.00%

Using intervention time series analyses to assess the effects of imperfectly identifiable natural events: A general method and example

Relevance: 60.00%

Abstract:

BACKGROUND: Intervention time series analysis (ITSA) is an important method for analysing the effect of sudden events on time series data. ITSA methods are quasi-experimental in nature and the validity of modelling with these methods depends upon assumptions about the timing of the intervention and the response of the process to it. METHOD: This paper describes how to apply ITSA to analyse the impact of unplanned events on time series when the timing of the event is not accurately known, and so the problems of ITSA methods are magnified by uncertainty in the point of onset of the unplanned intervention. RESULTS: The methods are illustrated using the example of the Australian Heroin Shortage of 2001, which provided an opportunity to study the health and social consequences of an abrupt change in heroin availability in an environment of widespread harm reduction measures. CONCLUSION: Application of these methods enables valuable insights about the consequences of unplanned and poorly identified interventions while minimising the risk of spurious results.
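One way to picture the onset-uncertainty problem is to scan a window of plausible onset times and compare how well a simple level-shift model fits at each one. The sketch below is only an illustration under strong assumptions: it uses a plain before/after mean model with no autocorrelation terms, which is not the paper's ITSA procedure, and the function names and the `counts` series are hypothetical.

```python
def level_shift_rss(series, onset):
    """Residual sum of squares for a step model that changes level at `onset`."""
    rss = 0.0
    for segment in (series[:onset], series[onset:]):
        mean = sum(segment) / len(segment)
        rss += sum((value - mean) ** 2 for value in segment)
    return rss


def scan_onsets(series, candidates):
    """Return (best_onset, rss_by_onset) over a window of plausible onset times."""
    rss_by_onset = {k: level_shift_rss(series, k) for k in candidates}
    best = min(rss_by_onset, key=rss_by_onset.get)
    return best, rss_by_onset


# Hypothetical monthly counts whose level drops from index 6 onwards.
counts = [50, 52, 49, 51, 50, 53, 30, 31, 29, 32, 30, 28]
best_onset, rss = scan_onsets(counts, candidates=range(4, 9))
print(best_onset, rss)
```

Reporting the fit across the whole candidate window, rather than only the best onset, makes it easier to see how sensitive a conclusion is to the assumed timing.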

The availability of nitrogen from sugarcane trash on contrasting soils in the wet tropics of North Queensland

Relevance: 60.00%

Abstract:

Sugarcane crop residues ('trash') have the potential to supply nitrogen (N) to crops when they are retained on the soil surface after harvest. Farmers should account for the contribution of this N to crop requirements in order to avoid over-fertilisation. In very wet tropical locations, the climate may increase the rate of trash decomposition as well as the amount of N lost from the soil-plant system due to leaching or denitrification. A field experiment was conducted on Hydrosol and Ferrosol soils in the wet tropics of northern Australia using N-15-labelled trash either applied to the soil surface or incorporated. Labelled urea fertiliser was also applied with unlabelled surface trash. The objective of the experiment was to investigate the contribution of trash to crop N nutrition in wet tropical climates, the timing of N mineralisation from trash, and the retention of trash N in contrasting soils. Less than 6% of the N in trash was recovered in the first crop and the recovery was not affected by trash incorporation. Around 6% of the N in fertiliser was also recovered in the first crop, which was less than previously measured in temperate areas (20-40%). Leaf samples taken at the end of the second crop contained 2-3% of N from trash and fertiliser applied at the beginning of the experiment. Although most N was recovered in the 0-1.5 m soil layer, there was some evidence of movement of N below this depth. The results showed that trash supplies N slowly and in small amounts to the succeeding crop in wet tropics sugarcane-growing areas regardless of trash placement (on the soil surface or incorporated) or soil type, and so N mineralisation from a single trash blanket is not important for sugarcane production in the wet tropics.

Linking gene-expression experiments with survival-time data

Relevance: 60.00%

On the simultaneous use of clinical and microarray expression data in the cluster analysis of tissue samples

Relevance: 60.00%

Abstract:

This paper considers a model-based approach to the clustering of tissue samples on the basis of a very large number of genes from microarray experiments. It is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. Frequently in practice, there are also clinical data available on those cases from which the tissue samples have been obtained. Here we investigate how to use the clinical data in conjunction with the microarray gene expression data to cluster the tissue samples. We propose two mixture model-based approaches in which the number of components in the mixture model corresponds to the number of clusters to be imposed on the tissue samples. One approach specifies the components of the mixture model to be the conditional distributions of the microarray data given the clinical data, with the mixing proportions also conditioned on the latter data. The other takes the components of the mixture model to represent the joint distributions of the clinical and microarray data. The approaches are demonstrated on some breast cancer data, as studied recently in van't Veer et al. (2002).

Grouping three-mode data with mixture methods: the case of the diseased blue crabs

Relevance: 60.00%

Analysing environmental data from the Great Barrier Reef using nonlinear principal component analysis

Relevance: 60.00%

Methods for Categorical Longitudinal Survey Data: Understanding Employment Status of Australian Women

Relevance: 60.00%

Abstract:

Many variables that are of interest in social science research are nominal variables with two or more categories, such as employment status, occupation, political preference, or self-reported health status. With longitudinal survey data it is possible to analyse the transitions of individuals between, for example, different employment states or occupations. In the statistical literature, models for analysing categorical dependent variables with repeated observations belong to the family of models known as generalized linear mixed models (GLMMs). The specific GLMM for a dependent variable with three or more categories is the multinomial logit random effects model. For these models, the marginal distribution of the response does not have a closed-form solution, and hence numerical integration must be used to obtain maximum likelihood estimates for the model parameters. Techniques for implementing the numerical integration are available but are computationally intensive, requiring a large amount of computer processing time that increases with the number of clusters (or individuals) in the data, and they are not always readily accessible to the practitioner in standard software. For the purposes of analysing categorical response data from a longitudinal social survey, there is clearly a need to evaluate the existing procedures for estimating multinomial logit random effects models in terms of accuracy, efficiency and computing time. The computing time will have significant implications for which approach researchers prefer. In this paper we evaluate statistical software procedures that utilise adaptive Gaussian quadrature and MCMC methods, with specific application to modelling the employment status of women using a GLMM, over three waves of the HILDA survey.
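The reason the marginal distribution has no closed form can be seen in a small sketch. Given an individual's random effects, the category probabilities are a simple softmax, but the marginal probabilities require averaging that softmax over the random-effects distribution. The code below assumes independent normal random intercepts and uses plain Monte Carlo averaging purely for illustration; it is not the adaptive Gaussian quadrature or MCMC procedures evaluated in the paper, and all names and numbers are hypothetical.

```python
import math
import random


def category_probabilities(fixed_part, random_effects):
    """Multinomial logit probabilities, with category 0 as the reference.

    fixed_part[k] and random_effects[k] are the fixed linear predictor and the
    individual-specific random intercept for non-reference category k + 1.
    """
    etas = [0.0] + [f + u for f, u in zip(fixed_part, random_effects)]
    peak = max(etas)  # subtract the maximum for numerical stability
    weights = [math.exp(eta - peak) for eta in etas]
    total = sum(weights)
    return [w / total for w in weights]


def marginal_probabilities(fixed_part, sd, draws=5000, seed=0):
    """Approximate marginal probabilities by averaging over N(0, sd^2) intercepts."""
    rng = random.Random(seed)
    totals = [0.0] * (len(fixed_part) + 1)
    for _ in range(draws):
        effects = [rng.gauss(0.0, sd) for _ in fixed_part]
        for k, p in enumerate(category_probabilities(fixed_part, effects)):
            totals[k] += p
    return [t / draws for t in totals]


conditional = category_probabilities([0.4, -0.2], [0.0, 0.0])
marginal = marginal_probabilities([0.4, -0.2], sd=1.5)
print(conditional, marginal)
```

The integral has to be evaluated for every individual at every likelihood evaluation, which is consistent with the abstract's remark that computing time grows with the number of individuals.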

Mixture Model-based Statistical Pattern Recognition of Clustered or Longitudinal Data

Relevance: 60.00%

The significance of the competition between rural living and cane growing

Relevance: 60.00%

Issues of robustness and high dimensionality in cluster analysis

Relevance: 60.00%

Abstract:

Finite mixture models are being increasingly used to model the distributions of a wide variety of random phenomena. While normal mixture models are often used to cluster data sets of continuous multivariate data, a more robust clustering can be obtained by considering the t mixture model-based approach. Mixtures of factor analyzers enable model-based density estimation to be undertaken for high-dimensional data where the number of observations n is not very large relative to their dimension p. As the approach using the multivariate normal family of distributions is sensitive to outliers, it is more robust to adopt the multivariate t family for the component error and factor distributions. The computational aspects associated with robustness and high dimensionality in these approaches to cluster analysis are discussed and illustrated.
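The robustness of the t family comes from the way its EM updates down-weight observations that lie far from a component's centre: each point receives the weight (nu + p) / (nu + squared Mahalanobis distance). The sketch below shows this for a single univariate component (p = 1) with the scale and the degrees of freedom nu held fixed, which is a simplifying assumption; a full t mixture would also update the scale, the mixing proportions and possibly nu. The data and function names are hypothetical.

```python
def t_weights(values, location, scale, nu):
    """EM weights for a univariate t component: (nu + 1) / (nu + squared distance)."""
    return [(nu + 1.0) / (nu + ((v - location) / scale) ** 2) for v in values]


def robust_location(values, scale=1.0, nu=4.0, iterations=50):
    """Iteratively reweighted location estimate under a t model with fixed scale."""
    location = sum(values) / len(values)  # start from the ordinary sample mean
    for _ in range(iterations):
        weights = t_weights(values, location, scale, nu)
        location = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return location


# Five observations near 10 and one gross outlier.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 25.0]
sample_mean = sum(data) / len(data)
t_location = robust_location(data)
print(sample_mean, t_location)
```

The sample mean, which is the normal-theory estimate, is pulled towards the outlier, while the t-based estimate stays close to the bulk of the data.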

Multilevel modelling for inference of genetic regulatory networks

Relevance: 60.00%

Abstract:

Time-course experiments with microarrays are often used to study dynamic biological systems and genetic regulatory networks (GRNs) that model how genes influence each other in cell-level development of organisms. The inference for GRNs provides important insights into the fundamental biological processes such as growth and is useful in disease diagnosis and genomic drug design. Due to the experimental design, multilevel data hierarchies are often present in time-course gene expression data. Most existing methods, however, ignore the dependency of the expression measurements over time and the correlation among gene expression profiles. Such independence assumptions violate regulatory interactions and can result in overlooking certain important subject effects and lead to spurious inference for regulatory networks or mechanisms. In this paper, a multilevel mixed-effects model is adopted to incorporate data hierarchies in the analysis of time-course data, where temporal and subject effects are both assumed to be random. The method starts with the clustering of genes by fitting the mixture model within the multilevel random-effects model framework using the expectation-maximization (EM) algorithm. The network of regulatory interactions is then determined by searching for regulatory control elements (activators and inhibitors) shared by the clusters of co-expressed genes, based on a time-lagged correlation coefficients measurement. The method is applied to two real time-course datasets from the budding yeast (Saccharomyces cerevisiae) genome. It is shown that the proposed method provides clusters of cell-cycle regulated genes that are supported by existing gene function annotations, and hence enables inference on regulatory interactions for the genetic network.
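A time-lagged correlation coefficient can be sketched as the Pearson correlation between a candidate regulator's profile at time t and a target profile at time t + lag. The code below is a minimal illustration of that idea only; the exact measure, lag selection and thresholds used in the paper are not given in the abstract, and the profiles and function names here are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def lagged_correlation(regulator, target, lag):
    """Correlation between the regulator at time t and the target at time t + lag."""
    if lag < 0 or lag >= len(regulator) - 1:
        raise ValueError("lag must leave at least two overlapping time points")
    if lag == 0:
        return pearson(regulator, target)
    return pearson(regulator[:-lag], target[lag:])


# Hypothetical profiles: the target repeats the regulator one time step later.
regulator = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
target = [-1.0, 0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0]
by_lag = {lag: lagged_correlation(regulator, target, lag) for lag in range(3)}
print(by_lag)
```

Under this reading, a strongly positive lagged correlation would suggest an activator and a strongly negative one an inhibitor.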
