35 results for Statistics - Analysis
Abstract:
Cluster analysis via a finite mixture model approach is considered. With this approach to clustering, the data can be partitioned into a specified number of clusters g by first fitting a mixture model with g components. An outright clustering of the data is then obtained by assigning an observation to the component to which it has the highest estimated posterior probability of belonging; that is, the ith cluster consists of those observations assigned to the ith component (i = 1,..., g). The focus is on the use of mixtures of normal components for the cluster analysis of data that can be regarded as being continuous. But attention is also given to the case of mixed data, where the observations consist of both continuous and discrete variables.
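As an illustration of the assignment rule described above, here is a minimal sketch using scikit-learn's GaussianMixture; the tool choice, the data, and g = 2 are our own illustrative assumptions, not part of the original analysis.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical continuous data: two groups with different means.
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(4.0, 1.0, size=(50, 2))])

g = 2  # specified number of clusters / mixture components
model = GaussianMixture(n_components=g, covariance_type="full").fit(X)

# Estimated posterior probabilities of component membership.
tau = model.predict_proba(X)

# Outright clustering: assign each observation to the component with
# the highest estimated posterior probability of belonging.
labels = tau.argmax(axis=1)  # equivalent to model.predict(X)
```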
Abstract:
Previous research has reported both agreements and serious anomalies in relationships between production attributes of sugarcane varieties in variety trials (VTs) and commercial production (CP). This paper examines VT and CP data for tonnes of cane per hectare (TCH) and sugar content (CCS). Data, analysed by REML, included 107 VTs and 54 CP mill years for 9 varieties from the mill districts of Mulgrave, Babinda, and Tully for harvest years 1982-99. Important consistencies included the high TCH of Q152, the high CCS of Q117 and Q120, and the low CCS of H56-752. Significant anomalies existed with respect to TCH for Q113, Q117, Q120, Q122, Q138, and H56-752, and with respect to CCS for Q113 and Q124. Investigation of these anomalies was assisted by access to independent REML analyses of CP data for 65692 individual Tully cane blocks from 1988 to 1999 and by the knowledge of persons familiar with the preferential uses of varieties by farmers. Minor anomalies were due to limited year or mill area data. The TCH of Q124 was deemed to have been decreased, and its CCS increased, by severe disease in Babinda CP in the extremely wet 1998 and 1999 seasons. Other serious anomalies have credible but unsubstantiated explanations. The most convincing, for Q113, Q117, Q138, and H56-752, are that these varieties were deployed unevenly with regard to late-season harvesting, predominant use or avoidance on high-fertility soils, or use confined to low-fertility sandy soils, respectively. Uneven deployment results in confounding of these effects in the varietal CP statistics at mill-area level. It is concluded that VTs cannot be enhanced to anticipate or evaluate most effects of uneven deployment. They give adequate predictions of relative CP performance for varieties deployed evenly across confounding influences. Routine analyses of individual-block CP data would be useful and would be enhanced by the addition of relevant information to the block records.
Abstract:
We examine the event statistics obtained from two differing simplified models for earthquake faults. The first model is a reproduction of the Block-Slider model of Carlson et al. (1991), a model often employed in seismicity studies. The second model is an elastodynamic fault model based upon the Lattice Solid Model (LSM) of Mora and Place (1994). We performed simulations in which the fault length was varied in each model and generated synthetic catalogs of event sizes and times. From these catalogs, we constructed interval event-size distributions and inter-event time distributions. The larger, localised events in the Block-Slider model displayed the same scaling behaviour as events in the LSM; however, the distribution of inter-event times was markedly different. The analysis of both event-size and inter-event time statistics is an effective method for comparative studies of differing simplified models for earthquake faults.
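The catalog-to-distribution step described above is straightforward to sketch; the event sizes and times below are synthetic placeholders, not output from either fault model.

```python
import numpy as np

rng = np.random.default_rng(1)
# Placeholder synthetic catalog of event sizes and occurrence times.
sizes = 1.0 + rng.pareto(a=1.5, size=5000)         # heavy-tailed event sizes
times = np.sort(rng.uniform(0.0, 1e4, size=5000))  # event times

# Interval event-size distribution: counts in logarithmic size bins.
size_bins = np.logspace(np.log10(sizes.min()), np.log10(sizes.max()), 30)
size_counts, _ = np.histogram(sizes, bins=size_bins)

# Inter-event time distribution: gaps between successive events.
gaps = np.diff(times)
gap_counts, gap_edges = np.histogram(gaps, bins=50)
```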
Abstract:
We present phase-space techniques for the modelling of spontaneous emission in two-level bosonic atoms. The positive-P representation is shown to give a full and complete description within the limits of our model. The Wigner representation, even when truncated at second order, is shown to need a doubling of the phase-space to allow for a positive-definite diffusion matrix in the appropriate Fokker-Planck equation and still fails to agree with the full quantum results of the positive-P representation. We show that quantum statistics and correlations between the ground and excited states affect the dynamics of the emission process, so that it is in general non-exponential.
Abstract:
Normal mixture models are often used to cluster continuous data. However, conventional approaches for fitting these models will have problems in producing nonsingular estimates of the component-covariance matrices when the dimension of the observations is large relative to the number of observations. In this case, methods such as principal components analysis (PCA) and the mixture of factor analyzers model can be adopted to avoid these estimation problems. We examine these approaches applied to the Cabernet wine data set of Ashenfelter (1999), considering the clustering of both the wines and the judges, and comparing our results with another analysis. The mixture of factor analyzers model proves particularly effective in clustering the wines, accurately classifying many of the wines by location.
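Of the two remedies mentioned, the PCA route is the easier to sketch with standard tools (scikit-learn has no mixture-of-factor-analyzers class); the data matrix and all dimensions below are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Hypothetical case: n = 40 observations, p = 200 variables (p >> n).
X = rng.normal(size=(40, 200))

# Reduce the dimension first so that nonsingular estimates of the
# component-covariance matrices can be obtained in the reduced space.
scores = PCA(n_components=5).fit_transform(X)

labels = GaussianMixture(n_components=3,
                         covariance_type="full").fit_predict(scores)
```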
Abstract:
QTL detection experiments in livestock species commonly use the half-sib design. Each male is mated to a number of females, each female producing a limited number of progeny. Analysis consists of attempting to detect associations between phenotype and genotype measured on the progeny. When family sizes are limiting, experimenters may wish to incorporate as much information as possible into a single analysis. However, combining information across sires is problematic because of incomplete linkage disequilibrium between the markers and the QTL in the population. This study describes formulae for obtaining MLEs via the expectation-maximization (EM) algorithm for use in a multiple-trait, multiple-family analysis. A model specifying a QTL with only two alleles and a common within-sire error variance is assumed. Compared to single-family analyses, power can be improved up to fourfold with multi-family analyses. The accuracy and precision of QTL location estimates are also substantially improved. With small family sizes, the multi-family, multi-trait analyses substantially reduce, but do not totally remove, biases in QTL effect estimates. In situations where multiple QTL alleles are segregating, the multi-family analysis will average out the effects of the different QTL alleles.
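The flavour of the EM iterations can be conveyed by a stripped-down stand-in: a two-component normal mixture with a common error variance, echoing the two-allele QTL model, though the paper's actual formulae condition on marker information within half-sib families and handle multiple traits.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
# Hypothetical phenotypes from two underlying QTL genotype classes.
y = np.concatenate([rng.normal(0.0, 1.0, 120), rng.normal(1.5, 1.0, 80)])

pi, mu1, mu2, sigma = 0.5, y.min(), y.max(), y.std()
for _ in range(200):
    # E-step: posterior probability that each record belongs to class 1.
    d1 = pi * norm.pdf(y, mu1, sigma)
    d2 = (1.0 - pi) * norm.pdf(y, mu2, sigma)
    tau = d1 / (d1 + d2)
    # M-step: update the mixing proportion, the two means,
    # and the common (pooled) error variance.
    pi = tau.mean()
    mu1 = (tau * y).sum() / tau.sum()
    mu2 = ((1.0 - tau) * y).sum() / (1.0 - tau).sum()
    sigma = np.sqrt((tau * (y - mu1) ** 2
                     + (1.0 - tau) * (y - mu2) ** 2).mean())
```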
Abstract:
The aim of this report is to describe the use of WinBUGS for two datasets that arise from typical population pharmacokinetic studies. The first dataset relates to gentamicin concentration-time data that arose as part of routine clinical care of 55 neonates. The second dataset incorporated data from 96 patients receiving enoxaparin. Both datasets were originally analyzed by using NONMEM. In the first instance, although NONMEM provided reasonable estimates of the fixed-effects parameters, it was unable to provide satisfactory estimates of the between-subject variance. In the second instance, the use of NONMEM resulted in the development of a successful model, albeit with limited available information on the between-subject variability of the pharmacokinetic parameters. WinBUGS was used to develop a model for both of these datasets. Model comparison for the enoxaparin dataset was performed by using the posterior distribution of the log-likelihood and a posterior predictive check. The use of WinBUGS supported the same structural models tried in NONMEM. For the gentamicin dataset a one-compartment model with intravenous infusion was developed, and the population parameters, including the full between-subject variance-covariance matrix, were available. Analysis of the enoxaparin dataset supported a two-compartment model as superior to the one-compartment model, based on the posterior predictive check. Again, the full between-subject variance-covariance matrix parameters were available. Fully Bayesian approaches using MCMC methods, via WinBUGS, can offer added value for the analysis of population pharmacokinetic data.
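The structural model reported for the gentamicin data, a one-compartment model with intravenous infusion, has a standard closed form; the sketch below uses purely illustrative parameter values, not estimates from the paper.

```python
import numpy as np

def one_compartment_infusion(t, rate, t_inf, cl, v):
    """Concentration under a constant-rate IV infusion of duration t_inf
    in a one-compartment model with clearance cl and volume v."""
    k = cl / v  # first-order elimination rate constant
    t = np.asarray(t, dtype=float)
    c_during = (rate / cl) * (1.0 - np.exp(-k * t))
    c_end = (rate / cl) * (1.0 - np.exp(-k * t_inf))
    c_after = c_end * np.exp(-k * (t - t_inf))
    return np.where(t <= t_inf, c_during, c_after)

# Illustrative values only.
times = np.linspace(0.0, 24.0, 49)  # hours
conc = one_compartment_infusion(times, rate=10.0, t_inf=0.5, cl=0.05, v=0.5)
```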
Abstract:
Purpose: To evaluate the use of leflunomide in the Australian community since its introduction in 2000; trends in adverse drug reaction (ADR) reporting were also studied. Methods: Annual Australian prescription and dispensing statistics were analysed. Drug utilisation was estimated as defined daily doses (DDD)/1000 inhabitants/day. ADR data from the Therapeutic Goods Administration's Adverse Drug Reactions Advisory Committee (ADRAC) national monitoring system were compared with the World Health Organisation (WHO) Vigibase records. Results: Leflunomide use in Australia (dispensing data) increased from 0.2 in 2000 to 0.4 DDD/1000 inhabitants/day in 2002. The same overall pattern was observed in the 'authority to prescribe' data. From 2000 to 2002, prescribing of the starter pack (3 x 100 mg loading dose plus 30 x 20 mg tablets) declined (down 74%), as did prescribing of the 20 mg (30 tablets) pack, while gradual increases were noted for the 10 mg (30 tablets) pack (up 40%). Approximately 135 reports, detailing about 370 individual ADRs, were generated annually. Gastro-intestinal disorders predominated, accounting for 24% of reactions reported to ADRAC; skin and appendages disorders constituted 14% of reported reactions. Deaths in leflunomide users were attributed to a combination of haematological and gastro-intestinal complications, but it was not possible to ascertain other medication usage or contributing factors. Trends observed in the ADRAC reports were consistent with the WHO database. Conclusions: Leflunomide was the first new DMARD registered in Australia in over a decade, and its use has increased within the community. The ADR reports might have contributed to Australian rheumatologists gradually abandoning loading patients with high doses of leflunomide in favour of starting therapy at lower doses.
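The DDD/1000 inhabitants/day metric used under Methods is simple to compute; the quantities in the sketch below are made up, and the WHO-defined daily dose for leflunomide is taken as 20 mg here as an assumption to be checked against the ATC/DDD index.

```python
# Drug utilisation expressed as defined daily doses (DDD)/1000 inhabitants/day.
ddd_mg = 20.0            # assumed WHO DDD for leflunomide (verify in ATC/DDD index)
dispensed_mg = 3.0e7     # hypothetical total amount dispensed in the year (mg)
population = 19_500_000  # hypothetical population covered
days = 365

ddds = dispensed_mg / ddd_mg
use_per_1000_per_day = ddds / (population / 1000.0) / days
print(f"{use_per_1000_per_day:.2f} DDD/1000 inhabitants/day")  # ~0.21 here
```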
Abstract:
The main purpose of this article is to gain an insight into the relationships between variables describing the environmental conditions of the Far Northern section of the Great Barrier Reef, Australia. Several of the variables describing these conditions had different measurement levels, and they often had non-linear relationships. Using non-linear principal component analysis, it was possible to acquire an insight into these relationships. Furthermore, three geographical areas with unique environmental characteristics could be identified.
Abstract:
A set of techniques referred to as circular statistics has been developed for the analysis of directional and orientational data. The unit of measure for such data is angular (usually in either degrees or radians), and the statistical distributions underlying the techniques are characterised by their cyclic nature; for example, angles of 359.9 degrees are considered close to angles of 0 degrees. In this paper, we assert that such approaches can be easily adapted to analyse time-of-day and time-of-week data, and in particular daily cycles in the numbers of incidents reported to the police. We begin the paper by describing circular statistics. We then discuss how these may be modified, and demonstrate the approach with some examples for reported incidents in the Cardiff area of Wales.
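The adaptation asserted above amounts to treating the 24-hour clock as a circle before computing summary statistics; a minimal sketch with made-up incident times (scipy's circmean handles the wrap-around directly):

```python
import numpy as np
from scipy.stats import circmean

# Hypothetical incident times, in hours past midnight.
hours = np.array([23.5, 0.2, 1.1, 22.8, 0.7])

# Circular mean on a 24-hour cycle: 24 h wraps back to 0 h.
mean_hour = circmean(hours, high=24.0, low=0.0)

# The ordinary arithmetic mean (about 9.7 h) is badly misleading
# for times that straddle midnight.
naive_mean = hours.mean()
```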
Abstract:
This paper considers a model-based approach to the clustering of tissue samples of a very large number of genes from microarray experiments. It is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. Frequently in practice, there are also clinical data available on those cases on which the tissue samples have been obtained. Here we investigate how to use the clinical data in conjunction with the microarray gene expression data to cluster the tissue samples. We propose two mixture model-based approaches in which the number of components in the mixture model corresponds to the number of clusters to be imposed on the tissue samples. One approach specifies the components of the mixture model to be the conditional distributions of the microarray data given the clinical data with the mixing proportions also conditioned on the latter data. Another takes the components of the mixture model to represent the joint distributions of the clinical and microarray data. The approaches are demonstrated on some breast cancer data, as studied recently in van't Veer et al. (2002).
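In generic notation (ours, not necessarily the authors'), with y denoting a tissue sample's gene-expression profile and x its clinical data, the two proposed formulations can be written as:

```latex
% Approach 1: components are the conditional densities of the microarray
% data given the clinical data, with the mixing proportions also
% conditioned on the clinical data.
f(\mathbf{y} \mid \mathbf{x}) = \sum_{i=1}^{g} \pi_i(\mathbf{x})\, f_i(\mathbf{y} \mid \mathbf{x})

% Approach 2: components represent the joint distribution of the
% clinical and microarray data, with unconditional mixing proportions.
f(\mathbf{y}, \mathbf{x}) = \sum_{i=1}^{g} \pi_i\, f_i(\mathbf{y}, \mathbf{x})
```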
Abstract:
Finite mixture models are being increasingly used to model the distributions of a wide variety of random phenomena. While normal mixture models are often used to cluster data sets of continuous multivariate data, a more robust clustering can be obtained by considering the t mixture model-based approach. Mixtures of factor analyzers enable model-based density estimation to be undertaken for high-dimensional data where the number of observations n is not very large relative to their dimension p. As the approach using the multivariate normal family of distributions is sensitive to outliers, it is more robust to adopt the multivariate t family for the component error and factor distributions. The computational aspects associated with robustness and high dimensionality in these approaches to cluster analysis are discussed and illustrated.
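The parsimony that makes mixtures of factor analyzers usable when p is large comes from constraining each component-covariance matrix to the standard factor-analytic form (generic notation):

```latex
% Sigma_i for the i-th component: B_i is a p x q matrix of factor
% loadings with q << p, and D_i is a diagonal matrix, so roughly
% p(q + 1) free parameters replace the p(p + 1)/2 of a full matrix.
\boldsymbol{\Sigma}_i = \mathbf{B}_i \mathbf{B}_i^{\top} + \mathbf{D}_i
```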