998 resultados para 230204 Applied Statistics
Resumo:
In broader catchment scale investigations, there is a need to understand and ultimately exploit the spatial variation of agricultural crops for an improved economic return. In many instances, this spatial variation is temporally unstable and may be different for various crop attributes and crop species. In the Australian sugar industry, the opportunity arose to evaluate the performance of 231 farms in the Tully Mill area in far north Queensland using production information on cane yield (t/ha) and CCS ( a fresh weight measure of sucrose content in the cane) accumulated over a 12-year period. Such an arrangement of data can be expressed as a 3-way array where a farm x attribute x year matrix can be evaluated and interactions considered. Two multivariate techniques, the 3-way mixture method of clustering and the 3-mode principal component analysis, were employed to identify meaningful relationships between farms that performed similarly for both cane yield and CCS. In this context, farm has a spatial component and the aim of this analysis was to determine if systematic patterns in farm performance expressed by cane yield and CCS persisted over time. There was no spatial relationship between cane yield and CCS. However, the analysis revealed that the relationship between farms was remarkably stable from one year to the next for both attributes and there was some spatial aggregation of farm performance in parts of the mill area. This finding is important, since temporally consistent spatial variation may be exploited to improve regional production. Alternatively, the putative causes of the spatial variation may be explored to enhance the understanding of sugarcane production in the wet tropics of Australia.
Resumo:
When studying genotype X environment interaction in multi-environment trials, plant breeders and geneticists often consider one of the effects, environments or genotypes, to be fixed and the other to be random. However, there are two main formulations for variance component estimation for the mixed model situation, referred to as the unconstrained-parameters (UP) and constrained-parameters (CP) formulations. These formulations give different estimates of genetic correlation and heritability as well as different tests of significance for the random effects factor. The definition of main effects and interactions and the consequences of such definitions should be clearly understood, and the selected formulation should be consistent for both fixed and random effects. A discussion of the practical outcomes of using the two formulations in the analysis of balanced data from multi-environment trials is presented. It is recommended that the CP formulation be used because of the meaning of its parameters and the corresponding variance components. When managed (fixed) environments are considered, users will have more confidence in prediction for them but will not be overconfident in prediction in the target (random) environments. Genetic gain (predicted response to selection in the target environments from the managed environments) is independent of formulation.
Resumo:
Previous research has reported both agreements and serious anomalies in relationships between production attributes of sugarcane varieties in variety trials (VTs) and commercial production (CP). This paper examines VT and CP data for tonnes of cane per hectare (TCH) and sugar content (CCS). Data, analysed by REML, included 107 VTs and 54 CP mill years for 9 varieties from the mill districts of Mulgrave, Babinda, and Tully for harvest years 1982-99. Important consistencies included high TCH of Q152, high CCS of Q117 and Q120, and low CCS of H56-752. Significant anomalies existed with respect to TCH for Q113, Q117, Q120, Q122, Q138, and H56-752 and to CCS for Q113 and Q124. Investigation of these anomalies was assisted by access to independent REML analyses of CP data for 65692 individual Tully cane blocks from 1988 to 1999 and by the knowledge of persons familiar with the preferential uses of varieties by farmers. Minor anomalies were due to limited year or mill area data. Q124 TCH was deemed to be decreased and its CCS increased by severe disease in Babinda CP in the extremely wet 1998 and 1999 seasons. Other serious anomalies have credible but unsubstantiated explanations. The most convincing, for Q113, Q117, Q138, and H56-752, are that these varieties were deployed unevenly with regard to late season harvesting, predominant use or avoidance on high fertility soils, or use confined to low fertility sandy soils, respectively. Uneven deployment results in confounding of these effects in the varietal CP statistics at mill area level. It is concluded that VTs cannot be enhanced to anticipate or evaluate most effects of uneven deployment. They give adequate predictions of relative CP performance for varieties deployed evenly across confounding influences. Routine analyses of individual block CP data would be useful and enhanced by addition of relevant information to the block records.
Resumo:
An investigation was conducted to evaluate the impact of experimental designs and spatial analyses (single-trial models) of the response to selection for grain yield in the northern grains region of Australia (Queensland and northern New South Wales). Two sets of multi-environment experiments were considered. One set, based on 33 trials conducted from 1994 to 1996, was used to represent the testing system of the wheat breeding program and is referred to as the multi-environment trial (MET). The second set, based on 47 trials conducted from 1986 to 1993, sampled a more diverse set of years and management regimes and was used to represent the target population of environments (TPE). There were 18 genotypes in common between the MET and TPE sets of trials. From indirect selection theory, the phenotypic correlation coefficient between the MET and TPE single-trial adjusted genotype means [r(p(MT))] was used to determine the effect of the single-trial model on the expected indirect response to selection for grain yield in the TPE based on selection in the MET. Five single-trial models were considered: randomised complete block (RCB), incomplete block (IB), spatial analysis (SS), spatial analysis with a measurement error (SSM) and a combination of spatial analysis and experimental design information to identify the preferred (PF) model. Bootstrap-resampling methodology was used to construct multiple MET data sets, ranging in size from 2 to 20 environments per MET sample. The size and environmental composition of the MET and the single-trial model influenced the r(p(MT)). On average, the PF model resulted in a higher r(p(MT)) than the IB, SS and SSM models, which were in turn superior to the RCB model for MET sizes based on fewer than ten environments. For METs based on ten or more environments, the r(p(MT)) was similar for all single-trial models.
Resumo:
Many studies on birds focus on the collection of data through an experimental design, suitable for investigation in a classical analysis of variance (ANOVA) framework. Although many findings are confirmed by one or more experts, expert information is rarely used in conjunction with the survey data to enhance the explanatory and predictive power of the model. We explore this neglected aspect of ecological modelling through a study on Australian woodland birds, focusing on the potential impact of different intensities of commercial cattle grazing on bird density in woodland habitat. We examine a number of Bayesian hierarchical random effects models, which cater for overdispersion and a high frequency of zeros in the data using WinBUGS and explore the variation between and within different grazing regimes and species. The impact and value of expert information is investigated through the inclusion of priors that reflect the experience of 20 experts in the field of bird responses to disturbance. Results indicate that expert information moderates the survey data, especially in situations where there are little or no data. When experts agreed, credible intervals for predictions were tightened considerably. When experts failed to agree, results were similar to those evaluated in the absence of expert information. Overall, we found that without expert opinion our knowledge was quite weak. The fact that the survey data is quite consistent, in general, with expert opinion shows that we do know something about birds and grazing and we could learn a lot faster if we used this approach more in ecology, where data are scarce. Copyright (c) 2005 John Wiley & Sons, Ltd.
Resumo:
An important and common problem in microarray experiments is the detection of genes that are differentially expressed in a given number of classes. As this problem concerns the selection of significant genes from a large pool of candidate genes, it needs to be carried out within the framework of multiple hypothesis testing. In this paper, we focus on the use of mixture models to handle the multiplicity issue. With this approach, a measure of the local FDR (false discovery rate) is provided for each gene. An attractive feature of the mixture model approach is that it provides a framework for the estimation of the prior probability that a gene is not differentially expressed, and this probability can subsequently be used in forming a decision rule. The rule can also be formed to take the false negative rate into account. We apply this approach to a well-known publicly available data set on breast cancer, and discuss our findings with reference to other approaches.
Resumo:
An important and common problem in microarray experiments is the detection of genes that are differentially expressed in a given number of classes. As this problem concerns the selection of significant genes from a large pool of candidate genes, it needs to be carried out within the framework of multiple hypothesis testing. In this paper, we focus on the use of mixture models to handle the multiplicity issue. With this approach, a measure of the local false discovery rate is provided for each gene, and it can be implemented so that the implied global false discovery rate is bounded as with the Benjamini-Hochberg methodology based on tail areas. The latter procedure is too conservative, unless it is modified according to the prior probability that a gene is not differentially expressed. An attractive feature of the mixture model approach is that it provides a framework for the estimation of this probability and its subsequent use in forming a decision rule. The rule can also be formed to take the false negative rate into account.
Resumo:
Normal mixture models are often used to cluster continuous data. However, conventional approaches for fitting these models will have problems in producing nonsingular estimates of the component-covariance matrices when the dimension of the observations is large relative to the number of observations. In this case, methods such as principal components analysis (PCA) and the mixture of factor analyzers model can be adopted to avoid these estimation problems. We examine these approaches applied to the Cabernet wine data set of Ashenfelter (1999), considering the clustering of both the wines and the judges, and comparing our results with another analysis. The mixture of factor analyzers model proves particularly effective in clustering the wines, accurately classifying many of the wines by location.
Resumo:
Background Regression to the mean (RTM) is a statistical phenomenon that can make natural variation in repeated data look like real change. It happens when unusually large or small measurements tend to be followed by measurements that are closer to the mean. Methods We give some examples of the phenomenon, and discuss methods to overcome it at the design and analysis stages of a study. Results The effect of RTM in a sample becomes more noticeable with increasing measurement error and when follow-up measurements are only examined on a sub-sample selected using a baseline value. Conclusions RTM is a ubiquitous phenomenon in repeated data and should always be considered as a possible cause of an observed change. Its effect can be alleviated through better study design and use of suitable statistical methods.
Resumo:
The bispectrum and third-order moment can be viewed as equivalent tools for testing for the presence of nonlinearity in stationary time series. This is because the bispectrum is the Fourier transform of the third-order moment. An advantage of the bispectrum is that its estimator comprises terms that are asymptotically independent at distinct bifrequencies under the null hypothesis of linearity. An advantage of the third-order moment is that its values in any subset of joint lags can be used in the test, whereas when using the bispectrum the entire (or truncated) third-order moment is required to construct the Fourier transform. In this paper, we propose a test for nonlinearity based upon the estimated third-order moment. We use the phase scrambling bootstrap method to give a nonparametric estimate of the variance of our test statistic under the null hypothesis. Using a simulation study, we demonstrate that the test obtains its target significance level, with large power, when compared to an existing standard parametric test that uses the bispectrum. Further we show how the proposed test can be used to identify the source of nonlinearity due to interactions at specific frequencies. We also investigate implications for heuristic diagnosis of nonstationarity.
Resumo:
QTL detection experiments in livestock species commonly use the half-sib design. Each male is mated to a number of females, each female producing a limited number of progeny. Analysis consists of attempting to detect associations between phenotype and genotype measured on the progeny. When family sizes are limiting experimenters may wish to incorporate as much information as possible into a single analysis. However, combining information across sires is problematic because of incomplete linkage disequilibrium between the markers and the QTL in the population. This study describes formulae for obtaining MLEs via the expectation maximization (EM) algorithm for use in a multiple-trait, multiple-family analysis. A model specifying a QTL with only two alleles, and a common within sire error variance is assumed. Compared to single-family analyses, power can be improved up to fourfold with multi-family analyses. The accuracy and precision of QTL location estimates are also substantially improved. With small family sizes, the multi-family, multi-trait analyses reduce substantially, but not totally remove, biases in QTL effect estimates. In situations where multiple QTL alleles are segregating the multi-family analysis will average out the effects of the different QTL alleles.
Resumo:
With mixed feature data, problems are induced in modeling the gating network of normalized Gaussian (NG) networks as the assumption of multivariate Gaussian becomes invalid. In this paper, we propose an independence model to handle mixed feature data within the framework of NG networks. The method is illustrated using a real example of breast cancer data.
Resumo:
All signals that appear to be periodic have some sort of variability from period to period regardless of how stable they appear to be in a data plot. A true sinusoidal time series is a deterministic function of time that never changes and thus has zero bandwidth around the sinusoid's frequency. A zero bandwidth is impossible in nature since all signals have some intrinsic variability over time. Deterministic sinusoids are used to model cycles as a mathematical convenience. Hinich [IEEE J. Oceanic Eng. 25 (2) (2000) 256-261] introduced a parametric statistical model, called the randomly modulated periodicity (RMP) that allows one to capture the intrinsic variability of a cycle. As with a deterministic periodic signal the RMP can have a number of harmonics. The likelihood ratio test for this model when the amplitudes and phases are known is given in [M.J. Hinich, Signal Processing 83 (2003) 1349-13521. A method for detecting a RMP whose amplitudes and phases are unknown random process plus a stationary noise process is addressed in this paper. The only assumption on the additive noise is that it has finite dependence and finite moments. Using simulations based on a simple RMP model we show a case where the new method can detect the signal when the signal is not detectable in a standard waterfall spectrograrn display. (c) 2005 Elsevier B.V. All rights reserved.