924 results for Multivariate statistics
Abstract:
Long-term monitoring of forest soils as part of a pan-European network to detect environmental change depends on an accurate determination of the mean of the soil properties at each monitoring event. Forest soil is known to be very variable spatially, however. A study was undertaken to explore and quantify this variability at three forest monitoring plots in Britain. Detailed soil sampling was carried out, and the data from the chemical analyses were analysed by classical statistics and geostatistics. An analysis of variance showed that there were no consistent effects from the sample sites in relation to the position of the trees. The variogram analysis showed that there was spatial dependence at each site for several variables and some varied in an apparently periodic way. An optimal sampling analysis based on the multivariate variogram for each site suggested that a bulked sample from 36 cores would reduce error to an acceptable level. Future sampling should be designed so that it neither targets nor avoids trees and disturbed ground. This can be achieved best by using a stratified random sampling design.
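The link between the number of bulked cores and the error of the plot mean can be illustrated with a minimal sketch. It assumes independent cores with a common standard deviation, which ignores the spatial dependence that the study's variogram analysis takes into account. The function name and the numeric values are hypothetical and are not taken from the study.

```python
import math


def standard_error_of_bulked_mean(core_sd: float, n_cores: int) -> float:
    """Standard error of the mean of n_cores cores.

    Simplifying assumption: cores are independent with a common
    standard deviation core_sd (no spatial correlation).
    """
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return core_sd / math.sqrt(n_cores)


# Hypothetical core-to-core standard deviation of 6.0 (arbitrary units).
for n in (1, 9, 36):
    print(n, standard_error_of_bulked_mean(6.0, n))
```

Under this assumption the error shrinks with the square root of the number of cores, so going from 9 to 36 cores halves it. A variogram-based analysis would adjust this for spatial dependence.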
Abstract:
A regional climate model is used to investigate changes in Israel and Jordan precipitation at the end of the 21st century on daily to monthly timescales. The model predicts that this region will get significantly drier at the peak of the rainy season, reflecting a reduction in both the frequency and duration of rainy events. These changes may be associated with a reduction in the strength of the Mediterranean storm track.
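As an illustration of what "frequency and duration of rainy events" can mean for a daily series, the sketch below counts runs of consecutive wet days. The 1.0 mm wet-day threshold, the function name, and the sample values are assumptions made for this example and do not come from the study.

```python
from typing import List, Sequence


def wet_spell_lengths(daily_mm: Sequence[float],
                      wet_threshold_mm: float = 1.0) -> List[int]:
    """Return the length in days of each run of consecutive wet days.

    A day counts as wet when its precipitation is at least
    wet_threshold_mm (an assumed threshold, not the study's).
    """
    spells: List[int] = []
    current = 0
    for amount in daily_mm:
        if amount >= wet_threshold_mm:
            current += 1
        elif current:
            # A dry day ends the current wet spell.
            spells.append(current)
            current = 0
    if current:
        # The series ended during a wet spell.
        spells.append(current)
    return spells


# Hypothetical daily precipitation totals in mm.
daily = [0.0, 2.5, 4.0, 0.0, 0.2, 7.1, 0.0]
spells = wet_spell_lengths(daily)
frequency = len(spells)
mean_duration = sum(spells) / len(spells) if spells else 0.0
print(frequency, mean_duration)
```

Here the event frequency is the number of spells and the duration is their mean length. Comparing these two numbers between a present-day and a future simulation is one simple way to separate the two effects.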
Abstract:
Direct numerical simulations of turbulent flow over regular arrays of urban-like, cubical obstacles are reported. Results are analysed in terms of a formal spatial averaging procedure to enable interpretation of the flow within the arrays as a canopy flow, and of the flow above as a rough wall boundary layer. Spatial averages of the mean velocity, turbulent stresses and pressure drag are computed. The statistics compare very well with data from wind-tunnel experiments. Within the arrays the time-averaged flow structure gives rise to significant 'dispersive stress' whereas above the Reynolds stress dominates. The mean flow structure and turbulence statistics depend significantly on the layout of the cubes. Unsteady effects are important, especially in the lower canopy layer where turbulent fluctuations dominate over the mean flow.
Abstract:
We describe the main features of a program written to perform electronic marking of quantitative or simple text questions. One of the main benefits is that it can check answers for being consistent with earlier errors, so can cope with a range of numerical questions. We summarise our experience of using it in a statistics course taught to 200 bioscience students.
Abstract:
Maize silage nutritive quality is routinely determined by near infrared reflectance spectroscopy (NIRS). However, little is known about the impact of sample preparation on the accuracy of the calibration to predict biological traits. A sample population of 48 maize silages representing a wide range of physiological maturities was used in a study to determine the impact of different sample preparation procedures (i.e., drying regimes; the presence or absence of residual moisture; the degree of particle comminution) on resultant NIR prediction statistics. All silages were scanned using a total of 12 combinations of sample pre-treatments. Each sample preparation combination was subjected to three multivariate regression techniques to give a total of 36 predictions per biological trait. Increased sample preparation, relative to scanning the unprocessed whole plant (WP) material, always resulted in a numerical minimisation of model statistics. However, the ability of each of the treatments to significantly minimise the model statistics differed. Particle comminution was the most important factor, oven-drying regime was intermediate, and residual moisture presence was the least important. Models to predict various biological parameters of maize silage will be improved if material is subjected to a high degree of particle comminution (i.e., having been passed through a 1 mm screen) and developed on plant material previously dried at 60 °C. The extra effort in terms of time and cost required to remove sample residual moisture cannot be justified. © 2005 Elsevier B.V. All rights reserved.
A hierarchical Bayesian model for predicting the functional consequences of amino-acid polymorphisms
Abstract:
Genetic polymorphisms in deoxyribonucleic acid coding regions may have a phenotypic effect on the carrier, e.g. by influencing susceptibility to disease. Detection of deleterious mutations via association studies is hampered by the large number of candidate sites; therefore methods are needed to narrow down the search to the most promising sites. For this, a possible approach is to use structural and sequence-based information of the encoded protein to predict whether a mutation at a particular site is likely to disrupt the functionality of the protein itself. We propose a hierarchical Bayesian multivariate adaptive regression spline (BMARS) model for supervised learning in this context and assess its predictive performance by using data from mutagenesis experiments on lac repressor and lysozyme proteins. In these experiments, about 12 amino-acid substitutions were performed at each native amino-acid position and the effect on protein functionality was assessed. The training data thus consist of repeated observations at each position, which the hierarchical framework is needed to account for. The model is trained on the lac repressor data and tested on the lysozyme mutations and vice versa. In particular, we show that the hierarchical BMARS model, by allowing for the clustered nature of the data, yields lower out-of-sample misclassification rates compared with both a BMARS and a frequentist MARS model, a support vector machine classifier and an optimally pruned classification tree.
Abstract:
In clinical trials, situations often arise where more than one response from each patient is of interest, and it is required that any decision to stop the study be based upon some or all of these measures simultaneously. Theory for the design of sequential experiments with simultaneous bivariate responses is described by Jennison and Turnbull (Jennison, C., Turnbull, B. W. (1993). Group sequential tests for bivariate response: interim analyses of clinical trials with both efficacy and safety endpoints. Biometrics 49:741-752) and Cook and Farewell (Cook, R. J., Farewell, V. T. (1994). Guidelines for monitoring efficacy and toxicity responses in clinical trials. Biometrics 50:1146-1152) in the context of one efficacy and one safety response. These expositions are in terms of normally distributed data with known covariance. The methods proposed require specification of the correlation, ρ, between test statistics monitored as part of the sequential test. It can be difficult to quantify ρ, and previous authors have suggested simply taking the lowest plausible value, as this will guarantee power. This paper begins with an illustration of the effect that inappropriate specification of ρ can have on the preservation of trial error rates. It is shown that both the type I error and the power can be adversely affected. As a possible solution to this problem, formulas are provided for the calculation of correlation from data collected as part of the trial. An adaptive approach is proposed and evaluated that makes use of these formulas and an example is provided to illustrate the method. Attention is restricted to the bivariate case for ease of computation, although the formulas derived are applicable in the general multivariate case.
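To make the idea of estimating a correlation from trial data concrete, the sketch below computes the ordinary Pearson sample correlation between paired efficacy and safety responses. This is a generic estimator, not the formulas derived in the paper, which concern the correlation between the monitored test statistics. The function name and the data are hypothetical.

```python
import math
from typing import Sequence


def sample_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson sample correlation between paired observations x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and have at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Centred sums of squares and cross-products.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired responses from patients observed so far.
efficacy = [1.2, 0.8, 1.9, 1.4, 0.5]
safety = [0.3, 0.1, 0.6, 0.5, 0.2]
print(sample_correlation(efficacy, safety))
```

In an adaptive design, an estimate of this kind would be recomputed at each interim analysis as more patients accrue, rather than fixed in advance at a guessed value.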
Abstract:
This paper presents our experience with combining statistical principles and participatory methods to generate national statistics. The methodology was developed in Malawi during 1999–2002. We demonstrate that if PRA is combined with statistical principles (including probability-based sampling and standardization), it can produce total population statistics and estimates of the proportion of households with certain characteristics (e.g., poverty). It can also provide quantitative data on complex issues of national importance such as poverty targeting. This approach is distinct from previous PRA-based approaches, which generate numbers at community level but only provide qualitative information at national level.
Abstract:
Multivariate statistical methods were used to investigate the causes of toxicity and controls on groundwater chemistry from 274 boreholes in an urban area (London) of the United Kingdom. The groundwater was alkaline to neutral, and chemistry was dominated by calcium, sodium, and sulfate. Contaminants included fuels, solvents, and organic compounds derived from landfill material. The presence of organic material in the aquifer caused decreases in dissolved oxygen, sulfate and nitrate concentrations, and increases in ferrous iron and ammoniacal nitrogen concentrations. Pearson correlations between toxicity results and the concentration of individual analytes indicated that concentrations of ammoniacal nitrogen, dissolved oxygen, ferrous iron, and hydrocarbons were important where present. However, principal component and regression analysis suggested no significant correlation between toxicity and chemistry over the whole area. Multidimensional scaling was used to investigate differences in sites caused by historical use, landfill gas status, or position within the sample area. Significant differences were observed between sites with different historical land use and those with different gas status. Examination of the principal component matrix revealed that these differences are related to changes in the importance of reduced chemical species.