32 resultados para statistic


Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper presents a simple Bayesian approach to sample size determination in clinical trials. It is required that the trial should be large enough to ensure that the data collected will provide convincing evidence either that an experimental treatment is better than a control or that it fails to improve upon control by some clinically relevant difference. The method resembles standard frequentist formulations of the problem, and indeed in certain circumstances involving 'non-informative' prior information it leads to identical answers. In particular, unlike many Bayesian approaches to sample size determination, use is made of an alternative hypothesis that an experimental treatment is better than a control treatment by some specified magnitude. The approach is introduced in the context of testing whether a single stream of binary observations are consistent with a given success rate p(0). Next the case of comparing two independent streams of normally distributed responses is considered, first under the assumption that their common variance is known and then for unknown variance. Finally, the more general situation in which a large sample is to be collected and analysed according to the asymptotic properties of the score statistic is explored. Copyright (C) 2007 John Wiley & Sons, Ltd.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The evolutionary history of gains and losses of vegetative reproductive propagules (soredia) in Porpidia s.l., a group of lichen-forming ascomycetes, was clarified using Bayesian Markov chain Monte Carlo (MCMC) approaches to monophyly tests and a combined MCMC and maximum likelihood approach to ancestral character state reconstructions. The MCMC framework provided confidence estimates for the reconstructions of relationships and ancestral character states, which formed the basis for tests of evolutionary hypotheses. Monophyly tests rejected all hypotheses that predicted any clustering of reproductive modes in extant taxa. In addition, a nearest-neighbor statistic could not reject the hypothesis that the vegetative reproductive mode is randomly distributed throughout the group. These results show that transitions between presence and absence of the vegetative reproductive mode within Porpidia s.l. occurred several times and independently of each other. Likelihood reconstructions of ancestral character states at selected nodes suggest that - contrary to previous thought - the ancestor to Porpidia s.l. already possessed the vegetative reproductive mode. Furthermore, transition rates are reconstructed asymmetrically with the vegetative reproductive mode being gained at a much lower rate than it is lost. A cautious note has to be added, because a simulation study showed that the ancestral character state reconstructions were highly dependent on taxon sampling. However, our central conclusions, particularly the higher rate of change from vegetative reproductive mode present to absent than vice versa within Porpidia s.l., were found to be broadly independent of taxon sampling. [Ancestral character state reconstructions; Ascomycota, Bayesian inference; hypothesis testing; likelihood; MCMC; Porpidia; reproductive systems]

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We describe and evaluate a new estimator of the effective population size (N-e), a critical parameter in evolutionary and conservation biology. This new "SummStat" N-e. estimator is based upon the use of summary statistics in an approximate Bayesian computation framework to infer N-e. Simulations of a Wright-Fisher population with known N-e show that the SummStat estimator is useful across a realistic range of individuals and loci sampled, generations between samples, and N-e values. We also address the paucity of information about the relative performance of N-e estimators by comparing the SUMMStat estimator to two recently developed likelihood-based estimators and a traditional moment-based estimator. The SummStat estimator is the least biased of the four estimators compared. In 32 of 36 parameter combinations investigated rising initial allele frequencies drawn from a Dirichlet distribution, it has the lowest bias. The relative mean square error (RMSE) of the SummStat estimator was generally intermediate to the others. All of the estimators had RMSE > 1 when small samples (n = 20, five loci) were collected a generation apart. In contrast, when samples were separated by three or more generations and Ne less than or equal to 50, the SummStat and likelihood-based estimators all had greatly reduced RMSE. Under the conditions simulated, SummStat confidence intervals were more conservative than the likelihood-based estimators and more likely to include true N-e. The greatest strength of the SummStat estimator is its flexible structure. This flexibility allows it to incorporate any, potentially informative summary statistic from Population genetic data.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this paper new robust nonlinear model construction algorithms for a large class of linear-in-the-parameters models are introduced to enhance model robustness, including three algorithms using combined A- or D-optimality or PRESS statistic (Predicted REsidual Sum of Squares) with regularised orthogonal least squares algorithm respectively. A common characteristic of these algorithms is that the inherent computation efficiency associated with the orthogonalisation scheme in orthogonal least squares or regularised orthogonal least squares has been extended such that the new algorithms are computationally efficient. A numerical example is included to demonstrate effectiveness of the algorithms. Copyright (C) 2003 IFAC.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Nonlinear system identification is considered using a generalized kernel regression model. Unlike the standard kernel model, which employs a fixed common variance for all the kernel regressors, each kernel regressor in the generalized kernel model has an individually tuned diagonal covariance matrix that is determined by maximizing the correlation between the training data and the regressor using a repeated guided random search based on boosting optimization. An efficient construction algorithm based on orthogonal forward regression with leave-one-out (LOO) test statistic and local regularization (LR) is then used to select a parsimonious generalized kernel regression model from the resulting full regression matrix. The proposed modeling algorithm is fully automatic and the user is not required to specify any criterion to terminate the construction procedure. Experimental results involving two real data sets demonstrate the effectiveness of the proposed nonlinear system identification approach.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A modified radial basis function (RBF) neural network and its identification algorithm based on observational data with heterogeneous noise are introduced. The transformed system output of Box-Cox is represented by the RBF neural network. To identify the model from observational data, the singular value decomposition of the full regression matrix consisting of basis functions formed by system input data is initially carried out and a new fast identification method is then developed using Gauss-Newton algorithm to derive the required Box-Cox transformation, based on a maximum likelihood estimator (MLE) for a model base spanned by the largest eigenvectors. Finally, the Box-Cox transformation-based RBF neural network, with good generalisation and sparsity, is identified based on the derived optimal Box-Cox transformation and an orthogonal forward regression algorithm using a pseudo-PRESS statistic to select a sparse RBF model with good generalisation. The proposed algorithm and its efficacy are demonstrated with numerical examples.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this correspondence new robust nonlinear model construction algorithms for a large class of linear-in-the-parameters models are introduced to enhance model robustness via combined parameter regularization and new robust structural selective criteria. In parallel to parameter regularization, we use two classes of robust model selection criteria based on either experimental design criteria that optimizes model adequacy, or the predicted residual sums of squares (PRESS) statistic that optimizes model generalization capability, respectively. Three robust identification algorithms are introduced, i.e., combined A- and D-optimality with regularized orthogonal least squares algorithm, respectively; and combined PRESS statistic with regularized orthogonal least squares algorithm. A common characteristic of these algorithms is that the inherent computation efficiency associated with the orthogonalization scheme in orthogonal least squares or regularized orthogonal least squares has been extended such that the new algorithms are computationally efficient. Numerical examples are included to demonstrate effectiveness of the algorithms.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A large fraction of papers in the climate literature includes erroneous uses of significance tests. A Bayesian analysis is presented to highlight the meaning of significance tests and why typical misuse occurs. The significance statistic is not a quantitative measure of how confident we can be of the ‘reality’ of a given result. It is concluded that a significance test very rarely provides useful quantitative information.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The polar winter stratospheric vortex is a coherent structure that undergoes different types of deformation that can be revealed by the geometric invariant moments. Three moments are used—the aspect ratio, the centroid latitude, and the area of the vortex based on stratospheric data from the 40-yr ECMWF Re-Analysis (ERA-40) project—to study sudden stratospheric warmings. Hierarchical clustering combined with data image visualization techniques is used as well. Using the gap statistic, three optimal clusters are obtained based on the three geometric moments considered here. The 850-K potential vorticity field, as well as the vertical profiles of polar temperature and zonal wind, provides evidence that the clusters represent, respectively, the undisturbed (U), displaced (D), and split (S) states of the polar vortex. This systematic method for identifying and characterizing the state of the polar vortex using objective methods is useful as a tool for analyzing observations and as a test for climate models to simulate the observations. The method correctly identifies all previously identified major warmings and also identifies significant minor warmings where the atmosphere is substantially disturbed but does not quite meet the criteria to qualify as a major stratospheric warming.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Reliability analysis of probabilistic forecasts, in particular through the rank histogram or Talagrand diagram, is revisited. Two shortcomings are pointed out: Firstly, a uniform rank histogram is but a necessary condition for reliability. Secondly, if the forecast is assumed to be reliable, an indication is needed how far a histogram is expected to deviate from uniformity merely due to randomness. Concerning the first shortcoming, it is suggested that forecasts be grouped or stratified along suitable criteria, and that reliability is analyzed individually for each forecast stratum. A reliable forecast should have uniform histograms for all individual forecast strata, not only for all forecasts as a whole. As to the second shortcoming, instead of the observed frequencies, the probability of the observed frequency is plotted, providing and indication of the likelihood of the result under the hypothesis that the forecast is reliable. Furthermore, a Goodness-Of-Fit statistic is discussed which is essentially the reliability term of the Ignorance score. The discussed tools are applied to medium range forecasts for 2 m-temperature anomalies at several locations and lead times. The forecasts are stratified along the expected ranked probability score. Those forecasts which feature a high expected score turn out to be particularly unreliable.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Background: Association mapping, initially developed in human disease genetics, is now being applied to plant species. The model species Arabidopsis provided some of the first examples of association mapping in plants, identifying previously cloned flowering time genes, despite high population sub-structure. More recently, association genetics has been applied to barley, where breeding activity has resulted in a high degree of population sub-structure. A major genotypic division within barley is that between winter- and spring-sown varieties, which differ in their requirement for vernalization to promote subsequent flowering. To date, all attempts to validate association genetics in barley by identifying major flowering time loci that control vernalization requirement (VRN-H1 and VRN-H2) have failed. Here, we validate the use of association genetics in barley by identifying VRN-H1 and VRN-H2, despite their prominent role in determining population sub-structure. Results: By taking barley as a typical inbreeding crop, and seasonal growth habit as a major partitioning phenotype, we develop an association mapping approach which successfully identifies VRN-H1 and VRN-H2, the underlying loci largely responsible for this agronomic division. We find a combination of Structured Association followed by Genomic Control to correct for population structure and inflation of the test statistic, resolved significant associations only with VRN-H1 and the VRN-H2 candidate genes, as well as two genes closely linked to VRN-H1 (HvCSFs1 and HvPHYC). Conclusion: We show that, after employing appropriate statistical methods to correct for population sub-structure, the genome-wide partitioning effect of allelic status at VRN-H1 and VRN-H2 does not result in the high levels of spurious association expected to occur in highly structured samples. Furthermore, we demonstrate that both VRN-H1 and the candidate VRN-H2 genes can be identified using association mapping. Discrimination between intragenic VRN-H1 markers was achieved, indicating that candidate causative polymorphisms may be discerned and prioritised within a larger set of positive associations. This proof of concept study demonstrates the feasibility of association mapping in barley, even within highly structured populations. A major advantage of this method is that it does not require large numbers of genome-wide markers, and is therefore suitable for fine mapping and candidate gene evaluation, especially in species for which large numbers of genetic markers are either unavailable or too costly.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Optimal estimation (OE) improves sea surface temperature (SST) estimated from satellite infrared imagery in the “split-window”, in comparison to SST retrieved using the usual multi-channel (MCSST) or non-linear (NLSST) estimators. This is demonstrated using three months of observations of the Advanced Very High Resolution Radiometer (AVHRR) on the first Meteorological Operational satellite (Metop-A), matched in time and space to drifter SSTs collected on the global telecommunications system. There are 32,175 matches. The prior for the OE is forecast atmospheric fields from the Météo-France global numerical weather prediction system (ARPEGE), the forward model is RTTOV8.7, and a reduced state vector comprising SST and total column water vapour (TCWV) is used. Operational NLSST coefficients give mean and standard deviation (SD) of the difference between satellite and drifter SSTs of 0.00 and 0.72 K. The “best possible” NLSST and MCSST coefficients, empirically regressed on the data themselves, give zero mean difference and SDs of 0.66 K and 0.73 K respectively. Significant contributions to the global SD arise from regional systematic errors (biases) of several tenths of kelvin in the NLSST. With no bias corrections to either prior fields or forward model, the SSTs retrieved by OE minus drifter SSTs have mean and SD of − 0.16 and 0.49 K respectively. The reduction in SD below the “best possible” regression results shows that OE deals with structural limitations of the NLSST and MCSST algorithms. Using simple empirical bias corrections to improve the OE, retrieved minus drifter SSTs are obtained with mean and SD of − 0.06 and 0.44 K respectively. Regional biases are greatly reduced, such that the absolute bias is less than 0.1 K in 61% of 10°-latitude by 30°-longitude cells. OE also allows a statistic of the agreement between modelled and measured brightness temperatures to be calculated. We show that this measure is more efficient than the current system of confidence levels at identifying reliable retrievals, and that the best 75% of satellite SSTs by this measure have negligible bias and retrieval error of order 0.25 K.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper considers the effect of using a GARCH filter on the properties of the BDS test statistic as well as a number of other issues relating to the application of the test. It is found that, for certain values of the user-adjustable parameters, the finite sample distribution of the test is far-removed from asymptotic normality. In particular, when data generated from some completely different model class are filtered through a GARCH model, the frequency of rejection of iid falls, often substantially. The implication of this result is that it might be inappropriate to use non-rejection of iid of the standardised residuals of a GARCH model as evidence that the GARCH model ‘fits’ the data.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Variations in lake area and depth reflect climatically induced changes in the water balance of overflowing as well as closed lakes. A new global data base of lake status has been assembled, and is used to compare two simulations for 6 ka (6000 yr ago) made with successive R15 versions of the NCAR Community Climate Model (CCM). Simulated water balance was expressed as anomalies of annual precipitation minus evaporation (P-E); observed water balance as anomalies of lake status. Comparisons were made visually, by comparing regional averages, and by a statistic that compares the signs of simulated P-E anomalies (smoothly interpolated to the lake sites) with the status anomalies. Both CCM0 and CCM1 showed enhanced Northern-Hemisphere monsoons at 6 ka. Both underestimated the effect, but CCM1 fitted the spatial patterns better. In the northern mid- and high-latitudes the two versions differed more, and fitted the data less satisfactorily. CCM1 performed better than CCM0 in North America and central Eurasia, but not in Europe. Both models (especially CCM0) simulated excessive aridity in interior Eurasia. The models were systematically wrong in the southern mid-latitudes. Problems may have been caused by inadequate treatment of changes in sea-surface conditions in both models. Palaeolake status data will continue to provide a benchmark for the evaluation of modelling improvements.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The presence of 10 virulence genes was examined using polymerase chain reaction (PCR) in 365 European O157 and non-O157 Escherichia coli isolates associated with verotoxin production. Strain-specific PCR data were analysed using hierarchical clustering. The resulting dendrogram clearly separated O157 from non-O157 strains. The former clustered typical high-risk seropathotype (SPT) A strains from all regions, including Sweden and Spain, which were homogenous by Cramer's V statistic, and strains with less typical O157 features mostly from Hungary. The non-O157 strains divided into a high-risk SPTB harbouring O26, O111 and O103 strains, a group pathogenic to pigs, and a group with few virulence genes other than for verotoxin. The data demonstrate SPT designation and selected PCR separated verotoxigenic E. coli of high and low risk to humans; although more virulence genes or pulsed-field gel electrophoresis will need to be included to separate high-risk strains further for epidemiological tracing.