56 results for comparison method
in CentAUR: Central Archive at the University of Reading - UK
Abstract:
This paper introduces a simple futility design that allows a comparative clinical trial to be stopped due to lack of effect at any of a series of planned interim analyses. Stopping due to apparent benefit is not permitted. The design is for use when any positive claim should be based on the maximum sample size, for example to allow subgroup analyses or the evaluation of safety or secondary efficacy responses. A final frequentist analysis can be performed that is valid for the type of design employed. Here the design is described and its properties are presented. Its advantages and disadvantages relative to the use of stochastic curtailment are discussed. Copyright (C) 2003 John Wiley & Sons, Ltd.
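As a rough illustration of how such a futility-only monitoring rule operates, the sketch below checks an interim test statistic against a prespecified futility boundary at each planned look and stops only when the evidence of effect is too weak; it never stops early for benefit, so any positive claim is made at the maximum sample size. The statistics, boundaries and function names are illustrative assumptions, not the paper's design.

```python
# Minimal sketch (not the paper's exact design): futility-only monitoring.
# The interim z-statistics and futility boundaries are illustrative inputs;
# in a real trial the boundaries would be prespecified for the chosen design.
def monitor_futility(interim_z, futility_bounds):
    """Return (stopped_early, analyses_used). Stopping for benefit is never allowed."""
    for k, (z, b) in enumerate(zip(interim_z, futility_bounds), start=1):
        if z < b:                 # evidence of effect too weak -> abandon the trial
            return True, k
        # z >= b: continue to the next planned interim analysis; even a large z
        # does NOT trigger early stopping, so any positive claim uses the full sample.
    return False, len(interim_z)  # reached the final analysis at the maximum sample size

# Example: three interim looks with increasingly demanding futility boundaries.
print(monitor_futility(interim_z=[0.4, 1.1, 1.6], futility_bounds=[0.0, 0.5, 1.0]))
```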
Abstract:
Recent interest in the validation of general circulation models (GCMs) has been devoted to objective methods. A small number of authors have used the direct synoptic identification of phenomena together with a statistical analysis to perform the objective comparison between various datasets. This paper describes a general method for performing the synoptic identification of phenomena that can be used for an objective analysis of atmospheric, or oceanographic, datasets obtained from numerical models and remote sensing. Methods usually associated with image processing have been used to segment the scene and to identify suitable feature points to represent the phenomena of interest. This is performed for each time level. A technique from dynamic scene analysis is then used to link the feature points to form trajectories. The method is fully automatic and should be applicable to a wide range of geophysical fields. An example will be shown of results obtained from this method using data from a run of the Universities Global Atmospheric Modelling Project GCM.
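A minimal sketch of the trajectory-building step described above, assuming feature points have already been extracted at each time level by the segmentation stage; a greedy nearest-neighbour link within a search radius stands in for the dynamic scene analysis technique, and all names and values are illustrative.

```python
# Hedged sketch: link feature points detected at successive time levels into
# trajectories. A greedy nearest-neighbour match within a search radius stands
# in for the dynamic scene analysis step; coordinates are illustrative (x, y).
import math

def link_trajectories(frames, max_dist=5.0):
    """frames: list of lists of (x, y) feature points, one list per time level."""
    tracks = [[p] for p in frames[0]]
    for points in frames[1:]:
        unused = list(points)
        for track in tracks:
            if not unused:
                continue
            last = track[-1]
            nearest = min(unused, key=lambda p: math.dist(p, last))
            if math.dist(nearest, last) <= max_dist:
                track.append(nearest)
                unused.remove(nearest)
        tracks.extend([p] for p in unused)   # unmatched points start new tracks
    return tracks

frames = [[(0, 0), (10, 10)], [(1, 0), (10, 11)], [(2, 1), (11, 11)]]
for track in link_trajectories(frames):
    print(track)
```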
Abstract:
Data from four recent reanalysis projects [ECMWF, NCEP-NCAR, NCEP-Department of Energy (DOE), NASA] have been diagnosed at the scale of synoptic weather systems using an objective feature tracking method. The tracking statistics indicate that, overall, the reanalyses correspond very well in the Northern Hemisphere (NH) lower troposphere, although differences for the spatial distribution of mean intensities show that the ECMWF reanalysis is systematically stronger in the main storm track regions but weaker around major orographic features. A direct comparison of the track ensembles indicates a number of systems with a broad range of intensities that compare well among the reanalyses. In addition, a number of small-scale weak systems are found that have no correspondence among the reanalyses or that only correspond upon relaxing the matching criteria, indicating possible differences in location and/or temporal coherence. These are distributed throughout the storm tracks, particularly in the regions known for small-scale activity, such as secondary development regions and the Mediterranean. For the Southern Hemisphere (SH), agreement is found to be generally less consistent in the lower troposphere with significant differences in both track density and mean intensity. The systems that correspond between the various reanalyses are considerably reduced and those that do not match span a broad range of storm intensities. Relaxing the matching criteria indicates that there is a larger degree of uncertainty in both the location of systems and their intensities compared with the NH. At upper-tropospheric levels, significant differences in the level of activity occur between the ECMWF reanalysis and the other reanalyses in both the NH and SH winters. This occurs due to a lack of coherence in the apparent propagation of the systems in ERA15 and appears most acute above 500 hPa. This is probably due to the use of optimal interpolation data assimilation in ERA15. Also shown are results based on using the same techniques to diagnose the tropical easterly wave activity. Results indicate that the wave activity is sensitive not only to the resolution and assimilation methods used but also to the model formulation.
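A hedged sketch of the kind of track-to-track correspondence test implied above: two tracks from different reanalyses are treated as the same system if they overlap sufficiently in time and their mean separation is small, and "relaxing the matching criteria" amounts to loosening these thresholds. The track format and the threshold values are assumptions for illustration, not the study's exact criteria.

```python
# Hedged sketch: decide whether a storm track from one reanalysis corresponds
# to a track from another. Tracks are lists of (time, lon, lat) points.
import math

def mean_separation(track_a, track_b):
    """Mean great-circle distance (km) over the time steps common to both tracks."""
    b_by_time = {t: (lon, lat) for t, lon, lat in track_b}
    common = [(lon, lat, *b_by_time[t]) for t, lon, lat in track_a if t in b_by_time]
    if not common:
        return None, 0
    radius_km = 6371.0
    dists = []
    for lon1, lat1, lon2, lat2 in common:
        phi1, phi2 = math.radians(lat1), math.radians(lat2)
        dlam = math.radians(lon2 - lon1)
        c = math.sin(phi1) * math.sin(phi2) + math.cos(phi1) * math.cos(phi2) * math.cos(dlam)
        dists.append(radius_km * math.acos(max(-1.0, min(1.0, c))))
    return sum(dists) / len(dists), len(common)

def tracks_match(track_a, track_b, max_mean_sep_km=500.0, min_overlap=4):
    """Relaxing the criteria = raising max_mean_sep_km or lowering min_overlap."""
    sep, overlap = mean_separation(track_a, track_b)
    return sep is not None and overlap >= min_overlap and sep <= max_mean_sep_km

a = [(0, 10.0, -40.0), (6, 12.0, -41.0), (12, 14.0, -42.0), (18, 16.0, -43.0)]
b = [(0, 10.5, -40.2), (6, 12.4, -41.1), (12, 14.5, -42.3), (18, 16.2, -43.2)]
print(tracks_match(a, b))   # True under these illustrative thresholds
```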
Abstract:
We discuss and test the potential usefulness of single-column models (SCMs) for the testing of stochastic physics schemes that have been proposed for use in general circulation models (GCMs). We argue that although single column tests cannot be definitive in exposing the full behaviour of a stochastic method in the full GCM, and although there are differences between SCM testing of deterministic and stochastic methods, nonetheless SCM testing remains a useful tool. It is necessary to consider an ensemble of SCM runs produced by the stochastic method. These can be usefully compared to deterministic ensembles describing initial condition uncertainty and also to combinations of these (with structural model changes) into poor man's ensembles. The proposed methodology is demonstrated using an SCM experiment recently developed by the GCSS community, simulating the transitions between active and suppressed periods of tropical convection.
Abstract:
We discuss and test the potential usefulness of single-column models (SCMs) for the testing of stochastic physics schemes that have been proposed for use in general circulation models (GCMs). We argue that although single column tests cannot be definitive in exposing the full behaviour of a stochastic method in the full GCM, and although there are differences between SCM testing of deterministic and stochastic methods, SCM testing remains a useful tool. It is necessary to consider an ensemble of SCM runs produced by the stochastic method. These can be usefully compared to deterministic ensembles describing initial condition uncertainty and also to combinations of these (with structural model changes) into poor man's ensembles. The proposed methodology is demonstrated using an SCM experiment recently developed by the GCSS (GEWEX Cloud System Study) community, simulating transitions between active and suppressed periods of tropical convection.
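The comparison of ensemble types described above can be sketched as follows, with a placeholder in place of the actual SCM: generate a stochastic-physics ensemble, an initial-condition ensemble and a pooled poor man's ensemble, and compare their means and spreads. All data here are synthetic stand-ins, not SCM output.

```python
# Hedged sketch: compare an ensemble of SCM runs produced by a stochastic scheme
# with a deterministic initial-condition ensemble and a pooled "poor man's"
# ensemble. The random arrays below are placeholders for actual SCM time series.
import numpy as np

def ensemble_stats(runs):
    """runs: array of shape (n_members, n_times); return mean and spread in time."""
    runs = np.asarray(runs)
    return runs.mean(axis=0), runs.std(axis=0, ddof=1)

rng = np.random.default_rng(0)
stochastic_runs = rng.normal(5.0, 1.0, size=(20, 48))      # stochastic-physics members
ic_runs = rng.normal(5.0, 0.5, size=(20, 48))               # initial-condition members
poor_mans = np.vstack([stochastic_runs[:5], ic_runs[:5]])   # pooled multi-source subset

for name, runs in [("stochastic", stochastic_runs), ("IC", ic_runs), ("poor man's", poor_mans)]:
    mean, spread = ensemble_stats(runs)
    print(f"{name}: time-mean spread = {spread.mean():.2f}")
```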
Abstract:
There is a growing interest in using stochastic parametrizations in numerical weather and climate prediction models. Previously, Palmer (2001) outlined the issues that give rise to the need for a stochastic parametrization and the forms such a parametrization could take. In this article a method is presented that uses a comparison between a standard-resolution version and a high-resolution version of the same model to gain information relevant for a stochastic parametrization in that model. A correction term that could be used in a stochastic parametrization is derived from the thermodynamic equations of both models. The origin of the components of this term is discussed. It is found that the component related to unresolved wave-wave interactions is important and can act to compensate for large parametrized tendencies. The correction term is not proportional to the parametrized tendency. Finally, it is explained how the correction term could be used to give information about the shape of the random distribution to be used in a stochastic parametrization. Copyright © 2009 Royal Meteorological Society
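One plausible reading of the comparison is sketched below, assuming the correction term is estimated as a residual: the high-resolution tendency is coarse-grained onto the standard-resolution grid and the standard-resolution model's resolved plus parametrized tendency is subtracted; the distribution of this residual is what could inform the shape of the random distribution in a stochastic parametrization. Grid sizes and fields are illustrative, not taken from the paper.

```python
# Hedged sketch, assuming the correction term is estimated as a residual:
# coarse-grain the high-resolution temperature tendency onto the standard grid
# and subtract the standard-resolution model's resolved + parametrized tendency.
import numpy as np

def coarse_grain(field_hi, factor):
    """Block-average a 1D high-resolution field onto a coarser grid."""
    return field_hi.reshape(-1, factor).mean(axis=1)

def correction_term(dT_dt_hi, dT_dt_resolved_lo, dT_dt_param_lo, factor):
    """Residual tendency on the standard-resolution grid (K/s)."""
    return coarse_grain(dT_dt_hi, factor) - (dT_dt_resolved_lo + dT_dt_param_lo)

# Illustrative data: 512-point high-res tendency vs 64-point standard-res tendencies.
rng = np.random.default_rng(1)
r = correction_term(rng.normal(0, 1e-5, 512), rng.normal(0, 1e-5, 64),
                    rng.normal(0, 1e-5, 64), factor=8)
print(r.mean(), r.std())   # moments of the residual -> candidate stochastic term
```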
Abstract:
We develop the linearization of a semi-implicit semi-Lagrangian model of the one-dimensional shallow-water equations using two different methods. The usual tangent linear model, formed by linearizing the discrete nonlinear model, is compared with a model formed by first linearizing the continuous nonlinear equations and then discretizing. Both models are shown to perform equally well for finite perturbations. However, the asymptotic behaviour of the two models differs as the perturbation size is reduced. This leads to difficulties in showing that the models are correctly coded using the standard tests. To overcome this difficulty we propose a new method for testing linear models, which we demonstrate both theoretically and numerically. © Crown copyright, 2003. Royal Meteorological Society
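For reference, the standard correctness test mentioned above checks that the ratio of the nonlinear perturbation growth to the tangent linear prediction tends to one as the perturbation size is reduced. The sketch below applies that test to a toy nonlinear model and its hand-derived tangent linear, not to the shallow-water models of the paper.

```python
# Hedged sketch of the standard tangent linear correctness test: as the
# perturbation is scaled down, ||N(x + a*dx) - N(x)|| / ||L(a*dx)|| -> 1.
# N and L are toy stand-ins for a nonlinear model and its tangent linear.
import numpy as np

def N(x):                      # toy nonlinear model step
    return x + 0.1 * x**2

def L(x, dx):                  # its tangent linear about state x
    return (1.0 + 0.2 * x) * dx

x = np.array([1.0, 2.0, 3.0])
dx = np.array([0.1, -0.2, 0.05])
for a in [1.0, 1e-1, 1e-2, 1e-4, 1e-6]:
    ratio = np.linalg.norm(N(x + a * dx) - N(x)) / np.linalg.norm(L(x, a * dx))
    print(f"alpha={a:.0e}  ratio={ratio:.8f}")   # should tend to 1 as alpha -> 0
```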
Abstract:
Estimating the magnitude of Agulhas leakage, the volume flux of water from the Indian to the Atlantic Ocean, is difficult because of the presence of other circulation systems in the Agulhas region. Indian Ocean water in the Atlantic Ocean is vigorously mixed and diluted in the Cape Basin. Eulerian integration methods, where the velocity field perpendicular to a section is integrated to yield a flux, have to be calibrated so that only the flux by Agulhas leakage is sampled. Two Eulerian methods for estimating the magnitude of Agulhas leakage are tested within a high-resolution two-way nested model with the goal of devising a mooring-based measurement strategy. At the GoodHope line, a section halfway through the Cape Basin, the integrated velocity perpendicular to that line is compared to the magnitude of Agulhas leakage as determined from the transport carried by numerical Lagrangian floats. In the first method, integration is limited to the flux of water warmer and more saline than specific threshold values. These threshold values are determined by maximizing the correlation with the float-determined time series. By using the threshold values, approximately half of the leakage can be measured directly. The total amount of Agulhas leakage can be estimated using a linear regression, within a 90% confidence band of 12 Sv. In the second method, a subregion of the GoodHope line is sought so that integration over that subregion yields an Eulerian flux as close to the float-determined leakage as possible. It appears that when integration is limited within the model to the upper 300 m of the water column within 900 km of the African coast, the time series have the smallest root-mean-square difference. This method yields a root-mean-square error of only 5.2 Sv but the 90% confidence band of the estimate is 20 Sv. It is concluded that the optimum thermohaline threshold method leads to more accurate estimates even though the directly measured transport is a factor of two lower than the actual magnitude of Agulhas leakage in this model.
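A minimal sketch of the first (thermohaline-threshold) method, under stated assumptions: only velocity through cells warmer and more saline than the thresholds is integrated, and the resulting partial transport is mapped to total leakage with a fitted linear regression. Field values, thresholds and regression coefficients below are illustrative, not the model's.

```python
# Hedged sketch of the thermohaline-threshold method: integrate the
# cross-section velocity only where the water is warmer and more saline than
# the chosen thresholds, then map this partial transport to total leakage via
# a linear regression fitted against a float-based estimate.
import numpy as np

def thresholded_transport(v, temp, salt, dz, dx, t_min, s_min):
    """v, temp, salt: (nz, nx) section fields; dz (nz,), dx (nx,) cell sizes in m.
    Returns transport in Sv (1 Sv = 1e6 m^3/s) through qualifying cells only."""
    mask = (temp > t_min) & (salt > s_min)
    area = np.outer(dz, dx)                      # cell areas (m^2)
    return np.sum(v * area * mask) / 1e6

# Example with synthetic fields; a, b would come from regressing float-derived
# leakage on the thresholded transport over the model time series.
rng = np.random.default_rng(2)
v = rng.normal(0.05, 0.1, (30, 100))
T = rng.normal(12, 4, (30, 100))
S = rng.normal(35.0, 0.3, (30, 100))
partial = thresholded_transport(v, T, S, dz=np.full(30, 10.0),
                                dx=np.full(100, 9000.0), t_min=14.0, s_min=35.2)
a, b = 2.0, 1.0                                  # illustrative regression coefficients
print(f"partial = {partial:.1f} Sv, estimated total leakage = {a * partial + b:.1f} Sv")
```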
Abstract:
The Observing System Research and Predictability Experiment (THORPEX) Interactive Grand Global Ensemble (TIGGE) is a World Weather Research Programme project. One of its main objectives is to enhance collaboration on the development of ensemble prediction between operational centers and universities by increasing the availability of ensemble prediction system (EPS) data for research. This study analyzes the prediction of Northern Hemisphere extratropical cyclones by nine different EPSs archived as part of the TIGGE project for the 6-month time period of 1 February 2008–31 July 2008, which included a sample of 774 cyclones. An objective feature tracking method has been used to identify and track the cyclones along the forecast trajectories. Forecast verification statistics have then been produced [using the European Centre for Medium-Range Weather Forecasts (ECMWF) operational analysis as the truth] for cyclone position, intensity, and propagation speed, showing large differences between the different EPSs. The results show that the ECMWF ensemble mean and control have the highest level of skill for all cyclone properties. The Japan Meteorological Agency (JMA), the National Centers for Environmental Prediction (NCEP), the Met Office (UKMO), and the Canadian Meteorological Centre (CMC) have 1 day less skill for the position of cyclones throughout the forecast range. The relative performance of the different EPSs remains the same for cyclone intensity except for NCEP, which has larger errors than for position. NCEP, the Centro de Previsão de Tempo e Estudos Climáticos (CPTEC), and the Australian Bureau of Meteorology (BoM) all have faster intensity error growth in the earlier part of the forecast. They are also very underdispersive and significantly underpredict intensities, perhaps due to the comparatively low spatial resolutions of these EPSs not being able to accurately model the tilted structure essential to cyclone growth and decay. There is very little difference between the levels of skill of the ensemble mean and control for cyclone position, but the ensemble mean provides an advantage over the control for all EPSs except CPTEC in cyclone intensity and there is an advantage for propagation speed for all EPSs. ECMWF and JMA have an excellent spread–skill relationship for cyclone position. The EPSs are all much more underdispersive for cyclone intensity and propagation speed than for position, with ECMWF and CMC performing best for intensity and CMC performing best for propagation speed. ECMWF is the only EPS to consistently overpredict cyclone intensity, although the bias is small. BoM, NCEP, UKMO, and CPTEC significantly underpredict intensity and, interestingly, all the EPSs underpredict the propagation speed, that is, the cyclones move too slowly on average in all EPSs.
Abstract:
We consider the comparison of two formulations in terms of average bioequivalence using the 2 × 2 cross-over design. In a bioequivalence study, the primary outcome is a pharmacokinetic measure, such as the area under the plasma concentration by time curve, which is usually assumed to have a lognormal distribution. The criterion typically used for claiming bioequivalence is that the 90% confidence interval for the ratio of the means should lie within the interval (0.80, 1.25), or equivalently the 90% confidence interval for the differences in the means on the natural log scale should be within the interval (-0.2231, 0.2231). We compare the gold standard method for calculation of the sample size based on the non-central t distribution with those based on the central t and normal distributions. In practice, the differences between the various approaches are likely to be small. Further approximations to the power function are sometimes used to simplify the calculations. These approximations should be used with caution, because the sample size required for a desirable level of power might be under- or overestimated compared to the gold standard method. However, in some situations the approximate methods produce very similar sample sizes to the gold standard method. Copyright © 2005 John Wiley & Sons, Ltd.
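A hedged sketch of the kind of calculation involved: the power of the two one-sided tests (TOST) procedure for a 2 × 2 crossover is approximated with the non-central t distribution, and the sample size is found by increasing n until the target power is reached. The conservative "P1 + P2 − 1" combination, the CV-to-sigma conversion and all default values are assumptions for illustration, not the paper's exact formulae.

```python
# Hedged sketch: approximate power/sample size for average bioequivalence (TOST)
# in a 2x2 crossover, using the non-central t distribution. The lower-bound
# combination of the two one-sided test probabilities is an illustrative choice.
import numpy as np
from scipy import stats

def tost_power(n_per_seq, cv, gmr=0.95, alpha=0.05, limits=(0.80, 1.25)):
    """Approximate power of the TOST procedure for a 2x2 crossover design."""
    sigma_w = np.sqrt(np.log(1.0 + cv**2))         # within-subject SD on the log scale
    theta = np.log(gmr)                            # true log mean ratio
    lo, hi = np.log(limits[0]), np.log(limits[1])  # +/-0.2231 for (0.80, 1.25)
    df = 2 * n_per_seq - 2
    se = sigma_w / np.sqrt(n_per_seq)              # SE of the estimated log difference
    tcrit = stats.t.ppf(1 - alpha, df)
    p1 = 1 - stats.nct.cdf(tcrit, df, (theta - lo) / se)
    p2 = stats.nct.cdf(-tcrit, df, (theta - hi) / se)
    return max(p1 + p2 - 1, 0.0)                   # conservative combination

def sample_size(cv, gmr=0.95, target=0.90, **kw):
    n = 2
    while tost_power(n, cv, gmr, **kw) < target:
        n += 1
    return 2 * n                                   # total subjects

print(sample_size(cv=0.25))   # e.g. total N for CV = 25%, GMR = 0.95, 90% power
```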
Abstract:
Sequential methods provide a formal framework by which clinical trial data can be monitored as they accumulate. The results from interim analyses can be used either to modify the design of the remainder of the trial or to stop the trial as soon as sufficient evidence of either the presence or absence of a treatment effect is available. The circumstances under which the trial will be stopped with a claim of superiority for the experimental treatment must, however, be determined in advance so as to control the overall type I error rate. One approach to calculating the stopping rule is the group-sequential method. A relatively recent alternative to group-sequential approaches is the adaptive design method. This latter approach provides considerable flexibility in changes to the design of a clinical trial at an interim point. However, a criticism is that the method by which evidence from different parts of the trial is combined means that a final comparison of treatments is not based on a sufficient statistic for the treatment difference, suggesting that the method may lack power. The aim of this paper is to compare two adaptive design approaches with the group-sequential approach. We first compare the form of the stopping boundaries obtained using the different methods. We then focus on a comparison of the power of the different trials when they are designed so as to be as similar as possible. We conclude that all methods acceptably control type I error rate and power when the sample size is modified based on a variance estimate, provided no interim analysis is so small that the asymptotic properties of the test statistic no longer hold. In the latter case, the group-sequential approach is to be preferred. Provided that asymptotic assumptions hold, the adaptive design approaches control the type I error rate even if the sample size is adjusted on the basis of an estimate of the treatment effect, showing that the adaptive designs allow more modifications than the group-sequential method.
Abstract:
MOTIVATION: The accurate prediction of the quality of 3D models is a key component of successful protein tertiary structure prediction methods. Currently, clustering or consensus based Model Quality Assessment Programs (MQAPs) are the most accurate methods for predicting 3D model quality; however they are often CPU intensive as they carry out multiple structural alignments in order to compare numerous models. In this study, we describe ModFOLDclustQ - a novel MQAP that compares 3D models of proteins without the need for CPU intensive structural alignments by utilising the Q measure for model comparisons. The ModFOLDclustQ method is benchmarked against the top established methods in terms of both accuracy and speed. In addition, the ModFOLDclustQ scores are combined with those from our older ModFOLDclust method to form a new method, ModFOLDclust2, that aims to provide increased prediction accuracy with negligible computational overhead. RESULTS: The ModFOLDclustQ method is competitive with leading clustering based MQAPs for the prediction of global model quality, yet it is up to 150 times faster than the previous version of the ModFOLDclust method at comparing models of small proteins (<60 residues) and over 5 times faster at comparing models of large proteins (>800 residues). Furthermore, a significant improvement in accuracy can be gained over the previous clustering based MQAPs by combining the scores from ModFOLDclustQ and ModFOLDclust to form the new ModFOLDclust2 method, with little impact on the overall time taken for each prediction. AVAILABILITY: The ModFOLDclustQ and ModFOLDclust2 methods are available to download from: http://www.reading.ac.uk/bioinf/downloads/ CONTACT: l.j.mcguffin@reading.ac.uk.
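A hedged sketch of a Q-type model-versus-model score: because the two models share a sequence, their intramolecular Cα–Cα distance matrices can be compared directly, so no superposition (structural alignment) is needed. This is one common form of such a score; the exact variant used by ModFOLDclustQ may differ, and all parameters below are illustrative.

```python
# Hedged sketch of a Q-type score comparing two 3D models of the same protein
# without structural alignment: compare their internal Calpha-Calpha distances.
import numpy as np

def q_score(ca_a, ca_b, sep_exponent=0.15):
    """ca_a, ca_b: (N, 3) Calpha coordinates of two models of the same sequence."""
    d_a = np.linalg.norm(ca_a[:, None, :] - ca_a[None, :, :], axis=-1)
    d_b = np.linalg.norm(ca_b[:, None, :] - ca_b[None, :, :], axis=-1)
    n = len(ca_a)
    i, j = np.triu_indices(n, k=2)               # skip adjacent residue pairs
    sigma = np.abs(i - j).astype(float) ** sep_exponent
    return float(np.mean(np.exp(-((d_a[i, j] - d_b[i, j]) ** 2) / (2.0 * sigma ** 2))))

# Identical models score 1; increasingly different models score closer to 0.
rng = np.random.default_rng(3)
model_a = rng.normal(size=(50, 3)) * 10
print(q_score(model_a, model_a), q_score(model_a, model_a + rng.normal(0, 1, (50, 3))))
```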
Abstract:
The abattoir and the fallen stock surveys constitute the active surveillance component aimed at improving the detection of scrapie across the European Union. Previous studies have suggested the occurrence of significant differences in the operation of the surveys across the EU. In the present study we assessed the standardisation of the surveys over time across the EU and identified clusters of countries with similar underlying characteristics allowing comparisons between them. In the absence of sufficient covariate information to explain the observed variability across countries, we modelled the unobserved heterogeneity by means of non-parametric distributions on the risk ratios of the fallen stock over the abattoir survey. More specifically, we used the profile likelihood method on 2003, 2004 and 2005 active surveillance data for 18 European countries on classical scrapie, and on 2004 and 2005 data for atypical scrapie separately. We extended our analyses to include the limited covariate information available, more specifically, the proportion of the adult sheep population sampled by the fallen stock survey every year. Our results show that the between-country heterogeneity dropped in 2004 and 2005 relative to that of 2003 for classical scrapie. As a consequence, the number of clusters in the last two years was also reduced, indicating the gradual standardisation of the surveillance efforts across the EU. The crude analyses of the atypical data grouped all the countries in one cluster and showed non-significant gain in the detection of this type of scrapie by either of the two sources. The proportion of the population sampled by the fallen stock appeared significantly associated with our risk ratio for both types of scrapie, although in opposite directions: negative for classical and positive for atypical. The initial justification for the fallen stock, targeting a high-risk population to increase the likelihood of case finding, appears compromised for both types of scrapie in some countries.
Abstract:
We focus on the comparison of three statistical models used to estimate the treatment effect in meta-analysis when individually pooled data are available. The models are two conventional models, namely a multi-level model and a model based upon an approximate likelihood, and a newly developed model, the profile likelihood model, which might be viewed as an extension of the Mantel-Haenszel approach. To exemplify these methods, we use results from a meta-analysis of 22 trials to prevent respiratory tract infections. We show that by using the multi-level approach, in the case of baseline heterogeneity, the number of clusters or components is considerably over-estimated. The approximate and profile likelihood methods showed nearly the same pattern for the treatment effect distribution. To provide more evidence, two simulation studies are carried out. The profile likelihood can be considered a clear alternative to the approximate likelihood model. In the case of strong baseline heterogeneity, the profile likelihood method shows superior behaviour when compared with the multi-level model. Copyright (C) 2006 John Wiley & Sons, Ltd.
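For orientation, a minimal sketch of a generic approximate-likelihood random-effects model of the kind referred to above: each trial's estimated effect is treated as normal with known within-trial variance plus a normal between-trial random effect, and the pooled effect and heterogeneity variance are found by maximising the marginal likelihood. The data and the formulation are illustrative and do not reproduce the paper's profile likelihood analysis.

```python
# Hedged sketch of an approximate-likelihood random-effects meta-analysis:
# y_i ~ Normal(mu, v_i + tau^2) with known within-trial variances v_i;
# (mu, tau^2) are estimated by maximising the marginal likelihood.
import numpy as np
from scipy.optimize import minimize

y = np.array([-0.4, -0.1, -0.6, 0.1, -0.3])     # trial log odds ratios (illustrative)
v = np.array([0.05, 0.10, 0.08, 0.12, 0.06])    # their within-trial variances

def neg_loglik(params):
    mu, log_tau2 = params
    var = v + np.exp(log_tau2)                  # marginal variance of each y_i
    return 0.5 * np.sum(np.log(2 * np.pi * var) + (y - mu) ** 2 / var)

res = minimize(neg_loglik, x0=[0.0, np.log(0.01)], method="Nelder-Mead")
mu_hat, tau2_hat = res.x[0], np.exp(res.x[1])
print(f"pooled effect = {mu_hat:.3f}, between-trial variance = {tau2_hat:.3f}")
```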