983 results for Statistical Prediction
Abstract:
Aim This study used data from temperate forest communities to assess: (1) five different stepwise selection methods with generalized additive models, (2) the effect of weighting absences to ensure a prevalence of 0.5, (3) the effect of limiting absences beyond the environmental envelope defined by presences, (4) four different methods for incorporating spatial autocorrelation, and (5) the effect of integrating an interaction factor defined by a regression tree on the residuals of an initial environmental model. Location State of Vaud, western Switzerland. Methods Generalized additive models (GAMs) were fitted using the grasp package (generalized regression analysis and spatial predictions, http://www.cscf.ch/grasp). Results Model selection based on cross-validation appeared to be the best compromise between model stability and performance (parsimony) among the five methods tested. Weighting absences returned models that perform better than models fitted with the original sample prevalence. This appeared to be mainly due to the impact of very low prevalence values on evaluation statistics. Removing zeroes beyond the range of presences on main environmental gradients changed the set of selected predictors, and potentially their response curve shape. Moreover, removing zeroes slightly improved model performance and stability when compared with the baseline model on the same data set. Incorporating a spatial trend predictor improved model performance and stability significantly. Even better models were obtained when including local spatial autocorrelation. A novel approach to include interactions proved to be an efficient way to account for interactions between all predictors at once. 
Main conclusions Models and spatial predictions of 18 forest communities were significantly improved by using either: (1) cross-validation as a model selection method, (2) weighted absences, (3) limited absences, (4) predictors accounting for spatial autocorrelation, or (5) a factor variable accounting for interactions between all predictors. The final choice of model strategy should depend on the nature of the available data and the specific study aims. Statistical evaluation is useful in searching for the best modelling practice. However, one should not neglect to consider the shapes and interpretability of response curves, as well as the resulting spatial predictions in the final assessment.
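The absence-weighting step in point (2) can be sketched as follows. This is a minimal illustration, not the grasp package's implementation: it assumes presences keep weight 1 and absences share a common weight chosen so that the weighted prevalence equals 0.5. The function name `prevalence_weights` and the sample data are hypothetical.

```python
def prevalence_weights(labels):
    """Return one weight per observation so that weighted prevalence is 0.5.

    Assumption: presences (1) keep weight 1.0 and every absence (0) gets
    the same weight n_presences / n_absences.
    """
    n_pres = sum(1 for y in labels if y == 1)
    n_abs = len(labels) - n_pres
    if n_pres == 0 or n_abs == 0:
        raise ValueError("need at least one presence and one absence")
    w_abs = n_pres / n_abs
    return [1.0 if y == 1 else w_abs for y in labels]


# Hypothetical sample: 2 presences, 6 absences (raw prevalence 0.25).
labels = [1, 1, 0, 0, 0, 0, 0, 0]
weights = prevalence_weights(labels)

# Weighted prevalence = weighted presences / total weight.
weighted_presence = sum(w for w, y in zip(weights, labels) if y == 1)
weighted_prevalence = weighted_presence / sum(weights)
print(weighted_prevalence)
```

The resulting weights could then be passed to any model-fitting routine that accepts per-observation weights.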
Abstract:
The predictive potential of six selected factors was assessed in 72 patients with primary myelodysplastic syndrome using univariate and multivariate logistic regression analysis of survival at 18 months. Factors were age (above median of 69 years), dysplastic features in the three myeloid bone marrow cell lineages, presence of chromosome defects, all metaphases abnormal, double or complex chromosome defects (C23), and a Bournemouth score of 2, 3, or 4 (B234). In the multivariate approach, B234 and C23 proved to be significantly associated with a reduction in the survival probability. The similarity of the regression coefficients associated with these two factors means that they have about the same weight. Consequently, the model was simplified by counting the number of factors (0, 1, or 2) present in each patient, thus generating a scoring system called the Lausanne-Bournemouth score (LB score). The LB score combines the well-recognized and easy-to-use Bournemouth score (B score) with the chromosome defect complexity, C23 constituting an additional indicator of patient outcome. The predicted risk of death within 18 months calculated from the model is as follows: 7.1% (confidence interval: 1.7-24.8) for patients with an LB score of 0, 60.1% (44.7-73.8) for an LB score of 1, and 96.8% (84.5-99.4) for an LB score of 2. The scoring system presented here has several interesting features. The LB score may improve the predictive value of the B score, as it is able to recognize two prognostic groups in the intermediate risk category of patients with B scores of 2 or 3. It also has the ability to identify two distinct prognostic subclasses among RAEB and possibly CMML patients. In addition to its usefulness in prognostic evaluation described above, the LB score may bring new insights into the understanding of evolution patterns in MDS.
We used the combination of the B score and chromosome complexity to define four classes which may be considered four possible states of myelodysplasia and which describe two distinct evolutional pathways.
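The LB score described above is a simple count of two binary factors, which can be sketched as a lookup. The risk figures are those quoted in the abstract; the function names and the assumption that the Bournemouth score ranges from 0 to 4 are illustrative and not taken from the paper.

```python
# Predicted 18-month risk of death (%, lower CI, upper CI) per LB score,
# as reported in the abstract.
LB_RISK = {
    0: (7.1, 1.7, 24.8),
    1: (60.1, 44.7, 73.8),
    2: (96.8, 84.5, 99.4),
}


def lb_score(bournemouth_score, double_or_complex_defect):
    """Count how many of the two factors (B234, C23) are present."""
    if not 0 <= bournemouth_score <= 4:  # assumed valid range of the B score
        raise ValueError("Bournemouth score is assumed to lie in 0..4")
    b234 = bournemouth_score >= 2          # B score of 2, 3, or 4
    c23 = bool(double_or_complex_defect)   # double or complex defects
    return int(b234) + int(c23)


def predicted_risk(score):
    """Return (risk %, CI low, CI high) for an LB score of 0, 1, or 2."""
    return LB_RISK[score]


# Hypothetical patient: B score 3, no double/complex chromosome defect.
score = lb_score(3, False)
print(score, predicted_risk(score))
```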
Abstract:
Uncertainty quantification of petroleum reservoir models is a current challenge that is usually approached with a wide range of geostatistical tools linked with statistical optimisation and/or inference algorithms. The paper considers a data-driven approach to modelling uncertainty in spatial predictions. The proposed semi-supervised Support Vector Regression (SVR) model has demonstrated its capability to represent realistic features and describe stochastic variability and non-uniqueness of spatial properties. It is able to capture and preserve key spatial dependencies such as connectivity, which is often difficult to achieve with two-point geostatistical models. Semi-supervised SVR is designed to integrate various kinds of conditioning data and learn dependencies from them. A stochastic semi-supervised SVR model is integrated into a Bayesian framework to quantify uncertainty with multiple models fitted to dynamic observations. The developed approach is illustrated with a reservoir case study. The resulting probabilistic production forecasts are described by uncertainty envelopes.
Abstract:
Ground clutter caused by anomalous propagation (anaprop) can seriously affect radar rain rate estimates, particularly in fully automatic radar processing systems, and, if not filtered, can produce frequent false alarms. A statistical study of anomalous propagation detected from two operational C-band radars in the northern Italian region of Emilia Romagna is discussed, paying particular attention to its diurnal and seasonal variability. The analysis shows a high incidence of anaprop in summer, mainly in the morning and evening, due to the humid and hot summer climate of the Po Valley, particularly in the coastal zone. A comparison between different techniques and datasets to retrieve the vertical profile of the refractive index gradient in the boundary layer is also presented. In particular, their capability to detect anomalous propagation conditions is compared. Furthermore, beam path trajectories are simulated using a multilayer ray-tracing model and the influence of the propagation conditions on the beam trajectory and shape is examined. High resolution radiosounding data are identified as the best available dataset to accurately reproduce the local propagation conditions, while lower resolution standard TEMP data suffer from interpolation degradation and Numerical Weather Prediction model data (Lokal Model) are able to retrieve a tendency to superrefraction but not to detect ducting conditions. From the ray tracing of the centre, lower and upper limits of the radar antenna 3-dB half-power main beam lobe, it is concluded that ducting layers produce a change in the measured volume and in the power distribution that can lead to an additional error in the reflectivity estimate and, subsequently, in the estimated rainfall rate.
Abstract:
Statistical models allow the representation of data sets and the estimation and/or prediction of the behavior of a given variable through its interaction with the other variables involved in a phenomenon. Among these models are autoregressive state-space (ARSS) models and linear regression (LR) models, which allow the quantification of relationships among soil-plant-atmosphere system variables. To compare the quality of ARSS and LR models of the relationships between soybean yield and soil physical properties, this study used Akaike's Information Criterion, which provides a coefficient for selecting the best model. The data sets were sampled in a Rhodic Acrudox soil, along a spatial transect with 84 points spaced 3 m apart. At each sampling point, soybean samples were collected for yield quantification. At the same site, soil penetration resistance was also measured and soil samples were collected to measure soil bulk density in the 0-0.10 m and 0.10-0.20 m layers. Results showed autocorrelation and a cross-correlation structure of soybean yield and soil penetration resistance data. Soil bulk density data, however, were only autocorrelated in the 0-0.10 m layer and not cross-correlated with soybean yield. According to Akaike's Information Criterion, the autoregressive state-space models were more efficient than the equivalent simple and multiple linear regression models: their criterion values were lower than those obtained by the regression models for all combinations of explanatory variables.
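The model comparison above relies on Akaike's Information Criterion, whose standard form is AIC = 2k - 2 ln L (k parameters, maximized likelihood L); the model with the lower value is preferred. A minimal sketch follows. The log-likelihoods and parameter counts are hypothetical placeholders, not values from the study.

```python
def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical fits: name -> (maximized log-likelihood, number of parameters).
candidates = {
    "LR": (-120.0, 3),
    "ARSS": (-105.0, 5),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)  # model with the lowest AIC
print(scores, best)
```

Note that the criterion penalizes the extra parameters of the more complex model, so a lower AIC indicates a better fit even after that penalty.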
Abstract:
The problem of prediction is considered in a multidimensional setting. Extending an idea presented by Barndorff-Nielsen and Cox, a predictive density for a multivariate random variable of interest is proposed. This density has the form of an estimative density plus a correction term. It gives simultaneous prediction regions with coverage error of smaller asymptotic order than the estimative density. A simulation study is also presented showing the magnitude of the improvement with respect to the estimative method.
Abstract:
Cardiovascular risk assessment might be improved by adding emerging tests derived from atherosclerosis imaging, laboratory tests or functional tests. This article reviews relative risk, odds ratios, receiver operating characteristic curves, posttest risk calculations based on likelihood ratios, the net reclassification improvement and the integrated discrimination improvement. These measures help determine whether a new test adds clinical value on top of conventional risk testing and how this can be verified statistically. Two clinically meaningful examples serve to illustrate novel approaches. This work serves as a review and a basis for the development of new guidelines on cardiovascular risk prediction, taking into account emerging tests, to be proposed in the future by members of the 'Taskforce on Vascular Risk Prediction' under the auspices of the Working Group 'Swiss Atherosclerosis' of the Swiss Society of Cardiology.
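Two of the measures listed above lend themselves to a short worked example: the posttest risk obtained from a likelihood ratio, and the net reclassification improvement (NRI) in its common categorical form. This is a sketch using textbook definitions; the function names and all numbers are hypothetical and do not come from the article.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert pretest risk to posttest risk via odds x likelihood ratio."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


def net_reclassification_improvement(events_up, events_down, n_events,
                                     nonevents_up, nonevents_down, n_nonevents):
    """Categorical NRI: net correct reclassification in events plus nonevents.

    'up'/'down' count subjects moved to a higher/lower risk category
    by the new test.
    """
    nri_events = (events_up - events_down) / n_events
    nri_nonevents = (nonevents_down - nonevents_up) / n_nonevents
    return nri_events + nri_nonevents


# Hypothetical: 10% pretest risk and a positive test with LR+ = 4.
risk_after_test = posttest_probability(0.10, 4.0)

# Hypothetical reclassification table: 100 events, 400 nonevents.
nri = net_reclassification_improvement(20, 5, 100, 40, 60, 400)
print(round(risk_after_test, 3), round(nri, 3))
```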
Abstract:
BACKGROUND AND PURPOSE: To determine whether infarct core or penumbra is the more significant predictor of outcome in acute ischemic stroke, and whether the results are affected by the statistical method used. METHODS: Clinical and imaging data were collected in 165 patients with acute ischemic stroke. We reviewed the noncontrast head computed tomography (CT) to determine the Alberta Stroke Program Early CT Score and assess for hyperdense middle cerebral artery. We reviewed CT-angiogram for site of occlusion and collateral flow score. From perfusion-CT, we calculated the volumes of infarct core and ischemic penumbra. Recanalization status was assessed on early follow-up imaging. Clinical data included age, several time points, National Institutes of Health Stroke Scale at admission, treatment type, and modified Rankin score at 90 days. Two multivariate regression analyses were conducted to determine which variables predicted outcome best. In the first analysis, we did not include recanalization status among the potential predicting variables. In the second, we included recanalization status and its interaction with perfusion-CT variables. RESULTS: Among the 165 study patients, 76 had a good outcome (modified Rankin score ≤2) and 89 had a poor outcome (modified Rankin score >2). In our first analysis, the most important predictors were age (P<0.001) and National Institutes of Health Stroke Scale at admission (P=0.001). The imaging variables were not important predictors of outcome (P>0.05). In the second analysis, when the recanalization status and its interaction with perfusion-CT variables were included, recanalization status and perfusion-CT penumbra volume became the significant predictors (P<0.001). CONCLUSIONS: Imaging prediction of tissue fate, more specifically imaging of the ischemic penumbra, matters only if recanalization can also be predicted.
Abstract:
Asthma prevalence in children and adolescents in Spain is 10-17%. It is the most common chronic illness during childhood. Prevalence has been increasing over the last 40 years and there is considerable evidence that, among other factors, continued exposure to cigarette smoke results in asthma in children. No statistical or simulation model exists to forecast the evolution of childhood asthma in Europe. Such a model needs to incorporate the main risk factors that can be managed by medical authorities, such as tobacco (OR = 1.44), to establish how they affect the present generation of children. A simulation model using conditional probability and discrete event simulation for childhood asthma was developed and validated by simulating realistic scenarios. The parameters used for the model (input data) were those found in the literature, especially those related to the incidence of smoking in Spain. We also used data from a panel of experts from the Hospital del Mar (Barcelona) related to actual evolution and asthma phenotypes. The results obtained from the simulation established a threshold of a 15-20% smoking population for a reduction in the prevalence of asthma. This is still far from the current level in Spain, where 24% of people smoke. We conclude that more effort must be made to combat smoking and other childhood asthma risk factors, in order to significantly reduce the number of cases. Once completed, this simulation methodology can realistically be used to forecast the evolution of childhood asthma as a function of variation in different risk factors.
Abstract:
This paper presents general regression neural networks (GRNN) as a nonlinear regression method for the interpolation of monthly wind speeds in complex Alpine orography. GRNN is trained using data from Swiss meteorological networks to learn the statistical relationship between topographic features and wind speed. The terrain convexity, slope and exposure are considered by extracting features from the digital elevation model at different spatial scales using specialised convolution filters. A database of gridded monthly wind speeds is then constructed by applying GRNN in prediction mode during the period 1968-2008. This study demonstrates that using topographic features as inputs in GRNN significantly reduces cross-validation errors with respect to low-dimensional models integrating only geographical coordinates and terrain height for the interpolation of wind speed. The spatial predictability of wind speed is found to be lower in summer than in winter due to more complex and weaker wind-topography relationships. The relevance of these relationships is studied using an adaptive version of the GRNN algorithm, which selects the useful terrain features by eliminating the noisy ones. This research provides a framework for extending the low-dimensional interpolation models to high-dimensional spaces by integrating additional features accounting for the topographic conditions at multiple spatial scales. Copyright (c) 2012 Royal Meteorological Society.
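In its basic form, a GRNN prediction is a Gaussian-kernel weighted average of the training targets (equivalent to Nadaraya-Watson kernel regression). The sketch below shows that basic form only; it is not the adaptive, feature-selecting variant the paper describes, and the function name, the single shared bandwidth `sigma`, and the toy data are assumptions made for illustration.

```python
import math


def grnn_predict(train_x, train_y, query, sigma):
    """Basic GRNN: Gaussian-kernel weighted mean of the training targets."""
    weights = []
    for x in train_x:
        # Squared Euclidean distance between the query and a training point.
        d2 = sum((a - b) ** 2 for a, b in zip(x, query))
        weights.append(math.exp(-d2 / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase sigma")
    return sum(w * y for w, y in zip(weights, train_y)) / total


# Toy data: two stations described by (feature_1, feature_2) -> wind speed.
train_x = [(0.0, 0.0), (2.0, 0.0)]
train_y = [1.0, 3.0]

midpoint = grnn_predict(train_x, train_y, (1.0, 0.0), sigma=1.0)
near_first = grnn_predict(train_x, train_y, (0.0, 0.0), sigma=0.1)
print(midpoint, near_first)
```

The bandwidth `sigma` controls smoothness; in practice it would typically be tuned by cross-validation, which is also how the paper evaluates its models.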
Abstract:
In this commentary, we argue that the term 'prediction' is overused when in fact, referring to foundational writings of de Finetti, the corresponding term should be inference. In particular, we intend (i) to summarize and clarify relevant subject matter on prediction from established statistical theory, and (ii) to point out the logic of this understanding with respect to practical uses of the term prediction. Written from an interdisciplinary perspective, associating statistics and forensic science as an example, this discussion also connects to related fields such as medical diagnosis and other areas of application where reasoning based on scientific results is practiced in societally relevant contexts. This includes forensic psychology, which uses prediction as part of its vocabulary when dealing with matters that arise in the course of legal proceedings.
Abstract:
Automobile bodily injury (BI) claims remain unsettled for a long time after the accident. The estimation of an accurate reserve for Reported But Not Settled (RBNS) claims is therefore vital for insurers. In accordance with the recommendation included in the Solvency II project (CEIOPS, 2007) a statistical model is here implemented for RBNS reserve estimation. Lognormality on empirical compensation cost data is observed for different levels of BI severity. The individual claim provision is estimated by allocating the expected mean compensation for the predicted severity of the victim’s injury, for which the upper bound is also computed. The BI severity is predicted by means of a heteroscedastic multiple choice model, because empirical evidence has found that the variability in the latent severity of injured individuals travelling by car is not constant. It is shown that this methodology can improve the accuracy of RBNS reserve estimation at all stages, as compared to the subjective assessment that has traditionally been made by practitioners.
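The provision step above uses the expected compensation under a lognormal cost model. For a lognormal variable with log-scale parameters mu and sigma, the mean is exp(mu + sigma^2 / 2). The sketch below computes that mean and, as an assumption about what "upper bound" means here, an upper quantile exp(mu + z * sigma). The parameter values and function names are hypothetical and not taken from the paper.

```python
import math
from statistics import NormalDist


def lognormal_mean(mu, sigma):
    """Expected value of a lognormal variable with log-mean mu, log-sd sigma."""
    return math.exp(mu + sigma ** 2 / 2.0)


def lognormal_quantile(mu, sigma, level):
    """Quantile of the lognormal distribution at the given probability level."""
    z = NormalDist().inv_cdf(level)  # standard normal quantile
    return math.exp(mu + z * sigma)


# Hypothetical log-scale parameters for one injury severity level.
mu, sigma = 8.0, 1.0
expected_cost = lognormal_mean(mu, sigma)
upper_bound = lognormal_quantile(mu, sigma, 0.95)  # assumed 95% upper bound
print(round(expected_cost, 2), round(upper_bound, 2))
```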
Abstract:
Asian rust of soybean [Glycine max (L.) Merrill] is one of the most important fungal diseases of this crop worldwide. The recent introduction of Phakopsora pachyrhizi Syd. & P. Syd. in the Americas represents a major threat to soybean production in the main growing regions, and significant losses have already been reported. P. pachyrhizi is extremely aggressive under favorable weather conditions, causing rapid plant defoliation. Epidemiological studies, under both controlled and natural environmental conditions, have been done for several decades with the aim of elucidating factors that affect the disease cycle as a basis for disease modeling. The recent spread of Asian soybean rust to major production regions in the world has promoted new development, testing and application of mathematical models to assess the risk and predict the disease. These efforts have included the integration of new data, epidemiological knowledge, statistical methods, and advances in computer simulation to develop models and systems with different spatial and temporal scales, objectives and audiences. In this review, we present a comprehensive discussion of the models and systems that have been tested to predict and assess the risk of Asian soybean rust. Limitations, uncertainties and challenges for modelers are also discussed.
Abstract:
This study aimed to investigate the potential use of magnetic susceptibility (MS) as a pedotransfer function to predict soil attributes under two sugarcane harvesting management systems. In each of two 1 ha areas (one with mechanized green sugarcane harvesting and the other with manual burnt sugarcane harvesting), 126 soil samples were collected and subjected to laboratory analysis to determine soil physical, chemical and mineralogical attributes and to measure MS. Data were summarized with descriptive statistics (mean and coefficient of variation). Means under the two harvesting management systems were compared using the Tukey test at a significance level of 5%. Correlation tests were used to investigate the relationship between MS and other soil properties, and multiple linear regressions were used to assess how MS contributes to the prediction of complex soil attributes. The results demonstrate that, in both sugarcane harvesting management systems, MS was statistically correlated with chemical, physical and mineralogical soil attributes and showed potential to be used as a pedotransfer function to predict attributes of the studied Oxisol.