963 resultados para Regression methods


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Random coefficient regression models have been applied in differentfields and they constitute a unifying setup for many statisticalproblems. The nonparametric study of this model started with Beranand Hall (1992) and it has become a fruitful framework. In thispaper we propose and study statistics for testing a basic hypothesisconcerning this model: the constancy of coefficients. The asymptoticbehavior of the statistics is investigated and bootstrapapproximations are used in order to determine the critical values ofthe test statistics. A simulation study illustrates the performanceof the proposals.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Aim This study used data from temperate forest communities to assess: (1) five different stepwise selection methods with generalized additive models, (2) the effect of weighting absences to ensure a prevalence of 0.5, (3) the effect of limiting absences beyond the environmental envelope defined by presences, (4) four different methods for incorporating spatial autocorrelation, and (5) the effect of integrating an interaction factor defined by a regression tree on the residuals of an initial environmental model. Location State of Vaud, western Switzerland. Methods Generalized additive models (GAMs) were fitted using the grasp package (generalized regression analysis and spatial predictions, http://www.cscf.ch/grasp). Results Model selection based on cross-validation appeared to be the best compromise between model stability and performance (parsimony) among the five methods tested. Weighting absences returned models that perform better than models fitted with the original sample prevalence. This appeared to be mainly due to the impact of very low prevalence values on evaluation statistics. Removing zeroes beyond the range of presences on main environmental gradients changed the set of selected predictors, and potentially their response curve shape. Moreover, removing zeroes slightly improved model performance and stability when compared with the baseline model on the same data set. Incorporating a spatial trend predictor improved model performance and stability significantly. Even better models were obtained when including local spatial autocorrelation. A novel approach to include interactions proved to be an efficient way to account for interactions between all predictors at once. Main conclusions Models and spatial predictions of 18 forest communities were significantly improved by using either: (1) cross-validation as a model selection method, (2) weighted absences, (3) limited absences, (4) predictors accounting for spatial autocorrelation, or (5) a factor variable accounting for interactions between all predictors. The final choice of model strategy should depend on the nature of the available data and the specific study aims. Statistical evaluation is useful in searching for the best modelling practice. However, one should not neglect to consider the shapes and interpretability of response curves, as well as the resulting spatial predictions in the final assessment.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In the fixed design regression model, additional weights areconsidered for the Nadaraya--Watson and Gasser--M\"uller kernel estimators.We study their asymptotic behavior and the relationships between new andclassical estimators. For a simple family of weights, and considering theIMSE as global loss criterion, we show some possible theoretical advantages.An empirical study illustrates the performance of the weighted estimatorsin finite samples.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The objective of this paper is to compare the performance of twopredictive radiological models, logistic regression (LR) and neural network (NN), with five different resampling methods. One hundred and sixty-seven patients with proven calvarial lesions as the only known disease were enrolled. Clinical and CT data were used for LR and NN models. Both models were developed with cross validation, leave-one-out and three different bootstrap algorithms. The final results of each model were compared with error rate and the area under receiver operating characteristic curves (Az). The neural network obtained statistically higher Az than LR with cross validation. The remaining resampling validation methods did not reveal statistically significant differences between LR and NN rules. The neural network classifier performs better than the one based on logistic regression. This advantage is well detected by three-fold cross-validation, but remains unnoticed when leave-one-out or bootstrap algorithms are used.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper proposes a common and tractable framework for analyzingdifferent definitions of fixed and random effects in a contant-slopevariable-intercept model. It is shown that, regardless of whethereffects (i) are treated as parameters or as an error term, (ii) areestimated in different stages of a hierarchical model, or whether (iii)correlation between effects and regressors is allowed, when the sameinformation on effects is introduced into all estimation methods, theresulting slope estimator is also the same across methods. If differentmethods produce different results, it is ultimately because differentinformation is being used for each methods.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper shows how recently developed regression-based methods for thedecomposition of health inequality can be extended to incorporateindividual heterogeneity in the responses of health to the explanatoryvariables. We illustrate our method with an application to the CanadianNPHS of 1994. Our strategy for the estimation of heterogeneous responsesis based on the quantile regression model. The results suggest that thereis an important degree of heterogeneity in the association of health toexplanatory variables which, in turn, accounts for a substantial percentageof inequality in observed health. A particularly interesting finding isthat the marginal response of health to income is zero for healthyindividuals but positive and significant for unhealthy individuals. Theheterogeneity in the income response reduces both overall health inequalityand income related health inequality.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Prediction of species' distributions is central to diverse applications in ecology, evolution and conservation science. There is increasing electronic access to vast sets of occurrence records in museums and herbaria, yet little effective guidance on how best to use this information in the context of numerous approaches for modelling distributions. To meet this need, we compared 16 modelling methods over 226 species from 6 regions of the world, creating the most comprehensive set of model comparisons to date. We used presence-only data to fit models, and independent presence-absence data to evaluate the predictions. Along with well-established modelling methods such as generalised additive models and GARP and BIOCLIM, we explored methods that either have been developed recently or have rarely been applied to modelling species' distributions. These include machine-learning methods and community models, both of which have features that may make them particularly well suited to noisy or sparse information, as is typical of species' occurrence data. Presence-only data were effective for modelling species' distributions for many species and regions. The novel methods consistently outperformed more established methods. The results of our analysis are promising for the use of data from museums and herbaria, especially as methods suited to the noise inherent in such data improve.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Fluvial deposits are a challenge for modelling flow in sub-surface reservoirs. Connectivity and continuity of permeable bodies have a major impact on fluid flow in porous media. Contemporary object-based and multipoint statistics methods face a problem of robust representation of connected structures. An alternative approach to model petrophysical properties is based on machine learning algorithm ? Support Vector Regression (SVR). Semi-supervised SVR is able to establish spatial connectivity taking into account the prior knowledge on natural similarities. SVR as a learning algorithm is robust to noise and captures dependencies from all available data. Semi-supervised SVR applied to a synthetic fluvial reservoir demonstrated robust results, which are well matched to the flow performance

Relevância:

30.00% 30.00%

Publicador:

Resumo:

BACKGROUND: Whole pelvis intensity modulated radiotherapy (IMRT) is increasingly being used to treat cervical cancer aiming to reduce side effects. Encouraged by this, some groups have proposed the use of simultaneous integrated boost (SIB) to target the tumor, either to get a higher tumoricidal effect or to replace brachytherapy. Nevertheless, physiological organ movement and rapid tumor regression throughout treatment might substantially reduce any benefit of this approach. PURPOSE: To evaluate the clinical target volume - simultaneous integrated boost (CTV-SIB) regression and motion during chemo-radiotherapy (CRT) for cervical cancer, and to monitor treatment progress dosimetrically and volumetrically to ensure treatment goals are met. METHODS AND MATERIALS: Ten patients treated with standard doses of CRT and brachytherapy were retrospectively re-planned using a helical Tomotherapy - SIB technique for the hypothetical scenario of this feasibility study. Target and organs at risk (OAR) were contoured on deformable fused planning-computed tomography and megavoltage computed tomography images. The CTV-SIB volume regression was determined. The center of mass (CM) was used to evaluate the degree of motion. The Dice's similarity coefficient (DSC) was used to assess the spatial overlap of CTV-SIBs between scans. A cumulative dose-volume histogram modeled estimated delivered doses. RESULTS: The CTV-SIB relative reduction was between 31 and 70%. The mean maximum CM change was 12.5, 9, and 3 mm in the superior-inferior, antero-posterior, and right-left dimensions, respectively. The CTV-SIB-DSC approached 1 in the first week of treatment, indicating almost perfect overlap. CTV-SIB-DSC regressed linearly during therapy, and by the end of treatment was 0.5, indicating 50% discordance. Two patients received less than 95% of the prescribed dose. Much higher doses to the OAR were observed. A multiple regression analysis showed a significant interaction between CTV-SIB reduction and OAR dose increase. CONCLUSIONS: The CTV-SIB had important regression and motion during CRT, receiving lower therapeutic doses than expected. The OAR had unpredictable shifts and received higher doses. The use of SIB without frequent adaptation of the treatment plan exposes cervical cancer patients to an unpredictable risk of under-dosing the target and/or overdosing adjacent critical structures. In that scenario, brachytherapy continues to be the gold standard approach.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Detailed knowledge on water percolation into the soil in irrigated areas is fundamental for solving problems of drainage, pollution and the recharge of underground aquifers. The aim of this study was to evaluate the percolation estimated by time-domain-reflectometry (TDR) in a drainage lysimeter. We used Darcy's law with K(θ) functions determined by field and laboratory methods and by the change in water storage in the soil profile at 16 points of moisture measurement at different time intervals. A sandy clay soil was saturated and covered with plastic sheet to prevent evaporation and an internal drainage trial in a drainage lysimeter was installed. The relationship between the observed and estimated percolation values was evaluated by linear regression analysis. The results suggest that percolation in the field or laboratory can be estimated based on continuous monitoring with TDR, and at short time intervals, of the variations in soil water storage. The precision and accuracy of this approach are similar to those of the lysimeter and it has advantages over the other evaluated methods, of which the most relevant are the possibility of estimating percolation in short time intervals and exemption from the predetermination of soil hydraulic properties such as water retention and hydraulic conductivity. The estimates obtained by the Darcy-Buckingham equation for percolation levels using function K(θ) predicted by the method of Hillel et al. (1972) provided compatible water percolation estimates with those obtained in the lysimeter at time intervals greater than 1 h. The methods of Libardi et al. (1980), Sisson et al. (1980) and van Genuchten (1980) underestimated water percolation.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Many of the most interesting questions ecologists ask lead to analyses of spatial data. Yet, perhaps confused by the large number of statistical models and fitting methods available, many ecologists seem to believe this is best left to specialists. Here, we describe the issues that need consideration when analysing spatial data and illustrate these using simulation studies. Our comparative analysis involves using methods including generalized least squares, spatial filters, wavelet revised models, conditional autoregressive models and generalized additive mixed models to estimate regression coefficients from synthetic but realistic data sets, including some which violate standard regression assumptions. We assess the performance of each method using two measures and using statistical error rates for model selection. Methods that performed well included generalized least squares family of models and a Bayesian implementation of the conditional auto-regressive model. Ordinary least squares also performed adequately in the absence of model selection, but had poorly controlled Type I error rates and so did not show the improvements in performance under model selection when using the above methods. Removing large-scale spatial trends in the response led to poor performance. These are empirical results; hence extrapolation of these findings to other situations should be performed cautiously. Nevertheless, our simulation-based approach provides much stronger evidence for comparative analysis than assessments based on single or small numbers of data sets, and should be considered a necessary foundation for statements of this type in future.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

BACKGROUND: We assessed the impact of a multicomponent worksite health promotion program for0 reducing cardiovascular risk factors (CVRF) with short intervention, adjusting for regression towards the mean (RTM) affecting such nonexperimental study without control group. METHODS: A cohort of 4,198 workers (aged 42 +/- 10 years, range 16-76 years, 27% women) were analyzed at 3.7-year interval and stratified by each CVRF risk category (low/medium/high blood pressure [BP], total cholesterol [TC], body mass index [BMI], and smoking) with RTM and secular trend adjustments. Intervention consisted of 15 min CVRF screening and individualized counseling by health professionals to medium- and high-risk individuals, with eventual physician referral. RESULTS: High-risk groups participants improved diastolic BP (-3.4 mm Hg [95%CI: -5.1, -1.7]) in 190 hypertensive patients, TC (-0.58 mmol/l [-0.71, -0.44]) in 693 hypercholesterolemic patients, and smoking (-3.1 cig/day [-3.9, -2.3]) in 808 smokers, while systolic BP changes reflected RTM. Low-risk individuals without counseling deteriorated TC and BMI. Body weight increased uniformly in all risk groups (+0.35 kg/year). CONCLUSIONS: In real-world conditions, short intervention program participants in high-risk groups for diastolic BP, TC, and smoking improved their CVRF, whereas low-risk TC and BMI groups deteriorated. Future programs may include specific advises to low-risk groups to maintain a favorable CVRF profile.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A statewide study was performed to develop regional regression equations for estimating selected annual exceedance- probability statistics for ungaged stream sites in Iowa. The study area comprises streamgages located within Iowa and 50 miles beyond the State’s borders. Annual exceedanceprobability estimates were computed for 518 streamgages by using the expected moments algorithm to fit a Pearson Type III distribution to the logarithms of annual peak discharges for each streamgage using annual peak-discharge data through 2010. The estimation of the selected statistics included a Bayesian weighted least-squares/generalized least-squares regression analysis to update regional skew coefficients for the 518 streamgages. Low-outlier and historic information were incorporated into the annual exceedance-probability analyses, and a generalized Grubbs-Beck test was used to detect multiple potentially influential low flows. Also, geographic information system software was used to measure 59 selected basin characteristics for each streamgage. Regional regression analysis, using generalized leastsquares regression, was used to develop a set of equations for each flood region in Iowa for estimating discharges for ungaged stream sites with 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probabilities, which are equivalent to annual flood-frequency recurrence intervals of 2, 5, 10, 25, 50, 100, 200, and 500 years, respectively. A total of 394 streamgages were included in the development of regional regression equations for three flood regions (regions 1, 2, and 3) that were defined for Iowa based on landform regions and soil regions. Average standard errors of prediction range from 31.8 to 45.2 percent for flood region 1, 19.4 to 46.8 percent for flood region 2, and 26.5 to 43.1 percent for flood region 3. The pseudo coefficients of determination for the generalized leastsquares equations range from 90.8 to 96.2 percent for flood region 1, 91.5 to 97.9 percent for flood region 2, and 92.4 to 96.0 percent for flood region 3. The regression equations are applicable only to stream sites in Iowa with flows not significantly affected by regulation, diversion, channelization, backwater, or urbanization and with basin characteristics within the range of those used to develop the equations. These regression equations will be implemented within the U.S. Geological Survey StreamStats Web-based geographic information system tool. StreamStats allows users to click on any ungaged site on a river and compute estimates of the eight selected statistics; in addition, 90-percent prediction intervals and the measured basin characteristics for the ungaged sites also are provided by the Web-based tool. StreamStats also allows users to click on any streamgage in Iowa and estimates computed for these eight selected statistics are provided for the streamgage.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Spatial data analysis mapping and visualization is of great importance in various fields: environment, pollution, natural hazards and risks, epidemiology, spatial econometrics, etc. A basic task of spatial mapping is to make predictions based on some empirical data (measurements). A number of state-of-the-art methods can be used for the task: deterministic interpolations, methods of geostatistics: the family of kriging estimators (Deutsch and Journel, 1997), machine learning algorithms such as artificial neural networks (ANN) of different architectures, hybrid ANN-geostatistics models (Kanevski and Maignan, 2004; Kanevski et al., 1996), etc. All the methods mentioned above can be used for solving the problem of spatial data mapping. Environmental empirical data are always contaminated/corrupted by noise, and often with noise of unknown nature. That's one of the reasons why deterministic models can be inconsistent, since they treat the measurements as values of some unknown function that should be interpolated. Kriging estimators treat the measurements as the realization of some spatial randomn process. To obtain the estimation with kriging one has to model the spatial structure of the data: spatial correlation function or (semi-)variogram. This task can be complicated if there is not sufficient number of measurements and variogram is sensitive to outliers and extremes. ANN is a powerful tool, but it also suffers from the number of reasons. of a special type ? multiplayer perceptrons ? are often used as a detrending tool in hybrid (ANN+geostatistics) models (Kanevski and Maignank, 2004). Therefore, development and adaptation of the method that would be nonlinear and robust to noise in measurements, would deal with the small empirical datasets and which has solid mathematical background is of great importance. The present paper deals with such model, based on Statistical Learning Theory (SLT) - Support Vector Regression. SLT is a general mathematical framework devoted to the problem of estimation of the dependencies from empirical data (Hastie et al, 2004; Vapnik, 1998). SLT models for classification - Support Vector Machines - have shown good results on different machine learning tasks. The results of SVM classification of spatial data are also promising (Kanevski et al, 2002). The properties of SVM for regression - Support Vector Regression (SVR) are less studied. First results of the application of SVR for spatial mapping of physical quantities were obtained by the authorsin for mapping of medium porosity (Kanevski et al, 1999), and for mapping of radioactively contaminated territories (Kanevski and Canu, 2000). The present paper is devoted to further understanding of the properties of SVR model for spatial data analysis and mapping. Detailed description of the SVR theory can be found in (Cristianini and Shawe-Taylor, 2000; Smola, 1996) and basic equations for the nonlinear modeling are given in section 2. Section 3 discusses the application of SVR for spatial data mapping on the real case study - soil pollution by Cs137 radionuclide. Section 4 discusses the properties of the modelapplied to noised data or data with outliers.