950 results for Cross-validation


Relevance: 60.00%

Abstract:

Objective: Health status measures usually have an asymmetric distribution and present a high percentage of respondents with the best possible score (ceiling effect), especially when they are assessed in the overall population. Different methods that take the ceiling effect into account have been proposed for modeling this type of variable: tobit models, Censored Least Absolute Deviations (CLAD) models, and two-part models, among others. The objective of this work was to describe the tobit model and compare it with the Ordinary Least Squares (OLS) model, which ignores the ceiling effect. Methods: Two data sets were used to compare the models: a) real data coming from the European Study of Mental Disorders (ESEMeD), used to model the EQ-5D index, one of the utility measures most commonly used for the evaluation of health status; and b) data obtained from simulation. Cross-validation was used to compare the predicted values of the tobit and OLS models. The following estimators were compared: the percentage of absolute error (R1), the percentage of squared error (R2), the Mean Squared Error (MSE), and the Mean Absolute Prediction Error (MAPE). Different datasets were created for different values of the error variance and different percentages of individuals with ceiling effect. The estimated coefficients, the percentage of explained variance, and the plots of residuals versus predicted values obtained under each model were compared. Results: With regard to the ESEMeD study, the predicted values obtained with the OLS model and those obtained with the tobit model were very similar. The regression coefficients of the linear model were consistently smaller than those of the tobit model.
In the simulation study, we observed that when the error variance was small (s=1), the tobit model produced unbiased estimates of the coefficients and accurate predicted values, especially when the percentage of individuals with the highest possible score was small. However, when the error variance was greater (s=10 or s=20), the percentage of explained variance and the predicted values of the tobit model were more similar to those obtained with the OLS model. Conclusions: The proportion of variability accounted for by the models and the percentage of individuals with the highest possible score have an important effect on the performance of the tobit model in comparison with the linear model.
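The tobit-versus-OLS comparison above can be sketched numerically. Below is a minimal right-censored (tobit) fit by maximum likelihood, assuming censoring at a known ceiling `cap`; the function names and the toy simulation are illustrative, not the ESEMeD analysis itself.

```python
import numpy as np
from scipy import stats, optimize

def tobit_negloglik(params, X, y, cap):
    """Negative log-likelihood of a tobit model right-censored at `cap`."""
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)
    mu = X @ beta
    censored = y >= cap
    ll = np.where(censored,
                  stats.norm.logsf(cap, mu, sigma),   # P(latent y* >= cap)
                  stats.norm.logpdf(y, mu, sigma))    # density of observed y
    return -ll.sum()

def fit_tobit(X, y, cap):
    """Fit tobit coefficients by maximum likelihood; returns (beta, sigma)."""
    k = X.shape[1]
    res = optimize.minimize(tobit_negloglik, np.zeros(k + 1),
                            args=(X, y, cap), method="Nelder-Mead")
    return res.x[:k], float(np.exp(res.x[-1]))
```

On simulated data with a ceiling, the OLS slope is typically attenuated toward zero relative to the tobit estimate, matching the pattern of smaller linear-model coefficients reported above.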

Relevance: 60.00%

Abstract:

1. Identifying areas suitable for recolonization by threatened species is essential to support efficient conservation policies. Habitat suitability models (HSM) predict species' potential distributions, but the quality of their predictions should be carefully assessed when the species-environment equilibrium assumption is violated.

2. We studied the Eurasian otter Lutra lutra, whose numbers are recovering in southern Italy. To produce widely applicable results, we chose standard HSM procedures and assessed the models' capacity to predict the suitability of a recolonization area. We used two fieldwork datasets: presence-only data, used in Ecological Niche Factor Analyses (ENFA), and presence-absence data, used in a Generalized Linear Model (GLM). In addition to cross-validation, we independently evaluated the models with data from a recolonization event, which provided presences on a previously unoccupied river.

3. Three of the models successfully predicted the suitability of the recolonization area, but the GLM built with data from before the recolonization disagreed with these predictions, missing the recolonized river's suitability and poorly describing the otter's niche. Our results highlight three points of relevance to modelling practice: (1) absences may prevent models from correctly identifying areas suitable for a species' spread; (2) the selection of variables may introduce randomness into the predictions; and (3) the Area Under the Curve (AUC), a commonly used validation index, was not well suited to evaluating model quality, whereas the Boyce Index (CBI), based on presence data only, better reflected the models' fit to the recolonization observations.

4. For species with unstable spatial distributions, presence-only models may work better than presence-absence methods in making reliable predictions of suitable areas for expansion. An iterative modelling process, using new occurrences from each step of the species' spread, may also help in progressively reducing errors.

5. Synthesis and applications. Conservation plans depend on reliable models of species' suitable habitats. In non-equilibrium situations, such as those involving threatened or invasive species, models can be negatively affected by the inclusion of absence data when predicting areas of potential expansion. Presence-only methods provide a better basis for productive conservation management practices.

Relevance: 60.00%

Abstract:

The paper deals with the development and application of a generic methodology for the automatic processing (mapping and classification) of environmental data. The General Regression Neural Network (GRNN) is considered in detail and is proposed as an efficient tool for spatial data mapping (regression). The Probabilistic Neural Network (PNN) is considered as an automatic tool for spatial classification. The automatic tuning of isotropic and anisotropic GRNN/PNN models using a cross-validation procedure is presented. Results are compared with the k-Nearest-Neighbours (k-NN) interpolation algorithm using an independent validation data set. Real case studies are based on decision-oriented mapping and classification of radioactively contaminated territories.
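As a rough illustration of the GRNN idea, the sketch below implements it as Nadaraya-Watson kernel regression, with a leave-one-out cross-validation loop to tune the kernel width; the names and toy usage are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def grnn_predict(X_train, y_train, X_query, sigma):
    """GRNN prediction: a Gaussian-weighted average of training targets."""
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return (w @ y_train) / w.sum(axis=1)

def tune_sigma_loocv(X, y, sigmas):
    """Pick the kernel width minimizing leave-one-out squared error."""
    n = len(y)
    best, best_err = None, np.inf
    for s in sigmas:
        err = 0.0
        for i in range(n):
            mask = np.arange(n) != i            # hold out point i
            pred = grnn_predict(X[mask], y[mask], X[i:i + 1], s)[0]
            err += (pred - y[i]) ** 2
        if err < best_err:
            best, best_err = s, err
    return best
```

With a very narrow kernel each training point essentially predicts itself; cross-validation guards against exactly that kind of overfitting when choosing the width.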

Relevance: 60.00%

Abstract:

Is it possible to build predictive models (PMs) of soil particle-size distribution (psd) in a region with complex geology and a young, unstable land surface? The main objective of this study was to answer this question. A set of 339 soil samples from a small slope catchment in southern Brazil was used to build PMs of psd in the surface soil layer. Multiple linear regression models were constructed using terrain attributes (elevation, slope, catchment area, convergence index, and topographic wetness index). The PMs explained more than half of the data variance, a performance similar to (or even better than) that of the conventional soil mapping approach. For some size fractions, PM performance reached 70%. The largest uncertainties were observed in geologically more complex areas; significant improvements in the predictions can therefore only be achieved if accurate geological data are made available. Meanwhile, PMs built on terrain attributes are efficient in predicting the psd of soils in regions of complex geology.
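The "percentage of explained variance" reported for the PMs is the familiar R² of a multiple linear regression. A minimal sketch, with stand-in terrain attributes rather than the study's data:

```python
import numpy as np

def r_squared(X, y):
    """Fraction of variance explained by an OLS fit with intercept."""
    Xd = np.column_stack([np.ones(len(y)), X])   # add intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    ss_res = resid @ resid
    ss_tot = (y - y.mean()) @ (y - y.mean())
    return 1.0 - ss_res / ss_tot
```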

Relevance: 60.00%

Abstract:

PURPOSE: Ocular anatomy and radiation-associated toxicities provide unique challenges for external beam radiation therapy. For treatment planning, precise modeling of organs at risk and of tumor volume is crucial. Developing a precise eye model and automatically adapting this model to patients' anatomy remain problematic because of organ shape variability. This work introduces the application of a 3-dimensional (3D) statistical shape model as a novel method for precise eye modeling for external beam radiation therapy of intraocular tumors. METHODS AND MATERIALS: Manual and automatic segmentations were compared for 17 patients, based on head computed tomography (CT) volume scans. A 3D statistical shape model of the cornea, lens, and sclera, as well as of the optic disc position, was developed. Furthermore, an active shape model was built to enable automatic fitting of the eye model to CT slice stacks. Cross-validation was performed with leave-one-out tests over all training shapes, measuring Dice coefficients and mean segmentation errors between automatic segmentation and manual segmentation by an expert. RESULTS: Cross-validation revealed a Dice similarity of 95% ± 2% for the sclera and cornea and 91% ± 2% for the lens. The overall mean segmentation error was 0.3 ± 0.1 mm. Average segmentation time was 14 ± 2 s on a standard personal computer. CONCLUSIONS: Our results show that the presented solution outperforms state-of-the-art methods in terms of accuracy, reliability, and robustness. Moreover, the eye model shape and its variability are learned from a training set rather than assumed (e.g., as with spherical or elliptical models). The model therefore appears capable of modeling nonspherically and nonelliptically shaped eyes.
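The Dice similarity used in the cross-validation takes a few lines to compute; this is a generic sketch over binary masks, not the authors' code:

```python
import numpy as np

def dice_coefficient(a, b):
    """Dice similarity between two binary masks: 2|A∩B| / (|A| + |B|)."""
    a = np.asarray(a, bool)
    b = np.asarray(b, bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())
```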

Relevance: 60.00%

Abstract:

The present research deals with an important public health threat: the pollution created by radon gas accumulation inside dwellings. The spatial modeling of indoor radon in Switzerland is particularly complex and challenging because of the many influencing factors that must be taken into account. Indoor radon data analysis must be addressed from both a statistical and a spatial point of view. As a multivariate process, it was important first to define the influence of each factor, in particular that of geology, which is closely associated with indoor radon. This association was indeed observed for the Swiss data but did not prove to be the sole determinant for the spatial modeling. The statistical analysis of the data, at both univariate and multivariate levels, was followed by an exploratory spatial analysis. Many tools proposed in the literature were tested and adapted, including fractality, declustering, and moving-window methods. The use of the Quantité Morisita Index (QMI) was proposed as a procedure to evaluate data clustering as a function of the radon level. Existing declustering methods were revised and applied in an attempt to approach the global histogram parameters. The exploratory phase comes along with the definition of multiple scales of interest for indoor radon mapping in Switzerland. The analysis followed a top-down resolution approach, from regional to local levels, in order to find the appropriate scales for modeling. In this sense, data partitioning was optimized to cope with the stationarity conditions of geostatistical models. Common spatial modeling methods such as K Nearest Neighbors (KNN), variography, and General Regression Neural Networks (GRNN) were proposed as exploratory tools. In the following section, different spatial interpolation methods were applied to a particular dataset.
A bottom-to-top approach to method complexity was adopted, and the results were analyzed together in order to find common definitions of continuity and neighborhood parameters. Additionally, a data filter based on cross-validation (the CVMF) was tested with the purpose of reducing noise at the local scale. At the end of the chapter, a series of tests for data consistency and method robustness was performed. This led to conclusions about the importance of data splitting and the limitations of generalization methods in reproducing statistical distributions. The last section was dedicated to modeling methods with probabilistic interpretations. Data transformation and simulation allowed the use of multigaussian models and helped take the uncertainty of the indoor radon pollution data into consideration. The categorization transform was presented as a solution for modeling extreme values through classification. Simulation scenarios were proposed, including an alternative proposal for reproducing the global histogram based on the sampling domain. Sequential Gaussian simulation (SGS) was presented as the method giving the most complete information, while classification performed more robustly. An error measure was defined in relation to the decision function for hardening the data classification. Among the classification methods, probabilistic neural networks (PNN) proved better adapted to modeling high-threshold categorization and to automation. Support vector machines (SVM), by contrast, performed well under balanced category conditions. In general, it was concluded that no single prediction or estimation method is better under all conditions of scale and neighborhood definition. Simulations should be the basis, while other methods can provide complementary information to achieve efficient indoor radon decision making.

Relevance: 60.00%

Abstract:

OBJECTIVE: To develop and validate a simple, integer-based score to predict functional outcome in acute ischemic stroke (AIS) using variables readily available after emergency room admission. METHODS: Logistic regression was performed in the derivation cohort of previously independent patients with AIS (Acute Stroke Registry and Analysis of Lausanne [ASTRAL]) to identify predictors of unfavorable outcome (3-month modified Rankin Scale score >2). An integer-based point-scoring system for each covariate of the fitted multivariate model was generated by their β-coefficients; the overall score was calculated as the sum of the weighted scores. The model was validated internally using a 2-fold cross-validation technique and externally in 2 independent cohorts (Athens and Vienna Stroke Registries). RESULTS: Age (A), severity of stroke (S) measured by admission NIH Stroke Scale score, stroke onset to admission time (T), range of visual fields (R), acute glucose (A), and level of consciousness (L) were identified as independent predictors of unfavorable outcome in 1,645 patients in ASTRAL. Their β-coefficients were multiplied by 4 and rounded to the closest integer to generate the score. The area under the receiver operating characteristic curve (AUC) of the score in the ASTRAL cohort was 0.850. The score was well calibrated in the derivation (p = 0.43) and validation cohorts (0.22 [Athens, n = 1,659] and 0.49 [Vienna, n = 653]). AUCs were 0.937 (Athens), 0.771 (Vienna), and 0.902 (when pooled). An ASTRAL score of 31 indicates a 50% likelihood of unfavorable outcome. CONCLUSIONS: The ASTRAL score is a simple integer-based score to predict functional outcome using 6 readily available items at hospital admission. It performed well in double external validation and may be a useful tool for clinical practice and stroke research.
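The scoring scheme described above (β-coefficients multiplied by 4 and rounded to the closest integer, then summed over the weighted items) can be sketched as follows; the coefficients in the example are hypothetical, not the actual ASTRAL weights:

```python
def integer_points(betas, scale=4):
    """Turn beta-coefficients into integer points: multiply by `scale`
    and round to the closest integer."""
    return [round(scale * b) for b in betas]

def overall_score(points, covariates):
    """Overall score = sum of the item points weighted by covariate values."""
    return sum(p * v for p, v in zip(points, covariates))
```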

Relevance: 60.00%

Abstract:

In this work we explore multivariate empirical mode decomposition (mEMD) combined with a neural network classifier as a technique for face recognition tasks. Images are simultaneously decomposed by means of EMD, and the distance between the modes of each image and the modes of the representative image of each class is calculated using three different distance measures. A neural network is then trained using 10-fold cross-validation in order to derive a classifier. Preliminary results (a classification rate of over 98%) are satisfactory and justify deeper investigation into how to apply mEMD to face recognition.
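The 10-fold cross-validation used to derive the classifier can be sketched generically; the oracle "classifier" in the usage example is a placeholder, not the neural network from the paper:

```python
import numpy as np

def kfold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

def cv_classification_rate(X, y, fit, predict, k=10):
    """Mean held-out classification rate over k folds."""
    folds = kfold_indices(len(y), k)
    rates = []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train], y[train])
        rates.append(np.mean(predict(model, X[test]) == y[test]))
    return float(np.mean(rates))
```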

Relevance: 60.00%

Abstract:

Risk maps summarizing landscape suitability of novel areas for invading species can be valuable tools for preventing species' invasions or controlling their spread, but methods employed for development of such maps remain variable and unstandardized. We discuss several considerations in development of such models, including types of distributional information that should be used, the nature of explanatory variables that should be incorporated, and caveats regarding model testing and evaluation. We highlight that, in the case of invasive species, such distributional predictions should aim to derive the best hypothesis of the potential distribution of the species by using (1) all distributional information available, including information from both the native range and other invaded regions; (2) predictors linked as directly as is feasible to the physiological requirements of the species; and (3) modelling procedures that carefully avoid overfitting to the training data. Finally, model testing and evaluation should focus on well-predicted presences, and less on efficient prediction of absences; a k-fold regional cross-validation test is discussed.
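A k-fold regional cross-validation, as suggested here, holds out one whole region per fold rather than a random subset, so test localities are spatially independent of the training data. A minimal sketch, with made-up region labels:

```python
import numpy as np

def regional_folds(region_labels):
    """One fold per region: (train_mask, test_mask) pairs where the test
    set is a whole held-out region."""
    labels = np.asarray(region_labels)
    return [(labels != r, labels == r) for r in np.unique(labels)]
```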

Relevance: 60.00%

Abstract:

We present a new indicator taxa approach to the prediction of climate change effects on biodiversity at the national level in Switzerland. As indicators, we select a set of the most widely distributed species that account for 95% of geographical variation in sampled species richness of birds, butterflies, and vascular plants. Species data come from a national program designed to monitor spatial and temporal trends in species richness. We examine some opportunities and limitations in using these data. We develop ecological niche models for the species as functions of both climate and land cover variables. We project these models to the future using climate predictions that correspond to two IPCC 3rd assessment scenarios for the development of 'greenhouse' gas emissions. We find that models that are calibrated with Swiss national monitoring data perform well in 10-fold cross-validation, but can fail to capture the hot-dry end of environmental gradients that constrain some species distributions. Models for indicator species in all three higher taxa predict that climate change will result in turnover in species composition even where there is little net change in predicted species richness. Indicator species from high elevations lose most areas of suitable climate even under the relatively mild B2 scenario. We project some areas to increase in the number of species for which climate conditions are suitable early in the current century, but these areas become less suitable for a majority of species by the end of the century. Selection of indicator species based on rank prevalence results in a set of models that predict observed species richness better than a similar set of species selected based on high rank of model AUC values. An indicator species approach based on selected species that are relatively common may facilitate the use of national monitoring data for predicting climate change effects on the distribution of biodiversity.

Relevance: 60.00%

Abstract:

OBJECTIVES: To test the validity of a simple, rapid, field-adapted, portable hand-held impedancemeter (HHI) for the estimation of lean body mass (LBM) and percentage body fat (%BF) in African women, and to develop specific predictive equations. DESIGN: Cross-sectional observational study. SETTINGS: Dakar, the capital city of Senegal, West Africa. SUBJECTS: A total sample of 146 women volunteered. Their mean age was 31.0 y (s.d. 9.1), weight 60.9 kg (s.d. 13.1), and BMI 22.6 kg/m² (s.d. 4.5). METHODS: Body composition values estimated by HHI were compared with those measured by whole-body densitometry performed by air displacement plethysmography (ADP). The specific density of LBM in black subjects was taken into account in calculating %BF from body density. RESULTS: Estimates from HHI showed a large bias (mean difference) of 5.6 kg LBM (P < 10⁻⁴) and -8.8 %BF (P < 10⁻⁴), with errors (s.d. of the bias) of 2.6 kg LBM and 3.7 %BF. In order to correct for the bias, specific predictive equations were developed. With the HHI result as a single predictor, error values were 1.9 kg LBM and 3.7 %BF in the prediction group (n=100), and 2.2 kg LBM and 3.6 %BF in the cross-validation group (n=46). Adding anthropometric predictors was not necessary. CONCLUSIONS: The HHI analyser significantly overestimated LBM and underestimated %BF in African women. After correction for the bias, both body compartments could easily be estimated in African women by using the HHI result in an appropriate prediction equation, with good precision. It remains to be seen whether combining arm and leg impedancemetry, in order to take the lower limbs into account, would further improve the prediction of body composition in Africans.
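The bias (mean difference) and error (s.d. of the bias) used to compare the HHI against densitometry can be sketched as follows; the numbers in the example are illustrative, not the study data:

```python
import numpy as np

def bias_and_error(estimated, reference):
    """Bias = mean of the differences; error = s.d. of the differences,
    comparing a device's estimates against a reference method."""
    d = np.asarray(estimated, float) - np.asarray(reference, float)
    return float(d.mean()), float(d.std(ddof=1))
```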

Relevance: 60.00%

Abstract:

This paper presents general regression neural networks (GRNN) as a nonlinear regression method for the interpolation of monthly wind speeds in complex Alpine orography. The GRNN is trained using data from Swiss meteorological networks to learn the statistical relationship between topographic features and wind speed. Terrain convexity, slope, and exposure are considered by extracting features from the digital elevation model at different spatial scales using specialised convolution filters. A database of gridded monthly wind speeds is then constructed by applying the GRNN in prediction mode over the period 1968-2008. This study demonstrates that using topographic features as inputs in the GRNN significantly reduces cross-validation errors with respect to low-dimensional models integrating only geographical coordinates and terrain height for the interpolation of wind speed. The spatial predictability of wind speed is found to be lower in summer than in winter because of more complex and weaker wind-topography relationships. The relevance of these relationships is studied using an adaptive version of the GRNN algorithm, which allows the useful terrain features to be selected by eliminating the noisy ones. This research provides a framework for extending low-dimensional interpolation models to high-dimensional spaces by integrating additional features that account for topographic conditions at multiple spatial scales. Copyright (c) 2012 Royal Meteorological Society.

Relevance: 60.00%

Abstract:

Genetic variants influence the risk of developing certain diseases or give rise to differences in drug response. Recent progress in cost-effective, high-throughput genome-wide techniques, such as microarrays measuring Single Nucleotide Polymorphisms (SNPs), has facilitated genotyping of large clinical and population cohorts. Combining massive genotypic data with measurements of phenotypic traits allows the determination of genetic differences that explain, at least in part, the phenotypic variation within a population. So far, models combining the most significant variants can explain only a small fraction of the variance, indicating the limitations of current models. In particular, researchers have only begun to address the possibility of interactions between genotypes and the environment. Elucidating the contributions of such interactions is a difficult task because of the large number of genetic as well as possible environmental factors.

In this thesis, I worked on several projects within this context. My first and main project was the identification of possible SNP-environment interactions, where the phenotypes were serum lipid levels of patients from the Swiss HIV Cohort Study (SHCS) treated with antiretroviral therapy. Here the genotypes consisted of a limited set of SNPs in candidate genes relevant for lipid transport and metabolism. The environmental variables were the specific combinations of drugs given to each patient over the treatment period. My work explored bioinformatic and statistical approaches to relate patients' lipid responses to these SNPs, the drugs and, importantly, their interactions. The goal of this project was to improve our understanding and to explore the possibility of predicting dyslipidemia, a well-known adverse drug reaction of antiretroviral therapy. Specifically, I quantified how much of the variance in lipid profiles could be explained by the host genetic variants, the administered drugs, and SNP-drug interactions, and assessed the predictive power of these features on lipid responses. Using cross-validation stratified by patients, we could not validate our hypothesis that models selecting a subset of SNP-drug interactions in a principled way have better predictive power than control models using "random" subsets. Nevertheless, all tested models containing SNP and/or drug terms exhibited significant predictive power (compared with a random predictor) and explained a sizable proportion of variance in the patient-stratified cross-validation context. Importantly, the model containing stepwise-selected SNP terms showed a higher capacity to predict triglyceride levels than a model containing randomly selected SNPs. Dyslipidemia is a complex trait for which many factors remain to be discovered, and are thus missing from the data, possibly explaining the limitations of our analysis. In particular, the interactions of drugs with SNPs selected from the set of candidate genes likely have small effect sizes, which we were unable to detect in a sample of the present size (<800 patients).

In the second part of my thesis, I performed genome-wide association studies within the Cohorte Lausannoise (CoLaus). I have been involved in several international projects to identify SNPs associated with various traits, such as serum calcium, body mass index, two-hour glucose levels, and the metabolic syndrome and its components. These phenotypes are all related to major human health issues, such as cardiovascular disease. I applied statistical methods to detect new variants associated with these phenotypes, contributing to the identification of new genetic loci that may lead to new insights into the genetic basis of these traits.
This kind of research will lead to a better understanding of the mechanisms underlying these pathologies, a better evaluation of disease risk, the identification of new therapeutic leads and may ultimately lead to the realization of "personalized" medicine.

Relevance: 60.00%

Abstract:

The paper deals with the development and application of a methodology for the automatic mapping of pollution/contamination data. The General Regression Neural Network (GRNN) is considered in detail and is proposed as an efficient tool for this problem. The automatic tuning of isotropic and anisotropic GRNN models using a cross-validation procedure is presented. Results are compared with the k-nearest-neighbours interpolation algorithm using an independent validation data set. The quality of the mapping is controlled by variographic analysis of the raw data and the residuals. Maps of the probability of exceeding a given decision level and "thick" isoline visualization of the uncertainties are presented as examples of decision-oriented mapping. The real case study is based on the mapping of radioactively contaminated territories.

Relevance: 60.00%

Abstract:

The objective of this work was to select semivariogram models to estimate the population density of the fig fly (Zaprionus indianus; Diptera: Drosophilidae) throughout the year, using ordinary kriging. Nineteen monitoring sites were demarcated in an area of 8,200 m², cropped with six fruit tree species: persimmon, citrus, fig, guava, apple, and peach. Over a 24-month period, 106 weekly evaluations were done at these sites. The average number of adult fig flies captured weekly per trap during each month was fitted with the circular, spherical, pentaspherical, exponential, Gaussian, rational quadratic, hole-effect, K-Bessel, J-Bessel, and stable semivariogram models, using ordinary kriging interpolation. The best-fitting models were selected by cross-validation. Each data set (month) has a particular spatial dependence structure, which makes it necessary to define specific semivariogram models in order to improve the fit to the experimental semivariogram. Therefore, it was not possible to determine a standard semivariogram model; instead, six theoretical models were selected: circular, Gaussian, hole-effect, K-Bessel, J-Bessel, and stable.
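Selecting a semivariogram model starts from the experimental semivariogram, against which candidate theoretical models are fitted and then compared by cross-validation. Below is a minimal sketch of an experimental semivariogram and one candidate (the spherical model); the lag binning and parameter names are illustrative assumptions, not the study's settings:

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_edges):
    """gamma(h): half the mean squared difference of pairs in each lag bin."""
    i, j = np.triu_indices(len(values), k=1)          # all point pairs
    h = np.linalg.norm(coords[i] - coords[j], axis=1) # pair distances
    sq = 0.5 * (values[i] - values[j]) ** 2
    which = np.digitize(h, bin_edges)
    return np.array([sq[which == b].mean() if np.any(which == b) else np.nan
                     for b in range(1, len(bin_edges))])

def spherical(h, nugget, sill, a):
    """Spherical semivariogram model with range `a`."""
    h = np.asarray(h, float)
    g = nugget + (sill - nugget) * (1.5 * h / a - 0.5 * (h / a) ** 3)
    return np.where(h >= a, sill, g)                  # flat at the sill beyond `a`
```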