56 results for CROSS-VALIDATION
at Université de Lausanne, Switzerland
Abstract:
BACKGROUND/OBJECTIVES: (1) To cross-validate tetra- (4-BIA) and octopolar (8-BIA) bioelectrical impedance analysis vs dual-energy X-ray absorptiometry (DXA) for the assessment of total and appendicular body composition and (2) to evaluate the accuracy of external 4-BIA algorithms for the prediction of total body composition, in a representative sample of Swiss children. SUBJECTS/METHODS: A representative sample of 333 Swiss children aged 6-13 years from the Kinder-Sportstudie (KISS) (ISRCTN15360785). Whole-body fat-free mass (FFM) and appendicular lean tissue mass were measured with DXA. Body resistance (R) was measured at 50 kHz with 4-BIA and segmental body resistance at 5, 50, 250 and 500 kHz with 8-BIA. The resistance index (RI) was calculated as height²/R. Selection of predictors (gender, age, weight, RI4 and RI8) for BIA algorithms was performed using bootstrapped stepwise linear regression on 1000 samples. We calculated 95% confidence intervals (CI) of regression coefficients and measures of model fit using bootstrap analysis. Limits of agreement were used as measures of interchangeability of BIA with DXA. RESULTS: 8-BIA was more accurate than 4-BIA for the assessment of FFM (root mean square error (RMSE) = 0.90 (95% CI 0.82-0.98) vs 1.12 kg (1.01-1.24); limits of agreement -1.80 to 1.80 kg vs -2.24 to 2.24 kg). 8-BIA also gave accurate estimates of appendicular body composition, with RMSE ≤ 0.10 kg for arms and ≤ 0.24 kg for legs. All external 4-BIA algorithms performed poorly, with substantial negative proportional bias (r ≥ 0.48, P < 0.001). CONCLUSIONS: In a representative sample of young Swiss children, (1) 8-BIA was superior to 4-BIA for the prediction of FFM, (2) external 4-BIA algorithms gave biased predictions of FFM and (3) 8-BIA was an accurate predictor of segmental body composition.
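The two quantities at the heart of this abstract, the resistance index RI = height²/R and the Bland-Altman limits of agreement used to judge interchangeability with DXA, can be sketched in a few lines of Python. The paired FFM values below are made-up illustrative numbers, not study data:

```python
import numpy as np

def resistance_index(height_cm, resistance_ohm):
    """Resistance index RI = height^2 / R, the main BIA predictor."""
    return height_cm ** 2 / resistance_ohm

def limits_of_agreement(method_a, method_b):
    """Bland-Altman 95% limits of agreement between two methods."""
    diff = np.asarray(method_a) - np.asarray(method_b)
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)
    return bias - half_width, bias + half_width

# Hypothetical paired FFM estimates (kg) from BIA and DXA
bia = np.array([20.1, 22.4, 18.9, 25.0, 21.2])
dxa = np.array([20.5, 21.9, 19.4, 24.6, 21.0])
lo, hi = limits_of_agreement(bia, dxa)
```

Narrower limits of agreement, as reported for 8-BIA vs 4-BIA, mean the two methods can be used interchangeably with smaller worst-case disagreement.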
Batch effect confounding leads to strong bias in performance estimates obtained by cross-validation.
Abstract:
BACKGROUND: With the large amount of biological data that is currently publicly available, many investigators combine multiple data sets to increase the sample size and potentially also the power of their analyses. However, technical differences ("batch effects") as well as differences in sample composition between the data sets may significantly affect the ability to draw generalizable conclusions from such studies. FOCUS: The current study focuses on the construction of classifiers, and the use of cross-validation to estimate their performance. In particular, we investigate the impact of batch effects and differences in sample composition between batches on the accuracy of the classification performance estimate obtained via cross-validation. The focus on estimation bias is a main difference compared to previous studies, which have mostly focused on the predictive performance and how it relates to the presence of batch effects. DATA: We work on simulated data sets. To have realistic intensity distributions, we use real gene expression data as the basis for our simulation. Random samples from this expression matrix are selected and assigned to group 1 (e.g., 'control') or group 2 (e.g., 'treated'). We introduce batch effects and select some features to be differentially expressed between the two groups. We consider several scenarios for our study, most importantly different levels of confounding between groups and batch effects. METHODS: We focus on well-known classifiers: logistic regression, Support Vector Machines (SVM), k-nearest neighbors (kNN) and Random Forests (RF). Feature selection is performed with the Wilcoxon test or the lasso. Parameter tuning and feature selection, as well as the estimation of the prediction performance of each classifier, are performed within a nested cross-validation scheme. The estimated classification performance is then compared to what is obtained when applying the classifier to independent data.
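A minimal scikit-learn sketch of the nested cross-validation scheme described above, on a toy simulation in the spirit of the paper's fully confounded scenario (the batch shift size, sample counts and SVM/ANOVA choices are arbitrary assumptions, not the study's exact setup):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, cross_val_score

rng = np.random.default_rng(0)
# Simulated expression matrix: 100 samples x 200 features
X = rng.normal(size=(100, 200))
y = np.repeat([0, 1], 50)
batch = y            # batch fully confounded with the class label
X[batch == 1] += 0.5  # additive batch shift on all features

pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=20)),  # selection inside the CV loop
    ("svm", SVC()),
])
# Inner loop: parameter tuning; outer loop: performance estimation
inner = GridSearchCV(pipe, {"svm__C": [0.1, 1, 10]}, cv=3)
scores = cross_val_score(inner, X, y, cv=5)
```

Because the batch effect is confounded with the group labels, the cross-validation scores here will look good even though no feature is truly class-related, which is exactly the estimation bias the abstract warns about.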
Abstract:
In the last five years, Deep Brain Stimulation (DBS) has become the most popular and effective surgical technique for the treatment of Parkinson's disease (PD). The Subthalamic Nucleus (STN) is the usual target when applying DBS. Unfortunately, the STN is in general not visible in common medical imaging modalities. Therefore, atlas-based segmentation is commonly used to locate it in the images. In this paper, we propose a scheme that allows us both to compare different registration algorithms and to evaluate their ability to locate the STN automatically. Using this scheme we can evaluate the expert variability against the error of the algorithms, and we demonstrate that automatic STN location is possible and as accurate as the methods currently used.
Abstract:
Recent studies have started to use media data to measure party positions and issue salience. The aim of this article is to compare and cross-validate this alternative approach with the more commonly used party manifestos, expert judgments and mass surveys. To this purpose, we present two methods to generate indicators of party positions and issue salience from media coverage: the core sentence approach and political claims analysis. Our cross-validation shows that with regard to party positions, indicators derived from the media converge with traditionally used measurements from party manifestos, mass surveys and expert judgments, but that salience indicators measure different underlying constructs. We conclude with a discussion of specific research questions for which media data offer potential advantages over more established methods.
Abstract:
Introduction: As part of the MicroArray Quality Control (MAQC)-II project, this analysis examines how the choice of univariate feature-selection methods and classification algorithms may influence the performance of genomic predictors under varying degrees of prediction difficulty represented by three clinically relevant endpoints. Methods: We used gene-expression data from 230 breast cancers (grouped into training and independent validation sets), and we examined 40 predictors (five univariate feature-selection methods combined with eight different classifiers) for each of the three endpoints. Their classification performance was estimated on the training set by using two different resampling methods and compared with the accuracy observed in the independent validation set. Results: A ranking of the three classification problems was obtained, and the performance of 120 models was estimated and assessed on an independent validation set. The bootstrapping estimates were closer to the validation performance than were the cross-validation estimates. The required sample size for each endpoint was estimated, and both gene-level and pathway-level analyses were performed on the obtained models. Conclusions: We showed that genomic predictor accuracy is determined largely by an interplay between sample size and classification difficulty. Variations on univariate feature-selection methods and choice of classification algorithm have only a modest impact on predictor performance, and several statistically equally good predictors can be developed for any given classification problem.
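The abstract's key comparison, resampling estimates on the training set vs accuracy on an independent validation set, can be illustrated with a toy version in scikit-learn. The data are synthetic, and the simple out-of-bag bootstrap below is only one of several bootstrap estimators the study could have used:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=120, n_features=20, random_state=0)
clf = LogisticRegression(max_iter=1000)

# Cross-validation estimate of accuracy
cv_acc = cross_val_score(clf, X, y, cv=5).mean()

# Simple out-of-bag bootstrap estimate: train on a resample,
# score on the samples left out of that resample
rng = np.random.default_rng(0)
boot_accs = []
for _ in range(50):
    idx = rng.integers(0, len(y), len(y))
    oob = np.setdiff1d(np.arange(len(y)), idx)
    if len(oob) == 0:
        continue
    clf.fit(X[idx], y[idx])
    boot_accs.append(clf.score(X[oob], y[oob]))
boot_acc = float(np.mean(boot_accs))
```

In the MAQC-II analysis above, it was the bootstrap-style estimates that tracked the independent validation accuracy more closely than cross-validation did.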
Abstract:
BACKGROUND: Little information is available on the validity of simple and indirect body-composition methods in non-Western populations. Equations for predicting body composition are population-specific, and body composition differs between blacks and whites. OBJECTIVE: We tested the hypothesis that the validity of equations for predicting total body water (TBW) from bioelectrical impedance analysis measurements is likely to depend on the racial background of the group from which the equations were derived. DESIGN: The hypothesis was tested by comparing, in 36 African women, TBW values measured by deuterium dilution with those predicted by 23 equations developed in white, African American, or African subjects. These cross-validations in our African sample were also compared, whenever possible, with results from other studies in black subjects. RESULTS: Errors in predicting TBW showed acceptable values (1.3-1.9 kg) in all cases, whereas a large range of bias (0.2-6.1 kg) was observed independently of the ethnic origin of the sample from which the equations were derived. Three equations (2 from whites and 1 from blacks) showed nonsignificant bias and could be used in Africans. In all other cases, we observed either an overestimation or underestimation of TBW with variable bias values, regardless of racial background, yielding no clear trend for validity as a function of ethnic origin. CONCLUSIONS: The findings of this cross-validation study emphasize the need for further fundamental research to explore the causes of the poor validity of TBW prediction equations across populations rather than the need to develop new prediction equations for use in Africa.
Abstract:
Models predicting species spatial distribution are increasingly applied to wildlife management issues, emphasising the need for reliable methods to evaluate the accuracy of their predictions. As many available datasets (e.g. museums, herbariums, atlases) do not provide reliable information about species absences, several presence-only based analyses have been developed. However, methods to evaluate the accuracy of their predictions are few and have never been validated. The aim of this paper is to compare existing and new presence-only evaluators to usual presence/absence measures. We use a reliable, diverse presence/absence dataset of 114 plant species to test how common presence/absence indices (Kappa, MaxKappa, AUC, adjusted D²) compare to presence-only measures (AVI, CVI, Boyce index) for evaluating generalised linear models (GLM). Moreover, we propose a new, threshold-independent evaluator, which we call the "continuous Boyce index". All indices were implemented in the BIOMAPPER software. We show that the presence-only evaluators are fairly correlated (ρ > 0.7) with the presence/absence ones. The Boyce indices are closer to AUC than to MaxKappa and are fairly insensitive to species prevalence. In addition, the Boyce indices provide predicted-to-expected ratio curves that offer further insights into model quality: robustness, habitat suitability resolution and deviation from randomness. This information helps reclassify predicted maps into meaningful habitat suitability classes. The continuous Boyce index is thus both a complement to the usual evaluation of presence/absence models and a reliable measure of presence-only based predictions.
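The Boyce index builds on predicted-to-expected (P/E) ratios per habitat-suitability class: the proportion of presence points falling in a class divided by the proportion of the study area in that class, correlated against the class rank. A minimal fixed-bin sketch follows; note this is an approximation, since the "continuous" variant proposed in the paper uses a moving window rather than fixed bins:

```python
import numpy as np
from scipy.stats import spearmanr

def boyce_index(pres_scores, background_scores, n_bins=10):
    """Spearman correlation between suitability-class rank and the
    predicted-to-expected (P/E) ratio, computed over fixed bins."""
    pres_scores = np.asarray(pres_scores, float)
    background_scores = np.asarray(background_scores, float)
    lo = min(pres_scores.min(), background_scores.min())
    hi = max(pres_scores.max(), background_scores.max())
    edges = np.linspace(lo, hi + 1e-9, n_bins + 1)
    p = np.histogram(pres_scores, bins=edges)[0] / len(pres_scores)
    e = np.histogram(background_scores, bins=edges)[0] / len(background_scores)
    mask = e > 0  # ignore classes with no background area
    rho, _ = spearmanr(np.arange(n_bins)[mask], p[mask] / e[mask])
    return float(rho)

# A good model concentrates presences at high suitability scores
rng = np.random.default_rng(0)
background = rng.uniform(0, 1, 2000)   # scores over the whole area
presences = rng.beta(5, 1, 2000)       # presences skewed to high scores
bi = boyce_index(presences, background)
```

A monotonically increasing P/E curve (Boyce index near 1) means higher predicted suitability really does correspond to more frequent presences.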
Abstract:
In occupational exposure assessment of airborne contaminants, exposure levels can either be estimated through repeated measurements of the pollutant concentration in air, expert judgment or through exposure models that use information on the conditions of exposure as input. In this report, we propose an empirical hierarchical Bayesian model to unify these approaches. Prior to any measurement, the hygienist conducts an assessment to generate prior distributions of exposure determinants. Monte-Carlo samples from these distributions feed two level-2 models: a physical, two-compartment model, and a non-parametric, neural network model trained with existing exposure data. The outputs of these two models are weighted according to the expert's assessment of their relevance to yield predictive distributions of the long-term geometric mean and geometric standard deviation of the worker's exposure profile (level-1 model). Bayesian inferences are then drawn iteratively from subsequent measurements of worker exposure. Any traditional decision strategy based on a comparison with occupational exposure limits (e.g. mean exposure, exceedance strategies) can then be applied. Data on 82 workers exposed to 18 contaminants in 14 companies were used to validate the model with cross-validation techniques. A user-friendly program running the model is available upon request.
Abstract:
OBJECTIVE: Mild neurocognitive disorders (MND) affect a subset of HIV+ patients under effective combination antiretroviral therapy (cART). In this study, we used an innovative multi-contrast magnetic resonance imaging (MRI) approach at high field to assess the presence of micro-structural brain alterations in MND+ patients. METHODS: We enrolled 17 MND+ and 19 MND- patients with undetectable HIV-1 RNA and 19 healthy controls (HC). MRI acquisitions at 3T included: MP2RAGE for T1 relaxation times, Magnetization Transfer (MT), T2* and Susceptibility Weighted Imaging (SWI) to probe micro-structural integrity and iron deposition in the brain. Statistical analysis used permutation-based tests and correction for family-wise error rate. Multiple regression analysis was performed between MRI data and (i) neuropsychological results and (ii) HIV infection characteristics. A linear discriminant analysis (LDA) based on MRI data was performed between MND+ and MND- patients and cross-validated with a leave-one-out test. RESULTS: Our data revealed loss of structural integrity and micro-oedema in MND+ compared to HC in the global white and cortical gray matter, as well as in the thalamus and basal ganglia. Multiple regression analysis showed a significant influence of sub-cortical nuclei alterations on the executive index of MND+ patients (p = 0.04 and R² = 95.2). The LDA distinguished MND+ and MND- patients with a classification quality of 73% after cross-validation. CONCLUSION: Our study shows micro-structural brain tissue alterations in MND+ patients under effective therapy and suggests that multi-contrast MRI at high field is a powerful approach to discriminate between HIV+ patients on cART with and without mild neurocognitive deficits.
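The classification step described above, LDA validated with a leave-one-out test, maps directly onto scikit-learn. The features here are synthetic stand-ins for the multi-contrast MRI measures, and the sample size merely mimics the two patient groups:

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.datasets import make_classification

# Hypothetical MRI-derived features for 36 patients (MND+ vs MND-)
X, y = make_classification(n_samples=36, n_features=5, random_state=0)

lda = LinearDiscriminantAnalysis()
# Leave-one-out: each patient is held out once, trained on the rest
acc = cross_val_score(lda, X, y, cv=LeaveOneOut()).mean()
```

Leave-one-out is a natural choice at this sample size, since it trains on all but one patient per fold; the resulting accuracy plays the role of the 73% classification quality reported in the abstract.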
Abstract:
Aim This study used data from temperate forest communities to assess: (1) five different stepwise selection methods with generalized additive models, (2) the effect of weighting absences to ensure a prevalence of 0.5, (3) the effect of limiting absences beyond the environmental envelope defined by presences, (4) four different methods for incorporating spatial autocorrelation, and (5) the effect of integrating an interaction factor defined by a regression tree on the residuals of an initial environmental model. Location State of Vaud, western Switzerland. Methods Generalized additive models (GAMs) were fitted using the grasp package (generalized regression analysis and spatial predictions, http://www.cscf.ch/grasp). Results Model selection based on cross-validation appeared to be the best compromise between model stability and performance (parsimony) among the five methods tested. Weighting absences returned models that perform better than models fitted with the original sample prevalence. This appeared to be mainly due to the impact of very low prevalence values on evaluation statistics. Removing zeroes beyond the range of presences on main environmental gradients changed the set of selected predictors, and potentially their response curve shape. Moreover, removing zeroes slightly improved model performance and stability when compared with the baseline model on the same data set. Incorporating a spatial trend predictor improved model performance and stability significantly. Even better models were obtained when including local spatial autocorrelation. A novel approach to include interactions proved to be an efficient way to account for interactions between all predictors at once. 
Main conclusions Models and spatial predictions of 18 forest communities were significantly improved by using either: (1) cross-validation as a model selection method, (2) weighted absences, (3) limited absences, (4) predictors accounting for spatial autocorrelation, or (5) a factor variable accounting for interactions between all predictors. The final choice of model strategy should depend on the nature of the available data and the specific study aims. Statistical evaluation is useful in searching for the best modelling practice. However, one should not neglect to consider the shapes and interpretability of response curves, as well as the resulting spatial predictions in the final assessment.
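Point (2) above, weighting absences so that the weighted prevalence equals 0.5, amounts to rescaling the absence weights by the odds of the observed prevalence. A small sketch of that rescaling (the generalization to any target prevalence is an assumption beyond what the abstract states):

```python
import numpy as np

def prevalence_weights(y, target=0.5):
    """Case weights that rescale absences (y == 0) so the weighted
    prevalence sum(w*y)/sum(w) equals `target`.
    Assumes both presences and absences occur in y."""
    y = np.asarray(y, float)
    prev = y.mean()
    w = np.ones_like(y)
    w[y == 0] = (prev / (1 - prev)) * ((1 - target) / target)
    return w

# Example: a species observed at 20% prevalence
y = np.array([1] * 20 + [0] * 80)
w = prevalence_weights(y)
weighted_prev = (w * y).sum() / w.sum()
```

Such weights can be passed to a GAM or GLM fit so that evaluation statistics are no longer distorted by very low sample prevalence, which the abstract identifies as the main reason weighting helped.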
Abstract:
The n-octanol/water partition coefficient (log Po/w) is a key physicochemical parameter for drug discovery, design, and development. Here, we present a physics-based approach that shows a strong linear correlation between the computed solvation free energy in implicit solvents and the experimental log Po/w on a cleansed data set of more than 17,500 molecules. After internal validation by five-fold cross-validation and data randomization, the predictive power of the most interesting multiple linear model, based on two GB/SA parameters solely, was tested on two different external sets of molecules. On the Martel druglike test set, the predictive power of the best model (N = 706, r = 0.64, MAE = 1.18, and RMSE = 1.40) is similar to six well-established empirical methods. On the 17-drug test set, our model outperformed all compared empirical methodologies (N = 17, r = 0.94, MAE = 0.38, and RMSE = 0.52). The physical basis of our original GB/SA approach together with its predictive capacity, computational efficiency (1 to 2 s per molecule), and tridimensional molecular graphics capability lay the foundations for a promising predictor, the implicit log P method (iLOGP), to complement the portfolio of drug design tools developed and provided by the SIB Swiss Institute of Bioinformatics.
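The internal validation workflow, five-fold cross-validation of a two-parameter multiple linear model reported via r, MAE and RMSE, can be sketched as follows. The two descriptors and the log P values are simulated placeholders, not the cleansed 17,500-molecule set or the actual GB/SA terms:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.metrics import mean_absolute_error, mean_squared_error

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the two GB/SA descriptors
X = rng.normal(size=(200, 2))
# Simulated experimental log P with a linear relation plus noise
logp = 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.3, size=200)

model = LinearRegression()
# Five-fold cross-validated predictions for internal validation
pred = cross_val_predict(model, X, logp,
                         cv=KFold(n_splits=5, shuffle=True, random_state=0))
mae = mean_absolute_error(logp, pred)
rmse = mean_squared_error(logp, pred) ** 0.5
r = np.corrcoef(logp, pred)[0, 1]
```

The same three statistics (r, MAE, RMSE) are the ones the abstract reports for the Martel and 17-drug external test sets.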
Abstract:
The most widely used formula for estimating glomerular filtration rate (eGFR) in children is the Schwartz formula. It was revised in 2009 using iohexol clearances with measured GFR (mGFR) ranging between 15 and 75 ml/min per 1.73 m². Here we assessed the accuracy of the Schwartz formula using the inulin clearance (iGFR) method to evaluate its accuracy for children with less renal impairment, comparing 551 iGFRs of 392 children with their Schwartz eGFRs. Serum creatinine was measured using the compensated Jaffe method. In order to find the best relationship between iGFR and eGFR, a linear quadratic regression model was fitted and a more accurate formula was derived. This quadratic formula was: 0.68 × (height (cm)/serum creatinine (mg/dl)) − 0.0008 × (height (cm)/serum creatinine (mg/dl))² + 0.48 × age (years) − (21.53 in males or 25.68 in females). This formula was validated using a split-half cross-validation technique and also externally validated with a new cohort of 127 children. Results show that the Schwartz formula is accurate until a height (Ht)/serum creatinine value of 251, corresponding to an iGFR of 103 ml/min per 1.73 m², but significantly unreliable for higher values. For an accuracy of 20 percent, the quadratic formula was significantly better than the Schwartz formula for all patients and for patients with a Ht/serum creatinine of 251 or greater. Thus, the new quadratic formula could replace the revised Schwartz formula, which is accurate for children with moderate renal failure but not for those with less renal impairment or hyperfiltration.
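The quadratic formula quoted above can be written directly as a function. The example inputs are hypothetical, chosen only to show the expected units:

```python
def egfr_quadratic(height_cm, scr_mg_dl, age_years, male):
    """Quadratic eGFR formula from the abstract, in ml/min per 1.73 m^2:
    0.68*(Ht/SCr) - 0.0008*(Ht/SCr)^2 + 0.48*age - (21.53 M / 25.68 F)."""
    ratio = height_cm / scr_mg_dl
    constant = 21.53 if male else 25.68
    return 0.68 * ratio - 0.0008 * ratio ** 2 + 0.48 * age_years - constant

# Hypothetical 10-year-old boy, 140 cm, serum creatinine 0.6 mg/dl
egfr = egfr_quadratic(140, 0.6, 10, male=True)
```

The negative quadratic term is what bends the curve at high Ht/SCr values, correcting the overestimation the abstract attributes to the linear Schwartz formula in children with little renal impairment.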
Abstract:
Neuroimaging studies typically compare experimental conditions using average brain responses, thereby overlooking the stimulus-related information conveyed by distributed spatio-temporal patterns of single-trial responses. Here, we take advantage of this rich information at a single-trial level to decode stimulus-related signals in two event-related potential (ERP) studies. Our method models the statistical distribution of the voltage topographies with a Gaussian Mixture Model (GMM), which reduces the dataset to a number of representative voltage topographies. The degree of presence of these topographies across trials at specific latencies is then used to classify experimental conditions. We tested the algorithm using a cross-validation procedure in two independent EEG datasets. In the first ERP study, we classified left- versus right-hemifield checkerboard stimuli for upper and lower visual hemifields. In a second ERP study, when functional differences cannot be assumed, we classified initial versus repeated presentations of visual objects. With minimal a priori information, the GMM model provides neurophysiologically interpretable features - vis à vis voltage topographies - as well as dynamic information about brain function. This method can in principle be applied to any ERP dataset testing the functional relevance of specific time periods for stimulus processing, the predictability of subjects' behavior and cognitive states, and the discrimination between healthy and clinical populations.
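A toy version of the GMM decoding idea: fit a mixture to single-trial "topographies" and check how well its components track the experimental conditions. Everything below (electrode count, template maps, noise level, diagonal covariances) is an assumption for illustration, not the paper's actual pipeline:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical single-trial voltage topographies: 200 trials x 32 electrodes,
# drawn from two condition-specific template maps plus sensor noise
templates = rng.normal(size=(2, 32))
labels = np.repeat([0, 1], 100)
X = templates[labels] + 0.3 * rng.normal(size=(200, 32))

# Fit a GMM to reduce the trials to representative topographies
gmm = GaussianMixture(n_components=2, covariance_type="diag",
                      random_state=0).fit(X)
assignments = gmm.predict(X)  # which representative map dominates each trial

# Agreement between GMM components and conditions (up to label swap,
# since mixture component order is arbitrary)
agree = max((assignments == labels).mean(), (assignments != labels).mean())
```

The fitted component means play the role of the interpretable representative voltage topographies, and the per-trial assignments are the features that a cross-validated classifier would use.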
Abstract:
BACKGROUND AND OBJECTIVES: The estimated GFR (eGFR) is important in clinical practice. To find the best formula for eGFR, this study assessed the best model of correlation between sinistrin clearance (iGFR) and models derived from cystatin C (CysC) and serum creatinine (SCreat), alone or combined. It also evaluated the accuracy of the combined Schwartz formula across all GFR levels. DESIGN, SETTING, PARTICIPANTS, & MEASUREMENTS: Two hundred thirty-eight iGFRs performed between January 2012 and April 2013 for 238 children were analyzed. Regression techniques were used to fit the different equations used for eGFR (i.e., logarithmic, inverse, linear, and quadratic). The performance of each model was evaluated using the Cohen κ correlation coefficient, and the percentage reaching 30% accuracy was calculated. RESULTS: The best model of correlation between iGFRs and CysC is linear; however, it presents a low κ coefficient (0.24) and is far below the Kidney Disease Outcomes Quality Initiative targets to be validated, with only 84% of eGFRs reaching accuracy of 30%. SCreat and iGFRs showed the best correlation in a fitted quadratic model with a κ coefficient of 0.53 and 93% accuracy. Adding CysC significantly (P<0.001) increased the κ coefficient to 0.56 and the quadratic model accuracy to 97%. Therefore, a combined SCreat and CysC quadratic formula was derived and internally validated using the cross-validation technique. This quadratic formula significantly outperformed the combined Schwartz formula, which was biased for an iGFR≥91 ml/min per 1.73 m². CONCLUSIONS: This study allowed deriving a new combined SCreat and CysC quadratic formula that could replace the combined Schwartz formula, which is accurate only for children with moderate chronic kidney disease.
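The 30% accuracy criterion used throughout this abstract (often called P30 in the GFR literature) is simple to state in code: the share of estimates falling within ±30% of the measured GFR. The example values are illustrative, not study data:

```python
import numpy as np

def p30(egfr, igfr):
    """Fraction of eGFR estimates within +/-30% of measured GFR (P30)."""
    egfr = np.asarray(egfr, float)
    igfr = np.asarray(igfr, float)
    return float(np.mean(np.abs(egfr - igfr) / igfr <= 0.30))

# Three hypothetical children, all with measured iGFR = 100:
# estimates of 70 and 130 are within 30%, 200 is not
accuracy = p30([70, 130, 200], [100, 100, 100])
```

Against this criterion, the abstract reports 84% for the CysC-only model, 93% for the SCreat quadratic model, and 97% after adding CysC.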