117 resultados para Cheminformatics, (Q)SAR, Cross-Validation, Visualization, 3D-Space
em Université de Lausanne, Switzerland
Resumo:
BACKGROUND/OBJECTIVES: (1) To cross-validate tetra- (4-BIA) and octopolar (8-BIA) bioelectrical impedance analysis vs dual-energy X-ray absorptiometry (DXA) for the assessment of total and appendicular body composition and (2) to evaluate the accuracy of external 4-BIA algorithms for the prediction of total body composition, in a representative sample of Swiss children. SUBJECTS/METHODS: A representative sample of 333 Swiss children aged 6-13 years from the Kinder-Sportstudie (KISS) (ISRCTN15360785). Whole-body fat-free mass (FFM) and appendicular lean tissue mass were measured with DXA. Body resistance (R) was measured at 50 kHz with 4-BIA and segmental body resistance at 5, 50, 250 and 500 kHz with 8-BIA. The resistance index (RI) was calculated as height(2)/R. Selection of predictors (gender, age, weight, RI4 and RI8) for BIA algorithms was performed using bootstrapped stepwise linear regression on 1000 samples. We calculated 95% confidence intervals (CI) of regression coefficients and measures of model fit using bootstrap analysis. Limits of agreement were used as measures of interchangeability of BIA with DXA. RESULTS: 8-BIA was more accurate than 4-BIA for the assessment of FFM (root mean square error (RMSE)=0.90 (95% CI 0.82-0.98) vs 1.12 kg (1.01-1.24); limits of agreement 1.80 to -1.80 kg vs 2.24 to -2.24 kg). 8-BIA also gave accurate estimates of appendicular body composition, with RMSE < or = 0.10 kg for arms and < or = 0.24 kg for legs. All external 4-BIA algorithms performed poorly with substantial negative proportional bias (r> or = 0.48, P<0.001). CONCLUSIONS: In a representative sample of young Swiss children (1) 8-BIA was superior to 4-BIA for the prediction of FFM, (2) external 4-BIA algorithms gave biased predictions of FFM and (3) 8-BIA was an accurate predictor of segmental body composition.
Batch effect confounding leads to strong bias in performance estimates obtained by cross-validation.
Resumo:
BACKGROUND: With the large amount of biological data that is currently publicly available, many investigators combine multiple data sets to increase the sample size and potentially also the power of their analyses. However, technical differences ("batch effects") as well as differences in sample composition between the data sets may significantly affect the ability to draw generalizable conclusions from such studies. FOCUS: The current study focuses on the construction of classifiers, and the use of cross-validation to estimate their performance. In particular, we investigate the impact of batch effects and differences in sample composition between batches on the accuracy of the classification performance estimate obtained via cross-validation. The focus on estimation bias is a main difference compared to previous studies, which have mostly focused on the predictive performance and how it relates to the presence of batch effects. DATA: We work on simulated data sets. To have realistic intensity distributions, we use real gene expression data as the basis for our simulation. Random samples from this expression matrix are selected and assigned to group 1 (e.g., 'control') or group 2 (e.g., 'treated'). We introduce batch effects and select some features to be differentially expressed between the two groups. We consider several scenarios for our study, most importantly different levels of confounding between groups and batch effects. METHODS: We focus on well-known classifiers: logistic regression, Support Vector Machines (SVM), k-nearest neighbors (kNN) and Random Forests (RF). Feature selection is performed with the Wilcoxon test or the lasso. Parameter tuning and feature selection, as well as the estimation of the prediction performance of each classifier, is performed within a nested cross-validation scheme. The estimated classification performance is then compared to what is obtained when applying the classifier to independent data.
Resumo:
In the last five years, Deep Brain Stimulation (DBS) has become the most popular and effective surgical technique for the treatent of Parkinson's disease (PD). The Subthalamic Nucleus (STN) is the usual target involved when applying DBS. Unfortunately, the STN is in general not visible in common medical imaging modalities. Therefore, atlas-based segmentation is commonly considered to locate it in the images. In this paper, we propose a scheme that allows both, to perform a comparison between different registration algorithms and to evaluate their ability to locate the STN automatically. Using this scheme we can evaluate the expert variability against the error of the algorithms and we demonstrate that automatic STN location is possible and as accurate as the methods currently used.
Resumo:
In the last five years, Deep Brain Stimulation (DBS) has become the most popular and effective surgical technique for the treatent of Parkinson's disease (PD). The Subthalamic Nucleus (STN) is the usual target involved when applying DBS. Unfortunately, the STN is in general not visible in common medical imaging modalities. Therefore, atlas-based segmentation is commonly considered to locate it in the images. In this paper, we propose a scheme that allows both, to perform a comparison between different registration algorithms and to evaluate their ability to locate the STN automatically. Using this scheme we can evaluate the expert variability against the error of the algorithms and we demonstrate that automatic STN location is possible and as accurate as the methods currently used.
Resumo:
PURPOSE: Proper delineation of ocular anatomy in 3-dimensional (3D) imaging is a big challenge, particularly when developing treatment plans for ocular diseases. Magnetic resonance imaging (MRI) is presently used in clinical practice for diagnosis confirmation and treatment planning for treatment of retinoblastoma in infants, where it serves as a source of information, complementary to the fundus or ultrasonographic imaging. Here we present a framework to fully automatically segment the eye anatomy for MRI based on 3D active shape models (ASM), and we validate the results and present a proof of concept to automatically segment pathological eyes. METHODS AND MATERIALS: Manual and automatic segmentation were performed in 24 images of healthy children's eyes (3.29 ± 2.15 years of age). Imaging was performed using a 3-T MRI scanner. The ASM consists of the lens, the vitreous humor, the sclera, and the cornea. The model was fitted by first automatically detecting the position of the eye center, the lens, and the optic nerve, and then aligning the model and fitting it to the patient. We validated our segmentation method by using a leave-one-out cross-validation. The segmentation results were evaluated by measuring the overlap, using the Dice similarity coefficient (DSC) and the mean distance error. RESULTS: We obtained a DSC of 94.90 ± 2.12% for the sclera and the cornea, 94.72 ± 1.89% for the vitreous humor, and 85.16 ± 4.91% for the lens. The mean distance error was 0.26 ± 0.09 mm. The entire process took 14 seconds on average per eye. CONCLUSION: We provide a reliable and accurate tool that enables clinicians to automatically segment the sclera, the cornea, the vitreous humor, and the lens, using MRI. We additionally present a proof of concept for fully automatically segmenting eye pathology. This tool reduces the time needed for eye shape delineation and thus can help clinicians when planning eye treatment and confirming the extent of the tumor.
Resumo:
A fully-automated 3D image analysis method is proposed to segment lung nodules in HRCT. A specific gray-level mathematical morphology operator, the SMDC-connection cost, acting in the 3D space of the thorax volume is defined in order to discriminate lung nodules from other dense (vascular) structures. Applied to clinical data concerning patients with pulmonary carcinoma, the proposed method detects isolated, juxtavascular and peripheral nodules with sizes ranging from 2 to 20 mm diameter. The segmentation accuracy was objectively evaluated on real and simulated nodules. The method showed a sensitivity and a specificity ranging from 85% to 97% and from 90% to 98%, respectively.
Resumo:
PURPOSE: Ocular anatomy and radiation-associated toxicities provide unique challenges for external beam radiation therapy. For treatment planning, precise modeling of organs at risk and tumor volume are crucial. Development of a precise eye model and automatic adaptation of this model to patients' anatomy remain problematic because of organ shape variability. This work introduces the application of a 3-dimensional (3D) statistical shape model as a novel method for precise eye modeling for external beam radiation therapy of intraocular tumors. METHODS AND MATERIALS: Manual and automatic segmentations were compared for 17 patients, based on head computed tomography (CT) volume scans. A 3D statistical shape model of the cornea, lens, and sclera as well as of the optic disc position was developed. Furthermore, an active shape model was built to enable automatic fitting of the eye model to CT slice stacks. Cross-validation was performed based on leave-one-out tests for all training shapes by measuring dice coefficients and mean segmentation errors between automatic segmentation and manual segmentation by an expert. RESULTS: Cross-validation revealed a dice similarity of 95% ± 2% for the sclera and cornea and 91% ± 2% for the lens. Overall, mean segmentation error was found to be 0.3 ± 0.1 mm. Average segmentation time was 14 ± 2 s on a standard personal computer. CONCLUSIONS: Our results show that the solution presented outperforms state-of-the-art methods in terms of accuracy, reliability, and robustness. Moreover, the eye model shape as well as its variability is learned from a training set rather than by making shape assumptions (eg, as with the spherical or elliptical model). Therefore, the model appears to be capable of modeling nonspherically and nonelliptically shaped eyes.
Resumo:
Recent studies have started to use media data to measure party positions and issue salience. The aim of this article is to compare and cross-validate this alternative approach with the more commonly used party manifestos, expert judgments and mass surveys. To this purpose, we present two methods to generate indicators of party positions and issue salience from media coverage: the core sentence approach and political claims analysis. Our cross-validation shows that with regard to party positions, indicators derived from the media converge with traditionally used measurements from party manifestos, mass surveys and expert judgments, but that salience indicators measure different underlying constructs. We conclude with a discussion of specific research questions for which media data offer potential advantages over more established methods.
Resumo:
The paper deals with the development and application of the methodology for automatic mapping of pollution/contamination data. General Regression Neural Network (GRNN) is considered in detail and is proposed as an efficient tool to solve this problem. The automatic tuning of isotropic and an anisotropic GRNN model using cross-validation procedure is presented. Results are compared with k-nearest-neighbours interpolation algorithm using independent validation data set. Quality of mapping is controlled by the analysis of raw data and the residuals using variography. Maps of probabilities of exceeding a given decision level and ?thick? isoline visualization of the uncertainties are presented as examples of decision-oriented mapping. Real case study is based on mapping of radioactively contaminated territories.
Resumo:
Introduction: As part of the MicroArray Quality Control (MAQC)-II project, this analysis examines how the choice of univariate feature-selection methods and classification algorithms may influence the performance of genomic predictors under varying degrees of prediction difficulty represented by three clinically relevant endpoints. Methods: We used gene-expression data from 230 breast cancers (grouped into training and independent validation sets), and we examined 40 predictors (five univariate feature-selection methods combined with eight different classifiers) for each of the three endpoints. Their classification performance was estimated on the training set by using two different resampling methods and compared with the accuracy observed in the independent validation set. Results: A ranking of the three classification problems was obtained, and the performance of 120 models was estimated and assessed on an independent validation set. The bootstrapping estimates were closer to the validation performance than were the cross-validation estimates. The required sample size for each endpoint was estimated, and both gene-level and pathway-level analyses were performed on the obtained models. Conclusions: We showed that genomic predictor accuracy is determined largely by an interplay between sample size and classification difficulty. Variations on univariate feature-selection methods and choice of classification algorithm have only a modest impact on predictor performance, and several statistically equally good predictors can be developed for any given classification problem.
Resumo:
BACKGROUND: Little information is available on the validity of simple and indirect body-composition methods in non-Western populations. Equations for predicting body composition are population-specific, and body composition differs between blacks and whites. OBJECTIVE: We tested the hypothesis that the validity of equations for predicting total body water (TBW) from bioelectrical impedance analysis measurements is likely to depend on the racial background of the group from which the equations were derived. DESIGN: The hypothesis was tested by comparing, in 36 African women, TBW values measured by deuterium dilution with those predicted by 23 equations developed in white, African American, or African subjects. These cross-validations in our African sample were also compared, whenever possible, with results from other studies in black subjects. RESULTS: Errors in predicting TBW showed acceptable values (1.3-1.9 kg) in all cases, whereas a large range of bias (0.2-6.1 kg) was observed independently of the ethnic origin of the sample from which the equations were derived. Three equations (2 from whites and 1 from blacks) showed nonsignificant bias and could be used in Africans. In all other cases, we observed either an overestimation or underestimation of TBW with variable bias values, regardless of racial background, yielding no clear trend for validity as a function of ethnic origin. CONCLUSIONS: The findings of this cross-validation study emphasize the need for further fundamental research to explore the causes of the poor validity of TBW prediction equations across populations rather than the need to develop new prediction equations for use in Africa.
Resumo:
Models predicting species spatial distribution are increasingly applied to wildlife management issues, emphasising the need for reliable methods to evaluate the accuracy of their predictions. As many available datasets (e.g. museums, herbariums, atlas) do not provide reliable information about species absences, several presence-only based analyses have been developed. However, methods to evaluate the accuracy of their predictions are few and have never been validated. The aim of this paper is to compare existing and new presenceonly evaluators to usual presence/absence measures. We use a reliable, diverse, presence/absence dataset of 114 plant species to test how common presence/absence indices (Kappa, MaxKappa, AUC, adjusted D-2) compare to presenceonly measures (AVI, CVI, Boyce index) for evaluating generalised linear models (GLM). Moreover we propose a new, threshold-independent evaluator, which we call "continuous Boyce index". All indices were implemented in the B10MAPPER software. We show that the presence-only evaluators are fairly correlated (p > 0.7) to the presence/absence ones. The Boyce indices are closer to AUC than to MaxKappa and are fairly insensitive to species prevalence. In addition, the Boyce indices provide predicted-toexpected ratio curves that offer further insights into the model quality: robustness, habitat suitability resolution and deviation from randomness. This information helps reclassifying predicted maps into meaningful habitat suitability classes. The continuous Boyce index is thus both a complement to usual evaluation of presence/absence models and a reliable measure of presence-only based predictions.
Resumo:
In occupational exposure assessment of airborne contaminants, exposure levels can either be estimated through repeated measurements of the pollutant concentration in air, expert judgment or through exposure models that use information on the conditions of exposure as input. In this report, we propose an empirical hierarchical Bayesian model to unify these approaches. Prior to any measurement, the hygienist conducts an assessment to generate prior distributions of exposure determinants. Monte-Carlo samples from these distributions feed two level-2 models: a physical, two-compartment model, and a non-parametric, neural network model trained with existing exposure data. The outputs of these two models are weighted according to the expert's assessment of their relevance to yield predictive distributions of the long-term geometric mean and geometric standard deviation of the worker's exposure profile (level-1 model). Bayesian inferences are then drawn iteratively from subsequent measurements of worker exposure. Any traditional decision strategy based on a comparison with occupational exposure limits (e.g. mean exposure, exceedance strategies) can then be applied. Data on 82 workers exposed to 18 contaminants in 14 companies were used to validate the model with cross-validation techniques. A user-friendly program running the model is available upon request.
Resumo:
OBJECTIVE: Mild neurocognitive disorders (MND) affect a subset of HIV+ patients under effective combination antiretroviral therapy (cART). In this study, we used an innovative multi-contrast magnetic resonance imaging (MRI) approach at high-field to assess the presence of micro-structural brain alterations in MND+ patients. METHODS: We enrolled 17 MND+ and 19 MND- patients with undetectable HIV-1 RNA and 19 healthy controls (HC). MRI acquisitions at 3T included: MP2RAGE for T1 relaxation times, Magnetization Transfer (MT), T2* and Susceptibility Weighted Imaging (SWI) to probe micro-structural integrity and iron deposition in the brain. Statistical analysis used permutation-based tests and correction for family-wise error rate. Multiple regression analysis was performed between MRI data and (i) neuropsychological results (ii) HIV infection characteristics. A linear discriminant analysis (LDA) based on MRI data was performed between MND+ and MND- patients and cross-validated with a leave-one-out test. RESULTS: Our data revealed loss of structural integrity and micro-oedema in MND+ compared to HC in the global white and cortical gray matter, as well as in the thalamus and basal ganglia. Multiple regression analysis showed a significant influence of sub-cortical nuclei alterations on the executive index of MND+ patients (p = 0.04 he and R(2) = 95.2). The LDA distinguished MND+ and MND- patients with a classification quality of 73% after cross-validation. CONCLUSION: Our study shows micro-structural brain tissue alterations in MND+ patients under effective therapy and suggests that multi-contrast MRI at high field is a powerful approach to discriminate between HIV+ patients on cART with and without mild neurocognitive deficits.
Resumo:
Aim This study used data from temperate forest communities to assess: (1) five different stepwise selection methods with generalized additive models, (2) the effect of weighting absences to ensure a prevalence of 0.5, (3) the effect of limiting absences beyond the environmental envelope defined by presences, (4) four different methods for incorporating spatial autocorrelation, and (5) the effect of integrating an interaction factor defined by a regression tree on the residuals of an initial environmental model. Location State of Vaud, western Switzerland. Methods Generalized additive models (GAMs) were fitted using the grasp package (generalized regression analysis and spatial predictions, http://www.cscf.ch/grasp). Results Model selection based on cross-validation appeared to be the best compromise between model stability and performance (parsimony) among the five methods tested. Weighting absences returned models that perform better than models fitted with the original sample prevalence. This appeared to be mainly due to the impact of very low prevalence values on evaluation statistics. Removing zeroes beyond the range of presences on main environmental gradients changed the set of selected predictors, and potentially their response curve shape. Moreover, removing zeroes slightly improved model performance and stability when compared with the baseline model on the same data set. Incorporating a spatial trend predictor improved model performance and stability significantly. Even better models were obtained when including local spatial autocorrelation. A novel approach to include interactions proved to be an efficient way to account for interactions between all predictors at once. Main conclusions Models and spatial predictions of 18 forest communities were significantly improved by using either: (1) cross-validation as a model selection method, (2) weighted absences, (3) limited absences, (4) predictors accounting for spatial autocorrelation, or (5) a factor variable accounting for interactions between all predictors. The final choice of model strategy should depend on the nature of the available data and the specific study aims. Statistical evaluation is useful in searching for the best modelling practice. However, one should not neglect to consider the shapes and interpretability of response curves, as well as the resulting spatial predictions in the final assessment.