44 results for Discriminant Analysis, Network Theory, Cross-Validation, Validation.
Abstract:
The most widely used formula for estimating glomerular filtration rate (eGFR) in children is the Schwartz formula. It was revised in 2009 using iohexol clearances, with measured GFR (mGFR) ranging between 15 and 75 ml/min × 1.73 m². Here we assessed the accuracy of the Schwartz formula for children with less renal impairment, using the inulin clearance (iGFR) method and comparing 551 iGFRs of 392 children with their Schwartz eGFRs. Serum creatinine was measured using the compensated Jaffe method. To find the best relationship between iGFR and eGFR, a linear quadratic regression model was fitted and a more accurate formula was derived. This quadratic formula was: 0.68 × (height (cm)/serum creatinine (mg/dl)) - 0.0008 × (height (cm)/serum creatinine (mg/dl))² + 0.48 × age (years) - (21.53 in males or 25.68 in females). The formula was validated using a split-half cross-validation technique and also externally validated in a new cohort of 127 children. Results show that the Schwartz formula is accurate up to a height (Ht)/serum creatinine value of 251, corresponding to an iGFR of 103 ml/min × 1.73 m², but significantly unreliable for higher values. For 20% accuracy, the quadratic formula was significantly better than the Schwartz formula, both for all patients and for patients with Ht/serum creatinine of 251 or greater. Thus, the new quadratic formula could replace the revised Schwartz formula, which is accurate for children with moderate renal failure but not for those with less renal impairment or hyperfiltration.
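As a transcription check, the quadratic formula quoted above can be coded directly. This is a minimal sketch: the coefficients are taken verbatim from the abstract, while the function name, argument names and example patient are illustrative assumptions.

```python
def egfr_quadratic(height_cm, scr_mg_dl, age_years, male):
    """Quadratic eGFR formula as printed in the abstract (ml/min per 1.73 m²).

    eGFR = 0.68*(Ht/SCr) - 0.0008*(Ht/SCr)^2 + 0.48*age - (21.53 male / 25.68 female)
    """
    ratio = height_cm / scr_mg_dl          # height (cm) over serum creatinine (mg/dl)
    intercept = 21.53 if male else 25.68   # sex-specific constant from the abstract
    return 0.68 * ratio - 0.0008 * ratio ** 2 + 0.48 * age_years - intercept

# Hypothetical example: a 10-year-old boy, 140 cm, SCr 0.7 mg/dl (Ht/SCr = 200)
print(round(egfr_quadratic(140, 0.7, 10, male=True), 2))
```

With Ht/SCr = 200 the terms work out to 136 - 32 + 4.8 - 21.53 = 87.27, below the Ht/SCr = 251 threshold at which the abstract reports the Schwartz formula starts to break down.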
Abstract:
Neuroimaging studies typically compare experimental conditions using average brain responses, thereby overlooking the stimulus-related information conveyed by distributed spatio-temporal patterns of single-trial responses. Here, we take advantage of this rich information at the single-trial level to decode stimulus-related signals in two event-related potential (ERP) studies. Our method models the statistical distribution of the voltage topographies with a Gaussian Mixture Model (GMM), which reduces the dataset to a number of representative voltage topographies. The degree of presence of these topographies across trials at specific latencies is then used to classify experimental conditions. We tested the algorithm using a cross-validation procedure in two independent EEG datasets. In the first ERP study, we classified left- versus right-hemifield checkerboard stimuli for upper and lower visual hemifields. In a second ERP study, where functional differences cannot be assumed, we classified initial versus repeated presentations of visual objects. With minimal a priori information, the GMM provides neurophysiologically interpretable features, namely voltage topographies, as well as dynamic information about brain function. This method can in principle be applied to any ERP dataset to test the functional relevance of specific time periods for stimulus processing, the predictability of subjects' behavior and cognitive states, and the discrimination between healthy and clinical populations.
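The pipeline described, fitting a GMM to single-trial topographies and then classifying conditions from the degree of presence of each component, can be sketched on synthetic data. This is a toy illustration under assumed names and scikit-learn APIs, not the authors' code: the channel count, component count and injected condition effect are all made up.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_channels = 200, 16

# Synthetic single-trial voltage topographies (trials x electrodes)
X = rng.normal(size=(n_trials, n_channels))
y = np.repeat([0, 1], n_trials // 2)   # two experimental conditions
X[y == 1, :4] += 1.5                   # condition-specific topography on 4 channels

# Reduce the data to a few representative topographies via a GMM
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
presence = gmm.predict_proba(X)        # degree of presence of each map per trial

# Cross-validated classification of conditions from the GMM presence features
acc = cross_val_score(LogisticRegression(), presence, y, cv=5).mean()
print(f"cross-validated accuracy: {acc:.2f}")
```

The presence features have only as many dimensions as GMM components, which is what makes the reduced representation interpretable relative to the raw electrode space.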
Abstract:
BACKGROUND AND OBJECTIVES: The estimated GFR (eGFR) is important in clinical practice. To find the best formula for eGFR, this study assessed the best model of correlation between sinistrin clearance (iGFR) and models derived from cystatin C (CysC) and serum creatinine (SCreat), alone or combined. It also evaluated the accuracy of the combined Schwartz formula across all GFR levels. DESIGN, SETTING, PARTICIPANTS, & MEASUREMENTS: Two hundred thirty-eight iGFRs performed between January 2012 and April 2013 for 238 children were analyzed. Regression techniques were used to fit the different equations used for eGFR (i.e., logarithmic, inverse, linear, and quadratic). The performance of each model was evaluated using the Cohen κ correlation coefficient, and the percentage of eGFRs reaching 30% accuracy was calculated. RESULTS: The best model of correlation between iGFRs and CysC is linear; however, it has a low κ coefficient (0.24) and falls far below the Kidney Disease Outcomes Quality Initiative validation targets, with only 84% of eGFRs reaching 30% accuracy. SCreat and iGFRs showed the best correlation in a fitted quadratic model, with a κ coefficient of 0.53 and 93% accuracy. Adding CysC significantly (P<0.001) increased the κ coefficient to 0.56 and the quadratic model accuracy to 97%. Therefore, a combined SCreat and CysC quadratic formula was derived and internally validated using the cross-validation technique. This quadratic formula significantly outperformed the combined Schwartz formula, which was biased for iGFR ≥91 ml/min per 1.73 m². CONCLUSIONS: This study allowed the derivation of a new combined SCreat and CysC quadratic formula that could replace the combined Schwartz formula, which is accurate only for children with moderate chronic kidney disease.
Abstract:
1. Identifying areas suitable for recolonization by threatened species is essential to support efficient conservation policies. Habitat suitability models (HSM) predict species' potential distributions, but the quality of their predictions should be carefully assessed when the species-environment equilibrium assumption is violated. 2. We studied the Eurasian otter Lutra lutra, whose numbers are recovering in southern Italy. To produce widely applicable results, we chose standard HSM procedures and examined the models' capacity to predict the suitability of a recolonization area. We used two fieldwork datasets: presence-only data, used in the Ecological Niche Factor Analyses (ENFA), and presence-absence data, used in a Generalized Linear Model (GLM). In addition to cross-validation, we independently evaluated the models with data from a recolonization event, providing presences on a previously unoccupied river. 3. Three of the models successfully predicted the suitability of the recolonization area, but the GLM built with data from before the recolonization disagreed with these predictions, missing the recolonized river's suitability and describing the otter's niche poorly. Our results highlighted three points of relevance to modelling practices: (1) absences may prevent the models from correctly identifying areas suitable for a species' spread; (2) the selection of variables may lead to randomness in the predictions; and (3) the Area Under Curve (AUC), a commonly used validation index, was not well suited to the evaluation of model quality, whereas the Boyce Index (CBI), based on presence data only, better highlighted the models' fit to the recolonization observations. 4. For species with unstable spatial distributions, presence-only models may work better than presence-absence methods in making reliable predictions of suitable areas for expansion. An iterative modelling process, using new occurrences from each step of the species' spread, may also help in progressively reducing errors. 5. Synthesis and applications. Conservation plans depend on reliable models of the species' suitable habitats. In non-equilibrium situations, such as with threatened or invasive species, models could be affected negatively by the inclusion of absence data when predicting areas of potential expansion. Presence-only methods will here provide a better basis for productive conservation management practices.
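Of the two validation indices contrasted above, the AUC has a simple rank interpretation: the probability that a randomly chosen presence site receives a higher suitability score than a randomly chosen absence site. A minimal sketch with made-up scores (the presence-only Boyce Index, which bins predicted-to-expected frequency ratios, is not shown):

```python
def auc(scores_presence, scores_absence):
    """Mann-Whitney formulation of AUC: fraction of (presence, absence)
    pairs where the presence site scores higher (ties count as 1/2)."""
    wins = 0.0
    for sp in scores_presence:
        for sa in scores_absence:
            wins += 1.0 if sp > sa else 0.5 if sp == sa else 0.0
    return wins / (len(scores_presence) * len(scores_absence))

# Hypothetical habitat-suitability scores at presence and absence sites
print(auc([0.9, 0.8, 0.6], [0.7, 0.4, 0.3]))
```

Because AUC rewards ranking absences low, it depends on absence data being meaningful; that is precisely the weakness the abstract reports for recolonizing species.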
Abstract:
PURPOSE: Ocular anatomy and radiation-associated toxicities provide unique challenges for external beam radiation therapy. For treatment planning, precise modeling of organs at risk and tumor volume are crucial. Development of a precise eye model and automatic adaptation of this model to patients' anatomy remain problematic because of organ shape variability. This work introduces the application of a 3-dimensional (3D) statistical shape model as a novel method for precise eye modeling for external beam radiation therapy of intraocular tumors. METHODS AND MATERIALS: Manual and automatic segmentations were compared for 17 patients, based on head computed tomography (CT) volume scans. A 3D statistical shape model of the cornea, lens, and sclera as well as of the optic disc position was developed. Furthermore, an active shape model was built to enable automatic fitting of the eye model to CT slice stacks. Cross-validation was performed based on leave-one-out tests for all training shapes by measuring Dice coefficients and mean segmentation errors between automatic segmentation and manual segmentation by an expert. RESULTS: Cross-validation revealed a Dice similarity of 95% ± 2% for the sclera and cornea and 91% ± 2% for the lens. Overall, the mean segmentation error was found to be 0.3 ± 0.1 mm. Average segmentation time was 14 ± 2 s on a standard personal computer. CONCLUSIONS: Our results show that the solution presented outperforms state-of-the-art methods in terms of accuracy, reliability, and robustness. Moreover, the eye model shape as well as its variability is learned from a training set rather than by making shape assumptions (e.g., as with the spherical or elliptical model). Therefore, the model appears to be capable of modeling nonspherically and nonelliptically shaped eyes.
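The Dice similarity used in the leave-one-out evaluation above is a standard overlap measure between two binary segmentation masks. A minimal NumPy sketch on hypothetical toy masks:

```python
import numpy as np

def dice_coefficient(a, b):
    """Dice similarity 2|A∩B| / (|A| + |B|) for two binary masks (1.0 = identical)."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

# Toy "automatic" vs "manual" segmentations on a 10x10 grid
auto = np.zeros((10, 10), bool)
auto[2:8, 2:8] = True      # 36 voxels
manual = np.zeros((10, 10), bool)
manual[3:8, 2:8] = True    # 30 voxels, fully inside the automatic mask
print(round(dice_coefficient(auto, manual), 3))
```

Here the overlap is 30 voxels, giving 2·30/(36+30) ≈ 0.909, i.e. the kind of 91-95% agreement the abstract reports.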
Abstract:
The present research deals with an important public health threat: the pollution created by radon gas accumulation inside dwellings. The spatial modeling of indoor radon in Switzerland is particularly complex and challenging because of the many influencing factors that must be taken into account. Indoor radon data analysis must be addressed from both a statistical and a spatial point of view. As a multivariate process, it was important first to define the influence of each factor. In particular, it was important to define the influence of geology, as it is closely associated with indoor radon. This association was indeed observed for the Swiss data but not proven to be the sole determinant for the spatial modeling. The statistical analysis of the data, at both the univariate and multivariate level, was followed by an exploratory spatial analysis. Many tools proposed in the literature were tested and adapted, including fractality, declustering and moving-window methods. The use of the Quantité Morisita Index (QMI) as a procedure to evaluate data clustering as a function of the radon level was proposed. The existing declustering methods were revised and applied in an attempt to approach the global histogram parameters. The exploratory phase comes along with the definition of multiple scales of interest for indoor radon mapping in Switzerland. The analysis was done with a top-down resolution approach, from regional to local levels, in order to find the appropriate scales for modeling. In this sense, the data partition was optimized in order to cope with the stationarity conditions of geostatistical models. Common methods of spatial modeling such as K Nearest Neighbors (KNN), variography and General Regression Neural Networks (GRNN) were proposed as exploratory tools. In the following section, different spatial interpolation methods were applied to a particular dataset.
A bottom-to-top method complexity approach was adopted, and the results were analyzed together in order to find common definitions of continuity and neighborhood parameters. Additionally, a data filter based on cross-validation (the CVMF) was tested with the purpose of reducing noise at the local scale. At the end of the chapter, a series of tests for data consistency and method robustness was performed. This led to conclusions about the importance of data splitting and the limitations of generalization methods for reproducing statistical distributions. The last section was dedicated to modeling methods with probabilistic interpretations. Data transformation and simulations thus allowed the use of multigaussian models and helped take the uncertainty of the indoor radon pollution data into consideration. The categorization transform was presented as a solution for modeling extreme values through classification. Simulation scenarios were proposed, including an alternative proposal for the reproduction of the global histogram based on the sampling domain. Sequential Gaussian simulation (SGS) was presented as the method giving the most complete information, while classification performed in a more robust way. An error measure was defined in relation to the decision function for hardening the data classification. Within the classification methods, probabilistic neural networks (PNN) proved better adapted for modeling high-threshold categorization and for automation. Support vector machines (SVM), on the contrary, performed well under balanced category conditions. In general, it was concluded that no particular prediction or estimation method is better under all conditions of scale and neighborhood definitions. Simulations should be the basis, while other methods can provide complementary information to accomplish efficient indoor radon decision making.
Abstract:
OBJECTIVE: To develop and validate a simple, integer-based score to predict functional outcome in acute ischemic stroke (AIS) using variables readily available after emergency room admission. METHODS: Logistic regression was performed in the derivation cohort of previously independent patients with AIS (Acute Stroke Registry and Analysis of Lausanne [ASTRAL]) to identify predictors of unfavorable outcome (3-month modified Rankin Scale score >2). An integer-based point-scoring system for each covariate of the fitted multivariate model was generated by their β-coefficients; the overall score was calculated as the sum of the weighted scores. The model was validated internally using a 2-fold cross-validation technique and externally in 2 independent cohorts (Athens and Vienna Stroke Registries). RESULTS: Age (A), severity of stroke (S) measured by admission NIH Stroke Scale score, stroke onset to admission time (T), range of visual fields (R), acute glucose (A), and level of consciousness (L) were identified as independent predictors of unfavorable outcome in 1,645 patients in ASTRAL. Their β-coefficients were multiplied by 4 and rounded to the closest integer to generate the score. The area under the receiver operating characteristic curve (AUC) of the score in the ASTRAL cohort was 0.850. The score was well calibrated in the derivation (p = 0.43) and validation cohorts (0.22 [Athens, n = 1,659] and 0.49 [Vienna, n = 653]). AUCs were 0.937 (Athens), 0.771 (Vienna), and 0.902 (when pooled). An ASTRAL score of 31 indicates a 50% likelihood of unfavorable outcome. CONCLUSIONS: The ASTRAL score is a simple integer-based score to predict functional outcome using 6 readily available items at hospital admission. It performed well in double external validation and may be a useful tool for clinical practice and stroke research.
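The scoring construction described above, multiplying each covariate's β-coefficient by 4 and rounding to the nearest integer, then summing the points that apply to a patient, can be sketched as follows. The β values below are made-up placeholders, not the published ASTRAL coefficients.

```python
# Hypothetical beta-coefficients from a fitted logistic model (NOT the published ones)
betas = {
    "age_per_5_years":       0.25,
    "nihss_per_point":       0.26,
    "onset_to_admission_3h": 0.52,
    "visual_field_defect":   0.49,
    "abnormal_glucose":      0.27,
    "reduced_consciousness": 0.74,
}

# Convert each coefficient to integer points: multiply by 4, round
points = {name: round(4 * b) for name, b in betas.items()}

def score(present):
    """Sum the integer points for the covariates present in a patient."""
    return sum(points[name] for name in present)

print(points["reduced_consciousness"],
      score(["onset_to_admission_3h", "reduced_consciousness"]))
```

Rounding the scaled coefficients trades a little discrimination for a score that can be tallied at the bedside without a calculator, which is the design goal stated in the abstract.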
Abstract:
Risk maps summarizing landscape suitability of novel areas for invading species can be valuable tools for preventing species' invasions or controlling their spread, but methods employed for development of such maps remain variable and unstandardized. We discuss several considerations in development of such models, including types of distributional information that should be used, the nature of explanatory variables that should be incorporated, and caveats regarding model testing and evaluation. We highlight that, in the case of invasive species, such distributional predictions should aim to derive the best hypothesis of the potential distribution of the species by using (1) all distributional information available, including information from both the native range and other invaded regions; (2) predictors linked as directly as is feasible to the physiological requirements of the species; and (3) modelling procedures that carefully avoid overfitting to the training data. Finally, model testing and evaluation should focus on well-predicted presences, and less on efficient prediction of absences; a k-fold regional cross-validation test is discussed.
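The k-fold regional cross-validation mentioned at the end holds out whole geographic regions rather than random records, so test points are never spatial neighbours of training points. A minimal NumPy sketch with synthetic data and a nearest-centroid stand-in for the distribution model (all names and values are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
region = rng.integers(0, 4, size=n)    # the spatial block each record falls in
X = rng.normal(size=(n, 3))            # environmental predictors
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)  # presence/absence

# Regional k-fold: hold out one entire region per fold
accuracies = []
for r in range(4):
    tr, te = region != r, region == r
    mu_pres = X[tr & (y == 1)].mean(axis=0)   # centroid of training presences
    mu_abs = X[tr & (y == 0)].mean(axis=0)    # centroid of training absences
    pred = (((X[te] - mu_pres) ** 2).sum(1) <
            ((X[te] - mu_abs) ** 2).sum(1)).astype(int)
    accuracies.append((pred == y[te]).mean())

print([round(a, 2) for a in accuracies])
```

Holding out whole regions is a sterner test of transferability than random k-fold, which matters for invasive species whose risk maps must extrapolate to unsampled regions.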
Abstract:
We present a new indicator taxa approach to the prediction of climate change effects on biodiversity at the national level in Switzerland. As indicators, we select a set of the most widely distributed species that account for 95% of geographical variation in sampled species richness of birds, butterflies, and vascular plants. Species data come from a national program designed to monitor spatial and temporal trends in species richness. We examine some opportunities and limitations in using these data. We develop ecological niche models for the species as functions of both climate and land cover variables. We project these models to the future using climate predictions that correspond to two IPCC 3rd assessment scenarios for the development of 'greenhouse' gas emissions. We find that models that are calibrated with Swiss national monitoring data perform well in 10-fold cross-validation, but can fail to capture the hot-dry end of environmental gradients that constrain some species distributions. Models for indicator species in all three higher taxa predict that climate change will result in turnover in species composition even where there is little net change in predicted species richness. Indicator species from high elevations lose most areas of suitable climate even under the relatively mild B2 scenario. We project some areas to increase in the number of species for which climate conditions are suitable early in the current century, but these areas become less suitable for a majority of species by the end of the century. Selection of indicator species based on rank prevalence results in a set of models that predict observed species richness better than a similar set of species selected based on high rank of model AUC values. An indicator species approach based on selected species that are relatively common may facilitate the use of national monitoring data for predicting climate change effects on the distribution of biodiversity.
Abstract:
OBJECTIVES: To test the validity of a simple, rapid, field-adapted, portable hand-held impedancemeter (HHI) for the estimation of lean body mass (LBM) and percentage body fat (%BF) in African women, and to develop specific predictive equations. DESIGN: Cross-sectional observational study. SETTINGS: Dakar, the capital city of Senegal, West Africa. SUBJECTS: A total sample of 146 women volunteered. Their mean age was 31.0 y (s.d. 9.1), weight 60.9 kg (s.d. 13.1) and BMI 22.6 kg/m² (s.d. 4.5). METHODS: Body composition values estimated by HHI were compared with those measured by whole-body densitometry performed by air displacement plethysmography (ADP). The specific density of LBM in black subjects was taken into account for the calculation of %BF from body density. RESULTS: Estimations from HHI showed a large bias (mean difference) of 5.6 kg LBM (P<10⁻⁴) and -8.8 %BF (P<10⁻⁴) and errors (s.d. of the bias) of 2.6 kg LBM and 3.7 %BF. In order to correct for the bias, specific predictive equations were developed. With the HHI result as a single predictor, error values were 1.9 kg LBM and 3.7 %BF in the prediction group (n=100), and 2.2 kg LBM and 3.6 %BF in the cross-validation group (n=46). Addition of anthropometric predictors was not necessary. CONCLUSIONS: The HHI analyser significantly overestimated LBM and underestimated %BF in African women. After correction for the bias, these body compartments can be estimated in African women with good precision by using the HHI result in an appropriate prediction equation. It remains to be seen whether combining arm and leg impedancemetry, in order to take the lower limbs into account, would further improve the prediction of body composition in Africans.
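The bias and error figures quoted above are the mean and standard deviation of the device-minus-reference differences. A minimal sketch with hypothetical LBM values (the function name and the four example pairs are made up):

```python
import numpy as np

def bias_and_error(estimated, reference):
    """Bias = mean difference, error = s.d. of the differences,
    as used to compare the HHI estimates with the ADP reference."""
    d = np.asarray(estimated, float) - np.asarray(reference, float)
    return d.mean(), d.std(ddof=1)

# Hypothetical LBM values (kg): device estimate vs densitometry reference
est = [42.0, 40.5, 38.0, 45.2]
ref = [36.0, 35.0, 33.5, 39.0]
bias, error = bias_and_error(est, ref)
print(round(bias, 2), round(error, 2))
```

A consistently positive bias like this one can be corrected by a prediction equation, but the error (the spread of the differences) sets the floor on individual-level precision.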
Abstract:
This paper presents general regression neural networks (GRNN) as a nonlinear regression method for the interpolation of monthly wind speeds in complex Alpine orography. GRNN is trained using data from Swiss meteorological networks to learn the statistical relationship between topographic features and wind speed. Terrain convexity, slope and exposure are considered by extracting features from the digital elevation model at different spatial scales using specialised convolution filters. A database of gridded monthly wind speeds is then constructed by applying GRNN in prediction mode over the period 1968-2008. This study demonstrates that using topographic features as inputs in GRNN significantly reduces cross-validation errors with respect to low-dimensional models integrating only geographical coordinates and terrain height for the interpolation of wind speed. The spatial predictability of wind speed is found to be lower in summer than in winter due to more complex and weaker wind-topography relationships. The relevance of these relationships is studied using an adaptive version of the GRNN algorithm, which selects the useful terrain features by eliminating the noisy ones. This research provides a framework for extending low-dimensional interpolation models to high-dimensional spaces by integrating additional features accounting for the topographic conditions at multiple spatial scales. Copyright (c) 2012 Royal Meteorological Society.
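At its core a GRNN prediction is a Gaussian-kernel-weighted average of the training targets (Nadaraya-Watson regression), with a single smoothing width. A minimal NumPy sketch, with a toy 1-D station example rather than real topographic features:

```python
import numpy as np

def grnn_predict(X_train, y_train, X_query, sigma=1.0):
    """GRNN prediction: Gaussian-kernel-weighted average of training targets.

    sigma is the smoothing width; small sigma ~ nearest neighbour,
    large sigma ~ global mean.
    """
    # Squared distances between every query point and every training point
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return (w @ y_train) / w.sum(axis=1)

# Toy example: two stations with known wind speed, one query location midway
X_train = np.array([[0.0], [2.0]])
y_train = np.array([3.0, 7.0])
print(grnn_predict(X_train, y_train, np.array([[1.0]]))[0])
```

Because each extra input only adds a term to the distance computation, the same code carries over unchanged to the high-dimensional multi-scale terrain features the abstract describes.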
Abstract:
Genetic variants influence the risk of developing certain diseases or give rise to differences in drug response. Recent progress in cost-effective, high-throughput genome-wide techniques, such as microarrays measuring Single Nucleotide Polymorphisms (SNPs), has facilitated genotyping of large clinical and population cohorts. Combining the massive genotypic data with measurements of phenotypic traits allows for the determination of genetic differences that explain, at least in part, the phenotypic variations within a population. So far, models combining the most significant variants can only explain a small fraction of the variance, indicating the limitations of current models. In particular, researchers have only begun to address the possibility of interactions between genotypes and the environment. Elucidating the contributions of such interactions is a difficult task because of the large number of genetic as well as possible environmental factors. In this thesis, I worked on several projects within this context. My first and main project was the identification of possible SNP-environment interactions, where the phenotypes were serum lipid levels of patients from the Swiss HIV Cohort Study (SHCS) treated with antiretroviral therapy. Here the genotypes consisted of a limited set of SNPs in candidate genes relevant for lipid transport and metabolism. The environmental variables were the specific combinations of drugs given to each patient over the treatment period. My work explored bioinformatic and statistical approaches to relate patients' lipid responses to these SNPs, drugs and, importantly, their interactions. The goal of this project was to improve our understanding and to explore the possibility of predicting dyslipidemia, a well-known adverse drug reaction of antiretroviral therapy.
Specifically, I quantified how much of the variance in lipid profiles could be explained by the host genetic variants, the administered drugs and SNP-drug interactions, and assessed the predictive power of these features on lipid responses. Using cross-validation stratified by patients, we could not validate our hypothesis that models that select a subset of SNP-drug interactions in a principled way have better predictive power than control models using "random" subsets. Nevertheless, all tested models containing SNP and/or drug terms exhibited significant predictive power (compared with a random predictor) and explained a sizable proportion of variance in the patient-stratified cross-validation context. Importantly, the model containing stepwise-selected SNP terms showed a higher capacity to predict triglyceride levels than a model containing randomly selected SNPs. Dyslipidemia is a complex trait for which many factors remain to be discovered, thus missing from the data, and possibly explaining the limitations of our analysis. In particular, the interactions of drugs with SNPs selected from the set of candidate genes likely have small effect sizes, which we were unable to detect in a sample of the present size (<800 patients). In the second part of my thesis, I performed genome-wide association studies within the Cohorte Lausannoise (CoLaus). I have been involved in several international projects to identify SNPs that are associated with various traits, such as serum calcium, body mass index, two-hour glucose levels, as well as metabolic syndrome and its components. These phenotypes are all related to major human health issues, such as cardiovascular disease. I applied statistical methods to detect new variants associated with these phenotypes, contributing to the identification of new genetic loci that may lead to new insights into the genetic basis of these traits.
This kind of research will lead to a better understanding of the mechanisms underlying these pathologies, a better evaluation of disease risk, the identification of new therapeutic leads and may ultimately lead to the realization of "personalized" medicine.
Abstract:
BACKGROUND: Only a few countries have cohorts enabling specific and up-to-date cardiovascular disease (CVD) risk estimation. Individual risk assessment based on study samples that differ too much from the target population could jeopardize the benefit of risk charts in general practice. Our aim was to provide up-to-date and valid CVD risk estimation for a Swiss population using a novel record linkage approach. METHODS: Anonymous record linkage was used to follow up (for mortality, until 2008) 9,853 men and women aged 25-74 years who participated in the Swiss MONICA (MONItoring of trends and determinants in CVD) study of 1983-92. The linkage success was 97.8%; loss to follow-up 1990-2000 was 4.7%. Based on the ESC SCORE methodology (Weibull regression), we used age, sex, blood pressure, smoking, and cholesterol to generate three models. We compared the 1) original SCORE model with a 2) recalibrated and a 3) new model using the Brier score (BS) and cross-validation. RESULTS: Based on the cross-validated BS, the new model (BS = 14107×10⁻⁶) was somewhat more appropriate for risk estimation than the original (BS = 14190×10⁻⁶) and the recalibrated (BS = 14172×10⁻⁶) models. Particularly at younger ages, the derived absolute risks were consistently lower than those from the original and recalibrated models, mainly owing to a smaller impact of total cholesterol. CONCLUSION: Using record linkage of observational and routine data is an efficient procedure to obtain valid and up-to-date CVD risk estimates for a specific population.
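The Brier score used to rank the three models is just the mean squared difference between predicted risk and the observed 0/1 outcome. A minimal sketch (the example risks and outcomes are made up):

```python
import numpy as np

def brier_score(predicted_risk, outcome):
    """Brier score: mean squared difference between predicted probability
    and the observed 0/1 outcome (lower is better)."""
    p = np.asarray(predicted_risk, float)
    o = np.asarray(outcome, float)
    return ((p - o) ** 2).mean()

# Hypothetical predicted CVD mortality risks vs observed outcomes
bs = brier_score([0.10, 0.80, 0.30, 0.05], [0, 1, 1, 0])
print(bs)
```

Because the score is an average of per-subject squared errors, tiny differences like those reported above (14107 vs 14190 ×10⁻⁶) are meaningful only under cross-validation, which is how the abstract compares the models.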
Abstract:
Multiple sclerosis (MS), a variable and diffuse disease affecting white and gray matter, is known to cause functional connectivity anomalies in patients. However, related studies published to date are post hoc; our hypothesis was that such alterations could discriminate between patients and healthy controls in a predictive setting, laying the groundwork for imaging-based prognosis. Using functional magnetic resonance imaging resting-state data of 22 minimally disabled MS patients and 14 controls, we developed a predictive model of connectivity alterations in MS: a whole-brain connectivity matrix was built for each subject from the slow oscillations (<0.11 Hz) of region-averaged time series, and a pattern recognition technique was used to learn a discriminant function indicating which particular functional connections are most affected by the disease. Classification performance using strict cross-validation yielded a sensitivity of 82% (above chance at p<0.005) and a specificity of 86% (p<0.01) in distinguishing between MS patients and controls. The most discriminative connectivity changes were found in subcortical and temporal regions, and contralateral connections were more discriminative than ipsilateral connections. The pattern of decreased discriminative connections can be summarized post hoc in an index that correlates positively (ρ=0.61) with white matter lesion load, possibly indicating functional reorganisation to cope with increasing lesion load. These results are consistent with a subtle but widespread impact of lesions in white matter and in gray matter structures serving as high-level integrative hubs. These findings suggest that predictive models of resting-state fMRI can reveal specific anomalies due to MS with high sensitivity and specificity, potentially leading to new non-invasive markers.
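The sensitivity and specificity figures reported above come from the cross-validated confusion matrix of patient-vs-control predictions. A minimal sketch of those two quantities on hypothetical labels:

```python
def sensitivity_specificity(predicted, actual):
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP) for binary labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical cross-validation predictions: 1 = MS patient, 0 = control
pred = [1, 1, 0, 0, 1, 0, 1]
true = [1, 0, 0, 0, 1, 1, 1]
sens, spec = sensitivity_specificity(pred, true)
print(round(sens, 2), round(spec, 2))
```

Reporting both figures matters in an unbalanced sample like the 22-patient/14-control cohort above, where raw accuracy alone would favour the majority class.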
Abstract:
PURPOSE: To improve the risk stratification of patients with rhabdomyosarcoma (RMS) through the use of clinical and molecular biologic data. PATIENTS AND METHODS: Two independent gene-expression profiling data sets of 124 and 101 patients with RMS were used to derive prognostic gene signatures by meta-analysis. These and a previously published metagene signature were evaluated using cross-validation analyses. A combined clinical and molecular risk-stratification scheme that incorporated the PAX3/FOXO1 fusion gene status was derived from 287 patients with RMS and evaluated. RESULTS: We showed that our prognostic gene-expression signature and the one previously published performed well, with reproducible and significant effects. However, their effect was reduced when cross-validated or tested in independent data, and they did not add new prognostic information over the fusion gene status, which is simpler to assay. Among nonmetastatic patients, those who were PAX3/FOXO1 positive had a significantly poorer outcome than both alveolar-negative and PAX7/FOXO1-positive patients. Furthermore, a new clinicomolecular risk score that incorporated fusion gene status (negative, PAX3/FOXO1 positive, or PAX7/FOXO1 positive), Intergroup Rhabdomyosarcoma Study TNM stage, and age showed a significant increase in performance over the current risk-stratification scheme. CONCLUSION: Gene signatures can improve current stratification of patients with RMS but will require complex assays to be developed and extensive validation before clinical application. A significant majority of their prognostic value was encapsulated by the fusion gene status. A continuous risk score derived from the combination of clinical parameters with the presence or absence of PAX3/FOXO1 represents a robust approach to improving current risk-adapted therapy for RMS.