879 resultados para Predictive regression
Resumo:
While for many years the diagnosis and therapy of colon cancer did not change drastically, recently new drugs (irinotecan and oxaliplatin, used in adjuvant or neo-adjuvant approaches) and even more recently the introduction of therapies targeting the epidermal growth factor receptor (EGFR) through the monoclonal antibodies cetuximab and panitumumab, are revolutionizing the field. The finding that only patients with a tumor with a wild type (non mutated) KRAS gene respond to anti-EGFR therapy has also affected the way pathologists address colorectal cancer. Molecular analysis of the KRAS gene has become almost a routine in a very short period of time. Pathologists will have to be prepared for a new era: from standard morphology based diagnostic procedures to the prediction of response to therapy using molecular tools.
Resumo:
Summary. Genetic polymorphisms near IL28B are associated with spontaneous and treatment-induced clearance of hepatitis C virus (HCV). Our objective was to assess the predictive value of IL28B polymorphisms in the treatment of chronic hepatitis C of patients with HCV genotypes 4, for which data are currently limited. We analysed the association of IL28B polymorphisms with the virological response to treatment among 182 naïve chronic hepatitis C patients with HCV genotype 4, all from Syria. Associations of alleles with the response patterns were evaluated by univariate analysis and multivariate logistic regression, accounting for all relevant covariates. Sustained virological response (SVR) was achieved in 26% of rs8099917 TG/GG carriers compared with 60% of TT carriers (P < 0.0001) and 35% of rs12979860 CT/TT carriers compared with 62% of CC carriers (P = 0.0011). By multivariate analysis, the association between rs8099917 and SVR remained significant (OR = 0.19, 95% CI 0.07-0.50, for TG/GG vs TT, P = 0.0007), with the only significant covariate being advanced fibrosis (OR = 0.13, 95% CI 0.04-0.37, P = 0.0002). In conclusion, IL28B polymorphisms are the strongest predictors of response to therapy among chronic hepatitis C patients with HCV genotype 4.
Resumo:
The relationship between inflammation and cancer is well established in several tumor types, including bladder cancer. We performed an association study between 886 inflammatory-gene variants and bladder cancer risk in 1,047 cases and 988 controls from the Spanish Bladder Cancer (SBC)/EPICURO Study. A preliminary exploration with the widely used univariate logistic regression approach did not identify any significant SNP after correcting for multiple testing. We further applied two more comprehensive methods to capture the complexity of bladder cancer genetic susceptibility: Bayesian Threshold LASSO (BTL), a regularized regression method, and AUC-Random Forest, a machine-learning algorithm. Both approaches explore the joint effect of markers. BTL analysis identified a signature of 37 SNPs in 34 genes showing an association with bladder cancer. AUC-RF detected an optimal predictive subset of 56 SNPs. 13 SNPs were identified by both methods in the total population. Using resources from the Texas Bladder Cancer study we were able to replicate 30% of the SNPs assessed. The associations between inflammatory SNPs and bladder cancer were reexamined among non-smokers to eliminate the effect of tobacco, one of the strongest and most prevalent environmental risk factor for this tumor. A 9 SNP-signature was detected by BTL. Here we report, for the first time, a set of SNP in inflammatory genes jointly associated with bladder cancer risk. These results highlight the importance of the complex structure of genetic susceptibility associated with cancer risk.
Resumo:
BACKGROUND AND PURPOSE: Several prognostic scores have been developed to predict the risk of symptomatic intracranial hemorrhage (sICH) after ischemic stroke thrombolysis. We compared the performance of these scores in a multicenter cohort. METHODS: We merged prospectively collected data of patients with consecutive ischemic stroke who received intravenous thrombolysis in 7 stroke centers. We identified and evaluated 6 scores that can provide an estimate of the risk of sICH in hyperacute settings: MSS (Multicenter Stroke Survey); HAT (Hemorrhage After Thrombolysis); SEDAN (blood sugar, early infarct signs, [hyper]dense cerebral artery sign, age, NIH Stroke Scale); GRASPS (glucose at presentation, race [Asian], age, sex [male], systolic blood pressure at presentation, and severity of stroke at presentation [NIH Stroke Scale]); SITS (Safe Implementation of Thrombolysis in Stroke); and SPAN (stroke prognostication using age and NIH Stroke Scale)-100 positive index. We included only patients with available variables for all scores. We calculated the area under the receiver operating characteristic curve (AUC-ROC) and also performed logistic regression and the Hosmer-Lemeshow test. RESULTS: The final cohort comprised 3012 eligible patients, of whom 221 (7.3%) had sICH per National Institute of Neurological Disorders and Stroke, 141 (4.7%) per European Cooperative Acute Stroke Study II, and 86 (2.9%) per Safe Implementation of Thrombolysis in Stroke criteria. The performance of the scores assessed with AUC-ROC for predicting European Cooperative Acute Stroke Study II sICH was: MSS, 0.63 (95% confidence interval, 0.58-0.68); HAT, 0.65 (0.60-0.70); SEDAN, 0.70 (0.66-0.73); GRASPS, 0.67 (0.62-0.72); SITS, 0.64 (0.59-0.69); and SPAN-100 positive index, 0.56 (0.50-0.61). SEDAN had significantly higher AUC-ROC values compared with all other scores, except for GRASPS where the difference was nonsignificant. SPAN-100 performed significantly worse compared with other scores. The discriminative ranking of the scores was the same for the National Institute of Neurological Disorders and Stroke, and Safe Implementation of Thrombolysis in Stroke definitions, with SEDAN performing best, GRASPS second, and SPAN-100 worst. CONCLUSIONS: SPAN-100 had the worst predictive power, and SEDAN constantly the highest predictive power. However, none of the scores had better than moderate performance.
Resumo:
RATIONALE: An objective and simple prognostic model for patients with pulmonary embolism could be helpful in guiding initial intensity of treatment. OBJECTIVES: To develop a clinical prediction rule that accurately classifies patients with pulmonary embolism into categories of increasing risk of mortality and other adverse medical outcomes. METHODS: We randomly allocated 15,531 inpatient discharges with pulmonary embolism from 186 Pennsylvania hospitals to derivation (67%) and internal validation (33%) samples. We derived our prediction rule using logistic regression with 30-day mortality as the primary outcome, and patient demographic and clinical data routinely available at presentation as potential predictor variables. We externally validated the rule in 221 inpatients with pulmonary embolism from Switzerland and France. MEASUREMENTS: We compared mortality and nonfatal adverse medical outcomes across the derivation and two validation samples. MAIN RESULTS: The prediction rule is based on 11 simple patient characteristics that were independently associated with mortality and stratifies patients with pulmonary embolism into five severity classes, with 30-day mortality rates of 0-1.6% in class I, 1.7-3.5% in class II, 3.2-7.1% in class III, 4.0-11.4% in class IV, and 10.0-24.5% in class V across the derivation and validation samples. Inpatient death and nonfatal complications were <or= 1.1% among patients in class I and <or= 1.9% among patients in class II. CONCLUSIONS: Our rule accurately classifies patients with pulmonary embolism into classes of increasing risk of mortality and other adverse medical outcomes. Further validation of the rule is important before its implementation as a decision aid to guide the initial management of patients with pulmonary embolism.
Resumo:
A statewide study was conducted to develop regression equations for estimating flood-frequency discharges for ungaged stream sites in Iowa. Thirty-eight selected basin characteristics were quantified and flood-frequency analyses were computed for 291 streamflow-gaging stations in Iowa and adjacent States. A generalized-skew-coefficient analysis was conducted to determine whether generalized skew coefficients could be improved for Iowa. Station skew coefficients were computed for 239 gaging stations in Iowa and adjacent States, and an isoline map of generalized-skew-coefficient values was developed for Iowa using variogram modeling and kriging methods. The skew map provided the lowest mean square error for the generalized-skew- coefficient analysis and was used to revise generalized skew coefficients for flood-frequency analyses for gaging stations in Iowa. Regional regression analysis, using generalized least-squares regression and data from 241 gaging stations, was used to develop equations for three hydrologic regions defined for the State. The regression equations can be used to estimate flood discharges that have recurrence intervals of 2, 5, 10, 25, 50, 100, 200, and 500 years for ungaged stream sites in Iowa. One-variable equations were developed for each of the three regions and multi-variable equations were developed for two of the regions. Two sets of equations are presented for two of the regions because one-variable equations are considered easy for users to apply and the predictive accuracies of multi-variable equations are greater. Standard error of prediction for the one-variable equations ranges from about 34 to 45 percent and for the multi-variable equations range from about 31 to 42 percent. A region-of-influence regression method was also investigated for estimating flood-frequency discharges for ungaged stream sites in Iowa. A comparison of regional and region-of-influence regression methods, based on ease of application and root mean square errors, determined the regional regression method to be the better estimation method for Iowa. Techniques for estimating flood-frequency discharges for streams in Iowa are presented for determining ( 1) regional regression estimates for ungaged sites on ungaged streams; (2) weighted estimates for gaged sites; and (3) weighted estimates for ungaged sites on gaged streams. The technique for determining regional regression estimates for ungaged sites on ungaged streams requires determining which of four possible examples applies to the location of the stream site and its basin. Illustrations for determining which example applies to an ungaged stream site and for applying both the one-variable and multi-variable regression equations are provided for the estimation techniques.
Resumo:
PURPOSE: Attention deficit and hyperactivity disorder (ADHD) is one of the most frequent disorders in childhood and adolescence. Both neurocognitive and environmental factors have been related to ADHD. The current study contributes to the documentation of the predictive relation between early attachment deprivation and ADHD. METHOD: Data were collected from 641 adopted adolescents (53.2 % girls) aged 11-16 years in five countries, using the DSM oriented scale for ADHD of the Child Behavior Checklist (CBCL) (Achenbach and Rescorla, Manual for the ASEBA school-age forms and profiles. University of Vermont, Research Center for Children, Youth and Families, Burlington, 2001). The influence of attachment deprivation on ADHD symptoms was initially tested taking into consideration several key variables that have been reported as influencing ADHD at the adoptee level (age, gender, length of time in the adoptive family, parents' educational level and marital status), and at the level of the country of origin and country of adoption (poverty, quality of health services and values). The analyses were computed using the multilevel modeling technique. RESULTS: The results showed that an increase in the level of ADHD symptoms was predicted by the duration of exposure to early attachment deprivation, estimated from the age of adoption, after controlling for the influence of adoptee and country variables. The effect of the age of adoption was also demonstrated to be specific to the level of ADHD symptoms in comparison to both the externalizing and internalizing behavior scales of the CBCL. CONCLUSION: Deprivation of stable and sensitive care in infancy may have long-lasting consequences for children's development.
Resumo:
Indirect topographic variables have been used successfully as surrogates for disturbance processes in plant species distribution models (SDM) in mountain environments. However, no SDM studies have directly tested the performance of disturbance variables. In this study, we developed two disturbance variables: a geomorphic index (GEO) and an index of snow redistribution by wind (SNOW). These were developed in order to assess how they improved both the fit and predictive power of presenceabsence SDM based on commonly used topoclimatic (TC) variables for 91 plants in the Western Swiss Alps. The individual contribution of the disturbance variables was compared to TC variables. Maps of models were prepared to spatially test the effect of disturbance variables. On average, disturbance variables significantly improved the fit but not the predictive power of the TC models and their individual contribution was weak (5.6% for GEO and 3.3% for SNOW). However their maximum individual contribution was important (24.7% and 20.7%). Finally, maps including disturbance variables (i) were significantly divergent from TC models in terms of predicted suitable surfaces and connectivity between potential habitats, and (ii) were interpreted as more ecologically relevant. Disturbance variables did not improve the transferability of models at the local scale in a complex mountain system, and the performance and contribution of these variables were highly species-specific. However, improved spatial projections and change in connectivity are important issues when preparing projections under climate change because the future range size of the species will determine the sensitivity to changing conditions.
Resumo:
Knowledge about spatial biodiversity patterns is a basic criterion for reserve network design. Although herbarium collections hold large quantities of information, the data are often scattered and cannot supply complete spatial coverage. Alternatively, herbarium data can be used to fit species distribution models and their predictions can be used to provide complete spatial coverage and derive species richness maps. Here, we build on previous effort to propose an improved compositionalist framework for using species distribution models to better inform conservation management. We illustrate the approach with models fitted with six different methods and combined using an ensemble approach for 408 plant species in a tropical and megadiverse country (Ecuador). As a complementary view to the traditional richness hotspots methodology, consisting of a simple stacking of species distribution maps, the compositionalist modelling approach used here combines separate predictions for different pools of species to identify areas of alternative suitability for conservation. Our results show that the compositionalist approach better captures the established protected areas than the traditional richness hotspots strategies and allows the identification of areas in Ecuador that would optimally complement the current protection network. Further studies should aim at refining the approach with more groups and additional species information.
Resumo:
PURPOSE: A homozygous mutation in the H6 family homeobox 1 (HMX1) gene is responsible for a new oculoauricular defect leading to eye and auricular developmental abnormalities as well as early retinal degeneration (MIM 612109). However, the HMX1 pathway remains poorly understood, and in the first approach to better understand the pathway's function, we sought to identify the target genes. METHODS: We developed a predictive promoter model (PPM) approach using a comparative transcriptomic analysis in the retina at P15 of a mouse model lacking functional Hmx1 (dmbo mouse) and its respective wild-type. This PPM was based on the hypothesis that HMX1 binding site (HMX1-BS) clusters should be more represented in promoters of HMX1 target genes. The most differentially expressed genes in the microarray experiment that contained HMX1-BS clusters were used to generate the PPM, which was then statistically validated. Finally, we developed two genome-wide target prediction methods: one that focused on conserving PPM features in human and mouse and one that was based on the co-occurrence of HMX1-BS pairs fitting the PPM, in human or in mouse, independently. RESULTS: The PPM construction revealed that sarcoglycan, gamma (35kDa dystrophin-associated glycoprotein) (Sgcg), teashirt zinc finger homeobox 2 (Tshz2), and solute carrier family 6 (neurotransmitter transporter, glycine) (Slc6a9) genes represented Hmx1 targets in the mouse retina at P15. Moreover, the genome-wide target prediction revealed that mouse genes belonging to the retinal axon guidance pathway were targeted by Hmx1. Expression of these three genes was experimentally validated using a quantitative reverse transcription PCR approach. The inhibitory activity of Hmx1 on Sgcg, as well as protein tyrosine phosphatase, receptor type, O (Ptpro) and Sema3f, two targets identified by the PPM, were validated with luciferase assay. CONCLUSIONS: Gene expression analysis between wild-type and dmbo mice allowed us to develop a PPM that identified the first target genes of Hmx1.
Resumo:
PURPOSE: To evaluate the prognostic and predictive value of Ki-67 labeling index (LI) in a trial comparing letrozole (Let) with tamoxifen (Tam) as adjuvant therapy in postmenopausal women with early breast cancer. PATIENTS AND METHODS: Breast International Group (BIG) trial 1-98 randomly assigned 8,010 patients to four treatment arms comparing Let and Tam with sequences of each agent. Of 4,922 patients randomly assigned to receive 5 years of monotherapy with either agent, 2,685 had primary tumor material available for central pathology assessment of Ki-67 LI by immunohistochemistry and had tumors confirmed to express estrogen receptors after central review. The prognostic and predictive value of centrally measured Ki-67 LI on disease-free survival (DFS) were assessed among these patients using proportional hazards modeling, with Ki-67 LI values dichotomized at the median value of 11%. RESULTS: Higher values of Ki-67 LI were associated with adverse prognostic factors and with worse DFS (hazard ratio [HR; high:low] = 1.8; 95% CI, 1.4 to 2.3). The magnitude of the treatment benefit for Let versus Tam was greater among patients with high tumor Ki-67 LI (HR [Let:Tam] = 0.53; 95% CI, 0.39 to 0.72) than among patients with low tumor Ki-67 LI (HR [Let:Tam] = 0.81; 95% CI, 0.57 to 1.15; interaction P = .09). CONCLUSION: Ki-67 LI is confirmed as a prognostic factor in this study. High Ki-67 LI levels may identify a patient group that particularly benefits from initial Let adjuvant therapy.
Resumo:
Résumé Cette thèse est consacrée à l'analyse, la modélisation et la visualisation de données environnementales à référence spatiale à l'aide d'algorithmes d'apprentissage automatique (Machine Learning). L'apprentissage automatique peut être considéré au sens large comme une sous-catégorie de l'intelligence artificielle qui concerne particulièrement le développement de techniques et d'algorithmes permettant à une machine d'apprendre à partir de données. Dans cette thèse, les algorithmes d'apprentissage automatique sont adaptés pour être appliqués à des données environnementales et à la prédiction spatiale. Pourquoi l'apprentissage automatique ? Parce que la majorité des algorithmes d'apprentissage automatiques sont universels, adaptatifs, non-linéaires, robustes et efficaces pour la modélisation. Ils peuvent résoudre des problèmes de classification, de régression et de modélisation de densité de probabilités dans des espaces à haute dimension, composés de variables informatives spatialisées (« géo-features ») en plus des coordonnées géographiques. De plus, ils sont idéaux pour être implémentés en tant qu'outils d'aide à la décision pour des questions environnementales allant de la reconnaissance de pattern à la modélisation et la prédiction en passant par la cartographie automatique. Leur efficacité est comparable au modèles géostatistiques dans l'espace des coordonnées géographiques, mais ils sont indispensables pour des données à hautes dimensions incluant des géo-features. Les algorithmes d'apprentissage automatique les plus importants et les plus populaires sont présentés théoriquement et implémentés sous forme de logiciels pour les sciences environnementales. Les principaux algorithmes décrits sont le Perceptron multicouches (MultiLayer Perceptron, MLP) - l'algorithme le plus connu dans l'intelligence artificielle, le réseau de neurones de régression généralisée (General Regression Neural Networks, GRNN), le réseau de neurones probabiliste (Probabilistic Neural Networks, PNN), les cartes auto-organisées (SelfOrganized Maps, SOM), les modèles à mixture Gaussiennes (Gaussian Mixture Models, GMM), les réseaux à fonctions de base radiales (Radial Basis Functions Networks, RBF) et les réseaux à mixture de densité (Mixture Density Networks, MDN). Cette gamme d'algorithmes permet de couvrir des tâches variées telle que la classification, la régression ou l'estimation de densité de probabilité. L'analyse exploratoire des données (Exploratory Data Analysis, EDA) est le premier pas de toute analyse de données. Dans cette thèse les concepts d'analyse exploratoire de données spatiales (Exploratory Spatial Data Analysis, ESDA) sont traités selon l'approche traditionnelle de la géostatistique avec la variographie expérimentale et selon les principes de l'apprentissage automatique. La variographie expérimentale, qui étudie les relations entre pairs de points, est un outil de base pour l'analyse géostatistique de corrélations spatiales anisotropiques qui permet de détecter la présence de patterns spatiaux descriptible par une statistique. L'approche de l'apprentissage automatique pour l'ESDA est présentée à travers l'application de la méthode des k plus proches voisins qui est très simple et possède d'excellentes qualités d'interprétation et de visualisation. Une part importante de la thèse traite de sujets d'actualité comme la cartographie automatique de données spatiales. Le réseau de neurones de régression généralisée est proposé pour résoudre cette tâche efficacement. Les performances du GRNN sont démontrées par des données de Comparaison d'Interpolation Spatiale (SIC) de 2004 pour lesquelles le GRNN bat significativement toutes les autres méthodes, particulièrement lors de situations d'urgence. La thèse est composée de quatre chapitres : théorie, applications, outils logiciels et des exemples guidés. Une partie importante du travail consiste en une collection de logiciels : Machine Learning Office. Cette collection de logiciels a été développée durant les 15 dernières années et a été utilisée pour l'enseignement de nombreux cours, dont des workshops internationaux en Chine, France, Italie, Irlande et Suisse ainsi que dans des projets de recherche fondamentaux et appliqués. Les cas d'études considérés couvrent un vaste spectre de problèmes géoenvironnementaux réels à basse et haute dimensionnalité, tels que la pollution de l'air, du sol et de l'eau par des produits radioactifs et des métaux lourds, la classification de types de sols et d'unités hydrogéologiques, la cartographie des incertitudes pour l'aide à la décision et l'estimation de risques naturels (glissements de terrain, avalanches). Des outils complémentaires pour l'analyse exploratoire des données et la visualisation ont également été développés en prenant soin de créer une interface conviviale et facile à l'utilisation. Machine Learning for geospatial data: algorithms, software tools and case studies Abstract The thesis is devoted to the analysis, modeling and visualisation of spatial environmental data using machine learning algorithms. In a broad sense machine learning can be considered as a subfield of artificial intelligence. It mainly concerns with the development of techniques and algorithms that allow computers to learn from data. In this thesis machine learning algorithms are adapted to learn from spatial environmental data and to make spatial predictions. Why machine learning? In few words most of machine learning algorithms are universal, adaptive, nonlinear, robust and efficient modeling tools. They can find solutions for the classification, regression, and probability density modeling problems in high-dimensional geo-feature spaces, composed of geographical space and additional relevant spatially referenced features. They are well-suited to be implemented as predictive engines in decision support systems, for the purposes of environmental data mining including pattern recognition, modeling and predictions as well as automatic data mapping. They have competitive efficiency to the geostatistical models in low dimensional geographical spaces but are indispensable in high-dimensional geo-feature spaces. The most important and popular machine learning algorithms and models interesting for geo- and environmental sciences are presented in details: from theoretical description of the concepts to the software implementation. The main algorithms and models considered are the following: multi-layer perceptron (a workhorse of machine learning), general regression neural networks, probabilistic neural networks, self-organising (Kohonen) maps, Gaussian mixture models, radial basis functions networks, mixture density networks. This set of models covers machine learning tasks such as classification, regression, and density estimation. Exploratory data analysis (EDA) is initial and very important part of data analysis. In this thesis the concepts of exploratory spatial data analysis (ESDA) is considered using both traditional geostatistical approach such as_experimental variography and machine learning. Experimental variography is a basic tool for geostatistical analysis of anisotropic spatial correlations which helps to understand the presence of spatial patterns, at least described by two-point statistics. A machine learning approach for ESDA is presented by applying the k-nearest neighbors (k-NN) method which is simple and has very good interpretation and visualization properties. Important part of the thesis deals with a hot topic of nowadays, namely, an automatic mapping of geospatial data. General regression neural networks (GRNN) is proposed as efficient model to solve this task. Performance of the GRNN model is demonstrated on Spatial Interpolation Comparison (SIC) 2004 data where GRNN model significantly outperformed all other approaches, especially in case of emergency conditions. The thesis consists of four chapters and has the following structure: theory, applications, software tools, and how-to-do-it examples. An important part of the work is a collection of software tools - Machine Learning Office. Machine Learning Office tools were developed during last 15 years and was used both for many teaching courses, including international workshops in China, France, Italy, Ireland, Switzerland and for realizing fundamental and applied research projects. Case studies considered cover wide spectrum of the real-life low and high-dimensional geo- and environmental problems, such as air, soil and water pollution by radionuclides and heavy metals, soil types and hydro-geological units classification, decision-oriented mapping with uncertainties, natural hazards (landslides, avalanches) assessments and susceptibility mapping. Complementary tools useful for the exploratory data analysis and visualisation were developed as well. The software is user friendly and easy to use.