990 results for predictive modeling


Relevance:

20.00%

Abstract:

Machine Learning for geospatial data: algorithms, software tools and case studies

This thesis is devoted to the analysis, modeling and visualisation of spatial environmental data using machine learning algorithms. In a broad sense, machine learning can be considered a subfield of artificial intelligence that is mainly concerned with the development of techniques and algorithms that allow computers to learn from data. In this thesis, machine learning algorithms are adapted to learn from spatial environmental data and to make spatial predictions. Why machine learning?
In short, most machine learning algorithms are universal, adaptive, nonlinear, robust and efficient modeling tools. They can solve classification, regression and probability density modeling problems in high-dimensional geo-feature spaces, composed of geographical space and additional relevant spatially referenced features. They are well suited to be implemented as predictive engines in decision support systems for environmental data mining, including pattern recognition, modeling and prediction as well as automatic data mapping. Their efficiency is competitive with geostatistical models in low-dimensional geographical spaces, and they are indispensable in high-dimensional geo-feature spaces. The most important and popular machine learning algorithms and models of interest for geo- and environmental sciences are presented in detail, from the theoretical description of the concepts to the software implementation. The main algorithms and models considered are the following: multi-layer perceptron (a workhorse of machine learning), general regression neural networks, probabilistic neural networks, self-organising (Kohonen) maps, Gaussian mixture models, radial basis function networks and mixture density networks. This set of models covers machine learning tasks such as classification, regression and density estimation. Exploratory data analysis (EDA) is the initial and a very important part of data analysis. In this thesis, the concepts of exploratory spatial data analysis (ESDA) are considered using both the traditional geostatistical approach, such as experimental variography, and machine learning. Experimental variography is a basic tool for the geostatistical analysis of anisotropic spatial correlations, which helps to detect the presence of spatial patterns, at least those described by two-point statistics.
A machine learning approach to ESDA is presented by applying the k-nearest neighbors (k-NN) method, which is simple and has very good interpretation and visualization properties. An important part of the thesis deals with a topic of current interest, namely the automatic mapping of geospatial data. The general regression neural network (GRNN) is proposed as an efficient model to solve this task. The performance of the GRNN model is demonstrated on the Spatial Interpolation Comparison (SIC) 2004 data, where the GRNN model significantly outperformed all other approaches, especially under emergency conditions. The thesis consists of four chapters with the following structure: theory, applications, software tools, and how-to-do-it examples. An important part of the work is a collection of software tools, Machine Learning Office. The Machine Learning Office tools were developed over the last 15 years and have been used both in many teaching courses, including international workshops in China, France, Italy, Ireland and Switzerland, and in fundamental and applied research projects. The case studies considered cover a wide spectrum of real-life low- and high-dimensional geo- and environmental problems, such as air, soil and water pollution by radionuclides and heavy metals, classification of soil types and hydro-geological units, decision-oriented mapping with uncertainties, and natural hazard (landslides, avalanches) assessment and susceptibility mapping. Complementary tools for exploratory data analysis and visualisation were developed as well. The software is user friendly and easy to use.
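To make the GRNN idea above concrete, here is a minimal sketch of the standard GRNN predictor, which is a Gaussian-kernel weighted average of the training values (the Nadaraya-Watson estimator). It is not the Machine Learning Office implementation; the function name `grnn_predict`, the single isotropic bandwidth `sigma` and the toy data are assumptions made for illustration only.

```python
import math

def grnn_predict(train_points, train_values, query_points, sigma):
    """Predict values at query_points as a Gaussian-weighted average of train_values.

    train_points, query_points: sequences of coordinate tuples (any dimension).
    sigma: kernel bandwidth (hypothetical single isotropic value; in practice
    it would be tuned, for example by cross-validation).
    """
    predictions = []
    for q in query_points:
        # Squared Euclidean distance from the query to every training point.
        d2 = [sum((a - b) ** 2 for a, b in zip(q, p)) for p in train_points]
        # Subtract the smallest distance before exponentiating so that at least
        # one weight equals 1 and the denominator cannot underflow to zero.
        d2_min = min(d2)
        weights = [math.exp(-(d - d2_min) / (2.0 * sigma ** 2)) for d in d2]
        weighted_sum = sum(w * v for w, v in zip(weights, train_values))
        predictions.append(weighted_sum / sum(weights))
    return predictions

# Toy example: four measurements on the corners of a unit square.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [1.0, 2.0, 3.0, 4.0]
centre = grnn_predict(points, values, [(0.5, 0.5)], sigma=0.3)[0]
```

A small `sigma` makes the map follow the nearest measurements closely, while a large `sigma` smooths towards the global mean; the only parameter to tune is the bandwidth, which is what makes the model attractive for automatic mapping.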

Relevance:

20.00%

Abstract:

OBJECTIVE: To compare the predictive accuracy of the original and recalibrated Framingham risk functions against current coronary heart disease (CHD) morbidity and mortality data from the Swiss population. METHODS: Data from the CoLaus study, a cross-sectional, population-based study conducted between 2003 and 2006 on 5,773 participants aged 35-74 without CHD, were used to recalibrate the Framingham risk function. The number of events predicted by each risk function was compared with the number derived from local MONICA incidence rates and official mortality data from Switzerland. RESULTS: With the original risk function, 57.3%, 21.2%, 16.4% and 5.1% of men and 94.9%, 3.8%, 1.2% and 0.1% of women were at very low (<6%), low (6-10%), intermediate (10-20%) and high (>20%) risk, respectively. With the recalibrated risk function, the corresponding values were 84.7%, 10.3%, 4.3% and 0.6% in men and 99.5%, 0.4%, 0.0% and 0.1% in women, respectively. The number of CHD events over 10 years predicted by the original Framingham risk function was 2-3 fold higher than that predicted by mortality+case fatality or by MONICA incidence rates (men: 191 vs. 92 and 51 events, respectively). The recalibrated risk function provided more reasonable estimates, albeit slightly overestimated (92 events, 5-95th percentile: 26-223 events); sensitivity analyses showed that the magnitude of the overestimation was between 0.4 and 2.2 in men, and 0.7 and 3.3 in women. CONCLUSION: The recalibrated Framingham risk function provides a reasonable alternative to assess CHD risk in men, but not in women.
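As an illustration of what recalibration means here, the sketch below uses the usual Cox-type form of a Framingham-style risk function, in which the 10-year risk is 1 - S0^exp(LP - mean LP), and recalibration swaps in the local baseline survival S0 and local mean risk factor levels. The abstract does not give the coefficients or the Swiss baseline values, so every number below is hypothetical, as is the handling of the category boundaries at exactly 10% and 20%.

```python
import math

def ten_year_risk(linear_predictor, mean_linear_predictor, baseline_survival):
    """Cox-type 10-year risk: 1 - S0 ** exp(LP - mean LP).

    linear_predictor: sum of coefficient * risk factor for one person.
    mean_linear_predictor: same sum evaluated at the population mean levels.
    baseline_survival: 10-year event-free survival at the mean levels (S0).
    """
    return 1.0 - baseline_survival ** math.exp(linear_predictor - mean_linear_predictor)

def risk_category(risk):
    """Map a risk in [0, 1] to the four categories used in the abstract.

    Boundary handling (10% counted as intermediate, 20% as intermediate)
    is an assumption; the abstract only gives the ranges.
    """
    if risk < 0.06:
        return "very low"
    if risk < 0.10:
        return "low"
    if risk <= 0.20:
        return "intermediate"
    return "high"

# Hypothetical person and hypothetical populations (illustrative values only).
lp_person = 0.8
original = ten_year_risk(lp_person, mean_linear_predictor=0.0, baseline_survival=0.90)
recalibrated = ten_year_risk(lp_person, mean_linear_predictor=0.1, baseline_survival=0.96)
print(risk_category(original), risk_category(recalibrated))
```

With a higher local baseline survival the same person moves to a lower risk category, which is the direction of the shift reported in the results above.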

Relevance:

20.00%

Abstract:

Based on previous work (Hemelrijk 1998; Puga-González, Hildenbrandt & Hemelrijk 2009), we have developed an agent-based model and software, called A-KinGDom, which allows us to simulate the emergence of the social structure in a group of non-human primates. The model includes dominance and affiliative interactions and incorporates two main innovations (preliminary dominance interactions and a kinship factor), which allow us to define four different attack and affiliative strategies. In accordance with these strategies, we compared the data obtained under four simulation conditions with the results obtained in a previous study (Dolado & Beltran 2012) involving empirical observations of a captive group of mangabeys (Cercocebus torquatus).

Relevance:

20.00%

Abstract:

The present study investigates the predictive value of the early appearance of simultaneous pointing-speech combinations. An experimental task was used to obtain a sample of communicative productions from nineteen children at 1;0 and 1;3. Infants' communicative productions, in combination with joint gaze engagement patterns, were analyzed in relation to different social conditions. The results show a significant effect of age and social condition on infants' communicative productions. Gesture-speech combinations seem to work as a strong communicative resource to attract the adult's attention in socially demanding communicative contexts. Joint gaze engagement was used together with simultaneous pointing-speech combinations to attract adults' attention during socially demanding conditions. Finally, the use of simultaneous pointing-speech combinations at 1;0 in demanding conditions predicted greater expressive vocabulary acquisition at 1;3 and 1;6. These results indicate that the use of gesture-speech combinations may be considered a significant step towards the early integration of language components.

Relevance:

20.00%

Abstract:

Background: Visual analog scales (VAS) are used to assess readiness-to-change constructs, which are often considered critical for change. Objective: We studied whether 3 constructs (readiness to change, importance of changing and confidence in ability to change) predict risk status 6 months later in 20-year-old men with either or both of two behaviors: risky drinking and smoking. Methods: 577 participants in a brief intervention randomized trial were assessed at baseline and 6 months later on alcohol and tobacco consumption and with three 1-10 VAS (readiness, importance, confidence) for each behavior. For each behavior, we used one regression model for each construct. Models controlled for receipt of a brief intervention and used the lowest level (1-4) in each construct as the reference group (vs medium (5-7) and high (8-10) levels). Results: Among the 475 risky drinkers, mean (SD) readiness, importance and confidence to change drinking were 4.0 (3.1), 2.8 (2.2) and 7.2 (3.0). Readiness was not associated with being alcohol-risk free 6 months later (OR 1.3 [0.7; 2.2] and 1.4 [0.8; 2.6] for medium and high readiness). High importance and high confidence were associated with being risk free (OR 0.9 [0.5; 1.8] and 2.9 [1.2; 7.5] for medium and high importance; 2.1 [1.0; 4.8] and 2.8 [1.5; 5.6] for medium and high confidence). Among the 320 smokers, mean readiness, importance and confidence to change smoking were 4.6 (2.6), 5.3 (2.6) and 5.9 (2.6). Neither readiness nor importance was associated with being smoking free (OR 2.1 [0.9; 4.7] and 2.1 [0.8; 5.8] for medium and high readiness; 1.4 [0.6; 3.4] and 2.1 [0.8; 5.4] for medium and high importance). High confidence was associated with being smoking free (OR 2.2 [0.8; 6.6] and 3.4 [1.2; 9.8] for medium and high confidence). Conclusions: For drinking and smoking, high confidence in ability to change was associated, with similar magnitude, with a favorable outcome. This points to the value of confidence as an important predictor of successful change.

Relevance:

20.00%

Abstract:

A human in vivo toxicokinetic model was built to allow a better understanding of the toxicokinetics of the fungicide folpet and its key ring biomarkers of exposure: phthalimide (PI), phthalamic acid (PAA) and phthalic acid (PA). Both PI and the sum of ring metabolites, expressed as PA equivalents (PAeq), may be used as biomarkers of exposure. The conceptual representation of the model was based on the analysis of the time course of these biomarkers in volunteers orally and dermally exposed to folpet. In the model, compartments were used to represent the body burden of folpet and of the experimentally relevant PI, PAA and PA ring metabolites in blood and in key tissues, as well as in excreta (urine and feces). The time evolution of these biomarkers in each compartment of the model was then described mathematically by a system of coupled differential equations. The parameters of the model were determined from best fits to the time courses of PI and PAeq in blood and urine of five volunteers administered 1 mg kg(-1) of folpet orally and 10 mg kg(-1) dermally. In the case of oral administration, the mean elimination half-life of PI from blood (through feces, urine or metabolism) was found to be 39.9 h, compared with 28.0 h for PAeq. In the case of dermal application, the mean elimination half-lives of PI and PAeq were estimated to be 34.3 and 29.3 h, respectively. The average final fractions of administered dose recovered in urine as PI over the 0-96 h period were 0.030 and 0.002% for oral and dermal exposure, respectively. Corresponding values for PAeq were 24.5 and 1.83%, respectively. Finally, the average clearance rate of PI from blood calculated from the oral and dermal data was 0.09 ± 0.03 and 0.13 ± 0.05 ml h(-1), while the volume of distribution was 4.30 ± 1.12 and 6.05 ± 2.22 l, respectively. It was not possible to obtain the corresponding values from PAeq data owing to the lack of blood time course data.
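The link between an elimination half-life and a compartment differential equation can be sketched with a deliberately simplified one-compartment, first-order model, dA/dt = -k A with k = ln 2 / t½. This is not the authors' multi-compartment model; the function names and the Euler step size are assumptions, and only the 39.9 h half-life is taken from the abstract.

```python
import math

def elimination_rate(half_life_h):
    """First-order elimination rate constant k (per hour) from a half-life in hours."""
    return math.log(2.0) / half_life_h

def remaining_fraction(t_h, half_life_h):
    """Closed-form solution of dA/dt = -k*A: fraction of the initial amount left at time t."""
    return math.exp(-elimination_rate(half_life_h) * t_h)

def simulate_euler(initial_amount, half_life_h, t_end_h, dt_h=0.01):
    """Integrate dA/dt = -k*A numerically with explicit Euler steps.

    Shown only to illustrate how a compartment equation is stepped in time;
    a coupled multi-compartment system would update several amounts per step.
    """
    k = elimination_rate(half_life_h)
    amount = initial_amount
    for _ in range(int(round(t_end_h / dt_h))):
        amount += -k * amount * dt_h
    return amount

# PI in blood after oral dosing: half-life of 39.9 h as reported above.
print(round(remaining_fraction(96.0, 39.9), 3))  # fraction left at the end of the 0-96 h window
```

The Euler result converges to the closed-form value as `dt_h` shrinks, which is a quick sanity check before moving to a coupled system of equations.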

Relevance:

20.00%

Abstract:

Several methods and approaches for measuring parameters to determine fecal sources of pollution in water have been developed in recent years. No single microbial or chemical parameter has proved sufficient to determine the source of fecal pollution. Combinations of parameters involving at least one discriminating indicator and one universal fecal indicator offer the most promising solutions for qualitative and quantitative analyses. The universal (nondiscriminating) fecal indicator provides quantitative information regarding the fecal load. The discriminating indicator contributes to the identification of a specific source. The relative values of the parameters derived from both kinds of indicators could provide information regarding the contribution to the total fecal load from each origin. It is also essential that both parameters characteristically persist in the environment for similar periods. Numerical analysis, such as inductive learning methods, could be used to select the most suitable and the lowest number of parameters to develop predictive models. These combinations of parameters provide information on factors affecting the models, such as dilution, specific types of animal source, persistence of microbial tracers, and complex mixtures from different sources. The combined use of the enumeration of somatic coliphages and the enumeration of Bacteroides-phages using different host specific strains (one from humans and another from pigs), both selected using the suggested approach, provides a feasible model for quantitative and qualitative analyses of fecal source identification.

Relevance:

20.00%

Abstract:

The O6-methylguanine-DNA-methyltransferase (MGMT) promoter methylation status is a predictive parameter for the response of malignant gliomas to alkylating agents such as temozolomide. First clinical trials with temozolomide plus bevacizumab therapy in metastatic melanoma patients are ongoing, although the predictive value of the MGMT promoter methylation status in this setting remains unclear. We assessed MGMT promoter methylation in formalin-fixed, primary tumor tissue of metastatic melanoma patients treated with first-line temozolomide and bevacizumab from the trial SAKK 50/07 by methylation-specific polymerase chain reaction. In addition, the MGMT expression levels were also analyzed by MGMT immunohistochemistry. Eleven of 42 primary melanomas (26%) revealed a methylated MGMT promoter. Promoter methylation was significantly associated with response rates CR + PR versus SD + PD according to RECIST (response evaluation criteria in solid tumors) (p<0.05) with a trend to prolonged median progression-free survival (8.1 versus 3.4 months, p>0.05). Immunohistochemically different protein expression patterns with heterogeneous and homogeneous nuclear MGMT expression were identified. Negative MGMT expression levels were associated with overall disease stabilization CR + PR + SD versus PD (p=0.05). There was only a poor correlation between MGMT methylation and lack of MGMT expression. A significant proportion of melanomas have a methylated MGMT promoter. The MGMT promoter methylation status may be a promising predictive marker for temozolomide therapy in metastatic melanoma patients. Larger sample sizes may help to validate significant differences in survival type endpoints.

Relevance:

20.00%

Abstract:

High-energy charged particles in the van Allen radiation belts and in solar energetic particle events can damage satellites in orbit, leading to malfunctions and loss of satellite service. Here we describe some recent results from the SPACECAST project on modelling and forecasting the radiation belts, and modelling solar energetic particle events. We describe the SPACECAST forecasting system, which uses physical models that include wave-particle interactions to forecast the electron radiation belts up to 3 h ahead. We show that the forecasts were able to reproduce the >2 MeV electron flux at GOES 13 during the moderate storm of 7-8 October 2012, and the period following a fast solar wind stream on 25-26 October 2012, to within a factor of 5 or so. At lower energies, from 10 keV to a few hundred keV, we show that the electron flux at geostationary orbit depends sensitively on the high-energy tail of the source distribution near 10 RE on the nightside of the Earth, and that the source is best represented by a kappa distribution. We present a new model of whistler mode chorus determined from multiple satellite measurements, which shows that the effects of wave-particle interactions beyond geostationary orbit are likely to be very significant. We also present radial diffusion coefficients calculated from satellite data at geostationary orbit, which vary with Kp by over four orders of magnitude. We describe a new automated method, which takes entropy into account, to determine the position at the shock that is magnetically connected to the Earth for modelling solar energetic particle events. We also predict, from analytical theory, the form of the mean free path in the foreshock and the particle injection efficiency at the shock, both of which can be tested in simulations.

Relevance:

20.00%

Abstract:

Turtle Mountain in Alberta, Canada, has become an important field laboratory for testing different techniques related to the characterization and monitoring of large slope mass movements, as the stability of large portions of the eastern face of the mountain is still questionable. In order to better quantify the potentially unstable volumes, the most probable failure mechanisms and the potential consequences, structural analysis and runout modeling were performed. The structural features of the eastern face were investigated using a high resolution digital elevation model (HRDEM). Based on displacement datasets and structural observations, potential failure mechanisms affecting different portions of the mountain have been assessed. The volumes of the different potentially unstable blocks have been calculated using the Sloping Local Base Level (SLBL) method. Based on the volume estimation, two- and three-dimensional dynamic runout analyses have been performed. Calibration of these analyses is based on the experience from the adjacent Frank Slide and other similar rock avalanches. The results will be used to improve the contingency plans within the hazard area.

Relevance:

20.00%

Abstract:

Intrauterine growth restriction (IUGR) is one of the leading causes of perinatal mortality and morbidity. Nowadays, this condition is detected in the third and last trimester of gestation, when the pathology is already established and the success of therapeutic strategies is limited. As the physiopathology of the disease suggests that the problem stems from poor placental implantation, it would be advantageous to identify women at increased risk in the first or second trimester of gestation, because it might then be possible to offer treatment interventions or at least to establish increased surveillance for high-risk pregnancies. Maternal levels of pregnancy-associated plasma protein-A (PAPP-A) and free β human chorionic gonadotropin (free βhCG) have been shown to be effective in first trimester screening for chromosomal abnormalities, primarily trisomies 21, 13 and 18. Previous studies evaluating PAPP-A and free βhCG measured in the first trimester in relation to IUGR have provided conflicting results. Moreover, it has been suggested that black ethnicity is another important predictive factor for fetal growth restriction. Objective: To analyse the association of first trimester serum analytes (PAPP-A and free βhCG) and ethnicity with intrauterine growth restriction. Methods: The study is a retrospective cohort including all singleton pregnancies with complete outcome data that had undergone first trimester screening (PAPP-A and free βhCG) at 11 to 13+6 weeks of gestation between 1/1/2010 and 31/12/2012 in Hospital Universitari Dr Josep Trueta. Biochemical markers are converted to multiples of the median (MoMs), and the 5th and 10th percentiles are calculated. The association of free βhCG and PAPP-A with the incidence of IUGR is evaluated in combination with maternal ethnicity. Bivariate and logistic regression analyses are performed to adjust this association for covariates.
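The conversion to multiples of the median and the percentile cut-offs mentioned in the methods can be sketched as follows. A MoM is a measured concentration divided by the median concentration for the same gestational age. In this sketch the medians are computed from the sample itself, grouped by gestational week, which is an assumption; screening programmes normally use reference medians with further adjustments. The function names and the toy values are hypothetical.

```python
from statistics import median, quantiles

def to_mom(values_by_week):
    """Convert raw marker concentrations to multiples of the median (MoM).

    values_by_week: dict mapping gestational week -> list of concentrations.
    Returns a dict with the same keys and each value divided by that week's median.
    """
    return {
        week: [value / median(values) for value in values]
        for week, values in values_by_week.items()
    }

def low_percentiles(moms):
    """Return the 5th and 10th percentiles of a list of MoM values.

    quantiles(..., n=20) yields cut points at 5%, 10%, ..., 95%;
    the first two are the ones needed here.
    """
    cuts = quantiles(moms, n=20, method="inclusive")
    return cuts[0], cuts[1]

# Toy PAPP-A concentrations (hypothetical values, arbitrary units).
papp_a = {11: [1.0, 2.0, 3.0], 12: [2.0, 4.0, 6.0]}
moms = to_mom(papp_a)
all_moms = [m for week_moms in moms.values() for m in week_moms]
p5, p10 = low_percentiles(all_moms)
```

A pregnancy whose MoM falls below `p5` or `p10` can then be flagged as having a low marker level and entered as a binary predictor in the logistic regression.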