38 results for Techniques of data analysis
at Université de Lausanne, Switzerland
Abstract:
Full text: http://www.springerlink.com/content/3q68180337551r47/fulltext.pdf
Abstract:
BACKGROUND: American College of Cardiology/American Heart Association guidelines for the diagnosis and management of heart failure recommend investigating exacerbating conditions such as thyroid dysfunction, but without specifying the impact of different thyroid-stimulating hormone (TSH) levels. Limited prospective data exist on the association between subclinical thyroid dysfunction and heart failure events. METHODS AND RESULTS: We performed a pooled analysis of individual participant data using all available prospective cohorts with thyroid function tests and subsequent follow-up of heart failure events. Individual data on 25 390 participants with 216 248 person-years of follow-up were supplied from 6 prospective cohorts in the United States and Europe. Euthyroidism was defined as TSH of 0.45 to 4.49 mIU/L, subclinical hypothyroidism as TSH of 4.5 to 19.9 mIU/L, and subclinical hyperthyroidism as TSH <0.45 mIU/L, the last two with normal free thyroxine levels. Among 25 390 participants, 2068 (8.1%) had subclinical hypothyroidism and 648 (2.6%) had subclinical hyperthyroidism. In age- and sex-adjusted analyses, risks of heart failure events were increased with both higher and lower TSH levels (P for quadratic pattern <0.01); the hazard ratio was 1.01 (95% confidence interval, 0.81-1.26) for TSH of 4.5 to 6.9 mIU/L, 1.65 (95% confidence interval, 0.84-3.23) for TSH of 7.0 to 9.9 mIU/L, 1.86 (95% confidence interval, 1.27-2.72) for TSH of 10.0 to 19.9 mIU/L (P for trend <0.01) and 1.31 (95% confidence interval, 0.88-1.95) for TSH of 0.10 to 0.44 mIU/L and 1.94 (95% confidence interval, 1.01-3.72) for TSH <0.10 mIU/L (P for trend=0.047). Risks remained similar after adjustment for cardiovascular risk factors. CONCLUSION: Risks of heart failure events were increased with both higher and lower TSH levels, particularly for TSH ≥10 and <0.10 mIU/L.
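The TSH categories used in this pooled analysis can be encoded as a small helper. This is an illustrative sketch of the abstract's definitions only, not code from the study; the function name and the catch-all label are assumptions:

```python
def classify_thyroid(tsh, free_t4_normal=True):
    """Thyroid status from TSH (mIU/L), per the cutoffs in the abstract.

    Illustrative only: the function name and the fall-through label are
    assumptions, not the study's code.
    """
    if 0.45 <= tsh <= 4.49:
        return "euthyroid"
    if 4.5 <= tsh <= 19.9 and free_t4_normal:
        return "subclinical hypothyroidism"
    if tsh < 0.45 and free_t4_normal:
        return "subclinical hyperthyroidism"
    return "overt dysfunction or out of range"
```

For example, a TSH of 12.0 mIU/L with normal free thyroxine falls in the subclinical hypothyroidism stratum that carried the highest heart failure risk (HR 1.86).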
Abstract:
Radioactive soil-contamination mapping and risk assessment is a vital issue for decision makers. Traditional approaches for mapping the spatial concentration of radionuclides employ various regression-based models, which usually provide a single-value prediction realization accompanied (in some cases) by estimation error. Such approaches do not provide the capability for rigorous uncertainty quantification or probabilistic mapping. Machine learning is a recent and fast-developing approach based on learning patterns and information from data. Artificial neural networks for prediction mapping have been especially powerful in combination with spatial statistics. A data-driven approach provides the opportunity to integrate additional relevant information about spatial phenomena into a prediction model for more accurate spatial estimates and associated uncertainty. Machine-learning algorithms can also be used for a wider spectrum of problems than before: classification, probability density estimation, and so forth. Stochastic simulations are used to model spatial variability and uncertainty. Unlike regression models, they provide multiple realizations of a particular spatial pattern that allow uncertainty and risk quantification. This paper reviews the most recent methods of spatial data analysis, prediction, and risk mapping, based on machine learning and stochastic simulations in comparison with more traditional regression models. The radioactive fallout from the Chernobyl Nuclear Power Plant accident is used to illustrate the application of the models for prediction and classification problems. This fallout is a unique case study that provides the challenging task of analyzing huge amounts of data ('hard' direct measurements, as well as supplementary information and expert estimates) and solving particular decision-oriented problems.
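The contrast the review draws between single-value regression predictions and simulation-based probabilistic mapping can be sketched in a few lines: generate many equally probable realizations of a spatially correlated field, then turn them into a per-location exceedance probability. The covariance model, correlation length, and threshold below are arbitrary illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Small 1-D grid of locations; exponential covariance model (assumed parameters).
x = np.linspace(0, 10, 50)
cov = np.exp(-np.abs(x[:, None] - x[None, :]) / 2.0)  # correlation length 2.0
L = np.linalg.cholesky(cov + 1e-10 * np.eye(len(x)))

# Unconditional stochastic simulation: many equally probable realizations.
n_real = 500
realizations = (L @ rng.standard_normal((len(x), n_real))).T  # (n_real, 50)

# Probabilistic mapping: per-location probability of exceeding a threshold --
# something a single regression-style prediction cannot provide.
threshold = 1.0
p_exceed = (realizations > threshold).mean(axis=0)
```

Each row of `realizations` is one plausible spatial pattern; the spread across rows is exactly the uncertainty that a single-value prediction map hides.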
Abstract:
Angiotensin receptor blockers, angiotensin-converting enzyme inhibitors, and diuretics all cause reactive rises in plasma renin concentration, but particularly high levels have been reported with aliskiren. This prompted speculation that blockade of plasma renin activity with aliskiren could be overwhelmed, leading to paradoxical increases in blood pressure. This meta-analysis of data from 4877 patients from 8 randomized, double-blind, placebo- and/or active-controlled trials examined this hypothesis. The analysis focused on the incidence of paradoxical blood pressure increases above predefined thresholds, after ≥4 weeks of treatment with 300 mg of aliskiren, angiotensin receptor blockers (300 mg of irbesartan, 100 mg of losartan, or 320 mg of valsartan), 10 mg of ramipril, 25 mg of hydrochlorothiazide, or placebo. There were no significant differences in the frequency of increases in systolic (>10 mm Hg; P=0.30) or diastolic (>5 mm Hg; P=0.65) pressure among those treated with aliskiren (3.9% and 3.1%, respectively), angiotensin receptor blockers (4.0% and 3.7%), ramipril (5.7% and 2.6%), or hydrochlorothiazide (4.4% and 2.7%). Increases in blood pressure were considerably more frequent in the placebo group (12.6% and 11.4%; P<0.001). None of the 536 patients with plasma renin activity data who received 300 mg of aliskiren exhibited an increase in systolic pressure >10 mm Hg that was associated with an increase in plasma renin activity >0.1 ng/mL per hour. In conclusion, the incidence of blood pressure increases with aliskiren was similar to that during treatment with other antihypertensive drugs. Blood pressure rises on aliskiren treatment were not associated with increases in plasma renin activity. This meta-analysis found no evidence that aliskiren uniquely causes paradoxical rises in blood pressure.
Abstract:
BACKGROUND: Findings from randomised trials have shown a higher early risk of stroke after carotid artery stenting than after carotid endarterectomy. We assessed whether white-matter lesions affect the perioperative risk of stroke in patients treated with carotid artery stenting versus carotid endarterectomy. METHODS: Patients with symptomatic carotid artery stenosis included in the International Carotid Stenting Study (ICSS) were randomly allocated to receive carotid artery stenting or carotid endarterectomy. Copies of baseline brain imaging were analysed by two investigators, who were masked to treatment, for the severity of white-matter lesions using the age-related white-matter changes (ARWMC) score. Randomisation was done with a computer-generated sequence (1:1). Patients were divided into two groups using the median ARWMC. We analysed the risk of stroke within 30 days of revascularisation using a per-protocol analysis. ICSS is registered with controlled-trials.com, number ISRCTN 25337470. FINDINGS: 1036 patients (536 randomly allocated to carotid artery stenting, 500 to carotid endarterectomy) had baseline imaging available. Median ARWMC score was 7, and patients were dichotomised into those with a score of 7 or more and those with a score of less than 7. In patients treated with carotid artery stenting, those with an ARWMC score of 7 or more had an increased risk of stroke compared with those with a score of less than 7 (HR for any stroke 2·76, 95% CI 1·17-6·51; p=0·021; HR for non-disabling stroke 3·00, 1·10-8·36; p=0·031), but we did not see a similar association in patients treated with carotid endarterectomy (HR for any stroke 1·18, 0·40-3·55; p=0·76; HR for disabling or fatal stroke 1·41, 0·38-5·26; p=0·607). 
Carotid artery stenting was associated with a higher risk of stroke compared with carotid endarterectomy in patients with an ARWMC score of 7 or more (HR for any stroke 2·98, 1·29-6·93; p=0·011; HR for non-disabling stroke 6·34, 1·45-27·71; p=0·014), but there was no risk difference in patients with an ARWMC score of less than 7. INTERPRETATION: The presence of white-matter lesions on brain imaging should be taken into account when selecting patients for carotid revascularisation. Carotid artery stenting should be avoided in patients with more extensive white-matter lesions, but might be an acceptable alternative to carotid endarterectomy in patients with less extensive lesions. FUNDING: Medical Research Council, the Stroke Association, Sanofi-Synthélabo, the European Union Research Framework Programme 5.
Abstract:
Background: Guidelines for the Diagnosis and Management of Heart Failure (HF) recommend investigating exacerbating conditions, such as thyroid dysfunction, but without specifying the impact of different TSH levels. Limited prospective data exist regarding the association between subclinical thyroid dysfunction and HF events. Methods: We performed a pooled analysis of individual participant data using all available prospective cohorts with thyroid function tests and subsequent follow-up of HF events. Individual data on 25,390 participants with 216,247 person-years of follow-up were supplied from 6 prospective cohorts in the United States and Europe. Euthyroidism was defined as TSH 0.45-4.49 mIU/L, subclinical hypothyroidism as TSH 4.5-19.9 mIU/L and subclinical hyperthyroidism as TSH <0.45 mIU/L, both with normal free thyroxine levels. HF events were defined as acute HF events, hospitalization or death related to HF events. Results: Among 25,390 participants, 2068 had subclinical hypothyroidism (8.1%) and 648 subclinical hyperthyroidism (2.6%). In age- and gender-adjusted analyses, risks of HF events were increased with both higher and lower TSH levels (P for quadratic pattern <0.01): the hazard ratio (HR) was 1.01 (95% confidence interval [CI] 0.81-1.26) for TSH 4.5-6.9 mIU/L, 1.65 (CI 0.84-3.23) for TSH 7.0-9.9 mIU/L, 1.86 (CI 1.27-2.72) for TSH 10.0-19.9 mIU/L (P for trend <0.01), and was 1.31 (CI 0.88-1.95) for TSH 0.10-0.44 mIU/L and 1.94 (CI 1.01-3.72) for TSH <0.10 mIU/L (P for trend=0.047). Risks remained similar after adjustment for cardiovascular risk factors. Conclusion: Risks of HF events were increased with both higher and lower TSH levels, particularly for TSH ≥10 mIU/L and for TSH <0.10 mIU/L. Our findings might help to interpret TSH levels in the prevention and investigation of HF.
Abstract:
CONTEXT: Subclinical hypothyroidism has been associated with increased risk of coronary heart disease (CHD), particularly with thyrotropin levels of 10.0 mIU/L or greater. The measurement of thyroid antibodies helps predict the progression to overt hypothyroidism, but it is unclear whether thyroid autoimmunity independently affects CHD risk. OBJECTIVE: The objective of the study was to compare the CHD risk of subclinical hypothyroidism with and without thyroid peroxidase antibodies (TPOAbs). DATA SOURCES AND STUDY SELECTION: A MEDLINE and EMBASE search from 1950 to 2011 was conducted for prospective cohorts, reporting baseline thyroid function, antibodies, and CHD outcomes. DATA EXTRACTION: Individual data of 38 274 participants from six cohorts for CHD mortality followed up for 460 333 person-years and 33 394 participants from four cohorts for CHD events. DATA SYNTHESIS: Among 38 274 adults (median age 55 y, 63% women), 1691 (4.4%) had subclinical hypothyroidism, of whom 775 (45.8%) had positive TPOAbs. During follow-up, 1436 participants died of CHD and 3285 had CHD events. Compared with euthyroid individuals, age- and gender-adjusted risks of CHD mortality in subclinical hypothyroidism were similar among individuals with and without TPOAbs [hazard ratio (HR) 1.15, 95% confidence interval (CI) 0.87-1.53 vs HR 1.26, CI 1.01-1.58, P for interaction = .62], as were risks of CHD events (HR 1.16, CI 0.87-1.56 vs HR 1.26, CI 1.02-1.56, P for interaction = .65). Risks of CHD mortality and events increased with higher thyrotropin, but within each stratum, risks did not differ by TPOAb status. CONCLUSIONS: CHD risk associated with subclinical hypothyroidism did not differ by TPOAb status, suggesting that biomarkers of thyroid autoimmunity do not add independent prognostic information for CHD outcomes.
Abstract:
Previously, a single nucleotide polymorphism (SNP), rs9939609, in the FTO gene showed a much stronger association with all-cause mortality than expected from its association with body mass index (BMI), body fat mass index (FMI) and waist circumference (WC). This finding implies that the SNP has strong pleiotropic effects on adiposity and adiposity-independent pathological pathways that lead to increased mortality. To investigate this further, we conducted a meta-analysis of similar data from 34 longitudinal studies including 169,551 adult Caucasians, among whom 27,100 died during follow-up. Linear regression showed that the minor allele of the FTO SNP was associated with greater BMI (n = 169,551; 0.32 kg/m²; 95% CI 0.28-0.32, P < 1 × 10⁻³²), WC (n = 152,631; 0.76 cm; 0.68-0.84, P < 1 × 10⁻³²) and FMI (n = 48,192; 0.17 kg/m²; 0.13-0.22, P = 1.0 × 10⁻¹³). Cox proportional hazards regression analyses for mortality showed that the hazard ratio (HR) for the minor allele of the FTO SNP was 1.02 (1.00-1.04, P = 0.097), but the apparent excess risk was eliminated after adjustment for BMI and WC (HR: 1.00; 0.98-1.03, P = 0.662) and for FMI (HR: 1.00; 0.96-1.04, P = 0.932). In conclusion, this study does not support that the FTO SNP is associated with all-cause mortality independently of the adiposity phenotypes.
Abstract:
OBJECTIVE: The objective was to determine the risk of stroke associated with subclinical hypothyroidism. DATA SOURCES AND STUDY SELECTION: Published prospective cohort studies were identified through a systematic search of several databases up to November 2013, without restrictions. Unpublished studies were identified through the Thyroid Studies Collaboration. We collected individual participant data on thyroid function and stroke outcome. Euthyroidism was defined as TSH levels of 0.45-4.49 mIU/L, and subclinical hypothyroidism was defined as TSH levels of 4.5-19.9 mIU/L with normal T4 levels. DATA EXTRACTION AND SYNTHESIS: We collected individual participant data on 47 573 adults (3451 with subclinical hypothyroidism) from 17 cohorts followed up from 1972-2014 (489 192 person-years). Age- and sex-adjusted pooled hazard ratios (HRs) for participants with subclinical hypothyroidism compared to euthyroidism were 1.05 (95% confidence interval [CI], 0.91-1.21) for stroke events (combined fatal and nonfatal stroke) and 1.07 (95% CI, 0.80-1.42) for fatal stroke. Stratified by age, the HR for stroke events was 3.32 (95% CI, 1.25-8.80) for individuals aged 18-49 years. There was an increased risk of fatal stroke in the age groups 18-49 and 50-64 years, with HRs of 4.22 (95% CI, 1.08-16.55) and 2.86 (95% CI, 1.31-6.26), respectively (P for trend 0.04). We found no increased risk for those 65-79 years old (HR, 1.00; 95% CI, 0.86-1.18) or ≥80 years old (HR, 1.31; 95% CI, 0.79-2.18). There was a pattern of increased risk of fatal stroke with higher TSH concentrations. CONCLUSIONS: Although no overall effect of subclinical hypothyroidism on stroke could be demonstrated, an increased risk in subjects younger than 65 years and those with higher TSH concentrations was observed.
Abstract:
The use of Geographic Information Systems has revolutionized the handling and visualization of geo-referenced data and has underlined the critical role of spatial analysis. The usual tools for this purpose are geostatistics, which are widely used in the Earth sciences. Geostatistics are based on several hypotheses that are not always verified in practice. Artificial Neural Networks (ANNs), on the other hand, can a priori be used without special assumptions and are known to be flexible. This paper discusses the application of ANNs to the interpolation of a geo-referenced variable.
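As a minimal illustration of the idea (not the paper's actual network), a one-hidden-layer perceptron can be trained by plain gradient descent to interpolate a variable from scattered samples; the architecture, learning rate, and toy target function are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy geo-referenced variable: values observed at 80 scattered coordinates.
x = rng.uniform(-1, 1, (80, 1))
y = np.sin(np.pi * x)  # the unknown field to interpolate (illustrative)

# One hidden layer of 16 tanh units with a linear output.
W1 = rng.standard_normal((1, 16)) * 0.5
b1 = np.zeros(16)
W2 = rng.standard_normal((16, 1)) * 0.5
b2 = np.zeros(1)
lr = 0.1
for _ in range(5000):
    h = np.tanh(x @ W1 + b1)              # hidden activations
    pred = h @ W2 + b2                    # network output
    err = pred - y
    # Backpropagate the (half) mean-squared-error gradient.
    gW2 = h.T @ err / len(x); gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)
    gW1 = x.T @ dh / len(x); gb1 = dh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2; W1 -= lr * gW1; b1 -= lr * gb1

# Interpolate at an unsampled coordinate.
x_new = np.array([[0.25]])
y_hat = np.tanh(x_new @ W1 + b1) @ W2 + b2
```

Unlike a kriging model, nothing here assumes stationarity or a variogram; the network simply learns a flexible function of the coordinates from the data.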
Abstract:
The integration of geophysical data into the subsurface characterization problem has been shown in many cases to significantly improve hydrological knowledge by providing information at spatial scales and locations that is unattainable using conventional hydrological measurement techniques. The investigation of exactly how much benefit can be brought by geophysical data in terms of its effect on hydrological predictions, however, has received considerably less attention in the literature. Here, we examine the potential hydrological benefits brought by a recently introduced simulated annealing (SA) conditional stochastic simulation method designed for the assimilation of diverse hydrogeophysical data sets. We consider the specific case of integrating crosshole ground-penetrating radar (GPR) and borehole porosity log data to characterize the porosity distribution in saturated heterogeneous aquifers. In many cases, porosity is linked to hydraulic conductivity and thus to flow and transport behavior. To perform our evaluation, we first generate a number of synthetic porosity fields exhibiting varying degrees of spatial continuity and structural complexity. Next, we simulate the collection of crosshole GPR data between several boreholes in these fields, and the collection of porosity log data at the borehole locations. The inverted GPR data, together with the porosity logs, are then used to reconstruct the porosity field using the SA-based method, along with a number of other more elementary approaches. Assuming that the grid-cell-scale relationship between porosity and hydraulic conductivity is unique and known, the porosity realizations are then used in groundwater flow and contaminant transport simulations to assess the benefits and limitations of the different approaches.
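The simulated annealing idea behind such conditional stochastic simulation can be sketched generically (this is not the paper's SA method or its hydrogeophysical objective): perturb a field by swapping values at non-conditioned cells, accept or reject with the Metropolis rule, and cool the temperature, while the borehole (conditioning) cells stay fixed. The lag-1 semivariance objective and all numbers are illustrative stand-ins:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60
field = rng.normal(0.25, 0.05, n)             # initial porosity field (illustrative)
cond_idx = np.array([0, 20, 40, 59])          # cells with "borehole" data, held fixed
cond_val = np.array([0.30, 0.22, 0.28, 0.20])
field[cond_idx] = cond_val

def lag1_semivariance(f):
    # A one-lag stand-in for the spatial statistics a real SA objective matches.
    return 0.5 * np.mean(np.diff(f) ** 2)

target = 0.0005                               # assumed target spatial statistic

def objective(f):
    return (lag1_semivariance(f) - target) ** 2

free = np.setdiff1d(np.arange(n), cond_idx)   # only non-conditioned cells may move
temp = 1e-6
cur = objective(field)
start = cur
for _ in range(20000):
    i, j = rng.choice(free, 2, replace=False)
    field[i], field[j] = field[j], field[i]      # perturbation: swap two free cells
    new = objective(field)
    if new > cur and rng.random() > np.exp((cur - new) / temp):
        field[i], field[j] = field[j], field[i]  # Metropolis reject: undo the swap
    else:
        cur = new                                # accept (always, if not worse)
    temp *= 0.9995                               # geometric cooling schedule
```

Because swaps only rearrange values at free cells, every accepted state still honors the conditioning data exactly; running the loop from different random seeds yields the multiple conditioned realizations used for uncertainty assessment.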
Abstract:
The class of Schoenberg transformations, embedding Euclidean distances into higher dimensional Euclidean spaces, is presented, and derived from theorems on positive definite and conditionally negative definite matrices. Original results on the arc lengths, angles and curvature of the transformations are proposed, and visualized on artificial data sets by classical multidimensional scaling. A distance-based discriminant algorithm and a robust multidimensional centroid estimate illustrate the theory, closely connected to the Gaussian kernels of Machine Learning.
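A hedged numerical illustration of the core property, assuming the well-known power-function case d ↦ d^a (0 < a ≤ 1) as the Schoenberg transformation: applied elementwise to a Euclidean distance matrix, it yields a matrix that is still Euclidean-embeddable, which can be checked through the classical-MDS double-centered Gram matrix; the same machinery shows the Gaussian kernel of machine learning is positive semidefinite:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((10, 3))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # Euclidean distances

def is_euclidean(D, tol=1e-8):
    """Classical-MDS check: D is Euclidean-embeddable iff the double-centered
    Gram matrix of squared distances is positive semidefinite."""
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    return np.linalg.eigvalsh(B).min() > -tol

# Elementwise powers with exponent in (0, 1] are Schoenberg transformations:
# the transformed matrices remain Euclidean distance matrices.
D_half = D ** 0.5
D_08 = D ** 0.8

# Connection to Machine Learning: the Gaussian kernel built from D is PSD.
K = np.exp(-D ** 2 / 2.0)
```

The bandwidth in `K` is an arbitrary choice; any positive value gives a positive semidefinite kernel on Euclidean distances.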
Abstract:
Aim Species distribution models (SDMs) based on current species ranges underestimate the potential distribution when projected in time and/or space. A multi-temporal model calibration approach has been suggested as an alternative, and we evaluate this using 13,000 years of data. Location Europe. Methods We used fossil-based records of presence for Picea abies, Abies alba and Fagus sylvatica and six climatic variables for the period 13,000 to 1000 yr BP. To measure the contribution of each 1000-year time step to the total niche of each species (the niche measured by pooling all the data), we employed a principal components analysis (PCA) calibrated with data over the entire range of possible climates. Then we projected both the total niche and the partial niches from single time frames into the PCA space, and tested if the partial niches were more similar to the total niche than random. Using an ensemble forecasting approach, we calibrated SDMs for each time frame and for the pooled database. We projected each model to current climate and evaluated the results against current pollen data. We also projected all models into the future. Results Niche similarity between the partial and the total-SDMs was almost always statistically significant and increased through time. SDMs calibrated from single time frames gave different results when projected to current climate, providing evidence of a change in the species realized niches through time. Moreover, they predicted limited climate suitability when compared with the total-SDMs. The same results were obtained when projected to future climates. Main conclusions The realized climatic niche of species differed for current and future climates when SDMs were calibrated considering different past climates. Building the niche as an ensemble through time represents a way forward to a better understanding of a species' range and its ecology in a changing climate.
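The calibration-and-projection step can be sketched abstractly (synthetic data only; nothing here reproduces the study's fossil records or its niche-similarity test): fit a PCA on the pooled climate conditions, project each time frame's partial niche into the same component space, and compare the breadth of partial versus total niches:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical presence climates for three 1000-year time frames, two climate
# variables each; locations and spreads are illustrative assumptions.
frames = [rng.normal(loc=m, scale=1.0, size=(200, 2)) for m in (0.0, 0.5, 1.0)]
total = np.vstack(frames)                    # pooled ("total") niche

# PCA calibrated on the pooled climate space (SVD of the centered data).
mu = total.mean(axis=0)
_, _, Vt = np.linalg.svd(total - mu, full_matrices=False)

def project(X):
    return (X - mu) @ Vt.T                   # scores in the shared PCA space

total_scores = project(total)
partial_scores = [project(f) for f in frames]

# A crude niche-breadth comparison along PC1: each partial niche spans at
# most what the pooled niche spans.
def span(scores):
    return float(scores[:, 0].max() - scores[:, 0].min())

partial_spans = [span(s) for s in partial_scores]
```

The point of the shared PCA is that partial and total niches are compared in one fixed climate space, rather than each time frame defining its own axes.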
Abstract:
Abstract This thesis is devoted to the analysis, modeling and visualization of spatially referenced environmental data using machine learning algorithms. Machine learning can broadly be considered a subfield of artificial intelligence concerned in particular with developing techniques and algorithms that allow a machine to learn from data. In this thesis, machine learning algorithms are adapted for application to environmental data and spatial prediction. Why machine learning? Because most machine learning algorithms are universal, adaptive, nonlinear, robust and efficient modeling tools. They can solve classification, regression and probability density modeling problems in high-dimensional spaces composed of spatially referenced informative variables ("geo-features") in addition to geographic coordinates. Moreover, they are well suited to implementation as decision-support tools for environmental questions ranging from pattern recognition to modeling and prediction, including automatic mapping. Their efficiency is comparable to that of geostatistical models in the space of geographic coordinates, but they are indispensable for high-dimensional data including geo-features. The most important and popular machine learning algorithms are presented theoretically and implemented as software tools for the environmental sciences.
The main algorithms described are the MultiLayer Perceptron (MLP), the best-known algorithm in artificial intelligence; General Regression Neural Networks (GRNN); Probabilistic Neural Networks (PNN); Self-Organized Maps (SOM); Gaussian Mixture Models (GMM); Radial Basis Function Networks (RBF); and Mixture Density Networks (MDN). This range of algorithms covers varied tasks such as classification, regression and probability density estimation. Exploratory Data Analysis (EDA) is the first step of any data analysis. In this thesis, the concepts of Exploratory Spatial Data Analysis (ESDA) are treated both through the traditional geostatistical approach, with experimental variography, and through machine learning principles. Experimental variography, which studies the relationships between pairs of points, is a basic tool for the geostatistical analysis of anisotropic spatial correlations that detects the presence of spatial patterns describable by two-point statistics. The machine learning approach to ESDA is presented through the application of the k-nearest neighbors method, which is very simple and has excellent interpretation and visualization qualities. An important part of the thesis deals with topical subjects such as the automatic mapping of spatial data. General Regression Neural Networks are proposed to solve this task efficiently.
The performance of the GRNN is demonstrated on Spatial Interpolation Comparison (SIC) 2004 data, on which the GRNN significantly outperformed all other methods, particularly in emergency situations. The thesis consists of four chapters: theory, applications, software tools and guided examples. An important part of the work is a collection of software tools: Machine Learning Office. This software collection has been developed over the last 15 years and has been used for teaching numerous courses, including international workshops in China, France, Italy, Ireland and Switzerland, as well as in fundamental and applied research projects. The case studies considered cover a wide spectrum of real low- and high-dimensional geo-environmental problems, such as air, soil and water pollution by radioactive products and heavy metals; classification of soil types and hydrogeological units; uncertainty mapping for decision support; and natural hazard assessment (landslides, avalanches). Complementary tools for exploratory data analysis and visualization were also developed, with care taken to create a user-friendly, easy-to-use interface.
Machine Learning for geospatial data: algorithms, software tools and case studies Abstract The thesis is devoted to the analysis, modeling and visualisation of spatial environmental data using machine learning algorithms. In a broad sense, machine learning can be considered a subfield of artificial intelligence. It mainly concerns the development of techniques and algorithms that allow computers to learn from data. In this thesis, machine learning algorithms are adapted to learn from spatial environmental data and to make spatial predictions. Why machine learning?
In a few words, most machine learning algorithms are universal, adaptive, nonlinear, robust and efficient modeling tools. They can find solutions for classification, regression and probability density modeling problems in high-dimensional geo-feature spaces, composed of geographical space and additional relevant spatially referenced features. They are well suited to be implemented as predictive engines in decision support systems for environmental data mining, including pattern recognition, modeling and prediction as well as automatic data mapping. They are competitive in efficiency with geostatistical models in low-dimensional geographical spaces but are indispensable in high-dimensional geo-feature spaces. The most important and popular machine learning algorithms and models of interest for the geo- and environmental sciences are presented in detail, from theoretical description of the concepts to software implementation. The main algorithms and models considered are the following: the multi-layer perceptron (a workhorse of machine learning), general regression neural networks, probabilistic neural networks, self-organising (Kohonen) maps, Gaussian mixture models, radial basis function networks and mixture density networks. This set of models covers machine learning tasks such as classification, regression and density estimation. Exploratory data analysis (EDA) is an initial and very important part of data analysis. In this thesis, the concepts of exploratory spatial data analysis (ESDA) are considered using both a traditional geostatistical approach, namely experimental variography, and machine learning. Experimental variography is a basic tool for the geostatistical analysis of anisotropic spatial correlations, which helps to detect the presence of spatial patterns, at least those described by two-point statistics.
A machine learning approach for ESDA is presented by applying the k-nearest neighbors (k-NN) method, which is simple and has very good interpretation and visualization properties. An important part of the thesis deals with a topical problem: the automatic mapping of geospatial data. General regression neural networks (GRNN) are proposed as an efficient model to solve this task. The performance of the GRNN model is demonstrated on Spatial Interpolation Comparison (SIC) 2004 data, where the GRNN model significantly outperformed all other approaches, especially under emergency conditions. The thesis consists of four chapters and has the following structure: theory, applications, software tools and how-to-do-it examples. An important part of the work is a collection of software tools, Machine Learning Office. The Machine Learning Office tools were developed during the last 15 years and have been used both for many teaching courses, including international workshops in China, France, Italy, Ireland and Switzerland, and for fundamental and applied research projects. The case studies considered cover a wide spectrum of real-life low- and high-dimensional geo- and environmental problems, such as air, soil and water pollution by radionuclides and heavy metals; soil type and hydro-geological unit classification; decision-oriented mapping with uncertainties; and natural hazard (landslide, avalanche) assessment and susceptibility mapping. Complementary tools useful for exploratory data analysis and visualisation were developed as well. The software is user friendly and easy to use.
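A minimal sketch of the GRNN at the heart of the automatic-mapping results, following Specht's formulation as a Gaussian-kernel-weighted average of training targets; the data, smoothing parameter and function name are illustrative, not taken from Machine Learning Office:

```python
import numpy as np

def grnn_predict(X_train, y_train, X_query, sigma=0.3):
    """General Regression Neural Network (Specht, 1991): prediction is a
    Nadaraya-Watson kernel-weighted average of the training targets, with a
    single smoothing parameter sigma."""
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-d2 / (2 * sigma ** 2))       # Gaussian pattern-layer weights
    return (w @ y_train) / w.sum(axis=1)     # summation layer + normalization

rng = np.random.default_rng(5)
X = rng.uniform(0, 1, (200, 2))              # e.g. spatial coordinates
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(200)
y_hat = grnn_predict(X, y, X, sigma=0.05)
```

With only one free parameter to tune, the GRNN is well suited to the fully automatic mapping setting described above; in practice sigma would be chosen by cross-validation rather than fixed by hand.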