36 resultados para acoustic sensor data analysis
Resumo:
This paper presents general problems and approaches for the spatial data analysis using machine learning algorithms. Machine learning is a very powerful approach to adaptive data analysis, modelling and visualisation. The key feature of the machine learning algorithms is that they learn from empirical data and can be used in cases when the modelled environmental phenomena are hidden, nonlinear, noisy and highly variable in space and in time. Most of the machines learning algorithms are universal and adaptive modelling tools developed to solve basic problems of learning from data: classification/pattern recognition, regression/mapping and probability density modelling. In the present report some of the widely used machine learning algorithms, namely artificial neural networks (ANN) of different architectures and Support Vector Machines (SVM), are adapted to the problems of the analysis and modelling of geo-spatial data. Machine learning algorithms have an important advantage over traditional models of spatial statistics when problems are considered in a high dimensional geo-feature spaces, when the dimension of space exceeds 5. Such features are usually generated, for example, from digital elevation models, remote sensing images, etc. An important extension of models concerns considering of real space constrains like geomorphology, networks, and other natural structures. Recent developments in semi-supervised learning can improve modelling of environmental phenomena taking into account on geo-manifolds. An important part of the study deals with the analysis of relevant variables and models' inputs. This problem is approached by using different feature selection/feature extraction nonlinear tools. To demonstrate the application of machine learning algorithms several interesting case studies are considered: digital soil mapping using SVM, automatic mapping of soil and water system pollution using ANN; natural hazards risk analysis (avalanches, landslides), assessments of renewable resources (wind fields) with SVM and ANN models, etc. The dimensionality of spaces considered varies from 2 to more than 30. Figures 1, 2, 3 demonstrate some results of the studies and their outputs. Finally, the results of environmental mapping are discussed and compared with traditional models of geostatistics.
Resumo:
Analyzing functional data often leads to finding common factors, for which functional principal component analysis proves to be a useful tool to summarize and characterize the random variation in a function space. The representation in terms of eigenfunctions is optimal in the sense of L-2 approximation. However, the eigenfunctions are not always directed towards an interesting and interpretable direction in the context of functional data and thus could obscure the underlying structure. To overcome such difficulty, an alternative to functional principal component analysis is proposed that produces directed components which may be more informative and easier to interpret. These structural components are similar to principal components, but are adapted to situations in which the domain of the function may be decomposed into disjoint intervals such that there is effectively independence between intervals and positive correlation within intervals. The approach is demonstrated with synthetic examples as well as real data. Properties for special cases are also studied.
Resumo:
Multisensory stimuli can improve performance, facilitating RTs on sensorimotor tasks. This benefit is referred to as the redundant signals effect (RSE) and can exceed predictions on the basis of probability summation, indicative of integrative processes. Although an RSE exceeding probability summation has been repeatedly observed in humans and nonprimate animals, there are scant and inconsistent data from nonhuman primates performing similar protocols. Rather, existing paradigms have instead focused on saccadic eye movements. Moreover, the extant results in monkeys leave unresolved how stimulus synchronicity and intensity impact performance. Two trained monkeys performed a simple detection task involving arm movements to auditory, visual, or synchronous auditory-visual multisensory pairs. RSEs in excess of predictions on the basis of probability summation were observed and thus forcibly follow from neural response interactions. Parametric variation of auditory stimulus intensity revealed that in both animals, RT facilitation was limited to situations where the auditory stimulus intensity was below or up to 20 dB above perceptual threshold, despite the visual stimulus always being suprathreshold. No RT facilitation or even behavioral costs were obtained with auditory intensities 30-40 dB above threshold. The present study demonstrates the feasibility and the suitability of behaving monkeys for investigating links between psychophysical and neurophysiologic instantiations of multisensory interactions.
Resumo:
General Introduction This thesis can be divided into two main parts :the first one, corresponding to the first three chapters, studies Rules of Origin (RoOs) in Preferential Trade Agreements (PTAs); the second part -the fourth chapter- is concerned with Anti-Dumping (AD) measures. Despite wide-ranging preferential access granted to developing countries by industrial ones under North-South Trade Agreements -whether reciprocal, like the Europe Agreements (EAs) or NAFTA, or not, such as the GSP, AGOA, or EBA-, it has been claimed that the benefits from improved market access keep falling short of the full potential benefits. RoOs are largely regarded as a primary cause of the under-utilization of improved market access of PTAs. RoOs are the rules that determine the eligibility of goods to preferential treatment. Their economic justification is to prevent trade deflection, i.e. to prevent non-preferred exporters from using the tariff preferences. However, they are complex, cost raising and cumbersome, and can be manipulated by organised special interest groups. As a result, RoOs can restrain trade beyond what it is needed to prevent trade deflection and hence restrict market access in a statistically significant and quantitatively large proportion. Part l In order to further our understanding of the effects of RoOs in PTAs, the first chapter, written with Pr. Olivier Cadot, Celine Carrère and Pr. Jaime de Melo, describes and evaluates the RoOs governing EU and US PTAs. It draws on utilization-rate data for Mexican exports to the US in 2001 and on similar data for ACP exports to the EU in 2002. The paper makes two contributions. First, we construct an R-index of restrictiveness of RoOs along the lines first proposed by Estevadeordal (2000) for NAFTA, modifying it and extending it for the EU's single-list (SL). This synthetic R-index is then used to compare Roos under NAFTA and PANEURO. The two main findings of the chapter are as follows. First, it shows, in the case of PANEURO, that the R-index is useful to summarize how countries are differently affected by the same set of RoOs because of their different export baskets to the EU. Second, it is shown that the Rindex is a relatively reliable statistic in the sense that, subject to caveats, after controlling for the extent of tariff preference at the tariff-line level, it accounts for differences in utilization rates at the tariff line level. Finally, together with utilization rates, the index can be used to estimate total compliance costs of RoOs. The second chapter proposes a reform of preferential Roos with the aim of making them more transparent and less discriminatory. Such a reform would make preferential blocs more "cross-compatible" and would therefore facilitate cumulation. It would also contribute to move regionalism toward more openness and hence to make it more compatible with the multilateral trading system. It focuses on NAFTA, one of the most restrictive FTAs (see Estevadeordal and Suominen 2006), and proposes a way forward that is close in spirit to what the EU Commission is considering for the PANEURO system. In a nutshell, the idea is to replace the current array of RoOs by a single instrument- Maximum Foreign Content (MFC). An MFC is a conceptually clear and transparent instrument, like a tariff. Therefore changing all instruments into an MFC would bring improved transparency pretty much like the "tariffication" of NTBs. The methodology for this exercise is as follows: In step 1, I estimate the relationship between utilization rates, tariff preferences and RoOs. In step 2, I retrieve the estimates and invert the relationship to get a simulated MFC that gives, line by line, the same utilization rate as the old array of Roos. In step 3, I calculate the trade-weighted average of the simulated MFC across all lines to get an overall equivalent of the current system and explore the possibility of setting this unique instrument at a uniform rate across lines. This would have two advantages. First, like a uniform tariff, a uniform MFC would make it difficult for lobbies to manipulate the instrument at the margin. This argument is standard in the political-economy literature and has been used time and again in support of reductions in the variance of tariffs (together with standard welfare considerations). Second, uniformity across lines is the only way to eliminate the indirect source of discrimination alluded to earlier. Only if two countries face uniform RoOs and tariff preference will they face uniform incentives irrespective of their initial export structure. The result of this exercise is striking: the average simulated MFC is 25% of good value, a very low (i.e. restrictive) level, confirming Estevadeordal and Suominen's critical assessment of NAFTA's RoOs. Adopting a uniform MFC would imply a relaxation from the benchmark level for sectors like chemicals or textiles & apparel, and a stiffening for wood products, papers and base metals. Overall, however, the changes are not drastic, suggesting perhaps only moderate resistance to change from special interests. The third chapter of the thesis considers whether Europe Agreements of the EU, with the current sets of RoOs, could be the potential model for future EU-centered PTAs. First, I have studied and coded at the six-digit level of the Harmonised System (HS) .both the old RoOs -used before 1997- and the "Single list" Roos -used since 1997. Second, using a Constant Elasticity Transformation function where CEEC exporters smoothly mix sales between the EU and the rest of the world by comparing producer prices on each market, I have estimated the trade effects of the EU RoOs. The estimates suggest that much of the market access conferred by the EAs -outside sensitive sectors- was undone by the cost-raising effects of RoOs. The chapter also contains an analysis of the evolution of the CEECs' trade with the EU from post-communism to accession. Part II The last chapter of the thesis is concerned with anti-dumping, another trade-policy instrument having the effect of reducing market access. In 1995, the Uruguay Round introduced in the Anti-Dumping Agreement (ADA) a mandatory "sunset-review" clause (Article 11.3 ADA) under which anti-dumping measures should be reviewed no later than five years from their imposition and terminated unless there was a serious risk of resumption of injurious dumping. The last chapter, written with Pr. Olivier Cadot and Pr. Jaime de Melo, uses a new database on Anti-Dumping (AD) measures worldwide to assess whether the sunset-review agreement had any effect. The question we address is whether the WTO Agreement succeeded in imposing the discipline of a five-year cycle on AD measures and, ultimately, in curbing their length. Two methods are used; count data analysis and survival analysis. First, using Poisson and Negative Binomial regressions, the count of AD measures' revocations is regressed on (inter alia) the count of "initiations" lagged five years. The analysis yields a coefficient on measures' initiations lagged five years that is larger and more precisely estimated after the agreement than before, suggesting some effect. However the coefficient estimate is nowhere near the value that would give a one-for-one relationship between initiations and revocations after five years. We also find that (i) if the agreement affected EU AD practices, the effect went the wrong way, the five-year cycle being quantitatively weaker after the agreement than before; (ii) the agreement had no visible effect on the United States except for aone-time peak in 2000, suggesting a mopping-up of old cases. Second, the survival analysis of AD measures around the world suggests a shortening of their expected lifetime after the agreement, and this shortening effect (a downward shift in the survival function postagreement) was larger and more significant for measures targeted at WTO members than for those targeted at non-members (for which WTO disciplines do not bind), suggesting that compliance was de jure. A difference-in-differences Cox regression confirms this diagnosis: controlling for the countries imposing the measures, for the investigated countries and for the products' sector, we find a larger increase in the hazard rate of AD measures covered by the Agreement than for other measures.
Resumo:
Recently, kernel-based Machine Learning methods have gained great popularity in many data analysis and data mining fields: pattern recognition, biocomputing, speech and vision, engineering, remote sensing etc. The paper describes the use of kernel methods to approach the processing of large datasets from environmental monitoring networks. Several typical problems of the environmental sciences and their solutions provided by kernel-based methods are considered: classification of categorical data (soil type classification), mapping of environmental and pollution continuous information (pollution of soil by radionuclides), mapping with auxiliary information (climatic data from Aral Sea region). The promising developments, such as automatic emergency hot spot detection and monitoring network optimization are discussed as well.
Resumo:
Until recently, the hard X-ray, phase-sensitive imaging technique called grating interferometry was thought to provide information only in real space. However, by utilizing an alternative approach to data analysis we demonstrated that the angular resolved ultra-small angle X-ray scattering distribution can be retrieved from experimental data. Thus, reciprocal space information is accessible by grating interferometry in addition to real space. Naturally, the quality of the retrieved data strongly depends on the performance of the employed analysis procedure, which involves deconvolution of periodic and noisy data in this context. The aim of this article is to compare several deconvolution algorithms to retrieve the ultra-small angle X-ray scattering distribution in grating interferometry. We quantitatively compare the performance of three deconvolution procedures (i.e., Wiener, iterative Wiener and Lucy-Richardson) in case of realistically modeled, noisy and periodic input data. The simulations showed that the algorithm of Lucy-Richardson is the more reliable and more efficient as a function of the characteristics of the signals in the given context. The availability of a reliable data analysis procedure is essential for future developments in grating interferometry.
Resumo:
The coverage and volume of geo-referenced datasets are extensive and incessantly¦growing. The systematic capture of geo-referenced information generates large volumes¦of spatio-temporal data to be analyzed. Clustering and visualization play a key¦role in the exploratory data analysis and the extraction of knowledge embedded in¦these data. However, new challenges in visualization and clustering are posed when¦dealing with the special characteristics of this data. For instance, its complex structures,¦large quantity of samples, variables involved in a temporal context, high dimensionality¦and large variability in cluster shapes.¦The central aim of my thesis is to propose new algorithms and methodologies for¦clustering and visualization, in order to assist the knowledge extraction from spatiotemporal¦geo-referenced data, thus improving making decision processes.¦I present two original algorithms, one for clustering: the Fuzzy Growing Hierarchical¦Self-Organizing Networks (FGHSON), and the second for exploratory visual data analysis:¦the Tree-structured Self-organizing Maps Component Planes. In addition, I present¦methodologies that combined with FGHSON and the Tree-structured SOM Component¦Planes allow the integration of space and time seamlessly and simultaneously in¦order to extract knowledge embedded in a temporal context.¦The originality of the FGHSON lies in its capability to reflect the underlying structure¦of a dataset in a hierarchical fuzzy way. A hierarchical fuzzy representation of¦clusters is crucial when data include complex structures with large variability of cluster¦shapes, variances, densities and number of clusters. The most important characteristics¦of the FGHSON include: (1) It does not require an a-priori setup of the number¦of clusters. (2) The algorithm executes several self-organizing processes in parallel.¦Hence, when dealing with large datasets the processes can be distributed reducing the¦computational cost. (3) Only three parameters are necessary to set up the algorithm.¦In the case of the Tree-structured SOM Component Planes, the novelty of this algorithm¦lies in its ability to create a structure that allows the visual exploratory data analysis¦of large high-dimensional datasets. This algorithm creates a hierarchical structure¦of Self-Organizing Map Component Planes, arranging similar variables' projections in¦the same branches of the tree. Hence, similarities on variables' behavior can be easily¦detected (e.g. local correlations, maximal and minimal values and outliers).¦Both FGHSON and the Tree-structured SOM Component Planes were applied in¦several agroecological problems proving to be very efficient in the exploratory analysis¦and clustering of spatio-temporal datasets.¦In this thesis I also tested three soft competitive learning algorithms. Two of them¦well-known non supervised soft competitive algorithms, namely the Self-Organizing¦Maps (SOMs) and the Growing Hierarchical Self-Organizing Maps (GHSOMs); and the¦third was our original contribution, the FGHSON. Although the algorithms presented¦here have been used in several areas, to my knowledge there is not any work applying¦and comparing the performance of those techniques when dealing with spatiotemporal¦geospatial data, as it is presented in this thesis.¦I propose original methodologies to explore spatio-temporal geo-referenced datasets¦through time. Our approach uses time windows to capture temporal similarities and¦variations by using the FGHSON clustering algorithm. The developed methodologies¦are used in two case studies. In the first, the objective was to find similar agroecozones¦through time and in the second one it was to find similar environmental patterns¦shifted in time.¦Several results presented in this thesis have led to new contributions to agroecological¦knowledge, for instance, in sugar cane, and blackberry production.¦Finally, in the framework of this thesis we developed several software tools: (1)¦a Matlab toolbox that implements the FGHSON algorithm, and (2) a program called¦BIS (Bio-inspired Identification of Similar agroecozones) an interactive graphical user¦interface tool which integrates the FGHSON algorithm with Google Earth in order to¦show zones with similar agroecological characteristics.
Resumo:
The use of synthetic combinatorial peptide libraries in positional scanning format (PS-SCL) has emerged recently as an alternative approach for the identification of peptides recognized by T lymphocytes. The choice of both the PS-SCL used for screening experiments and the method used for data analysis are crucial for implementing this approach. With this aim, we tested the recognition of different PS-SCL by a tyrosinase 368-376-specific CTL clone and analyzed the data obtained with a recently developed biometric data analysis based on a model of independent and additive contribution of individual amino acids to peptide antigen recognition. Mixtures defined with amino acids present at the corresponding positions in the native sequence were among the most active for all of the libraries. Somewhat surprisingly, a higher number of native amino acids were identifiable by using amidated COOH-terminal rather than free COOH-terminal PS-SCL. Also, our data clearly indicate that when using PS-SCL longer than optimal, frame shifts occur frequently and should be taken into account. Biometric analysis of the data obtained with the amidated COOH-terminal nonapeptide library allowed the identification of the native ligand as the sequence with the highest score in a public human protein database. However, the adequacy of the PS-SCL data for the identification for the peptide ligand varied depending on the PS-SCL used. Altogether these results provide insight into the potential of PS-SCL for the identification of CTL-defined tumor-derived antigenic sequences and may significantly implement our ability to interpret the results of these analyses.
Resumo:
Quantitative information from magnetic resonance imaging (MRI) may substantiate clinical findings and provide additional insight into the mechanism of clinical interventions in therapeutic stroke trials. The PERFORM study is exploring the efficacy of terutroban versus aspirin for secondary prevention in patients with a history of ischemic stroke. We report on the design of an exploratory longitudinal MRI follow-up study that was performed in a subgroup of the PERFORM trial. An international multi-centre longitudinal follow-up MRI study was designed for different MR systems employing safety and efficacy readouts: new T2 lesions, new DWI lesions, whole brain volume change, hippocampal volume change, changes in tissue microstructure as depicted by mean diffusivity and fractional anisotropy, vessel patency on MR angiography, and the presence of and development of new microbleeds. A total of 1,056 patients (men and women ≥ 55 years) were included. The data analysis included 3D reformation, image registration of different contrasts, tissue segmentation, and automated lesion detection. This large international multi-centre study demonstrates how new MRI readouts can be used to provide key information on the evolution of cerebral tissue lesions and within the macrovasculature after atherothrombotic stroke in a large sample of patients.
Resumo:
The focus of my PhD research was the concept of modularity. In the last 15 years, modularity has become a classic term in different fields of biology. On the conceptual level, a module is a set of interacting elements that remain mostly independent from the elements outside of the module. I used modular analysis techniques to study gene expression evolution in vertebrates. In particular, I identified ``natural'' modules of gene expression in mouse and human, and I showed that expression of organ-specific and system-specific genes tends to be conserved between such distance vertebrates as mammals and fishes. Also with a modular approach, I studied patterns of developmental constraints on transcriptome evolution. I showed that none of the two commonly accepted models of the evolution of embryonic development (``evo-devo'') are exclusively valid. In particular, I found that the conservation of the sequences of regulatory regions is highest during mid-development of zebrafish, and thus it supports the ``hourglass model''. In contrast, events of gene duplication and new gene introduction are most rare in early development, which supports the ``early conservation model''. In addition to the biological insights on transcriptome evolution, I have also discussed in detail the advantages of modular approaches in large-scale data analysis. Moreover, I re-analyzed several studies (published in high-ranking journals), and showed that their conclusions do not hold out under a detailed analysis. This demonstrates that complex analysis of high-throughput data requires a co-operation between biologists, bioinformaticians, and statisticians.
Resumo:
Due to the advances in sensor networks and remote sensing technologies, the acquisition and storage rates of meteorological and climatological data increases every day and ask for novel and efficient processing algorithms. A fundamental problem of data analysis and modeling is the spatial prediction of meteorological variables in complex orography, which serves among others to extended climatological analyses, for the assimilation of data into numerical weather prediction models, for preparing inputs to hydrological models and for real time monitoring and short-term forecasting of weather.In this thesis, a new framework for spatial estimation is proposed by taking advantage of a class of algorithms emerging from the statistical learning theory. Nonparametric kernel-based methods for nonlinear data classification, regression and target detection, known as support vector machines (SVM), are adapted for mapping of meteorological variables in complex orography.With the advent of high resolution digital elevation models, the field of spatial prediction met new horizons. In fact, by exploiting image processing tools along with physical heuristics, an incredible number of terrain features which account for the topographic conditions at multiple spatial scales can be extracted. Such features are highly relevant for the mapping of meteorological variables because they control a considerable part of the spatial variability of meteorological fields in the complex Alpine orography. For instance, patterns of orographic rainfall, wind speed and cold air pools are known to be correlated with particular terrain forms, e.g. convex/concave surfaces and upwind sides of mountain slopes.Kernel-based methods are employed to learn the nonlinear statistical dependence which links the multidimensional space of geographical and topographic explanatory variables to the variable of interest, that is the wind speed as measured at the weather stations or the occurrence of orographic rainfall patterns as extracted from sequences of radar images. Compared to low dimensional models integrating only the geographical coordinates, the proposed framework opens a way to regionalize meteorological variables which are multidimensional in nature and rarely show spatial auto-correlation in the original space making the use of classical geostatistics tangled.The challenges which are explored during the thesis are manifolds. First, the complexity of models is optimized to impose appropriate smoothness properties and reduce the impact of noisy measurements. Secondly, a multiple kernel extension of SVM is considered to select the multiscale features which explain most of the spatial variability of wind speed. Then, SVM target detection methods are implemented to describe the orographic conditions which cause persistent and stationary rainfall patterns. Finally, the optimal splitting of the data is studied to estimate realistic performances and confidence intervals characterizing the uncertainty of predictions.The resulting maps of average wind speeds find applications within renewable resources assessment and opens a route to decrease the temporal scale of analysis to meet hydrological requirements. Furthermore, the maps depicting the susceptibility to orographic rainfall enhancement can be used to improve current radar-based quantitative precipitation estimation and forecasting systems and to generate stochastic ensembles of precipitation fields conditioned upon the orography.
Resumo:
Geophysical techniques can help to bridge the inherent gap with regard to spatial resolution and the range of coverage that plagues classical hydrological methods. This has lead to the emergence of the new and rapidly growing field of hydrogeophysics. Given the differing sensitivities of various geophysical techniques to hydrologically relevant parameters and their inherent trade-off between resolution and range the fundamental usefulness of multi-method hydrogeophysical surveys for reducing uncertainties in data analysis and interpretation is widely accepted. A major challenge arising from such endeavors is the quantitative integration of the resulting vast and diverse database in order to obtain a unified model of the probed subsurface region that is internally consistent with all available data. To address this problem, we have developed a strategy towards hydrogeophysical data integration based on Monte-Carlo-type conditional stochastic simulation that we consider to be particularly suitable for local-scale studies characterized by high-resolution and high-quality datasets. Monte-Carlo-based optimization techniques are flexible and versatile, allow for accounting for a wide variety of data and constraints of differing resolution and hardness and thus have the potential of providing, in a geostatistical sense, highly detailed and realistic models of the pertinent target parameter distributions. Compared to more conventional approaches of this kind, our approach provides significant advancements in the way that the larger-scale deterministic information resolved by the hydrogeophysical data can be accounted for, which represents an inherently problematic, and as of yet unresolved, aspect of Monte-Carlo-type conditional simulation techniques. We present the results of applying our algorithm to the integration of porosity log and tomographic crosshole georadar data to generate stochastic realizations of the local-scale porosity structure. Our procedure is first tested on pertinent synthetic data and then applied to corresponding field data collected at the Boise Hydrogeophysical Research Site near Boise, Idaho, USA.
Resumo:
INTRODUCTION: infants hospitalised in neonatology are inevitably exposed to pain repeatedly. Premature infants are particularly vulnerable, because they are hypersensitive to pain and demonstrate diminished behavioural responses to pain. They are therefore at risk of developing short and long-term complications if pain remains untreated. CONTEXT: compared to acute pain, there is limited evidence in the literature on prolonged pain in infants. However, the prevalence is reported between 20 and 40 %. OBJECTIVE : this single case study aimed to identify the bio-contextual characteristics of neonates who experienced prolonged pain. METHODS : this study was carried out in the neonatal unit of a tertiary referral centre in Western Switzerland. A retrospective data analysis of seven infants' profile, who experienced prolonged pain ,was performed using five different data sources. RESULTS : the mean gestational age of the seven infants was 32weeks. The main diagnosis included prematurity and respiratory distress syndrome. The total observations (N=55) showed that the participants had in average 21.8 (SD 6.9) painful procedures that were estimated to be of moderate to severe intensity each day. Out of the 164 recorded pain scores (2.9 pain assessment/day/infant), 14.6 % confirmed acute pain. Out of those experiencing acute pain, analgesia was given in 16.6 % of them and 79.1 % received no analgesia. CONCLUSION: this study highlighted the difficulty in managing pain in neonates who are exposed to numerous painful procedures. Pain in this population remains underevaluated and as a result undertreated.Results of this study showed that nursing documentation related to pain assessment is not systematic.Regular assessment and documentation of acute and prolonged pain are recommended. This could be achieved with clear guidelines on the Assessment Intervention Reassessment (AIR) cyclewith validated measures adapted to neonates. The adequacy of pain assessment is a pre-requisite for appropriate pain relief in neonates.
Resumo:
Résumé Cette thèse est consacrée à l'analyse, la modélisation et la visualisation de données environnementales à référence spatiale à l'aide d'algorithmes d'apprentissage automatique (Machine Learning). L'apprentissage automatique peut être considéré au sens large comme une sous-catégorie de l'intelligence artificielle qui concerne particulièrement le développement de techniques et d'algorithmes permettant à une machine d'apprendre à partir de données. Dans cette thèse, les algorithmes d'apprentissage automatique sont adaptés pour être appliqués à des données environnementales et à la prédiction spatiale. Pourquoi l'apprentissage automatique ? Parce que la majorité des algorithmes d'apprentissage automatiques sont universels, adaptatifs, non-linéaires, robustes et efficaces pour la modélisation. Ils peuvent résoudre des problèmes de classification, de régression et de modélisation de densité de probabilités dans des espaces à haute dimension, composés de variables informatives spatialisées (« géo-features ») en plus des coordonnées géographiques. De plus, ils sont idéaux pour être implémentés en tant qu'outils d'aide à la décision pour des questions environnementales allant de la reconnaissance de pattern à la modélisation et la prédiction en passant par la cartographie automatique. Leur efficacité est comparable au modèles géostatistiques dans l'espace des coordonnées géographiques, mais ils sont indispensables pour des données à hautes dimensions incluant des géo-features. Les algorithmes d'apprentissage automatique les plus importants et les plus populaires sont présentés théoriquement et implémentés sous forme de logiciels pour les sciences environnementales. Les principaux algorithmes décrits sont le Perceptron multicouches (MultiLayer Perceptron, MLP) - l'algorithme le plus connu dans l'intelligence artificielle, le réseau de neurones de régression généralisée (General Regression Neural Networks, GRNN), le réseau de neurones probabiliste (Probabilistic Neural Networks, PNN), les cartes auto-organisées (SelfOrganized Maps, SOM), les modèles à mixture Gaussiennes (Gaussian Mixture Models, GMM), les réseaux à fonctions de base radiales (Radial Basis Functions Networks, RBF) et les réseaux à mixture de densité (Mixture Density Networks, MDN). Cette gamme d'algorithmes permet de couvrir des tâches variées telle que la classification, la régression ou l'estimation de densité de probabilité. L'analyse exploratoire des données (Exploratory Data Analysis, EDA) est le premier pas de toute analyse de données. Dans cette thèse les concepts d'analyse exploratoire de données spatiales (Exploratory Spatial Data Analysis, ESDA) sont traités selon l'approche traditionnelle de la géostatistique avec la variographie expérimentale et selon les principes de l'apprentissage automatique. La variographie expérimentale, qui étudie les relations entre pairs de points, est un outil de base pour l'analyse géostatistique de corrélations spatiales anisotropiques qui permet de détecter la présence de patterns spatiaux descriptible par une statistique. L'approche de l'apprentissage automatique pour l'ESDA est présentée à travers l'application de la méthode des k plus proches voisins qui est très simple et possède d'excellentes qualités d'interprétation et de visualisation. Une part importante de la thèse traite de sujets d'actualité comme la cartographie automatique de données spatiales. Le réseau de neurones de régression généralisée est proposé pour résoudre cette tâche efficacement. Les performances du GRNN sont démontrées par des données de Comparaison d'Interpolation Spatiale (SIC) de 2004 pour lesquelles le GRNN bat significativement toutes les autres méthodes, particulièrement lors de situations d'urgence. La thèse est composée de quatre chapitres : théorie, applications, outils logiciels et des exemples guidés. Une partie importante du travail consiste en une collection de logiciels : Machine Learning Office. Cette collection de logiciels a été développée durant les 15 dernières années et a été utilisée pour l'enseignement de nombreux cours, dont des workshops internationaux en Chine, France, Italie, Irlande et Suisse ainsi que dans des projets de recherche fondamentaux et appliqués. Les cas d'études considérés couvrent un vaste spectre de problèmes géoenvironnementaux réels à basse et haute dimensionnalité, tels que la pollution de l'air, du sol et de l'eau par des produits radioactifs et des métaux lourds, la classification de types de sols et d'unités hydrogéologiques, la cartographie des incertitudes pour l'aide à la décision et l'estimation de risques naturels (glissements de terrain, avalanches). Des outils complémentaires pour l'analyse exploratoire des données et la visualisation ont également été développés en prenant soin de créer une interface conviviale et facile à l'utilisation. Machine Learning for geospatial data: algorithms, software tools and case studies Abstract The thesis is devoted to the analysis, modeling and visualisation of spatial environmental data using machine learning algorithms. In a broad sense machine learning can be considered as a subfield of artificial intelligence. It mainly concerns with the development of techniques and algorithms that allow computers to learn from data. In this thesis machine learning algorithms are adapted to learn from spatial environmental data and to make spatial predictions. Why machine learning? In few words most of machine learning algorithms are universal, adaptive, nonlinear, robust and efficient modeling tools. They can find solutions for the classification, regression, and probability density modeling problems in high-dimensional geo-feature spaces, composed of geographical space and additional relevant spatially referenced features. They are well-suited to be implemented as predictive engines in decision support systems, for the purposes of environmental data mining including pattern recognition, modeling and predictions as well as automatic data mapping. They have competitive efficiency to the geostatistical models in low dimensional geographical spaces but are indispensable in high-dimensional geo-feature spaces. The most important and popular machine learning algorithms and models interesting for geo- and environmental sciences are presented in details: from theoretical description of the concepts to the software implementation. The main algorithms and models considered are the following: multi-layer perceptron (a workhorse of machine learning), general regression neural networks, probabilistic neural networks, self-organising (Kohonen) maps, Gaussian mixture models, radial basis functions networks, mixture density networks. This set of models covers machine learning tasks such as classification, regression, and density estimation. Exploratory data analysis (EDA) is initial and very important part of data analysis. In this thesis the concepts of exploratory spatial data analysis (ESDA) is considered using both traditional geostatistical approach such as_experimental variography and machine learning. Experimental variography is a basic tool for geostatistical analysis of anisotropic spatial correlations which helps to understand the presence of spatial patterns, at least described by two-point statistics. A machine learning approach for ESDA is presented by applying the k-nearest neighbors (k-NN) method which is simple and has very good interpretation and visualization properties. Important part of the thesis deals with a hot topic of nowadays, namely, an automatic mapping of geospatial data. General regression neural networks (GRNN) is proposed as efficient model to solve this task. Performance of the GRNN model is demonstrated on Spatial Interpolation Comparison (SIC) 2004 data where GRNN model significantly outperformed all other approaches, especially in case of emergency conditions. The thesis consists of four chapters and has the following structure: theory, applications, software tools, and how-to-do-it examples. An important part of the work is a collection of software tools - Machine Learning Office. Machine Learning Office tools were developed during last 15 years and was used both for many teaching courses, including international workshops in China, France, Italy, Ireland, Switzerland and for realizing fundamental and applied research projects. Case studies considered cover wide spectrum of the real-life low and high-dimensional geo- and environmental problems, such as air, soil and water pollution by radionuclides and heavy metals, soil types and hydro-geological units classification, decision-oriented mapping with uncertainties, natural hazards (landslides, avalanches) assessments and susceptibility mapping. Complementary tools useful for the exploratory data analysis and visualisation were developed as well. The software is user friendly and easy to use.
Resumo:
We conducted this study to determine the relative influence of various mechanical and patient-related factors on the incidence of dislocation after primary total hip asthroplasty (THA). Of 2,023 THAs, 21 patients who had at least 1 dislocation were compared with a control group of 21 patients without dislocation, matched for age, gender, pathology, and year of surgery. Implant positioning, seniority of the surgeon, American Society of Anesthesiologists (ASA) score, and diminished motor coordination were recorded. Data analysis included univariate and multivariate methods. The dislocation risk was 6.9 times higher if total anteversion was not between 40 degrees and 60 degrees and 10 times higher in patients with high ASA scores. Surgeons should pay attention to total anteversion (cup and stem) of THA. The ASA score should be part of the preoperative assessment of the dislocation risk.