87 resultados para solve
Resumo:
Résumé Suite aux recentes avancées technologiques, les archives d'images digitales ont connu une croissance qualitative et quantitative sans précédent. Malgré les énormes possibilités qu'elles offrent, ces avancées posent de nouvelles questions quant au traitement des masses de données saisies. Cette question est à la base de cette Thèse: les problèmes de traitement d'information digitale à très haute résolution spatiale et/ou spectrale y sont considérés en recourant à des approches d'apprentissage statistique, les méthodes à noyau. Cette Thèse étudie des problèmes de classification d'images, c'est à dire de catégorisation de pixels en un nombre réduit de classes refletant les propriétés spectrales et contextuelles des objets qu'elles représentent. L'accent est mis sur l'efficience des algorithmes, ainsi que sur leur simplicité, de manière à augmenter leur potentiel d'implementation pour les utilisateurs. De plus, le défi de cette Thèse est de rester proche des problèmes concrets des utilisateurs d'images satellite sans pour autant perdre de vue l'intéret des méthodes proposées pour le milieu du machine learning dont elles sont issues. En ce sens, ce travail joue la carte de la transdisciplinarité en maintenant un lien fort entre les deux sciences dans tous les développements proposés. Quatre modèles sont proposés: le premier répond au problème de la haute dimensionalité et de la redondance des données par un modèle optimisant les performances en classification en s'adaptant aux particularités de l'image. Ceci est rendu possible par un système de ranking des variables (les bandes) qui est optimisé en même temps que le modèle de base: ce faisant, seules les variables importantes pour résoudre le problème sont utilisées par le classifieur. Le manque d'information étiquétée et l'incertitude quant à sa pertinence pour le problème sont à la source des deux modèles suivants, basés respectivement sur l'apprentissage actif et les méthodes semi-supervisées: le premier permet d'améliorer la qualité d'un ensemble d'entraînement par interaction directe entre l'utilisateur et la machine, alors que le deuxième utilise les pixels non étiquetés pour améliorer la description des données disponibles et la robustesse du modèle. Enfin, le dernier modèle proposé considère la question plus théorique de la structure entre les outputs: l'intègration de cette source d'information, jusqu'à présent jamais considérée en télédétection, ouvre des nouveaux défis de recherche. Advanced kernel methods for remote sensing image classification Devis Tuia Institut de Géomatique et d'Analyse du Risque September 2009 Abstract The technical developments in recent years have brought the quantity and quality of digital information to an unprecedented level, as enormous archives of satellite images are available to the users. However, even if these advances open more and more possibilities in the use of digital imagery, they also rise several problems of storage and treatment. The latter is considered in this Thesis: the processing of very high spatial and spectral resolution images is treated with approaches based on data-driven algorithms relying on kernel methods. In particular, the problem of image classification, i.e. the categorization of the image's pixels into a reduced number of classes reflecting spectral and contextual properties, is studied through the different models presented. The accent is put on algorithmic efficiency and the simplicity of the approaches proposed, to avoid too complex models that would not be used by users. The major challenge of the Thesis is to remain close to concrete remote sensing problems, without losing the methodological interest from the machine learning viewpoint: in this sense, this work aims at building a bridge between the machine learning and remote sensing communities and all the models proposed have been developed keeping in mind the need for such a synergy. Four models are proposed: first, an adaptive model learning the relevant image features has been proposed to solve the problem of high dimensionality and collinearity of the image features. This model provides automatically an accurate classifier and a ranking of the relevance of the single features. The scarcity and unreliability of labeled. information were the common root of the second and third models proposed: when confronted to such problems, the user can either construct the labeled set iteratively by direct interaction with the machine or use the unlabeled data to increase robustness and quality of the description of data. Both solutions have been explored resulting into two methodological contributions, based respectively on active learning and semisupervised learning. Finally, the more theoretical issue of structured outputs has been considered in the last model, which, by integrating outputs similarity into a model, opens new challenges and opportunities for remote sensing image processing.
Resumo:
The Multiscale Finite Volume (MsFV) method has been developed to efficiently solve reservoir-scale problems while conserving fine-scale details. The method employs two grid levels: a fine grid and a coarse grid. The latter is used to calculate a coarse solution to the original problem, which is interpolated to the fine mesh. The coarse system is constructed from the fine-scale problem using restriction and prolongation operators that are obtained by introducing appropriate localization assumptions. Through a successive reconstruction step, the MsFV method is able to provide an approximate, but fully conservative fine-scale velocity field. For very large problems (e.g. one billion cell model), a two-level algorithm can remain computational expensive. Depending on the upscaling factor, the computational expense comes either from the costs associated with the solution of the coarse problem or from the construction of the local interpolators (basis functions). To ensure numerical efficiency in the former case, the MsFV concept can be reapplied to the coarse problem, leading to a new, coarser level of discretization. One challenge in the use of a multilevel MsFV technique is to find an efficient reconstruction step to obtain a conservative fine-scale velocity field. In this work, we introduce a three-level Multiscale Finite Volume method (MlMsFV) and give a detailed description of the reconstruction step. Complexity analyses of the original MsFV method and the new MlMsFV method are discussed, and their performances in terms of accuracy and efficiency are compared.
Resumo:
We study the dynamics of a water-oil meniscus moving from a smaller to a larger pore. The process is characterised by an abrupt change in the configuration, yielding a sudden energy release. A theoretic study for static conditions provides analytical solutions of the surface energy content of the system. Although the configuration after the sudden energy release is energetically more convenient, an energy barrier must be overcome before the process can happen spontaneously. The energy barrier depends on the system geometry and on the flow parameters. The analytical results are compared to numerical simulations that solve the full Navier-Stokes equation in the pore space and employ the Volume Of Fluid (VOF) method to track the evolution of the interface. First, the numerical simulations of a quasi-static process are validated by comparison with the analytical solutions for a static meniscus, then numerical simulations with varying injection velocity are used to investigate dynamic effects on the configuration change. During the sudden energy jump the system exhibits an oscillatory behaviour. Extension to more complex geometries might elucidate the mechanisms leading to a dynamic capillary pressure and to bifurcations in final distributions of fluid phases in porous
Resumo:
Photopolymerization is commonly used in a broad range of bioapplications, such as drug delivery, tissue engineering, and surgical implants, where liquid materials are injected and then hardened by means of illumination to create a solid polymer network. However, photopolymerization using a probe, e.g., needle guiding both the liquid and the curing illumination, has not been thoroughly investigated. We present a Monte Carlo model that takes into account the dynamic absorption and scattering parameters as well as solid-liquid boundaries of the photopolymer to yield the shape and volume of minimally invasively injected, photopolymerized hydrogels. In the first part of the article, our model is validated using a set of well-known poly(ethylene glycol) dimethacrylate hydrogels showing an excellent agreement between simulated and experimental volume-growth-rates. In the second part, in situ experimental results and simulations for photopolymerization in tissue cavities are presented. It was found that a cavity with a volume of 152 mm3 can be photopolymerized from the output of a 0.28-mm2 fiber by adding scattering lipid particles while only a volume of 38 mm3 (25%) was achieved without particles. The proposed model provides a simple and robust method to solve complex photopolymerization problems, where the dimension of the light source is much smaller than the volume of the photopolymerizable hydrogel.
Resumo:
Volumes of data used in science and industry are growing rapidly. When researchers face the challenge of analyzing them, their format is often the first obstacle. Lack of standardized ways of exploring different data layouts requires an effort each time to solve the problem from scratch. Possibility to access data in a rich, uniform manner, e.g. using Structured Query Language (SQL) would offer expressiveness and user-friendliness. Comma-separated values (CSV) are one of the most common data storage formats. Despite its simplicity, with growing file size handling it becomes non-trivial. Importing CSVs into existing databases is time-consuming and troublesome, or even impossible if its horizontal dimension reaches thousands of columns. Most databases are optimized for handling large number of rows rather than columns, therefore, performance for datasets with non-typical layouts is often unacceptable. Other challenges include schema creation, updates and repeated data imports. To address the above-mentioned problems, I present a system for accessing very large CSV-based datasets by means of SQL. It's characterized by: "no copy" approach - data stay mostly in the CSV files; "zero configuration" - no need to specify database schema; written in C++, with boost [1], SQLite [2] and Qt [3], doesn't require installation and has very small size; query rewriting, dynamic creation of indices for appropriate columns and static data retrieval directly from CSV files ensure efficient plan execution; effortless support for millions of columns; due to per-value typing, using mixed text/numbers data is easy; very simple network protocol provides efficient interface for MATLAB and reduces implementation time for other languages. The software is available as freeware along with educational videos on its website [4]. It doesn't need any prerequisites to run, as all of the libraries are included in the distribution package. I test it against existing database solutions using a battery of benchmarks and discuss the results.
Resumo:
Résumé La structure, ou l'architecture, des êtres vivants définit le cadre dans lequel la physique de la vie s'accomplit. La connaissance de cette structure dans ses moindres détails est un but essentiel de la biologie. Son étude est toutefois entravée par des limitations techniques. Malgré son potentiel théorique, la microscopie électronique n'atteint pas une résolution atomique lorsqu'elle est appliquée ä la matièxe biologique. Cela est dû en grande partie au fait qu'elle contient beaucoup d'eau qui ne résiste pas au vide du microscope. Elle doit donc être déshydratée avant d'être introduite dans un microscope conventionnel. Des artéfacts d'agrégation en découlent inévitablement. La cryo-microscopie électronique des sections vitreuses (CEMOVIS) a ëté développée afin de résoudre cela. Les spécimens sont vitrifiés, c.-à-d. que leur eau est immobilisée sans cristalliser par le froid. Ils sont ensuite coupés en sections ultrafines et celles-ci sont observées à basse température. Les spécimens sont donc observés sous forme hydratée et non fixée; ils sont proches de leur état natif. Durant longtemps, CEMOVIS était très difficile à exécuter mais ce n'est plus le cas. Durant cette thèse, CEMOVIS a été appliqué à différents spécimens. La synapse du système nerveux central a été étudiée. La présence dans la fente synaptique d'une forte densité de molécules organisées de manière périodique a été démontrée. Des particules luminales ont été trouvées dans Ies microtubules cérébraux. Les microtubules ont servi d'objets-test et ont permis de démontrer que des détails moléculaires de l'ordre du nm sont préservés. La compréhension de la structure de l'enveloppe cellulaire des bactéries Grampositives aété améliorée. Nos observations ont abouti à l'élaboration d'un nouveau modèle hypothétique de la synthèse de la paroi. Nous avons aussi focalisé notre attention sur le nucléoïde bactérien et cela a suscité un modèle de la fonction des différents états structuraux du nucléoïde. En conclusion, cette thèse a démontré que CEMOVIS est une excellente méthode poux étudier la structure d'échantillons biologiques à haute résolution. L'étude de la structure de divers aspects des êtres vivants a évoqué des hypothèses quant à la compréhension de leur fonctionnement. Summary The structure, or the architecture, of living beings defines the framework in which the physics of life takes place. Understanding it in its finest details is an essential goal of biology. Its study is however hampered by technical limitations. Despite its theoretical potential, electron microscopy cannot resolve individual atoms in biological matter. This is in great part due to the fact. that it contains a lot of water that cannot stand the vacuum of the microscope. It must therefore be dehydrated before being introduced in a conventional mìcroscope. Aggregation artefacts unavoidably happen. Cryo-electron microscopy of vitreous sections (CEMOVIS) has been developed to solve this problem. Specimens are vitrified, i.e. they are rapidly cooled and their water is immobilised without crystallising by the cold. They are then. sectioned in ultrathin slices, which are observed at low temperatures. Specimens are therefore observed in hydrated and unfixed form; they are close to their native state. For a long time, CEMOVIS was extremely tedious but this is not the case anymore. During this thesis, CEMOVIS was applied to different specimens. Synapse of central nervous system was studied. A high density of periodically-organised molecules was shown in the synaptic cleft. Luminal particles were found in brain microtubules. Microtubules, used as test specimen, permitted to demonstrate that molecular details of the order of nm .are preserved. The understanding of the structure of cell envelope of Gram-positive bacteria was improved. Our observations led to the elaboration of a new hypothetic model of cell wall synthesis. We also focused our attention on bacterial nucleoids and this also gave rise to a functional model of nucleoid structural states. In conclusion, this thesis demonstrated that CEMOVIS is an excellent method for studying the structure of bìologìcal specimens at high resolution. The study of the structure of various aspects of living beings evoked hypothesis for their functioning.
Resumo:
The majority of Crohn's disease patients will develop a complicated disease course over time which is characterized by the occurrence of stricturing and penetrating disease. Penetrating disease comprises internal fistulas (e.g. enteroenteric) and perianal disease. A complicated disease course may be associated with considerable morbidity and professional and personal disabilities. Treatment options for fibrostenotic Crohn's disease comprise endoscopic balloon dilation, stricturoplasties and surgical resection. Treatment of symptomatic perianal fistulizing disease is based on antibiotics, immunomodulators and anti-TNF drugs. Surgical measures include fistula drainage by means of setons, temporary ileostomy or a proctectomy. The presence of internal fistulas often necessitates surgical measures. A close collaboration between the gastroenterologist and the surgeon is mandatory to solve these interdisciplinary challenges.
Resumo:
Cocktail parties, busy streets, and other noisy environments pose a difficult challenge to the auditory system: how to focus attention on selected sounds while ignoring others? Neurons of primary auditory cortex, many of which are sharply tuned to sound frequency, could help solve this problem by filtering selected sound information based on frequency-content. To investigate whether this occurs, we used high-resolution fMRI at 7 tesla to map the fine-scale frequency-tuning (1.5 mm isotropic resolution) of primary auditory areas A1 and R in six human participants. Then, in a selective attention experiment, participants heard low (250 Hz)- and high (4000 Hz)-frequency streams of tones presented at the same time (dual-stream) and were instructed to focus attention onto one stream versus the other, switching back and forth every 30 s. Attention to low-frequency tones enhanced neural responses within low-frequency-tuned voxels relative to high, and when attention switched the pattern quickly reversed. Thus, like a radio, human primary auditory cortex is able to tune into attended frequency channels and can switch channels on demand.
Resumo:
The multiscale finite-volume (MSFV) method has been derived to efficiently solve large problems with spatially varying coefficients. The fine-scale problem is subdivided into local problems that can be solved separately and are coupled by a global problem. This algorithm, in consequence, shares some characteristics with two-level domain decomposition (DD) methods. However, the MSFV algorithm is different in that it incorporates a flux reconstruction step, which delivers a fine-scale mass conservative flux field without the need for iterating. This is achieved by the use of two overlapping coarse grids. The recently introduced correction function allows for a consistent handling of source terms, which makes the MSFV method a flexible algorithm that is applicable to a wide spectrum of problems. It is demonstrated that the MSFV operator, used to compute an approximate pressure solution, can be equivalently constructed by writing the Schur complement with a tangential approximation of a single-cell overlapping grid and incorporation of appropriate coarse-scale mass-balance equations.
Resumo:
Methicillin-resistant Staphylococcus aureus (MRSA) have developed resistance to virtually all non-experimental antibiotics. They are intrinsically resistant to beta-lactams by virtue of newly acquired low-affinity penicillin-binding protein 2A (PBP2A). Because PBP2A can build the wall when other PBPs are blocked by beta-lactams, designing beta-lactams capable of blocking this additional target should help solve the issue. Older molecules including penicillin G, amoxicillin and ampicillin had relatively good PBP2A affinities, and successfully treated experimental endocarditis caused by MRSA, provided that the bacterial penicillinase could be inhibited. Newer anti-PBP2A beta-lactams with over 10-fold greater PBP2A affinities and low minimal inhibitory concentrations were developed, primarily in the cephem and carbapenem classes. They are also very resistant to penicillinase. Most have demonstrated anti-MRSA activity in animal models of infection, and two--the carbapenem CS-023 and the cephalosporin ceftopibrole medocaril--have proceeded to Phase II and Phase III clinical evaluation. Thus, clinically useful anti-MRSA beta-lactams are imminent.
Resumo:
A haplotype is an m-long binary vector. The XOR-genotype of two haplotypes is the m-vector of their coordinate-wise XOR. We study the following problem: Given a set of XOR-genotypes, reconstruct their haplotypes so that the set of resulting haplotypes can be mapped onto a perfect phylogeny (PP) tree. The question is motivated by studying population evolution in human genetics and is a variant of the PP haplotyping problem that has received intensive attention recently. Unlike the latter problem, in which the input is '' full '' genotypes, here, we assume less informative input and so may be more economical to obtain experimentally. Building on ideas of Gusfield, we show how to solve the problem in polynomial time by a reduction to the graph realization problem. The actual haplotypes are not uniquely determined by the tree they map onto and the tree itself may or may not be unique. We show that tree uniqueness implies uniquely determined haplotypes, up to inherent degrees of freedom, and give a sufficient condition for the uniqueness. To actually determine the haplotypes given the tree, additional information is necessary. We show that two or three full genotypes suffice to reconstruct all the haplotypes and present a linear algorithm for identifying those genotypes.
Resumo:
A clear and rigorous definition of muscle moment-arms in the context of musculoskeletal systems modelling is presented, using classical mechanics and screw theory. The definition provides an alternative to the tendon excursion method, which can lead to incorrect moment-arms if used inappropriately due to its dependency on the choice of joint coordinates. The definition of moment-arms, and the presented construction method, apply to musculoskeletal models in which the bones are modelled as rigid bodies, the joints are modelled as ideal mechanical joints and the muscles are modelled as massless, frictionless cables wrapping over the bony protrusions, approximated using geometric surfaces. In this context, the definition is independent of any coordinate choice. It is then used to solve a muscle-force estimation problem for a simple 2D conceptual model and compared with an incorrect application of the tendon excursion method. The relative errors between the two solutions vary between 0% and 100%.
Resumo:
There is increasing evidence to suggest that the presence of mesoscopic heterogeneities constitutes an important seismic attenuation mechanism in porous rocks. As a consequence, centimetre-scale perturbations of the rock physical properties should be taken into account for seismic modelling whenever detailed and accurate responses of specific target structures are desired, which is, however, computationally prohibitive. A convenient way to circumvent this problem is to use an upscaling procedure to replace each of the heterogeneous porous media composing the geological model by corresponding equivalent visco-elastic solids and to solve the visco-elastic equations of motion for the inferred equivalent model. While the overall qualitative validity of this procedure is well established, there are as of yet no quantitative analyses regarding the equivalence of the seismograms resulting from the original poro-elastic and the corresponding upscaled visco-elastic models. To address this issue, we compare poro-elastic and visco-elastic solutions for a range of marine-type models of increasing complexity. We found that despite the identical dispersion and attenuation behaviour of the heterogeneous poro-elastic and the equivalent visco-elastic media, the seismograms may differ substantially due to diverging boundary conditions, where there exist additional options for the poro-elastic case. In particular, we observe that at the fluid/porous-solid interface, the poro- and visco-elastic seismograms agree for closed-pore boundary conditions, but differ significantly for open-pore boundary conditions. This is an important result which has potentially far-reaching implications for wave-equation-based algorithms in exploration geophysics involving fluid/porous-solid interfaces, such as, for example, wavefield decomposition.
Resumo:
Les problèmes d'écoulements multiphasiques en média poreux sont d'un grand intérêt pour de nombreuses applications scientifiques et techniques ; comme la séquestration de C02, l'extraction de pétrole et la dépollution des aquifères. La complexité intrinsèque des systèmes multiphasiques et l'hétérogénéité des formations géologiques sur des échelles multiples représentent un challenge majeur pour comprendre et modéliser les déplacements immiscibles dans les milieux poreux. Les descriptions à l'échelle supérieure basées sur la généralisation de l'équation de Darcy sont largement utilisées, mais ces méthodes sont sujettes à limitations pour les écoulements présentant de l'hystérèse. Les avancées récentes en terme de performances computationnelles et le développement de méthodes précises pour caractériser l'espace interstitiel ainsi que la distribution des phases ont favorisé l'utilisation de modèles qui permettent une résolution fine à l'échelle du pore. Ces modèles offrent un aperçu des caractéristiques de l'écoulement qui ne peuvent pas être facilement observées en laboratoire et peuvent être utilisé pour expliquer la différence entre les processus physiques et les modèles à l'échelle macroscopique existants. L'objet premier de la thèse se porte sur la simulation numérique directe : les équations de Navier-Stokes sont résolues dans l'espace interstitiel et la méthode du volume de fluide (VOF) est employée pour suivre l'évolution de l'interface. Dans VOF, la distribution des phases est décrite par une fonction fluide pour l'ensemble du domaine et des conditions aux bords particulières permettent la prise en compte des propriétés de mouillage du milieu poreux. Dans la première partie de la thèse, nous simulons le drainage dans une cellule Hele-Shaw 2D avec des obstacles cylindriques. Nous montrons que l'approche proposée est applicable même pour des ratios de densité et de viscosité très importants et permet de modéliser la transition entre déplacement stable et digitation visqueuse. Nous intéressons ensuite à l'interprétation de la pression capillaire à l'échelle macroscopique. Nous montrons que les techniques basées sur la moyenne spatiale de la pression présentent plusieurs limitations et sont imprécises en présence d'effets visqueux et de piégeage. Au contraire, une définition basée sur l'énergie permet de séparer les contributions capillaires des effets visqueux. La seconde partie de la thèse est consacrée à l'investigation des effets d'inertie associés aux reconfigurations irréversibles du ménisque causé par l'interface des instabilités. Comme prototype pour ces phénomènes, nous étudions d'abord la dynamique d'un ménisque dans un pore angulaire. Nous montrons que, dans un réseau de pores cubiques, les sauts et reconfigurations sont si fréquents que les effets d'inertie mènent à différentes configurations des fluides. A cause de la non-linéarité du problème, la distribution des fluides influence le travail des forces de pression, qui, à son tour, provoque une chute de pression dans la loi de Darcy. Cela suggère que ces phénomènes devraient être pris en compte lorsque que l'on décrit l'écoulement multiphasique en média poreux à l'échelle macroscopique. La dernière partie de la thèse s'attache à démontrer la validité de notre approche par une comparaison avec des expériences en laboratoire : un drainage instable dans un milieu poreux quasi 2D (une cellule Hele-Shaw avec des obstacles cylindriques). Plusieurs simulations sont tournées sous différentes conditions aux bords et en utilisant différents modèles (modèle intégré 2D et modèle 3D) afin de comparer certaines quantités macroscopiques avec les observations au laboratoire correspondantes. Malgré le challenge de modéliser des déplacements instables, où, par définition, de petites perturbations peuvent grandir sans fin, notre approche numérique apporte de résultats satisfaisants pour tous les cas étudiés. - Problems involving multiphase flow in porous media are of great interest in many scientific and engineering applications including Carbon Capture and Storage, oil recovery and groundwater remediation. The intrinsic complexity of multiphase systems and the multi scale heterogeneity of geological formations represent the major challenges to understand and model immiscible displacement in porous media. Upscaled descriptions based on generalization of Darcy's law are widely used, but they are subject to several limitations for flow that exhibit hysteric and history- dependent behaviors. Recent advances in high performance computing and the development of accurate methods to characterize pore space and phase distribution have fostered the use of models that allow sub-pore resolution. These models provide an insight on flow characteristics that cannot be easily achieved by laboratory experiments and can be used to explain the gap between physical processes and existing macro-scale models. We focus on direct numerical simulations: we solve the Navier-Stokes equations for mass and momentum conservation in the pore space and employ the Volume Of Fluid (VOF) method to track the evolution of the interface. In the VOF the distribution of the phases is described by a fluid function (whole-domain formulation) and special boundary conditions account for the wetting properties of the porous medium. In the first part of this thesis we simulate drainage in a 2-D Hele-Shaw cell filled with cylindrical obstacles. We show that the proposed approach can handle very large density and viscosity ratios and it is able to model the transition from stable displacement to viscous fingering. We then focus on the interpretation of the macroscopic capillary pressure showing that pressure average techniques are subject to several limitations and they are not accurate in presence of viscous effects and trapping. On the contrary an energy-based definition allows separating viscous and capillary contributions. In the second part of the thesis we investigate inertia effects associated with abrupt and irreversible reconfigurations of the menisci caused by interface instabilities. As a prototype of these phenomena we first consider the dynamics of a meniscus in an angular pore. We show that in a network of cubic pores, jumps and reconfigurations are so frequent that inertia effects lead to different fluid configurations. Due to the non-linearity of the problem, the distribution of the fluids influences the work done by pressure forces, which is in turn related to the pressure drop in Darcy's law. This suggests that these phenomena should be taken into account when upscaling multiphase flow in porous media. The last part of the thesis is devoted to proving the accuracy of the numerical approach by validation with experiments of unstable primary drainage in a quasi-2D porous medium (i.e., Hele-Shaw cell filled with cylindrical obstacles). We perform simulations under different boundary conditions and using different models (2-D integrated and full 3-D) and we compare several macroscopic quantities with the corresponding experiment. Despite the intrinsic challenges of modeling unstable displacement, where by definition small perturbations can grow without bounds, the numerical method gives satisfactory results for all the cases studied.
Resumo:
Résumé Cette thèse est consacrée à l'analyse, la modélisation et la visualisation de données environnementales à référence spatiale à l'aide d'algorithmes d'apprentissage automatique (Machine Learning). L'apprentissage automatique peut être considéré au sens large comme une sous-catégorie de l'intelligence artificielle qui concerne particulièrement le développement de techniques et d'algorithmes permettant à une machine d'apprendre à partir de données. Dans cette thèse, les algorithmes d'apprentissage automatique sont adaptés pour être appliqués à des données environnementales et à la prédiction spatiale. Pourquoi l'apprentissage automatique ? Parce que la majorité des algorithmes d'apprentissage automatiques sont universels, adaptatifs, non-linéaires, robustes et efficaces pour la modélisation. Ils peuvent résoudre des problèmes de classification, de régression et de modélisation de densité de probabilités dans des espaces à haute dimension, composés de variables informatives spatialisées (« géo-features ») en plus des coordonnées géographiques. De plus, ils sont idéaux pour être implémentés en tant qu'outils d'aide à la décision pour des questions environnementales allant de la reconnaissance de pattern à la modélisation et la prédiction en passant par la cartographie automatique. Leur efficacité est comparable au modèles géostatistiques dans l'espace des coordonnées géographiques, mais ils sont indispensables pour des données à hautes dimensions incluant des géo-features. Les algorithmes d'apprentissage automatique les plus importants et les plus populaires sont présentés théoriquement et implémentés sous forme de logiciels pour les sciences environnementales. Les principaux algorithmes décrits sont le Perceptron multicouches (MultiLayer Perceptron, MLP) - l'algorithme le plus connu dans l'intelligence artificielle, le réseau de neurones de régression généralisée (General Regression Neural Networks, GRNN), le réseau de neurones probabiliste (Probabilistic Neural Networks, PNN), les cartes auto-organisées (SelfOrganized Maps, SOM), les modèles à mixture Gaussiennes (Gaussian Mixture Models, GMM), les réseaux à fonctions de base radiales (Radial Basis Functions Networks, RBF) et les réseaux à mixture de densité (Mixture Density Networks, MDN). Cette gamme d'algorithmes permet de couvrir des tâches variées telle que la classification, la régression ou l'estimation de densité de probabilité. L'analyse exploratoire des données (Exploratory Data Analysis, EDA) est le premier pas de toute analyse de données. Dans cette thèse les concepts d'analyse exploratoire de données spatiales (Exploratory Spatial Data Analysis, ESDA) sont traités selon l'approche traditionnelle de la géostatistique avec la variographie expérimentale et selon les principes de l'apprentissage automatique. La variographie expérimentale, qui étudie les relations entre pairs de points, est un outil de base pour l'analyse géostatistique de corrélations spatiales anisotropiques qui permet de détecter la présence de patterns spatiaux descriptible par une statistique. L'approche de l'apprentissage automatique pour l'ESDA est présentée à travers l'application de la méthode des k plus proches voisins qui est très simple et possède d'excellentes qualités d'interprétation et de visualisation. Une part importante de la thèse traite de sujets d'actualité comme la cartographie automatique de données spatiales. Le réseau de neurones de régression généralisée est proposé pour résoudre cette tâche efficacement. Les performances du GRNN sont démontrées par des données de Comparaison d'Interpolation Spatiale (SIC) de 2004 pour lesquelles le GRNN bat significativement toutes les autres méthodes, particulièrement lors de situations d'urgence. La thèse est composée de quatre chapitres : théorie, applications, outils logiciels et des exemples guidés. Une partie importante du travail consiste en une collection de logiciels : Machine Learning Office. Cette collection de logiciels a été développée durant les 15 dernières années et a été utilisée pour l'enseignement de nombreux cours, dont des workshops internationaux en Chine, France, Italie, Irlande et Suisse ainsi que dans des projets de recherche fondamentaux et appliqués. Les cas d'études considérés couvrent un vaste spectre de problèmes géoenvironnementaux réels à basse et haute dimensionnalité, tels que la pollution de l'air, du sol et de l'eau par des produits radioactifs et des métaux lourds, la classification de types de sols et d'unités hydrogéologiques, la cartographie des incertitudes pour l'aide à la décision et l'estimation de risques naturels (glissements de terrain, avalanches). Des outils complémentaires pour l'analyse exploratoire des données et la visualisation ont également été développés en prenant soin de créer une interface conviviale et facile à l'utilisation. Machine Learning for geospatial data: algorithms, software tools and case studies Abstract The thesis is devoted to the analysis, modeling and visualisation of spatial environmental data using machine learning algorithms. In a broad sense machine learning can be considered as a subfield of artificial intelligence. It mainly concerns with the development of techniques and algorithms that allow computers to learn from data. In this thesis machine learning algorithms are adapted to learn from spatial environmental data and to make spatial predictions. Why machine learning? In few words most of machine learning algorithms are universal, adaptive, nonlinear, robust and efficient modeling tools. They can find solutions for the classification, regression, and probability density modeling problems in high-dimensional geo-feature spaces, composed of geographical space and additional relevant spatially referenced features. They are well-suited to be implemented as predictive engines in decision support systems, for the purposes of environmental data mining including pattern recognition, modeling and predictions as well as automatic data mapping. They have competitive efficiency to the geostatistical models in low dimensional geographical spaces but are indispensable in high-dimensional geo-feature spaces. The most important and popular machine learning algorithms and models interesting for geo- and environmental sciences are presented in details: from theoretical description of the concepts to the software implementation. The main algorithms and models considered are the following: multi-layer perceptron (a workhorse of machine learning), general regression neural networks, probabilistic neural networks, self-organising (Kohonen) maps, Gaussian mixture models, radial basis functions networks, mixture density networks. This set of models covers machine learning tasks such as classification, regression, and density estimation. Exploratory data analysis (EDA) is initial and very important part of data analysis. In this thesis the concepts of exploratory spatial data analysis (ESDA) is considered using both traditional geostatistical approach such as_experimental variography and machine learning. Experimental variography is a basic tool for geostatistical analysis of anisotropic spatial correlations which helps to understand the presence of spatial patterns, at least described by two-point statistics. A machine learning approach for ESDA is presented by applying the k-nearest neighbors (k-NN) method which is simple and has very good interpretation and visualization properties. Important part of the thesis deals with a hot topic of nowadays, namely, an automatic mapping of geospatial data. General regression neural networks (GRNN) is proposed as efficient model to solve this task. Performance of the GRNN model is demonstrated on Spatial Interpolation Comparison (SIC) 2004 data where GRNN model significantly outperformed all other approaches, especially in case of emergency conditions. The thesis consists of four chapters and has the following structure: theory, applications, software tools, and how-to-do-it examples. An important part of the work is a collection of software tools - Machine Learning Office. Machine Learning Office tools were developed during last 15 years and was used both for many teaching courses, including international workshops in China, France, Italy, Ireland, Switzerland and for realizing fundamental and applied research projects. Case studies considered cover wide spectrum of the real-life low and high-dimensional geo- and environmental problems, such as air, soil and water pollution by radionuclides and heavy metals, soil types and hydro-geological units classification, decision-oriented mapping with uncertainties, natural hazards (landslides, avalanches) assessments and susceptibility mapping. Complementary tools useful for the exploratory data analysis and visualisation were developed as well. The software is user friendly and easy to use.