109 results for Acceleration data structure
Abstract:
The broad aim of biomedical science in the postgenomic era is to link genomic and phenotype information to allow deeper understanding of the processes leading from genomic changes to altered phenotype and disease. The EuroPhenome project (http://www.EuroPhenome.org) is a comprehensive resource for raw and annotated high-throughput phenotyping data arising from projects such as EUMODIC. EUMODIC is gathering data from the EMPReSSslim pipeline (http://www.empress.har.mrc.ac.uk/) which is performed on inbred mouse strains and knock-out lines arising from the EUCOMM project. The EuroPhenome interface allows the user to access the data via the phenotype or genotype. It also allows the user to access the data in a variety of ways, including graphical display, statistical analysis and access to the raw data via web services. The raw phenotyping data captured in EuroPhenome is annotated by an annotation pipeline which automatically identifies statistically different mutants from the appropriate baseline and assigns ontology terms for that specific test. Mutant phenotypes can be quickly identified using two EuroPhenome tools: PhenoMap, a graphical representation of statistically relevant phenotypes, and mining for a mutant using ontology terms. To assist with data definition and cross-database comparisons, phenotype data is annotated using combinations of terms from biological ontologies.
Abstract:
Introduction Societies of ants, bees, wasps and termites dominate many terrestrial ecosystems (Wilson 1971). Their evolutionary and ecological success is based upon the regulation of internal conflicts (e.g. Ratnieks et al. 2006), control of diseases (e.g. Schmid-Hempel 1998) and individual skills and collective intelligence in resource acquisition, nest building and defence (e.g. Camazine 2001). Individuals in social species can pass on their genes not only directly through their own offspring, but also indirectly by favouring the reproduction of relatives. The inclusive fitness theory of Hamilton (1963; 1964) provides a powerful explanation for the evolution of reproductive altruism and cooperation in groups with related individuals. The same theory also led to the realization that insect societies are subject to internal conflicts over reproduction. Relatedness of less-than-one is not sufficient to eliminate all incentive for individual selfishness. This would indeed require a relatedness of one, as found among cells of an organism (Hardin 1968; Keller 1999). The challenge for evolutionary biology is to understand how groups can prevent or reduce the selfish exploitation of resources by group members, and how societies with low relatedness are maintained. In social insects the evolutionary shift from single- to multiple-queen colonies modified the relatedness structure, the dispersal, and the mode of colony founding (e.g. Crozier & Pamilo 1996). In ants, the most common, and presumably ancestral, mode of reproduction is the emission of winged males and females, which found a new colony independently after mating and dispersal flights (Hölldobler & Wilson 1990). 
The alternative reproductive tactic for ant queens in multiple-queen colonies (polygyne) is to seek to be re-accepted in their natal colonies, where they may remain as additional reproductives or subsequently disperse on foot with part of the colony (budding) (Bourke & Franks 1995; Crozier & Pamilo 1996; Hölldobler & Wilson 1990). Such ant colonies can contain up to several hundred reproductive queens with an even more numerous workforce (Cherix 1980; Cherix 1983). As a consequence in polygynous ants the relatedness among nestmates is very low, and workers raise brood of queens to which they are only distantly related (Crozier & Pamilo 1996; Queller & Strassmann 1998). Therefore workers could increase their inclusive fitness by preferentially caring for their closest relatives and discriminate against less related or foreign individuals (Keller 1997; Queller & Strassmann 2002; Tarpy et al. 2004). However, the bulk of the evidence suggests that social insects do not behave nepotistically, probably because of the costs entailed by decreased colony efficiency or discrimination errors (Keller 1997). Recently, the consensus that nepotistic behaviour does not occur in insect colonies was challenged by a study in the ant Formica fusca (Hannonen & Sundström 2003b) showing that the reproductive share of queens more closely related to workers increases during brood development. However, this pattern can be explained either by nepotism with workers preferentially rearing the brood of more closely related queens or intrinsic differences in the viability of eggs laid by queens. In the first chapter, we designed an experiment to disentangle nepotism and differences in brood viability. We tested if workers prefer to rear their kin when given the choice between highly related and unrelated brood in the ant F. exsecta. 
We also looked for differences in egg viability among queens and simulated whether such differences in egg viability may mistakenly lead to the conclusion that workers behave nepotistically. The acceptance of queens in polygynous ants raises the question whether the varying degree of relatedness affects their share in reproduction. In such colonies workers should favour nestmate queens over foreign queens. Numerous studies have investigated reproductive skew and partitioning of reproduction among queens (Bourke et al. 1997; Fournier et al. 2004; Fournier & Keller 2001; Hammond et al. 2006; Hannonen & Sundström 2003a; Heinze et al. 2001; Kümmerli & Keller 2007; Langer et al. 2004; Pamilo & Seppä 1994; Ross 1988; Ross 1993; Rüppell et al. 2002), yet almost no information is available on whether differences among queens in their relatedness to other colony members affect their share in reproduction. Such data are necessary to compare the relative reproductive success of dispersing and non-dispersing individuals. Moreover, information on whether there is a difference in reproductive success between resident and dispersing queens is also important for our understanding of the genetic structure of ant colonies and the dynamics of within-group conflicts. In chapter two, we created single-queen colonies and then introduced a foreign queen originating from another colony kept under similar conditions in order to estimate the rate of queen acceptance into foreign established colonies, and to quantify the reproductive share of resident and introduced queens. An increasing number of studies have investigated the discrimination ability between ant workers (e.g. Holzer et al. 2006; Pedersen et al. 2006), but few have addressed the recognition and discrimination behaviour of workers towards reproductive individuals entering colonies (Bennett 1988; Brown et al. 2003; Evans 1996; Fortelius et al. 1993; Kikuchi et al. 2007; Rosengren & Pamilo 1986; Stuart et al. 
1993; Sundström 1997; Vásquez & Silverman in press). These studies are important, because accepting new queens will generally have a large impact on colony kin structure and inclusive fitness of workers (Heinze & Keller 2000). In chapter three, we examined whether resident workers reject young foreign queens that enter into their nest. We introduced mated queens into their natal nest, a foreign female-producing nest, or a foreign male-producing nest and measured their survival. In addition, we also introduced young virgin and mated queens into their natal nest to examine whether the mating status of the queens influences their survival and acceptance by workers. On top of polygyny, some ant species have evolved an extraordinary social organization called 'unicoloniality' (Hölldobler & Wilson 1977; Pedersen et al. 2006). In unicolonial ants, intercolony borders are absent and workers and queens mix among the physically separated nests, such that nests form one large supercolony. Supercolonies can become very large, so that direct cooperative interactions are impossible between individuals of distant nests. Unicoloniality is an evolutionary paradox and a potential problem for kin selection theory because the mixing of queens and workers between nests leads to extremely low relatedness among nestmates (Bourke & Franks 1995; Crozier & Pamilo 1996; Keller 1995). A better understanding of the evolution and maintenance of unicoloniality requires detailed information on the discrimination behavior, dispersal, population structure, and the scale of competition. Cryptic genetic population structure may provide important information on the relevant scale to be considered when measuring relatedness and the role of kin selection. Theoretical studies have shown that relatedness should be measured at the level of the 'economic neighborhood', which is the scale at which intraspecific competition generally takes place (Griffin & West 2002; Kelly 1994; Queller 1994; Taylor 1992). 
In chapter four, we conducted a large-scale study to determine whether the unicolonial ant Formica paralugubris forms populations that are organised in discrete supercolonies or whether there is a continuous gradation in the level of aggression that may correlate with genetic isolation by distance and/or spatial distance between nests. In chapter five, we investigated the fine-scale population structure in three populations of F. paralugubris. We developed mitochondrial markers, which together with the nuclear markers allowed us to detect cryptic genetic clusters of nests, to obtain more precise information on the genetic differentiation within populations, and to separate male and female gene flow. These new data provide important information on the scale to be considered when measuring relatedness in native unicolonial populations.
Abstract:
Joint inversion of crosshole ground-penetrating radar and seismic data can improve model resolution and fidelity of the resultant individual models. Model coupling obtained by minimizing or penalizing some measure of structural dissimilarity between models appears to be the most versatile approach because only weak assumptions about petrophysical relationships are required. Nevertheless, experimental results and petrophysical arguments suggest that when porosity variations are weak in saturated unconsolidated environments, radar wave speed is approximately linearly related to seismic wave speed. Under such circumstances, model coupling can also be achieved by incorporating cross-covariances in the model regularization. In two case studies, structural similarity is imposed by penalizing models for which the model cross-gradients are nonzero. A first case study demonstrates improvements in model resolution by comparing the resulting models with borehole information, whereas a second case study uses point-spread functions. Although the radar-seismic wave speed crossplots are very similar for the two case studies, the models plot in different portions of the graph, suggesting differences in porosity. Both examples display a close, quasilinear relationship between radar and seismic wave speeds in unconsolidated environments that is described rather well by the corresponding lower Hashin-Shtrikman (HS) bounds. Combining crossplots of the joint inversion models with HS bounds can constrain porosity and pore structure better than individual inversion results can.
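The cross-gradient penalty mentioned in this abstract can be illustrated with a small sketch. It assumes two models sampled on the same regular 2D grid and uses forward differences; the function name `cross_gradient`, the grid spacings `dx` and `dz`, and the sign convention are illustrative assumptions, not details taken from the paper. For 2D models only the out-of-plane component of the cross product of the two model gradients is nonzero, and it vanishes wherever the gradients are parallel or one of them is zero.

```python
def cross_gradient(m1, m2, dx=1.0, dz=1.0):
    """Out-of-plane component of grad(m1) x grad(m2) on a 2D grid.

    m1 and m2 are lists of rows (row index = depth z, column index = x).
    Forward differences are used, so the result has one fewer row and
    one fewer column than the input. The sign convention is an assumption.
    """
    nz, nx = len(m1), len(m1[0])
    t = []
    for i in range(nz - 1):
        row = []
        for j in range(nx - 1):
            d1x = (m1[i][j + 1] - m1[i][j]) / dx
            d1z = (m1[i + 1][j] - m1[i][j]) / dz
            d2x = (m2[i][j + 1] - m2[i][j]) / dx
            d2z = (m2[i + 1][j] - m2[i][j]) / dz
            # Zero when the two gradients are parallel (structurally similar).
            row.append(d1z * d2x - d1x * d2z)
        t.append(row)
    return t


# A model that is a linear function of another has parallel gradients,
# so its cross-gradient with that model is zero everywhere.
radar = [[1.0, 2.0, 4.0], [1.5, 3.0, 5.0], [2.0, 3.5, 7.0]]
seismic = [[2.0 * v + 3.0 for v in row] for row in radar]
t_similar = cross_gradient(radar, seismic)

# Models varying along perpendicular directions give a nonzero value.
along_x = [[float(j) for j in range(3)] for _ in range(3)]
along_z = [[float(i) for _ in range(3)] for i in range(3)]
t_different = cross_gradient(along_x, along_z)
```

In a joint inversion, a penalty term would typically be built from the sum of squares of these values, so that structurally dissimilar model pairs are discouraged without assuming a specific petrophysical relationship.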
Abstract:
Our understanding of how genotype determines phenotype in primary dystonia is limited. Familial young-onset primary dystonia is commonly due to the DYT1 gene mutation. A critical question, given the 30% penetrance of clinical symptoms in DYT1 mutation carriers, is why the same genotype leads to differential clinical expression and whether non-DYT1 adult-onset primary dystonia, with and without family history share pathophysiological mechanisms with DYT1 dystonia. This study examines the relationship between dystonic phenotype and the DYT1 gene mutation by monitoring whole-brain structure using voxel-based morphometry. We acquired magnetic resonance imaging data of symptomatic and asymptomatic DYT1 mutation carriers, of non-DYT1 primary dystonia patients, with and without family history and control subjects with normal DYT1 alleles. By crossing the factors genotype and phenotype we demonstrate a significant interaction in terms of brain anatomy confined to the basal ganglia bilaterally. The explanation for this effect differs according to both gene and dystonia status: non-DYT1 adult-onset dystonia patients and asymptomatic DYT1 carriers have significantly larger basal ganglia compared to healthy subjects and symptomatic DYT1 mutation carriers. There is a significant negative correlation between severity of dystonia and basal ganglia size in DYT1 mutation carriers. We propose that differential pathophysiological and compensatory mechanisms lead to brain structure changes in non-DYT1 primary adult-onset dystonias and DYT1 gene carriers. Given the range of age of onset, there may be differential genetic modulation of brain development that in turn determines clinical expression. Alternatively, a DYT1 gene dependent primary defect of motor circuit development may lead to stress-induced remodelling of the basal ganglia and hence dystonia.
Abstract:
The cross-recognition of peptides by cytotoxic T lymphocytes is a key element in immunology and in particular in peptide-based immunotherapy. Here we develop three-dimensional (3D) quantitative structure-activity relationships (QSARs) to predict cross-recognition by Melan-A-specific cytotoxic T lymphocytes of peptides bound to HLA A*0201 (hereafter referred to as HLA A2). First, we predict the structure of a set of self- and pathogen-derived peptides bound to HLA A2 using a previously developed ab initio structure prediction approach [Fagerberg et al., J. Mol. Biol., 521-46 (2006)]. Second, shape and electrostatic energy calculations are performed on a 3D grid to produce similarity matrices which are combined with a genetic neural network method [So et al., J. Med. Chem., 4347-59 (1997)] to generate 3D-QSAR models. The models are extensively validated using several different approaches. During the model generation, the leave-one-out cross-validated correlation coefficient (q²) is used as the fitness criterion and all obtained models are evaluated based on their q² values. Moreover, the best model obtained for a partitioned data set is evaluated by its correlation coefficient (r = 0.92 for the external test set). The physical relevance of all models is tested using a functional dependence analysis and the robustness of the models obtained for the entire data set is confirmed using y-randomization. Finally, the validated models are tested for their utility in the setting of rational peptide design: their ability to discriminate between peptides that only contain side chain substitutions in a single secondary anchor position is evaluated. In addition, the predicted cross-recognition of the mono-substituted peptides is confirmed experimentally in chromium-release assays. 
These results underline the utility of 3D-QSARs in peptide mimetic design and suggest that the properties of the unbound epitope are sufficient to capture most of the information to determine the cross-recognition.
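The leave-one-out cross-validated coefficient used as the fitness criterion in this abstract can be sketched as follows. The sketch uses the common definition q² = 1 − PRESS / Σ(y − ȳ)², where PRESS is the sum of squared leave-one-out prediction errors. A one-variable least-squares line stands in for the genetic neural network of the paper, purely as an assumption to keep the example short; `fit_line` and `loo_q2` are hypothetical names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_q2(xs, ys):
    """Leave-one-out q^2 = 1 - PRESS / total sum of squares."""
    n = len(xs)
    mean_y = sum(ys) / n
    press = 0.0
    for i in range(n):
        # Refit the stand-in model without sample i, then predict sample i.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total


x = [1.0, 2.0, 3.0, 4.0, 5.0]
q2_linear = loo_q2(x, [2.0, 4.0, 6.0, 8.0, 10.0])      # perfectly predictable
q2_erratic = loo_q2(x, [1.0, -1.0, 1.0, -1.0, 1.0])    # no linear signal
```

Unlike an ordinary fitted r², q² can be negative, which happens when the left-out predictions are worse than simply predicting the mean; that is why it is a stricter criterion for model selection.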
Abstract:
Geophysical techniques can help to bridge the inherent gap with regard to spatial resolution and the range of coverage that plagues classical hydrological methods. This has led to the emergence of the new and rapidly growing field of hydrogeophysics. Given the differing sensitivities of various geophysical techniques to hydrologically relevant parameters and their inherent trade-off between resolution and range, the fundamental usefulness of multi-method hydrogeophysical surveys for reducing uncertainties in data analysis and interpretation is widely accepted. A major challenge arising from such endeavors is the quantitative integration of the resulting vast and diverse database in order to obtain a unified model of the probed subsurface region that is internally consistent with all available data. To address this problem, we have developed a strategy towards hydrogeophysical data integration based on Monte-Carlo-type conditional stochastic simulation that we consider to be particularly suitable for local-scale studies characterized by high-resolution and high-quality datasets. Monte-Carlo-based optimization techniques are flexible and versatile, can account for a wide variety of data and constraints of differing resolution and hardness, and thus have the potential of providing, in a geostatistical sense, highly detailed and realistic models of the pertinent target parameter distributions. Compared to more conventional approaches of this kind, our approach provides significant advancements in the way that the larger-scale deterministic information resolved by the hydrogeophysical data can be accounted for, which represents an inherently problematic, and as of yet unresolved, aspect of Monte-Carlo-type conditional simulation techniques. We present the results of applying our algorithm to the integration of porosity log and tomographic crosshole georadar data to generate stochastic realizations of the local-scale porosity structure. 
Our procedure is first tested on pertinent synthetic data and then applied to corresponding field data collected at the Boise Hydrogeophysical Research Site near Boise, Idaho, USA.
Abstract:
The transcription factors TFIIB, Brf1, and Brf2 share related N-terminal zinc ribbon and core domains. TFIIB bridges RNA polymerase II (Pol II) with the promoter-bound preinitiation complex, whereas Brf1 and Brf2 are involved, as part of activities also containing TBP and Bdp1 and referred to here as Brf1-TFIIIB and Brf2-TFIIIB, in the recruitment of Pol III. Brf1-TFIIIB recruits Pol III to type 1 and 2 promoters and Brf2-TFIIIB to type 3 promoters such as the human U6 promoter. Brf1 and Brf2 both have a C-terminal extension absent in TFIIB, but their C-terminal extensions are unrelated. In yeast Brf1, the C-terminal extension interacts with the TBP/TATA box complex and contributes to the recruitment of Bdp1. Here we have tested truncated Brf2, as well as Brf2/TFIIB chimeric proteins for U6 transcription and for assembly of U6 preinitiation complexes. Our results characterize functions of various human Brf2 domains and reveal that the C-terminal domain is required for efficient association of the protein with U6 promoter-bound TBP and SNAP(c), a type 3 promoter-specific transcription factor, and for efficient recruitment of Bdp1. This in turn suggests that the C-terminal extensions in Brf1 and Brf2 are crucial to specific recruitment of Pol III over Pol II.
Abstract:
Wood ant species show differences in their social structure, especially in the level of polygyny (number of laying queens per nest) and polydomy (number of nests per colony), both within and between species. We demonstrate here for the first time that Formica lugubris displays two different social forms in close proximity in alpine unmanaged forests of the Swiss National Park. The genetic data (7 microsatellite loci) and field data indicate that one population is mostly monogynous to weakly polygynous (r = 0.438) and monodomous, the second one being polygynous (r = 0.113) and polydomous. Within this latter population new nests are founded by budding, leading to the observed high density of nests. These two different social structures, possibly two expressions of the same continuum, could be explained by several ecological or environmental factors (e.g. habitat saturation, resource competition) and also by historical effects.
Abstract:
BACKGROUND: There is an ever-increasing volume of data on host genes that are modulated during HIV infection, influence disease susceptibility or carry genetic variants that impact HIV infection. We created GuavaH (Genomic Utility for Association and Viral Analyses in HIV, http://www.GuavaH.org), a public resource that supports multipurpose analysis of genome-wide genetic variation and gene expression profile across multiple phenotypes relevant to HIV biology. FINDINGS: We included original data from 8 genome and transcriptome studies addressing viral and host responses in and ex vivo. These studies cover phenotypes such as HIV acquisition, plasma viral load, disease progression, viral replication cycle, latency and viral-host genome interaction. This represents genome-wide association data from more than 4,000 individuals, exome sequencing data from 392 individuals, in vivo transcriptome microarray data from 127 patients/conditions, and 60 sets of RNA-seq data. Additionally, GuavaH allows visualization of protein variation in ~8,000 individuals from the general population. The publicly available GuavaH framework supports queries on (i) unique single nucleotide polymorphism across different HIV related phenotypes, (ii) gene structure and variation, (iii) in vivo gene expression in the setting of human infection (CD4+ T cells), and (iv) in vitro gene expression data in models of permissive infection, latency and reactivation. CONCLUSIONS: The complexity of the analysis of host genetic influences on HIV biology and pathogenesis calls for comprehensive motors of research on curated data. The tool developed here allows queries and supports validation of the rapidly growing body of host genomic information pertinent to HIV research.
Abstract:
Surface geological mapping, laboratory measurements of rock properties, and seismic reflection data are integrated through three-dimensional seismic modeling to determine the likely cause of upper crustal reflections and to elucidate the deep structure of the Penninic Alps in eastern Switzerland. Results indicate that the principal upper crustal reflections recorded on the south end of Swiss seismic line NFP20-EAST can be explained by the subsurface geometry of stacked basement nappes. In addition, modeling results provide improvements to structural maps based solely on surface trends and suggest the presence of previously unrecognized rock units in the subsurface. Construction of the initial model is based upon extrapolation of plunging surface structures; velocities and densities are established by laboratory measurements of corresponding rock units. Iterative modification produces a best-fit model that refines the definition of the subsurface geometry of major structures. We conclude that most reflections from the upper 20 km can be ascribed to the presence of sedimentary cover rocks (especially carbonates) and ophiolites juxtaposed against crystalline basement nappes. Thus, in this area, reflections appear to be principally due to first-order lithologic contrasts. This study also demonstrates not only the importance of three-dimensional effects (sideswipe) in interpreting seismic data, but also that these effects can be considered quantitatively through three-dimensional modeling.
Abstract:
Significant progress has been made with regard to the quantitative integration of geophysical and hydrological data at the local scale. However, extending the corresponding approaches to the regional scale represents a major, and as-of-yet largely unresolved, challenge. To address this problem, we have developed an upscaling procedure based on a Bayesian sequential simulation approach. This method is then applied to the stochastic integration of low-resolution, regional-scale electrical resistivity tomography (ERT) data in combination with high-resolution, local-scale downhole measurements of the hydraulic and electrical conductivities. Finally, the overall viability of this upscaling approach is tested and verified by performing and comparing flow and transport simulation through the original and the upscaled hydraulic conductivity fields. Our results indicate that the proposed procedure does indeed allow for obtaining remarkably faithful estimates of the regional-scale hydraulic conductivity structure and correspondingly reliable predictions of the transport characteristics over relatively long distances.
Abstract:
A plant species' genetic population structure is the result of a complex combination of its life history, ecological preferences, position in the ecosystem and historical factors. As a result, many different statistical methods exist that measure different aspects of species' genetic structure. However, little is known about how these methods are interrelated and how they are related to a species' ecology and life history. In this study, we used the IntraBioDiv amplified fragment length polymorphisms data set from 27 high-alpine species to calculate eight genetic summary statistics that we jointly correlate to a set of six ecological and life-history traits. We found that there is a large amount of redundancy among the calculated summary statistics and that there is a significant association with the matrix of species traits. In a multivariate analysis, two main aspects of population structure were visible among the 27 species. The first aspect is related to the species' dispersal capacities and the second is most likely related to the species' postglacial recolonization of the Alps. Furthermore, we found that some summary statistics, most importantly Mantel's r and Jost's D, show different behaviour than expected based on theory. We therefore advise caution in drawing too strong conclusions from these statistics.
Abstract:
Studies of the structural basis of protein thermostability have produced a confusing picture. Small sets of proteins have been analyzed from a variety of thermophilic species, suggesting different structural features as responsible for protein thermostability. Taking advantage of the recent advances in structural genomics, we have compiled a relatively large protein structure dataset, which was constructed very carefully and selectively; that is, the dataset contains only experimentally determined structures of proteins from one specific organism, the hyperthermophilic bacterium Thermotoga maritima, and those of close homologs from mesophilic bacteria. In contrast to the conclusions of previous studies, our analyses show that oligomerization order, hydrogen bonds, and secondary structure play minor roles in adaptation to hyperthermophily in bacteria. On the other hand, the data exhibit very significant increases in the density of salt bridges and in compactness for proteins from T. maritima. The latter effect can be measured by contact order or solvent accessibility, and network analysis shows a specific increase in highly connected residues in this thermophile. These features account for changes in 96% of the protein pairs studied. Our results provide a clear picture of protein thermostability in one species, and a framework for future studies of thermal adaptation.
Abstract:
Formica lugubris apparaît comme une espèce hautement polycalique dans le Jura suisse et forme des super-colonies. La super-colonie étudiée comprend environ 1200 nids répartis sur 70 hectares. L'étude détaillée de 12 hectares permet de définir 4 types de nids: les nids principaux, secondaires, saisonniers et commençants, ainsi que trois sortes de voies de communication: les routes de liaisons permanentes visibles sur le terrain, les pistes de liaisons non-permanentes non marquées sur le terrain et les chemins d'approvisionnement permanents marqués dans le terrain. L'auteur présente la phénologie de F. lugubris qui est fortement influencée par le climat de cette région avec une période moyenne d'activité de 150 jours. D'autre part, les premières données sur le régime alimentaire (analyse des proies récoltées par les fourmis) diffèrent considérablement des données connues pour les autres espèces du groupe rufa, notamment par le nombre élevé de pucerons, d'où l'idée d'une régulation des populations de pucerons par les fourmis. Enfin l'auteur aborde le problème de la faible densité de l'avifaune en relation avec les fourmis. Il semble que le climat et les ressources alimentaires conduisent les fourmis à une nouvelle stratégie écologique qui s'exprimerait par la création de super-colonies. Formica lugubris appears to be a highly polycalic species in the Swiss Jura and forms super-colonies. The super-colony studied comprises about 1200 nests spread over about 70 hectares. The detailed study of 12 hectares allows 4 types of nests to be distinguished: the main nests, the secondary nests, the seasonal nests and the starting nests, as well as 3 types of ant tracks: the permanent connection routes visible on the ground, the non-permanent connection tracks not marked on the ground and the permanent foraging routes marked on the ground. The author presents the phenology of F. lugubris, which is strongly influenced by the climate of the region, with a mean activity period of about 150 days. 
On the other hand, the first results on diet (analysis of the prey collected by the ants) differ considerably from the well-known data for the other species of the rufa group, especially in the high number of aphids, from which a regulation of aphid populations by the ants may be inferred. Finally, the author addresses the problem of the low density of avifauna in relation to the ants. It seems that climate and food resources lead the ants to a new ecological strategy, which would express itself in the creation of super-colonies.
Abstract:
Résumé Cette thèse est consacrée à l'analyse, la modélisation et la visualisation de données environnementales à référence spatiale à l'aide d'algorithmes d'apprentissage automatique (Machine Learning). L'apprentissage automatique peut être considéré au sens large comme une sous-catégorie de l'intelligence artificielle qui concerne particulièrement le développement de techniques et d'algorithmes permettant à une machine d'apprendre à partir de données. Dans cette thèse, les algorithmes d'apprentissage automatique sont adaptés pour être appliqués à des données environnementales et à la prédiction spatiale. Pourquoi l'apprentissage automatique ? Parce que la majorité des algorithmes d'apprentissage automatiques sont universels, adaptatifs, non-linéaires, robustes et efficaces pour la modélisation. Ils peuvent résoudre des problèmes de classification, de régression et de modélisation de densité de probabilités dans des espaces à haute dimension, composés de variables informatives spatialisées (« géo-features ») en plus des coordonnées géographiques. De plus, ils sont idéaux pour être implémentés en tant qu'outils d'aide à la décision pour des questions environnementales allant de la reconnaissance de pattern à la modélisation et la prédiction en passant par la cartographie automatique. Leur efficacité est comparable au modèles géostatistiques dans l'espace des coordonnées géographiques, mais ils sont indispensables pour des données à hautes dimensions incluant des géo-features. Les algorithmes d'apprentissage automatique les plus importants et les plus populaires sont présentés théoriquement et implémentés sous forme de logiciels pour les sciences environnementales. 
Les principaux algorithmes décrits sont le Perceptron multicouches (MultiLayer Perceptron, MLP) - l'algorithme le plus connu dans l'intelligence artificielle, le réseau de neurones de régression généralisée (General Regression Neural Networks, GRNN), le réseau de neurones probabiliste (Probabilistic Neural Networks, PNN), les cartes auto-organisées (SelfOrganized Maps, SOM), les modèles à mixture Gaussiennes (Gaussian Mixture Models, GMM), les réseaux à fonctions de base radiales (Radial Basis Functions Networks, RBF) et les réseaux à mixture de densité (Mixture Density Networks, MDN). Cette gamme d'algorithmes permet de couvrir des tâches variées telle que la classification, la régression ou l'estimation de densité de probabilité. L'analyse exploratoire des données (Exploratory Data Analysis, EDA) est le premier pas de toute analyse de données. Dans cette thèse les concepts d'analyse exploratoire de données spatiales (Exploratory Spatial Data Analysis, ESDA) sont traités selon l'approche traditionnelle de la géostatistique avec la variographie expérimentale et selon les principes de l'apprentissage automatique. La variographie expérimentale, qui étudie les relations entre pairs de points, est un outil de base pour l'analyse géostatistique de corrélations spatiales anisotropiques qui permet de détecter la présence de patterns spatiaux descriptible par une statistique. L'approche de l'apprentissage automatique pour l'ESDA est présentée à travers l'application de la méthode des k plus proches voisins qui est très simple et possède d'excellentes qualités d'interprétation et de visualisation. Une part importante de la thèse traite de sujets d'actualité comme la cartographie automatique de données spatiales. Le réseau de neurones de régression généralisée est proposé pour résoudre cette tâche efficacement. 
The performance of the GRNN is demonstrated on the Spatial Interpolation Comparison (SIC) 2004 data, on which the GRNN significantly outperforms all other methods, particularly in emergency situations. The thesis is composed of four chapters: theory, applications, software tools and guided examples. An important part of the work consists of a collection of software: Machine Learning Office. This software collection was developed over the last 15 years and has been used for teaching numerous courses, including international workshops in China, France, Italy, Ireland and Switzerland, as well as in fundamental and applied research projects. The case studies considered cover a wide spectrum of real low- and high-dimensional geo-environmental problems, such as air, soil and water pollution by radioactive products and heavy metals, the classification of soil types and hydrogeological units, uncertainty mapping for decision support, and the assessment of natural hazards (landslides, avalanches). Complementary tools for exploratory data analysis and visualization were also developed, with care taken to create a user-friendly, easy-to-use interface.
Machine Learning for geospatial data: algorithms, software tools and case studies
Abstract The thesis is devoted to the analysis, modeling and visualisation of spatial environmental data using machine learning algorithms. In a broad sense, machine learning can be considered a subfield of artificial intelligence. It is mainly concerned with the development of techniques and algorithms that allow computers to learn from data. In this thesis, machine learning algorithms are adapted to learn from spatial environmental data and to make spatial predictions. Why machine learning?
In short, most machine learning algorithms are universal, adaptive, nonlinear, robust and efficient modeling tools. They can find solutions to classification, regression, and probability density modeling problems in high-dimensional geo-feature spaces, composed of geographical space and additional relevant spatially referenced features. They are well suited to be implemented as predictive engines in decision support systems, for the purposes of environmental data mining, including pattern recognition, modeling and prediction as well as automatic data mapping. Their efficiency is competitive with that of geostatistical models in low-dimensional geographical spaces, but they are indispensable in high-dimensional geo-feature spaces. The most important and popular machine learning algorithms and models of interest for geo- and environmental sciences are presented in detail, from the theoretical description of the concepts to the software implementation. The main algorithms and models considered are the following: multi-layer perceptron (a workhorse of machine learning), general regression neural networks, probabilistic neural networks, self-organising (Kohonen) maps, Gaussian mixture models, radial basis function networks, and mixture density networks. This set of models covers machine learning tasks such as classification, regression, and density estimation. Exploratory data analysis (EDA) is an initial and very important part of data analysis. In this thesis, the concepts of exploratory spatial data analysis (ESDA) are considered using both the traditional geostatistical approach, such as experimental variography, and machine learning. Experimental variography is a basic tool for the geostatistical analysis of anisotropic spatial correlations, which helps reveal the presence of spatial patterns, at least those described by two-point statistics.
A machine learning approach to ESDA is presented by applying the k-nearest neighbors (k-NN) method, which is simple and has very good interpretation and visualization properties. An important part of the thesis deals with a currently hot topic, namely the automatic mapping of geospatial data. The general regression neural network (GRNN) is proposed as an efficient model for this task. The performance of the GRNN model is demonstrated on the Spatial Interpolation Comparison (SIC) 2004 data, where the GRNN model significantly outperformed all other approaches, especially under emergency conditions. The thesis consists of four chapters and has the following structure: theory, applications, software tools, and how-to-do-it examples. An important part of the work is a collection of software tools: Machine Learning Office. The Machine Learning Office tools were developed over the last 15 years and have been used both in many teaching courses, including international workshops in China, France, Italy, Ireland and Switzerland, and in fundamental and applied research projects. The case studies considered cover a wide spectrum of real-life low- and high-dimensional geo- and environmental problems, such as air, soil and water pollution by radionuclides and heavy metals, classification of soil types and hydrogeological units, decision-oriented mapping with uncertainties, and natural hazard (landslides, avalanches) assessment and susceptibility mapping. Complementary tools useful for exploratory data analysis and visualisation were developed as well. The software is user-friendly and easy to use.
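To make the GRNN idea mentioned in the abstract concrete: in its standard textbook form, a GRNN prediction is a Gaussian-kernel weighted average of the training values (a Nadaraya-Watson estimator) with a single bandwidth parameter. The sketch below is a minimal illustration under that assumption, not the Machine Learning Office implementation; the function names (`grnn_predict`, `loo_cv_error`, `select_sigma`), the isotropic 2D kernel, and the leave-one-out choice of bandwidth are all hypothetical choices made here for illustration.

```python
import math


def grnn_predict(train_xy, train_z, query_xy, sigma):
    """Predict values at query points as a Gaussian-kernel weighted
    average of the training values (isotropic kernel, assumed here)."""
    predictions = []
    for qx, qy in query_xy:
        num = 0.0
        den = 0.0
        for (x, y), z in zip(train_xy, train_z):
            d2 = (qx - x) ** 2 + (qy - y) ** 2
            w = math.exp(-d2 / (2.0 * sigma ** 2))
            num += w * z
            den += w
        # If every weight underflows to zero, no prediction is possible.
        predictions.append(num / den if den > 0.0 else float("nan"))
    return predictions


def loo_cv_error(train_xy, train_z, sigma):
    """Leave-one-out mean squared error for a given bandwidth."""
    total = 0.0
    n = len(train_z)
    for i in range(n):
        rest_xy = train_xy[:i] + train_xy[i + 1:]
        rest_z = train_z[:i] + train_z[i + 1:]
        pred = grnn_predict(rest_xy, rest_z, [train_xy[i]], sigma)[0]
        if math.isnan(pred):
            return float("inf")
        total += (pred - train_z[i]) ** 2
    return total / n


def select_sigma(train_xy, train_z, candidates):
    """Pick the candidate bandwidth with the lowest leave-one-out error."""
    return min(candidates, key=lambda s: loo_cv_error(train_xy, train_z, s))


# Toy data (invented for illustration only).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [1.0, 3.0, 2.0, 4.0]
best_sigma = select_sigma(points, values, [0.25, 0.5, 1.0, 2.0])
estimate = grnn_predict(points, values, [(0.5, 0.5)], best_sigma)[0]
```

Because the only free parameter is the bandwidth, tuning reduces to a one-dimensional search, which is one plausible reason such a model lends itself to automatic mapping; the data-driven selection shown here is an assumption about how that automation could be done.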