882 results for large scale data gathering


Relevance:

100.00%

Publisher:

Abstract:

We consider two fundamental properties in the analysis of two-way tables of positive data: the principle of distributional equivalence, one of the cornerstones of correspondence analysis of contingency tables, and the principle of subcompositional coherence, which forms the basis of compositional data analysis. For an analysis to be subcompositionally coherent, it suffices to analyse the ratios of the data values. The usual approach to dimension reduction in compositional data analysis is to perform principal component analysis on the logarithms of ratios, but this method does not obey the principle of distributional equivalence. We show that by introducing weights for the rows and columns, the method achieves this desirable property. This weighted log-ratio analysis is theoretically equivalent to spectral mapping, a multivariate method developed almost 30 years ago for displaying ratio-scale data from biological activity spectra. The close relationship between spectral mapping and correspondence analysis is also explained, as well as their connection with association modelling. The weighted log-ratio methodology is applied here to frequency data in linguistics and to chemical compositional data in archaeology.
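For reference, the weighted analysis is straightforward to sketch: take row and column masses from the correspondence matrix, double-centre the logged data with those weights, and decompose with a weighted SVD. A minimal NumPy sketch of this standard formulation (an illustration, not the authors' code):

```python
import numpy as np

def weighted_lra(N, ndim=2):
    """Weighted log-ratio analysis of a strictly positive two-way table N."""
    P = N / N.sum()                       # correspondence matrix
    r = P.sum(axis=1)                     # row masses (weights)
    c = P.sum(axis=0)                     # column masses (weights)
    L = np.log(P)
    # Weighted double-centring: remove weighted row and column means so that
    # only ratio information (log-contrasts) remains.
    Z = L - (L @ c)[:, None] - (r @ L)[None, :] + r @ L @ c
    # Weighted SVD: scale by the square roots of the masses before decomposing.
    S = np.sqrt(r)[:, None] * Z * np.sqrt(c)[None, :]
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    rows = U[:, :ndim] * s[:ndim] / np.sqrt(r)[:, None]     # row principal coords
    cols = Vt.T[:, :ndim] * s[:ndim] / np.sqrt(c)[:, None]  # column principal coords
    return rows, cols
```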

Relevance:

100.00%

Publisher:

Abstract:

Polistine wasps are important in Neotropical ecosystems due to their ubiquity and diversity, yet inventories have not adequately considered the spatial attributes of collected specimens. Spatial data on biodiversity are important for studying and mitigating anthropogenic impacts on natural ecosystems and for protecting species. We described and analyzed local-scale spatial patterns in the collecting records of wasp species, as well as spatial variation in diversity descriptors, in a 2500-hectare area of Amazon forest in Brazil. Rare species comprised the largest fraction of the fauna. Close-range spatial effects were detected for most of the more common species, with clustering of presence data at short distances. Larger spatial-lag effects could also be identified in some species, probably constituting cases of exogenous autocorrelation and candidates for explanations based on environmental factors. In a few cases, significant or near-significant correlations were found between five species (of Agelaia, Angiopolybia, and Mischocyttarus) and three studied environmental variables: distance to the nearest stream, terrain altitude, and type of forest canopy. However, associations between these factors and biodiversity variables were generally weak. When they were used as predictors of polistine richness in a multiple linear regression, only the coefficient for the forest canopy variable was significant. Some level of prediction of wasp diversity variables can thus be attained from environmental variables, especially vegetation structure. Large-scale landscape and regional studies should be undertaken to address this issue.
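The short-distance clustering of presence data reported here is the kind of pattern a global spatial autocorrelation statistic summarizes. A minimal sketch of Moran's I with inverse-distance weights, under assumed inputs (point coordinates and one observed variable); the function and variable names are illustrative:

```python
import numpy as np

def morans_i(values, coords):
    """Global Moran's I for one variable observed at point coordinates."""
    n = len(values)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Inverse-distance weights, zero on the diagonal (no self-comparison).
    W = np.where(d > 0, 1.0 / np.maximum(d, 1e-12), 0.0)
    z = values - values.mean()
    return (n / W.sum()) * (z @ W @ z) / (z @ z)

# Values near +1 indicate clustering (similar values close together);
# values near 0 indicate spatial randomness.
```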

Relevance:

100.00%

Publisher:

Abstract:

A recurring task in the analysis of mass genome annotation data from high-throughput technologies is the identification of peaks or clusters in a noisy signal profile. Examples of such applications are the definition of promoters on the basis of transcription start site profiles, the mapping of transcription factor binding sites based on ChIP-chip data and the identification of quantitative trait loci (QTL) from whole-genome SNP profiles. Input to such an analysis is a set of genome coordinates associated with counts or intensities. The output consists of a discrete number of peaks with respective volumes, extensions and center positions. For this purpose, we have developed a flexible one-dimensional clustering tool, called MADAP, which we make available as a web server and as a standalone program. A set of parameters enables the user to customize the procedure to a specific problem. The web server, which returns results in textual and graphical form, is useful for small- to medium-scale applications, as well as for evaluation and parameter tuning in view of large-scale applications, which require a local installation. The program, written in C++, can be freely downloaded from ftp://ftp.epd.unil.ch/pub/software/unix/madap. The MADAP web server can be accessed at http://www.isrec.isb-sib.ch/madap/.
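MADAP's actual procedure is governed by user-set parameters, but the core task, grouping weighted one-dimensional coordinates into peaks with volumes, extensions and centers, can be illustrated with a simple gap-based sketch (the `max_gap` threshold is an assumed, illustrative parameter, not a MADAP option):

```python
def cluster_positions(positions, counts, max_gap=50):
    """Group genome coordinates with counts into 1-D clusters.

    Adjacent positions closer than `max_gap` join the same cluster; each
    cluster is summarized by its total volume, weighted center and extension.
    """
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    clusters, current = [], [order[0]]
    for i in order[1:]:
        if positions[i] - positions[current[-1]] <= max_gap:
            current.append(i)
        else:
            clusters.append(current)
            current = [i]
    clusters.append(current)
    summaries = []
    for cl in clusters:
        vol = sum(counts[i] for i in cl)
        center = sum(positions[i] * counts[i] for i in cl) / vol
        ext = positions[cl[-1]] - positions[cl[0]]
        summaries.append({"center": center, "volume": vol, "extension": ext})
    return summaries
```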

Relevance:

100.00%

Publisher:

Abstract:

Microstructure imaging from diffusion magnetic resonance (MR) data is an invaluable tool for studying tissue morphology non-invasively and for providing biological insight into its microstructural organization. In recent years, a variety of biophysical models have been proposed to associate particular patterns observed in the measured signal with specific microstructural properties of neuronal tissue, such as axon diameter and fiber density. Despite very appealing results showing that the estimated microstructure indices agree very well with histological examinations, existing techniques require computationally expensive non-linear procedures to fit the models to the data, which in practice demand powerful computer clusters for large-scale applications. In this work, we present a general framework for Accelerated Microstructure Imaging via Convex Optimization (AMICO) and show how to re-formulate this class of techniques as convenient linear systems that can then be solved efficiently with very fast algorithms. We demonstrate this linearization of the fitting problem for two specific models, ActiveAx and NODDI, providing a very attractive alternative for parameter estimation in those techniques; the AMICO framework, however, is general and flexible enough to apply to the wider space of microstructure imaging methods. Results demonstrate that AMICO drastically accelerates the fit of existing techniques (up to four orders of magnitude faster) while preserving the accuracy and precision of the estimated model parameters (correlation above 0.9). We believe that the availability of such ultrafast algorithms will help spread microstructure imaging to larger cohorts of patients and to a wider spectrum of neurological disorders.
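The gist of the linearization can be sketched as follows: the non-linear model is evaluated once over a grid of candidate parameter values to build a dictionary of signal profiles, and each voxel is then fit as a non-negative linear combination of those profiles. A minimal sketch of this idea (not the released AMICO code; the normalization step is illustrative):

```python
import numpy as np
from scipy.optimize import nnls

def fit_voxel(signal, dictionary):
    """Linearized microstructure fit for one voxel.

    dictionary: (n_measurements, n_atoms) matrix whose columns are signals
    precomputed from the non-linear model over a grid of parameter values.
    Solving signal ~= dictionary @ x with x >= 0 replaces the expensive
    per-voxel non-linear optimization.
    """
    x, _ = nnls(dictionary, signal)
    x /= max(x.sum(), 1e-12)   # normalize weights to a distribution over atoms
    return x

# A scalar index is then read off as a weighted average over the grid, e.g.
# estimated_diameter = grid_diameters @ x  (grid_diameters: per-atom values).
```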

Relevance:

100.00%

Publisher:

Abstract:

Many of the most interesting questions ecologists ask lead to analyses of spatial data. Yet, perhaps confused by the large number of statistical models and fitting methods available, many ecologists seem to believe this is best left to specialists. Here, we describe the issues that need consideration when analysing spatial data and illustrate them using simulation studies. Our comparative analysis uses methods including generalized least squares, spatial filters, wavelet-revised models, conditional autoregressive models and generalized additive mixed models to estimate regression coefficients from synthetic but realistic data sets, including some that violate standard regression assumptions. We assess the performance of each method using two measures and using statistical error rates for model selection. Methods that performed well included the generalized least squares family of models and a Bayesian implementation of the conditional autoregressive model. Ordinary least squares also performed adequately in the absence of model selection, but had poorly controlled Type I error rates and so did not show the improvements in performance under model selection seen with the methods above. Removing large-scale spatial trends in the response led to poor performance. These are empirical results, so extrapolating them to other situations should be done cautiously. Nevertheless, our simulation-based approach provides much stronger evidence for comparative analysis than assessments based on one or a few data sets, and should be considered a necessary foundation for statements of this type in future.
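As an illustration of the best-performing family: once a spatial covariance model is fixed, generalized least squares reduces to a weighted normal-equations solve. A minimal sketch assuming an exponential covariance with known parameters (in practice the covariance parameters are themselves estimated, e.g. by maximum likelihood or REML):

```python
import numpy as np

def gls_fit(X, y, coords, sigma2=1.0, rng=10.0, nugget=1e-6):
    """GLS regression with an exponential spatial covariance for the errors."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    C = sigma2 * np.exp(-d / rng) + nugget * np.eye(len(y))  # error covariance
    Ci = np.linalg.inv(C)
    beta = np.linalg.solve(X.T @ Ci @ X, X.T @ Ci @ y)   # GLS estimator
    cov_beta = np.linalg.inv(X.T @ Ci @ X)               # sampling covariance
    return beta, cov_beta
```

Ignoring C (i.e., ordinary least squares) leaves beta unbiased but understates its uncertainty when residuals are spatially autocorrelated, which is one route to the inflated Type I error rates reported here.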

Relevance:

100.00%

Publisher:

Abstract:

Expression data contribute significantly to the biological value of the sequenced human genome, providing extensive information about gene structure and the pattern of gene expression. ESTs, together with SAGE libraries and microarray experiment information, provide a broad and rich view of the transcriptome. However, it is difficult to perform large-scale expression mining of the data generated by these diverse experimental approaches. Not only are the data stored in disparate locations, but there is frequent ambiguity in the meaning of the terms used to describe the source of the material used in an experiment. Untangling semantic differences between the data provided by different resources is therefore largely reliant on the domain knowledge of a human expert. We present here eVOC, a system which associates labelled target cDNAs for microarray experiments, or cDNA libraries and their associated transcripts, with controlled terms in a set of hierarchical vocabularies. eVOC consists of four orthogonal controlled vocabularies suitable for describing the domains of human gene expression data: Anatomical System, Cell Type, Pathology and Developmental Stage. We have curated and annotated 7016 cDNA libraries represented in dbEST, as well as 104 SAGE libraries, with expression information, and provide this as an integrated, public resource that allows the linking of transcripts and libraries with expression terms. Both the vocabularies and the vocabulary-annotated libraries can be retrieved from http://www.sanbi.ac.za/evoc/. Several groups are involved in developing this resource with the aim of unifying transcript expression information.
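The annotation scheme amounts to tagging each library with one term from each of the four orthogonal hierarchies. A minimal sketch of such a record; the term strings and identifier below are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class LibraryAnnotation:
    """A cDNA or SAGE library tagged with one term per eVOC vocabulary."""
    library_id: str            # e.g. a dbEST library identifier
    anatomical_system: str
    cell_type: str
    pathology: str
    developmental_stage: str

# Hypothetical record; real eVOC terms sit in hierarchical trees, so a query
# for a term can also match libraries annotated with any of its descendants.
lib = LibraryAnnotation("dbEST:0001", "nervous system", "neuron",
                        "normal", "adult")
```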

Relevance:

100.00%

Publisher:

Abstract:

This work is concerned with the development and application of novel unsupervised learning methods, with two target applications in mind: the analysis of forensic case data and the classification of remote sensing images. First, a method based on a symbolic optimization of the inter-sample distance measure is proposed to improve the flexibility of spectral clustering algorithms, and applied to the problem of forensic case data. This distance is optimized using a loss function related to the preservation of neighborhood structure between the input space and the space of principal components, and solutions are found using genetic programming. Results are compared to a variety of state-of-the-art clustering algorithms. Subsequently, a new large-scale clustering method based on a joint optimization of feature extraction and classification is proposed and applied to various databases, including two hyperspectral remote sensing images. The algorithm makes use of a functional model (e.g., a neural network) for clustering, which is trained by stochastic gradient descent. Results indicate that such a technique can easily scale to huge databases, can avoid the so-called out-of-sample problem, and can compete with or even outperform existing clustering algorithms on both artificial data and real remote sensing images. This is verified on small databases as well as very large problems.
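The key idea of the second method, a parametric clustering model trained by stochastic gradient descent so that it streams through arbitrarily large datasets and assigns unseen samples directly, can be illustrated with the simplest functional model: centroids updated online. This is a sketch of the general principle, not the thesis implementation (which uses a neural network):

```python
import numpy as np

def online_kmeans(stream, k, dim, lr=0.05, seed=0):
    """Cluster a (possibly huge) stream of samples with SGD updates.

    Each sample moves its nearest centroid slightly towards it: a stochastic
    gradient step on the quantization error. Memory is O(k * dim), independent
    of dataset size, and new samples are assigned by nearest centroid, which
    sidesteps the out-of-sample problem.
    """
    rng = np.random.default_rng(seed)
    centroids = rng.normal(size=(k, dim))
    for x in stream:
        j = np.argmin(((centroids - x) ** 2).sum(axis=1))  # nearest centroid
        centroids[j] += lr * (x - centroids[j])            # SGD update
    return centroids
```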

Relevance:

100.00%

Publisher:

Abstract:

For the last two decades, supertree reconstruction has been an active field of research and has seen the development of a large number of major algorithms. Because of the growing popularity of supertree methods, it has become necessary to evaluate the performance of these algorithms to determine which are the best options (especially with regard to the widely used supermatrix approach). In this study, seven of the most commonly used supertree methods are investigated using a large empirical data set (in terms of number of taxa and molecular markers) from the worldwide flowering plant family Sapindaceae. Supertree methods were evaluated using several criteria: similarity of the supertrees with the input trees, similarity between the supertrees and the total evidence tree, level of resolution of the supertree and computational time required by the algorithm. Additional analyses were also conducted on a reduced data set to test whether performance levels were affected by the heuristic searches rather than by the algorithms themselves. Based on our results, two main groups of supertree methods were identified: the matrix representation with parsimony (MRP), MinFlip and MinCut methods performed well according to our criteria, whereas the average consensus, split fit and most similar supertree methods performed more poorly or at least did not behave the same way as the total evidence tree. Results for the super distance matrix, the most recent approach tested here, were promising, with at least one derived method performing as well as MRP, MinFlip and MinCut. The output of each method was only slightly improved when applied to the reduced data set, suggesting correct behavior of the heuristic searches and relatively low sensitivity of the algorithms to data set size and missing data. Results also showed that MRP analyses could reach a high level of quality even when using a simple heuristic search strategy, with the exception of MRP with Purvis coding scheme and reversible parsimony. The future of supertrees lies in the implementation of a standardized heuristic search for all methods and in increased computing power to handle large data sets. The latter would prove particularly useful for promising approaches such as the maximum quartet fit method, which still requires substantial computing power.
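For reference, the matrix coding behind MRP (the Baum-Ragan scheme) is simple to state: each internal clade of each input tree becomes a binary character, with taxa absent from that tree scored as missing, and the resulting matrix is analysed with parsimony. A minimal sketch, with input trees represented abstractly as (taxon set, clade list) pairs:

```python
def mrp_matrix(input_trees, all_taxa):
    """Baum-Ragan matrix representation of input trees for MRP.

    input_trees: list of (taxa_in_tree, clades) pairs, where clades is a list
    of taxon sets, one per internal node of that tree.
    Returns {taxon: character string} with '1' = inside the clade,
    '0' = in the tree but outside the clade, '?' = absent from the tree.
    """
    rows = {t: [] for t in all_taxa}
    for taxa_in_tree, clades in input_trees:
        for clade in clades:
            for t in all_taxa:
                if t not in taxa_in_tree:
                    rows[t].append("?")
                else:
                    rows[t].append("1" if t in clade else "0")
    return {t: "".join(chars) for t, chars in rows.items()}
```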

Relevance:

100.00%

Publisher:

Abstract:

Simulated-annealing-based conditional simulations provide a flexible means of quantitatively integrating diverse types of subsurface data. Although such techniques are being increasingly used in hydrocarbon reservoir characterization studies, their potential in environmental, engineering and hydrological investigations is still largely unexploited. Here, we introduce a novel simulated annealing (SA) algorithm geared towards the integration of high-resolution geophysical and hydrological data; compared to more conventional approaches, it provides significant advancements in the way that large-scale structural information in the geophysical data is accounted for. Model perturbations in the annealing procedure are made by drawing from a probability distribution for the target parameter conditioned to the geophysical data. This is the only place where geophysical information is utilized in our algorithm, which is in marked contrast to other approaches where model perturbations are made through the swapping of values in the simulation grid and agreement with soft data is enforced through a correlation coefficient constraint. Another major feature of our algorithm is the way in which available geostatistical information is utilized. Instead of constraining realizations to match a parametric target covariance model over a wide range of spatial lags, we constrain the realizations only at smaller lags, where the available geophysical data cannot provide enough information. We thus allow the larger-scale subsurface features resolved by the geophysical data to exert much greater control on the output realizations. Further, since the only component of the SA objective function required in our approach is a covariance constraint at small lags, our method has improved convergence and computational efficiency over more traditional methods. We present the results of applying our algorithm to the integration of porosity log and tomographic crosshole georadar data to generate stochastic realizations of the local-scale porosity structure. Our procedure is first tested on a synthetic data set, and then applied to data collected at the Boise Hydrogeophysical Research Site.
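The overall loop can be sketched generically: propose a candidate by redrawing one cell from the geophysics-conditioned distribution, score only the small-lag covariance mismatch, and accept or reject with the Metropolis rule. In the sketch below, `propose` and `objective` are placeholder callables standing in for those two components:

```python
import numpy as np

def anneal(model, propose, objective, n_iter=100000, t0=1.0, cooling=0.9999,
           seed=0):
    """Generic simulated-annealing loop for conditional simulation.

    propose(model)   -> candidate with one cell redrawn from the probability
                        distribution conditioned to the geophysical data
    objective(model) -> covariance mismatch evaluated at small lags only
    """
    rng = np.random.default_rng(seed)
    energy, temp = objective(model), t0
    for _ in range(n_iter):
        cand = propose(model)
        e = objective(cand)
        if e < energy or rng.random() < np.exp((energy - e) / temp):
            model, energy = cand, e   # Metropolis acceptance criterion
        temp *= cooling               # geometric cooling schedule
    return model
```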

Relevance:

100.00%

Publisher:

Abstract:

Rural intersections account for 30% of crashes in rural areas and 6% of all fatal crashes, representing a significant but poorly understood safety problem. Transportation agencies have traditionally implemented countermeasures to address rural intersection crashes but frequently do not understand the dynamic interaction between the driver and the roadway, or the driver factors leading to these types of crashes. The Second Strategic Highway Research Program (SHRP 2) conducted a large-scale naturalistic driving study (NDS) using instrumented vehicles, which has provided a significant amount of on-road driving data for a range of drivers. The present study utilizes the SHRP 2 NDS data, as well as SHRP 2 Roadway Information Database (RID) data, to observe driver behavior at rural intersections firsthand, using video, vehicle kinematics, and roadway data, to determine how roadway, driver, environmental, and vehicle factors interact to affect driver safety at rural intersections. A model of driver braking behavior was developed using a dataset of vehicle activity traces for several rural stop-controlled intersections. The model uses the point at which a driver reacts to the upcoming intersection by initiating braking as its dependent variable, with the driver's age, the type and direction of the turning movement, and countermeasure presence as independent variables. Countermeasures such as on-pavement signing and overhead flashing beacons were found to increase the braking-point distance, a finding that provides insight into the countermeasures' effect on safety at rural intersections. The results of this model can lead to better roadway design, more informed selection of traffic control and countermeasures, and targeted information that can inform policy decisions. Additionally, a model of gap acceptance was attempted but was ultimately not developed due to the small size of the dataset. However, a protocol for data reduction for a gap acceptance model was determined. This protocol can be utilized in future studies to develop a gap acceptance model that would provide additional insight into the roadway, vehicle, environmental, and driver factors that play a role in whether a driver accepts or rejects a gap.
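The braking-point model described here is a linear regression with categorical predictors; a minimal dummy-coded sketch (the predictor names are illustrative, not the study's exact variable set):

```python
import numpy as np

def fit_braking_model(brake_dist, age, turn_left, has_countermeasure):
    """OLS fit of braking-point distance on driver and site factors.

    brake_dist: distance from the intersection at which braking begins.
    turn_left / has_countermeasure: 0/1 dummy-coded indicators.
    """
    X = np.column_stack([np.ones_like(brake_dist), age,
                         turn_left, has_countermeasure])
    beta, *_ = np.linalg.lstsq(X, brake_dist, rcond=None)
    return beta  # intercept and per-factor effects on braking distance

# A positive coefficient on has_countermeasure would reproduce the finding
# that signing and beacons move the braking point farther from the
# intersection.
```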

Relevance:

100.00%

Publisher:

Abstract:

Context. The understanding of Galaxy evolution can be facilitated by the use of population synthesis models, which make it possible to test hypotheses on the star formation history, star evolution, and chemical and dynamical evolution of the Galaxy. Aims. The new version of the Besançon Galaxy Model (hereafter BGM) aims to provide a more flexible and powerful tool to investigate the Initial Mass Function (IMF) and Star Formation Rate (SFR) of the Galactic disc. Methods. We present a new strategy for the generation of thin-disc stars which treats the IMF, SFR and evolutionary tracks as free parameters. We have updated most of the ingredients for the star count production and, for the first time, binary stars are generated in a consistent way. We keep in this new scheme the local dynamical self-consistency of Bienaymé et al. (1987). We then compare simulations from the new model with Tycho-2 data and the local luminosity function, as a first test to verify and constrain the new ingredients. The effects of changing thirteen different ingredients of the model are systematically studied. Results. For the first time, a full-sky comparison is performed between the BGM and data. This strategy allows us to constrain the IMF slope at high masses, which is found to be close to 3.0, excluding a shallower slope such as Salpeter's. The SFR is found to be decreasing whatever IMF is assumed. The model is compatible with a local dark matter density of 0.011 M⊙ pc−3, implying that there is no compelling evidence for a significant amount of dark matter in the disc. While the model is fitted to Tycho-2 data, a magnitude-limited sample with V < 11, we check that it is still consistent with fainter stars. Conclusions. The new model constitutes a new basis for further comparisons with large-scale surveys and is being prepared to become a powerful tool for the analysis of the Gaia mission data.
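For intuition, drawing stellar masses from a single-slope power-law IMF such as the high-mass slope constrained here is a one-line inverse-CDF computation. A sketch with an assumed mass range; the BGM itself uses a multi-segment IMF together with full evolutionary tracks:

```python
import numpy as np

def sample_imf(n, alpha=3.0, m_min=1.0, m_max=100.0, seed=0):
    """Draw stellar masses from a power-law IMF, dN/dm proportional to m**(-alpha).

    Inverse-CDF sampling on [m_min, m_max]; alpha = 3.0 is the high-mass
    slope favored here, steeper than Salpeter's 2.35.
    """
    rng = np.random.default_rng(seed)
    u = rng.random(n)
    a, b = m_min ** (1 - alpha), m_max ** (1 - alpha)
    return (a + u * (b - a)) ** (1 / (1 - alpha))
```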

Relevance:

100.00%

Publisher:

Abstract:

The automation of genome sequencing and annotation, together with large-scale gene expression measurement methods, generates a massive amount of data for model organisms. Searching for gene-specific or organism-specific information throughout all the different databases has become a very difficult task, and often results in fragmented and unrelated answers. A database that federates and integrates genomic and transcriptomic data can greatly improve search speed as well as the quality of the results by allowing a direct comparison of expression results obtained by different techniques. The main goal of this project, called the CleanEx database, is thus to provide access to public gene expression data via unique gene names and to represent heterogeneous expression data produced by different technologies in a way that facilitates joint analysis and cross-dataset comparisons. A consistent and up-to-date gene nomenclature is achieved by associating each single gene expression experiment with a permanent target identifier consisting of a physical description of the targeted RNA population or the hybridization reagent used. These targets are then mapped at regular intervals to the growing and evolving catalogues of genes from model organisms such as human and mouse. The completely automatic mapping procedure relies partly on external genome information resources such as UniGene and RefSeq. The central part of CleanEx is a weekly built gene index containing cross-references to all public expression data already incorporated into the system. In addition, the expression target database of CleanEx provides gene mapping and quality control information for various types of experimental resources, such as cDNA clones or Affymetrix probe sets. The Affymetrix mapping files are accessible as text files, for further use in external applications, and as individual entries via the web-based interfaces. The CleanEx web-based query interfaces offer access to individual entries via text-string searches or quantitative expression criteria, as well as cross-dataset analysis tools and cross-chip gene comparison. These tools have proven to be very efficient in expression data comparison and even, to a certain extent, in the detection of differentially expressed splice variants. The CleanEx flat files and tools are available online at http://www.cleanex.isb-sib.ch/.
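The heart of the system is the mapping from permanent target identifiers to the current gene catalogue, rebuilt on a schedule as the catalogue evolves. A minimal sketch of that step with illustrative data structures (plain dictionaries standing in for the actual databases):

```python
def rebuild_gene_index(targets, catalogue):
    """Map permanent expression targets to the current gene catalogue.

    targets:   {target_id: sequence_accession} -- fixed descriptions of the
               RNA population measured by each experiment
    catalogue: {sequence_accession: gene_symbol} -- current UniGene/RefSeq
               style mapping, which changes between releases
    Returns {gene_symbol: [target_id, ...]}, the per-gene expression index.
    """
    index = {}
    for target_id, accession in targets.items():
        gene = catalogue.get(accession)   # may be unmapped in this release
        if gene is not None:
            index.setdefault(gene, []).append(target_id)
    return index
```

Keeping the target identifiers permanent while remapping them weekly is what lets old experiments survive gene renamings and catalogue updates.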

Relevance:

100.00%

Publisher:

Abstract:

To date, published studies of alluvial bar architecture in large rivers have been restricted mostly to case studies of individual bars and single locations. Relatively little is known about how the depositional processes and sedimentary architecture of kilometre-scale bars vary within a multi-kilometre reach or over several hundreds of kilometres downstream. This study presents Ground Penetrating Radar and core data from 11 kilometre-scale bars of the Rio Parana, Argentina. The investigated bars are located between 30 km upstream and 540 km downstream of the Rio Parana - Rio Paraguay confluence, where a significant volume of fine-grained suspended sediment is introduced into the network. Bar-scale cross-stratified sets, with lengths and widths up to 600 m and thicknesses up to 12 m, enable the distinction of large-river deposits from stacked deposits of smaller rivers, but are present in only half the surface area of the bars. Up to 90% of bar-scale sets are found on top of finer-grained ripple-laminated bar-trough deposits. Bar-scale sets make up as much as 58% of the volume of the deposits in small, incipient mid-channel bars, but this proportion decreases significantly with increasing age and size of the bars. Contrary to what might be expected, a significant proportion of the sedimentary structures found in the Rio Parana are similar in scale to those found in much smaller rivers. In other words, large-river deposits are not always characterized by big structures that allow a simple interpretation of river scale. However, the large scale of the depositional units in big rivers causes small-scale structures, such as ripple sets, to be grouped into thicker cosets, which indicate river scale even when no obvious large-scale sets are present. The results also show that the composition of bars differs between the studied reaches upstream and downstream of the confluence with the Rio Paraguay. Relative to other controls on downstream fining, the tributary input of fine-grained suspended material from the Rio Paraguay causes a marked change in the composition of the bar deposits. Compared to the upstream reaches, the sedimentary architecture of the top ca 5 m of mid-channel bars in the downstream reaches shows: (i) an increase in the abundance and thickness (up to metre-scale) of laterally extensive (hundreds of metres) fine-grained layers; (ii) an increase in the percentage of deposits comprised of ripple sets (to >40% in the upper bar deposits); and (iii) an increase in bar-trough deposits and a corresponding decrease in bar-scale cross-strata (<10%). The thalweg deposits of the Rio Parana are composed of dune sets, even directly downstream of the Rio Paraguay, where the upper channel deposits are dominantly fine-grained. Thus, the change in sedimentary facies due to a tributary point source of fine-grained sediment is primarily expressed in the composition of the upper bar deposits.


Relevance:

100.00%

Publisher:

Abstract:

On a geological time scale, conditions on Earth are highly variable and biological patterns (for example, the distributions of species) are very dynamic. Understanding large-scale patterns of variation observed today thus requires a deep understanding of the historical factors that drove their evolution. In this thesis, we reevaluated the evolution and maintenance of a continental color cline observed in the European barn owl (Tyto alba) using population genetic tools. The color cline spans from south-western Europe, where most individuals have pure white underparts, to northern and eastern Europe, where most individuals have rufous-brown underparts. Our results globally showed that the old scenario, stipulating that the color cline evolved by secondary contact of two color morphs (white and rufous) that evolved in allopatry during the last ice age, has to be revised. We collected samples from about 700 barn owls from the Western Palearctic to establish the first population genetic data set for this species. Individuals were genotyped at 22 microsatellite markers, at one mitochondrial gene, and at a candidate color gene. The color of each individual was assessed and their sex determined by molecular methods. We first showed that the genetic variation in Western Europe is very limited compared to the heritable color variation. We found no evidence of different glacial lineages, and showed that selection must be involved in the maintenance of the color cline (chapter 1). Using computer simulations, we demonstrated that the post-glacial colonization of Europe occurred from the Iberian Peninsula and that the color cline could not have evolved by neutral demographic processes during this colonization (chapter 2). Finally, we reevaluated the whole history of the establishment of the Western Palearctic variation of the barn owl (chapter 3): this study showed that all Western European barn owls descend from white-phenotype barn owls from the Middle East that colonized the Iberian Peninsula via North Africa. Following the end of the last ice age (20,000 years ago), these white barn owls colonized Western Europe, and under selection a novel rufous phenotype evolved (during or after the colonization). An important part of the color variation can be explained by a single mutation in the melanocortin-1 receptor (MC1R) gene that appeared during or after the colonization. The colonization of Europe reached as far as Greece, where the rufous birds encountered white ones (which reached Greece from the Middle East via the Bosporus) in a secondary contact zone. Our analyses show that white and rufous barn owls in Greece interbreed only to a limited extent. This suggests that barn owls are on the verge of becoming two species in Greece and demonstrates that European barn owls represent an incipient ring species around the Mediterranean. The revisited history of the establishment of the European barn owl color cline makes this model system remarkable in several respects. It is a very clear example of strong local adaptation achieved despite high gene flow (strong color and MC1R differentiation despite almost no neutral genetic differentiation). It also offers a wonderful model system for studying the interactions between colonization processes and selection processes, which have so far been remarkably understudied despite their potentially ubiquitous importance.
Finally, it represents a very interesting case in the speciation continuum and calls for further study of the amount of gene flow that occurs between the color morphs in Greece.