30 resultados para Geo-mashups
em Université de Lausanne, Switzerland
Resumo:
Mountain ranges are biodiversity hotspots worldwide and provide refuge to many organisms under contemporary climate change. Gathering field information on mountain biodiversity over time is of primary importance to understand the response of biotic communities to climate changes. For plants, several long-term observation sites and networks of mountain biodiversity are emerging worldwide to gather field data and monitor altitudinal range shifts and community composition changes under contemporary climate change. Most of these monitoring sites, however, focus on alpine ecosystems and mountain summits, such as the global observation research initiative in alpine environments (GLORIA). Here we describe the Alps Vegetation Database, a comprehensive community level archive (GIVD ID EU-00-014) which aims at compiling all available geo-referenced vegetation plots from lowland forests to alpine grasslands across the greatest mountain range in Europe: the Alps. This research initiative was funded between 2008 and 2011 by the Danish Council for Independent Research and was part of a larger project to compare cross-scale plant community structure between the Alps and the Scandes. The Alps Vegetation Database currently harbours 35,731 geo-referenced vegetation plots and 5,023 valid taxa across Mediterranean, temperate and alpine environments. The data are mainly used by the main contributors of the Alps Vegetation Database in an ecoinformatics approach to test hypotheses related to plant macroecology and biogeography, but external proposals for joint collaborations are welcome.
Resumo:
The use of Geographic Information Systems has revolutionalized the handling and the visualization of geo-referenced data and has underlined the critic role of spatial analysis. The usual tools for such a purpose are geostatistics which are widely used in Earth science. Geostatistics are based upon several hypothesis which are not always verified in practice. On the other hand, Artificial Neural Network (ANN) a priori can be used without special assumptions and are known to be flexible. This paper proposes to discuss the application of ANN in the case of the interpolation of a geo-referenced variable.
Resumo:
BACKGROUND: The Nuclear Factor I (NFI) family of DNA binding proteins (also called CCAAT box transcription factors or CTF) is involved in both DNA replication and gene expression regulation. Using chromatin immuno-precipitation and high throughput sequencing (ChIP-Seq), we performed a genome-wide mapping of NFI DNA binding sites in primary mouse embryonic fibroblasts. RESULTS: We found that in vivo and in vitro NFI DNA binding specificities are indistinguishable, as in vivo ChIP-Seq NFI binding sites matched predictions based on previously established position weight matrix models of its in vitro binding specificity. Combining ChIP-Seq with mRNA profiling data, we found that NFI preferentially associates with highly expressed genes that it up-regulates, while binding sites were under-represented at expressed but unregulated genes. Genomic binding also correlated with markers of transcribed genes such as histone modifications H3K4me3 and H3K36me3, even outside of annotated transcribed loci, implying NFI in the control of the deposition of these modifications. Positional correlation between + and - strand ChIP-Seq tags revealed that, in contrast to other transcription factors, NFI associates with a nucleosomal length of cleavage-resistant DNA, suggesting an interaction with positioned nucleosomes. In addition, NFI binding prominently occurred at boundaries displaying discontinuities in histone modifications specific of expressed and silent chromatin, such as loci submitted to parental allele-specific imprinted expression. CONCLUSIONS: Our data thus suggest that NFI nucleosomal interaction may contribute to the partitioning of distinct chromatin domains and to epigenetic gene expression regulation.NFI ChIP-Seq and input control DNA data were deposited at Gene Expression Omnibus (GEO) repository under accession number GSE15844. Gene expression microarray data for mouse embryonic fibroblasts are on GEO accession number GSE15871.
Resumo:
1. Species distribution modelling is used increasingly in both applied and theoretical research to predict how species are distributed and to understand attributes of species' environmental requirements. In species distribution modelling, various statistical methods are used that combine species occurrence data with environmental spatial data layers to predict the suitability of any site for that species. While the number of data sharing initiatives involving species' occurrences in the scientific community has increased dramatically over the past few years, various data quality and methodological concerns related to using these data for species distribution modelling have not been addressed adequately. 2. We evaluated how uncertainty in georeferences and associated locational error in occurrences influence species distribution modelling using two treatments: (1) a control treatment where models were calibrated with original, accurate data and (2) an error treatment where data were first degraded spatially to simulate locational error. To incorporate error into the coordinates, we moved each coordinate with a random number drawn from the normal distribution with a mean of zero and a standard deviation of 5 km. We evaluated the influence of error on the performance of 10 commonly used distributional modelling techniques applied to 40 species in four distinct geographical regions. 3. Locational error in occurrences reduced model performance in three of these regions; relatively accurate predictions of species distributions were possible for most species, even with degraded occurrences. Two species distribution modelling techniques, boosted regression trees and maximum entropy, were the best performing models in the face of locational errors. The results obtained with boosted regression trees were only slightly degraded by errors in location, and the results obtained with the maximum entropy approach were not affected by such errors. 4. Synthesis and applications. To use the vast array of occurrence data that exists currently for research and management relating to the geographical ranges of species, modellers need to know the influence of locational error on model quality and whether some modelling techniques are particularly robust to error. We show that certain modelling techniques are particularly robust to a moderate level of locational error and that useful predictions of species distributions can be made even when occurrence data include some error.
Resumo:
As part of the development of the database Bgee (a dataBase for Gene Expression Evolution), we annotate and analyse expression data from different types and different sources, notably Affymetrix data from GEO and ArrayExpress, and RNA-Seq data from SRA. During our quality control procedure, we have identified duplicated content in GEO and ArrayExpress, affecting ∼14% of our data: fully or partially duplicated experiments from independent data submissions, Affymetrix chips reused in several experiments, or reused within an experiment. We present here the procedure that we have established to filter such duplicates from Affymetrix data, and our procedure to identify future potential duplicates in RNA-Seq data. Database URL: http://bgee.unil.ch/
Resumo:
This paper presents general problems and approaches for the spatial data analysis using machine learning algorithms. Machine learning is a very powerful approach to adaptive data analysis, modelling and visualisation. The key feature of the machine learning algorithms is that they learn from empirical data and can be used in cases when the modelled environmental phenomena are hidden, nonlinear, noisy and highly variable in space and in time. Most of the machines learning algorithms are universal and adaptive modelling tools developed to solve basic problems of learning from data: classification/pattern recognition, regression/mapping and probability density modelling. In the present report some of the widely used machine learning algorithms, namely artificial neural networks (ANN) of different architectures and Support Vector Machines (SVM), are adapted to the problems of the analysis and modelling of geo-spatial data. Machine learning algorithms have an important advantage over traditional models of spatial statistics when problems are considered in a high dimensional geo-feature spaces, when the dimension of space exceeds 5. Such features are usually generated, for example, from digital elevation models, remote sensing images, etc. An important extension of models concerns considering of real space constrains like geomorphology, networks, and other natural structures. Recent developments in semi-supervised learning can improve modelling of environmental phenomena taking into account on geo-manifolds. An important part of the study deals with the analysis of relevant variables and models' inputs. This problem is approached by using different feature selection/feature extraction nonlinear tools. To demonstrate the application of machine learning algorithms several interesting case studies are considered: digital soil mapping using SVM, automatic mapping of soil and water system pollution using ANN; natural hazards risk analysis (avalanches, landslides), assessments of renewable resources (wind fields) with SVM and ANN models, etc. The dimensionality of spaces considered varies from 2 to more than 30. Figures 1, 2, 3 demonstrate some results of the studies and their outputs. Finally, the results of environmental mapping are discussed and compared with traditional models of geostatistics.
Resumo:
Forest fire sequences can be modelled as a stochastic point process where events are characterized by their spatial locations and occurrence in time. Cluster analysis permits the detection of the space/time pattern distribution of forest fires. These analyses are useful to assist fire-managers in identifying risk areas, implementing preventive measures and conducting strategies for an efficient distribution of the firefighting resources. This paper aims to identify hot spots in forest fire sequences by means of the space-time scan statistics permutation model (STSSP) and a geographical information system (GIS) for data and results visualization. The scan statistical methodology uses a scanning window, which moves across space and time, detecting local excesses of events in specific areas over a certain period of time. Finally, the statistical significance of each cluster is evaluated through Monte Carlo hypothesis testing. The case study is the forest fires registered by the Forest Service in Canton Ticino (Switzerland) from 1969 to 2008. This dataset consists of geo-referenced single events including the location of the ignition points and additional information. The data were aggregated into three sub-periods (considering important preventive legal dispositions) and two main ignition-causes (lightning and anthropogenic causes). Results revealed that forest fire events in Ticino are mainly clustered in the southern region where most of the population is settled. Our analysis uncovered local hot spots arising from extemporaneous arson activities. Results regarding the naturally-caused fires (lightning fires) disclosed two clusters detected in the northern mountainous area.
Resumo:
We study the effect of civil conflict on social capital, focusing on Uganda's experience during the last decade. Using individual and county-level data, we document large causal effects on trust and ethnic identity of an exogenous outburst of ethnic conflicts in 2002-2005. We exploit two waves of survey data from Afrobarometer (Round 4 Afrobarometer Survey in Uganda, 2000, 2008), including information on socioeconomic characteristics at the individual level, and geo-referenced measures of fighting events from ACLED. Our identification strategy exploits variations in the both the spatial and ethnic intensity of fighting. We find that more intense fighting decreases generalized trust and increases ethnic identity. The effects are quantitatively large and robust to a number of control variables, alternative measures of violence, and different statistical techniques involving ethnic and spatial fixed effects and instrumental variables. Controlling for the intensity of violence during the conflict, we also document that post-conflict economic recovery is slower in ethnically fractionalized counties. Our findings are consistent with the existence of a self-reinforcing process between conflicts and ethnic cleavages.
Resumo:
BACKGROUND: Animal societies are diverse, ranging from small family-based groups to extraordinarily large social networks in which many unrelated individuals interact. At the extreme of this continuum, some ant species form unicolonial populations in which workers and queens can move among multiple interconnected nests without eliciting aggression. Although unicoloniality has been mostly studied in invasive ants, it also occurs in some native non-invasive species. Unicoloniality is commonly associated with very high queen number, which may result in levels of relatedness among nestmates being so low as to raise the question of the maintenance of altruism by kin selection in such systems. However, the actual relatedness among cooperating individuals critically depends on effective dispersal and the ensuing pattern of genetic structuring. In order to better understand the evolution of unicoloniality in native non-invasive ants, we investigated the fine-scale population genetic structure and gene flow in three unicolonial populations of the wood ant F. paralugubris. RESULTS: The analysis of geo-referenced microsatellite genotypes and mitochondrial haplotypes revealed the presence of cryptic clusters of genetically-differentiated nests in the three populations of F. paralugubris. Because of this spatial genetic heterogeneity, members of the same clusters were moderately but significantly related. The comparison of nuclear (microsatellite) and mitochondrial differentiation indicated that effective gene flow was male-biased in all populations. CONCLUSION: The three unicolonial populations exhibited male-biased and mostly local gene flow. The high number of queens per nest, exchanges among neighbouring nests and restricted long-distance gene flow resulted in large clusters of genetically similar nests. The positive relatedness among clustermates suggests that kin selection may still contribute to the maintenance of altruism in unicolonial populations if competition occurs among clusters.
Resumo:
Uncertainty quantification of petroleum reservoir models is one of the present challenges, which is usually approached with a wide range of geostatistical tools linked with statistical optimisation or/and inference algorithms. Recent advances in machine learning offer a novel approach to model spatial distribution of petrophysical properties in complex reservoirs alternative to geostatistics. The approach is based of semisupervised learning, which handles both ?labelled? observed data and ?unlabelled? data, which have no measured value but describe prior knowledge and other relevant data in forms of manifolds in the input space where the modelled property is continuous. Proposed semi-supervised Support Vector Regression (SVR) model has demonstrated its capability to represent realistic geological features and describe stochastic variability and non-uniqueness of spatial properties. On the other hand, it is able to capture and preserve key spatial dependencies such as connectivity of high permeability geo-bodies, which is often difficult in contemporary petroleum reservoir studies. Semi-supervised SVR as a data driven algorithm is designed to integrate various kind of conditioning information and learn dependences from it. The semi-supervised SVR model is able to balance signal/noise levels and control the prior belief in available data. In this work, stochastic semi-supervised SVR geomodel is integrated into Bayesian framework to quantify uncertainty of reservoir production with multiple models fitted to past dynamic observations (production history). Multiple history matched models are obtained using stochastic sampling and/or MCMC-based inference algorithms, which evaluate posterior probability distribution. Uncertainty of the model is described by posterior probability of the model parameters that represent key geological properties: spatial correlation size, continuity strength, smoothness/variability of spatial property distribution. The developed approach is illustrated with a fluvial reservoir case. The resulting probabilistic production forecasts are described by uncertainty envelopes. The paper compares the performance of the models with different combinations of unknown parameters and discusses sensitivity issues.
Resumo:
BACKGROUND: Body mass index (BMI) may cluster in space among adults and be spatially dependent. Whether BMI clusters among children and how age-specific BMI clusters are related remains unknown. We aimed to identify and compare the spatial dependence of BMI in adults and children in a Swiss general population, taking into account the area's income level. METHODS: Geo-referenced data from the Bus Santé study (adults, n=6663) and Geneva School Health Service (children, n=3601) were used. We implemented global (Moran's I) and local (local indicators of spatial association (LISA)) indices of spatial autocorrelation to investigate the spatial dependence of BMI in adults (35-74 years) and children (6-7 years). Weight and height were measured using standardized procedures. Five spatial autocorrelation classes (LISA clusters) were defined including the high-high BMI class (high BMI participant's BMI value correlated with high BMI-neighbors' mean BMI values). The spatial distributions of clusters were compared between adults and children with and without adjustment for area's income level. RESULTS: In both adults and children, BMI was clearly not distributed at random across the State of Geneva. Both adults' and children's BMIs were associated with the mean BMI of their neighborhood. We found that the clusters of higher BMI in adults and children are located in close, yet different, areas of the state. Significant clusters of high versus low BMIs were clearly identified in both adults and children. Area's income level was associated with children's BMI clusters. CONCLUSIONS: BMI clusters show a specific spatial dependence in adults and children from the general population. Using a fine-scale spatial analytic approach, we identified life course-specific clusters that could guide tailored interventions.
Resumo:
Copy number variants (CNVs) influence the expression of genes that map not only within the rearrangement, but also to its flanks. To assess the possible mechanism(s) underlying this "neighboring effect", we compared intrachromosomal interactions and histone modifications in cell lines of patients affected by genomic disorders and control individuals. Using chromosome conformation capture (4C-seq), we observed that a set of genes flanking the Williams-Beuren Syndrome critical region (WBSCR) were often looping together. The newly identified interacting genes include AUTS2, mutations of which are associated with autism and intellectual disabilities. Deletion of the WBSCR disrupts the expression of this group of flanking genes, as well as long-range interactions between them and the rearranged interval. We also pinpointed concomitant changes in histone modifications between samples. We conclude that large genomic rearrangements can lead to chromatin conformation changes that extend far away from the structural variant, thereby possibly modulating expression globally and modifying the phenotype. GEO SERIES ACCESSION NUMBER: GSE33784, GSE33867.
Resumo:
Recently, kernel-based Machine Learning methods have gained great popularity in many data analysis and data mining fields: pattern recognition, biocomputing, speech and vision, engineering, remote sensing etc. The paper describes the use of kernel methods to approach the processing of large datasets from environmental monitoring networks. Several typical problems of the environmental sciences and their solutions provided by kernel-based methods are considered: classification of categorical data (soil type classification), mapping of environmental and pollution continuous information (pollution of soil by radionuclides), mapping with auxiliary information (climatic data from Aral Sea region). The promising developments, such as automatic emergency hot spot detection and monitoring network optimization are discussed as well.