93 results for Large-scale Analysis
Abstract:
Meta-analysis of genome-wide association studies (GWASs) has led to the discoveries of many common variants associated with complex human diseases. There is a growing recognition that identifying "causal" rare variants also requires large-scale meta-analysis. The fact that association tests with rare variants are performed at the gene level rather than at the variant level poses unprecedented challenges in the meta-analysis. First, different studies may adopt different gene-level tests, so the results are not compatible. Second, gene-level tests require multivariate statistics (i.e., components of the test statistic and their covariance matrix), which are difficult to obtain. To overcome these challenges, we propose to perform gene-level tests for rare variants by combining the results of single-variant analysis (i.e., p values of association tests and effect estimates) from participating studies. This simple strategy is possible because of an insight that multivariate statistics can be recovered from single-variant statistics, together with the correlation matrix of the single-variant test statistics, which can be estimated from one of the participating studies or from a publicly available database. We show both theoretically and numerically that the proposed meta-analysis approach provides accurate control of the type I error and is as powerful as joint analysis of individual participant data. This approach accommodates any disease phenotype and any study design and produces all commonly used gene-level tests. An application to the GWAS summary results of the Genetic Investigation of ANthropometric Traits (GIANT) consortium reveals rare and low-frequency variants associated with human height. The relevant software is freely available.
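The idea that a gene-level statistic can be rebuilt from single-variant results plus a correlation matrix can be illustrated with a burden-type test. The following is a minimal sketch under stated assumptions, not the authors' software: the function names `z_from_p` and `burden_from_summary`, the unit weights, and the example numbers are all hypothetical, and the correlation matrix is assumed to come from a reference panel or one participating study.

```python
import math
from statistics import NormalDist

def z_from_p(p_value, effect_estimate):
    """Recover a signed z-score from a two-sided p-value and the sign of the effect."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must lie in (0, 1]")
    magnitude = -NormalDist().inv_cdf(p_value / 2.0)
    return magnitude if effect_estimate >= 0 else -magnitude

def burden_from_summary(z_scores, corr, weights=None):
    """Burden-type gene-level z statistic from single-variant z-scores.

    corr is the correlation matrix of the single-variant test statistics
    (assumed here to be estimated externally). Returns the combined z
    statistic and its two-sided p-value.
    """
    m = len(z_scores)
    if weights is None:
        weights = [1.0] * m  # hypothetical default: equal weights
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    variance = sum(weights[i] * corr[i][j] * weights[j]
                   for i in range(m) for j in range(m))
    if variance <= 0.0:
        raise ValueError("weights and correlation matrix give non-positive variance")
    stat = numerator / math.sqrt(variance)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value

# Hypothetical single-variant results for three rare variants in one gene
z = [z_from_p(0.04, 0.8), z_from_p(0.20, 0.5), z_from_p(0.60, -0.1)]
r = [[1.0, 0.1, 0.0],
     [0.1, 1.0, 0.2],
     [0.0, 0.2, 1.0]]
gene_z, gene_p = burden_from_summary(z, r)
```

Other gene-level tests (for example variance-component tests) need the same two ingredients, which is why the correlation matrix is the key quantity to estimate.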
Abstract:
Brachial circumference (BC), also known as upper arm or mid arm circumference, can be used as an indicator of muscle mass and fat tissue, which are distributed differently in men and women. Analysis of anthropometric measures of peripheral fat distribution such as BC could help in understanding the complex pathophysiology behind overweight and obesity. The purpose of this study is to identify genetic variants associated with BC through a large-scale genome-wide association scan (GWAS) meta-analysis. We used fixed-effects meta-analysis to synthesise summary results across 14 GWAS discovery and 4 replication cohorts comprising overall 22,376 individuals (12,031 women and 10,345 men) of European ancestry. Individual analyses were carried out for men, for women, and combined across sexes using linear regression and an additive genetic model, under two adjustments: for age alone, and for age and BMI. We prioritised signals for follow-up in two stages. We did not detect any signals reaching genome-wide significance. The FTO rs9939609 SNP showed nominal evidence for association (p<0.05) in the age-adjusted strata for men and across both sexes. In this first GWAS meta-analysis for BC to date, we have not identified any genome-wide significant signals and do not observe robust association of previously established obesity loci with BC. Large-scale collaborations will be necessary to achieve higher power to detect loci underlying BC.
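For readers unfamiliar with fixed-effects meta-analysis, the usual inverse-variance weighted form can be sketched as follows. This is an illustrative sketch only: the function name `fixed_effects_meta` and the per-cohort numbers are hypothetical and are not taken from the study.

```python
import math

def fixed_effects_meta(estimates, std_errors):
    """Inverse-variance weighted fixed-effects pooling of per-cohort effect estimates."""
    if len(estimates) != len(std_errors) or not estimates:
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    # Two-sided p-value under a standard normal null distribution
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, z, p_value

# Hypothetical per-allele effects and standard errors from three cohorts
betas = [0.10, 0.05, 0.08]
ses = [0.05, 0.04, 0.10]
pooled, pooled_se, z, p_value = fixed_effects_meta(betas, ses)
```

Cohorts with smaller standard errors dominate the pooled estimate, which is why adding large cohorts is the main route to higher power.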
Abstract:
BACKGROUND: The degree of conservation of gene expression between homologous organs largely remains an open question. Several recent studies reported some evidence in favor of such conservation. Most studies compute organs' similarity across all orthologous genes, whereas the expression levels of many genes are not informative about organ specificity. RESULTS: Here, we use a modularization algorithm to overcome this limitation through the identification of inter-species co-modules of organs and genes. We identify such co-modules using mouse and human microarray expression data. They are functionally coherent both in terms of genes and of organs from both organisms. We show that a large proportion of genes belonging to the same co-module are orthologous between mouse and human. Moreover, their zebrafish orthologs also tend to be expressed in the corresponding homologous organs. Notable exceptions to the general pattern of conservation are the testis and the olfactory bulb. Interestingly, some co-modules consist of single organs, while others combine several functionally related organs. For instance, amygdala, cerebral cortex, hypothalamus and spinal cord form a clearly discernible unit of expression, both in mouse and human. CONCLUSIONS: Our study provides a new framework for comparative analysis which will also be applicable to other sets of large-scale phenotypic data collected across different species.
Abstract:
In this study, we report the first ever large-scale environmental validation of a microbial reporter-based test to measure arsenic concentrations in natural water resources. A bioluminescence-producing arsenic-inducible bacterium based on Escherichia coli was used as the reporter organism. Specific protocols were developed with the goal of avoiding the negative influence of iron in groundwater on arsenic availability to the bioreporter cells. A total of 194 groundwater samples were collected in the Red River and Mekong River Delta regions of Vietnam and were analyzed both by atomic absorption spectroscopy (AAS) and by the arsenic bioreporter protocol. The bacterial cells performed well at and above arsenic concentrations in groundwater of 7 µg/L, with an almost linearly proportional increase of the bioluminescence signal between 10 and 100 µg As/L (r² = 0.997). Comparisons between AAS and arsenic bioreporter determinations gave an overall average of 8.0% false negative and 2.4% false positive identifications for the bioreporter prediction at the WHO recommended acceptable arsenic concentration of 10 µg/L, which is far better than the performance of chemical field test kits. Because of the ease of the measurement protocol and the low application cost, the microbiological arsenic test has great potential in large screening campaigns in Asia and in other areas suffering from arsenic pollution in groundwater resources.
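The comparison against a reference method at a fixed threshold can be sketched as below. This is a hedged illustration: the function name `screening_error_rates` and the sample values are hypothetical, and the rates are expressed as a fraction of all samples, which is an assumption because the abstract does not state the denominator.

```python
def screening_error_rates(reference, bioreporter, threshold=10.0):
    """False negative and false positive fractions of a test against a reference.

    A sample counts as positive when its concentration exceeds the threshold
    (here 10 µg/L, the WHO guideline value quoted in the abstract).
    Both rates are fractions of all samples (an assumption of this sketch).
    """
    if len(reference) != len(bioreporter) or not reference:
        raise ValueError("need paired, non-empty measurement lists")
    pairs = list(zip(reference, bioreporter))
    false_neg = sum(1 for ref, bio in pairs if ref > threshold and bio <= threshold)
    false_pos = sum(1 for ref, bio in pairs if ref <= threshold and bio > threshold)
    n = len(pairs)
    return false_neg / n, false_pos / n

# Hypothetical paired measurements in µg As/L (reference method, bioreporter)
aas = [5.0, 20.0, 15.0, 8.0]
bio = [12.0, 18.0, 7.0, 6.0]
fn_rate, fp_rate = screening_error_rates(aas, bio)
```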
Abstract:
Many of the most interesting questions ecologists ask lead to analyses of spatial data. Yet, perhaps confused by the large number of statistical models and fitting methods available, many ecologists seem to believe this is best left to specialists. Here, we describe the issues that need consideration when analysing spatial data and illustrate these using simulation studies. Our comparative analysis involves using methods including generalized least squares, spatial filters, wavelet revised models, conditional autoregressive models and generalized additive mixed models to estimate regression coefficients from synthetic but realistic data sets, including some which violate standard regression assumptions. We assess the performance of each method using two measures and using statistical error rates for model selection. Methods that performed well included the generalized least squares family of models and a Bayesian implementation of the conditional autoregressive model. Ordinary least squares also performed adequately in the absence of model selection, but had poorly controlled Type I error rates and so did not show the improvements in performance under model selection when using the above methods. Removing large-scale spatial trends in the response led to poor performance. These are empirical results; hence extrapolation of these findings to other situations should be performed cautiously. Nevertheless, our simulation-based approach provides much stronger evidence for comparative analysis than assessments based on single or small numbers of data sets, and should be considered a necessary foundation for statements of this type in future.
Abstract:
Based on the case of reforms aimed at integrating the provision of income protection and employment services for jobless people in Europe, this thesis seeks to understand the reasons which may prompt governments to engage in large-scale organisational reforms. Over the last 20 years, several European countries have indeed radically redesigned the organisational structure of their welfare state by merging or bundling existing front-line offices in charge of benefit payment and employment services together into 'one-stop' agencies. Whereas in academic and political debates, these reforms are generally presented as a necessary and rational response to the problems and inconsistencies induced by fragmentation in a context of the reorientation of welfare states towards labour market activation, this thesis shows that the agenda setting of these reforms is in fact the result of multidimensional political dynamics. More specifically, the main argument of this thesis is that these reforms are best understood not so much from the problems induced by organisational compartmentalism, whose political recognition is often controversial, but from the various goals that governments may simultaneously achieve by means of their adoption. This argument is tested by comparing agenda-setting processes of large-scale reforms of coordination in the United Kingdom (Jobcentre Plus), Germany (Hartz IV reform) and Denmark (2005 Jobcentre reform), and contrasting them with the Swiss case where the government has so far rejected any coordination initiative involving organisational redesign.
This comparison brings to light the importance, for the rise of organisational reforms, of the possibility of coupling them with the following three goals: first, goals related to the strengthening of activation policies; second, institutional goals seeking to redefine the balance of responsibilities between the central state and non-state actors; and finally, electoral goals for governments eager to maintain political credibility. The decisive role of electoral goals in the three countries suggests that these reforms are less bound by partisan politics than by the particular pressures facing governments that have arrived in office after long periods in opposition.
Abstract:
During my PhD, my aim was to provide new tools to increase our capacity to analyse gene expression patterns, and to study on a large-scale basis the evolution of gene expression in animals. Gene expression patterns (when and where a gene is expressed) are a key feature in understanding gene function, notably in development. It appears clear now that the evolution of developmental processes and of phenotypes is shaped both by evolution at the coding sequence level, and at the gene expression level. Studying gene expression evolution in animals, with complex expression patterns over tissues and developmental time, is still challenging. No tools are available to routinely compare expression patterns between different species, with precision, and on a large-scale basis. Studies on gene expression evolution are therefore performed only on small gene datasets, or using imprecise descriptions of expression patterns. The aim of my PhD was thus to develop and use novel bioinformatics resources, to study the evolution of gene expression. To this end, I developed the database Bgee (Base for Gene Expression Evolution). The approach of Bgee is to transform heterogeneous expression data (ESTs, microarrays, and in-situ hybridizations) into present/absent calls, and to annotate them to standard representations of anatomy and development of different species (anatomical ontologies). An extensive mapping between anatomies of species is then developed based on hypotheses of homology. These precise annotations to anatomies, and this extensive mapping between species, are the major assets of Bgee, and have required the involvement of many co-workers over the years. My main personal contribution is the development and the management of both the Bgee database and the web-application. Bgee is now on its ninth release, and includes an important gene expression dataset for 5 species (human, mouse, drosophila, zebrafish, Xenopus), with the most data from mouse, human and zebrafish.
Using these three species, I have conducted an analysis of gene expression evolution after duplication in vertebrates. Gene duplication is thought to be a major source of novelty in evolution, and to contribute to speciation. It has been suggested that the evolution of gene expression patterns might participate in the retention of duplicate genes. I performed a large-scale comparison of expression patterns of hundreds of duplicated genes to their singleton ortholog in an outgroup, including both small and large-scale duplicates, in three vertebrate species (human, mouse and zebrafish), and using highly accurate descriptions of expression patterns. My results showed unexpectedly high rates of de novo acquisition of expression domains after duplication (neofunctionalization), at least as high as rates of partitioning of expression domains (subfunctionalization). I found differences in the evolution of expression of small- and large-scale duplicates, with small-scale duplicates more prone to neofunctionalization. Duplicates with neofunctionalization seemed to evolve under more relaxed selective pressure on the coding sequence. Finally, even with abundant and precise expression data, the majority fate I recovered was neither neo- nor subfunctionalization of expression domains, suggesting a major role for other mechanisms in duplicate gene retention.
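The distinction between the two fates can be made concrete with present/absent calls treated as sets of anatomical structures. The sketch below is a deliberately simplified illustration and not the classification procedure used in the thesis: the function name `classify_duplicate_fate`, the decision rules, and the example structures are all hypothetical, and the singleton ortholog in the outgroup is assumed to stand in for the ancestral expression pattern.

```python
def classify_duplicate_fate(outgroup, copy_a, copy_b):
    """Classify a duplicate pair from sets of structures with a 'present' call.

    Simplified rules (assumptions of this sketch):
    - neofunctionalization: a duplicate is expressed in a structure where the
      outgroup singleton is not (de novo acquisition of an expression domain);
    - subfunctionalization: each duplicate has lost some ancestral domains and
      the two copies together still cover the ancestral pattern (partitioning);
    - other: anything else.
    """
    outgroup, copy_a, copy_b = frozenset(outgroup), frozenset(copy_a), frozenset(copy_b)
    gained = (copy_a | copy_b) - outgroup
    if gained:
        return "neofunctionalization"
    each_lost_some = copy_a < outgroup and copy_b < outgroup  # proper subsets
    complementary = (copy_a | copy_b) == outgroup
    if each_lost_some and complementary:
        return "subfunctionalization"
    return "other"

# Hypothetical example: ancestral expression in brain and liver
fate = classify_duplicate_fate({"brain", "liver"}, {"brain"}, {"liver"})
```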
Abstract:
The focus of my PhD research was the concept of modularity. In the last 15 years, modularity has become a classic term in different fields of biology. On the conceptual level, a module is a set of interacting elements that remain mostly independent from the elements outside of the module. I used modular analysis techniques to study gene expression evolution in vertebrates. In particular, I identified "natural" modules of gene expression in mouse and human, and I showed that expression of organ-specific and system-specific genes tends to be conserved between vertebrates as distant as mammals and fishes. Also with a modular approach, I studied patterns of developmental constraints on transcriptome evolution. I showed that neither of the two commonly accepted models of the evolution of embryonic development ("evo-devo") is exclusively valid. In particular, I found that the conservation of the sequences of regulatory regions is highest during mid-development of zebrafish, and thus it supports the "hourglass model". In contrast, events of gene duplication and new gene introduction are most rare in early development, which supports the "early conservation model". In addition to the biological insights on transcriptome evolution, I have also discussed in detail the advantages of modular approaches in large-scale data analysis. Moreover, I re-analyzed several studies (published in high-ranking journals), and showed that their conclusions do not hold up under a detailed analysis. This demonstrates that complex analysis of high-throughput data requires cooperation between biologists, bioinformaticians, and statisticians.
Abstract:
PURPOSE: The aim of this study was to develop models based on kernel regression and probability estimation in order to predict and map indoor radon concentration (IRC) in Switzerland by taking into account all of the following: architectural factors, spatial relationships between the measurements, as well as geological information. METHODS: We looked at about 240,000 IRC measurements carried out in about 150,000 houses. As predictor variables we included building type, foundation type, year of construction, detector type, geographical coordinates, altitude, temperature and lithology in the kernel estimation models. We developed predictive maps as well as a map of the local probability to exceed 300 Bq/m³. Additionally, we developed a map of a confidence index in order to estimate the reliability of the probability map. RESULTS: Our models were able to explain 28% of the variations of IRC data. All variables added information to the model. The model estimation revealed a bandwidth for each variable, making it possible to characterize the influence of each variable on the IRC estimation. Furthermore, we assessed the mapping characteristics of kernel estimation overall as well as by municipality. Overall, our model reproduces spatial IRC patterns which were already obtained earlier. On the municipal level, we could show that our model accounts well for IRC trends within municipal boundaries. Finally, we found that different building characteristics result in different IRC maps. Maps corresponding to detached houses with concrete foundations indicate systematically smaller IRC than maps corresponding to farms with earth foundation. CONCLUSIONS: IRC mapping based on kernel estimation is a powerful tool to predict and analyze IRC on a large scale as well as at a local level. This approach makes it possible to develop tailor-made maps for different architectural elements and measurement conditions and to account at the same time for geological information and spatial relations between IRC measurements.
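Kernel regression with one bandwidth per predictor, and the related local exceedance probability, can be sketched as follows. This is a generic Nadaraya-Watson estimator with a Gaussian product kernel, offered as an assumption about the general technique rather than the authors' model; the function names, bandwidths, and data are hypothetical, and only numeric predictors are handled.

```python
import math

def kernel_weights(query, points, bandwidths):
    """Gaussian product-kernel weights, one bandwidth per predictor variable."""
    weights = []
    for point in points:
        scaled_sq_dist = sum(((q - x) / h) ** 2
                             for q, x, h in zip(query, point, bandwidths))
        weights.append(math.exp(-0.5 * scaled_sq_dist))
    return weights

def kernel_regression(query, points, values, bandwidths):
    """Nadaraya-Watson estimate: kernel-weighted mean of the observed values."""
    weights = kernel_weights(query, points, bandwidths)
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query is too far from all observations for these bandwidths")
    return sum(w * v for w, v in zip(weights, values)) / total

def exceedance_probability(query, points, values, bandwidths, threshold=300.0):
    """Local probability of exceeding the threshold (default 300 Bq/m³)."""
    indicators = [1.0 if v > threshold else 0.0 for v in values]
    return kernel_regression(query, points, indicators, bandwidths)

# Hypothetical measurements: (x km, y km, altitude m) -> concentration in Bq/m³
sites = [(0.0, 0.0, 400.0), (1.0, 0.5, 450.0), (5.0, 4.0, 900.0)]
radon = [80.0, 120.0, 450.0]
h = (2.0, 2.0, 200.0)  # hypothetical bandwidths, one per variable
estimate = kernel_regression((0.5, 0.2, 420.0), sites, radon, h)
prob = exceedance_probability((0.5, 0.2, 420.0), sites, radon, h)
```

A large bandwidth for a variable means the estimate barely changes along it, which is how fitted bandwidths can be read as a measure of each variable's influence.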
Abstract:
Background: Biological rhythmicity has been extensively studied in animals for many decades. Although temporal patterns of physical activity have been identified in humans, no large-scale, multi-national study has been published, and no comparison has been attempted of the ubiquity of activity rhythms at different time scales (such as daily, weekly, monthly, and annual). Methods: Using individually worn actigraphy devices, physical activity of 2,328 individuals from five different countries (adults of African descent from Ghana, South Africa, Jamaica, Seychelles, and the United States) was measured for seven consecutive days at different times of the year. Results: Analysis for rhythmic patterns identified daily rhythmicity of physical activity in all five of the represented nationalities. Weekly rhythmicity was found in some, but not all, of the nationalities. No significant evidence of lunar rhythmicity or seasonal rhythmicity was found in any of the groups. Conclusions: These findings extend previous small-scale observations of daily rhythmicity to a large cohort of individuals from around the world. The findings also confirm the existence of modest weekly rhythmicity but not lunar or seasonal rhythmicity in human activity. These differences in rhythm strength have implications for the management of health hazards of rhythm misalignment. Key Messages: Analysis of the pattern of physical activity of 2,328 individuals from five countries revealed strong daily rhythmicity in all five countries, moderate weekly rhythmicity in some countries, and no lunar rhythmicity or seasonal rhythmicity in any of the countries.
Abstract:
The algorithmic approach to data modelling has developed rapidly in recent years; in particular, methods based on data mining and machine learning have been used in a growing number of applications. These methods follow a data-driven methodology, aiming at providing the best possible generalization and predictive abilities instead of concentrating on the properties of the data model. One of the most successful groups of such methods is known as Support Vector algorithms. Following the fruitful developments in applying Support Vector algorithms to spatial data, this paper introduces a new extension of the traditional support vector regression (SVR) algorithm. This extension allows for the simultaneous modelling of environmental data at several spatial scales. The joint influence of environmental processes presenting different patterns at different scales is here learned automatically from data, providing the optimum mixture of short- and large-scale models. The method is adaptive to the spatial scale of the data. With this advantage, it can provide efficient means to model local anomalies that may typically arise in the early phase of an environmental emergency. However, the proposed approach still requires some prior knowledge of the possible existence of such short-scale patterns. This is a possible limitation of the method for its implementation in early warning systems. The purpose of this paper is to present the multi-scale SVR model and to illustrate its use with an application to the mapping of Cs-137 activity given the measurements taken in the region of Briansk following the Chernobyl accident.
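One common way to let a kernel method see several spatial scales at once is to use a weighted sum of Gaussian kernels with different widths. The sketch below illustrates only that building block and is an assumption, not the paper's algorithm: the function name `multiscale_kernel`, the two scales, and the fixed mixing weight are hypothetical, whereas the paper describes the mixture as learned from data.

```python
import math

def multiscale_kernel(x, y, short_scale, large_scale, mix=0.5):
    """Weighted sum of two Gaussian kernels with different spatial scales.

    mix is the weight of the short-scale component (hypothetical fixed value;
    a full multi-scale model would estimate it from the data). A non-negative
    combination of Gaussian kernels is itself a valid kernel.
    """
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must lie in [0, 1]")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    short = math.exp(-sq_dist / (2.0 * short_scale ** 2))
    large = math.exp(-sq_dist / (2.0 * large_scale ** 2))
    return mix * short + (1.0 - mix) * large

# Two hypothetical sampling locations 5 km apart (coordinates in km)
k_near = multiscale_kernel((0.0, 0.0), (0.0, 0.0), short_scale=1.0, large_scale=50.0)
k_far = multiscale_kernel((0.0, 0.0), (3.0, 4.0), short_scale=1.0, large_scale=50.0)
```

At 5 km the short-scale component has almost vanished while the large-scale component is still close to one, so nearby points drive local anomalies and distant points shape the regional trend.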
Abstract:
T-cell vaccination may prevent or treat cancer and infectious diseases, but further progress is required to increase clinical efficacy. Step-by-step improvements of T-cell vaccination in phase I/II clinical studies combined with very detailed analysis of T-cell responses at the single cell level are the strategy of choice for the identification of the most promising vaccine candidates for testing in subsequent large-scale phase III clinical trials. Major aims are to fully identify the most efficient T-cells in anticancer therapy, to characterize their TCRs, and to pinpoint the mechanisms of T-cell recruitment and function in well-defined clinical situations. Here we discuss novel strategies for the assessment of human T-cell responses, revealing in part unprecedented insight into T-cell biology and novel structural principles that govern TCR-pMHC recognition. Together, the described approaches advance our knowledge of T-cell-mediated protection from human diseases.
Abstract:
Microarray transcript profiling and RNA interference are two new technologies crucial for large-scale gene function studies in multicellular eukaryotes. Both rely on sequence-specific hybridization between complementary nucleic acid strands, prompting us to create a collection of gene-specific sequence tags (GSTs) that represent at least 21,500 Arabidopsis genes and are compatible with both approaches. The GSTs were carefully selected to ensure that each of them shared no significant similarity with any other region in the Arabidopsis genome. They were synthesized by PCR amplification from genomic DNA. Spotted microarrays fabricated from the GSTs show good dynamic range, specificity, and sensitivity in transcript profiling experiments. The GSTs have also been transferred to bacterial plasmid vectors via recombinational cloning protocols. These cloned GSTs constitute the ideal starting point for a variety of functional approaches, including reverse genetics. We have subcloned GSTs on a large scale into vectors designed for gene silencing in plant cells. We show that in planta expression of GST hairpin RNA results in the expected phenotypes in silenced Arabidopsis lines. These versatile GST resources provide novel and powerful tools for functional genomics.
Abstract:
1. Landscape modification is often considered the principal cause of population decline in many bat species. Thus, schemes for bat conservation rely heavily on knowledge about species-landscape relationships. So far, however, few studies have quantified the possible influence of landscape structure on large-scale spatial patterns in bat communities. 2. This study presents quantitative models that use landscape structure to predict (i) spatial patterns in overall community composition and (ii) individual species' distributions through canonical correspondence analysis and generalized linear models, respectively. A geographical information system (GIS) was then used to draw up maps of (i) overall community patterns and (ii) distribution of potential species' habitats. These models relied on field data from the Swiss Jura mountains. 3. Eight descriptors of landscape structure accounted for 30% of the variation in bat community composition. For some species, more than 60% of the variance in distribution could be explained by landscape structure. Elevation, forest or woodland cover, lakes and suburbs were the most frequent predictors. 4. This study shows that community composition in bats is related to landscape structure through species-specific relationships to resources. Because bats are nocturnal and difficult to identify remotely, a comprehensive bat census is rarely possible, and we suggest that predictive modelling of the type described here provides an indispensable conservation tool.