881 results for Large-scale analysis
Abstract:
MOTIVATION: Supporting the functionality of recent duplicate gene copies is usually difficult, owing to high sequence similarity between duplicate counterparts and shallow phylogenies, which hamper both the statistical and experimental inference. RESULTS: We developed an integrated evolutionary approach to identify functional duplicate gene copies and other lineage-specific genes. By repeatedly simulating neutral evolution, our method estimates the probability that an ORF was selectively conserved and is therefore likely to represent a bona fide coding region. In parallel, our method tests whether the accumulation of non-synonymous substitutions reveals signatures of selective constraint. We show that our approach has high power to identify functional lineage-specific genes using simulated and real data. For example, a coding region of average length (approximately 1400 bp), restricted to hominoids, can be predicted to be functional in approximately 94-100% of cases. Notably, the method may support functionality for instances where classical selection tests based on the ratio of non-synonymous to synonymous substitutions fail to reveal signatures of selection. Our method is available as an automated tool, ReEVOLVER, which will also be useful to systematically detect functional lineage-specific genes of closely related species on a large scale. AVAILABILITY: ReEVOLVER is available at http://www.unil.ch/cig/page7858.html.
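The idea of repeatedly simulating neutral evolution to ask how often an ORF would survive by chance can be illustrated with a small Monte Carlo sketch. This is not ReEVOLVER's implementation: the equal-rate substitution model, the per-site substitution probability, and all function names below are assumptions made only for illustration.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}
BASES = "ACGT"


def has_premature_stop(seq):
    """True if any in-frame codon before the final codon is a stop codon."""
    return any(seq[i:i + 3] in STOP_CODONS for i in range(0, len(seq) - 3, 3))


def mutate_neutrally(seq, subs_per_site, rng):
    """Substitute each site with probability subs_per_site.

    Assumption: all base changes are equally likely and no selection acts.
    """
    mutated = list(seq)
    for i, base in enumerate(mutated):
        if rng.random() < subs_per_site:
            mutated[i] = rng.choice([b for b in BASES if b != base])
    return "".join(mutated)


def prob_orf_intact(seq, subs_per_site, n_sim=1000, seed=0):
    """Fraction of neutral simulations in which the ORF keeps no premature stop."""
    rng = random.Random(seed)
    intact = sum(
        not has_premature_stop(mutate_neutrally(seq, subs_per_site, rng))
        for _ in range(n_sim)
    )
    return intact / n_sim
```

A low value of `prob_orf_intact` for an ORF that is nevertheless observed intact would suggest that neutral evolution alone is an unlikely explanation, which is the kind of evidence the abstract describes.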
Abstract:
Raised blood pressure (BP) is a major risk factor for cardiovascular disease. Previous studies have identified 47 distinct genetic variants robustly associated with BP, but collectively these explain only a few percent of the heritability for BP phenotypes. To find additional BP loci, we used a bespoke gene-centric array to genotype an independent discovery sample of 25,118 individuals that combined hypertensive case-control and general population samples. We followed up four SNPs associated with BP at our p < 8.56 × 10^-7 study-specific significance threshold and six suggestively associated SNPs in a further 59,349 individuals. We identified and replicated a SNP at LSP1/TNNT3, a SNP at MTHFR-NPPB independent (r^2 = 0.33) of previous reports, and replicated SNPs at AGT and ATP2B1 reported previously. An analysis of combined discovery and follow-up data identified SNPs significantly associated with BP at p < 8.56 × 10^-7 at four further loci (NPR3, HFE, NOS3, and SOX6). The high number of discoveries made with modest genotyping effort can be attributed to using a large-scale yet targeted genotyping array and to the development of a weighting scheme that maximized power when meta-analyzing results from samples ascertained with extreme phenotypes, in combination with results from nonascertained or population samples. Chromatin immunoprecipitation and transcript expression data highlight potential gene regulatory mechanisms at the MTHFR and NOS3 loci. These results provide candidates for further study to help dissect mechanisms affecting BP and highlight the utility of studying SNPs and samples that are independent of those studied previously even when the sample size is smaller than that in previous studies.
Abstract:
BACKGROUND: The Complete Arabidopsis Transcript MicroArray (CATMA) initiative combines the efforts of laboratories in eight European countries to deliver gene-specific sequence tags (GSTs) for the Arabidopsis research community. The CATMA initiative offers the power and flexibility to regularly update the GST collection according to evolving knowledge about the gene repertoire. These GST amplicons can easily be reamplified and shared, subsets can be picked at will to print dedicated arrays, and the GSTs can be cloned and used for other functional studies. This ongoing initiative has already produced approximately 24,000 GSTs that have been made publicly available for spotted microarray printing and RNA interference. RESULTS: GSTs from the CATMA version 2 repertoire (CATMAv2, created in 2002) were mapped onto the gene models from two independent Arabidopsis nuclear genome annotation efforts, TIGR5 and PSB-EuGène, to consolidate a list of genes that were targeted by previously designed CATMA tags. A total of 9,027 gene models were not tagged by any amplified CATMAv2 GST, and 2,533 amplified GSTs were no longer predicted to tag an updated gene model. To validate the efficacy of GST mapping criteria and design rules, the predicted and experimentally observed hybridization characteristics associated with GST features were correlated in transcript profiling datasets obtained with the CATMAv2 microarray, confirming the reliability of this platform. To complete the CATMA repertoire, all 9,027 gene models for which no GST had yet been designed were processed with an adjusted version of the Specific Primer and Amplicon Design Software (SPADS). A total of 5,756 novel GSTs were designed and amplified by PCR from genomic DNA. Together with the pre-existing GST collection, this new addition constitutes the CATMAv3 repertoire.
It comprises 30,343 unique amplified sequences that tag 24,202 and 23,009 protein-encoding nuclear gene models in the TAIR6 and EuGène genome annotations, respectively. To cover the remaining untagged genes, we identified 543 additional GSTs using less stringent design criteria and designed 990 sequence tags matching multiple members of gene families (Gene Family Tags or GFTs). These latter 1,533 features constitute the CATMAv4 addition. CONCLUSION: To update the CATMA GST repertoire, we designed 7,289 additional sequence tags, bringing the total number of tagged TAIR6-annotated Arabidopsis nuclear protein-coding genes to 26,173. This resource is used both for the production of spotted microarrays and the large-scale cloning of hairpin RNA silencing vectors. All information about the resulting updated CATMA repertoire is available through the CATMA database at http://www.catma.org.
Abstract:
Androgenetic alopecia (AGA) is a highly heritable condition and the most common form of hair loss in humans. Susceptibility loci have been described on the X chromosome and chromosome 20, but these loci explain a minority of its heritable variance. We conducted a large-scale meta-analysis of seven genome-wide association studies for early-onset AGA in 12,806 individuals of European ancestry. While replicating the two AGA loci on the X chromosome and chromosome 20, six novel susceptibility loci reached genome-wide significance (p = 2.62×10^-9 to 1.01×10^-12). Unexpectedly, we identified a risk allele at 17q21.31 that was recently associated with Parkinson's disease (PD) at a genome-wide significant level. We then tested the association between early-onset AGA and the risk of PD in a cross-sectional analysis of 568 PD cases and 7,664 controls. Early-onset AGA cases had significantly increased odds of subsequent PD (OR = 1.28, 95% confidence interval: 1.06-1.55, p = 8.9×10^-3). Further, the AGA susceptibility alleles at the 17q21.31 locus are on the H1 haplotype, which is under negative selection in Europeans and has been linked to decreased fertility. Combining the risk alleles of six novel and two established susceptibility loci, we created a genotype risk score and tested its association with AGA in an additional sample. Individuals in the highest risk quartile of a genotype score had an approximately six-fold increased risk of early-onset AGA [odds ratio (OR) = 5.78, p = 1.4×10^-88]. Our results highlight unexpected associations between early-onset AGA, Parkinson's disease, and decreased fertility, providing important insights into the pathophysiology of these conditions.
Abstract:
Red wood ants (Formica rufa group) constitute a group of species that are considered to be among the most promising bioindicators in forest ecosystems. However, because of their morphological similarity and intraspecific variability, morphological species identification can be difficult. Considerable expertise is necessary to discriminate between the sibling species F. lugubris and F. paralugubris, two species that often live in sympatry in the same Alpine forests. New taxonomic tools providing rapid and reliable species identification are needed. We present a simple and reliable molecular technique based on mtDNA (COI gene) and a restriction enzyme for discriminating between F. lugubris and F. paralugubris. We confirm the validity of this method with a Bayesian analysis based on microsatellites. This new molecular tool represents a clear breakthrough for discriminating between F. lugubris and F. paralugubris and is likely to be helpful in large-scale biomonitoring.
Abstract:
A first assessment of debris flow susceptibility at a large scale was performed along the National Road N7, Argentina. Numerous catchments are prone to debris flows and likely to endanger road users. A 1:50,000 susceptibility map was created. The use of a DEM (30 m grid) together with three complementary criteria (slope, contributing area, curvature) allowed the identification of potential source areas. Debris flow spreading was estimated using a process- and GIS-based model (Flow-R) that relies on basic probabilistic and energy calculations. The best-fit values for the coefficient of friction and the mass-to-drag ratio of the PCM model were found to be μ = 0.02 and M/D = 180, and the resulting propagation on one of the calibration sites was validated using the Coulomb friction model. The results are realistic and will be useful to determine which areas need to be prioritized for detailed studies.
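To show how the two calibrated parameters enter the energy calculation, the sketch below applies the commonly cited two-parameter PCM (Perla-Cheng-McClung) velocity update along a path of straight segments. It is only an illustration under stated assumptions: the function names, the segment representation, and the stopping rule are hypothetical and are not taken from Flow-R itself.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def pcm_velocity(v_prev, slope_deg, seg_length, mu=0.02, mass_to_drag=180.0):
    """Velocity (m/s) at the end of one path segment under the PCM model.

    v_prev: velocity at the start of the segment (m/s)
    slope_deg: segment slope angle in degrees
    seg_length: segment length along the slope (m)
    mu, mass_to_drag: friction coefficient and M/D ratio (defaults from the abstract)
    """
    theta = math.radians(slope_deg)
    a = G * (math.sin(theta) - mu * math.cos(theta))
    b = -2.0 * seg_length / mass_to_drag
    v_sq = a * mass_to_drag * (1.0 - math.exp(b)) + v_prev ** 2 * math.exp(b)
    # A non-positive value means the flow has run out of energy on this segment.
    return math.sqrt(v_sq) if v_sq > 0 else 0.0


def velocities_along_path(segments, v_start=0.0, mu=0.02, mass_to_drag=180.0):
    """Propagate velocity over (slope_deg, seg_length) segments until the flow stops."""
    velocities = []
    v = v_start
    for slope_deg, seg_length in segments:
        v = pcm_velocity(v, slope_deg, seg_length, mu, mass_to_drag)
        velocities.append(v)
        if v == 0.0:
            break
    return velocities
```

With μ as low as 0.02, deceleration on gentle slopes is slow, so in this formulation the M/D term mostly caps the maximum velocity while the run-out distance is governed by the friction term.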
Abstract:
Nanoparticles <100 nanometres are being introduced into industrial processes, but they are suspected to cause similar negative health effects to ambient particles. Poor knowledge about the scale of introduction has not allowed global risk analysis until now. In 2006 a targeted telephone survey among Swiss companies (1) showed the usage of nanoparticles in a few selected companies but did not provide data to extrapolate to the full Swiss workforce. The purpose of the study presented here was to provide a quantitative estimate of the potential occupational exposure to nanoparticles in Swiss industry. Method: A layered representative questionnaire survey among 1626 Swiss companies of the production sector was conducted in 2007. The survey was a written questionnaire, collecting data about the used nanoparticles, the number of potentially exposed persons in the companies and their protection strategy. Results: The response rate of the study was 58.3%. The number of companies estimated to be using nanoparticles in Switzerland was 586 (95% Confidence Interval 145 to 1027). It is estimated that 1309 workers (95% CI 1073 to 1545) do their job in the same room as a nanoparticle application. Personal protection was shown to be the predominant protection means. Such information is valuable for risk evaluation. The low number of companies dealing with nanoparticles in Switzerland suggests that policy makers as well as health, safety and environmental officers within companies can focus their efforts on a relatively small number of companies or workers. The collected data about types of particles and applications may be used for research on prevention strategies and adapted protection means. However, to reflect the most recent trends, the information presented here has to be continuously updated, and a large-scale inventory of the usage should be considered.
Abstract:
Aim The spotted knapweed (Centaurea stoebe), a plant native to south-east and central Europe, is highly invasive in North America. We investigated the spatio-temporal climatic niche dynamics of the spotted knapweed in North America along two putative eastern and western invasion routes. We then considered the patterns observed in the light of historical, ecological and evolutionary factors. Location Europe and North America. Methods The niche characteristics of the east and west invasive populations of spotted knapweed in North America were determined from documented occurrences over 120 consecutive years (1890-2010). The 2.5 and 97.5 percentiles of values along temperature and precipitation gradients, as given by the first two axes of a principal component analysis (PCA), were then calculated. We additionally measured the climatic dissimilarity between invaded and native niches using a multivariate environmental similarity surface (MESS) analysis. Results Along both invasion routes, the species established in regions with climatic conditions that were similar to those in the native range in Europe. An initial spread in ruderal habitats always preceded spread in (semi-)natural habitats. In the east, the niche gradually increased over time until it reached limits similar to the native niche. Conversely, in the west the niche abruptly expanded after an extended time lag into climates not occupied in the native range; only the native cold niche limit was conserved. Main conclusions Our study reveals that different niche dynamics have taken place during the eastern and western invasions. This pattern indicates different combinations of historical, ecological and evolutionary factors in the two ranges. We hypothesize that the lack of a well-developed transportation network in the west at the time of the introduction of spotted knapweed confined the species to a geographically and climatically isolated region.
The invasion of dry rangelands may have been favoured during the agricultural transition in the 1930s by release from natural enemies, local adaptation and less competitive vegetation, but further experimental and molecular studies are needed to explain these contrasting niche patterns fully. Our study illustrates the need and benefit of applying large-scale, temporally explicit approaches to understanding biological invasions.
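The niche limits described in the Methods (2.5 and 97.5 percentiles along a PCA axis, tracked through time) can be sketched as follows. The sketch assumes that each occurrence has already been given a score on one PCA axis, and it assumes that limits are computed cumulatively up to each cut-off year; the abstract does not state how occurrences were pooled over time, and all names below are hypothetical.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q between 0 and 100) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def niche_limits_over_time(records, cutoffs):
    """Lower and upper niche limits for all occurrences recorded up to each cut-off year.

    records: iterable of (year, pca_score) pairs, one per documented occurrence
    cutoffs: years at which the cumulative niche limits are evaluated
    """
    records = list(records)
    limits = {}
    for cutoff in cutoffs:
        scores = [score for year, score in records if year <= cutoff]
        if scores:  # skip cut-offs that precede the first occurrence
            limits[cutoff] = (percentile(scores, 2.5), percentile(scores, 97.5))
    return limits
```

Comparing the limits returned for successive cut-off years against the same percentiles computed for the native range would show whether the invaded niche approaches, or moves beyond, the native limits.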
Abstract:
Study carried out during a stay at the Laboratoire d'études sur les monothéismes (UMR 8584, Centre national de la recherche scientifique / École pratique des hautes études / Université Paris IV-Sorbonne), France, between 2010 and 2011. Analysis of the structural crisis that affected the Gallic church between the last quarter of the fourth century and the first quarter of the sixth, a crisis caused by the large-scale Christianization of the Gallo-Roman aristocratic elites and by this class's claim to carry its economic and social pre-eminence over into the institutional hierarchy of the Church. This process entailed the emergence of certain interpretations of the "Christian existential fact" that sought to legitimize, on the theoretical plane, the senatorial nobility's takeover of the Christian communities. With regard to this last point, particular importance has been given to the so-called "Semipelagian controversy" in Provence, with special emphasis on two points: (a) the relationship between opposition to the Augustinian theology of grace in certain Provençal monastic circles (Marseille, Lérins) and the emergence in these milieus of an autobiographical literature in which reflection on the concepts of divine uocatio and conuersio to Christian asceticism is closely tied to a theoretical effort to redefine and reorient the aristocratic ethos; and (b) the relationship between the theological points debated in this controversy and the ecclesiological conceptions of the thinkers who took part in it, where ecclesiology is understood as the theoretical definition of the limits and foundations of the "Christian community", with particular attention in this case to views on the role the aristocrat was to play in these new "cross-cutting" communities.
This two-year project has shown that no "Semipelagian theology" existed, given the antagonistic ecclesiological conceptions of the authors traditionally associated with this current of thought: Cassian understands the Christian community as an ascetic elite in which "lay" criteria of social stratification are suspended, and he rejects, in theory and in practice, the idea that this elite should assume the leadership of the community of lay faithful; among the authors of the Lérins circle, by contrast, opposition to the Augustinian theology of grace is inspired by the effort to import monastic ideals into the whole Christian community, which was also a means of legitimizing the authority of the monk-bishops of aristocratic origin who emerged from the monastery of Lérins.
Abstract:
The Computational Biophysics Group at the Universitat Pompeu Fabra (GRIB-UPF) hosts two unique computational resources dedicated to the execution of large-scale molecular dynamics (MD) simulations: (a) the ACMD molecular-dynamics software, used on standard personal computers with graphical processing units (GPUs); and (b) the GPUGRID.net computing network, supported by users distributed worldwide who volunteer GPUs for biomedical research. We leveraged these resources and developed studies, protocols and open-source software to elucidate energetics and pathways of a number of biomolecular systems, with a special focus on flexible proteins with many degrees of freedom. First, we characterized ion permeation through the bactericidal model protein Gramicidin A, conducting one of the largest studies to date with the steered MD biasing methodology. Next, we addressed an open problem in structural biology, the determination of drug-protein association kinetics; we reconstructed the binding free energy and the association and dissociation rates of a drug-like model system through a spatial decomposition and a Markov-chain analysis. The work was published in the Proceedings of the National Academy of Sciences and became one of the few landmark papers elucidating a ligand-binding pathway. Furthermore, we investigated the unstructured Kinase Inducible Domain (KID), a 28-peptide central to signalling and transcriptional response; the kinetics of this challenging system were modelled with a Markovian approach in collaboration with Frank Noe’s group at the Freie Universität Berlin. The impact of the funding includes three peer-reviewed publications in high-impact journals; three more papers under review; four MD analysis components, released as open-source software; MD protocols; didactic material; and code for the hosting group.
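The core step of a Markov-chain analysis of this kind, estimating transition probabilities between discretized conformational states from a trajectory, can be sketched in a few lines. This is a generic illustration, not the group's released software: the state discretization is assumed to exist already, and the function name and lag-time parameter are hypothetical.

```python
from collections import Counter


def transition_matrix(trajectory, n_states, lag=1):
    """Row-normalized transition probabilities between discrete states.

    trajectory: sequence of integer state labels in 0..n_states-1, one per frame
    lag: number of frames between the start and end of each counted transition
    """
    counts = Counter(zip(trajectory[:-lag], trajectory[lag:]))
    matrix = []
    for i in range(n_states):
        row_total = sum(counts[(i, j)] for j in range(n_states))
        matrix.append([
            counts[(i, j)] / row_total if row_total else 0.0
            for j in range(n_states)
        ])
    return matrix
```

For example, with state 0 as "unbound" and state 1 as "bound", the off-diagonal entries of the matrix give per-lag-time probabilities of binding and unbinding, from which association and dissociation rates could then be estimated.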
Abstract:
Many terrestrial and marine systems are experiencing accelerating decline due to the effects of global change. This situation has raised concern about the consequences of biodiversity losses for ecosystem function, ecosystem service provision, and human well-being. Coastal marine habitats are a main focus of attention because they harbour a high biological diversity, are among the most productive systems of the world and present high anthropogenic interaction levels. The accelerating degradation of many terrestrial and marine systems highlights the urgent need to evaluate the consequences of biodiversity loss. Because marine biodiversity is a dynamic entity and this study was interested in global change impacts, it focused on benthic biodiversity trends over large spatial and long temporal scales. The main aim of this project was to investigate the current extent of biodiversity of the highly diverse benthic coralligenous community in the Mediterranean Sea, detect its changes, and predict its future changes over broad spatial and long temporal scales. These marine communities are characterized by structural species with low growth rates and long life spans; therefore they are considered particularly sensitive to disturbances. For this purpose, this project analyzed permanent photographic plots over time at four locations in the NW Mediterranean Sea. The spatial scale of this study provided information on the level of species similarity between these locations, thus offering a solid background on the amount of large-scale variability in coralligenous communities, whereas the temporal scale was fundamental to determine the natural variability in order to discriminate between changes observed due to natural factors and those related to the impact of disturbances (e.g. mass mortality events related to positive thermal anomalies, extreme catastrophic events).
This study directly addressed the challenging task of analyzing quantitative biodiversity data of these highly diverse marine benthic communities. Overall, the scientific knowledge gained with this research project will improve our understanding of the functioning of marine ecosystems and their trajectories related to global change.
Abstract:
Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, the coding potential or the comparison of a genomic sequence with a cDNA, EST, or protein database. Reasons for limited accuracy in many circumstances are species-specific training and the incompleteness of reference databases. Lately, comparative genome analysis has attracted increasing attention. Several analysis tools that are based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or the nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful to validate gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors.
Abstract:
Background: We present the results of EGASP, a community experiment to assess the state-of-the-art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: the assessment of the accuracy of computational methods to predict protein coding genes; and the overall assessment of the completeness of the current human genome annotations as represented in the ENCODE regions. For the computational prediction assessment, eighteen groups contributed gene predictions. We evaluated these submissions against each other based on a ‘reference set’ of annotations generated as part of the GENCODE project. These annotations were not available to the prediction groups prior to the submission deadline, so that their predictions were blind and an external advisory committee could perform a fair assessment. Results: The best methods had at least one gene transcript correctly predicted for close to 70% of the annotated genes. Nevertheless, the multiple transcript accuracy, taking into account alternative splicing, reached only approximately 40% to 50% accuracy. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation shows that only a very small percentage (3.2%) of the selected 221 computationally predicted exons outside of the existing annotation could be verified. Conclusions: This is the first such experiment in human DNA, and we have followed the standards established in a similar experiment, GASP1, in Drosophila melanogaster. We believe the results presented here contribute to the value of ongoing large-scale annotation projects and should guide further experimental methods when being scaled up to the entire human genome sequence.
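Nucleotide-level sensitivity and specificity can be made concrete with a short sketch. It assumes the convention usual in gene-prediction benchmarks, where sensitivity is the fraction of annotated coding nucleotides that are predicted and specificity is the fraction of predicted coding nucleotides that are annotated; the interval representation and function names are hypothetical, and EGASP's own evaluation software may differ in detail.

```python
def coding_positions(exons):
    """Expand (start, end) coding intervals, 1-based and inclusive, into a set of positions."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_accuracy(annotated_exons, predicted_exons):
    """Return (sensitivity, specificity) at the coding nucleotide level."""
    annotated = coding_positions(annotated_exons)
    predicted = coding_positions(predicted_exons)
    true_positives = len(annotated & predicted)
    sensitivity = true_positives / len(annotated) if annotated else 0.0
    specificity = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, specificity
```

For instance, an annotated exon at 1-100 and a predicted exon at 11-110 share 90 nucleotides, giving a sensitivity and a specificity of 0.9 each, the level the abstract reports for the best programs.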
Abstract:
CodeML (part of the PAML package) implements a maximum likelihood-based approach to detect positive selection on a specific branch of a given phylogenetic tree. While CodeML is widely used, it is very compute-intensive. We present SlimCodeML, an optimized version of CodeML for the branch-site model. Our performance analysis shows that SlimCodeML substantially outperforms CodeML (up to 9.38 times faster), especially for large-scale genomic analyses.
Abstract:
A large proportion of the death toll associated with malaria is a consequence of malaria infection during pregnancy, causing up to 200,000 infant deaths annually. We previously published the first extensive genetic association study of placental malaria infection, and here we extend this analysis considerably, investigating genetic variation in over 9,000 SNPs in more than 1,000 genes involved in immunity and inflammation for their involvement in susceptibility to placental malaria infection. We applied a new approach incorporating results from both single-gene analysis and gene-gene interactions on a protein-protein interaction network. We found suggestive associations of variants in the gene KLRK1 in the single-gene analysis, as well as evidence for associations of multiple members of the IL-7/IL-7R signalling cascade in the combined analysis. To our knowledge, this is the first large-scale genetic study on placental malaria infection to date, opening the door for follow-up studies trying to elucidate the genetic basis of this neglected form of malaria.