684 results for Molecular Sequence Data
Abstract:
High-fidelity 'proofreading' polymerases are often used in library construction for next-generation sequencing projects, in an effort to minimize errors in the resulting sequence data. The increased template fidelity of these polymerases can come at the cost of reduced template specificity, and library preparation methods based on the AFLP technique may be particularly susceptible. Here, we compare AFLP profiles generated with standard Taq and two versions of a high-fidelity polymerase. We find that Taq produces fewer and brighter peaks than high-fidelity polymerase, suggesting that Taq performs better at selectively amplifying templates that exactly match the primer sequences. Because the higher accuracy of proofreading polymerases remains important for sequencing applications, we suggest that it may be more effective to use alternative library preparation methods.
Abstract:
The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.
Abstract:
BACKGROUND: The estimation of demographic parameters from genetic data often requires the computation of likelihoods. However, the likelihood function is computationally intractable for many realistic evolutionary models, and the use of Bayesian inference has therefore been limited to very simple models. The situation changed recently with the advent of Approximate Bayesian Computation (ABC) algorithms allowing one to obtain parameter posterior distributions based on simulations not requiring likelihood computations. RESULTS: Here we present ABCtoolbox, a series of open source programs to perform Approximate Bayesian Computations (ABC). It implements various ABC algorithms including rejection sampling, MCMC without likelihood, a Particle-based sampler and ABC-GLM. ABCtoolbox is bundled with, but not limited to, a program that allows parameter inference in a population genetics context and the simultaneous use of different types of markers with different ploidy levels. In addition, ABCtoolbox can also interact with most simulation and summary statistics computation programs. The usability of the ABCtoolbox is demonstrated by inferring the evolutionary history of two evolutionary lineages of Microtus arvalis. Using nuclear microsatellites and mitochondrial sequence data in the same estimation procedure enabled us to infer sex-specific population sizes and migration rates and to find that males show smaller population sizes but much higher levels of migration than females. CONCLUSION: ABCtoolbox allows a user to perform all the necessary steps of a full ABC analysis, from parameter sampling from prior distributions, data simulations, computation of summary statistics, estimation of posterior distributions, model choice, validation of the estimation procedure, and visualization of the results.
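The rejection-sampling variant of ABC mentioned in the abstract above can be illustrated with a minimal sketch. This is not ABCtoolbox code or its interface: the function `abc_rejection`, the toy Gaussian model, the uniform prior, and the tolerance value are all assumptions chosen only to show the idea of accepting parameter draws whose simulated summary statistic falls close to the observed one.

```python
import random
import statistics


def abc_rejection(observed_stat, prior_sampler, simulator, summary,
                  n_draws, tolerance, rng):
    """Generic ABC rejection sampler (illustrative, hypothetical helper).

    Draw a parameter from the prior, simulate data, and keep the draw
    only if the simulated summary statistic lies within `tolerance`
    of the observed one. No likelihood is ever evaluated.
    """
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)
        simulated = simulator(theta, rng)
        if abs(summary(simulated) - observed_stat) <= tolerance:
            accepted.append(theta)
    return accepted


# Toy problem (assumed for illustration): infer the mean of a normal
# distribution with known standard deviation 1 from 50 observations.
rng = random.Random(42)
observed = [rng.gauss(3.0, 1.0) for _ in range(50)]
obs_stat = statistics.fmean(observed)

posterior = abc_rejection(
    observed_stat=obs_stat,
    prior_sampler=lambda r: r.uniform(0.0, 10.0),   # flat prior on the mean
    simulator=lambda theta, r: [r.gauss(theta, 1.0) for _ in range(50)],
    summary=statistics.fmean,                        # one summary statistic
    n_draws=5000,
    tolerance=0.1,
    rng=rng,
)

print(f"accepted {len(posterior)} of 5000 draws")
print(f"approximate posterior mean: {statistics.fmean(posterior):.2f}")
```

The accepted draws approximate the posterior distribution of the parameter. A smaller tolerance gives a better approximation at the cost of a lower acceptance rate, which is the trade-off that the MCMC and particle-based samplers listed in the abstract aim to improve.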
Abstract:
BACKGROUND: The annotation of protein post-translational modifications (PTMs) is an important task of UniProtKB curators and, with continuing improvements in experimental methodology, an ever greater number of articles are being published on this topic. To help curators cope with this growing body of information we have developed a system which extracts information from the scientific literature for the most frequently annotated PTMs in UniProtKB. RESULTS: The procedure uses a pattern-matching and rule-based approach to extract sentences with information on the type and site of modification. A ranked list of protein candidates for the modification is also provided. For PTM extraction, precision varies from 57% to 94%, and recall from 75% to 95%, according to the type of modification. The procedure was used to track new publications on PTMs and to recover potential supporting evidence for phosphorylation sites annotated based on the results of large-scale proteomics experiments. CONCLUSIONS: The information retrieval and extraction method we have developed in this study forms the basis of a simple tool for the manual curation of protein post-translational modifications in UniProtKB/Swiss-Prot. Our work demonstrates that even simple text-mining tools can be effectively adapted for database curation tasks, provided that a thorough understanding of the working process and requirements is first obtained. This system can be accessed at http://eagl.unige.ch/PTM/.
Abstract:
Background: The ratio of the rates of non-synonymous and synonymous substitution (dN/dS) is commonly used to estimate selection in coding sequences. It is often suggested that, all else being equal, dN/dS should be lower in populations with large effective size (Ne) due to increased efficacy of purifying selection. As Ne is difficult to measure directly, life history traits such as body mass, which is typically negatively associated with population size, have commonly been used as proxies in empirical tests of this hypothesis. However, evidence of whether the expected positive correlation between body mass and dN/dS is consistently observed is conflicting. Results: Employing whole genome sequence data from 48 avian species, we assess the relationship between rates of molecular evolution and life history in birds. We find a negative correlation between dN/dS and body mass, contrary to nearly neutral expectation. This raises the question whether the correlation might be a method artefact. We therefore in turn consider non-stationary base composition, divergence time and saturation as possible explanations, but find no clear patterns. However, in striking contrast to dN/dS, the ratio of radical to conservative amino acid substitutions (Kr/Kc) correlates positively with body mass. Conclusions: Our results in principle accord with the notion that non-synonymous substitutions causing radical amino acid changes are more efficiently removed by selection in large populations, consistent with nearly neutral theory. These findings have implications for the use of dN/dS and suggest that caution is warranted when drawing conclusions about lineage-specific modes of protein evolution using this metric.
Abstract:
UniPathway (http://www.unipathway.org) is a fully manually curated resource for the representation and annotation of metabolic pathways. UniPathway provides explicit representations of enzyme-catalyzed and spontaneous chemical reactions, as well as a hierarchical representation of metabolic pathways. This hierarchy uses linear subpathways as the basic building block for the assembly of larger and more complex pathways, including species-specific pathway variants. All of the pathway data in UniPathway has been extensively cross-linked to existing pathway resources such as KEGG and MetaCyc, as well as sequence resources such as the UniProt KnowledgeBase (UniProtKB), for which UniPathway provides a controlled vocabulary for pathway annotation. We introduce here the basic concepts underlying the UniPathway resource, with the aim of allowing users to fully exploit the information provided by UniPathway.
Abstract:
PURPOSE: To improve the risk stratification of patients with rhabdomyosarcoma (RMS) through the use of clinical and molecular biologic data. PATIENTS AND METHODS: Two independent data sets of gene-expression profiling for 124 and 101 patients with RMS were used to derive prognostic gene signatures by using a meta-analysis. These and a previously published metagene signature were evaluated by using cross validation analyses. A combined clinical and molecular risk-stratification scheme that incorporated the PAX3/FOXO1 fusion gene status was derived from 287 patients with RMS and evaluated. RESULTS: We showed that our prognostic gene-expression signature and the one previously published performed well with reproducible and significant effects. However, their effect was reduced when cross validated or tested in independent data and did not add new prognostic information over the fusion gene status, which is simpler to assay. Among nonmetastatic patients, patients who were PAX3/FOXO1 positive had a significantly poorer outcome compared with both alveolar-negative and PAX7/FOXO1-positive patients. Furthermore, a new clinicomolecular risk score that incorporated fusion gene status (negative and PAX3/FOXO1 and PAX7/FOXO1 positive), Intergroup Rhabdomyosarcoma Study TNM stage, and age showed a significant increase in performance over the current risk-stratification scheme. CONCLUSION: Gene signatures can improve current stratification of patients with RMS but will require complex assays to be developed and extensive validation before clinical application. A significant majority of their prognostic value was encapsulated by the fusion gene status. A continuous risk score derived from the combination of clinical parameters with the presence or absence of PAX3/FOXO1 represents a robust approach to improving current risk-adapted therapy for RMS.
Abstract:
Determining the relative roles of vicariance and selection in restricting gene flow between populations is of central importance to the evolutionary process of population divergence and speciation. Here we use molecular and morphological data to contrast the effect of isolation (by mountains and geographical distance) with that of ecological factors (altitudinal gradients) in promoting differentiation in the wedge-billed woodcreeper, Glyphorynchus spirurus, a tropical forest bird, in Ecuador. Tarsus length and beak size increased relative to body size with altitude on both sides of the Andes, and were correlated with the amount of moss on tree trunks, suggesting the role of selection in driving adaptive divergence. In contrast, molecular data revealed a considerable degree of admixture along these altitudinal gradients, suggesting that adaptive divergence in morphological traits has occurred in the presence of gene flow. As suggested by mitochondrial DNA sequence data, the Andes act as a barrier to gene flow between ancient subspecific lineages. Genome-wide amplified fragment length polymorphism markers reflected more recent patterns of gene flow and revealed fine-scale patterns of population differentiation that were not detectable with mitochondrial DNA, including the differentiation of isolated coastal populations west of the Andes. Our results support the predominant role of geographical isolation in driving genetic differentiation in G. spirurus, yet suggest the role of selection in driving parallel morphological divergence along ecological gradients.
Abstract:
The genus Prunus L. is large and economically important. However, phylogenetic relationships within Prunus at low taxonomic levels, particularly in the subgenus Amygdalus L. s.l., remain poorly investigated. This paper attempts to document the evolutionary history of Amygdalus s.l. and establishes a temporal framework by assembling molecular data from conservative and variable molecular markers. The nuclear s6pdh gene and the plastid trnSG spacer are analyzed with Bayesian and maximum likelihood methods. Since previous phylogenetic analysis with these markers lacked resolution, we additionally analyzed 13 nuclear SSR loci with the (δμ)² distance, followed by an unweighted pair group method using arithmetic averages algorithm. Our phylogenetic analysis with both sequence and SSR loci confirms the split between sections Amygdalus and Persica, comprising almonds and peaches, respectively. This result is in agreement with biogeographic data showing that the two sections are naturally distributed on opposite sides of the Central Asian Massif chain. Coalescent-based estimates of the divergence time between the two sections varied strongly depending on whether sequence data were considered alone or combined with SSR data. The sequence-only estimate (5 million years ago) was congruent with the Central Asian Massif orogeny and subsequent climate change. Given the low level of differentiation within the two sections using both marker types, the utility of combining microsatellites and sequence data to address phylogenetic relationships at low taxonomic levels within Amygdalus is discussed. The recent evolutionary histories of almond and peach are discussed in view of the domestication processes that arose in these two phenotypically diverging gene pools: almonds and peaches were domesticated from the Amygdalus s.s. and Persica sections, respectively. Such economically important crops may serve as a good model for studying divergent domestication processes in closely related gene pools.
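The SSR analysis described in the abstract above, a δμ-squared distance followed by unweighted pair group clustering with arithmetic averages (UPGMA), can be sketched as follows. The distance is taken here, as an assumption about the definition intended, to be the squared difference in mean allele size (in repeat units) between two populations, averaged over loci. The population names and allele sizes are invented toy values, not data from the study, and the helper functions are hypothetical.

```python
from itertools import combinations
from statistics import fmean


def delta_mu_squared(pop_a, pop_b):
    """Squared difference in mean allele size, averaged over loci.

    Each population is a list of loci; each locus is a list of allele
    sizes expressed in repeat units.
    """
    return fmean((fmean(a) - fmean(b)) ** 2 for a, b in zip(pop_a, pop_b))


def upgma(names, dist):
    """Minimal UPGMA: returns the tree topology as nested tuples.

    `dist` maps frozenset({x, y}) to the distance between x and y.
    """
    sizes = {name: 1 for name in names}   # cluster label -> leaf count
    d = dict(dist)
    while len(sizes) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = (a, b)
        # Distance to the new cluster is the size-weighted average.
        for c in sizes:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


# Toy data (hypothetical): three populations, two SSR loci each.
populations = {
    "almond_1": [[10, 12], [8, 8]],
    "almond_2": [[11, 12], [8, 9]],
    "peach_1": [[15, 16], [12, 13]],
}

distances = {
    frozenset((x, y)): delta_mu_squared(populations[x], populations[y])
    for x, y in combinations(populations, 2)
}
tree = upgma(list(populations), distances)
print(tree)
```

With these toy values the two almond populations are joined first and the peach population attaches last. Real analyses would use many more individuals and loci and would normally rely on an established phylogenetics package rather than a hand-written clustering routine.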