982 results for PREDICTIONS
Abstract:
A fundamental question in developmental biology is how tissues are patterned to give rise to differentiated body structures with distinct morphologies. The Drosophila wing disc offers an accessible model for understanding epithelial spatial patterning. It has been studied extensively using genetic and molecular approaches. Bristle patterns on the thorax, which arise from the medial part of the wing disc, are a classical model of pattern formation, dependent on a pre-pattern of trans-activators and trans-repressors. Despite decades of molecular studies, we still know only a subset of the factors that determine the pre-pattern. We are applying a novel and interdisciplinary approach to predict regulatory interactions in this system. It is based on the description of expression patterns by simple logical relations (addition, subtraction, intersection and union) between simple shapes (graphical primitives). Similarities and relations between primitives have been shown to be predictive of regulatory relationships between the corresponding regulatory factors in other systems, such as the Drosophila egg. Furthermore, they provide the basis for dynamical models of the bristle-patterning network, which enable us to make even more detailed predictions on gene regulation and expression dynamics. We have obtained a dataset of wing disc expression patterns, which we are now processing to obtain average expression patterns for each gene. Through triangulation of the images, we can transform the expression patterns into vectors that can easily be analysed by standard clustering methods. These analyses will allow us to identify primitives and regulatory interactions. We expect to identify new regulatory interactions and to understand the basic dynamics of the regulatory network responsible for thorax patterning.
These results will provide us with a better understanding of the rules governing gene regulatory networks in general, and provide the basis for future studies of the evolution of the thorax-patterning network in particular.
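As a rough illustration of describing a pattern through logical relations between primitives, the sketch below assumes each averaged expression pattern has already been sampled onto a shared triangulation, so every gene is a vector with one value per triangle. It binarizes those vectors and searches ordered pairs of hypothetical primitives for the set operation (union, intersection or subtraction) that best reproduces a target pattern, scored by Jaccard similarity. The function names, the 0.5 threshold and the Jaccard criterion are illustrative assumptions, not the project's method; addition of graded intensities is not modelled here.

```python
from itertools import permutations

def binarize(pattern, threshold=0.5):
    """Turn a per-triangle expression vector into the set of 'on' triangle indices."""
    return frozenset(i for i, v in enumerate(pattern) if v >= threshold)

def jaccard(a, b):
    """Overlap between two binarized domains (1.0 means identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Set operations standing in for the logical relations between primitives.
OPS = {
    "union": lambda x, y: x | y,
    "intersection": lambda x, y: x & y,
    "subtraction": lambda x, y: x - y,
}

def explain_as_combination(target, primitives):
    """Return (score, (primitive_a, operation, primitive_b)) for the best pairwise match.

    Ordered pairs are used because subtraction is not symmetric.
    """
    best = (0.0, None)
    for (name_a, a), (name_b, b) in permutations(primitives.items(), 2):
        for op_name, op in OPS.items():
            score = jaccard(target, op(a, b))
            if score > best[0]:
                best = (score, (name_a, op_name, name_b))
    return best

if __name__ == "__main__":
    # Toy mesh with six triangles; A and B are hypothetical primitives.
    prims = {"A": binarize([1, 1, 1, 0, 0, 0]), "B": binarize([0, 0, 1, 1, 0, 0])}
    print(explain_as_combination(binarize([1, 1, 0, 0, 0, 0]), prims))
    # (1.0, ('A', 'subtraction', 'B'))
```

With real data, the primitives would come from the clustering step rather than being hand-written.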
Abstract:
Earth System Models (ESMs) have been successfully developed over the past few years and are currently being used to simulate present-day climate and for seasonal to interannual predictions of climate change. Supercomputer performance plays an important role in climate modeling, since one of the challenging issues for climate modellers is to couple Earth system components efficiently and accurately on present-day computer architectures. At the Barcelona Supercomputing Center (BSC), we work with the EC-Earth System Model. EC-Earth is an ESM that currently consists of an atmosphere model (IFS) and an ocean model (NEMO), which communicate with each other through the OASIS coupler. Additional modules (e.g., for chemistry and vegetation) are under development. EC-Earth has been ported successfully to different high-performance computing platforms (e.g., IBM P6 AIX, Cray XT-5, Intel-based Linux clusters, SGI Altix) at different sites in Europe (e.g., KNMI, ICHEC, ECMWF). The objective of the first phase of the project was to identify and document issues related to the portability and performance of EC-Earth on the MareNostrum supercomputer, a system based on IBM PowerPC 970MP processors running a SUSE Linux distribution. EC-Earth was successfully ported to MareNostrum, and a compilation incompatibility was solved with a two-step compilation approach using the XLF 10.1 and 12.1 compilers. In addition, EC-Earth performance was analyzed with respect to scalability, and execution traces were analyzed with the Paraver software. This analysis showed that running EC-Earth with a larger number of IFS CPUs (<128) is not feasible at the moment, since there are issues with the IFS-NEMO balance and with MPI communication.
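Strong-scaling studies like the one described are usually summarized as relative speedup and parallel efficiency. The sketch below computes only that headline arithmetic, assuming you have wall-clock times for the same simulated period at several core counts; it does not reproduce the Paraver trace analysis, and the timings in the example are placeholders, not EC-Earth measurements.

```python
def scaling_table(timings):
    """Strong-scaling summary relative to the smallest core count.

    timings maps core count -> wall-clock seconds for the same simulated period.
    Returns (cores, speedup, parallel_efficiency) rows sorted by core count.
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        efficiency = speedup * base_cores / cores  # 1.0 means perfect linear scaling
        rows.append((cores, speedup, efficiency))
    return rows

if __name__ == "__main__":
    # Placeholder numbers only; substitute measured run times.
    for row in scaling_table({32: 100.0, 64: 55.0, 128: 40.0}):
        print("cores=%d speedup=%.2f efficiency=%.2f" % row)
```

An efficiency that drops sharply as cores are added is the kind of symptom that would then be investigated with traces (for example, load imbalance between coupled components or MPI overhead).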
Abstract:
We report evidence that salience may have economically significant effects on homeowners' borrowing behavior, through a bias in favour of less salient but more costly loans. Survey evidence corroborates the existence of such a bias. We outline a simple model in which some consumers are biased and show that, under plausible assumptions, this affects prices in equilibrium. Market data support the predictions of the model.
Abstract:
Functional RNA structures play an important role both in the context of noncoding RNA transcripts and as regulatory elements in mRNAs. Here we present a computational study to detect functional RNA structures within the ENCODE regions of the human genome. Since structural RNAs in general lack characteristic signals in primary sequence, comparative approaches evaluating evolutionary conservation of structures are most promising. We have used three recently introduced programs based on either a phylogenetic stochastic context-free grammar (EvoFold) or energy-directed folding (RNAz and AlifoldZ), yielding several thousand candidate structures (corresponding to ∼2.7% of the ENCODE regions). EvoFold has its highest sensitivity in highly conserved and relatively AU-rich regions, while RNAz favors slightly GC-rich regions, resulting in a relatively small overlap between methods. Comparison with the GENCODE annotation points to functional RNAs in all genomic contexts, with a slightly increased density in 3′-UTRs. While we estimate a significant false discovery rate of ∼50%–70%, many of the predictions can be further substantiated by additional criteria: 248 loci are predicted by both RNAz and EvoFold, and an additional 239 RNAz or EvoFold predictions are supported by the (more stringent) AlifoldZ algorithm. Five hundred seventy RNAz structure predictions fall into regions that also show signs of selection pressure at the sequence level (i.e., conserved elements). More than 700 predictions overlap with noncoding transcripts detected by oligonucleotide tiling arrays. One hundred seventy-five selected candidates were tested by RT-PCR in six tissues, and expression could be verified in 43 cases (24.6%).
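Counting loci "predicted by both" methods reduces to an interval-overlap query. The abstract does not state how overlap was defined, so the sketch below assumes half-open genomic intervals on the same sequence and treats any shared nucleotide as an overlap; the function name and coordinates are hypothetical. Sorting one set by start and keeping a running maximum of end positions answers each query with a binary search.

```python
import bisect

def overlapping_loci(a, b):
    """Indices of intervals in `a` that overlap at least one interval in `b`.

    Intervals are half-open (start, end) pairs on the same sequence.
    """
    b_sorted = sorted(b)
    starts = [s for s, _ in b_sorted]
    # max_end[k] is the largest end among the first k+1 intervals of b_sorted.
    max_end, running = [], float("-inf")
    for _, e in b_sorted:
        running = max(running, e)
        max_end.append(running)
    hits = []
    for i, (s, e) in enumerate(a):
        k = bisect.bisect_left(starts, e)  # b_sorted[:k] all start before e
        if k > 0 and max_end[k - 1] > s:   # ...and at least one of them ends after s
            hits.append(i)
    return hits

if __name__ == "__main__":
    rnaz = [(0, 10), (20, 30), (40, 50)]      # hypothetical candidate loci
    evofold = [(5, 8), (29, 35), (50, 60)]
    print(overlapping_loci(rnaz, evofold))    # [0, 1]; (40, 50) only touches (50, 60)
```

A stricter criterion (for example, a minimum fraction of shared bases) would change the counts, which is one reason overlap figures between predictors depend on the definition used.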
Abstract:
Annotation of protein-coding genes is a key goal of genome sequencing projects. In spite of tremendous recent advances in computational gene finding, comprehensive annotation remains a challenge. Peptide mass spectrometry is a powerful tool for researching the dynamic proteome and suggests an attractive approach to discover and validate protein-coding genes. We present algorithms to construct and efficiently search spectra against a genomic database, with no prior knowledge of encoded proteins. By searching a corpus of 18.5 million tandem mass spectra (MS/MS) from human proteomic samples, we validate 39,000 exons and 11,000 introns at the level of translation. We present translation-level evidence for novel or extended exons in 16 genes, confirm translation of 224 hypothetical proteins, and discover or confirm over 40 alternative splicing events. Polymorphisms are efficiently encoded in our database, allowing us to observe variant alleles for 308 coding SNPs. Finally, we demonstrate the use of mass spectrometry to improve automated gene prediction, adding 800 correct exons to our predictions using a simple rescoring strategy. Our results demonstrate that proteomic profiling should play a role in any genome sequencing project.
Abstract:
GeneID is a program, designed with a hierarchical structure, to predict genes in anonymous genomic sequences. In the first step, splice sites and start and stop codons are predicted and scored along the sequence using position weight matrices (PWMs). In the second step, exons are built from the sites. Exons are scored as the sum of the scores of the defining sites, plus the log-likelihood ratio of a Markov model for coding DNA. In the last step, the gene structure is assembled from the set of predicted exons, maximizing the sum of the scores of the assembled exons. In this paper we describe how the PWMs for sites and the Markov model of coding DNA were derived for Drosophila melanogaster. We also compare other models of coding DNA with the Markov model. Finally, we present and discuss the results obtained when GeneID is used to predict genes in the Adh region. These results show that the accuracy of GeneID predictions is currently comparable to that of other existing tools, but that GeneID is likely to be more efficient in terms of speed and memory usage.
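The first two steps of the scoring scheme above (PWM site scores, then exon score as site scores plus a coding-versus-noncoding log-likelihood ratio) can be sketched as follows. This is a minimal illustration, not GeneID's code: it uses a single homogeneous order-k Markov model with add-one smoothing, whereas a real gene finder would use trained, frame-aware models; k, the toy PWM and the training sequences are assumptions.

```python
import math
from collections import defaultdict

def pwm_score(pwm, site):
    """Sum of per-position log-odds scores; pwm is a list of {base: score} dicts."""
    return sum(column[base] for column, base in zip(pwm, site))

def train_markov(sequences, k):
    """Order-k Markov model: returns prob(context, base) with add-one smoothing."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i in range(k, len(seq)):
            counts[seq[i - k:i]][seq[i]] += 1

    def prob(context, base):
        table = counts.get(context, {})
        return (table.get(base, 0) + 1) / (sum(table.values()) + 4)

    return prob

def markov_llr(seq, coding, noncoding, k):
    """Log-likelihood ratio of seq under the coding vs the non-coding model."""
    return sum(
        math.log(coding(seq[i - k:i], seq[i]) / noncoding(seq[i - k:i], seq[i]))
        for i in range(k, len(seq))
    )

def exon_score(left_site_score, right_site_score, exon_seq, coding, noncoding, k):
    """Exon score = scores of the two defining sites + coding log-likelihood ratio."""
    return left_site_score + right_site_score + markov_llr(exon_seq, coding, noncoding, k)

if __name__ == "__main__":
    coding = train_markov(["ACGACGACG"], 1)     # toy "coding" training data
    noncoding = train_markov(["AAAATTTT"], 1)   # toy "non-coding" training data
    print(exon_score(1.0, 2.0, "ACG", coding, noncoding, 1))
```

The third step, assembling the highest-scoring gene from these exon scores, is the subject of the next abstract.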
Abstract:
In a number of programs for gene structure prediction in higher eukaryotic genomic sequences, exon prediction is decoupled from gene assembly: a large pool of candidate exons is predicted and scored from features located in the query DNA sequence, and candidate genes are assembled from this pool as sequences of nonoverlapping, frame-compatible exons. Genes are scored as a function of the scores of the assembled exons, and the highest-scoring candidate gene is assumed to be the most likely gene encoded by the query DNA sequence. For additive gene scoring functions, currently available algorithms to determine such a highest-scoring candidate gene run in time proportional to the square of the number of predicted exons. Here, we present an algorithm whose running time grows only linearly with the size of the set of predicted exons. Polynomial algorithms rely on the fact that, while scanning the set of predicted exons, the highest-scoring gene ending in a given exon can be obtained by appending the exon to the best among the highest-scoring genes ending at each compatible preceding exon. The algorithm presented here relies on the simple fact that this highest-scoring gene can be stored and updated. This requires scanning the set of predicted exons simultaneously by increasing acceptor and donor position. In addition, the algorithm described here does not assume an underlying gene structure model. Instead, valid gene structures are defined externally in the so-called Gene Model, which simply specifies which gene features are allowed immediately upstream of which other gene features in valid gene structures. This allows for great flexibility in formulating the gene identification problem. In particular, it allows for multiple-gene, two-strand predictions and for considering gene features other than coding exons (such as promoter elements) in valid gene structures.
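The stored-and-updated best gene described above can be sketched as a two-pointer scan over exons sorted by acceptor (start) and by donor (end). Assumptions, all hypothetical: exons are pre-scored and lie on one strand with start <= end; frame compatibility is ignored (it could be folded into the kind labels); and the Gene Model is a dict mapping each exon kind to the kinds allowed immediately upstream, with "BEGIN" meaning the kind may open a gene. Apart from the two sorts, each exon is admitted once to a per-kind running maximum, so for a fixed Gene Model the scan is linear in the number of exons. This illustrates the technique and is not the published implementation.

```python
from dataclasses import dataclass

NEG_INF = float("-inf")

@dataclass(frozen=True)
class Exon:
    start: int    # acceptor-side coordinate (start codon for first exons)
    end: int      # donor-side coordinate (stop codon for terminal exons)
    kind: str     # hypothetical feature label, e.g. "First", "Internal", "Terminal"
    score: float

def best_gene(exons, gene_model, terminal_kinds):
    """Return (score, exon indices) of the best gene ending in a terminal kind."""
    n = len(exons)
    by_start = sorted(range(n), key=lambda i: exons[i].start)
    by_end = sorted(range(n), key=lambda i: exons[i].end)
    best_ending = [(NEG_INF, ())] * n   # best gene ending at exon i
    best_by_kind = {}                    # kind -> best gene among exons already closed
    j = 0
    for i in by_start:
        ex = exons[i]
        # Close every exon whose donor lies strictly upstream of this acceptor.
        # Such an exon also starts earlier, so its best_ending is already final.
        while j < n and exons[by_end[j]].end < ex.start:
            k = by_end[j]
            if best_ending[k][0] > best_by_kind.get(exons[k].kind, (NEG_INF,))[0]:
                best_by_kind[exons[k].kind] = best_ending[k]
            j += 1
        allowed = gene_model.get(ex.kind, ())
        prev = (0.0, ()) if "BEGIN" in allowed else (NEG_INF, ())
        for kind in allowed:
            candidate = best_by_kind.get(kind, (NEG_INF, ()))
            if candidate[0] > prev[0]:
                prev = candidate
        if prev[0] > NEG_INF:
            best_ending[i] = (prev[0] + ex.score, prev[1] + (i,))
    finals = [best_ending[i] for i in range(n) if exons[i].kind in terminal_kinds]
    return max(finals, default=(NEG_INF, ()), key=lambda t: t[0])

if __name__ == "__main__":
    exons = [
        Exon(10, 50, "First", 5.0),
        Exon(100, 150, "Internal", 3.0),
        Exon(120, 160, "Internal", 4.0),   # overlaps the previous exon
        Exon(200, 260, "Terminal", 2.0),
        Exon(40, 90, "First", 1.0),
    ]
    model = {"First": {"BEGIN"}, "Internal": {"First", "Internal"},
             "Terminal": {"First", "Internal"}}
    print(best_gene(exons, model, {"Terminal"}))   # (11.0, (0, 2, 3))
```

Keeping the Gene Model as data, as the abstract emphasizes, means new feature types (for example, promoter elements) only require new entries in the dict, not changes to the scan.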
Abstract:
The completion of the sequencing of the mouse genome promises to help predict human genes with greater accuracy. While current ab initio gene prediction programs are remarkably sensitive (i.e., they predict at least a fragment of most genes), their specificity is often low, predicting a large number of false-positive genes in the human genome. Sequence conservation at the protein level with the mouse genome can help eliminate some of those false positives. Here we describe SGP2, a gene prediction program that combines ab initio gene prediction with TBLASTX searches between two genome sequences to provide both sensitive and specific gene predictions. The accuracy of SGP2 when used to predict genes by comparing the human and mouse genomes is assessed on a number of data sets, including single-gene data sets, the highly curated human chromosome 22 predictions, and entire genome predictions from ENSEMBL. Results indicate that SGP2 outperforms purely ab initio gene prediction methods. Results also indicate that SGP2 works about as well with 3x shotgun data as it does with fully assembled genomes. SGP2 provides a high enough specificity that its predictions can be experimentally verified at a reasonable cost. SGP2 was used to generate a complete set of gene predictions on both the human and mouse by comparing the genomes of these two species. Our results suggest that another few thousand human and mouse genes currently not in ENSEMBL are worth verifying experimentally.
Abstract:
Given the rate of projected environmental change for the 21st century, urgent adaptation and mitigation measures are required to slow down the ongoing erosion of biodiversity. Even though increasing evidence shows that recent human-induced environmental changes have already triggered species' range shifts, changes in phenology and species' extinctions, accurate projections of species' responses to future environmental changes are more difficult to ascertain. This is problematic, since there is a growing awareness of the need to adopt proactive conservation planning measures using forecasts of species' responses to future environmental changes. There is a substantial body of literature describing and assessing the impacts of various scenarios of climate and land-use change on species' distributions. Model predictions include a wide range of assumptions and limitations that are widely acknowledged but compromise their use for developing reliable adaptation and mitigation strategies for biodiversity. Indeed, amongst the most used models, few, if any, explicitly deal with migration processes, the dynamics of populations at the "trailing edge" of shifting species' distributions, species' interactions, and the interaction between the effects of climate and land use. In this review, we propose two main avenues to advance the understanding and prediction of the different processes occurring at the leading and trailing edges of a species' distribution in response to any global change phenomenon. Deliberately focusing on plant species, we first explore the different ways to incorporate species' migration into existing modelling approaches, given data and knowledge limitations and the dual effects of climate and land-use factors. Secondly, we explore the mechanisms and processes happening at the trailing edge of a shifting species' distribution and how to implement them in a modelling approach.
Finally, we conclude this review with clear guidelines on how such modelling improvements will benefit conservation strategies in a changing world. (c) 2007 Rubel Foundation, ETH Zurich. Published by Elsevier GmbH. All rights reserved.
Evolutionary history and its relevance in understanding and conserving southern African biodiversity
Abstract:
Understanding how biodiversity is distributed is central to any conservation effort and has traditionally been based on niche modeling and the causal relationship between the spatial distribution of organisms and their environment. More recently, the study of species' evolutionary history and relatedness has permeated the fields of ecology and conservation and, coupled with spatial predictions, provides useful insights into the origin of current biodiversity patterns, community structuring and potential vulnerability to extinction. This thesis explores several key ecological questions by combining the fields of niche modeling and phylogenetics and using important components of southern African biodiversity. The aims of this thesis are to compare biodiversity measures, to assess how climate change will affect the loss of evolutionary history, to ask whether there is a clear link between evolutionary history and morphology, and to investigate the potential role of relatedness in macro-climatic niche structuring. The first part of my thesis provides a fine-scale comparison and spatial overlap quantification of species richness and phylogenetic diversity predictions for one of the most diverse plant families in the Cape Floristic Region (CFR), the Proteaceae. In several of the measures used, patterns do not match sufficiently to argue that species relatedness information is implicit in species richness patterns. The second part of my thesis predicts how climate change may affect the threat status and potential extinction of southern African animal and plant taxa. I compare present and future niche models to assess whether predicted species extinction will result in higher or lower phylogenetic diversity survival than would be expected under random extinction processes.
I find that predicted extinction will result in lower phylogenetic diversity survival, but that this non-random pattern will be detected only after a substantial proportion of the taxa in each group has been lost. The third part of my thesis explores the relationship between phylogenetic and morphological distance in southern African bats to assess whether long evolutionary histories correspond to equally high levels of morphological variation, as predicted by a neutral model of character evolution. I find no such evidence; on the contrary, weak negative trends are detected for this group, as well as in simulations of both neutral and convergent character evolution. Finally, I ask whether spatial and climatic niche occupancy in southern African bats is influenced by evolutionary history. I relate divergence time between species pairs to climatic niche and range overlap and find no evidence for clear phylogenetic structuring. I argue that this may be due to particularly high levels of micro-niche partitioning.
Summary: Understanding the distribution of biodiversity is a major challenge for nature conservation. Analyses are most often based on ecological niche modeling, through the study of causal relationships between the spatial distribution of organisms and their environment. Recently, the study of the evolutionary history of organisms has also been used in ecology and conservation. Combined with modeling of the spatial distribution of organisms, this new approach provides relevant information for better understanding the origin of current biodiversity patterns, community structuring and potential extinction risks. This thesis explores several major ecological questions by combining niche modeling and phylogenetics, applied to important components of southern African biodiversity.
The objectives of this thesis were 1) to compare different measures of biodiversity, 2) to assess the impact of future climate change on the loss of phylogenetic diversity, 3) to analyze the potential link between phylogenetic diversity and morphological diversity, and 4) to study the potential role of phylogeny in structuring species' macro-climatic niches. The first part of this thesis provides a spatial comparison, and a quantification of the overlap, between predictions of species richness and predictions of phylogenetic diversity for one of the most species-rich plant families of the Cape Floristic Region (CFR), the Proteaceae. The analyses showed that several measures of phylogenetic diversity had spatial distributions that differed from that of species richness, which is usually used to define conservation measures. The second part evaluates the potential effects of expected climate change on extinction rates of southern African animals and plants. To this end, current and future species distribution models were used to determine whether species extinction will result in a greater or smaller loss of phylogenetic diversity than a random extinction process. The results indeed showed that species extinction linked to climate change could lead to a greater loss of phylogenetic diversity. However, this loss would exceed that of a random extinction process only after a large proportion of taxa in each group had been lost. The third part of this thesis explores the relationship between phylogenetic and morphological distances among southern African bat species. More precisely, it asks whether a long evolutionary history also corresponds to greater morphological variation in this group.
Such a relationship is in fact predicted by a neutral model of character evolution. No evidence for it emerged from the analyses. On the contrary, negative trends were detected, which may be the consequence of convergent evolution between clades and high levels of partitioning within each clade. Finally, the last part presents a study of climatic niche partitioning in southern African bats. In this study, I relate evolutionary divergence time (the time since two species diverged from a common ancestor) to the degree of overlap of their climatic niches. The results revealed no link between these two parameters. Rather, they support the idea that this could be due to particularly high levels of fine-scale niche partitioning.
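The comparison of predicted extinction against random extinction can be sketched with Faith's phylogenetic diversity (PD), the summed branch length connecting a set of taxa to the root. The thesis does not specify its PD measure or null model here, so both are assumptions: Faith's PD on a rooted tree, and a null model that removes the same number of tips uniformly at random. The tree format and function names are hypothetical.

```python
import random

def faith_pd(tree, taxa):
    """Faith's phylogenetic diversity: total branch length linking `taxa` to the root.

    tree maps node -> (parent, branch_length); the root has parent None.
    """
    seen, total = set(), 0.0
    for node in taxa:
        while node is not None and node not in seen:
            seen.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total

def pd_loss_vs_random(tree, tips, lost, reps=1000, seed=0):
    """Compare PD surviving a given extinction scenario with random loss of as many tips.

    Returns (observed_pd, fraction of random scenarios retaining no more PD).
    A small fraction suggests the scenario removes more PD than random extinction.
    """
    rng = random.Random(seed)
    tips = list(tips)
    lost_set = set(lost)
    survivors = [t for t in tips if t not in lost_set]
    observed = faith_pd(tree, survivors)
    null = [faith_pd(tree, rng.sample(tips, len(survivors))) for _ in range(reps)]
    return observed, sum(v <= observed for v in null) / reps

if __name__ == "__main__":
    # Hypothetical tree: ((A:1, B:1)X:1, C:3)R;
    tree = {"R": (None, 0.0), "X": ("R", 1.0), "A": ("X", 1.0),
            "B": ("X", 1.0), "C": ("R", 3.0)}
    print(pd_loss_vs_random(tree, ["A", "B", "C"], lost=["C"]))
```

Losing the long, isolated branch C removes more PD than losing A or B, which is the kind of non-random signal the analysis looks for.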
Abstract:
Background: We present the results of EGASP, a community experiment to assess the state of the art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: the assessment of the accuracy of computational methods to predict protein-coding genes, and the overall assessment of the completeness of the current human genome annotations as represented in the ENCODE regions. For the computational prediction assessment, eighteen groups contributed gene predictions. We evaluated these submissions against each other based on a 'reference set' of annotations generated as part of the GENCODE project. These annotations were not available to the prediction groups prior to the submission deadline, so that their predictions were blind and an external advisory committee could perform a fair assessment. Results: The best methods had at least one gene transcript correctly predicted for close to 70% of the annotated genes. Nevertheless, the multiple-transcript accuracy, taking into account alternative splicing, reached only approximately 40% to 50%. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation showed that only a very small percentage (3.2%) of the 221 selected computationally predicted exons outside the existing annotation could be verified. Conclusions: This is the first such experiment in human DNA, and we have followed the standards established in a similar experiment, GASP1, in Drosophila melanogaster. We believe the results presented here contribute to the value of ongoing large-scale annotation projects and should guide further experimental methods when being scaled up to the entire human genome sequence.
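Nucleotide-level sensitivity and specificity, as used in gene-finding evaluations, are Sn = TP / (TP + FN) and Sp = TP / (TP + FP), where a true positive is a coding nucleotide in both the prediction and the reference (so "specificity" here is what other fields call precision). The sketch below assumes inclusive (start, end) coding intervals on a single sequence and strand; the function names are hypothetical and the evaluation protocol of EGASP itself is not reproduced.

```python
def coding_positions(exons):
    """Expand inclusive (start, end) coding intervals into a set of nucleotide positions."""
    return {p for start, end in exons for p in range(start, end + 1)}

def nucleotide_accuracy(predicted, reference):
    """Return (sensitivity, specificity) at the coding-nucleotide level."""
    pred, ref = coding_positions(predicted), coding_positions(reference)
    tp = len(pred & ref)   # coding in both
    fp = len(pred - ref)   # predicted coding, not annotated
    fn = len(ref - pred)   # annotated coding, missed
    sn = tp / (tp + fn) if ref else 0.0
    sp = tp / (tp + fp) if pred else 0.0
    return sn, sp

if __name__ == "__main__":
    reference = [(1, 100)]
    print(nucleotide_accuracy([(51, 150)], reference))  # (0.5, 0.5)
    print(nucleotide_accuracy([(1, 80)], reference))    # (0.8, 1.0)
```

Exon- and transcript-level accuracy require matching whole features rather than positions, which is why those figures are typically lower than nucleotide-level ones.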
Abstract:
Background: Despite the continuous production of genome sequence for a number of organisms, reliable, comprehensive and cost-effective gene prediction remains problematic. This is particularly true for genomes for which there is no large collection of known gene sequences, such as the recently published chicken genome. We used the chicken sequence to test comparative and homology-based gene-finding methods, followed by experimental validation, as an effective genome annotation method. Results: We performed experimental evaluation by RT-PCR of three different computational gene finders, Ensembl, SGP2 and TWINSCAN, applied to the chicken genome. A Venn diagram was computed and each of its components was evaluated. The results showed that de novo comparative methods can identify up to about 700 chicken genes with no previous evidence of expression, and can correctly extend about 40% of homology-based predictions at the 5' end. Conclusions: De novo comparative gene prediction followed by experimental verification is effective at enhancing the annotation of newly sequenced genomes provided by standard homology-based methods.
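Each component of a Venn diagram of gene finders is the set of loci predicted by exactly one combination of methods. The sketch below assumes predictions have already been matched to shared locus identifiers (deciding when two predictions represent the same gene is the hard part and is not shown); the identifiers are hypothetical.

```python
def venn_regions(prediction_sets):
    """Group loci by which methods predicted them.

    prediction_sets maps method name -> set of locus identifiers.
    Returns (method names, {membership tuple: set of loci}); each membership
    tuple has one bool per method, in the same order as the names.
    """
    names = list(prediction_sets)
    universe = set().union(*prediction_sets.values())
    regions = {}
    for locus in universe:
        key = tuple(locus in prediction_sets[name] for name in names)
        regions.setdefault(key, set()).add(locus)
    return names, regions

if __name__ == "__main__":
    names, regions = venn_regions({
        "Ensembl": {"g1", "g2", "g3"},
        "SGP2": {"g2", "g3", "g4"},
        "TWINSCAN": {"g3", "g5"},
    })
    for key, loci in sorted(regions.items()):
        members = [n for n, present in zip(names, key) if present]
        print("+".join(members), sorted(loci))
```

Each region can then be sampled for RT-PCR separately, which is how per-component validation rates would be obtained.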
Abstract:
We address the problem of comparing and characterizing the promoter regions of genes with similar expression patterns. This remains a challenging problem in sequence analysis, because often the promoter regions of co-expressed genes do not show discernible sequence conservation. In our approach, thus, we have not directly compared the nucleotide sequence of promoters. Instead, we have obtained predictions of transcription factor binding sites, annotated the predicted sites with the labels of the corresponding binding factors, and aligned the resulting sequences of labels—to which we refer here as transcription factor maps (TF-maps). To obtain the global pairwise alignment of two TF-maps, we have adapted an algorithm initially developed to align restriction enzyme maps. We have optimized the parameters of the algorithm in a small, but well-curated, collection of human–mouse orthologous gene pairs. Results in this dataset, as well as in an independent much larger dataset from the CISRED database, indicate that TF-map alignments are able to uncover conserved regulatory elements, which cannot be detected by the typical sequence alignments.
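To make the idea of aligning TF-maps concrete, the sketch below globally aligns two sequences of binding-factor labels with plain Needleman–Wunsch dynamic programming. This is a deliberately simpler, swapped-in technique: the algorithm described above is adapted from restriction-map alignment and its parameters were optimized on curated orthologs, whereas the match, mismatch and gap scores here are arbitrary assumptions and site positions are ignored.

```python
def align_tf_maps(a, b, match=1.0, mismatch=-1.0, gap=-0.5):
    """Global alignment of two label sequences; returns (score, aligned pairs).

    Gaps appear as None in the aligned pairs.
    """
    n, m = len(a), len(b)
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap

    def sub(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j, pairs = n, m, []
    while i > 0 and j > 0:
        if S[i][j] == S[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif S[i][j] == S[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    while i > 0:
        pairs.append((a[i - 1], None))
        i -= 1
    while j > 0:
        pairs.append((None, b[j - 1]))
        j -= 1
    return S[n][m], pairs[::-1]

if __name__ == "__main__":
    human = ["SP1", "NFY", "TATA"]   # hypothetical TF-map labels
    mouse = ["SP1", "TATA"]
    print(align_tf_maps(human, mouse))
```

Because the labels are factor names rather than nucleotides, two promoters with little sequence similarity can still align well if they share an ordered set of predicted binding sites.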
Abstract:
Background: Recent advances in high-throughput technologies have produced a vast number of protein sequences, while the number of high-resolution structures has seen only a limited increase. This has prompted the development of many strategies to build protein structures from their sequences, generating a considerable number of alternative models. Selecting the model closest to the native conformation has thus become crucial for structure prediction. Several methods have been developed to score protein models by energies, knowledge-based potentials, or a combination of both. Results: Here, we present and demonstrate a theory to split knowledge-based potentials into biologically meaningful scoring terms and to combine them into new scores to predict near-native structures. Our strategy circumvents the problem of defining the reference state. In this approach, we give the proof for a simple and linear application that can be further improved by optimizing the combination of Z-scores. Using the simplest composite score () we obtained predictions similar to those of state-of-the-art methods. In addition, our approach has the advantage of identifying the terms most relevant to the stability of the protein structure. Finally, we also use the composite Z-scores to assess the conformation of models and to detect local errors. Conclusion: We have introduced a method to split knowledge-based potentials and to solve the problem of defining a reference state. The new scores detected near-native structures as accurately as state-of-the-art methods and successfully identified wrongly modeled regions in many near-native conformations.
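The general shape of a composite Z-score can be sketched as follows: each energy term is standardized across the set of candidate models, and the per-term Z-scores are summed. The abstract's exact composite formula is not given in this listing, so two assumptions are made here: the combination is an unweighted sum, and a lower energy means a more native-like model. Term names and values are hypothetical.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]

def composite_z(term_energies):
    """Per-model sum of per-term Z-scores.

    term_energies maps term name -> list of energies, one per model, same model order.
    """
    per_term = [zscores(values) for values in term_energies.values()]
    return [sum(zs) for zs in zip(*per_term)]

def pick_best(term_energies):
    """Index of the model with the lowest composite Z-score (assumed most native-like)."""
    scores = composite_z(term_energies)
    return min(range(len(scores)), key=scores.__getitem__)

if __name__ == "__main__":
    terms = {"pair": [-10, -5, 0], "solvation": [-3, -1, -2]}   # three candidate models
    print(composite_z(terms), pick_best(terms))
```

Keeping the terms separate before combining them is what allows inspecting which term drives a model's score, in the spirit of the abstract's claim about identifying relevant stability terms.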
Abstract:
Background: The understanding of whole-genome sequences in higher eukaryotes depends to a large degree on the reliable definition of transcription units, including exon/intron structures, translated open reading frames (ORFs) and flanking untranslated regions. The best currently available chicken transcript catalog is the Ensembl build, based on the mapping of a relatively small number of full-length cDNAs and ESTs to the genome, as well as on in silico gene predictions derived from the genome sequence. Results: We use Long Serial Analysis of Gene Expression (LongSAGE) in bursal lymphocytes and the DT40 cell line to verify the quality and completeness of the annotated transcripts. 53.6% of the more than 38,000 unique SAGE tags (unitags) match full-length bursal cDNAs, the Ensembl transcript build or the genome sequence. The majority of all matching unitags show single matches to the genome, but no matches to the genome-derived Ensembl transcript build. Nevertheless, most of these tags map close to the 3' boundaries of annotated Ensembl transcripts. Conclusions: These results suggest that rather few genes are missing from the current Ensembl chicken transcript build, but that the 3' ends of many transcripts may not have been accurately predicted. The tags with no match in the transcript sequences can now be used to improve gene predictions, pinpoint the genomic location of entirely missed transcripts and optimize the accuracy of gene finder software.
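The tag-matching step can be sketched by predicting a "virtual" tag for every annotated transcript and then checking each observed unitag against the transcript catalog first and the genome second. This assumes the usual LongSAGE convention of a 21-nt tag beginning at the 3'-most NlaIII site (CATG) of the transcript; strand handling is simplified to a both-strand substring search of the genome, and all sequences and names are hypothetical. Real pipelines also deal with sequencing errors and ambiguous tags, which are not modelled here.

```python
def virtual_longsage_tag(transcript, tag_len=21, anchor="CATG"):
    """Expected tag: tag_len nt starting at the 3'-most anchor (NlaIII) site, or None."""
    pos = transcript.rfind(anchor)
    if pos == -1 or pos + tag_len > len(transcript):
        return None
    return transcript[pos:pos + tag_len]

def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def classify_tags(tags, transcripts, genome):
    """Label each unitag as matching a transcript's virtual tag, the genome only, or nothing."""
    catalog = {}
    for name, seq in transcripts.items():
        tag = virtual_longsage_tag(seq)
        if tag is not None:
            catalog.setdefault(tag, []).append(name)
    labels = {}
    for tag in tags:
        if tag in catalog:
            labels[tag] = ("transcript", catalog[tag])
        elif tag in genome or reverse_complement(tag) in genome:
            labels[tag] = ("genome_only", [])
        else:
            labels[tag] = ("no_match", [])
    return labels
```

Tags labelled "genome_only" that fall just downstream of an annotated transcript would be the kind of evidence for mispredicted 3' ends that the abstract describes.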