191 results for "bioinformàtica"


Abstract:

Background: The GENCODE consortium was formed to identify and map all protein-coding genes within the ENCODE regions. This was achieved by a combination of initial manual annotation by the HAVANA team, experimental validation by the GENCODE consortium and a refinement of the annotation based on these experimental results. Results: The GENCODE gene features are divided into eight different categories of which only the first two (known and novel coding sequence) are confidently predicted to be protein-coding genes. 5’ rapid amplification of cDNA ends (RACE) and RT-PCR were used to experimentally verify the initial annotation. Of the 420 coding loci tested, 229 RACE products have been sequenced. They supported 5’ extensions of 30 loci and new splice variants in 50 loci. In addition, 46 loci without evidence for a coding sequence were validated, consisting of 31 novel and 15 putative transcripts. We assessed the comprehensiveness of the GENCODE annotation by attempting to validate all the predicted exon boundaries outside the GENCODE annotation. Out of 1,215 tested in a subset of the ENCODE regions, 14 novel exon pairs were validated, only two of them in intergenic regions. Conclusions: In total, 487 loci, of which 434 are coding, have been annotated as part of the GENCODE reference set available from the UCSC browser. Comparison of GENCODE annotation with RefSeq and ENSEMBL shows that only 40% of GENCODE exons are contained within the two sets, which is a reflection of the high number of alternative splice forms with unique exons annotated. Over 50% of coding loci have been experimentally verified by 5’ RACE for EGASP and the GENCODE collaboration is continuing to refine its annotation of 1% of the human genome with the aid of experimental validation.

Abstract:

A report of the 6th Georgia Tech-Oak Ridge National Lab International Conference on Bioinformatics 'In silico Biology: Gene Discovery and Systems Genomics', Atlanta, USA, 15-17 November, 2007.

Abstract:

Background: We present the results of EGASP, a community experiment to assess the state-of-the-art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: the assessment of the accuracy of computational methods to predict protein coding genes; and the overall assessment of the completeness of the current human genome annotations as represented in the ENCODE regions. For the computational prediction assessment, eighteen groups contributed gene predictions. We evaluated these submissions against each other based on a ‘reference set’ of annotations generated as part of the GENCODE project. These annotations were not available to the prediction groups prior to the submission deadline, so that their predictions were blind and an external advisory committee could perform a fair assessment. Results: The best methods had at least one gene transcript correctly predicted for close to 70% of the annotated genes. Nevertheless, the multiple transcript accuracy, taking into account alternative splicing, reached only approximately 40% to 50%. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation shows that only a very small percentage (3.2%) of the selected 221 computationally predicted exons outside of the existing annotation could be verified. Conclusions: This is the first such experiment in human DNA, and we have followed the standards established in a similar experiment, GASP1, in Drosophila melanogaster. We believe the results presented here contribute to the value of ongoing large-scale annotation projects and should guide further experimental methods when being scaled up to the entire human genome sequence.

Abstract:

Selenoproteins are a diverse group of proteins usually misidentified and misannotated in sequence databases. The presence of an in-frame UGA (stop) codon in the coding sequence of selenoprotein genes precludes their identification and correct annotation. The in-frame UGA codons are recoded to cotranslationally incorporate selenocysteine, a rare selenium-containing amino acid. The development of ad hoc experimental and, more recently, computational approaches has allowed the efficient identification and characterization of the selenoproteomes of a growing number of species. Today, dozens of selenoprotein families have been described and more are being discovered in recently sequenced species, but the correct genomic annotation is not available for the majority of these genes. SelenoDB is a long-term project that aims to provide, through the collaborative effort of experimental and computational researchers, automatic and manually curated annotations of selenoprotein genes, proteins and SECIS elements. Version 1.0 of the database includes an initial set of eukaryotic genomic annotations, with special emphasis on the human selenoproteome, for immediate inspection by selenium researchers or incorporation into more general databases. SelenoDB is freely available at http://www.selenodb.org.

Abstract:

We address the problem of comparing and characterizing the promoter regions of genes with similar expression patterns. This remains a challenging problem in sequence analysis, because often the promoter regions of co-expressed genes do not show discernible sequence conservation. In our approach, thus, we have not directly compared the nucleotide sequence of promoters. Instead, we have obtained predictions of transcription factor binding sites, annotated the predicted sites with the labels of the corresponding binding factors, and aligned the resulting sequences of labels—to which we refer here as transcription factor maps (TF-maps). To obtain the global pairwise alignment of two TF-maps, we have adapted an algorithm initially developed to align restriction enzyme maps. We have optimized the parameters of the algorithm in a small, but well-curated, collection of human–mouse orthologous gene pairs. Results in this dataset, as well as in an independent much larger dataset from the CISRED database, indicate that TF-map alignments are able to uncover conserved regulatory elements, which cannot be detected by the typical sequence alignments.
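
As a rough illustration of what aligning sequences of labels (rather than nucleotides) can look like, the sketch below applies a plain Needleman-Wunsch global alignment to two lists of transcription factor labels. It is not the adapted restriction-map algorithm the abstract describes, which presumably also uses site positions and prediction scores. The function name `align_labels`, the scoring values and the example labels are assumptions made for illustration only.

```python
def align_labels(a, b, match=2, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) alignment of two label sequences.

    Returns (score, aligned_a, aligned_b); gaps are represented by None.
    The scoring values are illustrative, not tuned parameters.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align the two labels
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(None)
            i -= 1
        else:
            out_a.append(None)
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], out_a[::-1], out_b[::-1]


# Hypothetical label sequences for two orthologous promoters.
human = ["SP1", "TBP", "CEBP", "NFKB"]
mouse = ["SP1", "CEBP", "NFKB"]
total, aligned_human, aligned_mouse = align_labels(human, mouse)
```

In this toy case the three shared labels line up and the unmatched `TBP` site is paired with a gap, which is the kind of conserved ordering of sites that a nucleotide-level alignment can miss.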

Abstract:

As a continuation of Jordi Romero's final degree project “Desenvolupament d’un laboratori virtual per a les pràctiques de Biologia Molecular”, a complementary tool for visualizing molecules has been built and integrated into the virtual laboratory itself. It is a tool for the graphical visualization of genes, ORFs, marks and restriction sequences of real or fictitious molecules. Being able to work with fictitious molecules is its main advantage over solutions such as GENBANK, which only allows working with its own molecules. Working with fictitious molecules makes the tool well suited to teaching, since it lets teachers run exercises or demonstrations with real molecules or with molecules designed specifically for the exercise at hand. In addition, it can display the different parts visually, either simultaneously or separately, offering a first approach to interpreting the results. It also allows users to mark genes, create marks, locate restriction sequences and generate the ORFs of a molecule they create, or to modify an existing one. For the implementation, the idea of separating code from design in Flash applications has been maintained. To do so, the open-source platform Ariware ARPv2.02 was used, which proposes a framework for developing object-oriented Flash applications with the code (ActionScript 2.0 classes) separated from the movieclip. Perl was used for data processing because it is widely used in bioinformatics and for its computing speed. The generated data are stored in a MySQL database (freely distributed), from which data are extracted to generate XML files, using both PHP and the AMFPHP platform as the link between Flash and the other parts.

Abstract:

A systematic assessment of global neural network connectivity through direct electrophysiological assays has remained technically infeasible, even in simpler systems like dissociated neuronal cultures. We introduce an improved algorithmic approach based on Transfer Entropy to reconstruct structural connectivity from network activity monitored through calcium imaging. We focus in this study on the inference of excitatory synaptic links. Based on information theory, our method requires no prior assumptions on the statistics of neuronal firing and neuronal connections. The performance of our algorithm is benchmarked on surrogate time series of calcium fluorescence generated by the simulated dynamics of a network with known ground-truth topology. We find that the functional network topology revealed by Transfer Entropy depends qualitatively on the time-dependent dynamic state of the network (bursting or non-bursting). Thus by conditioning with respect to the global mean activity, we improve the performance of our method. This allows us to focus the analysis on specific dynamical regimes of the network in which the inferred functional connectivity is shaped by monosynaptic excitatory connections, rather than by collective synchrony. Our method can discriminate between actual causal influences between neurons and spurious non-causal correlations due to light scattering artifacts, which inherently affect the quality of fluorescence imaging. Compared to other reconstruction strategies such as cross-correlation or Granger Causality methods, our method based on improved Transfer Entropy is remarkably more accurate. In particular, it provides a good estimation of the excitatory network clustering coefficient, allowing for discrimination between weakly and strongly clustered topologies.
Finally, we demonstrate the applicability of our method to analyses of real recordings of in vitro disinhibited cortical cultures where we suggest that excitatory connections are characterized by an elevated level of clustering compared to a random graph (although not extreme) and can be markedly non-local.
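
For readers unfamiliar with the quantity, the sketch below shows a basic plug-in estimate of Transfer Entropy between two discretized signals with a history length of one time step. It does not reproduce the improved method of the abstract (in particular, the conditioning on global mean activity is omitted), and the function name `transfer_entropy` and the toy signals are assumptions for illustration.

```python
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Plug-in estimate of TE(source -> target) in bits, history length 1.

    TE = sum over (x_next, x_now, y_now) of
         p(x_next, x_now, y_now) * log2( p(x_next | x_now, y_now) / p(x_next | x_now) )
    where x is the target signal and y is the source signal.
    """
    if len(source) != len(target):
        raise ValueError("signals must have the same length")
    n = len(target) - 1  # number of observed transitions
    if n < 1:
        raise ValueError("need at least two time points")

    triples = Counter()   # counts of (x_next, x_now, y_now)
    pairs_xy = Counter()  # counts of (x_now, y_now)
    pairs_xx = Counter()  # counts of (x_next, x_now)
    singles = Counter()   # counts of x_now
    for t in range(n):
        x_next, x_now, y_now = target[t + 1], target[t], source[t]
        triples[(x_next, x_now, y_now)] += 1
        pairs_xy[(x_now, y_now)] += 1
        pairs_xx[(x_next, x_now)] += 1
        singles[x_now] += 1

    te = 0.0
    for (x_next, x_now, y_now), count in triples.items():
        p_joint = count / n
        p_given_both = count / pairs_xy[(x_now, y_now)]
        p_given_self = pairs_xx[(x_next, x_now)] / singles[x_now]
        te += p_joint * log2(p_given_both / p_given_self)
    return te


# Toy example: the target copies the source with a delay of one step.
driver = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
follower = [0] + driver[:-1]
te_forward = transfer_entropy(driver, follower)
```

Because the follower is fully determined by the driver's previous value, the forward estimate is positive, whereas a constant source carries no information and gives exactly zero. Estimates from short series like this one are strongly biased, so real analyses need far longer recordings.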

Abstract:

The observation that real complex networks have internal structure has important implications for dynamic processes occurring on such topologies. Here we investigate the impact of community structure on a model of information transfer able to deal with both search and congestion simultaneously. We show that networks with fuzzy community structure are more efficient in terms of packet delivery than those with pronounced community structure. We also propose an alternative packet routing algorithm which takes advantage of the knowledge of communities to improve information transfer and show that in the context of the model an intermediate level of community structure is optimal. Finally, we show that in a hierarchical network setting, providing knowledge of communities at the level of highest modularity will improve network capacity by the largest amount.

Abstract:

BACKGROUND: DNA sequence polymorphism analysis can provide valuable information on the evolutionary forces shaping nucleotide variation, and provides an insight into the functional significance of genomic regions. The recent ongoing genome projects will radically improve our capabilities to detect specific genomic regions shaped by natural selection. Currently available methods and software, however, are unsatisfactory for such genome-wide analysis. RESULTS: We have developed methods for the analysis of DNA sequence polymorphisms at the genome-wide scale. These methods, which have been tested on coalescent-simulated and actual data files from mouse and human, have been implemented in the VariScan software package version 2.0. Additionally, we have also incorporated a graphical user interface. The main features of this software are: i) exhaustive population-genetic analyses including those based on the coalescent theory; ii) analysis adapted to the shallow data generated by the high-throughput genome projects; iii) use of genome annotations to conduct comprehensive analyses separately for different functional regions; iv) identification of relevant genomic regions by the sliding-window and wavelet-multiresolution approaches; v) visualization of the results integrated with current genome annotations in commonly available genome browsers. CONCLUSION: VariScan is a powerful and flexible suite of software for the analysis of DNA polymorphisms. The current version implements new algorithms, methods, and capabilities, providing an important tool for an exhaustive exploratory analysis of genome-wide DNA polymorphism data.
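
The sliding-window idea mentioned under feature iv) can be sketched in a few lines. The example below computes nucleotide diversity (the mean number of pairwise differences per site) in successive windows of an alignment. It is a generic illustration, not VariScan's implementation or interface: the function names, the window parameters and the toy alignment are assumptions, and gaps or missing data are not handled.

```python
from itertools import combinations


def pairwise_diversity(seqs):
    """Mean number of pairwise differences per site across aligned sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    if not pairs or length == 0:
        raise ValueError("need at least two non-empty aligned sequences")
    diffs = sum(
        sum(a != b for a, b in zip(s1, s2))  # differences between one pair
        for s1, s2 in pairs
    )
    return diffs / (len(pairs) * length)


def sliding_window_pi(seqs, window, step):
    """Return (start, diversity) for each full window along the alignment."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        results.append((start, pairwise_diversity(chunk)))
    return results


# Toy alignment: variation is concentrated in the second half.
alignment = ["AAAAAAAA", "AAAAAAAT", "AAAAAATT"]
profile = sliding_window_pi(alignment, window=4, step=4)
```

Here the first window is invariant and the second is not, which is the kind of regional contrast a sliding-window scan is meant to expose along a chromosome.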

Abstract:

Advances in the foundations of theoretical methods and the spectacular growth of computing power have made it possible to progress enormously towards the dream of the founders of chemistry, that is, being able to study the full range of chemical processes with computational methods. Today, theoretical chemistry is completing its latest advance: attempting to become the most recent tool for understanding the chemical nature of living beings. This review aims to show how the methods of theoretical chemistry, originally developed to examine small molecules in the gas phase, have evolved to achieve the complex description of biological systems.

Abstract:

Bioinformatics is a rapidly evolving research field dedicated to analyzing and managing biological data with computational resources. This paper aims to overview some of the processes and applications currently implemented at CCiT-UB's Bioinformatics Unit, focusing mainly on the areas of Genomics, Transcriptomics and Proteomics.

Abstract:

Information about the genomic coordinates and the sequence of experimentally identified transcription factor binding sites is found scattered under a variety of diverse formats. The availability of standard collections of such high-quality data is important to design, evaluate and improve novel computational approaches to identify binding motifs on promoter sequences from related genes. ABS (http://genome.imim.es/datasets/abs2005/index.html) is a public database of known binding sites identified in promoters of orthologous vertebrate genes that have been manually curated from bibliography. We have annotated 650 experimental binding sites from 68 transcription factors and 100 orthologous target genes in human, mouse, rat or chicken genome sequences. Computational predictions and promoter alignment information are also provided for each entry. A simple and easy-to-use web interface facilitates data retrieval allowing different views of the information. In addition, release 1.0 of ABS includes a customizable generator of artificial datasets based on the known sites contained in the collection and an evaluation tool to aid during the training and the assessment of motif-finding programs.

Abstract:

Background: The arrangement of regulatory motifs in gene promoters, or promoter architecture, is the result of mutation and selection processes that have operated over many millions of years. In mammals, tissue-specific transcriptional regulation is related to the presence of specific protein-interacting DNA motifs in gene promoters. However, little is known about the relative location and spacing of these motifs. To fill this gap, we have performed a systematic search for motifs that show significant bias at specific promoter locations in a large collection of housekeeping and tissue-specific genes. Results: We observe that promoters driving housekeeping gene expression are enriched in particular motifs with strong positional bias, such as YY1, which are of little relevance in promoters driving tissue-specific expression. We also identify a large number of motifs that show positional bias in genes expressed in a highly tissue-specific manner. They include well-known tissue-specific motifs, such as HNF1 and HNF4 motifs in liver, kidney and small intestine, or RFX motifs in testis, as well as many potentially novel regulatory motifs. Based on this analysis, we provide predictions for 559 tissue-specific motifs in mouse gene promoters. Conclusion: The study shows that motif positional bias is an important feature of mammalian proximal promoters and that it affects both general and tissue-specific motifs. Motif positional constraints define very distinct promoter architectures depending on breadth of expression and type of tissue.

Abstract:

In past years, comprehensive representations of cell signalling pathways have been developed by manual curation from literature, which requires huge effort and would benefit from information stored in databases and from automatic retrieval and integration methods. Once a reconstruction of the network of interactions is achieved, analysis of its structural features and its dynamic behaviour can take place. Mathematical modelling techniques are used to simulate the complex behaviour of cell signalling networks, which ultimately sheds light on the mechanisms leading to complex diseases or helps in the identification of drug targets. A variety of databases containing information on cell signalling pathways have been developed in conjunction with methodologies to access and analyse the data. In principle, the scenario is prepared to make the most of this information for the analysis of the dynamics of signalling pathways. However, are the knowledge repositories of signalling pathways ready to realize the systems biology promise? In this article we aim to initiate this discussion and to provide some insights on this issue.