994 results for biological sequences


Relevance: 100.00%

Abstract:

The work of biochemists and molecular biologists often depends on, or is greatly aided by, a preliminary computer analysis. The development of efficient and user-friendly computational tools is therefore very important. In this work, we developed a package of programs in the JavaScript language which can be used online or locally. The programs depend exclusively on Web browsers and are compatible with Internet Explorer, Opera, Mozilla Firefox and Google Chrome. With the EBiAn package it is possible to perform the main analyses and manipulations of DNA, RNA, protein and peptide sequences. The programs can be freely accessed and adapted or modified to generate new programs.
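As an illustration of the kind of basic sequence manipulation such a package covers, here is a minimal sketch in Python. It is not EBiAn's code (EBiAn is written in JavaScript and its internals are not described here); the function names `reverse_complement`, `gc_content` and `transcribe` are hypothetical and chosen only for this example.

```python
# Hypothetical helper names; a sketch of common DNA manipulations,
# not the EBiAn implementation.

# Translation table mapping each base to its complement (case preserved).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Return the fraction of G and C characters in the sequence (0.0 if empty)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def transcribe(dna: str) -> str:
    """Return the RNA string obtained by replacing T with U in the coding strand."""
    return dna.upper().replace("T", "U")
```

For instance, `reverse_complement("ATGC")` gives `"GCAT"` and `gc_content("ATGC")` gives `0.5`.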

Relevance: 60.00%

Abstract:

Background: Identifying local similarity between two or more sequences, or identifying repeats occurring at least twice in a sequence, is an essential part of the analysis of biological sequences and of their phylogenetic relationships. Finding such fragments while allowing for a certain number of insertions, deletions, and substitutions is, however, known to be a computationally expensive task, and consequently exact methods usually cannot be applied in practice. Results: The filter TUIUIU that we introduce in this paper provides a possible solution to this problem. It can be used as a preprocessing step for any multiple alignment or repeat inference method, eliminating a possibly large fraction of the input that is guaranteed not to contain any approximate repeat. It consists of verifying several strong necessary conditions that can be checked quickly. We implemented three versions of the filter. The first is a straightforward extension, to the case of multiple sequences, of conditions already existing in the literature. The second uses a stronger condition which, as our results show, filters considerably more with negligible (if any) additional time. The third version uses an additional condition and filters even more, at a non-negligible additional time cost in many circumstances; our experiments show that it is particularly useful with large error rates. The latter version was applied as a preprocessing step for a multiple alignment tool, yielding an overall time (filter plus alignment) on average 63 and at best 530 times smaller than before (direct alignment), with, in most cases, a better-quality alignment. Conclusion: To the best of our knowledge, TUIUIU is the first filter designed for multiple repeats and for dealing with error rates greater than 10% of the repeat length.
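To make the idea of a filter based on a necessary condition concrete, the sketch below uses the classical q-gram lemma for a single pair of fragments: if two strings are within edit distance e, they must share at least max(|a|, |b|) - q + 1 - q·e q-grams (counted with multiplicity). This is only an assumed, simplified stand-in; it is not one of the conditions TUIUIU actually checks, and the names `shared_qgrams` and `passes_qgram_filter` are hypothetical.

```python
from collections import Counter


def shared_qgrams(a: str, b: str, q: int) -> int:
    """Count q-grams common to a and b, with multiplicity (multiset intersection)."""
    grams_a = Counter(a[i:i + q] for i in range(len(a) - q + 1))
    grams_b = Counter(b[i:i + q] for i in range(len(b) - q + 1))
    return sum((grams_a & grams_b).values())


def passes_qgram_filter(a: str, b: str, q: int, max_errors: int) -> bool:
    """Necessary condition from the q-gram lemma.

    Returns False only when a and b are guaranteed to be more than
    max_errors edit operations apart; True means "cannot be excluded",
    so a pair that passes still needs to be verified by alignment.
    """
    threshold = max(len(a), len(b)) - q + 1 - q * max_errors
    return shared_qgrams(a, b, q) >= threshold
```

Because the condition is necessary but not sufficient, fragments rejected by it can be discarded safely, while the surviving ones are passed on to the more expensive alignment step. When the threshold is zero or negative the test accepts everything, which is why larger error rates call for stronger conditions.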

Relevance: 60.00%

Abstract:

Hidden Markov models (HMMs) are probabilistic models that are well adapted to many tasks in bioinformatics, for example, for predicting the occurrence of specific motifs in biological sequences. MAMOT is a command-line program for Unix-like operating systems, including MacOS X, that we developed to allow scientists to apply HMMs more easily in their research. One can define the architecture and initial parameters of the model in a text file and then use MAMOT for parameter optimization on example data, decoding (such as predicting motif occurrence in sequences) and the production of stochastic sequences generated according to the probabilistic model. Two examples for which models are provided are coiled-coil domains in protein sequences and protein binding sites in DNA. Useful features include the use of pseudocounts, state tying and fixing of selected parameters in learning, and the inclusion of prior probabilities in decoding. AVAILABILITY: MAMOT is implemented in C++, and is distributed under the GNU General Public Licence (GPL). The software, documentation, and example model files can be found at http://bcf.isb-sib.ch/mamot
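The decoding task mentioned above can be illustrated with a short Viterbi sketch in Python. This is not MAMOT's code or its model file format (MAMOT is a C++ command-line program); the function `viterbi` and the dictionary-based layout of the parameters are assumptions made for this example. It assumes a non-empty sequence and strictly positive probabilities.

```python
import math


def viterbi(seq, states, start_p, trans_p, emit_p):
    """Return the most probable state path for seq under the given HMM.

    start_p[s], trans_p[r][s] and emit_p[s][symbol] are probabilities;
    the computation is done in log space to avoid underflow.
    """
    # Scores for the first symbol.
    score = [{s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]])
              for s in states}]
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for symbol in seq[1:]:
        prev = score[-1]
        col, ptr = {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + math.log(trans_p[r][s]))
            col[s] = (prev[best] + math.log(trans_p[best][s])
                      + math.log(emit_p[s][symbol]))
            ptr[s] = best
        score.append(col)
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Toy two-state model (AT-rich vs GC-rich regions); all values are illustrative.
STATES = ["AT", "GC"]
START = {"AT": 0.5, "GC": 0.5}
TRANS = {"AT": {"AT": 0.9, "GC": 0.1}, "GC": {"AT": 0.1, "GC": 0.9}}
EMIT = {"AT": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
        "GC": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4}}
```

With this toy model, `viterbi("AAAAGGGG", STATES, START, TRANS, EMIT)` labels the first four positions `"AT"` and the last four `"GC"`.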

Relevance: 60.00%

Abstract:

Continuous-time Markov processes are widely used to try to explain the evolution of protein and nucleotide sequences along phylogenies. Probabilistic models based on such assumptions are designed to accommodate the spatial non-homogeneity of the functional and environmental constraints acting on these sequences. Recently, Markov-modulated models have been introduced to describe temporal changes in site-specific evolutionary rates (heterotachy). Studies have also shown that not only the strength but also the nature of the selective constraint acting on a site can vary through time. Here we propose to account for this evolutionary reality with a Markov-modulated model for proteins under which sites are allowed to change their amino acid preferences over time. Posterior estimation of the various modulating parameters of the stochastic kernel with Monte Carlo methods is a considerable challenge, which we were able to meet, in part, through parallel programming. Computational adjustments are also considered to accelerate convergence toward the global optimum of this relatively complex multidimensional landscape. Qualitatively, our model appears able to capture signals of temporal heterogeneity from a data set whose evolutionary history is known to be rich in changes of substitution regime. Performance tests further suggest that it fits the data better than an equivalent time-homogeneous model. Nevertheless, the substitution histories drawn from the posterior distribution are noisy and remain difficult to interpret from a biological point of view.

Relevance: 60.00%

Abstract:

Graduate Program in Computer Science - IBILCE

Relevance: 60.00%

Abstract:

Motivation: A current issue of great interest, from both a theoretical and an applied perspective, is the analysis of biological sequences to disclose the information that they encode. The development of new technologies for genome sequencing in recent years has opened new fundamental problems, since huge amounts of biological data still await interpretation. Indeed, sequencing is only the first step of the genome annotation process, which consists in assigning biological information to each sequence. Hence, given the large amount of available data, in silico methods have become useful and necessary for extracting relevant information from sequences. The availability of data from genome projects has given rise to new strategies for tackling the basic problems of computational biology, such as determining the three-dimensional structures of proteins, their biological functions and their reciprocal interactions. Results: The aim of this work has been the implementation of predictive methods that allow the extraction of information on the properties of genomes and proteins starting from nucleotide and amino acid sequences, by taking advantage of the information provided by the comparison of genome sequences from different species. In the first part of the work, a comprehensive large-scale genome comparison of 599 organisms is described. 2.6 million sequences from 551 prokaryotic and 48 eukaryotic genomes were aligned and clustered on the basis of their sequence identity. This procedure led to the identification of classes of proteins that are peculiar to the different groups of organisms. Moreover, the adopted similarity threshold produced clusters that are homogeneous from a structural point of view and that can be used for the structural annotation of uncharacterized sequences.
The second part of the work focuses on the characterization of thermostable proteins and on the development of tools able to predict the thermostability of a protein starting from its sequence. By means of Principal Component Analysis, the codon composition of a non-redundant database comprising 116 prokaryotic genomes has been analyzed, and it has been shown that a cross-genomic approach allows the extraction of common determinants of thermostability at the genome level, leading to an overall accuracy of 95% in discriminating thermophilic coding sequences. This result outperforms those obtained in previous studies. Moreover, we investigated the effect of multiple mutations on protein thermostability. This issue is of great importance in the field of protein engineering, since thermostable proteins are generally more suitable than their mesostable counterparts for technological applications. A Support Vector Machine based method has been trained to predict whether a set of mutations can enhance the thermostability of a given protein sequence. The developed predictor achieves 88% accuracy.
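As a rough illustration of what a codon composition vector looks like before any multivariate analysis, the sketch below computes relative codon frequencies for one coding sequence. It is an assumed, simplified feature extraction, not the procedure used in the work described above; the name `codon_composition` is hypothetical, and the Principal Component Analysis step itself is not shown.

```python
from collections import Counter
from itertools import product

# The 64 codons in a fixed order, so every sequence maps to the same vector layout.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
_CODON_SET = set(CODONS)


def codon_composition(cds: str) -> list:
    """Return the relative frequency of each of the 64 codons in a coding sequence.

    The sequence is read in frame from its first base; a trailing partial
    codon and codons containing ambiguous bases (e.g. N) are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    counts = Counter(c for c in codons if c in _CODON_SET)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]
```

Stacking one such 64-dimensional vector per genome (or per coding sequence) yields the kind of matrix to which a dimensionality reduction method could then be applied.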

Relevance: 60.00%

Abstract:

Technological progress in molecular biology confronts the scientific community with the need to interpret the enormous number of biological sequences, whether proteins or nucleic acids, that are progressively filling the databases. In this context bioinformatics plays a role of primary importance. A new level of investigative possibility has been introduced by Next Generation Sequencing (NGS) technologies, with which entire genomes or transcriptomes can be obtained quickly and at low cost. Among the most relevant applications of NGS are undoubtedly those in oncology, which involve the genomic characterization of tumor tissues and the development of new diagnostic and therapeutic approaches to the treatment of cancer. NGS analysis makes it possible to identify the complete set of variations present in the tumor genome, such as single-nucleotide variants, chromosomal rearrangements, insertions and deletions. It should be stressed, however, that the variations found in genes must ultimately be examined in terms of their effects at the protein level, since proteins are most directly responsible for the altered phenotypes observed in the tumor cell. Bioinformatics expertise is therefore needed both in the analysis of the data produced by NGS and in the subsequent steps, where it is necessary to annotate the genes contained in the sequenced genome and the protein structures expressed from it or, as in mutational studies, to assess the effect of the genomic variation.
The work presented here is set in this context: on the one hand, the development of computational methods for the annotation of protein sequences and, on the other, the set-up of a pipeline for the analysis of data produced with NGS technologies in oncological applications, whose ultimate aim is the identification and characterization of tumor genetic mutations at the protein level.

Relevance: 40.00%

Abstract:

Human endogenous retroviruses (HERVs) are very likely footprints of ancient germ-cell infections. HERV sequences encompass about 1% of the human genome. HERVs have retained the potential of other retroelements to retrotranspose and thus to change genomic structure and function. The genomes of almost all HERV families are highly defective. Recent progress has allowed the identification of the biologically most active family, HTDV/HERV-K, which codes for viral proteins and particles and is highly expressed in germ-cell tumors. The demonstrable and potential roles of HTDV/HERV-K as well as of other human elements in disease and in maintaining genome plasticity are illustrated.

Relevance: 30.00%

Abstract:

In Brazil, the Laurencia complex is represented by twenty taxa: Laurencia s.s. with twelve species, Palisada with four species (including Chondrophycus furcatus, whose proposed transfer to Palisada is in progress), and Osmundea and Yuzurua with two species each. The majority of the Brazilian species of the Laurencia complex have been phylogenetically analyzed using 54 rbcL sequences, including five other Rhodomelacean species as outgroups. The analysis showed that the Laurencia complex is monophyletic with a high posterior probability. The complex was separated into five clades, corresponding to the genera Chondrophycus, Laurencia, Osmundea, Palisada, and Yuzurua. A bibliographical survey of the terpenoids produced by Brazilian species showed that only six species of Laurencia and five of Palisada (including C. furcatus) have been subjected to chemical analysis, with 48 terpenoids (47 sesquiterpenes and one triterpene) isolated. No diterpenes were found. Of the total, 23 sesquiterpenes belong to the bisabolane class and eighteen to the chamigrene type, whose biochemical precursor is bisabolane; two are derived from lauranes and four are triquinols. Despite the considerable number of known terpenes and their ecological and pharmacological importance, few experimental biological studies have been performed. In this review, only bioactivities related to human health were considered.

Relevance: 30.00%

Abstract:

Genetic recombination can produce heterogeneous phylogenetic histories within a set of homologous genes. Delineating recombination events is important in the study of molecular evolution, as inference of such events provides a clearer picture of the phylogenetic relationships among different gene sequences or genomes. Nevertheless, detecting recombination events can be a daunting task, as the performance of different recombination-detecting approaches can vary, depending on evolutionary events that take place after recombination. We recently evaluated the effects of post-recombination events on the prediction accuracy of recombination-detecting approaches using simulated nucleotide sequence data. The main conclusion, supported by other studies, is that one should not depend on a single method when searching for recombination events. In this paper, we introduce a two-phase strategy, applying three statistical measures to detect the occurrence of recombination events, and a Bayesian phylogenetic approach in delineating breakpoints of such events in nucleotide sequences. We evaluate the performance of these approaches using simulated data, and demonstrate the applicability of this strategy to empirical data. The two-phase strategy proves to be time-efficient when applied to large datasets, and yields high-confidence results.

Relevance: 30.00%

Abstract:

Wolbachia are maternally inherited intracellular bacteria that infect a wide range of arthropods and nematodes and are associated with various reproductive abnormalities in their hosts. Insect-associated Wolbachia form a monophyletic clade in the α-Proteobacteria and recently have been separated into two supergroups (A and B) and 19 groups. Our recent polymerase chain reaction (PCR) survey using wsp specific primers indicated that various strains of Wolbachia were present in mosquitoes collected from Southeast Asia. Here, we report the phylogenetic relationship of the Wolbachia strains found in these mosquitoes using wsp gene sequences. Our phylogenetic analysis revealed eight new Wolbachia strains, five in the A supergroup and three in the B supergroup. Most of the Wolbachia strains present in Southeast Asian mosquitoes belong to the established Mors, Con, and Pip groups.

Relevance: 30.00%

Abstract:

Wolbachia endosymbiotic bacteria are widespread in arthropods and are also present in filarial nematodes. Almost all filarial species so far examined have been found to harbor these endosymbionts. The sequences of only three genes have been published for nematode Wolbachia (i.e., the genes coding for the proteins FtsZ and catalase and for 16S rRNA). Here we present the sequences of the genes coding for the Wolbachia surface protein (WSP) from the endosymbionts of eight species of filaria. Complete gene sequences were obtained from the endosymbionts of two different species, Dirofilaria immitis and Brugia malayi. These sequences allowed us to design general primers for amplification of the wsp gene from the Wolbachia of all filarial species examined. For these species, partial WSP sequences (about 600 base pairs) were obtained with these primers. Phylogenetic analysis groups these nematode wsp sequences into a coherent cluster. Within the nematode cluster, wsp-based Wolbachia phylogeny matches a previous phylogeny obtained with ftsZ gene sequences, with a good consistency of the phylogeny of hosts (nematodes) and symbionts (Wolbachia). In addition, different individuals of the same host species (Dirofilaria immitis and Wuchereria bancrofti) show identical wsp gene sequences.