995 results for MULTIPLE ALIGNMENT
Abstract:
The M-Coffee server is a web server that computes multiple sequence alignments (MSAs) by running several MSA methods and combining their output into a single model. This allows users to run all their methods of choice simultaneously without having to choose one arbitrarily. The MSA is delivered along with a local estimation of its consistency with the individual MSAs it was derived from. The computation of the consensus multiple alignment is carried out using a special mode of the T-Coffee package [Notredame, Higgins and Heringa (T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 2000; 302: 205-217); Wallace, O'Sullivan, Higgins and Notredame (M-Coffee: combining multiple sequence alignment methods with T-Coffee. Nucleic Acids Res. 2006; 34: 1692-1699)]. Given a set of sequences (DNA or proteins) in FASTA format, M-Coffee delivers a multiple alignment in the most common formats. M-Coffee is a free, open-source package distributed under a GPL license, and it is available either as a standalone package or as a web service from www.tcoffee.org.
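The idea of a "local estimation of consistency" between a consensus MSA and the individual MSAs can be illustrated with a minimal sketch. This is not T-Coffee's actual scoring scheme: the function names below are hypothetical, and the score is simply assumed to be the fraction of residue pairs aligned in a consensus column that are also aligned in each individual MSA, averaged over the MSAs. All alignments are assumed to list the same sequences in the same order, with `-` as the gap character.

```python
from itertools import combinations


def aligned_pairs(msa):
    """For each column, return the set of aligned residue pairs.

    A pair is encoded as (seq_i, pos_i, seq_j, pos_j), where pos is the
    residue index in the ungapped sequence.
    """
    counters = [0] * len(msa)
    columns = []
    for col in range(len(msa[0])):
        present = []
        for s, row in enumerate(msa):
            if row[col] != "-":
                present.append((s, counters[s]))
                counters[s] += 1
        columns.append(
            {(a[0], a[1], b[0], b[1]) for a, b in combinations(present, 2)}
        )
    return columns


def column_consistency(consensus, individual_msas):
    """Per-column fraction of consensus pairs supported by the individual MSAs.

    Columns that align no residue pair (e.g. a single residue) get None.
    """
    supports = [set().union(*aligned_pairs(m)) for m in individual_msas]
    scores = []
    for pairs in aligned_pairs(consensus):
        if not pairs:
            scores.append(None)
            continue
        hits = sum(len(pairs & s) for s in supports)
        scores.append(hits / (len(pairs) * len(supports)))
    return scores


consensus = ["AC-G", "ACTG"]
msa_a = ["AC-G", "ACTG"]
msa_b = ["ACG-", "ACTG"]
print(column_consistency(consensus, [msa_a, msa_b]))
```

In this toy input the last consensus column is supported by only one of the two individual alignments, so it receives a lower score than the first two columns.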
Abstract:
Sao Paulo State Research Foundation-FAPESP
Abstract:
Background: Identifying local similarity between two or more sequences, or identifying repeats occurring at least twice in a sequence, is an essential part of the analysis of biological sequences and of their phylogenetic relationships. Finding such fragments while allowing for a certain number of insertions, deletions, and substitutions is, however, known to be a computationally expensive task, and consequently exact methods usually cannot be applied in practice. Results: The filter TUIUIU that we introduce in this paper provides a possible solution to this problem. It can be used as a preprocessing step for any multiple alignment or repeat inference method, eliminating a possibly large fraction of the input that is guaranteed not to contain any approximate repeat. It consists of the verification of several strong necessary conditions that can be checked quickly. We implemented three versions of the filter. The first is a straightforward extension, to the case of multiple sequences, of conditions already existing in the literature. The second uses a stronger condition which, as our results show, filters noticeably more with negligible (if any) additional time. The third version uses an additional condition and pushes the sensitivity of the filter even further, at a non-negligible additional time cost in many circumstances; our experiments show that it is particularly useful with large error rates. The latter version was applied as a preprocessing step for a multiple alignment tool, obtaining an overall time (filter plus alignment) on average 63 and at best 530 times smaller than before (direct alignment), with a better-quality alignment in most cases. Conclusion: To the best of our knowledge, TUIUIU is the first filter designed for multiple repeats and for dealing with error rates greater than 10% of the repeat length.
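A filter based on a necessary condition can be illustrated with the classical q-gram lemma. This is a minimal sketch, not TUIUIU's own conditions, which the abstract does not spell out. The lemma says that if two strings are within edit distance k, they must share at least len(a) - q + 1 - k*q q-grams, because each edit destroys at most q of the q-grams of a. Pairs that fail this test can be discarded safely; pairs that pass still need full verification. The function names are hypothetical.

```python
from collections import Counter


def qgrams(s, q):
    """Multiset of all substrings of length q in s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def shared_qgrams(a, b, q):
    """Number of q-grams common to a and b, counted with multiplicity."""
    return sum((qgrams(a, q) & qgrams(b, q)).values())


def passes_filter(a, b, k, q):
    """Necessary condition for edit distance <= k (q-gram lemma).

    Returns False only when a and b are guaranteed to differ by more
    than k edits; True means "cannot be ruled out".
    """
    threshold = len(a) - q + 1 - k * q
    return shared_qgrams(a, b, q) >= threshold


a = "ACGTACGTAC"
b = "ACGTTCGTAC"  # one substitution away from a
print(passes_filter(a, b, k=1, q=3))             # kept for verification
print(passes_filter(a, "TTTTTTTTTT", k=1, q=3))  # safely discarded
```

When the threshold is zero or negative the condition is vacuous and every pair passes, which is one reason high error rates are hard to filter with this lemma alone.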
Abstract:
PALI (release 1.2) contains three-dimensional (3-D) structure-dependent sequence alignments as well as structure-based phylogenetic trees of homologous protein domains in various families. The data set of homologous protein structures has been derived by consulting the SCOP database (release 1.50) and the data set comprises 604 families of homologous proteins involving 2739 protein domain structures with each family made up of at least two members. Each member in a family has been structurally aligned with every other member in the same family (pairwise alignment) and all the members in the family are also aligned using simultaneous superposition (multiple alignment). The structural alignments are performed largely automatically, with manual interventions especially in the cases of distantly related proteins, using the program STAMP (version 4.2). Every family is also associated with two dendrograms, calculated using PHYLIP (version 3.5), one based on a structural dissimilarity metric defined for every pairwise alignment and the other based on similarity of topologically equivalent residues. These dendrograms enable easy comparison of sequence and structure-based relationships among the members in a family. Structure-based alignments with the details of structural and sequence similarities, superposed coordinate sets and dendrograms can be accessed conveniently using a web interface. The database can be queried for protein pairs with sequence or structural similarities falling within a specified range. Thus PALI forms a useful resource to help in analysing the relationship between sequence and structure variation at a given level of sequence similarity. PALI also contains over 653 ‘orphans’ (single member families). Using the web interface involving PSI_BLAST and PHYLIP it is possible to associate the sequence of a new protein with one of the families in PALI and generate a phylogenetic tree combining the query sequence and proteins of known 3-D structure. 
The database, with its web-interfaced search and dendrogram generation tools, can be accessed at http://pauling.mbu.iisc.ernet.in/~pali.
Abstract:
Intergenic spacers of chloroplast DNA (cpDNA) are very useful in phylogenetic and population genetic studies of plant species, and their potential for integration into phylogenetic analyses merits study. The non-coding trnE-trnT intergenic spacer of cpDNA was analyzed to assess the nucleotide sequence polymorphism of 16 Solanaceae species and to estimate its ability to contribute to the resolution of phylogenetic studies of this group. Multiple alignments of DNA sequences of the trnE-trnT intergenic spacer made it possible to identify nucleotide variability in this region, and the phylogeny was estimated by maximum parsimony and rooted with Ipomoea batatas (Convolvulaceae, the most closely related family). In addition, this intergenic spacer was tested for its ability to differentiate taxonomic levels. For this purpose, species from four other families were analyzed and compared with the Solanaceae species. The results confirmed polymorphism in the trnE-trnT region at different taxonomic levels.
Abstract:
Xylella fastidiosa is a Gram-negative plant pathogen causing many economically important diseases, and analyses of completely sequenced X. fastidiosa genome strains allowed the identification of many prophage-like elements and possible phage remnants, accounting for up to 15% of the genome composition. To better evaluate the recent evolution of the X. fastidiosa chromosome backbone among distinct pathovars, the number and location of prophage-like regions in two finished genomes (9a5c and Temecula1) and in two candidate molecules (Ann1 and Dixon) were assessed. Based on comparative best bidirectional hit analyses, the majority (51%) of the predicted genes in the X. fastidiosa prophage-like regions are related to structural phage genes belonging to the Siphoviridae family. Electron micrographs reveal the existence of putative viral particles with morphology similar to that of lambda phages in the bacterial cell in planta. Moreover, analysis of microarray data indicates that the 9a5c strain cultivated under stress conditions shows enhanced expression of phage anti-repressor genes, suggesting switches from the lysogenic to the lytic cycle of phages under stress-induced situations. Furthermore, virulence-associated proteins and toxins are found within these prophage-like elements, suggesting an important role in host adaptation. Finally, clustering analyses of phage integrase genes based on multiple alignment patterns reveal that they group into five lineages, all possessing a tyrosine recombinase catalytic domain and phylogenetically close to other integrases found in phages that are genetic mosaics and able to perform generalized and specialized transduction. Integration sites and tRNA association are also evidenced. In summary, we present comparative and experimental evidence supporting the association and contribution of phage activity to the differentiation of Xylella genomes.
Abstract:
A proteinase, named BmooMP alpha-I, from the venom of Bothrops moojeni, was purified by DEAE-Sephacel, Sephadex G-75 and heparin-agarose column chromatography. The enzyme was purified to homogeneity as judged by its migration profile in SDS-PAGE stained with Coomassie blue, and showed a molecular mass of about 24.5 kDa. Its complete cDNA was obtained by RT-PCR and the 615 bp coded for a mature protein of 205 amino acid residues. The multiple alignment of its deduced amino acid sequence and those of other snake venom metalloproteinases showed a high structural similarity, mainly among class P-IB proteases. The enzyme cleaves the A alpha-chain of fibrinogen first, followed by the B beta-chain, and shows no effect on the gamma-chain. On fibrin, the enzyme hydrolyzed only the beta-chain, leaving the gamma-dimer apparently untouched. It was devoid of phospholipase A(2), hemorrhagic and thrombin-like activities. Like many venom enzymes, it is stable at pH values between 4 and 10 and at 70 degrees C for 15 min. The inhibitory effects of EDTA on the fibrinogenolytic activity suggest that BmooMP alpha-I is a metalloproteinase, and inhibition by beta-mercaptoethanol revealed the important role of the disulfide bonds in the stabilization of the native structure. Aprotinin and benzamidine, specific serine proteinase inhibitors, had no effect on BmooMP alpha-I activity. Since the BmooMP alpha-I enzyme was found to cause defibrinogenation when administered i.p. to mice, it may be of medical interest as a therapeutic agent in the treatment and prevention of arterial thrombosis. (C) 2007 Elsevier Ltd. All rights reserved.
Abstract:
Multiple sequence alignment applications are prototypical examples of applications that demand high computing power and memory. They stand out for the scientific relevance of the results they provide to research in biomedicine, genetics, and pharmacology. Multiple alignment applications are limited in that they cannot process thousands of sequences, so a model is needed to address this problem. Given the volume of data handled in the biological sciences and the complexity of sequence alignment algorithms, the only way to solve the problem is through parallel computing environments and high-performance computing. Our research aims to create a parallel model that allows multiple alignment algorithms to increase the number of sequences they can process, while trying to maintain the quality of the results in order to guarantee scientific accuracy. The model we propose is based on clustering the input sequences using biological criteria that preserve the quality of the results. In addition, the model focuses on reducing computing time and memory consumption. To present and validate the model we use T-Coffee as the development and research platform. The proposed model could be applied to any other multiple sequence alignment algorithm.
Abstract:
BACKGROUND: The bacterial flagellum is the most important organelle of motility in bacteria and plays a key role in many bacterial lifestyles, including virulence. The flagellum also provides a paradigm of how hierarchical gene regulation, intricate protein-protein interactions and controlled protein secretion can result in the assembly of a complex multi-protein structure tightly orchestrated in time and space. As if to stress its importance, plants and animals produce receptors specifically dedicated to the recognition of flagella. Aside from motility, the flagellum also moonlights as an adhesin and has been adapted by humans as a tool for peptide display. Flagellar sequence variation constitutes a marker with widespread potential uses for studies of population genetics and phylogeny of bacterial species. RESULTS: We sequenced the complete flagellin gene (flaA) in 18 different species and subspecies of Aeromonas. Sequences ranged in size from 870 (A. allosaccharophila) to 921 nucleotides (A. popoffii). The multiple alignment displayed 924 sites, 66 of which presented alignment gaps. The phylogenetic tree revealed the existence of two groups of species exhibiting different FlaA flagellins (FlaA1 and FlaA2). Maximum likelihood models of codon substitution were used to analyze flaA sequences. Likelihood ratio tests suggested a low variation in selective pressure among lineages, with an omega ratio of less than 1 indicating the presence of purifying selection in almost all cases. Only one site under potential diversifying selection was identified (isoleucine in position 179). However, 17 amino acid positions were inferred as sites that are likely to be under positive selection using the branch-site model. Ancestral reconstruction revealed that these 17 amino acids were among the amino acid changes detected in the ancestral sequence.
CONCLUSION: The models applied to our set of sequences allowed us to determine the possible evolutionary pathway followed by the flaA gene in Aeromonas, suggesting that this gene has probably been evolving independently in the two groups of Aeromonas species since the divergence of a distant common ancestor after one or several episodes of positive selection. REVIEWERS: This article was reviewed by Alexey Kondrashov, John Logsdon and Olivier Tenaillon (nominated by Laurence D Hurst).
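Summary figures such as "924 sites, 66 of which presented alignment gaps" can be computed directly from an alignment. The sketch below is a hedged illustration with a hypothetical function name and toy data, not the authors' pipeline; it assumes the alignment is given as equal-length strings with `-` as the gap character.

```python
def gap_column_stats(msa):
    """Return (number of alignment sites, number of sites with at least one gap)."""
    lengths = {len(row) for row in msa}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_sites = lengths.pop()
    # zip(*msa) iterates over alignment columns
    n_gapped = sum(1 for column in zip(*msa) if "-" in column)
    return n_sites, n_gapped


toy_alignment = ["ATG-CA", "ATGGCA", "A-GGCA"]
print(gap_column_stats(toy_alignment))
```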
Abstract:
Background. Visceral leishmaniasis (VL) is caused by Leishmania donovani and Leishmania infantum chagasi. Genome-wide linkage studies from Sudan and Brazil identified a putative susceptibility locus on chromosome 6q27. Methods. Twenty-two single-nucleotide polymorphisms (SNPs) at genes PHF10, C6orf70, DLL1, FAM120B, PSMB1, and TBP were genotyped in 193 VL cases from 85 Sudanese families, and 8 SNPs at genes PHF10, C6orf70, DLL1, PSMB1, and TBP were genotyped in 194 VL cases from 80 Brazilian families. Family-based association, haplotype, and linkage disequilibrium analyses were performed. Multispecies comparative sequence analysis was used to identify conserved noncoding sequences carrying putative regulatory elements. Quantitative reverse-transcription polymerase chain reaction measured expression of candidate genes in splenic aspirates from Indian patients with VL compared with that in the control spleen sample. Results. Positive associations were observed at PHF10, C6orf70, DLL1, PSMB1, and TBP in Sudan, but only at DLL1 in Brazil (combined P = 3 × 10^-4 at DLL1 across Sudan and Brazil). No functional coding region variants were observed in resequencing of 22 Sudanese VL cases. DLL1 expression was significantly (P = 2 × 10^-7) reduced (mean fold change, 3.5 [SEM, 0.7]) in splenic aspirates from patients with VL, whereas other 6q27 genes showed higher levels (1.27 × 10^-6 < P < .01) than did the control spleen sample. A cluster of conserved noncoding sequences with putative regulatory variants was identified in the distal promoter of DLL1. Conclusions. DLL1, which encodes Delta-like 1, the ligand for Notch3, is strongly implicated as the chromosome 6q27 VL susceptibility gene.
Abstract:
BjussuMP-II is an acidic low molecular weight metalloprotease (Mr ~24,000 and pI ~6.5), isolated from Bothrops jararacussu snake venom. The chromatographic profile in RP-HPLC and its N-terminal sequence confirmed its high purity level. Its complete cDNA was obtained by RT-PCR and the 615 bp coded for a mature protein of 205 amino acid residues. The multiple alignment of its deduced amino acid sequence and those of other snake venom metalloproteases showed a high structural similarity, mainly among class P-I proteases. The molecular modeling analysis of BjussuMP-II also showed structural features conserved with other SVMPs. BjussuMP-II did not induce hemorrhage, myotoxicity or lethality, but displayed dose-dependent proteolytic activity on fibrinogen, collagen, fibrin, casein and gelatin, remaining stable at different pH values and temperatures and in the presence of several divalent ions. BjussuMP-II did not show any clotting or anticoagulant activity on human citrated plasma, in contrast to its inhibitory effects on platelet aggregation. The aspects addressed in this work provide data on the relationship between structure and function, in order to better understand the effects elicited by snake venom metalloproteases. (c) 2007 Elsevier B.V. All rights reserved.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
Non-heme iron oxygenase proteins share a conserved domain composed of eight histidines, are found in eukaryotic and prokaryotic organisms, and participate in important lipid biosynthesis pathways. To understand the evolutionary relationships among these proteins, comparative and phylogenetic analyses were carried out in prokaryotes and eukaryotes, yielding a classification of this family, which had not existed until now. After curation, the sequence search resulted in a collection of 448 proteins belonging to 58 organisms previously selected from the main taxa. The multiple sequence alignment generated with the MAFFT tool (BLOSUM 62; L-INS-i) showed the presence of the histidine domain with conserved spacing between the motifs. Classification of the proteins with the CLANS software produced 28 groups based on pairwise sequence similarity. Of these, 2 contain sequences with no similarity to previously characterized proteins, and 48 sequences were not assigned to any of the groups formed. Plant sequences, represented by 119 sequences in the collection, were distributed into 7 groups corresponding to the functions C4 methylsterol monooxygenase, C5 sterol desaturase, fatty acid hydroxylase, sphingolipid C4 monooxygenase, aldehyde decarbonylase, β-carotene hydroxylase, and acyl-ACP desaturase. Phylogenetic analysis using the maximum likelihood method with the PhyML tool showed the formation of well-defined groups that were similar to those generated by CLANS. These results begin to fill the existing gap regarding the evolutionary relationships and classification of non-heme iron oxygenases. In addition, they suggest that this family still contains proteins of unknown function, reinforcing the need for further functional characterization studies.
Abstract:
The 18S rDNA phylogeny of Class Armophorea, a group of anaerobic ciliates, is proposed based on an analysis of 44 sequences (out of 195) retrieved from the NCBI/GenBank database. Emphasis was placed on the use of two nucleotide alignment criteria that involved variation in the gap-opening and gap-extension parameters and the use of rRNA secondary structure to guide multiple alignment. A sensitivity analysis of 76 data sets was run to assess the effect of variations in indel parameters on tree topologies. Bayesian inference, maximum likelihood and maximum parsimony phylogenetic analyses were used to explore how different analytic frameworks influenced the resulting hypotheses. The sensitivity analysis revealed that the relationships among higher taxa of the Intramacronucleata depended on how indels were determined during multiple alignment of nucleotides. The phylogenetic analyses rejected the monophyly of the Armophorea most of the time and consistently indicated that the Metopidae and Nyctotheridae were related to the Litostomatea. There was no consensus on the placement of the Caenomorphidae, which could be a sister group of the Metopidae + Nyctotheridae, or could have diverged at the base of the Spirotrichea branch or of the Intramacronucleata tree.