963 results for Multiple Sequence Alignment
Abstract:
Gene recognition is one of the most important problems in computational molecular biology. Previous attempts to solve this problem were based on statistics, and applications of combinatorial methods to gene recognition had remained almost unexplored. Recent advances in large-scale cDNA sequencing open the way to a new approach to gene recognition that uses previously sequenced genes as a clue for recognizing newly sequenced genes. This paper describes a spliced alignment algorithm and software tool that explores all possible exon assemblies in polynomial time and finds the multiexon structure with the best fit to a related protein. Unlike other existing methods, the algorithm successfully recognizes genes even in the case of short exons or exons with unusual codon usage; we also report correct assemblies for genes with more than 10 exons. On a test sample of human genes with known mammalian relatives, the average correlation between the predicted and actual proteins was 99%. The algorithm correctly reconstructed 87% of genes, and the rare discrepancies between the predicted and real exon-intron structures were caused either by short (fewer than 5 amino acids) initial or terminal exons or by alternative splicing. Moreover, the algorithm predicts human genes reasonably well when the homologous protein is nonvertebrate or even prokaryotic. The surprisingly good performance of the method was confirmed by extensive simulations: in particular, with target proteins at 160 accepted point mutations (PAM) (25% similarity), the correlation between the predicted and actual genes was still as high as 95%.
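The idea of exploring all exon assemblies in polynomial time can be illustrated with the simpler exon chaining problem: given candidate exons as weighted intervals on the genomic sequence, find the highest-scoring chain of non-overlapping candidates. This is a simplified sketch, not the paper's spliced alignment algorithm, which scores each assembly by aligning it against the related protein; here every candidate is assumed to carry a precomputed, hypothetical weight, and `best_exon_chain` is an illustrative name.

```python
from typing import List, Tuple

# A candidate exon: (start, end, weight) on the genomic sequence, end inclusive.
# The weight is a hypothetical precomputed score (e.g. similarity to the target protein).
Block = Tuple[int, int, float]


def best_exon_chain(blocks: List[Block]) -> Tuple[float, List[Block]]:
    """Return the maximum total weight of a chain of non-overlapping blocks, and that chain."""
    if not blocks:
        return 0.0, []
    order = sorted(blocks, key=lambda b: b[1])  # process blocks by end coordinate
    n = len(order)
    best = [0.0] * n  # best[i]: top score of a chain ending with order[i]
    prev = [-1] * n   # back-pointer to the preceding block in that chain
    for i, (start_i, _end_i, weight_i) in enumerate(order):
        best[i] = weight_i
        for j in range(i):
            # order[j] may precede order[i] only if it ends before order[i] starts
            if order[j][1] < start_i and best[j] + weight_i > best[i]:
                best[i] = best[j] + weight_i
                prev[i] = j
    # Trace back from the highest-scoring chain end
    i = max(range(n), key=lambda k: best[k])
    score = best[i]
    chain: List[Block] = []
    while i != -1:
        chain.append(order[i])
        i = prev[i]
    return score, chain[::-1]


score, chain = best_exon_chain([(0, 10, 5), (5, 20, 4), (12, 30, 6), (31, 40, 2)])
print(score, chain)  # 13 [(0, 10, 5), (12, 30, 6), (31, 40, 2)]
```

The nested loop makes the search quadratic in the number of candidates, so every possible chain is implicitly considered without enumerating the exponentially many assemblies one by one.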
Abstract:
We present the genome sequences of a new clinical isolate of the important human pathogen Aspergillus fumigatus, A1163, and of two closely related but rarely pathogenic species, Neosartorya fischeri NRRL181 and Aspergillus clavatus NRRL1. Comparative genomic analysis of A1163 with the recently sequenced A. fumigatus isolate Af293 has identified core, variable and up to 2% unique genes in each genome. While the core genes are 99.8% identical at the nucleotide level, identity for variable genes can be as low as 40%. The most divergent loci appear to contain heterokaryon incompatibility (het) genes associated with fungal programmed cell death, such as the developmental regulator rosA. Cross-species comparison has revealed that 8.5%, 13.5% and 12.6% of A. fumigatus, N. fischeri and A. clavatus genes, respectively, are species-specific. These genes are significantly smaller than core genes, contain fewer exons and exhibit a subtelomeric bias. Most of them cluster together in 13 chromosomal islands, which are enriched for pseudogenes, transposons and other repetitive elements. At least 20% of A. fumigatus-specific genes appear to be functional and involved in carbohydrate and chitin catabolism, transport, detoxification, secondary metabolism and other functions that may facilitate adaptation to heterogeneous environments such as soil or a mammalian host. Contrary to what was suggested previously, their origin cannot be attributed to horizontal gene transfer (HGT), but instead is likely to involve duplication, diversification and differential gene loss (DDL). The role of duplication in the origin of lineage-specific genes is further underlined by the discovery of genomic islands that seem to function as designated "gene dumps" and, perhaps simultaneously, as "gene factories".
Abstract:
The freshwater prawn Macrobrachium amazonicum is widely distributed in South America and occupies habitats with a wide range of salinities. Several investigations have revealed wide intraspecific variability among different populations, although the understanding of this variability is still fragmentary and incomplete. We compared and characterized inland and coastal populations of M. amazonicum from Brazil, using molecular data (16S and COI mtDNA) to describe the degree of variability, the structure of, and the relationships among these populations. Genetic divergence rates among populations showed variability at the intraspecific level. All analyses revealed significant genetic divergence among populations, which fell into three groups: I, inland waters of the Amazonian Hydrographic Region (HR); II, Parana/Paraguay HR; and III, coastal systems of northern and northeastern Brazil. Phylogenetic reconstructions revealed that the populations form a single monophyletic clade, which supports their characterization as a single species. Clade I was the sister clade of the clade formed by clades II and III, which were themselves sister clades. Populations from Sertaozinho/Miguelopolis and Avare, introduced into the state of Sao Paulo, may have originated from natural populations in the states of Mato Grosso do Sul and Para, respectively. Geographical isolation probably contributed to the observed variation, and if this isolation continues, M. amazonicum may undergo speciation within its broad geographical distribution. The sequences obtained here can be used as name-tags for population identification, and the DNA barcodes are useful for identifying the origin of specimens used in different freshwater-prawn cultures or of introduced populations of unknown origin.
Abstract:
Background: Rhipicephalus sanguineus, known as the brown dog tick, is a common ectoparasite of domestic dogs and can be found worldwide. R. sanguineus is recognized as the primary vector of the etiological agents of canine monocytic ehrlichiosis and canine babesiosis. Here we present the first description of an R. sanguineus salivary gland transcriptome, obtained by producing and analyzing 2,034 expressed sequence tags (ESTs) from two cDNA libraries: one constructed using mRNA from salivary glands dissected from female ticks fed for 3-5 days (early-to-mid library, RsSGL1) and the other from ticks fed for 5 days (mid library, RsSGL2). The ESTs were grouped into 1,024 clusters of related sequences. Results: Based on sequence similarities to nine different databases, we identified transcripts of genes that were further categorized according to function. The putative housekeeping category contained ~56% of the sequences, with on average 2.49 ESTs per cluster; the secreted protein category contained 26.6% of the ESTs, with 2.47 ESTs per cluster; and 15.3% of the ESTs, mostly singletons, were not classifiable and were annotated as "unknown function". The secreted category included genes coding for lipocalins, protease inhibitors, disintegrins, metalloproteases, immunomodulatory and anti-inflammatory proteins such as Evasins and Da-p36, as well as basic-tail and 18.3 kDa proteins, cement proteins, mucins, defensins and antimicrobial peptides. Comparing the abundance of ESTs from similar contigs in the two salivary gland cDNA libraries allowed the identification of differentially expressed genes, such as those coding for Evasins and a thrombin inhibitor, which were overexpressed in RsSGL1 (early-to-mid library) relative to RsSGL2 (mid library), indicating a role in inhibiting inflammation at the tick feeding site from the very beginning of the blood meal.
Conversely, sequences related to cement (64P), whose function has been correlated with tick attachment, were largely expressed in the mid library. Conclusions: Our survey provides insight into the R. sanguineus sialotranscriptome, which can assist in the discovery of new targets for anti-tick vaccines and help to identify pharmacologically active proteins.
Abstract:
Background: Mites (Acari) have traditionally been treated as monophyletic, albeit composed of two major lineages: Acariformes and Parasitiformes. Yet recent studies based on morphology, molecular data, or combinations thereof have increasingly called their monophyly into question. Furthermore, the usually basal (molecular) position of one or both mite lineages among the chelicerates conflicts with their morphology and with the widely accepted view that mites are close relatives of Ricinulei. Results: The phylogenetic position of the acariform mites is examined using SSU and partial LSU sequences and morphology from 91 extant chelicerate terminals (40 Acariformes). In a static homology framework, molecular sequences were aligned using their secondary structure as a guide, regions of ambiguous alignment were discarded, and the pre-aligned sequences were analyzed under parsimony and under different mixed models in a Bayesian inference. Parsimony and Bayesian analyses led to trees that were largely congruent in well-supported infraordinal branches but had low support for inter-ordinal relationships. An exception is Solifugae + Acariformes (PP = 100%, J = 0.91). In a dynamic homology framework, two analyses were run: a standard POY analysis and an analysis constrained by secondary structure. Both analyses led to largely congruent trees, supporting a (Palpigradi (Solifugae Acariformes)) clade and Ricinulei as the sister group of Tetrapulmonata, with the topology (Ricinulei (Amblypygi (Uropygi Araneae))). Combined analyses with two different morphological data matrices were run to evaluate how constraining the analysis, by employing secondary structure as a guide for establishing homology, affects the recovered topology. For both morphological matrices, the constrained combined analysis yielded two topologies similar to those of the exclusively molecular analysis, except for the recovery of Pedipalpi instead of the (Uropygi Araneae) clade.
The standard (direct optimization) POY analysis, however, recovered trees lacking the otherwise well-supported group Solifugae + Acariformes. Conclusions: Previous studies combining ribosomal sequences and morphology often recovered topologies similar to those of purely morphological analyses of Chelicerata. The apparent stability of certain clades not recovered here, such as Haplocnemata and Acari, is regarded as a byproduct of the way molecular homology was previously established, using the instrumentalist approach implemented in POY. Constraining the analysis by a priori homology assessment is defended here as a way of maintaining the severity of the test when new data are added to the analysis. Although the strength of the method advocated here is that it keeps phylogenetic information from regions usually discarded in an exclusively static homology framework, it still has the drawback of being uninformative about the effect of alignment ambiguity on resampling methods for estimating clade support. Finally, putative morphological apomorphies of Solifugae + Acariformes are the reduction of the proximal cheliceral podomere, medial abutting of the leg coxae, loss of the sperm nuclear membrane, and the presence of differentiated germinative and secretory regions in the testis that deliver their products into a common lumen.
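The abstract reports analyses under parsimony. As a rough illustration of the parsimony criterion only (not the POY direct-optimization or Bayesian analyses the authors ran), the sketch below scores a fixed rooted binary tree on a toy alignment with Fitch's small-parsimony algorithm. The nested-tuple tree format, taxon names and sequences are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: a leaf name, or a (left, right) pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch pass for one character: (candidate state set at this node, minimum changes below it)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # Children can agree on a state: no extra change needed here
        return common, left_cost + right_cost
    # No shared state: take the union and count one change
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, alignment: Dict[str, str]) -> int:
    """Sum Fitch costs over all columns of an ungapped, equal-length alignment."""
    length = len(next(iter(alignment.values())))
    total = 0
    for site in range(length):
        column = {taxon: seq[site] for taxon, seq in alignment.items()}
        total += fitch(tree, column)[1]
    return total


aln = {"A": "AC", "B": "AC", "C": "GT", "D": "GC"}
print(parsimony_score((("A", "B"), ("C", "D")), aln))  # 2
print(parsimony_score((("A", "C"), ("B", "D")), aln))  # 3
```

Under parsimony the first topology would be preferred because it explains the toy data with fewer state changes; a real analysis also has to search over tree space, which this sketch does not do.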
Abstract:
In February 2007, sweet orange trees with characteristic symptoms of huanglongbing (HLB) were found in a region of Sao Paulo state (SPs) previously free of HLB. These trees tested negative for the three liberibacter species associated with HLB. A polymerase chain reaction (PCR) product amplified from symptomatic fruit columella DNA with the universal primers fD1/rP1 was cloned and sequenced. The corresponding agent was found to have the highest 16S rDNA sequence identity (99%) with the pigeon pea witches'-broom phytoplasma of group 16SrIX. Sequences of PCR products obtained with the phytoplasma 16S rDNA primer pairs fU5/rU3 and fU5/P7 confirmed this result. With two primers, D7f2/D7r2, designed based on the 16S rDNA sequence of the cloned DNA fragment, positive amplifications were obtained from more than one hundred samples, including symptomatic fruits and blotchy-mottle leaves. Samples positive for phytoplasmas were negative for liberibacters, except for four samples that were positive for both the phytoplasma and 'Candidatus Liberibacter asiaticus'. The phytoplasma was detected by electron microscopy in the sieve tubes of midribs from symptomatic leaves. These results show that a group IX phytoplasma is associated with citrus HLB symptoms in northern, central, and southern SPs. This phytoplasma has very probably been transmitted to citrus from an external source of inoculum, but the putative insect vector is not yet known.
Abstract:
Allergies are a major cause of chronic ill health in industrialised countries, and the incidence of reported cases is steadily increasing. This Research Focus details how bioinformatics is transforming the field of allergy by providing databases for managing allergen data; algorithms for characterising allergic cross-reactivity, structural motifs, and B- and T-cell epitopes; tools for predicting allergenicity; and techniques for genomic and proteomic analysis of allergens.
Abstract:
Allergy is a major cause of morbidity worldwide. The number of characterized allergens and the amount of related information are increasing rapidly, creating demands for advanced information storage, retrieval and analysis. Bioinformatics provides useful tools for analysing allergens, and these are complementary to traditional laboratory techniques for the study of allergens. Specific applications include structural analysis of allergens, identification of B- and T-cell epitopes, assessment of allergenicity and cross-reactivity, and genome analysis. In this paper, the most important bioinformatic tools and methods relevant to the study of allergy are reviewed.
Abstract:
The current taxonomy of two poorly known hermit crab species, Pagurus forceps H. Milne Edwards, 1836 and Pagurus comptus White, 1847, from the temperate Pacific and Atlantic coastlines of South America is based only on adult morphology. Past studies have questioned the separation of these two very similar species, which occur sympatrically. We included specimens morphologically assignable to P. forceps and P. comptus in a phylogenetic analysis, along with other selected anomuran decapods, based on 16S ribosomal gene sequences. Differences between samples putatively assigned to either P. forceps or P. comptus were moderate, with sequence similarity ranging from 98.2 to 99.4% for the fragments analyzed. Our comparison of mitochondrial DNA sequences (16S rRNA) revealed diagnostic differences between the two putative species, suggesting that P. forceps and P. comptus are indeed phylogenetically close but distinct species, with no genetic justification for their synonymization. The polyphyly of Pagurus is not corroborated here among the represented Atlantic species, despite the evidently complex relationships among members of the genus.
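The similarity values quoted above (98.2 to 99.4%) are pairwise identities between aligned 16S fragments. The abstract does not state how gaps were treated, so the sketch below assumes one common convention: identity is computed only over columns of a pairwise alignment where neither sequence has a gap. The function name and toy sequences are illustrative.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over gap-free columns of two aligned sequences ('-' marks a gap)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumed convention: skip any column where either sequence has a gap
    compared = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no gap-free columns to compare")
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


print(percent_identity("ACGT-A", "ACGTTG"))  # 80.0
```

Other conventions (counting gaps as mismatches, or dividing by the length of the shorter sequence) give different numbers for the same alignment, so identity values are only comparable when computed the same way.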
Abstract:
Although many mathematical models exist that predict the dynamics of transposable elements (TEs), there is a lack of empirical data to validate these models and their inherent assumptions. Genomes can provide a snapshot of several TE families in a single organism, and the demographics of these families could be inferred by coalescent analysis, allowing theories of TE amplification dynamics to be tested. Using the available genomes of the mosquitoes Aedes aegypti and Anopheles gambiae, we show that such an approach is feasible. Our analysis follows four steps: (1) mining the two currently available mosquito genomes for TE families; (2) fitting, to selected families found in (1), a phylogenetic tree under the general time-reversible (GTR) nucleotide substitution model with an uncorrelated lognormal (UCLN) relaxed clock and a nonparametric demographic model; (3) fitting a nonparametric coalescent model to the tree generated in (2); and (4) fitting parametric models motivated by ecological theories to the curve generated in (3).
Abstract:
Motivation: A consensus sequence for a family of related sequences is, as the name suggests, a sequence that captures the features common to most members of the family. Consensus sequences are important in various DNA sequencing applications and are a convenient way to characterize a family of molecules. Results: This paper describes a new algorithm for finding a consensus sequence, using the popular optimization method known as simulated annealing. Unlike the conventional approach of finding a consensus sequence by first forming a multiple sequence alignment, this algorithm searches for a sequence that minimises the sum of pairwise distances to each of the input sequences. The resulting consensus sequence can then be used to induce a multiple sequence alignment. The time required by the algorithm scales linearly with the number of input sequences and quadratically with the length of the consensus sequence. We present results demonstrating the high quality of the consensus sequences and alignments produced by the new algorithm. For comparison, we also present similar results obtained using ClustalW. The new algorithm outperforms ClustalW in many cases.
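A minimal sketch of the approach described above, assuming Levenshtein edit distance as the pairwise distance, a DNA alphabet, single-character substitution/insertion/deletion moves, the first input sequence as the starting point and a geometric cooling schedule; the paper's actual distance, move set and schedule may differ, and all names here are hypothetical. Each cost evaluation runs one dynamic-programming alignment per input sequence, so per-step work grows linearly with the number of sequences.

```python
import math
import random
from typing import List, Tuple

ALPHABET = "ACGT"  # assumed DNA alphabet


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via a rolling-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def total_distance(candidate: str, seqs: List[str]) -> int:
    """Objective: sum of pairwise distances from the candidate to every input sequence."""
    return sum(edit_distance(candidate, s) for s in seqs)


def random_neighbor(s: str, rng: random.Random) -> str:
    """Apply one random substitution, insertion or deletion."""
    moves = ["ins"] if not s else (["sub", "ins"] if len(s) == 1 else ["sub", "ins", "del"])
    move = rng.choice(moves)
    if move == "ins":
        pos = rng.randrange(len(s) + 1)
        return s[:pos] + rng.choice(ALPHABET) + s[pos:]
    pos = rng.randrange(len(s))
    if move == "del":
        return s[:pos] + s[pos + 1:]
    return s[:pos] + rng.choice(ALPHABET) + s[pos + 1:]


def anneal_consensus(seqs: List[str], steps: int = 5000, t0: float = 2.0,
                     cooling: float = 0.999, seed: int = 0) -> Tuple[str, int]:
    """Simulated annealing over candidate strings; returns the best string seen and its cost."""
    rng = random.Random(seed)
    current = seqs[0]
    current_cost = total_distance(current, seqs)
    best, best_cost = current, current_cost
    temp = t0
    for _ in range(steps):
        cand = random_neighbor(current, rng)
        cand_cost = total_distance(cand, seqs)
        delta = cand_cost - current_cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = cand, cand_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temp *= cooling
    return best, best_cost
```

Because the annealer remembers the best candidate seen, the returned cost is never worse than that of the starting sequence; for example, `anneal_consensus(["ACGTACGT", "ACGTTCGT", "ACGAACGT", "ACGTACGA"], steps=2000)` returns a string whose summed distance to the four inputs is at most 3.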
Abstract:
Multiple sequence alignment applications are prototypical examples of applications that require high computing power and memory. They stand out for the scientific relevance of the results they provide to research in biomedicine, genetics and pharmacology. Multiple alignment applications are limited in that they cannot process thousands of sequences, so a model is needed to address this problem. Given the volume of data handled in the biological sciences and the complexity of sequence alignment algorithms, the only way to solve the problem is through parallel computing environments and high-performance computing. Our research aims to create a parallel model that allows multiple alignment algorithms to increase the number of sequences they can process while maintaining the quality of the results, so as to guarantee scientific accuracy. The proposed model is based on clustering the input sequences using biological criteria that preserve the quality of the results. In addition, the model focuses on reducing computation time and memory consumption. To present and validate the model, we use T-Coffee as the development and research platform. The proposed model could be applied to any other multiple sequence alignment algorithm.
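The abstract does not say which biological criteria drive the clustering. As a hedged stand-in, the sketch below groups sequences greedily by k-mer Jaccard similarity to each cluster's first member and caps the cluster size, so that each group could then be aligned independently (for example, by separate T-Coffee runs on parallel nodes). The function names, the similarity measure, `threshold`, `k` and `max_size` are all hypothetical.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two k-mer sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def greedy_clusters(seqs: Dict[str, str], threshold: float = 0.5,
                    max_size: int = 100, k: int = 3) -> List[List[str]]:
    """Assign each sequence to the first non-full cluster whose seed is similar enough."""
    profiles = {name: kmers(s, k) for name, s in seqs.items()}
    clusters: List[List[str]] = []
    for name in seqs:
        for cluster in clusters:
            seed = cluster[0]  # the first member acts as the cluster representative
            if len(cluster) < max_size and jaccard(profiles[name], profiles[seed]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])  # no suitable cluster: start a new one
    return clusters


toy = {"a": "ACGTACGTAC", "b": "ACGTACGTAA", "c": "TTTTGGGGCC", "d": "TTTTGGGGCA"}
print(greedy_clusters(toy))  # [['a', 'b'], ['c', 'd']]
```

The size cap bounds the memory and time each alignment job needs, which is the property that lets the number of input sequences grow; how well alignment quality is preserved would depend on the real clustering criteria and on how the per-cluster alignments are merged, neither of which this sketch addresses.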