965 results for MULTIPLE SEQUENCE ALIGNMENT


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Variations in different types of genomes have been found to be responsible for a large degree of physical diversity such as appearance and susceptibility to disease. Identification of genomic variations is difficult and can be facilitated through computational analysis of DNA sequences. Newly available technologies are able to sequence billions of DNA base pairs relatively quickly. These sequences can be used to identify variations within their specific genome but must be mapped to a reference sequence first. In order to align these sequences to a reference sequence, we require mapping algorithms that make use of approximate string matching and string indexing methods. To date, few mapping algorithms have been tailored to handle the massive amounts of output generated by newly available sequencing technologies. In order to handle this large amount of data, we modified the popular mapping software BWA to run in parallel using OpenMPI. Parallel BWA matches the efficiency of multithreaded BWA functions while providing efficient parallelism for BWA functions that do not currently support multithreading. Parallel BWA shows significant wall time speedup in comparison to multithreaded BWA on high-performance computing clusters, and will thus facilitate the analysis of genome sequencing data.
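To illustrate what "string indexing plus approximate string matching" means for read mapping, here is a minimal Python sketch. It is not BWA's method (BWA relies on a Burrows-Wheeler index); a plain k-mer hash index and a Hamming-distance check are swapped in for readability, and the function names, toy sequences and parameters are all hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read, reference, index, k, max_mismatches):
    """Return sorted (start, mismatches) pairs within the mismatch budget."""
    candidates = set()
    # Seed step: every k-mer of the read proposes candidate start positions.
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            start = pos - offset
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
    # Verify step: keep candidates whose full-length comparison is close enough.
    hits = []
    for start in candidates:
        distance = hamming(read, reference[start:start + len(read)])
        if distance <= max_mismatches:
            hits.append((start, distance))
    return sorted(hits)


# Toy data (hypothetical): the read differs from the reference at one base.
reference = "ACGTACGTTAGCCGATAGGCTA"
index = build_kmer_index(reference, k=4)
hits = map_read("TAGCCGTTAG", reference, index, k=4, max_mismatches=1)
print(hits)
```

Each read is mapped independently of the others, which is why a loop over reads is a natural unit of work to split across processes or cluster nodes.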

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Motivation: DNA assembly programs classically perform an all-against-all comparison of reads to identify overlaps, followed by a multiple sequence alignment and generation of a consensus sequence. If the aim is to assemble a particular segment, instead of a whole genome or transcriptome, a target-specific assembly is a more sensible approach. GenSeed is a Perl program that implements a seed-driven recursive assembly consisting of cycles comprising a similarity search, read selection and assembly. The iterative process results in a progressive extension of the original seed sequence. GenSeed was tested and validated on many applications, including the reconstruction of nuclear genes or segments, full-length transcripts, and extrachromosomal genomes. The robustness of the method was confirmed through the use of a variety of DNA and protein seeds, including short sequences derived from SAGE and proteome projects.
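The cycle described above (search for reads similar to the current sequence, select them, assemble, repeat) can be sketched as a toy loop in Python. This is not GenSeed itself, which is a Perl program; exact suffix/prefix overlap stands in here for the similarity search and assembly steps, which the abstract does not detail, and every name and sequence below is hypothetical.

```python
def extend_right(contig, reads, min_overlap):
    """Extend the contig with the read that has the longest exact overlap.

    A read qualifies when one of its prefixes (at least min_overlap long, and
    shorter than the read itself) equals a suffix of the contig.
    """
    best, best_overlap = contig, 0
    for read in reads:
        max_len = min(len(contig), len(read) - 1)
        for olap in range(max_len, min_overlap - 1, -1):
            if contig.endswith(read[:olap]):
                if olap > best_overlap:
                    best_overlap = olap
                    best = contig + read[olap:]
                break  # longest overlap for this read found
    return best


def seed_assembly(seed, reads, min_overlap=4, max_cycles=100):
    """Grow a seed in both directions until no read extends it further."""
    contig = seed
    reversed_reads = [r[::-1] for r in reads]
    for _ in range(max_cycles):
        extended = extend_right(contig, reads, min_overlap)
        # Left extension is right extension on the reversed strings.
        extended = extend_right(extended[::-1], reversed_reads, min_overlap)[::-1]
        if extended == contig:
            break  # no progress in this cycle: stop
        contig = extended
    return contig


# Toy data (hypothetical): four overlapping reads from one target segment.
target = "ATGGCGTACGTTAGCCGATAGGCTAACG"
reads = ["ATGGCGTACG", "GTACGTTAGC", "AGCCGATAGG", "TAGGCTAACG"]
seed = "TTAGCC"
assembled = seed_assembly(seed, reads)
print(assembled)
```

The `max_cycles` bound is a design choice of this sketch: repeats in real data could otherwise keep the loop extending indefinitely.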

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Gene recognition is one of the most important problems in computational molecular biology. Previous attempts to solve this problem were based on statistics, and applications of combinatorial methods for gene recognition were almost unexplored. Recent advances in large-scale cDNA sequencing open a way toward a new approach to gene recognition that uses previously sequenced genes as a clue for recognition of newly sequenced genes. This paper describes a spliced alignment algorithm and software tool that explores all possible exon assemblies in polynomial time and finds the multiexon structure with the best fit to a related protein. Unlike other existing methods, the algorithm successfully recognizes genes even in the case of short exons or exons with unusual codon usage; we also report correct assemblies for genes with more than 10 exons. On a test sample of human genes with known mammalian relatives, the average correlation between the predicted and actual proteins was 99%. The algorithm correctly reconstructed 87% of genes and the rare discrepancies between the predicted and real exon-intron structures were caused either by short (less than 5 amino acids) initial/terminal exons or by alternative splicing. Moreover, the algorithm predicts human genes reasonably well when the homologous protein is nonvertebrate or even prokaryotic. The surprisingly good performance of the method was confirmed by extensive simulations: in particular, with target proteins at 160 accepted point mutations (PAM) (25% similarity), the correlation between the predicted and actual genes was still as high as 95%.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Background: The majority of peptide bonds in proteins occur in the trans conformation. However, a considerable fraction of prolyl peptide bonds adopt the cis form. Proline cis/trans isomerization is known to play a critical role in protein folding, splicing, cell signaling and transmembrane active transport. Accurate prediction of proline cis/trans isomerization in proteins would have many important applications towards the understanding of protein structure and function. Results: In this paper, we propose a new approach to predict proline cis/trans isomerization in proteins using a support vector machine (SVM). The preliminary results indicated that Radial Basis Function (RBF) kernels could lead to better prediction performance than polynomial and linear kernel functions. We used single sequence information of different local window sizes, amino acid compositions of different local sequences, multiple sequence alignments obtained from PSI-BLAST and the secondary structure information predicted by PSIPRED. We explored these different sequence encoding schemes in order to investigate their effects on the prediction performance. The training and testing of this approach were performed on a newly enlarged dataset of 2424 non-homologous proteins determined by X-ray diffraction, using 5-fold cross-validation. A window size of 11 provided the best performance for determining proline cis/trans isomerization based on the single amino acid sequence. Using multiple sequence alignments in the form of PSI-BLAST profiles significantly improved the prediction performance: the prediction accuracy increased from 62.8% with a single sequence to 69.8%, and the Matthews Correlation Coefficient (MCC) improved from 0.26 with a single local sequence to 0.40.
Furthermore, when coupled with the secondary structure information predicted by PSIPRED, our method yielded a prediction accuracy of 71.5% and an MCC of 0.43, which are 9% and 0.17 higher, respectively, than the values achieved with the single sequence information. Conclusion: A new method has been developed to predict proline cis/trans isomerization in proteins based on a support vector machine, which uses the single amino acid sequence with different local window sizes, the amino acid compositions of local sequences flanking the central proline residues, the position-specific scoring matrices (PSSMs) extracted by PSI-BLAST and the predicted secondary structures generated by PSIPRED. The successful application of the SVM approach in this study reinforces that SVM is a powerful tool for predicting proline cis/trans isomerization in proteins and for biological sequence analysis.
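Two ingredients of this abstract lend themselves to a short Python sketch: cutting a fixed-size local window around each proline, and computing the Matthews Correlation Coefficient from confusion-matrix counts. The padding character "X", the function names and the example sequence are assumptions made for illustration; the abstract does not say how windows near the chain ends were handled, and the feature encoding and the SVM itself are not shown.

```python
import math


def proline_windows(sequence, window=11, pad="X"):
    """Return (position, window) pairs centred on each proline residue.

    The sequence is padded on both sides (an assumption of this sketch) so
    that prolines near the termini still yield a full-length window.
    """
    if window % 2 == 0:
        raise ValueError("window size must be odd to have a central residue")
    half = window // 2
    padded = pad * half + sequence + pad * half
    return [(i, padded[i:i + window])
            for i, residue in enumerate(sequence) if residue == "P"]


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)),
    conventionally reported as 0.0 when the denominator is zero.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical five-residue peptide with one proline at index 2.
windows = proline_windows("MKPLA", window=11)
print(windows)
print(mcc(tp=5, tn=5, fp=5, fn=5))
```

MCC ranges from -1 to 1 and stays informative when the two classes are unbalanced, which is a common reason to report it alongside plain accuracy.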

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Background: IgE is the pivotal-specific effector molecule of allergic reactions yet it remains unclear whether the elevated production of IgE in atopic individuals is due to superantigen activation of B cell populations, increased antibody class switching to IgE or oligoclonal allergen-driven IgE responses. Objectives: To increase our understanding of the mechanisms driving IgE responses in allergic disease we examined immunoglobulin variable regions of IgE heavy chain transcripts from three patients with seasonal rhinitis due to grass pollen allergy. Methods: Variable domain of heavy chain-epsilon constant domain 1 cDNAs were amplified from peripheral blood using a two-step semi-nested PCR, cloned and sequenced. Results: The VH gene family usage in subject A was broadly based, but there were two clusters of sequences using genes VH 3-9 and 3-11 with unusually low levels of somatic mutations, 0-3%. Subject B repeatedly used VH 1-69 and subject C repeatedly used VH 1-02, 1-46 and 5a genes. Most clones were highly mutated being only 86-95% homologous to their germline VH gene counterparts and somatic mutations were more abundant at the complementarity determining rather than framework regions. Multiple sequence alignment revealed both repeated use of particular VH genes as well as clonal relatedness among clusters of IgE transcripts. Conclusion: In contrast to previous studies we observed no preferred VH gene common to IgE transcripts of the three subjects allergic to grass pollen. Moreover, most of the VH gene characteristics of the IgE transcripts were consistent with oligoclonal antigen-driven IgE responses.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Antibody screening of phage-displayed random peptide libraries to identify mimotopes of conformational epitopes is promising. However, because interpretations can be difficult, an exemplary system has been used in the present study to investigate whether variation in the peptide sequences of selected phagotopes corresponded with variation in immunoreactivity. The phagotopes, derived using a well-characterized monoclonal antibody, CII-C1, to a known conformational epitope on type II collagen, C1, were tested by direct and inhibition ELISA for reactivity with CII-C1. A multiple sequence alignment algorithm, PILEUP, was used to sort the peptides expressed by the phagotopes into clusters. A model was prepared of the C1 epitope on type II collagen. The 12 selected phagotopes reacted with CII-C1 by both direct ELISA (titres from < 100-11 200) and inhibition ELISA (20-100% inhibition); the reactivity varied according to the peptide sequence and assay format. The differences in reactivity between the phagotopes were mostly in accord with the alignment, by PILEUP, of the peptide sequences. The finding that the phagotopes functionally mimicked the C1 epitope on collagen was validated in that amino acids RRL at the amino terminal of many of the peptides were topographically demonstrable on the model of the C1 epitope. Notably, one phagotope that expressed the widely divergent peptide C-IAPKRHNSA-C also mimicked the C1 epitope, as judged by reactivity in each of the assays used: these included cross-inhibition of CII-C1 reactivity with each of the other phagotopes and inhibition by a synthetic peptide corresponding to that expressed by the most frequently selected phagotope, RRLPFGSQM. Thus, it has been demonstrated that multiple phage-displayed peptides can mimic the same epitope and that observed immunoreactivity of selected phagotopes with the selecting mAb can depend on the primary sequence of the expressed peptide and also on the assay format used.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Pulicat Lake sediments are often severely polluted with the toxic heavy metal mercury. Several mercury-resistant strains of Bacillus species were isolated from the sediments, and all the isolates exhibited broad-spectrum resistance (resistance to both organic and inorganic mercuric compounds). A plasmid curing assay showed that all the isolated Bacillus strains carry chromosomally borne mercury resistance. Polymerase chain reaction and Southern hybridization analyses using merA and merB3 gene primers/probes showed that five of the isolated Bacillus strains carry sequences similar to known merA and merB3 genes. Results of multiple sequence alignment revealed 99% similarity with merA and merB3 of TnMERI1 (class II transposons). Other mercury-resistant Bacillus species lacking homology to these genes were not able to volatilize mercuric chloride, indicating the presence of other modes of resistance to mercuric compounds.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Thiolases are enzymes involved in lipid metabolism. Thiolases remove the acetyl-CoA moiety from 3-ketoacyl-CoAs in the degradative reaction. They can also catalyze the reverse Claisen condensation reaction, which is the first step of biosynthetic processes such as the biosynthesis of sterols and ketone bodies. In humans, six distinct thiolases have been identified. Each of these thiolases differs from the others with respect to sequence, oligomeric state, substrate specificity and subcellular localization. Four sequence fingerprints, identifying catalytic loops of thiolases, have been described. In this study, genome searches of two mycobacterial species (Mycobacterium tuberculosis and Mycobacterium smegmatis) were carried out, using the six human thiolase sequences as queries. Eight and thirteen different thiolase sequences were identified in M. tuberculosis and M. smegmatis, respectively. In addition, thiolase-like proteins (one encoded in the Mtb and two in the Msm genome) were found. The purpose of this study is to classify these mostly uncharacterized thiolases and thiolase-like proteins. Several other sequences obtained by searches of genome databases of bacteria, mammals and the parasitic protist family of the Trypanosomatidae were included in the analysis. Thiolase-like proteins were also found in the trypanosomatid genomes, but not in those of mammals. In order to study the phylogenetic relationships at a high confidence level, additional thiolase sequences were included such that a total of 130 thiolases and thiolase-like protein sequences were used for the multiple sequence alignment. The resulting phylogenetic tree identifies 12 classes of sequences, each possessing a characteristic set of sequence fingerprints for the catalytic loops. From this analysis it is now possible to assign the mycobacterial thiolases to corresponding homologues in other kingdoms of life.
The results of this bioinformatics analysis also show interesting differences between the distributions of M. tuberculosis and M. smegmatis thiolases over the 12 different classes. (C) 2014 Elsevier Ltd. All rights reserved.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A coil (residues 289-325) and two α-helices (α1, residues 368-373; α2, residues 381-388) were predicted in the C-terminal region of the p53 protein, based on its mRNA secondary structure template and the Chou-Fasman principle of protein secondary structure prediction. The result was confirmed by four other methods of protein secondary structure prediction that are based on multiple sequence alignment (accuracy = 73.20%). Combined with the crystal structure of the 31-amino-acid oligomerization domain, the three-dimensional conformation of the 108 C-terminal residues of p53 was built using an SGI INDIGO2 computer. This structure further clarifies the relationship among the biological functional domains of the p53 C-terminus at the three-dimensional level.