9 resultados para SEQUENCE ALIGNMENT
em CentAUR: Central Archive University of Reading - UK
Resumo:
The rate at which a given site in a gene sequence alignment evolves over time may vary. This phenomenon-known as heterotachy-can bias or distort phylogenetic trees inferred from models of sequence evolution that assume rates of evolution are constant. Here, we describe a phylogenetic mixture model designed to accommodate heterotachy. The method sums the likelihood of the data at each site over more than one set of branch lengths on the same tree topology. A branch-length set that is best for one site may differ from the branch-length set that is best for some other site, thereby allowing different sites to have different rates of change throughout the tree. Because rate variation may not be present in all branches, we use a reversible-jump Markov chain Monte Carlo algorithm to identify those branches in which reliable amounts of heterotachy occur. We implement the method in combination with our 'pattern-heterogeneity' mixture model, applying it to simulated data and five published datasets. We find that complex evolutionary signals of heterotachy are routinely present over and above variation in the rate or pattern of evolution across sites, that the reversible-jump method requires far fewer parameters than conventional mixture models to describe it, and serves to identify the regions of the tree in which heterotachy is most pronounced. The reversible-jump procedure also removes the need for a posteriori tests of 'significance' such as the Akaike or Bayesian information criterion tests, or Bayes factors. Heterotachy has important consequences for the correct reconstruction of phylogenies as well as for tests of hypotheses that rely on accurate branch-length information. These include molecular clocks, analyses of tempo and mode of evolution, comparative studies and ancestral state reconstruction. The model is available from the authors' website, and can be used for the analysis of both nucleotide and morphological data.
Resumo:
One hundred and nine lactic acid bacterial strains (56 bifidobacteria-like and 53 lactobacilli-like) were isolated from faecal samples donated by healthy elderly individuals (>65 years old). Isolates were identified to species level by phenotypic analysis (by API) and by 16S rDNA sequencing. Eleven species of Lactobacillus and six species of Bifidobacterium were identified. The most frequently isolated lactobacillus was L. fermentum and the most frequently isolated bifidobacterium was closely related to B. infantis by 16S rDNA sequence alignment. The isolates were characterized for their antimicrobial activity against Clostridium difficile, enteropathogenic Escherichia coli (EPEC), verocytotoxigenic E. coli (VTEC) and Campylobacter jejuni. The lactobacilli displayed variations in their antimicrobial activity with few strains showing inhibitory activity against all pathogens. The bifidobacteria displayed higher levels of inhibitory activity against C. jejuni and Cl. difficile than against the E. coli strains. Keywords: Lactobacillus, Bifidobacterium, elderly, gastrointestinal microbiota, inhibition, Clostridium difficile, enteropathogenic Escherichia coli (EPEC), verocytotoxigenic E. coli (VTEC), Campylobacter jejuni.
Resumo:
A number of new and newly improved methods for predicting protein structure developed by the Jones–University College London group were used to make predictions for the CASP6 experiment. Structures were predicted with a combination of fold recognition methods (mGenTHREADER, nFOLD, and THREADER) and a substantially enhanced version of FRAGFOLD, our fragment assembly method. Attempts at automatic domain parsing were made using DomPred and DomSSEA, which are based on a secondary structure parsing algorithm and additionally for DomPred, a simple local sequence alignment scoring function. Disorder prediction was carried out using a new SVM-based version of DISOPRED. Attempts were also made at domain docking and “microdomain” folding in order to build complete chain models for some targets.
Resumo:
Motivation: In order to enhance genome annotation, the fully automatic fold recognition method GenTHREADER has been improved and benchmarked. The previous version of GenTHREADER consisted of a simple neural network which was trained to combine sequence alignment score, length information and energy potentials derived from threading into a single score representing the relationship between two proteins, as designated by CATH. The improved version incorporates PSI-BLAST searches, which have been jumpstarted with structural alignment profiles from FSSP, and now also makes use of PSIPRED predicted secondary structure and bi-directional scoring in order to calculate the final alignment score. Pairwise potentials and solvation potentials are calculated from the given sequence alignment which are then used as inputs to a multi-layer, feed-forward neural network, along with the alignment score, alignment length and sequence length. The neural network has also been expanded to accommodate the secondary structure element alignment (SSEA) score as an extra input and it is now trained to learn the FSSP Z-score as a measurement of similarity between two proteins. Results: The improvements made to GenTHREADER increase the number of remote homologues that can be detected with a low error rate, implying higher reliability of score, whilst also increasing the quality of the models produced. We find that up to five times as many true positives can be detected with low error rate per query. Total MaxSub score is doubled at low false positive rates using the improved method.
Resumo:
We describe a general likelihood-based 'mixture model' for inferring phylogenetic trees from gene-sequence or other character-state data. The model accommodates cases in which different sites in the alignment evolve in qualitatively distinct ways, but does not require prior knowledge of these patterns or partitioning of the data. We call this qualitative variability in the pattern of evolution across sites "pattern-heterogeneity" to distinguish it from both a homogenous process of evolution and from one characterized principally by differences in rates of evolution. We present studies to show that the model correctly retrieves the signals of pattern-heterogeneity from simulated gene-sequence data, and we apply the method to protein-coding genes and to a ribosomal 12S data set. The mixture model outperforms conventional partitioning in both these data sets. We implement the mixture model such that it can simultaneously detect rate- and pattern-heterogeneity. The model simplifies to a homogeneous model or a rate- variability model as special cases, and therefore always performs at least as well as these two approaches, and often considerably improves upon them. We make the model available within a Bayesian Markov-chain Monte Carlo framework for phylogenetic inference, as an easy-to-use computer program.
Resumo:
The alignment of model amyloid peptide YYKLVFFC is investigated in bulk and at a solid surface using a range of spectroscopic methods employing polarized radiation. The peptide is based on a core sequence of the amyloid beta (A beta) peptide, KLVFF. The attached tyrosine and cysteine units are exploited to yield information on alignment and possible formation of disulfide or dityrosine links. Polarized Raman spectroscopy on aligned stalks provides information on tyrosine orientation, which complements data from linear dichroism (LD) on aqueous solutions subjected to shear in a Couette cell. LD provides a detailed picture of alignment of peptide strands and aromatic residues and was also used to probe the kinetics of self-assembly. This suggests initial association of phenylalanine residues, followed by subsequent registry of strands and orientation of tyrosine residues. X-ray diffraction (XRD) data from aligned stalks is used to extract orientational order parameters from the 0.48 nm reflection in the cross-beta pattern, from which an orientational distribution function is obtained. X-ray diffraction on solutions subject to capillary flow confirmed orientation in situ at the level of the cross-beta pattern. The information on fibril and tyrosine orientation from polarized Raman spectroscopy is compared with results from NEXAFS experiments on samples prepared as films on silicon. This indicates fibrils are aligned parallel to the surface, with phenyl ring normals perpendicular to the surface. Possible disulfide bridging leading to peptide dimer formation was excluded by Raman spectroscopy, whereas dityrosine formation was probed by fluorescence experiments and was found not to occur except under alkaline conditions. Congo red binding was found not to influence the cross-beta XRD pattern.
Resumo:
The self-assembly and hydrogelation properties of two Fmoc-tripeptides [Fmoc = N-(fluorenyl-9-methoxycarbonyl)] are investigated, in borate buffer and other basic solutions. A remarkable difference in self-assembly properties is observed comparing Fmoc-VLK(Boc) with Fmoc-K(Boc)LV, both containing K protected by N(epsilon)-tert-butyloxycarbonate (Boc). In borate buffer, the former peptide forms highly anisotropic fibrils which show local alignment, and the hydrogels show flow-aligning properties. In contrast, Fmoc-K(Boc)LV forms highly branched fibrils that produce isotropic hydrogels with a much higher modulus (G' > 10(4) Pa), and lower concentration for hydrogel formation. The distinct self-assembled structures are ascribed to conformational differences, as revealed by secondary structure probes (CD, FTIR, Raman spectroscopy) and X-ray diffraction. Fmoc-VLK(Boc) forms well-defined beta-sheets with a cross-beta X-ray diffraction pattern, whereas Fmoc-KLV(Boc) forms unoriented assemblies with multiple stacked sheets. Interchange of the K and V residues when inverting the tripeptide sequence thus leads to substantial differences in self-assembled structures, suggesting a promising approach to control hydrogel properties.
Resumo:
The elucidation of the domain content of a given protein sequence in the absence of determined structure or significant sequence homology to known domains is an important problem in structural biology. Here we address how successfully the delineation of continuous domains can be accomplished in the absence of sequence homology using simple baseline methods, an existing prediction algorithm (Domain Guess by Size), and a newly developed method (DomSSEA). The study was undertaken with a view to measuring the usefulness of these prediction methods in terms of their application to fully automatic domain assignment. Thus, the sensitivity of each domain assignment method was measured by calculating the number of correctly assigned top scoring predictions. We have implemented a new continuous domain identification method using the alignment of predicted secondary structures of target sequences against observed secondary structures of chains with known domain boundaries as assigned by Class Architecture Topology Homology (CATH). Taking top predictions only, the success rate of the method in correctly assigning domain number to the representative chain set is 73.3%. The top prediction for domain number and location of domain boundaries was correct for 24% of the multidomain set (±20 residues). These results have been put into context in relation to the results obtained from the other prediction methods assessed