169 results for SEQUENCE ALIGNMENT
in Queensland University of Technology - ePrints Archive
Abstract:
Alignment-free methods, in which shared properties of sub-sequences (e.g. identity or match length) are extracted and used to compute a distance matrix, have recently been explored for phylogenetic inference. However, the scalability and robustness of these methods to key evolutionary processes remain to be investigated. Here, using simulated sequence sets of various sizes in both nucleotides and amino acids, we systematically assess the accuracy of phylogenetic inference using an alignment-free approach, based on D2 statistics, under different evolutionary scenarios. We find that compared to a multiple sequence alignment approach, D2 methods are more robust against among-site rate heterogeneity, compositional biases, genetic rearrangements and insertions/deletions, but are more sensitive to recent sequence divergence and sequence truncation. Across diverse empirical datasets, the alignment-free methods perform well for sequences sharing low divergence, at greater computation speed. Our findings provide strong evidence for the scalability and the potential use of alignment-free methods in large-scale phylogenomics.
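As a rough illustration of the kind of statistic the abstract above refers to, the basic D2 score between two sequences is commonly defined as the sum, over all words of length k, of the product of the word counts in each sequence. The sketch below is not the authors' implementation: the function names, the choice of k, and the cosine-style conversion of D2 into a distance are all assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(a, b, k):
    """Basic D2 score: sum over shared k-mers of count_a * count_b."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # A Counter returns 0 for a missing key, so unshared words add nothing.
    return sum(n * cb[w] for w, n in ca.items())


def d2_distance(a, b, k):
    """Hypothetical normalisation (an assumption, not taken from the paper):
    1 minus the cosine similarity of the two k-mer count vectors."""
    denom = sqrt(d2(a, a, k) * d2(b, b, k))
    if denom == 0:
        # At least one sequence is shorter than k; treat as maximally distant.
        return 1.0
    return 1.0 - d2(a, b, k) / denom


def distance_matrix(seqs, k):
    """Pairwise distance matrix that could be passed to a tree-building method."""
    return [[d2_distance(x, y, k) for y in seqs] for x in seqs]
```

A matrix produced this way could then be handed to a distance-based tree method such as neighbour joining; which normalised D2 variant and which tree method the study used is not stated in the abstract.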
Abstract:
Background Flower development in kiwifruit (Actinidia spp.) is initiated in the first growing season, when undifferentiated primordia are established in latent shoot buds. These primordia can differentiate into flowers in the second growing season, after the winter dormancy period and upon accumulation of adequate winter chilling. Kiwifruit is an important horticultural crop, yet little is known about the molecular regulation of flower development. Results To study kiwifruit flower development, nine MADS-box genes were identified and functionally characterized. Protein sequence alignment, phenotypes obtained upon overexpression in Arabidopsis and expression patterns suggest that the identified genes are required for floral meristem and floral organ specification. Their role during budbreak and flower development was studied. A spontaneous kiwifruit mutant was utilized to correlate the extended expression domains of these flowering genes with abnormal floral development. Conclusions This study provides a description of flower development in kiwifruit at the molecular level. It has identified markers for flower development, and candidates for manipulation of kiwifruit growth, phase change and time of flowering. The expression in normal and aberrant flowers provided a model for kiwifruit flower development.
Abstract:
Biopanning of phage-displayed random peptide libraries is a powerful technique for identifying peptides that mimic epitopes (mimotopes) for monoclonal antibodies (mAbs). However, peptides derived using polyclonal antisera may represent epitopes for a diverse range of antibodies. Hence following screening of phage libraries with polyclonal antisera, including autoimmune disease sera, a procedure is required to distinguish relevant from irrelevant phagotopes. We therefore applied the multiple sequence alignment algorithm PILEUP together with a matrix for scoring amino acid substitutions based on physicochemical properties to generate guide trees depicting relatedness of selected peptides. A random heptapeptide library was biopanned nine times using no selecting antibodies, immunoglobulin G (IgG) from sera of subjects with autoimmune diseases (primary biliary cirrhosis (PBC) and type 1 diabetes) and three murine ascites fluids that contained mAbs to overlapping epitope(s) on the Ross River Virus envelope protein 2. Peptides randomly sampled from the library were distributed throughout the guide tree of the total set of peptides whilst many of the peptides derived in the absence of selecting antibody aligned to a single cluster. Moreover peptides selected by different sources of IgG aligned to separate clusters, each with a different amino acid motif. These alignments were validated by testing all of the 53 phagotopes derived using IgG from PBC sera for reactivity by capture ELISA with antibodies affinity purified on the E2 subunit of the pyruvate dehydrogenase complex (PDC-E2), the major autoantigen in PBC: only those phagotopes that aligned to PBC-associated clusters were reactive. Hence the multiple sequence alignment procedure discriminates relevant from irrelevant phagotopes and thus a major difficulty with biopanning phage-displayed random peptide libraries with polyclonal antibodies is surmounted.
Abstract:
Perez-Losada et al. [1] analyzed 72 complete genomes corresponding to nine mammalian (67 strains) and two avian (5 strains) polyomavirus species using maximum likelihood and Bayesian methods of phylogenetic inference. Because some data for two of the genomes in their work are no longer available in GenBank, in this work we analyze the phylogenetic relationships of the remaining 70 complete genomes, corresponding to nine mammalian (65 strains) and two avian (5 strains) polyomavirus species, using a dynamical language model approach developed by our group (Yu et al., [26]). This distance method does not require sequence alignment for deriving species phylogeny based on overall similarities of the complete genomes. Our best tree separates the bird polyomaviruses (avian polyomaviruses and goose hemorrhagic polyomaviruses) from the mammalian polyomaviruses, which supports the idea of splitting the genus into two subgenera. Such a split is consistent with the different viral life strategies of each group. In the mammalian polyomavirus subgenus, mouse polyomaviruses (MPV), simian viruses 40 (SV40), BK viruses (BKV) and JC viruses (JCV) are grouped as different branches as expected. The topology of our best tree is quite similar to that of the tree constructed by Perez-Losada et al.
Abstract:
Background The majority of peptide bonds in proteins are found to occur in the trans conformation. However, for proline residues, a considerable fraction of prolyl peptide bonds adopt the cis form. Proline cis/trans isomerization is known to play a critical role in protein folding, splicing, cell signaling and transmembrane active transport. Accurate prediction of proline cis/trans isomerization in proteins would have many important applications towards the understanding of protein structure and function. Results In this paper, we propose a new approach to predict the proline cis/trans isomerization in proteins using a support vector machine (SVM). The preliminary results indicated that using Radial Basis Function (RBF) kernels could lead to better prediction performance than that of polynomial and linear kernel functions. We used single sequence information of different local window sizes, amino acid compositions of different local sequences, multiple sequence alignment obtained from PSI-BLAST and the secondary structure information predicted by PSIPRED. We explored these different sequence encoding schemes in order to investigate their effects on the prediction performance. The training and testing of this approach was performed on a newly enlarged dataset of 2424 non-homologous proteins determined by the X-ray diffraction method using 5-fold cross-validation. Selecting the window size 11 provided the best performance for determining the proline cis/trans isomerization based on the single amino acid sequence. It was found that using multiple sequence alignments in the form of PSI-BLAST profiles could significantly improve the prediction performance: the prediction accuracy increased from 62.8% with single sequence to 69.8%, and the Matthews Correlation Coefficient (MCC) improved from 0.26 with single local sequence to 0.40.
Furthermore, if coupled with the secondary structure information predicted by PSIPRED, our method yielded a prediction accuracy of 71.5% and MCC of 0.43, 9% and 0.17 higher than the accuracy achieved based on the single sequence information, respectively. Conclusion A new method has been developed to predict the proline cis/trans isomerization in proteins based on a support vector machine, which used the single amino acid sequence with different local window sizes, the amino acid compositions of local sequence flanking centered proline residues, the position-specific scoring matrices (PSSMs) extracted by PSI-BLAST and the predicted secondary structures generated by PSIPRED. The successful application of the SVM approach in this study reinforces that SVM is a powerful tool for predicting proline cis/trans isomerization in proteins and for biological sequence analysis.
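For readers unfamiliar with the two metrics quoted in the abstract above, the following minimal sketch computes accuracy and the Matthews Correlation Coefficient from a binary confusion matrix using their standard definitions. The function names and the confusion-matrix counts in the usage note are hypothetical; the abstract does not report the underlying counts.

```python
from math import sqrt


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient for a binary classifier.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction).
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional choice when any marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

For example, with made-up counts, `mcc(25, 25, 25, 25)` is 0.0 even though `accuracy(25, 25, 25, 25)` is 0.5. MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when the two classes are unbalanced.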
Abstract:
miRDeep and its variants are widely used to quantify known and novel microRNA (miRNA) from small RNA sequencing (RNAseq). This article describes miRDeep*, our integrated miRNA identification tool, which is modeled on miRDeep but improves the precision of detecting novel miRNAs by introducing new strategies to identify precursor miRNAs. miRDeep* has a user-friendly graphical interface and accepts raw data in FastQ and Sequence Alignment Map (SAM) or the binary equivalent (BAM) format. Known and novel miRNA expression levels, as measured by the number of reads, are displayed in an interface, which shows each RNAseq read relative to the pre-miRNA hairpin. The secondary pre-miRNA structure and read locations for each predicted miRNA are shown and kept in a separate figure file. Moreover, the target genes of known and novel miRNAs are predicted using the TargetScan algorithm, and the targets are ranked according to the confidence score. miRDeep* is an integrated standalone application where sequence alignment, pre-miRNA secondary structure calculation and graphical display are purely Java coded. This application tool can be executed using a normal personal computer with 1.5 GB of memory. Further, we show that miRDeep* outperformed existing miRNA prediction tools using our LNCaP and other small RNAseq datasets. miRDeep* is freely available online at http://www.australianprostatecentre.org/research/software/mirdeep-star
Abstract:
The P0 protein of poleroviruses and P1 protein of sobemoviruses suppress the plant's RNA silencing machinery. Here we identified a silencing suppressor protein (SSP), P0PE, in the Enamovirus Pea enation mosaic virus-1 (PEMV-1) and showed that it and the P0s of poleroviruses Potato leaf roll virus and Cereal yellow dwarf virus have strong local and systemic SSP activity, while the P1 of Sobemovirus Southern bean mosaic virus suppresses systemic silencing. The nuclear localized P0PE has no discernible sequence conservation with known SSPs, but proved to be a strong suppressor of local silencing and a moderate suppressor of systemic silencing. Like the P0s from poleroviruses, P0PE destabilizes AGO1 and this action is mediated by an F-box-like domain. Therefore, despite the lack of any sequence similarity, the poleroviral and enamoviral SSPs have a conserved mode of action upon the RNA silencing machinery. © 2012 Elsevier Inc.
Abstract:
Two BRCA2-like sequences are present in the Arabidopsis genome. Both genes are expressed in flower buds and encode nearly identical proteins, which contain four BRC motifs. In a yeast two-hybrid assay, the Arabidopsis Brca2 proteins interact with Rad51 and Dmc1. RNAi constructs aimed at silencing the BRCA2 genes at meiosis triggered a reproducible sterility phenotype, which was associated with dramatic meiosis alterations. We obtained the same phenotype upon introduction of RNAi constructs aimed at silencing the RAD51 gene at meiosis in dmc1 mutant plants. The meiotic figures we observed strongly suggest that homologous recombination is highly disturbed in these meiotic cells, leaving aberrant recombination events to repair the meiotic double-strand breaks. The 'brca2' meiotic phenotype was eliminated in spo11 mutant plants. Our experiments point to an essential role of Brca2 at meiosis in Arabidopsis. We also propose a role for Rad51 in the dmc1 context.
Abstract:
Coleoptera is the most diverse group of insects with over 360,000 described species divided into four suborders: Adephaga, Archostemata, Myxophaga, and Polyphaga. In this study, we present six new complete mitochondrial genome (mtgenome) descriptions, including a representative of each suborder, and analyze the evolution of mtgenomes from a comparative framework using all available coleopteran mtgenomes. We propose a modification of atypical cox1 start codons based on sequence alignment to better reflect the conservation observed across species as well as findings of TTG start codons in other genes. We also analyze tRNA-Ser(AGN) anticodons, usually GCU in arthropods, and report a conserved UCU anticodon as a possible synapomorphy across Polyphaga. We further analyze the secondary structure of tRNA-Ser(AGN) and present a consensus structure and an updated covariance model that allows tRNAscan-SE (via the COVE software package) to locate and fold these atypical tRNAs with much greater consistency. We also report secondary structure predictions for both rRNA genes based on conserved stems. All six species of beetle have the same gene order as the ancestral insect. We report noncoding DNA regions, including a small gap region of about 20 bp between tRNA-Ser(UCN) and nad1 that is present in all six genomes, and present results of a base composition analysis.
Abstract:
Bahia grass, Paspalum notatum, is an important pollen allergen source with a long season of pollination and wide distribution in subtropical and temperate regions. We aimed to characterize the 55 kDa allergen of Bahia grass pollen (BaGP) and ascertain its clinical importance. BaGP extract was separated by 2D-PAGE and immunoblotted with serum IgE of a grass pollen-allergic patient. The amino-terminal protein sequence of the predominant allergen isoform at 55 kDa had similarity with the group 13 allergens of Timothy grass and maize pollen, Phl p 13 and Zea m 13. Four sequences obtained by rapid amplification of the allergen cDNA ends represented multiple isoforms of Pas n 13. The predicted full length cDNA for Pas n 13 encoded a 423 amino acid glycoprotein including a signal peptide of 28 residues and with a predicted pI of 7.0. Tandem mass spectrometry of tryptic peptides of 2D gel spots identified peptides specific to the deduced amino acid sequence for each of the four Pas n 13 cDNA, representing 47% of the predicted mature protein sequence of Pas n 13. There was 80.6% and 72.6% amino acid identity with Zea m 13 and Phl p 13, respectively. Reactivity with a Phl p 13-specific monoclonal antibody AF6 supported designation of this allergen as Pas n 13. The allergen was purified from BaGP extract by ammonium sulphate precipitation, hydrophobic interaction and size exclusion chromatography. Purified Pas n 13 reacted with serum IgE of 34 of 71 (48%) grass pollen-allergic patients and specifically inhibited IgE reactivity with the 55 kDa band of BaGP for two grass pollen-allergic donors. Four isoforms of Pas n 13 from pI 6.3-7.8 had IgE-reactivity with grass pollen allergic sera. The allergenic activity of purified Pas n 13 was demonstrated by activation of basophils from whole blood of three grass pollen-allergic donors tested but not control donors.
Pas n 13 is thus a clinically relevant pollen allergen of the subtropical Bahia grass likely to be important in eliciting seasonal allergic rhinitis and asthma in grass pollen-allergic patients.
Abstract:
Background: IgE is the pivotal-specific effector molecule of allergic reactions yet it remains unclear whether the elevated production of IgE in atopic individuals is due to superantigen activation of B cell populations, increased antibody class switching to IgE or oligoclonal allergen-driven IgE responses. Objectives: To increase our understanding of the mechanisms driving IgE responses in allergic disease we examined immunoglobulin variable regions of IgE heavy chain transcripts from three patients with seasonal rhinitis due to grass pollen allergy. Methods: Variable domain of heavy chain-epsilon constant domain 1 cDNAs were amplified from peripheral blood using a two-step semi-nested PCR, cloned and sequenced. Results: The VH gene family usage in subject A was broadly based, but there were two clusters of sequences using genes VH 3-9 and 3-11 with unusually low levels of somatic mutations, 0-3%. Subject B repeatedly used VH 1-69 and subject C repeatedly used VH 1-02, 1-46 and 5a genes. Most clones were highly mutated being only 86-95% homologous to their germline VH gene counterparts and somatic mutations were more abundant at the complementarity determining rather than framework regions. Multiple sequence alignment revealed both repeated use of particular VH genes as well as clonal relatedness among clusters of IgE transcripts. Conclusion: In contrast to previous studies we observed no preferred VH gene common to IgE transcripts of the three subjects allergic to grass pollen. Moreover, most of the VH gene characteristics of the IgE transcripts were consistent with oligoclonal antigen-driven IgE responses.
Abstract:
The major diabetes autoantigen, glutamic acid decarboxylase (GAD65), contains a region of sequence similarity, including six identical residues PEVKEK, to the P2C protein of coxsackie B virus, suggesting that cross-reactivity between coxsackie B virus and GAD65 can initiate autoimmune diabetes. We used the human islet cell mAbs MICA3 and MICA4 to identify the Ab epitopes of GAD65 by screening phage-displayed random peptide libraries. The identified peptide sequences could be mapped to a homology model of the pyridoxal phosphate (PLP) binding domain of GAD65. For MICA3, a surface loop containing the sequence PEVKEK and two adjacent exposed helices were identified in the PLP binding domain as well as a region of the C terminus of GAD65 that has previously been identified as critical for MICA3 binding. To confirm that the loop containing the PEVKEK sequence contributes to the MICA3 epitope, this loop was deleted by mutagenesis. This reduced binding of MICA3 by 70%. Peptide sequences selected using MICA4 were rich in basic or hydroxyl-containing amino acids, and the surface of the GAD65 PLP-binding domain surrounding Lys358, which is known to be critical for MICA4 binding, was likewise rich in these amino acids. Also, the two phage most reactive with MICA4 encoded the motif VALxG, and the reverse of this sequence, LAV, was located in this same region. Thus, we have defined the MICA3 and MICA4 epitopes on GAD65 using the combination of phage display, molecular modeling, and mutagenesis and have provided compelling evidence for the involvement of the PEVKEK loop in the MICA3 epitope.
Abstract:
Ross River virus (RRV) is the predominant cause of epidemic polyarthritis in Australia, yet the antigenic determinants are not well defined. We aimed to characterize epitope(s) on RRV-E2 for a panel of monoclonal antibodies (MAbs) that recognize overlapping conformational epitopes on the E2 envelope protein of RRV and that neutralize virus infection of cells in vitro. Phage-displayed random peptide libraries were probed with the MAbs T1E7, NB3C4, and T10C9 using solution-phase and solid-phase biopanning methods. The peptides VSIFPPA and KTAISPT were selected 15 and 6 times, respectively, by all three of the MAbs using solution-phase biopanning. The peptide LRLPPAP was selected 8 times by NB3C4 using solid-phase biopanning; this peptide shares a trio of amino acids with the peptide VSIFPPA. Phage that expressed the peptides VSIFPPA and LRLPPAP were reactive with T1E7 and/or NB3C4, and phage that expressed the peptides VSIFPPA, LRLPPAP, and KTAISPT partially inhibited the reactivity of T1E7 with RRV. The selected peptides resemble regions of RRV-E2 adjacent to sites mutated in neutralization escape variants of RRV derived by culture in the presence of these MAbs (E2 210-219 and 238-245) and an additional region of E2 172-182. Together these sites represent a conformational epitope of E2 that is informative of cellular contact sites on RRV.