156 results for sequence similarity searches
in Queensland University of Technology - ePrints Archive
Abstract:
Background The koala, Phascolarctos cinereus, is a biologically unique and evolutionarily distinct Australian arboreal marsupial. The goal of this study was to sequence the transcriptome from several tissues of two geographically separate koalas, and to create the first comprehensive catalog of annotated transcripts for this species, enabling detailed analysis of the unique attributes of this threatened native marsupial, including infection by the koala retrovirus. Results RNA-Seq data was generated from a range of tissues from one male and one female koala and assembled de novo into transcripts using Velvet-Oases. Transcript abundance in each tissue was estimated. Transcripts were searched for likely protein-coding regions and a non-redundant set of 117,563 putative protein sequences was produced. In similarity searches there were 84,907 (72%) sequences that aligned to at least one sequence in the NCBI nr protein database. The best alignments were to sequences from other marsupials. After applying a reciprocal best hit requirement of koala sequences to those from tammar wallaby, Tasmanian devil and the gray short-tailed opossum, we estimate that our transcriptome dataset represents approximately 15,000 koala genes. The marsupial alignment information was used to look for potential gene duplications and we report evidence for copy number expansion of the alpha amylase gene, and of an aldehyde reductase gene. Koala retrovirus (KoRV) transcripts were detected in the transcriptomes. These were analysed in detail and the structure of the spliced envelope gene transcript was determined. There was appreciable sequence diversity within KoRV, with 233 sites in the KoRV genome showing small insertions/deletions or single nucleotide polymorphisms. Both koalas had sequences from the KoRV-A subtype, but the male koala transcriptome has, in addition, sequences more closely related to the KoRV-B subtype. This is the first report of a KoRV-B-like sequence in a wild population. 
Conclusions This transcriptomic dataset is a useful resource for molecular genetic studies of the koala, for evolutionary genetic studies of marsupials, for validation and annotation of the koala genome sequence, and for investigation of koala retrovirus. Annotated transcripts can be browsed and queried at http://koalagenome.org
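The reciprocal best hit requirement mentioned in the abstract above can be illustrated with a minimal sketch. It assumes similarity search results are already available as bit scores in nested dictionaries; the function names (`best_hits`, `reciprocal_best_hits`) and the toy identifiers are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Tuple

# Assumed input shape: query id -> {subject id: bit score}
Hits = Dict[str, Dict[str, float]]


def best_hits(hits: Hits) -> Dict[str, str]:
    """Return the highest-scoring subject for every query that has hits."""
    return {q: max(subjects, key=subjects.get)
            for q, subjects in hits.items() if subjects}


def reciprocal_best_hits(a_vs_b: Hits, b_vs_a: Hits) -> List[Tuple[str, str]]:
    """Keep pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy data (hypothetical identifiers and scores).
koala_vs_devil = {
    "koala_t1": {"devil_p1": 250.0, "devil_p2": 90.0},
    "koala_t2": {"devil_p2": 120.0},
    "koala_t3": {"devil_p1": 200.0},
}
devil_vs_koala = {
    "devil_p1": {"koala_t1": 245.0, "koala_t3": 190.0},
    "devil_p2": {"koala_t2": 118.0, "koala_t1": 85.0},
}
pairs = reciprocal_best_hits(koala_vs_devil, devil_vs_koala)
print(pairs)
```

In this toy example `koala_t3` is dropped because its best hit, `devil_p1`, prefers `koala_t1`; filtering of this kind is what makes a reciprocal requirement more conservative than one-way best hits.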
Abstract:
The function of a protein can be partially determined by the information contained in its amino acid sequence. It can be assumed that proteins with similar amino acid sequences usually have similar functions. Hence analysing the similarity of proteins has become one of the most important areas of protein study. In this work, a layered comparison method is used to analyze the similarity of proteins. It is based on the empirical mode decomposition (EMD) method, and protein sequences are characterized by the intrinsic mode functions (IMFs). The similarity of proteins is studied with a new cross-correlation formula. It seems that the EMD method can be used to detect the functional relationship of two proteins. This kind of similarity method complements traditional sequence similarity approaches, which focus on the alignment of amino acids.
Abstract:
Background: Biomineralization is a process encompassing all mineral containing tissues produced within an organism. One of the most dynamic examples of this process is the formation of the mollusk shell, comprising a variety of crystal phases and microstructures. The organic component incorporated within the shell is said to dictate this architecture. However general understanding of how this process is achieved remains ambiguous. The mantle is a conserved organ involved in shell formation throughout molluscs. Specifically the mantle is thought to be responsible for secreting the protein component of the shell. This study employs molecular approaches to determine the spatial expression of genes within the mantle tissue to further the elucidation of the shell biomineralization. Results: A microarray platform was custom generated (PmaxArray 1.0) from the pearl oyster Pinctada maxima. PmaxArray 1.0 consists of 4992 expressed sequence tags (ESTs) originating from mantle tissue. This microarray was used to analyze the spatial expression of ESTs throughout the mantle organ. The mantle was dissected into five discrete regions and analyzed for differential gene expression with PmaxArray 1.0. Over 2000 ESTs were determined to be differentially expressed among the tissue sections, identifying five major expression regions. In situ hybridization validated and further localized the expression for a subset of these ESTs. Comparative sequence similarity analysis of these ESTs revealed a number of the transcripts were novel while others showed significant sequence similarities to previously characterized shell related genes.
Abstract:
Exponential growth of genomic data in the last two decades has made manual analyses impractical for all but trial studies. As genomic analyses have become more sophisticated, and move toward comparisons across large datasets, computational approaches have become essential. One of the most important biological questions is to understand the mechanisms underlying gene regulation. Genetic regulation is commonly investigated and modelled through the use of transcriptional regulatory network (TRN) structures. These model the regulatory interactions between two key components: transcription factors (TFs) and the target genes (TGs) they regulate. Transcriptional regulatory networks have proven to be invaluable scientific tools in Bioinformatics. When used in conjunction with comparative genomics, they have provided substantial insights into the evolution of regulatory interactions. Current approaches to regulatory network inference, however, omit two additional key entities: promoters and transcription factor binding sites (TFBSs). In this study, we attempted to explore the relationships among these regulatory components in bacteria. Our primary goal was to identify relationships that can assist in reducing the high false positive rates associated with transcription factor binding site predictions and thereupon enhance the reliability of the inferred transcription regulatory networks. In our preliminary exploration of relationships between the key regulatory components in Escherichia coli transcription, we discovered a number of potentially useful features. The combination of location score and sequence dissimilarity scores increased de novo binding site prediction accuracy by 13.6%. Another important observation made was with regards to the relationship between transcription factors grouped by their regulatory role and corresponding promoter strength. 
Our study of E. coli σ70 promoters found support at the 0.1 significance level for our hypothesis that weak promoters are preferentially associated with activator binding sites to enhance gene expression, whilst strong promoters have more repressor binding sites to repress or inhibit gene transcription. Although the observations were specific to σ70, they nevertheless strongly encourage additional investigations when more experimentally confirmed data are available. In our preliminary exploration of relationships between the key regulatory components in E. coli transcription, we discovered a number of potentially useful features, some of which proved successful in reducing the number of false positives when applied to re-evaluate binding site predictions. Of chief interest was the relationship observed between promoter strength and TFs with respect to their regulatory role. Based on the common assumption that promoter homology positively correlates with transcription rate, we hypothesised that weak promoters would have more transcription factors that enhance gene expression, whilst strong promoters would have more repressor binding sites. The t-tests assessed for E. coli σ70 promoters returned a p-value of 0.072, which at the 0.1 significance level suggested support for our (alternative) hypothesis, although this trend may only be present for promoters where the corresponding TFBSs are either all repressors or all activators. Nevertheless, such suggestive results strongly encourage additional investigations when more experimentally confirmed data become available. Much of the remainder of the thesis concerns a machine learning study of binding site prediction, using the SVM and kernel methods, principally the spectrum kernel. 
Spectrum kernels have been successfully applied in previous studies of protein classification [91, 92], as well as the related problem of promoter predictions [59], and we have here successfully applied the technique to refining TFBS predictions. The advantages provided by the SVM classifier were best seen in 'moderately'-conserved transcription factor binding sites as represented by our E. coli CRP case study. Inclusion of additional position feature attributes further increased accuracy by 9.1% but more notable was the considerable decrease in false positive rate from 0.8 to 0.5 while retaining 0.9 sensitivity. Improved prediction of transcription factor binding sites is in turn extremely valuable in improving inference of regulatory relationships, a problem notoriously prone to false positive predictions. Here, the number of false regulatory interactions inferred using the conventional two-component model was substantially reduced when we integrated de novo transcription factor binding site predictions as an additional criterion for acceptance in a case study of inference in the Fur regulon. This initial work was extended to a comparative study of the iron regulatory system across 20 Yersinia strains. This work revealed interesting, strain-specific differences, especially between pathogenic and non-pathogenic strains. Such differences were made clear through interactive visualisations using the TRNDiff software developed as part of this work, and would have remained undetected using conventional methods. This approach led to the nomination of the Yfe iron-uptake system as a candidate for further wet-lab experimentation due to its potential active functionality in non-pathogens and its known participation in full virulence of the bubonic plague strain. Building on this work, we introduced novel structures we have labelled as 'regulatory trees', inspired by the phylogenetic tree concept. 
Instead of using gene or protein sequence similarity, the regulatory trees were constructed based on the number of similar regulatory interactions. While the common phylogenetic trees convey information regarding changes in gene repertoire, which we might regard as being analogous to 'hardware', the regulatory tree informs us of the changes in regulatory circuitry, in some respects analogous to 'software'. In this context, we explored the 'pan-regulatory network' for the Fur system, the entire set of regulatory interactions found for the Fur transcription factor across a group of genomes. In the pan-regulatory network, emphasis is placed on how the regulatory network for each target genome is inferred from multiple sources instead of a single source, as is the common approach. The benefit of using multiple reference networks is a more comprehensive survey of the relationships, and increased confidence in the regulatory interactions predicted. In the present study, we distinguish between relationships found across the full set of genomes as the 'core-regulatory-set', and interactions found only in a subset of genomes explored as the 'sub-regulatory-set'. We found nine Fur target gene clusters present across the four genomes studied, this core set potentially identifying basic regulatory processes essential for survival. Species-level differences are seen at the sub-regulatory-set level; for example, the known virulence factors YbtA and PchR were found in Y. pestis and P. aeruginosa respectively, but were not present in E. coli or B. subtilis. Such factors, and the iron-uptake systems they regulate, are ideal candidates for wet-lab investigation to determine whether or not they are pathogen-specific. In this study, we employed a broad range of approaches to address our goals and assessed these methods using the Fur regulon as our initial case study. 
We identified a set of promising feature attributes; demonstrated their success in increasing transcription factor binding site prediction specificity while retaining sensitivity, and showed the importance of binding site predictions in enhancing the reliability of regulatory interaction inferences. Most importantly, these outcomes led to the introduction of a range of visualisations and techniques, which are applicable across the entire bacterial spectrum and can be utilised in studies beyond the understanding of transcriptional regulatory networks.
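The spectrum kernel named in the abstract above compares two sequences by counting the k-mers they share. The following is a minimal sketch of the standard k-spectrum kernel only; the function names, the choice of k and the toy sequences are assumptions for illustration, and the thesis's actual feature set (for example the position attributes) is not reproduced here.

```python
import math
from collections import Counter


def spectrum(seq: str, k: int) -> Counter:
    """Count every contiguous k-mer in seq (empty if seq is shorter than k)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x: str, y: str, k: int) -> int:
    """Inner product of the two k-mer count vectors."""
    sx, sy = spectrum(x, k), spectrum(y, k)
    return sum(count * sy[kmer] for kmer, count in sx.items())


def normalized_spectrum_kernel(x: str, y: str, k: int) -> float:
    """Cosine-normalised kernel value in [0, 1]; 0.0 if either spectrum is empty."""
    norm = math.sqrt(spectrum_kernel(x, x, k) * spectrum_kernel(y, y, k))
    return spectrum_kernel(x, y, k) / norm if norm else 0.0


# Toy sequences (hypothetical): they share the 2-mers "AC" and "CG".
shared = spectrum_kernel("ACGT", "ACGA", 2)
print(shared, normalized_spectrum_kernel("ACGT", "ACGA", 2))
```

A kernel function of this shape could be supplied to an SVM implementation that accepts precomputed kernel matrices, which is one common way such kernels are used in practice.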
Abstract:
The P0 protein of poleroviruses and P1 protein of sobemoviruses suppress the plant's RNA silencing machinery. Here we identified a silencing suppressor protein (SSP), P0PE, in the Enamovirus Pea enation mosaic virus-1 (PEMV-1) and showed that it and the P0s of poleroviruses Potato leaf roll virus and Cereal yellow dwarf virus have strong local and systemic SSP activity, while the P1 of Sobemovirus Southern bean mosaic virus suppresses systemic silencing. The nuclear localized P0PE has no discernible sequence conservation with known SSPs, but proved to be a strong suppressor of local silencing and a moderate suppressor of systemic silencing. Like the P0s from poleroviruses, P0PE destabilizes AGO1 and this action is mediated by an F-box-like domain. Therefore, despite the lack of any sequence similarity, the poleroviral and enamoviral SSPs have a conserved mode of action upon the RNA silencing machinery. © 2012 Elsevier Inc.
Abstract:
Phloridzin is the predominant polyphenol in apple (Malus× domestica Borkh.) where it accumulates to high concentrations in many tissues including the leaves, bark, roots and fruit. Despite its relative abundance in apple the biosynthesis of phloridzin and other related dihydrochalcones remains only partially understood. The key unidentified enzyme in phloridzin biosynthesis is a putative carbon double bond reductase which is thought to act on p-coumaroyl-CoA to produce the dihydro p-coumaroyl-CoA precursor. A functional screen of six apple enoyl reductase-like (ENRL) genes was carried out using transient infiltration into tobacco and gene silencing by RNA interference (RNAi) in order to determine carbon double bond reductase activity and contribution to foliar phloridzin concentrations. The ENRL-3 gene caused a significant increase in phloridzin concentration when infiltrated into tobacco leaves whilst a second protein ENRL-5, with over 98% amino acid sequence similarity to ENRL-3, showed p-coumaroyl-CoA reductase activity in enzyme assays. Finally, an RNAi study showed that reducing the transcript levels of ENRL-3 in transgenic 'Royal Gala' led to a 66% decrease in the concentration of dihydrochalcones in the leaves in the one available silenced line. Overall these results suggest that ENRL-3, and its close homolog ENRL-5, may contribute to the biosynthesis of phloridzin in apple.
Abstract:
Replacement of endogenous genes by homologous recombination is rare in plants; the majority of genetic modifications are the result of transforming DNA molecules undergoing random genomic insertion by way of non-homologous recombination. Factors that affect chromatin remodeling and DNA repair are thought to have the potential to enhance the frequency of homologous recombination in plants. Conventional tools to study the frequencies of genetic recombination often rely on stable transformation-based approaches, with these systems being rarely capable of high-throughput or combinatorial analysis. We developed a series of vectors that use chemiluminescent (LUC and REN) reporter genes to assay the relative frequency of homologous and non-homologous recombination in plants. These transient assay vectors were used to screen 14 candidate genes for their effects on recombination frequencies in Nicotiana benthamiana plants. Over-expression of Arabidopsis genes with sequence similarity to SNM1 from yeast and XRCC3 from humans enhanced the frequency of non-homologous recombination when assayed using two different donor vectors. Transient N. benthamiana leaf systems were also used in an alternative assay for preliminary measurements of homologous recombination frequencies, which were found to be enhanced by over-expression of RAD52, MIM and RAD51 from yeast, as well as CHR24 from Arabidopsis. The findings for the assays described here are in line with previous studies that analyzed recombination frequencies using stable transformation. The assays we report have revealed functions in non-homologous recombination for the Arabidopsis SNM1 and XRCC3 genes, so the suppression of these genes' expression offers a potential means to enhance the gene targeting frequency in plants. Furthermore, our findings also indicate that plant gene targeting frequencies could be enhanced by over-expression of RAD52, MIM, CHR24, and RAD51 genes.
Abstract:
Determination of sequence similarity is a central issue in computational biology, a problem addressed primarily through BLAST, an alignment-based heuristic which has underpinned much of the analysis and annotation of the genomic era. Despite their success, alignment-based approaches scale poorly with increasing data set size, and are not robust under structural sequence rearrangements. Successive waves of innovation in sequencing technologies – so-called Next Generation Sequencing (NGS) approaches – have led to an explosion in data availability, challenging existing methods and motivating novel approaches to sequence representation and similarity scoring, including adaptation of existing methods from other domains such as information retrieval. In this work, we investigate locality-sensitive hashing of sequences through binary document signatures, applying the method to a bacterial protein classification task. Here, the goal is to predict the gene family to which a given query protein belongs. Experiments carried out on a pair of small but biologically realistic datasets (the full protein repertoires of families of Chlamydia and Staphylococcus aureus genomes respectively) show that a measure of similarity obtained by locality-sensitive hashing gives highly accurate results while offering a number of avenues which will lead to substantial performance improvements over BLAST.
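One way to picture binary signatures for sequences is a SimHash-style scheme over k-mers, sketched below. This is an illustrative assumption, not the signature method used in the work above: the k-mer length, the 64-bit signature width, the use of SHA-256 to derive per-k-mer sign patterns, and all function names are hypothetical choices.

```python
import hashlib
from typing import List


def kmers(seq: str, k: int = 3) -> List[str]:
    """All contiguous k-mers of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_signs(kmer: str, n_bits: int) -> List[int]:
    """Derive a deterministic +1/-1 pattern for a k-mer (n_bits <= 256)."""
    value = int.from_bytes(hashlib.sha256(kmer.encode()).digest(), "big")
    return [1 if (value >> i) & 1 else -1 for i in range(n_bits)]


def signature(seq: str, k: int = 3, n_bits: int = 64) -> int:
    """Sum the sign patterns of all k-mers, then keep one bit per position."""
    totals = [0] * n_bits
    for kmer in kmers(seq, k):
        for i, sign in enumerate(kmer_signs(kmer, n_bits)):
            totals[i] += sign
    return sum(1 << i for i, total in enumerate(totals) if total > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


# Toy protein fragments (hypothetical); a smaller distance suggests more shared k-mers.
sig_a = signature("MKVLAAGIVGLLLAQ")
sig_b = signature("MKVLAAGIVGLLFAQ")
print(hamming(sig_a, sig_b))
```

Because each comparison reduces to an XOR and a bit count on fixed-width integers, the cost per pair does not depend on sequence length once the signatures are built, which is the kind of property that makes signature-based search attractive at scale.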
Abstract:
Background Chlamydia pecorum is an important pathogen of domesticated livestock including sheep, cattle and pigs. This pathogen is also a key factor in the decline of the koala in Australia. We sequenced the genomes of three koala C. pecorum strains, isolated from the urogenital tracts and conjunctiva of diseased koalas. The genome of the C. pecorum VR629 (IPA) strain, isolated from a sheep with polyarthritis, was also sequenced. Results Comparisons of the draft C. pecorum genomes against the complete genomes of livestock C. pecorum isolates revealed that these strains have a conserved gene content and order, sharing a nucleotide sequence similarity > 98%. Single nucleotide polymorphisms (SNPs) appear to be key factors in understanding the adaptive process. Two regions of the chromosome were found to be accumulating a large number of SNPs within the koala strains. These regions include the Chlamydia plasticity zone, which contains two cytotoxin genes (toxA and toxB), and a 77 kbp region that codes for putative type III effector proteins. In one koala strain (MC/MarsBar), the toxB gene was truncated by a premature stop codon but is full-length in IPTaLE and DBDeUG. Another five pseudogenes were also identified, two unique to the urogenital strains C. pecorum MC/MarsBar and C. pecorum DBDeUG, respectively, while three were unique to the koala C. pecorum conjunctival isolate IPTaLE. An examination of the distribution of these pseudogenes in C. pecorum strains from a variety of koala populations, alongside a number of sheep and cattle C. pecorum positive samples from Australian livestock, confirmed the presence of four predicted pseudogenes in koala C. pecorum clinical samples. Consistent with our genomics analyses, none of these pseudogenes were observed in the livestock C. pecorum samples examined. Interestingly, three SNPs resulting in pseudogenes identified in the IPTaLE isolate were not found in any other C. 
pecorum strain analysed, raising questions over the origin of these point mutations. Conclusions The genomic data revealed that variation between C. pecorum strains was mainly due to the accumulation of SNPs, some of which cause gene inactivation. The identification of these genetic differences will provide the basis for further studies to understand the biology and evolution of this important animal pathogen. Keywords: Chlamydia pecorum; Single nucleotide polymorphism; Pseudogene; Cytotoxin
Abstract:
The major diabetes autoantigen, glutamic acid decarboxylase (GAD65), contains a region of sequence similarity, including six identical residues PEVKEK, to the P2C protein of coxsackie B virus, suggesting that cross-reactivity between coxsackie B virus and GAD65 can initiate autoimmune diabetes. We used the human islet cell mAbs MICA3 and MICA4 to identify the Ab epitopes of GAD65 by screening phage-displayed random peptide libraries. The identified peptide sequences could be mapped to a homology model of the pyridoxal phosphate (PLP) binding domain of GAD65. For MICA3, a surface loop containing the sequence PEVKEK and two adjacent exposed helices were identified in the PLP binding domain as well as a region of the C terminus of GAD65 that has previously been identified as critical for MICA3 binding. To confirm that the loop containing the PEVKEK sequence contributes to the MICA3 epitope, this loop was deleted by mutagenesis. This reduced binding of MICA3 by 70%. Peptide sequences selected using MICA4 were rich in basic or hydroxyl-containing amino acids, and the surface of the GAD65 PLP-binding domain surrounding Lys358, which is known to be critical for MICA4 binding, was likewise rich in these amino acids. Also, the two phage most reactive with MICA4 encoded the motif VALxG, and the reverse of this sequence, LAV, was located in this same region. Thus, we have defined the MICA3 and MICA4 epitopes on GAD65 using the combination of phage display, molecular modeling, and mutagenesis and have provided compelling evidence for the involvement of the PEVKEK loop in the MICA3 epitope.
Abstract:
The causes of autoimmune diseases have yet to be fully elucidated. Autoantibodies, autoreactive T cell responses, the presence of a predisposing major histocompatibility complex (MHC) haplotype and responsiveness to corticosteroids are features, and some are possibly contributory causes of autoimmune disease. The most challenging question is how autoimmune diseases are triggered. Molecular mimicry of host cell determinants by epitopes of infectious agents with ensuing cross-reactivity is one of the most popular yet still controversial theories for the initiation of autoimmune diseases [1]. Throughout the 1990s, hundreds of research articles focusing to various extents on epitope mimicry, as it is more accurately described in an immunological context, were published annually. Many of these articles presented data that were consistent with the hypothesis of mimicry but that did not actually prove the theory. Other equally convincing reports indicated that epitope mimicry was not the cause of the autoimmune disease despite sequence similarity between molecules of infectious agents and the host. Some 20 years ago, Rothman [2] proposed a model for disease causation and I have used this as a framework to examine the role of epitope mimicry in the development of autoimmune disease. The thesis of Rothman’s model is that an effect, in this instance autoimmune disease, arises as a result of a cause. In most cases, multiple-component causes contribute synergistically to yield the effect, and each of these components alone is insufficient as a cause. Logically, some component causes, such as the presence of a particular autoimmune response, are also necessary causes.
Abstract:
Epitope mimicry is the theory that an infectious agent such as a virus causes pathological effects via mimicry of host proteins and thus elicits a cross-reactive immune response to host tissues. Weise and Carnegie (1988) found a region of sequence similarity between the pol gene of the Maedi Visna virus (MVV), which induces demyelinating encephalitis in sheep, and myelin basic protein (MBP), which is known to induce experimental allergic encephalitis (EAE) in laboratory animals. In this study, cross-reactions between sera raised in sheep against synthetic peptides of MVV (TGKIPWILLPGR) and 21.5 kDa MBP (SGKVPWLKRPGR) were demonstrated using enzyme-linked immunosorbent assay (ELISA) and thin layer chromatography (TLC) immunoprobing. The antibody responses of MVV-infected sheep were investigated using ELISA against the peptides and MBP protein, immunoprobing of the peptides on TLC plates and Western blotting against MBP. Slight but significant reactions to the 21.5 kDa MBP peptide (P < 0.001) and to a lesser extent sheep MBP (P < 0.004) were detected in ELISA. The MBP peptide evoked stronger responses from more sera than the MVV peptide on immunoprobed TLC plates. On the Western blots, eight of the 23 sheep with Visna had serum reactivity to MBP. This slight reaction to MBP in MVV-infected sheep is of interest because of the immune responses to MBP evident in multiple sclerosis and EAE, but its relevance in Visna is limited since no correlation with disease severity was observed. The cell-mediated immune responses of MVV-infected sheep against similar peptides were assessed. The peptides did not stimulate proliferation of peripheral blood lymphocytes of MVV-infected sheep. Since the MVV peptide was not recognised by antibodies or T lymphocytes from MVV-infected and encephalitic sheep, it was concluded that epitope mimicry of this 21.5 kDa MBP peptide by the similar MVV pol peptide was not contributing to the immunopathogenesis of Visna. 
The slight antibody response to MBP and the MBP peptide can be attributed to by-stander effects of the immunopathology of MVV-induced encephalitis.
Abstract:
This paper describes algorithms that can musically augment the realtime performance of electronic dance music by generating new musical material by morphing. Note sequence morphing involves the algorithmic generation of music that smoothly transitions between two existing musical segments. The potential of musical morphing in electronic dance music is outlined and previous research is summarised; including discussions of relevant music theoretic and algorithmic concepts. An outline and explanation is provided of a novel Markov morphing process that uses similarity measures to construct transition matrices. The paper reports on a ‘focus-concert’ study used to evaluate this morphing algorithm and to compare its output with performances from a professional DJ. Discussions of this trial include reflections on some of the aesthetic characteristics of note sequence morphing. The research suggests that the proposed morphing technique could be effectively used in some electronic dance music contexts.
Abstract:
We consider the problem of choosing, sequentially, a map which assigns elements of a set A to a few elements of a set B. On each round, the algorithm suffers some cost associated with the chosen assignment, and the goal is to minimize the cumulative loss of these choices relative to the best map on the entire sequence. Even though the offline problem of finding the best map is provably hard, we show that there is an equivalent online approximation algorithm, Randomized Map Prediction (RMP), that is efficient and performs nearly as well. While drawing upon results from the "Online Prediction with Expert Advice" setting, we show how RMP can be utilized as an online approach to several standard batch problems. We apply RMP to online clustering as well as online feature selection and, surprisingly, RMP often outperforms the standard batch algorithms on these problems.
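The "Online Prediction with Expert Advice" setting that the abstract above draws on is often introduced through the exponential-weights (Hedge) update, sketched below. This is the generic textbook update and not the RMP algorithm itself, whose details are not given here; the learning rate `eta`, the function name and the toy losses are assumptions.

```python
import math
from typing import List, Sequence


def hedge_distribution(loss_rounds: Sequence[Sequence[float]],
                       eta: float = 0.5) -> List[float]:
    """Return the distribution over experts after observing all loss rounds.

    Each expert's weight is multiplied by exp(-eta * loss) every round,
    so experts with lower cumulative loss receive more probability mass.
    """
    n_experts = len(loss_rounds[0])
    weights = [1.0] * n_experts
    for losses in loss_rounds:
        weights = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(weights)
    return [w / total for w in weights]


# Toy example: expert 0 never errs, expert 1 always suffers loss 1.
probs = hedge_distribution([[0.0, 1.0], [0.0, 1.0]])
print(probs)
```

A randomized learner in this setting would sample its next choice from the returned distribution; in the problem described above, the "experts" would correspond to candidate maps from A to B.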
Massively parallel sequencing and analysis of expressed sequence tags in a successful invasive plant
Abstract:
Background Invasive species pose a significant threat to global economies, agriculture and biodiversity. Despite progress towards understanding the ecological factors associated with plant invasions, limited genomic resources have made it difficult to elucidate the evolutionary and genetic factors responsible for invasiveness. This study presents the first expressed sequence tag (EST) collection for Senecio madagascariensis, a globally invasive plant species. Methods We used pyrosequencing of one normalized and two subtractive libraries, derived from one native and one invasive population, to generate an EST collection. ESTs were assembled into contigs, annotated by BLAST comparison with the NCBI non-redundant protein database and assigned gene ontology (GO) terms from the Plant GO Slim ontologies. Key Results Assembly of the 221 746 sequence reads resulted in 12 442 contigs. Over 50 % (6183) of 12 442 contigs showed significant homology to proteins in the NCBI database, representing approx. 4800 independent transcripts. The molecular transducer GO term was significantly over-represented in the native (South African) subtractive library compared with the invasive (Australian) library. Based on NCBI BLAST hits and literature searches, 40 % of the molecular transducer genes identified in the South African subtractive library are likely to be involved in response to biotic stimuli, such as fungal, bacterial and viral pathogens. Conclusions This EST collection is the first representation of the S. madagascariensis transcriptome and provides an important resource for the discovery of candidate genes associated with plant invasiveness. The over-representation of molecular transducer genes associated with defence responses in the native subtractive library provides preliminary support for aspects of the enemy release and evolution of increased competitive ability hypotheses in this successful invasive. 
This study highlights the contribution of next-generation sequencing to better understanding the molecular mechanisms underlying ecological hypotheses that are important in successful plant invasions.