888 results for Genome Sequence
Abstract:
Genetic recombination is a fundamental evolutionary mechanism promoting biological adaptation. Using engineered recombinants of the small single-stranded DNA plant virus, Maize streak virus (MSV), we experimentally demonstrate that fragments of genetic material only function optimally if they reside within genomes similar to those in which they evolved. The degree of similarity necessary for optimal functionality is correlated with the complexity of intragenomic interaction networks within which genome fragments must function. There is a striking correlation between our experimental results and the types of MSV recombinants that are detectable in nature, indicating that obligatory maintenance of intragenome interaction networks strongly constrains the evolutionary value of recombination for this virus and probably for genomes in general.
Abstract:
The main cis-acting control regions for replication of the single-stranded DNA genome of maize streak virus (MSV) are believed to reside within an approximately 310 nt long intergenic region (LIR). However, neither the minimum LIR sequence required nor the sequence determinants of replication specificity have been determined experimentally. There are iterated sequences, or iterons, both within the conserved inverted-repeat sequences with the potential to form a stem-loop structure at the origin of virion-strand replication, and upstream of the rep gene TATA box (the rep-proximal iteron or RPI). Based on experimental analyses of similar iterons in viruses from other geminivirus genera and their proximity to known Rep-binding sites in the distantly related mastrevirus wheat dwarf virus, it has been hypothesized that the iterons may be Rep-binding and/or -recognition sequences. Here, a series of LIR deletion mutants was used to define the upper bounds of the LIR sequence required for replication. After identifying MSV strains and distinct mastreviruses with incompatible replication-specificity determinants (RSDs), LIR chimaeras were used to map the primary MSV RSD to a 67 nt sequence containing the RPI. Although the results generally support the prevailing hypothesis that MSV iterons are functional analogues of those found in other geminivirus genera, it is demonstrated that neither the inverted-repeat nor RPI sequences are absolute determinants of replication specificity. Moreover, widely divergent mastreviruses can trans-replicate one another. These results also suggest that sequences in the 67 nt region surrounding the RPI interact in a sequence-specific manner with those of the inverted repeat.
Abstract:
Psittacine beak and feather disease (PBFD) has a broad host range and is widespread in wild and captive psittacine populations in Asia, Africa, the Americas, Europe and Australasia. Beak and feather disease circovirus (BFDV) is the causative agent. BFDV has an ~2 kb single-stranded circular DNA genome encoding just two proteins (Rep and CP). In this study we provide support for demarcation of BFDV strains by phylogenetic analysis of 65 complete genomes from databases and 22 new BFDV sequences isolated from infected psittacines in South Africa. We propose 94% genome-wide sequence identity as a strain demarcation threshold, with isolates sharing >94% identity belonging to the same strain, and strain subtypes sharing >98% identity. Currently, BFDV diversity falls within 14 strains, with five highly divergent isolates from budgerigars probably representing a new species of circovirus with three strains (budgerigar circovirus; BCV-A, -B and -C). The geographical distribution of BFDV and BCV strains is strongly linked to the international trade in exotic birds; strains with more than one host are generally located in the same geographical area. Lastly, we examined BFDV and BCV sequences for evidence of recombination, and determined that recombination had occurred in most BFDV and BCV strains. We established that there were two globally significant recombination hotspots in the viral genome: the first is along the entire intergenic region and the second is in the C-terminal portion of the CP ORF. The implications of our results for the taxonomy and classification of circoviruses are discussed. © 2011 SGM.
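The demarcation rule proposed in the abstract above (>94% genome-wide identity for the same strain, >98% for the same subtype) can be illustrated with a minimal Python sketch. The function names are hypothetical, the input is assumed to be a pair of already aligned sequences, and skipping gapped columns is an assumption of this sketch; the abstract does not say how the authors computed identity.

```python
def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped; this gap
    handling is an assumption of the sketch, not the authors' method.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return matches / compared


def classify_pair(identity: float,
                  strain_cutoff: float = 0.94,
                  subtype_cutoff: float = 0.98) -> str:
    """Apply the thresholds quoted in the abstract (strictly greater than)."""
    if identity > subtype_cutoff:
        return "same subtype"
    if identity > strain_cutoff:
        return "same strain"
    return "different strains"
```

For example, two aligned genomes with 96% identity would be reported as "same strain" but not "same subtype".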
Abstract:
The African streak viruses (AfSVs) are a diverse group of mastrevirus species (family Geminiviridae) that infect a wide variety of annual and perennial grass species across the African continent and its nearby Indian Ocean islands. Six AfSV species (of which maize streak virus is the best known) have been described. Here we report the full genome sequences of eight isolates of a seventh AfSV species: Urochloa streak virus (USV), sampled from various locations in Nigeria. Despite there being good evidence of recombination in many other AfSV species, we found no convincing evidence that any of the USV sequences were either inter- or intra-species recombinants. The USV isolates, all of which appear to be variants of the same strain (their genome sequences are all more than 98% identical), share less than 69% nucleotide sequence identity with other currently described AfSV species. © 2008 Springer-Verlag.
Abstract:
Background. One of the promising avenues for development of vaccines against Human immunodeficiency virus type 1 (HIV-1) and other human pathogens is the use of plasmid-based DNA vaccines. However, relatively large doses of plasmid must be injected for a relatively weak response. We investigated whether genome elements from Porcine circovirus type 1 (PCV-1), an apathogenic small ssDNA-containing virus, had useful expression-enhancing properties that could allow dose-sparing in a plasmid vaccine. Results. The linearised PCV-1 genome inserted 5' of the CMV promoter in the well-characterised HIV-1 plasmid vaccine pTHgrttnC increased expression of the polyantigen up to 2-fold, and elicited 3-fold higher CTL responses in mice at 10-fold lower doses than unmodified pTHgrttnC. The PCV-1 capsid gene promoter (Pcap) alone was equally effective. Enhancing activity was traced to a putative composite host transcription factor binding site and a "Conserved Late Element" transcription-enhancing sequence previously unidentified in circoviruses. Conclusions. We identified a novel PCV-1 genome-derived enhancer sequence that significantly increased antigen expression from plasmids in in vitro assays, and improved immunogenicity in mice of the HIV-1 subtype C vaccine plasmid, pTHgrttnC. This should allow significant dose sparing of, or increased responses to, this and other plasmid-based vaccines. We also report investigations of the potential of other circovirus-derived sequences to be similarly used. © 2011 Tanzer et al; licensee BioMed Central Ltd.
Abstract:
Exponential growth of genomic data in the last two decades has made manual analyses impractical for all but trial studies. As genomic analyses have become more sophisticated, and move toward comparisons across large datasets, computational approaches have become essential. One of the most important biological questions is to understand the mechanisms underlying gene regulation. Genetic regulation is commonly investigated and modelled through the use of transcriptional regulatory network (TRN) structures. These model the regulatory interactions between two key components: transcription factors (TFs) and the target genes (TGs) they regulate. Transcriptional regulatory networks have proven to be invaluable scientific tools in bioinformatics. When used in conjunction with comparative genomics, they have provided substantial insights into the evolution of regulatory interactions. Current approaches to regulatory network inference, however, omit two additional key entities: promoters and transcription factor binding sites (TFBSs). In this study, we attempted to explore the relationships among these regulatory components in bacteria. Our primary goal was to identify relationships that can assist in reducing the high false positive rates associated with transcription factor binding site predictions and thereupon enhance the reliability of the inferred transcription regulatory networks. In our preliminary exploration of relationships between the key regulatory components in Escherichia coli transcription, we discovered a number of potentially useful features. The combination of location score and sequence dissimilarity scores increased de novo binding site prediction accuracy by 13.6%. Another important observation concerned the relationship between transcription factors grouped by their regulatory role and corresponding promoter strength.
Our study of E. coli σ70 promoters found support at the 0.1 significance level for our hypothesis that weak promoters are preferentially associated with activator binding sites to enhance gene expression, whilst strong promoters have more repressor binding sites to repress or inhibit gene transcription. Although the observations were specific to σ70, they nevertheless strongly encourage additional investigations when more experimentally confirmed data are available. In our preliminary exploration of relationships between the key regulatory components in E. coli transcription, we discovered a number of potentially useful features, some of which proved successful in reducing the number of false positives when applied to re-evaluate binding site predictions. Of chief interest was the relationship observed between promoter strength and TFs with respect to their regulatory role. Based on the common assumption that promoter homology positively correlates with transcription rate, we hypothesised that weak promoters would have more transcription factors that enhance gene expression, whilst strong promoters would have more repressor binding sites. The t-tests for E. coli σ70 promoters returned a p-value of 0.072, which at the 0.1 significance level suggested support for our (alternative) hypothesis, although this trend may only be present for promoters where corresponding TFBSs are either all repressors or all activators. Nevertheless, such suggestive results strongly encourage additional investigations when more experimentally confirmed data become available. Much of the remainder of the thesis concerns a machine learning study of binding site prediction, using the SVM and kernel methods, principally the spectrum kernel.
Spectrum kernels have been successfully applied in previous studies of protein classification [91, 92], as well as the related problem of promoter predictions [59], and we have here successfully applied the technique to refining TFBS predictions. The advantages provided by the SVM classifier were best seen in 'moderately' conserved transcription factor binding sites as represented by our E. coli CRP case study. Inclusion of additional position feature attributes further increased accuracy by 9.1%, but more notable was the considerable decrease in false positive rate from 0.8 to 0.5 while retaining 0.9 sensitivity. Improved prediction of transcription factor binding sites is in turn extremely valuable in improving inference of regulatory relationships, a problem notoriously prone to false positive predictions. Here, the number of false regulatory interactions inferred using the conventional two-component model was substantially reduced when we integrated de novo transcription factor binding site predictions as an additional criterion for acceptance in a case study of inference in the Fur regulon. This initial work was extended to a comparative study of the iron regulatory system across 20 Yersinia strains. This work revealed interesting, strain-specific differences, especially between pathogenic and non-pathogenic strains. Such differences were made clear through interactive visualisations using the TRNDiff software developed as part of this work, and would have remained undetected using conventional methods. This approach led to the nomination of the Yfe iron-uptake system as a candidate for further wet-lab experimentation due to its potential active functionality in non-pathogens and its known participation in full virulence of the bubonic plague strain. Building on this work, we introduced novel structures we have labelled as 'regulatory trees', inspired by the phylogenetic tree concept.
Instead of using gene or protein sequence similarity, the regulatory trees were constructed based on the number of similar regulatory interactions. While the common phylogenetic trees convey information regarding changes in gene repertoire, which we might regard as analogous to 'hardware', the regulatory tree informs us of the changes in regulatory circuitry, in some respects analogous to 'software'. In this context, we explored the 'pan-regulatory network' for the Fur system, the entire set of regulatory interactions found for the Fur transcription factor across a group of genomes. In the pan-regulatory network, emphasis is placed on how the regulatory network for each target genome is inferred from multiple sources instead of a single source, as is the common approach. The benefit of using multiple reference networks is a more comprehensive survey of the relationships, and increased confidence in the regulatory interactions predicted. In the present study, we distinguish between relationships found across the full set of genomes as the 'core-regulatory-set', and interactions found only in a subset of genomes explored as the 'sub-regulatory-set'. We found nine Fur target gene clusters present across the four genomes studied, this core set potentially identifying basic regulatory processes essential for survival. Species-level differences are seen at the sub-regulatory-set level; for example, the known virulence factors YbtA and PchR were found in Y. pestis and P. aeruginosa respectively, but were not present in either E. coli or B. subtilis. Such factors, and the iron-uptake systems they regulate, are ideal candidates for wet-lab investigation to determine whether or not they are pathogen-specific. In this study, we employed a broad range of approaches to address our goals and assessed these methods using the Fur regulon as our initial case study.
We identified a set of promising feature attributes, demonstrated their success in increasing transcription factor binding site prediction specificity while retaining sensitivity, and showed the importance of binding site predictions in enhancing the reliability of regulatory interaction inferences. Most importantly, these outcomes led to the introduction of a range of visualisations and techniques, which are applicable across the entire bacterial spectrum and can be utilised in studies beyond the understanding of transcriptional regulatory networks.
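The spectrum kernel named in the abstract above compares two sequences by the k-mers they share. The following Python sketch shows the standard k-spectrum kernel as an inner product of k-mer count vectors. The function names and the default k = 3 are illustrative assumptions; the thesis's actual parameters and implementation are not given in the abstract.

```python
from collections import Counter


def spectrum(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(s: str, t: str, k: int = 3) -> int:
    """k-spectrum kernel: dot product of the two k-mer count vectors."""
    cs, ct = spectrum(s, k), spectrum(t, k)
    # Iterate over the smaller spectrum; a Counter returns 0 for absent k-mers.
    if len(cs) > len(ct):
        cs, ct = ct, cs
    return sum(count * ct[kmer] for kmer, count in cs.items())
```

A matrix of such values over all pairs of candidate sites is the kind of input a kernel SVM could consume as a precomputed kernel; how the thesis combined it with position features is not specified here.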
Abstract:
Five Canadian high school Chemistry classes in one school, taught by three different teachers, studied the concepts of dynamic chemical equilibria and Le Chatelier’s Principle. Some students received traditional teacher-led explanations of the concept first and used an interactive scientific visualisation second, while others worked with the visualisation first and received the teacher-led explanation second. Students completed a test of their conceptual understanding of the relevant concepts prior to instruction, after the first instructional session and at the end of instruction. Data on students’ academic achievement (highest, middle or lowest third of the class on the mid-term exam) and gender were also collected to explore the relationship between these factors, conceptual development and instructional sequencing. Results show, within this context at least, that teaching sequence is not important in terms of students’ conceptual learning gains.
Abstract:
Genome-wide association studies (GWAS) have identified multiple common genetic variants associated with an increased risk of prostate cancer (PrCa), but these explain less than one-third of the heritability. To identify further susceptibility alleles, we conducted a meta-analysis of four GWAS including 5953 cases of aggressive PrCa and 11 463 controls (men without PrCa). We computed association tests for approximately 2.6 million SNPs and followed up the most significant SNPs by genotyping 49 121 samples in 29 studies through the international PRACTICAL and BPC3 consortia. We not only confirmed the association of a PrCa susceptibility locus, rs11672691 on chromosome 19, but also showed an association with aggressive PrCa [odds ratio = 1.12 (95% confidence interval 1.03-1.21), P = 1.4 × 10^-8]. This report describes a genetic variant that is associated with aggressive PrCa, a type of PrCa associated with a poorer prognosis.
Abstract:
miRDeep and its varieties are widely used to quantify known and novel microRNA (miRNA) from small RNA sequencing (RNAseq). This article describes miRDeep*, our integrated miRNA identification tool, which is modelled on miRDeep but improves the precision of detecting novel miRNAs by introducing new strategies to identify precursor miRNAs. miRDeep* has a user-friendly graphic interface and accepts raw data in FastQ and Sequence Alignment Map (SAM) or the binary equivalent (BAM) format. Known and novel miRNA expression levels, as measured by the number of reads, are displayed in an interface, which shows each RNAseq read relative to the pre-miRNA hairpin. The secondary pre-miRNA structure and read locations for each predicted miRNA are shown and kept in a separate figure file. Moreover, the target genes of known and novel miRNAs are predicted using the TargetScan algorithm, and the targets are ranked according to the confidence score. miRDeep* is an integrated standalone application where sequence alignment, pre-miRNA secondary structure calculation and graphical display are purely Java coded. This application tool can be executed using a normal personal computer with 1.5 GB of memory. Further, we show that miRDeep* outperformed existing miRNA prediction tools using our LNCaP and other small RNAseq datasets. miRDeep* is freely available online at http://www.australianprostatecentre.org/research/software/mirdeep-star
Abstract:
Approximately 2500 fly species comprise the Sarcophagidae family worldwide. The complete mitochondrial genome of the carrion-breeding, forensically important Sarcophaga impatiens Walker (Diptera: Sarcophagidae) from Australia was sequenced. The 15,169 bp circular genome contains the 37 genes found in a typical metazoan mitochondrial genome: 13 protein-coding genes, 2 ribosomal RNA genes and 22 transfer RNA genes. It also contains one non-coding A+T-rich region. The arrangement of the genes was the same as that found in the ancestral insect. All the protein initiation codons are ATN, except for cox1, which begins with TCG (encoding S). The 22 tRNA anticodons of S. impatiens are consistent with those observed in Drosophila yakuba, and all form the typical cloverleaf structure, except for tRNA-Ser(AGN), which lacks the DHU arm. The mitochondrial genome of Sarcophaga presented will be valuable for resolving phylogenetic relationships within the family Sarcophagidae and the order Diptera, and could be used to identify favourable genetic markers for species identifications for forensic purposes.
Abstract:
A genome-wide search for markers associated with BSE incidence was performed by using Transmission-Disequilibrium Tests (TDTs). Significant segregation distortion, i.e., unequal transmission probabilities of alleles within a locus, was found for three marker loci on Chromosomes (Chrs) 5, 10, and 20. Although TDTs are robust to false associations owing to hidden population substructures, they cannot distinguish segregation distortion caused by a true association between a marker and bovine spongiform encephalopathy (BSE) from a population-wide distortion. An interaction test and a segregation distortion analysis in half-sib controls were used to disentangle these two alternative hypotheses. None of the markers showed any significant interaction between allele transmission rates and disease status, and only the marker on Chr 10 showed a significant segregation distortion in control individuals. Nevertheless, the control group may have been a mixture of resistant and susceptible but unchallenged individuals. When new genotypes were generated in the vicinity of these three markers, evidence for an association with BSE was confirmed for the locus on Chr 5.
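As a rough illustration of the test named in the abstract above, the classical biallelic TDT compares how often heterozygous parents transmit one allele versus the other to affected offspring, using a McNemar-type chi-square statistic with one degree of freedom. The Python sketch below assumes a biallelic marker and hypothetical counts; the study's markers and exact test variant are not described in the abstract.

```python
import math


def tdt(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """Classical biallelic TDT.

    transmitted / untransmitted: numbers of heterozygous parents that did or
    did not pass the allele of interest to an affected offspring.
    Returns (chi-square statistic, p-value for 1 degree of freedom).
    """
    n = transmitted + untransmitted
    if n == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - untransmitted) ** 2 / n
    # Survival function of a chi-square with 1 df: P(|Z| > sqrt(chi2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

With hypothetical counts of 60 transmissions against 40 non-transmissions, the statistic is 4.0, which is nominally significant at the 0.05 level. As the abstract notes, such distortion alone does not show whether the cause is disease association or a population-wide transmission bias.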
Abstract:
Phylogenetic inference from sequences can be misled by both sampling (stochastic) error and systematic error (nonhistorical signals where reality differs from our simplified models). A recent study of eight yeast species using 106 concatenated genes from complete genomes showed that even small internal edges of a tree received 100% bootstrap support. This effective negation of stochastic error from large data sets is important, but longer sequences exacerbate the potential for biases (systematic error) to be positively misleading. Indeed, when we analyzed the same data set using minimum evolution optimality criteria, an alternative tree received 100% bootstrap support. We identified a compositional bias as responsible for this inconsistency and showed that it is reduced effectively by coding the nucleotides as purines and pyrimidines (RY-coding), reinforcing the original tree. Thus, a comprehensive exploration of potential systematic biases is still required, even though genome-scale data sets greatly reduce sampling error.
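The RY-coding step mentioned in the abstract above recodes each nucleotide as a purine (R: A or G) or pyrimidine (Y: C, T or U), so that only transversions remain visible as differences. A minimal Python sketch follows; the function name is hypothetical, and leaving gaps and ambiguity codes unchanged is an assumption of this sketch.

```python
# Purines (A, G) become R; pyrimidines (C, T, U) become Y.
RY_MAP = str.maketrans({"A": "R", "G": "R", "C": "Y", "T": "Y", "U": "Y"})


def ry_code(seq: str) -> str:
    """Recode a nucleotide sequence into the two-state R/Y alphabet.

    Characters outside A, C, G, T, U (gaps, N, other ambiguity codes)
    are passed through unchanged.
    """
    return seq.upper().translate(RY_MAP)
```

Two sequences that differ only by transitions, such as AAGG and GGAA, both recode to RRRR, which is how this coding dampens base-composition differences among A/G or C/T frequencies.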
Abstract:
Complementary sequences at the 5′ and 3′ ends of the dengue virus RNA genome are essential for viral replication, and are believed to cyclise the genome through long-range base pairing in cis. Although consistent with evidence in the literature, this view neglects possible biologically active multimeric forms that are equally consistent with the data. Here, we propose alternative multimeric structures, and suggest that multigenome noncovalent concatemers are more likely to exist under cellular conditions than single cyclised monomers. Concatemers provide a plausible mechanism for the dengue virus to overcome the single-stranded (+)-sense RNA virus dilemma, and can potentially assist genome transport from the virus-induced vesicles into the cytosol.
Abstract:
Originally developed in bioinformatics, sequence analysis is being increasingly used in social sciences for the study of life-course processes. The methodology generally employed consists in computing dissimilarities between the trajectories and, if typologies are sought, in clustering the trajectories according to their similarities or dissimilarities. The choice of an appropriate dissimilarity measure is a major issue when dealing with sequence analysis for life sequences. Several dissimilarities are available in the literature, but none of them has become the undisputed choice. In this paper, instead of deciding upon one dissimilarity measure, we propose to use an optimal convex combination of different dissimilarities. The optimality is automatically determined by the clustering procedure and is defined with respect to the within-class variance.
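The central object in the abstract above, a convex combination of several dissimilarity matrices, can be sketched in Python as follows. The function name is hypothetical and the weights are supplied by hand; in the paper they are chosen by the clustering procedure to optimise within-class variance, a step this sketch does not attempt.

```python
def convex_combination(matrices, weights, tol=1e-9):
    """Combine square dissimilarity matrices as sum_k w_k * D_k.

    Weights must be non-negative and sum to 1 (a convex combination).
    Matrices are given as lists of lists with identical shape.
    """
    if not matrices or len(matrices) != len(weights):
        raise ValueError("need one weight per matrix")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(matrices[0])
    for m in matrices:
        if len(m) != n or any(len(row) != n for row in m):
            raise ValueError("all matrices must be square and the same size")
    return [
        [sum(w * m[i][j] for w, m in zip(weights, matrices)) for j in range(n)]
        for i in range(n)
    ]
```

The combined matrix can then be passed to any clustering method that accepts precomputed dissimilarities, and the weights adjusted to compare the resulting partitions.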