963 results for Genome-specific Sequence


Relevance: 30.00%

Abstract:

The main cis-acting control regions for replication of the single-stranded DNA genome of maize streak virus (MSV) are believed to reside within an approximately 310 nt long intergenic region (LIR). However, neither the minimum LIR sequence required nor the sequence determinants of replication specificity have been determined experimentally. There are iterated sequences, or iterons, both within the conserved inverted-repeat sequences with the potential to form a stem-loop structure at the origin of virion-strand replication, and upstream of the rep gene TATA box (the rep-proximal iteron or RPI). Based on experimental analyses of similar iterons in viruses from other geminivirus genera and their proximity to known Rep-binding sites in the distantly related mastrevirus wheat dwarf virus, it has been hypothesized that the iterons may be Rep-binding and/or -recognition sequences. Here, a series of LIR deletion mutants was used to define the upper bounds of the LIR sequence required for replication. After identifying MSV strains and distinct mastreviruses with incompatible replication-specificity determinants (RSDs), LIR chimaeras were used to map the primary MSV RSD to a 67 nt sequence containing the RPI. Although the results generally support the prevailing hypothesis that MSV iterons are functional analogues of those found in other geminivirus genera, it is demonstrated that neither the inverted-repeat nor RPI sequences are absolute determinants of replication specificity. Moreover, widely divergent mastreviruses can trans-replicate one another. These results also suggest that sequences in the 67 nt region surrounding the RPI interact in a sequence-specific manner with those of the inverted repeat.
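The abstract defines iterons only as short iterated sequences within the LIR. To illustrate what "iterated" means in sequence terms, the sketch below scans a sequence for short words (k-mers) that occur more than once. A reverse-complement copy counts as the same word, so both direct and inverted repeats are reported. The word length `k`, the copy threshold and the toy sequences are assumptions chosen for illustration. This is not the method used in the study, and a repeated word says nothing by itself about Rep binding.

```python
from collections import defaultdict

# Complement table for upper-case DNA; other characters pass through unchanged.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_iterated_kmers(seq: str, k: int = 6, min_copies: int = 2) -> dict[str, list[int]]:
    """Find k-mers occurring at least `min_copies` times in `seq`.

    A k-mer and its reverse complement share one canonical key (the
    lexicographically smaller of the two), so direct and inverted repeats
    are both reported. Positions are 0-based start offsets.
    """
    seq = seq.upper()
    positions: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        canonical = min(kmer, reverse_complement(kmer))
        positions[canonical].append(i)
    return {kmer: hits for kmer, hits in positions.items() if len(hits) >= min_copies}


if __name__ == "__main__":
    # Toy sequences only (not MSV LIR sequence): the same 6-mer as a
    # direct repeat, then as an inverted repeat flanking a short loop.
    print(find_iterated_kmers("GGAGTACCGGAGTA"))
    print(find_iterated_kmers("GGAGTATTTACTCC"))
```

The second toy sequence also shows why inverted repeats matter in this context: the two copies can pair to form a stem-loop.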

Relevance: 30.00%

Abstract:

Exponential growth of genomic data in the last two decades has made manual analyses impractical for all but trial studies. As genomic analyses have become more sophisticated and moved toward comparisons across large datasets, computational approaches have become essential. One of the most important biological questions concerns the mechanisms underlying gene regulation. Genetic regulation is commonly investigated and modelled through the use of transcriptional regulatory network (TRN) structures. These model the regulatory interactions between two key components: transcription factors (TFs) and the target genes (TGs) they regulate. Transcriptional regulatory networks have proven to be invaluable scientific tools in bioinformatics. When used in conjunction with comparative genomics, they have provided substantial insights into the evolution of regulatory interactions. Current approaches to regulatory network inference, however, omit two additional key entities: promoters and transcription factor binding sites (TFBSs). In this study, we explored the relationships among these regulatory components in bacteria. Our primary goal was to identify relationships that can assist in reducing the high false positive rates associated with transcription factor binding site predictions and thereby enhance the reliability of the inferred transcription regulatory networks. In our preliminary exploration of relationships between the key regulatory components in Escherichia coli transcription, we discovered a number of potentially useful features. The combination of location score and sequence dissimilarity scores increased de novo binding site prediction accuracy by 13.6%. Another important observation concerned the relationship between transcription factors grouped by their regulatory role and corresponding promoter strength.
Our study of E. coli σ70 promoters found support at the 0.1 significance level for our hypothesis that weak promoters are preferentially associated with activator binding sites to enhance gene expression, whilst strong promoters have more repressor binding sites to repress or inhibit gene transcription. Although the observations were specific to σ70, they nevertheless strongly encourage additional investigations when more experimentally confirmed data are available. In our preliminary exploration of relationships between the key regulatory components in E. coli transcription, we discovered a number of potentially useful features, some of which proved successful in reducing the number of false positives when applied to re-evaluate binding site predictions. Of chief interest was the relationship observed between promoter strength and TFs with respect to their regulatory role. Based on the common assumption that promoter homology positively correlates with transcription rate, we hypothesised that weak promoters would have more transcription factors that enhance gene expression, whilst strong promoters would have more repressor binding sites. The t-tests performed on E. coli σ70 promoters returned a p-value of 0.072, which at the 0.1 significance level suggested support for our (alternative) hypothesis, although this trend may be present only for promoters whose corresponding TFBSs are either all repressors or all activators. Nevertheless, such suggestive results strongly encourage additional investigations when more experimentally confirmed data become available. Much of the remainder of the thesis concerns a machine learning study of binding site prediction, using the SVM and kernel methods, principally the spectrum kernel.
Spectrum kernels have been successfully applied in previous studies of protein classification [91, 92], as well as to the related problem of promoter prediction [59], and we have here successfully applied the technique to refining TFBS predictions. The advantages provided by the SVM classifier were best seen in 'moderately' conserved transcription factor binding sites, as represented by our E. coli CRP case study. Inclusion of additional position feature attributes further increased accuracy by 9.1%, but more notable was the considerable decrease in false positive rate from 0.8 to 0.5 while retaining 0.9 sensitivity. Improved prediction of transcription factor binding sites is in turn extremely valuable in improving inference of regulatory relationships, a problem notoriously prone to false positive predictions. Here, the number of false regulatory interactions inferred using the conventional two-component model was substantially reduced when we integrated de novo transcription factor binding site predictions as an additional criterion for acceptance in a case study of inference in the Fur regulon. This initial work was extended to a comparative study of the iron regulatory system across 20 Yersinia strains. This work revealed interesting, strain-specific differences, especially between pathogenic and non-pathogenic strains. Such differences were made clear through interactive visualisations using the TRNDiff software developed as part of this work, and would have remained undetected using conventional methods. This approach led to the nomination of the Yfe iron-uptake system as a candidate for further wet-lab experimentation due to its potential active functionality in non-pathogens and its known participation in full virulence of the bubonic plague strain. Building on this work, we introduced novel structures we have labelled 'regulatory trees', inspired by the phylogenetic tree concept.
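The spectrum kernel compares two sequences by the k-mers they share. Each sequence is mapped to a vector of counts of every length-k substring, and the kernel value is the inner product of the two vectors. The sketch below is a minimal, standard-library version of that definition. The value of `k`, the cosine normalisation and the toy sequences are illustrative assumptions, not the settings used in the thesis.

```python
import math
from collections import Counter


def kmer_spectrum(seq: str, k: int) -> Counter:
    """Feature map of the k-spectrum kernel: counts of every overlapping k-mer."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x: str, y: str, k: int = 3) -> int:
    """Inner product of the k-mer count vectors of x and y."""
    sx, sy = kmer_spectrum(x, k), kmer_spectrum(y, k)
    # Only k-mers present in both sequences contribute to the dot product.
    return sum(count * sy[kmer] for kmer, count in sx.items() if kmer in sy)


def normalized_spectrum_kernel(x: str, y: str, k: int = 3) -> float:
    """Cosine-normalised kernel: each sequence has similarity 1.0 with itself."""
    denom = math.sqrt(spectrum_kernel(x, x, k) * spectrum_kernel(y, y, k))
    return spectrum_kernel(x, y, k) / denom if denom else 0.0


def gram_matrix(seqs: list[str], k: int = 3) -> list[list[float]]:
    """Pairwise kernel matrix, the input an SVM with a precomputed kernel expects."""
    return [[normalized_spectrum_kernel(a, b, k) for b in seqs] for a in seqs]


if __name__ == "__main__":
    # Invented toy sequences, not real CRP sites.
    candidates = ["TGTGATCTAGATCACA", "TGTGACCTAGGTCACA", "GCGCGCGCATATATAT"]
    for row in gram_matrix(candidates, k=3):
        print(["%.2f" % v for v in row])
```

A Gram matrix built this way can be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. The additional position features described in the abstract are not modelled here.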
Instead of using gene or protein sequence similarity, the regulatory trees were constructed based on the number of similar regulatory interactions. While common phylogenetic trees convey information regarding changes in gene repertoire, which we might regard as analogous to 'hardware', the regulatory tree informs us of the changes in regulatory circuitry, in some respects analogous to 'software'. In this context, we explored the 'pan-regulatory network' for the Fur system: the entire set of regulatory interactions found for the Fur transcription factor across a group of genomes. In the pan-regulatory network, emphasis is placed on how the regulatory network for each target genome is inferred from multiple sources instead of a single source, as is the common approach. The benefit of using multiple reference networks is a more comprehensive survey of the relationships and increased confidence in the predicted regulatory interactions. In the present study, we distinguish between interactions found across the full set of genomes, the 'core-regulatory-set', and interactions found only in a subset of the genomes explored, the 'sub-regulatory-set'. We found nine Fur target gene clusters present across the four genomes studied; this core set potentially identifies basic regulatory processes essential for survival. Species-level differences are seen at the sub-regulatory-set level; for example, the known virulence factors YbtA and PchR were found in Y. pestis and P. aeruginosa respectively, but were present in neither E. coli nor B. subtilis. Such factors and the iron-uptake systems they regulate are ideal candidates for wet-lab investigation to determine whether or not they are pathogen-specific. In this study, we employed a broad range of approaches to address our goals and assessed these methods using the Fur regulon as our initial case study.
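The core- and sub-regulatory-set definitions reduce to set operations over per-genome interaction lists. In the sketch below, each genome's inferred Fur network is a hypothetical set of (transcription factor, target gene) pairs with placeholder gene names. The core set is the intersection of these sets, and the sub-regulatory set is the rest of their union. The count of interactions shared by each genome pair is the kind of similarity a regulatory tree could be built from. The abstract does not state how such counts are turned into a tree, so no clustering step is shown.

```python
from itertools import combinations

# A regulatory interaction is a (transcription factor, target gene) pair.
Interaction = tuple[str, str]


def core_and_sub_sets(
    networks: dict[str, set[Interaction]],
) -> tuple[set[Interaction], set[Interaction]]:
    """Split the pan-regulatory network into core and sub-regulatory sets.

    core: interactions found in every genome.
    sub:  interactions found in at least one genome but not in all of them.
    """
    per_genome = list(networks.values())
    pan = set().union(*per_genome)
    core = set.intersection(*per_genome)
    return core, pan - core


def shared_interaction_counts(
    networks: dict[str, set[Interaction]],
) -> dict[tuple[str, str], int]:
    """Number of interactions shared by each pair of genomes."""
    return {
        (a, b): len(networks[a] & networks[b])
        for a, b in combinations(sorted(networks), 2)
    }


if __name__ == "__main__":
    # Hypothetical toy networks; genome and gene names are placeholders.
    toy = {
        "genome_1": {("Fur", "tg1"), ("Fur", "tg2"), ("Fur", "tg3")},
        "genome_2": {("Fur", "tg1"), ("Fur", "tg2")},
        "genome_3": {("Fur", "tg1"), ("Fur", "tg4")},
    }
    core, sub = core_and_sub_sets(toy)
    print("core:", sorted(core))
    print("sub:", sorted(sub))
    print(shared_interaction_counts(toy))
```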
We identified a set of promising feature attributes, demonstrated their success in increasing transcription factor binding site prediction specificity while retaining sensitivity, and showed the importance of binding site predictions in enhancing the reliability of regulatory interaction inferences. Most importantly, these outcomes led to the introduction of a range of visualisations and techniques that are applicable across the entire bacterial spectrum and can be utilised in studies beyond the understanding of transcriptional regulatory networks.

Relevance: 30.00%

Abstract:

In this issue of Cancer Discovery, Guagnano and colleagues use a large and diverse annotated collection of cancer cell lines, the Cancer Cell Line Encyclopedia, to correlate whole-genome expression and genomic alteration datasets with cell line sensitivity to the novel pan-fibroblast growth factor receptor (FGFR) inhibitor NVP-BGJ398. Their findings underscore not only the preclinical use of such cell line panels in identifying predictive biomarkers but also the emergence of the FGFRs as valid therapeutic targets across an increasingly broad range of malignancies.

Relevance: 30.00%

Abstract:

Migraine is a common neurological disease with a complex genetic aetiology. The disease affects ~12% of the Caucasian population, and females are three times more likely than males to be diagnosed. In an effort to identify loci involved in migraine susceptibility, we performed a pedigree-based genome-wide association study of the isolated population of Norfolk Island, which has a high prevalence of migraine. This unique population originates from a small number of British and Polynesian founders who are descendants of the Bounty mutineers, and forms a very large multigenerational pedigree (Bellis et al.; Human Genetics, 124(5):543-5542, 2008). These population genetic features may facilitate disease gene mapping strategies (Peltonen et al.; Nat Rev Genet, 1(3):182-90, 2000). In this study, we identified a high heritability of migraine in the Norfolk Island population (h² = 0.53, P = 0.016). We performed a pedigree-based GWAS and utilised a statistical and pathological prioritisation approach to implicate a number of variants in migraine. An SNP located in the zinc finger protein 555 (ZNF555) gene (rs4807347) showed evidence of statistical association in our Norfolk Island pedigree (P = 9.6 × 10⁻⁶) as well as replication in a large independent and unrelated cohort with >500 migraineurs. In addition, we utilised a biological prioritisation to implicate four SNPs within the ADARB2 gene, two SNPs within the GRM7 gene and a single SNP in close proximity to the HTR7 gene. Association of SNPs within these neurotransmitter-related genes suggests a disrupted serotoninergic system that is perhaps specific to the Norfolk Island pedigree, but that might provide clues to understanding migraine more generally.

Relevance: 30.00%

Abstract:

The high risk of metabolic disease traits in Polynesians may be partly explained by elevated prevalence of genetic variants involved in energy metabolism. The genetics of Polynesian populations has been shaped by island-hopping migration events, which may have favoured thrifty genes. The aim of this study was to sequence the mitochondrial genome in a group of Maori in an effort to characterise genome variation in this Polynesian population for use in future disease association studies. We sequenced the complete mitochondrial genomes of 20 non-admixed Maori subjects using Affymetrix technology. DNA diversity analyses showed that the Maori group exhibited reduced mitochondrial genome diversity compared to other worldwide populations, which is consistent with historical bottleneck and founder effects. Global phylogenetic analysis positioned these Maori subjects specifically within mitochondrial haplogroup B4a1a1. Interestingly, we identified several novel variants that collectively form new and unique Maori motifs: B4a1a1c, B4a1a1a3 and B4a1a1a5. Compared to ancestral populations, we observed an increased frequency of non-synonymous coding variants of several mitochondrial genes in the Maori group, which may be a result of positive selection and/or genetic drift effects. In conclusion, this study reports the first complete mitochondrial genome sequence data for a Maori population. Overall, these new data reveal novel mitochondrial genome signatures in this Polynesian population and enhance the phylogenetic picture of maternal ancestry in Oceania. The increased frequency of several mitochondrial coding variants makes them good candidates for future studies aimed at assessment of metabolic disease risk in Polynesian populations.
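The abstract does not name the diversity statistic it used. One common summary of DNA diversity is nucleotide diversity (π), the mean proportion of differing sites across all pairs of aligned sequences. The sketch below assumes that measure and assumes equal-length, pre-aligned sequences. It compares characters literally, gap symbols included, so it illustrates the idea rather than reproducing the study's analysis.

```python
from itertools import combinations


def nucleotide_diversity(aligned: list[str]) -> float:
    """Nucleotide diversity (pi): mean proportion of differing sites per sequence pair.

    Assumes the sequences are already aligned to the same length; every
    character, including gap symbols, is compared literally.
    """
    if len(aligned) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned[0])
    if length == 0 or any(len(s) != length for s in aligned):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(aligned, 2))
    differences = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return differences / (len(pairs) * length)


if __name__ == "__main__":
    # Toy alignments: a less diverse group and a more diverse group.
    low = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT"]
    high = ["ACGTACGTAC", "TCGAACGTTC", "ACCTAGGTAT"]
    print(nucleotide_diversity(low), nucleotide_diversity(high))
```

Under this measure, a population that has passed through a bottleneck shows a lower π than a larger, older population, because fewer sites differ between randomly chosen pairs.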

Relevance: 30.00%

Abstract:

Migraine is a common, heterogeneous and heritable neurological disorder. Its pathophysiology is incompletely understood, and its genetic influences at the population level are unknown. In a population-based genome-wide analysis including 5,122 migraineurs and 18,108 non-migraineurs, rs2651899 (1p36.32, PRDM16), rs10166942 (2q37.1, TRPM8) and rs11172113 (12q13.3, LRP1) were among the top seven associations (P < 5 × 10⁻⁶) with migraine. These SNPs were significant in a meta-analysis among three replication cohorts and met genome-wide significance in a meta-analysis combining the discovery and replication cohorts (rs2651899, odds ratio (OR) = 1.11, P = 3.8 × 10⁻⁹; rs10166942, OR = 0.85, P = 5.5 × 10⁻¹²; and rs11172113, OR = 0.90, P = 4.3 × 10⁻⁹). The associations at rs2651899 and rs10166942 were specific for migraine compared with non-migraine headache. None of the three SNP associations was preferential for migraine with aura or without aura, nor were any associations specific for migraine features. TRPM8 has been the focus of neuropathic pain models, whereas LRP1 modulates neuronal glutamate signaling, plausibly linking both genes to migraine pathophysiology.

Relevance: 30.00%

Abstract:

The transient leaf assay in Nicotiana benthamiana is widely used in plant sciences, with one application being the rapid assembly of complex multigene pathways that produce new fatty acid profiles. This rapid and facile assay would be further improved if it were possible to simultaneously overexpress transgenes while accurately silencing endogenes. Here, we report a draft genome resource for N. benthamiana spanning over 75% of the 3.1 Gb haploid genome. This resource revealed a two-member NbFAD2 family, NbFAD2.1 and NbFAD2.2, and quantitative RT-PCR (qRT-PCR) confirmed their expression in leaves. FAD2 activities were silenced using hairpin RNAi as monitored by qRT-PCR and biochemical assays. Silencing of endogenous FAD2 activities was combined with overexpression of transgenes via the use of the alternative viral silencing-suppressor protein, V2, from Tomato yellow leaf curl virus. We show that V2 permits maximal overexpression of transgenes but, crucially, also allows hairpin RNAi to operate unimpeded. To illustrate the efficacy of the V2-based leaf assay system, endogenous lipids were shunted from the desaturation of 18:1 to elongation reactions beginning with 18:1 as substrate. These V2-based leaf assays produced ~50% more elongated fatty acid products than p19-based assays. Analyses of small RNA populations generated from hairpin RNAi against NbFAD2 confirm that the siRNA population is dominated by 21 and 22 nt species derived from the hairpin. Collectively, these new tools expand the range of uses and possibilities for metabolic engineering in transient leaf assays. © 2012 Naim et al.

Relevance: 30.00%

Abstract:

Background: Nicotiana benthamiana has been widely used for transient gene expression assays and as a model plant in the study of plant-microbe interactions, lipid engineering and RNA silencing pathways. Assembling the sequence of its transcriptome provides information that, in conjunction with the genome sequence, will facilitate gaining insight into the plant's capacity for high-level transient transgene expression, generation of mobile gene silencing signals, and hyper-susceptibility to viral infection. Methodology/Results: RNA-seq libraries from 9 different tissues were deep sequenced and assembled, de novo, into a representation of the transcriptome. The assembly of 16 GB of sequence yielded 237,340 contigs, clustering into 119,014 transcripts (unigenes). Between 80 and 85% of reads from all tissues could be mapped back to the full transcriptome. Approximately 63% of the unigenes exhibited a match to the Solgenomics tomato predicted proteins database. Approximately 94% of the Solgenomics N. benthamiana unigene set (16,024 sequences) matched our unigene set (119,014 sequences). Using homology searches, we identified 31 homologues that are involved in RNAi-associated pathways in Arabidopsis thaliana, and showed that they possess the domains characteristic of these proteins. Of these genes, the RNA-dependent RNA polymerase gene, Rdr1, is transcribed but has a 72 nt insertion in exon 1 that would cause premature termination of translation. Dicer-like 3 (DCL3) appears to lack both the DEAD helicase motif and the second dsRNA-binding motif, and DCL2 and AGO4b have unexpectedly high levels of transcription. Conclusions: The assembled and annotated representation of the transcriptome and the list of RNAi-associated sequences are accessible at www.benthgenome.com alongside a draft genome assembly. These genomic resources will be very useful for further study of the developmental, metabolic and defense pathways of N. benthamiana and in understanding the mechanisms behind the features that have made it such a well-used model plant. © 2013 Nakasugi et al.

Relevance: 30.00%

Abstract:

Forward genetic screens have identified numerous genes involved in development and metabolism, and remain a cornerstone of biological research. However, to locate a causal mutation, the practice of crossing to a polymorphic background to generate a mapping population can be problematic if the mutant phenotype is difficult to recognize in the hybrid F2 progeny, or dependent on parent-specific traits. Here, in a screen for leaf hyponasty mutants, we performed a single backcross of an ethyl methanesulfonate (EMS)-generated hyponastic mutant to its parent. Whole-genome deep sequencing of a bulked homozygous F2 population and analysis via the Next Generation EMS mutation mapping pipeline (NGM) unambiguously determined the causal mutation to be a single nucleotide polymorphism (SNP) residing in HASTY, a previously characterized gene involved in microRNA biogenesis. We evaluated the feasibility of this backcross approach using three additional SNP mapping pipelines: SHOREmap, the GATK pipeline, and the samtools pipeline. Although there was variance in the identification of EMS SNPs, all returned the same outcome in clearly identifying the causal mutation in HASTY. The simplicity of performing a single parental backcross and genome sequencing a small pool of segregating mutants holds great promise for identifying mutations that may be difficult to map using conventional approaches.
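A simple filter can illustrate the reasoning behind mapping from a backcross bulk. EMS predominantly induces G:C to A:T transitions, which appear as G>A or C>T changes relative to the reference. In a pool of homozygous mutant F2 plants, the causal mutation should be carried by essentially all reads. Unlinked EMS mutations still segregate and average about 50% of reads. The sketch below encodes only that reasoning, using a hypothetical `Variant` record and hypothetical depth and frequency thresholds. It is not the NGM, SHOREmap, GATK or samtools algorithm.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical minimal variant record (a simplification of a VCF line)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    ref_reads: int
    alt_reads: int

    @property
    def depth(self) -> int:
        return self.ref_reads + self.alt_reads

    @property
    def alt_fraction(self) -> float:
        return self.alt_reads / self.depth if self.depth else 0.0


def is_ems_type(v: Variant) -> bool:
    """EMS mainly causes G:C -> A:T transitions: G>A or C>T on the reference strand."""
    return (v.ref.upper(), v.alt.upper()) in {("G", "A"), ("C", "T")}


def candidate_causal_snps(variants: list[Variant], min_depth: int = 10,
                          min_fraction: float = 0.95) -> list[Variant]:
    """Keep EMS-type SNPs that look homozygous in the bulked mutant F2 pool.

    Unlinked EMS mutations segregate in the backcross F2 and average about
    50% of reads in the pool; the causal SNP (and tightly linked ones)
    should approach 100%. The thresholds are illustrative, not tuned.
    """
    hits = [
        v for v in variants
        if is_ems_type(v) and v.depth >= min_depth and v.alt_fraction >= min_fraction
    ]
    return sorted(hits, key=lambda v: v.alt_fraction, reverse=True)


if __name__ == "__main__":
    # Toy calls with invented coordinates and read counts.
    calls = [
        Variant("chr1", 1_000, "G", "A", 24, 22),  # segregating, ~50%
        Variant("chr3", 5_000, "C", "T", 1, 39),   # near-homozygous, EMS-type
        Variant("chr3", 9_000, "A", "G", 0, 40),   # homozygous but not EMS-type
        Variant("chr5", 2_000, "C", "T", 0, 6),    # too shallow to trust
    ]
    for v in candidate_causal_snps(calls):
        print(v.chrom, v.pos, f"{v.alt_fraction:.2f}")
```

Real pipelines also use the positions of linked SNPs, whose frequencies rise toward the causal locus, rather than a single per-site threshold.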

Relevance: 30.00%

Abstract:

The nucleotide sequence of the genomic RNA of barley yellow dwarf virus, PAV serotype, was determined except for the 5′-terminal base, and its genome organization deduced. The 5,677-nucleotide genome contains five large open reading frames (ORFs). The genes for the coat protein (1) and the putative viral RNA-dependent RNA polymerase were identified. The latter shows a striking degree of similarity to that of carnation mottle virus (CarMV). By comparison with corona- and retrovirus RNAs, it is proposed that a translational frameshift is involved in expression of the polymerase. An ORF encoding an Mr 49,797 protein (50K ORF) may be translated by in-frame readthrough of the coat protein stop codon. The coat protein, an overlapping 17K ORF, and a 3′ 6.7K ORF are likely to be expressed via subgenomic mRNAs. © 1988 IRL Press Limited.

Relevance: 30.00%

Abstract:

The complete nucleotide sequence of Subterranean clover mottle virus (SCMoV) genomic RNA has been determined. The SCMoV genome is 4,258 nucleotides in length. It shares the highest nucleotide and amino acid sequence identity with the genome of Lucerne transient streak virus (LTSV). SCMoV RNA encodes four overlapping open reading frames and has a genome organisation similar to that of Cocksfoot mottle virus (CfMV). ORF1 and ORF4 are predicted to encode single proteins. ORF2 is predicted to encode two proteins that are derived from a -1 translational frameshift between two overlapping reading frames (ORF2a and ORF2b). A search of amino acid databases did not find a significant match for ORF1, and the function of this protein remains unclear. ORF2a contains a motif typical of chymotrypsin-like serine proteases, and ORF2b has motifs characteristically present in positive-stranded RNA-dependent RNA polymerases. ORF4 is likely to be expressed from a subgenomic RNA and encodes the viral coat protein. The ORF2a/ORF2b overlapping gene expression strategy used by SCMoV and CfMV is similar to that of the poleroviruses and differs from that of other published sobemoviruses. These results suggest that the sobemoviruses could now be divided into two distinct subgroups: those that express the RNA-dependent RNA polymerase from a single, in-frame polyprotein, and those that express it via a -1 translational frameshifting mechanism.
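In a -1 translational frameshift, some ribosomes slip back one nucleotide at a specific site and continue reading in the overlapping frame. A single start codon therefore yields two products: a shorter one that ends at the zero-frame stop codon (corresponding to ORF2a) and a longer fusion product (ORF2a-ORF2b). The sketch below translates a toy sequence both ways using the standard genetic code. The sequence and the shift position are invented. Real frameshift signals (slippery sequences and downstream RNA structures) are not modelled, and the code assumes there is no zero-frame stop codon before the shift site.

```python
# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def translate(seq: str) -> str:
    """Translate codon by codon from position 0 until a stop codon or the end."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def frameshift_products(seq: str, start: int, codons_before_shift: int) -> tuple[str, str]:
    """Return (zero-frame product, -1 frameshift fusion product).

    `start` is the 0-based index of the start codon. After reading
    `codons_before_shift` codons, the ribosome slips back one nucleotide
    and continues in the -1 frame until it meets a stop codon.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    zero_frame = translate(seq[start:])
    shift_site = start + 3 * codons_before_shift
    upstream = translate(seq[start:shift_site])
    downstream = translate(seq[shift_site - 1:])  # resume one nucleotide upstream
    return zero_frame, upstream + downstream


if __name__ == "__main__":
    toy = "ATGGCTTTTTAAGGGTGA"  # invented sequence, not SCMoV
    print(frameshift_products(toy, start=0, codons_before_shift=3))  # ('MAF', 'MAFLRV')
```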

Relevance: 30.00%

Abstract:

The complete nucleotide sequence of genome segment S4 of rice ragged stunt oryzavirus (RRSV, Thai isolate) was determined. The 3823 bp sequence contains two large open reading frames (ORFs). ORF1, spanning nucleotides 12 to 3776, is capable of encoding a protein of M(r) 141,380 (P4a). The P4a amino acid sequence predicted from the nucleotide sequence contains sequence motifs conserved in RNA-dependent RNA polymerases (RDRPs). When compared for evolutionary relationships with RDRPs of other reoviruses using the amino acid sequences around the conserved GDD motif, P4a was shown to be more closely related to Nilaparvata lugens reovirus and reovirus serotype 3 than to rice dwarf phytoreovirus, bovine rotavirus or bluetongue virus. ORF2, spanning nucleotides 491 to 1468, is out of frame with ORF1 and is capable of encoding a protein of M(r) 36,920 (P4b). Coupled in vitro transcription-translation from cloned ORF2 in wheat germ extract confirmed the existence of ORF2, but in vivo production and possible function of P4b are yet to be determined.
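Simple arithmetic relates the ORF coordinates in these abstracts to protein size. The sketch below scans the three forward frames for ATG-initiated ORFs and reports 1-based coordinates that include the stop codon. That convention is an assumption about how the abstracts count. Protein mass is estimated with an assumed average of 110 Da per residue. The scan ignores the reverse strand, non-ATG start codons and ATGs nested inside an ORF, and the mass figure is only a rule-of-thumb estimate.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Find ATG-initiated ORFs in the three forward reading frames.

    Returns (start, end, frame) with 1-based, inclusive coordinates that
    include the stop codon, longest first. Only the first ATG before each
    stop is used, so nested start codons are not reported separately.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start + 1, i + 3, frame))
                start = None
    return sorted(orfs, key=lambda orf: orf[1] - orf[0], reverse=True)


def approx_mass_kda(start: int, end: int, mean_residue_da: float = 110.0) -> float:
    """Rough protein mass for 1-based inclusive ORF coordinates that include the stop codon."""
    residues = (end - start + 1) // 3 - 1  # drop the stop codon
    return residues * mean_residue_da / 1000
```

For RRSV S4 ORF1 (nucleotides 12 to 3776), `approx_mass_kda(12, 3776)` counts 1,254 residues and gives about 138 kDa. That is in the same range as the reported M(r) 141,380, which presumably reflects the actual amino-acid composition rather than a 110 Da average.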

Relevance: 30.00%

Abstract:

The nucleotide sequence of DNA complementary to rice ragged stunt oryzavirus (RRSV) genome segment 8 (S8) of an isolate from Thailand was determined. RRSV S8 is 1,914 bp in size and contains a single large open reading frame (ORF) spanning nucleotides 23 to 1,810, which is capable of encoding a protein of M(r) 67,348. The N-terminal amino acid sequence of a ~43K virion polypeptide matched that inferred for an internal region of the S8 coding sequence. These data suggest that the 43K protein is encoded by S8 and is derived by proteolytic cleavage. Predicted polypeptide sizes from this possible cleavage of S8 protein are 26K and 42K. Polyclonal antibodies raised against a maltose binding protein (MBP)-S8 fusion polypeptide (expressed in Escherichia coli) recognised four RRSV particle-associated polypeptides of M(r) 67K, 46K, 43K and 26K, and all except the 26K polypeptide were also highly immunoreactive to polyclonal antibodies raised against purified RRSV particles. Cleavage of the MBP-S8 fusion polypeptide with protease Factor X produced the expected 40K MBP and two polypeptides of apparent M(r) 46K and 26K. Antibodies to purified RRSV particles reacted strongly with the intact fusion protein and the 46K cleavage product but weakly with the 26K product. Furthermore, in vitro transcription and translation of the S8 coding region revealed a post-translational self-cleavage of the 67K polypeptide to 46K and 26K products. These data indicate that S8 encodes a structural polypeptide, the majority of which is autocatalytically cleaved to 26K and 46K proteins. The data also suggest that the 26K protein is the self-cleaving protease and that the 46K product is further processed or undergoes stable conformational changes to a ~43K major capsid protein.

Relevance: 30.00%

Abstract:

The complete nucleotide sequence of genome segment 5 (S5) of a Thai isolate of rice ragged stunt virus (RRSV) was determined. The 2682-nucleotide sequence contains a single long open reading frame capable of encoding a polypeptide with a molecular mass of ~91 kDa. Polypeptides encoded by various truncated cDNAs of S5 were expressed using the pGEX fusion protein vector, and the highest level of fusion protein was obtained from a construct encoding a hydrophilic region of S5 protein. Antibodies raised against this fusion protein recognized a minor polypeptide, with a molecular mass of ~91 kDa, that was present in purified preparations of RRSV particles, infected insect vectors and infected rice plants. This indicates that RRSV S5 encodes a minor structural protein. Comparing the RRSV S5 sequence with sequences of other reoviruses did not reveal any significant sequence similarities.

Relevance: 30.00%

Abstract:

The genomic sequence of an Australian isolate of carrot mottle umbravirus (CMoV-A) was determined from cDNA generated from dsRNA. This provides the first data on the genome organization and phylogeny of an umbravirus. The 4201-nucleotide genome contains four major open reading frames (ORFs). Analysis suggests that ORF2 encodes an RNA-dependent RNA polymerase, that ORF4 encodes a movement protein, and that the virus has no coat protein gene. The functions of ORFs 1 and 3 remain unknown. ORF2 is probably translated following ribosomal frameshifting. ORFs 3 and 4 are probably translated from a subgenomic mRNA. Sequence comparisons showed CMoV-A to be closely related to pea enation mosaic virus (PEMV) RNA 2, but also to have affinities with the Bromoviridae. These findings shed light on the relationships between the luteoviruses, PEMV, and the umbraviruses, and on the relationships between the carmo-like viruses and the Bromoviridae.