881 results for RNA sequencing
Abstract:
The Arabidopsis genome contains a highly complex and abundant population of small RNAs, and many of the endogenous siRNAs are dependent on RNA-Dependent RNA Polymerase 2 (RDR2) for their biogenesis. By analyzing an rdr2 loss-of-function mutant using two different parallel sequencing technologies, MPSS and 454, we characterized the complement of miRNAs expressed in Arabidopsis inflorescence to considerable depth. Nearly all known miRNAs were enriched in this mutant and we identified 13 new miRNAs, all of which were relatively low abundance and constitute new families. Trans-acting siRNAs (ta-siRNAs) were even more highly enriched. Computational and gel blot analyses suggested that the minimal number of miRNAs in Arabidopsis is approximately 155. The size profile of small RNAs in rdr2 reflected enrichment of 21-nt miRNAs and other classes of siRNAs like ta-siRNAs, and a significant reduction in 24-nt heterochromatic siRNAs. Other classes of small RNAs were found to be RDR2-independent, particularly those derived from long inverted repeats and a subset of tandem repeats. The small RNA populations in other Arabidopsis small RNA biogenesis mutants were also examined; a dcl2/3/4 triple mutant showed a similar pattern to rdr2, whereas dcl1-7 and rdr6 showed reductions in miRNAs and ta-siRNAs consistent with their activities in the biogenesis of these types of small RNAs. Deep sequencing of mutants provides a genetic approach for the dissection and characterization of diverse small RNA populations and the identification of low abundance miRNAs.
Abstract:
Cystic Fibrosis (CF) is an autosomal recessive monogenic disorder caused by mutations in the cystic fibrosis transmembrane conductance regulator (CFTR) gene with the ΔF508 mutation accounting for approximately 70% of all CF cases worldwide. This thesis investigates whether existing zinc finger nucleases designed in this lab and CRISPR/gRNAs designed in this thesis can mediate efficient homology-directed repair (HDR) with appropriate donor repair plasmids to correct CF-causing mutations in a CF cell line. Firstly, the most common mutation, ΔF508, was corrected using a pair of existing ZFNs, which cleave in intron 9, and the donor repair plasmid pITR-donor-XC, which contains the correct CTT sequence and two unique restriction sites. HDR was initially determined to be <1% but further analysis by next generation sequencing (NGS) revealed HDR occurred at a level of 2%. This relatively low level of repair was determined to be a consequence of distance from the cut site to the mutation and so rather than designing a new pair of ZFNs, the position of the existing intron 9 ZFNs was exploited and attempts made to correct >80% of CF-causing mutations. The ZFN cut site was used as the site for HDR of a mini-gene construct comprising exons 10-24 from CFTR cDNA (with appropriate splice acceptor and poly A sites) to allow production of full length corrected CFTR mRNA. Finally, the ability to cleave closer to the mutation and mediate repair of CFTR using the latest gene editing tool CRISPR/Cas9 was explored. Two CRISPR gRNAs were tested; CRISPR ex10 was shown to cleave at an efficiency of 15% and CRISPR in9 cleaved at 3%. Both CRISPR gRNAs mediated HDR with appropriate donor plasmids at a rate of ~1% as determined by NGS. This is the first evidence of CRISPR induced HDR in CF cell lines.
Abstract:
RNA editing is a biological phenomenon that alters nascent RNA transcripts by insertion, deletion and/or substitution of one or a few nucleotides. It is ubiquitous in all kingdoms of life and in viruses. The predominant editing event in organisms with a developed central nervous system is Adenosine to Inosine deamination. Inosine is recognized as Guanosine by the translational machinery and reverse-transcriptase. In primates, RNA editing occurs frequently in transcripts from repetitive regions of the genome. In humans, more than 500,000 editing instances have been identified by applying computational pipelines to available ESTs and high-throughput sequencing data, and by using chemical methods. However, the functions of only a small number of cases have been studied thoroughly. RNA editing instances have been found to have roles in the synthesis of peptide variants through non-synonymous codon substitutions, in transcript variants through alterations in splicing sites, and in gene silencing through miRNA sequence modifications. We established the Database of RNA EDiting (DARNED) to accommodate the reference genomic coordinates of substitution editing in human, mouse and fly transcripts from the published literature, with additional information on edited genomic coordinates collected from various databases, e.g. UCSC and NCBI. DARNED contains mostly Adenosine to Inosine editing and allows searches based on genomic region, gene ID, and user-provided sequence. The database is accessible at http://darned.ucc.ie. RNA editing instances in coding regions are likely to result in recoding in protein synthesis. This encouraged me to focus my research on occurrences of RNA editing specific to CDS and non-Alu exonic regions. By applying various filters to discrepancies between available ESTs and their corresponding reference genomic sequences, putative RNA editing candidates were identified. High-throughput sequencing was used to validate these candidates.
All predicted coordinates appeared to be either SNPs or unedited.
Abstract:
BACKGROUND: There is considerable interest in the development of methods to efficiently identify all coding variants present in large sample sets of humans. There are three approaches possible: whole-genome sequencing, whole-exome sequencing using exon capture methods, and RNA-Seq. While whole-genome sequencing is the most complete, it remains sufficiently expensive that cost-effective alternatives are important. RESULTS: Here we provide a systematic exploration of how well RNA-Seq can identify human coding variants by comparing variants identified through high coverage whole-genome sequencing to those identified by high coverage RNA-Seq in the same individual. This comparison allowed us to directly evaluate the sensitivity and specificity of RNA-Seq in identifying coding variants, and to evaluate how key parameters such as the degree of coverage and the expression levels of genes interact to influence performance. We find that although only 40% of exonic variants identified by whole-genome sequencing were captured using RNA-Seq, this number rose to 81% when concentrating on genes known to be well-expressed in the source tissue. We also find that a high false positive rate can be problematic when working with RNA-Seq data, especially at higher levels of coverage. CONCLUSIONS: We conclude that as long as a tissue relevant to the trait under study is available and suitable quality control screens are implemented, RNA-Seq is a fast and inexpensive alternative approach for finding coding variants in genes with sufficiently high expression levels.
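The comparison described in this abstract, scoring RNA-Seq variant calls against whole-genome calls from the same individual, can be illustrated with a minimal sketch. It assumes each variant is represented as a (chromosome, position, ref, alt) tuple and treats the whole-genome calls as the reference set. The function name `compare_calls` and the toy data are hypothetical and not taken from the study; precision is reported in place of specificity because true negatives are not defined in this simplified setting.

```python
def compare_calls(wgs_variants, rnaseq_variants):
    """Score RNA-Seq calls against WGS calls treated as the reference set.

    Returns (sensitivity, precision). Both inputs are iterables of hashable
    variant keys, here (chrom, pos, ref, alt) tuples (an assumed format).
    """
    wgs = set(wgs_variants)
    rna = set(rnaseq_variants)
    shared = wgs & rna  # variants found by both approaches
    sensitivity = len(shared) / len(wgs) if wgs else 0.0
    precision = len(shared) / len(rna) if rna else 0.0
    return sensitivity, precision


# Toy data, invented for illustration only.
wgs_calls = [
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 40, "G", "A"),
    ("chr2", 90, "T", "C"),
    ("chr3", 7, "A", "T"),
]
rna_calls = [
    ("chr1", 100, "A", "G"),
    ("chr2", 40, "G", "A"),
    ("chr4", 11, "C", "G"),  # not in the WGS set: counted as a false positive
]

sensitivity, precision = compare_calls(wgs_calls, rna_calls)
print(f"sensitivity={sensitivity:.2f} precision={precision:.2f}")
```

Restricting both input sets to variants in well-expressed genes before calling the function would mirror the stratified comparison the abstract reports.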
Abstract:
BACKGROUND: West Virginia has the worst oral health in the United States, but the reasons for this are unclear. This pilot study explored the etiology of this disparity using culture-independent analyses to identify bacterial species associated with oral disease. METHODS: Bacteria in subgingival plaque samples from twelve participants in two independent West Virginia dental-related studies were characterized using 16S rRNA gene sequencing and Human Oral Microbe Identification Microarray (HOMIM) analysis. Unifrac analysis was used to characterize phylogenetic differences between bacterial communities obtained from plaque of participants with low or high oral disease, which was further evaluated using clustering and Principal Coordinate Analysis. RESULTS: Statistically different bacterial signatures (P<0.001) were identified in subgingival plaque of individuals with low or high oral disease in West Virginia based on 16S rRNA gene sequencing. Low disease contained a high frequency of Veillonella and Streptococcus, with a moderate number of Capnocytophaga. High disease exhibited substantially increased bacterial diversity and included a large proportion of Clostridiales cluster bacteria (Selenomonas, Eubacterium, Dialister). Phylogenetic trees constructed using 16S rRNA gene sequencing revealed that Clostridiales were repeated colonizers in plaque associated with high oral disease, providing evidence that the oral environment is somehow influencing the bacterial signature linked to disease. CONCLUSIONS: Culture-independent analyses identified an atypical bacterial signature associated with high oral disease in West Virginians and provided evidence that the oral environment influenced this signature. Both findings provide insight into the etiology of the oral disparity in West Virginia.
Elucidation of hepatitis C virus transmission and early diversification by single genome sequencing.
Abstract:
A precise molecular identification of transmitted hepatitis C virus (HCV) genomes could illuminate key aspects of transmission biology, immunopathogenesis and natural history. We used single genome sequencing of 2,922 half or quarter genomes from plasma viral RNA to identify transmitted/founder (T/F) viruses in 17 subjects with acute community-acquired HCV infection. Sequences from 13 of 17 acute subjects, but none of 14 chronic controls, exhibited one or more discrete low diversity viral lineages. Sequences within each lineage generally revealed a star-like phylogeny of mutations that coalesced to unambiguous T/F viral genomes. Numbers of transmitted viruses leading to productive clinical infection were estimated to range from 1 to 37 or more (median = 4). Four acutely infected subjects showed a distinctly different pattern of virus diversity that deviated from a star-like phylogeny. In these cases, empirical analysis and mathematical modeling suggested high multiplicity virus transmission from individuals who themselves were acutely infected or had experienced a virus population bottleneck due to antiviral drug therapy. These results provide new quantitative and qualitative insights into HCV transmission, revealing for the first time virus-host interactions that successful vaccines or treatment interventions will need to overcome. Our findings further suggest a novel experimental strategy for identifying full-length T/F genomes for proteome-wide analyses of HCV biology and adaptation to antiviral drug or immune pressures.
Abstract:
Single-molecule sequencing instruments can generate multikilobase sequences with the potential to greatly improve genome and transcriptome assembly. However, the error rates of single-molecule reads are high, which has limited their use thus far to resequencing bacteria. To address this limitation, we introduce a correction algorithm and assembly strategy that uses short, high-fidelity sequences to correct the error in single-molecule sequences. We demonstrate the utility of this approach on reads generated by a PacBio RS instrument from phage, prokaryotic and eukaryotic whole genomes, including the previously unsequenced genome of the parrot Melopsittacus undulatus, as well as for RNA-Seq reads of the corn (Zea mays) transcriptome. Our long-read correction achieves >99.9% base-call accuracy, leading to substantially better assemblies than current sequencing strategies: in the best example, the median contig size was quintupled relative to high-coverage, second-generation assemblies. Greater gains are predicted if read lengths continue to increase, including the prospect of single-contig bacterial chromosome assembly.
Abstract:
Zooplankton play an important role in our oceans, in biogeochemical cycling and providing a food source for commercially important fish larvae. However, difficulties in correctly identifying zooplankton hinder our understanding of their roles in marine ecosystem functioning, and can prevent detection of long term changes in their community structure. The advent of massively parallel next generation sequencing technology allows DNA sequence data to be recovered directly from whole community samples. Here we assess the ability of such sequencing to quantify richness and diversity of a mixed zooplankton assemblage from a productive time series site in the Western English Channel. METHODOLOGY/PRINCIPAL FINDINGS: Plankton net hauls (200 µm) were taken at the Western Channel Observatory station L4 in September 2010 and January 2011. These samples were analysed by microscopy and metagenetic analysis of the 18S nuclear small subunit ribosomal RNA gene using the 454 pyrosequencing platform. Following quality control a total of 419,041 sequences were obtained for all samples. The sequences clustered into 205 operational taxonomic units (OTUs) using a 97% similarity cut-off. Allocation of taxonomy by comparison with the National Centre for Biotechnology Information database identified 135 OTUs to species level, 11 to genus level and 1 to order; <2.5% of sequences were classified as unknowns. By comparison a skilled microscopic analyst was able to routinely enumerate only 58 taxonomic groups. CONCLUSIONS: Metagenetics reveals a previously hidden taxonomic richness, especially for Copepoda and hard-to-identify meroplankton such as Bivalvia, Gastropoda and Polychaeta. It also reveals rare species and parasites. We conclude that Next Generation Sequencing of 18S amplicons is a powerful tool for elucidating the true diversity and species richness of zooplankton communities.
While this approach allows for broad diversity assessments of plankton, it may become increasingly attractive in future if sequence reference libraries of accurately identified individuals are better populated.
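The clustering of sequences into operational taxonomic units at a 97% similarity cut-off can be illustrated with a minimal greedy sketch. This is not the pipeline used in the study. It assumes pre-aligned reads of equal length and measures similarity as the fraction of matching positions, whereas real OTU tools work on alignments of variable-length reads. The names `identity` and `greedy_otus` and the toy sequences are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, cutoff=0.97):
    """Greedy centroid clustering: each sequence joins the first OTU whose
    centroid it matches at or above the cutoff, otherwise it founds a new OTU."""
    centroids = []
    clusters = []
    for s in seqs:
        for i, c in enumerate(centroids):
            if identity(s, c) >= cutoff:
                clusters[i].append(s)
                break
        else:
            # No existing centroid is similar enough: start a new OTU.
            centroids.append(s)
            clusters.append([s])
    return clusters


# Toy 100-nt sequences, invented for illustration only.
base = "ACGT" * 25
near = "T" + base[1:]      # 1 mismatch  -> 99% identity to base
far = "TTTTT" + base[5:]   # 4 mismatches -> 96% identity to base

clusters = greedy_otus([base, near, far])
print(len(clusters), "OTUs")
```

Because the result of greedy clustering depends on input order, tools of this kind typically sort reads by abundance or length first; that step is omitted here.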
Abstract:
The retinal vascular endothelium is essential for angiogenesis and is involved in maintaining barrier selectivity and vascular tone. The aim of this study was to identify and quantify microRNAs and other small regulatory non-coding RNAs (ncRNAs) which may regulate these crucial functions. Primary bovine retinal microvascular endothelial cells (RMECs) provide a well-characterized in vitro system for studying angiogenesis. RNA extracted from RMECs was used to prepare a small RNA library for deep sequencing (Illumina Genome Analyzer). A total of 6.8 million reads were mapped to 250 known microRNAs in miRBase (release 16). In many cases, the most frequent isomiR differed from the sequence reported in miRBase. In addition, five novel microRNAs, 13 novel bovine orthologs of known human microRNAs and multiple new members of the miR-2284/2285 family were detected. Several ~30-nucleotide sno-miRNAs were identified, with the most highly expressed being derived from snoRNA U78. Highly expressed microRNAs previously associated with endothelial cells included miR-126 and miR-378, but the most highly expressed was miR-21, comprising more than one-third of all mapped reads. Inhibition of miR-21 with an LNA inhibitor significantly reduced proliferation, migration, and tube-forming capacity of RMECs. The independence from prior sequence knowledge provided by deep sequencing facilitates analysis of novel microRNAs and other small RNAs. This approach also enables quantitative evaluation of microRNA expression, which has highlighted the predominance of a small number of microRNAs in RMECs. Knockdown of miR-21 suggests a role for this microRNA in regulation of angiogenesis in the retinal microvasculature. J. Cell. Biochem. 113: 2098-2111, 2012. (C) 2012 Wiley Periodicals, Inc.
Abstract:
The introduction of Next Generation Sequencing (NGS) has revolutionised population genetics, providing studies of non-model species with unprecedented genomic coverage, allowing evolutionary biologists to address questions previously far beyond the reach of available resources. Furthermore, the simple mutation model of Single Nucleotide Polymorphisms (SNPs) permits cost-effective high-throughput genotyping in thousands of individuals simultaneously. Genomic resources are scarce for the Atlantic herring (Clupea harengus), a small pelagic species that sustains high revenue fisheries. This paper details the development of 578 SNPs using a combined NGS and high-throughput genotyping approach. Eight individuals covering the species distribution in the eastern Atlantic were bar-coded and multiplexed into a single cDNA library and sequenced using the 454 GS FLX platform. SNP discovery was performed by de novo sequence clustering and contig assembly, followed by the mapping of reads against consensus contig sequences. Selection of candidate SNPs for genotyping was conducted using an in silico approach. SNP validation and genotyping were performed simultaneously using an Illumina 1,536 GoldenGate assay. Although the conversion rate of candidate SNPs in the genotyping assay cannot be predicted in advance, this approach has the potential to maximise cost and time efficiencies by avoiding expensive and time-consuming laboratory stages of SNP validation. Additionally, the in silico approach leads to lower ascertainment bias in the resulting SNP panel as marker selection is based only on the ability to design primers and the predicted presence of intron-exon boundaries. Consequently SNPs with a wider spectrum of minor allele frequencies (MAFs) will be genotyped in the final panel. 
The genomic resources presented here represent a valuable multi-purpose resource for developing informative marker panels for population discrimination, microarray development and for population genomic studies in the wild.
Abstract:
Master's thesis in Bioinformatics and Computational Biology (Bioinformatics), presented to the Universidade de Lisboa, through the Faculdade de Ciências, 2014
Abstract:
We have used massively parallel signature sequencing (MPSS) to sample the transcriptomes of 32 normal human tissues to an unprecedented depth, thus documenting the patterns of expression of almost 20,000 genes with high sensitivity and specificity. The data confirm the widely held belief that differences in gene expression between cell and tissue types are largely determined by transcripts derived from a limited number of tissue-specific genes, rather than by combinations of more promiscuously expressed genes. Expression of a little more than half of all known human genes seems to account for both the common requirements and the specific functions of the tissues sampled. A classification of tissues based on patterns of gene expression largely reproduces classifications based on anatomical and biochemical properties. The unbiased sampling of the human transcriptome achieved by MPSS supports the idea that most human genes have been mapped, if not functionally characterized. This data set should prove useful for the identification of tissue-specific genes, for the study of global changes induced by pathological conditions, and for the definition of a minimal set of genes necessary for basic cell maintenance. The data are available on the Web at http://mpss.licr.org and http://sgb.lynxgen.com.
Abstract:
Pan-viral DNA array (PVDA) and high-throughput sequencing (HTS) are useful tools to identify novel viruses of emerging diseases. However, both techniques have difficulty identifying viruses in clinical samples because of the host genomic nucleic acid content (hg/cont). Both propidium monoazide (PMA) and ethidium bromide monoazide (EMA) have the capacity to bind free DNA/RNA, but are cell membrane-impermeable. Thus, both are unable to bind protected nucleic acid such as viral genomes within intact virions. In addition, EMA/PMA-modified genetic material cannot be amplified by enzymes. In order to assess the potential of EMA/PMA to lower the presence of amplifiable hg/cont in samples and improve virus detection, serum and lung tissue homogenates were spiked with porcine reproductive and respiratory syndrome virus (PRRSV) and were processed with EMA/PMA. In addition, PRRSV RT-qPCR positive clinical samples were also tested. EMA/PMA treatments significantly decreased amplifiable hg/cont and significantly increased the number of PVDA positive probes and their signal intensity compared to untreated spiked lung samples. EMA/PMA treatments also increased the sensitivity of HTS by increasing the number of specific PRRSV reads and the PRRSV percentage of coverage. Interestingly, EMA/PMA treatments significantly increased the sensitivity of PVDA and HTS in two out of three clinical tissue samples. Thus, EMA/PMA treatments offer a new approach to lower the amplifiable hg/cont in clinical samples and increase the success of PVDA and HTS in identifying viruses.
Abstract:
Dictyostelium discoideum is a social amoeba that serves as a model system for RNA interference and related mechanisms. Its position between plants and animals provides an evolutionary snapshot of the mechanisms and protein machinery involved. miRNAs are small regulatory RNAs that are evolutionarily conserved and present in animals, plants, viruses and some prokaryotes. They have roles in development, cell growth and differentiation, and apoptosis, and their misregulation is associated with many diseases such as cancer, neurodegenerative disorders and diabetes. Recently, miRNAs have been discovered in D. discoideum through sequencing of DNA libraries. In this work, it is shown that the heterologous miRNA let-7 can be expressed and processed in D. discoideum. Expression of let-7 miRNA in the social amoeba resulted in a strong developmental phenotype, suggesting an overload of the processing/silencing system and/or endogenous targets. The various effects on the prel-7 strain were observed and characterized, serving as a basis for postulating miRNA roles. An artificial miRNA system was established and introduced into D. discoideum, showing that miRNAs in Dictyostelium can regulate gene expression at the level of mRNA stability and at the posttranscriptional level. Furthermore, translational inhibition as a type of gene control was shown for the first time in this organism. Owing to this, new structures representing co-localization of miRNA and target mRNA were detected. Taken together, this work demonstrates a functional artificial miRNA system and postulates roles for endogenous small RNAs in the social amoeba.
Abstract:
Argonaute proteins perform diverse functions in RNA-mediated gene regulation pathways and are highly conserved in eukaryotic organisms. Although the repertoire of small regulatory RNAs in D. discoideum was investigated early on, and both siRNAs and miRNAs were identified, the function of the five encoded Argonaute proteins was still completely unknown at the start of my work. My investigation focused on the two homologs AgnA and AgnB. Molecular characterization of AgnA showed that the protein has an essential function in the posttranscriptional regulation of the retrotransposon DIRS-1. AgnA is required for the generation of more than 90% of DIRS-1 siRNAs, although it is unclear whether the slicer activity of the protein is relevant or whether AgnA recruits other proteins to generate the small RNAs. Deep sequencing analysis of small RNAs in the AgnA knockout (KO) confirmed the depletion of DIRS-1 siRNAs. The accumulation of DIRS-1 sense and antisense transcripts clearly indicates deregulation of the retrotransposon in the absence of AgnA. The loss of the AgnA-dependent level of regulation is detectable not only at the RNA level but also at the DNA level, since single-stranded extrachromosomal DIRS-1 intermediates are detectable in the AgnA knockout. Analysis of these structures by atomic force microscopy shows that the extrachromosomal DNA is associated with proteins. Their appearance suggests that they may be virus-like particles. Transposition of the DIRS-1 elements could not be detected. It presumably fails because the double-stranded DNA required for integration is not formed.
Although the exact mechanism of AgnA-dependent DIRS-1 regulation could not be fully elucidated, the results indicate that AgnA is not only involved in the biogenesis of the small DIRS-1 siRNAs but is also active as a regulator further downstream, presumably within effector complexes. AgnB is not involved in the negative regulation of the DIRS-1 retrotransposon. On the contrary, experiments showed that the protein tends to positively influence transcription of the element and the formation of DNA intermediates. In the case of the retrotransposon Skipper, it is unclear whether the few siRNAs that have been identified are actually used to regulate this element. Knockout of AgnA results in an accumulation of Skipper siRNAs, although this is highly variable. Skipper transcripts have been detected (Hinas et al., 2007) that probably represent the precursor molecules of the siRNAs. However, the amount of these transcripts does not differ between the wild type and the knockout strains examined. Examination of the miRNAs revealed a significant accumulation of these regulatory RNAs in the AgnA knockout. This accumulation can be restored to wild-type levels by expression of recombinant AgnA. However, the exact function of AgnA in the miRNA pathway could not be specified further. For the two miRNAs, this work demonstrated that they carry no 2'-O-methylation and are located almost exclusively in the cytoplasm of the cell. The latter indicates that the miRNAs examined presumably regulate their target genes posttranscriptionally. The accumulation of miRNAs in the AgnA KO was likewise verified by deep sequencing analyses. In addition, tRNA fragments were found that are considerably more abundant in the AgnA KO.
Northern blot analyses showed that an additional fragment of tRNA Asp accumulates when AgnA is not expressed. AgnA may be involved in tRNA turnover. However, the biological function of tRNA fragments in D. discoideum remains unresolved. In the search for putative interaction partners of AgnA, the protein DDB_G0268914 was identified as a putative interaction partner by mass spectrometry. This protein shows homology to MOV10 from H. sapiens, which also interacts with Argonaute proteins (Hock et al., 2007) and suppresses the replication of retroviruses (Burdick et al., 2010). The interaction between AgnA and the MOV10 homolog has not yet been confirmed by other approaches. Furthermore, it remains to be clarified whether the putative interaction partner is also involved in the regulation of the retrotransposon DIRS-1.