846 results for Whole genome sequencing
Abstract:
Whole transcriptome shotgun sequencing (RNA-seq) was used to assess the transcriptomic response of the toxic cyanobacterium Microcystis aeruginosa during growth with low levels of dissolved inorganic nitrogen (low N), low levels of dissolved inorganic phosphorus (low P), and in the presence of high levels of high molecular weight dissolved organic matter (HMWDOM). Under low N, one third of the genome was differentially expressed, with significant increases in transcripts observed among genes within the nir operon, urea transport genes (urtBCDE), and amino acid transporters, while significant decreases in transcripts were observed in genes related to photosynthesis. There was also a significant decrease in the transcription of the microcystin synthetase gene set under low N and a significant decrease in microcystin content per Microcystis cell, demonstrating that N supply influences cellular toxicity. Under low P, 27% of the genome was differentially expressed. The Pho regulon was induced, leading to large increases in transcript levels of the alkaline phosphatase phoX, the Pst transport system (pstABC), and the sphX gene; transcripts of multiple sulfate transporters were also significantly more abundant. While the transcriptional response to growth on HMWDOM was smaller (5–22% of genes differentially expressed), transcripts of multiple genes specifically associated with the transport and degradation of organic compounds were significantly more abundant within HMWDOM treatments and thus may be recruited by Microcystis to utilize these substrates. Collectively, these findings provide a comprehensive understanding of the nutritional physiology of this toxic, bloom-forming cyanobacterium and the role of N in controlling microcystin synthesis.
Abstract:
The accidental amplification of nuclear mitochondrial pseudogenes (NUMTs) can pose a serious problem for mitochondrial disease studies. This report shows that the mutation spectrum left by spurious amplification of a NUMT can be detected because it usuall
Abstract:
The complete genome of spring viraemia of carp virus (SVCV) strain A-1 isolated from cultured common carp (Cyprinus carpio) in China was sequenced and characterized. Reverse transcription-polymerase chain reaction (RT-PCR) derived clones were constructed and the DNA was sequenced. It showed that the entire genome of SVCV A-1 consists of 11,100 nucleotide base pairs, the predicted size of the viral RNA of rhabdoviruses. However, the additional insertions in bp 4633-4676 and bp 4684-4724 of SVCV A-1 were different from the other two published SVCV complete genomes. Five open reading frames (ORFs) of SVCV A-1 were identified and further confirmed by RT-PCR and DNA sequencing of their respective RT-PCR products. The 5 structural proteins encoded by the viral RNA were ordered 3'-N-P-M-G-L-5'. This is the first report of a complete genome sequence of SVCV isolated from cultured carp in China. Phylogenetic analysis indicates that SVCV A-1 is closely related to the members of the genus Vesiculovirus, family Rhabdoviridae.
Abstract:
As one of the most powerful tools in biomedical research, DNA sequencing has not only improved its productivity at an exponential rate over the past three decades but has also expanded into new technological territory, drawing on engineering and the physical sciences. In this technical review, we examine the technical characteristics of next-generation sequencers and offer prospective insights into their future development and applications. We envisage that some of the emerging platforms will be capable of supporting the $1000 genome and $100 genome goals, given a few years of technical maturation. We also suggest that scientists from China should play an active role in this campaign, which will have a profound impact on both scientific research and societal healthcare systems.
Abstract:
A circular bacterial artificial chromosome of 148.9 kbp on human chromosome 3 has been extended and fixed on bare mica substrates using a developed fluid capillary flow method in evaporating liquid drops. Extended circular DNA molecules were imaged with an atomic force microscope (AFM) under ambient conditions. The measured total lengths of the whole DNA molecules were in agreement with sequencing analysis data with an error range of +/-3.6%. This work is important groundwork for probing single nucleotide polymorphisms in the human genome, mapping genomic DNA, manipulating biomolecular nanotechnology, and studying the interaction of DNA-protein complexes investigated by AFM.
Abstract:
To understand the systematic status of Larimichthys crocea in the Percoidei, we determined the complete mitochondrial (mt) genome sequence using 454 sequencing-by-synthesis technology. The complete mt genome is 16,466 bp in length and has the typical structure of 22 tRNAs, 2 rRNAs, 13 protein-coding genes and the noncoding control region (CR). Further sequencing of the complete CR was performed using the primers Cyt b-F and 12S-R on six L. crocea individuals and two L. polyactis individuals. Interestingly, all seven CR sequences from L. crocea were identical, while the three sequences from L. polyactis were distinct (including one from GenBank). Although conserved blocks such as TAS and CSB-1, -2, and -3 are readily identifiable in the control regions of the two species, the typical central conserved blocks CSB-D, -E, and -F could not be detected, although they are found in Cynoscion acoupa of Sciaenidae and in other Percoidei species. Phylogenetic analysis shows that L. crocea is a relatively recently emerged species in Sciaenidae and that this family is closely related to the family Pomacanthidae within the Percoidei. L. crocea, as the first species of Sciaenidae with a complete mitochondrial genome available, will provide important information on the molecular evolution of the group. Moreover, the genus-specific pair of primers designed in this study for amplifying the complete mt control region will be very useful in studies on the population genetics and conservation biology of Larimichthys. (c) 2008 Elsevier B.V. All rights reserved.
Abstract:
Mitochondrial genome sequence and structure analysis has become a powerful tool for studying molecular evolution and phylogenetic relationships. To understand the systematic status of Trichiurus japonicus in suborder Scombroidei, we determined the complete mitochondrial genome (mitogenome) sequence using the long-polymerase chain reaction (long-PCR) and shotgun sequencing method. The entire mitogenome is 16,796 bp in length and has three unusual features: (1) the absence of the tRNA(Pro) gene, (2) a possibly nonfunctional light-strand replication origin (O-L) showing a shorter loop in secondary structure and no conserved motif (5'-GCCGG-3'), and (3) two sets of tandem repeats at the 5' and 3' ends of the control region. The three features seem common to Trichiurus mitogenomes, as we have confirmed them in three other T. japonicus individuals and in T. nanhaiensis. Phylogenetic analysis does not support the monophyly of Trichiuridae, which contradicts the morphological result. T. japonicus is most closely related to species of the family Scombridae; they in turn have a sister relationship with Perciformes members including suborders Acanthuroidei, Caproidei, Notothenioidei, Zoarcoidei, Trachinoidei, and some species of Labroidei, based on the current dataset of complete mitogenomes. T. japonicus together with T. brevis, T. lepturus and Aphanopus carbo form a clade distinct from Lepidopus caudatus in terms of the complete Cyt b sequences. The T. japonicus mitogenome, as the first complete mitogenome reported for Trichiuridae, should provide important information on both the genomics and phylogenetics of Trichiuridae. (C) 2009 Elsevier B.V. All rights reserved.
Abstract:
Through random sequencing, we obtained a total of 884,000 base pairs (bp) of random genomic sequence from the genome of Chinese shrimp (Fenneropenaeus chinensis). Using the Tandem Repeat Finder (TRF) software, 2159 tandem repeats were found, of which 1714 were microsatellites and 445 were minisatellites, accounting for 79.4% and 20.6% of repeat sequences, respectively. The cumulative length of repeat sequences was 116,685 bp, accounting for 13.2% of the total DNA sequence; the cumulative length of microsatellites occupied 9.78% of the total DNA sequence, and that of minisatellites occupied 3.42%. In decreasing order, the 20 most abundant repeat sequence classes were as follows: AT (557), AC (471), AG (274), AAT (92), A (56), AAG (28), ATC (27), ATAG (27), AGG (18), ACT (15), C (11), AAC (11), ACAT (11), CAGA (10), AGAA (9), AGGG (7), CAAA (7), CGCA (6), ATAA (6), AGAGAA (6). Dinucleotide repeats were the predominant repeat type, both in number and in cumulative length. The pentanucleotide repeat type had few classes and low copy numbers, comprising only three classes: AGAGA, GAGGC and AAAGA. The classes and copy numbers of heptanucleotide, eleven-nucleotide and thirteen-nucleotide repeats, whose unit lengths are prime numbers, were distinctly lower than those of the repeat types adjacent to them in unit length.
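The percentages quoted in the abstract above follow directly from the reported counts, and a few lines of arithmetic make that easy to check. This is only a sanity check on the published figures: the constant and function names are illustrative, and nothing here reproduces the TRF analysis itself.

```python
# Counts and lengths exactly as reported in the abstract above.
TOTAL_BP = 884_000        # random genomic sequence obtained
TANDEM_REPEATS = 2159     # all tandem repeats found
MICROSATELLITES = 1714
MINISATELLITES = 445
REPEAT_BP = 116_685       # cumulative length of repeat sequences


def percent(part: float, whole: float) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


micro_share = percent(MICROSATELLITES, TANDEM_REPEATS)
mini_share = percent(MINISATELLITES, TANDEM_REPEATS)
repeat_fraction = percent(REPEAT_BP, TOTAL_BP)

print(micro_share, mini_share, repeat_fraction)  # 79.4 20.6 13.2
```

The two repeat categories also sum to the reported total (1714 + 445 = 2159), and the microsatellite and minisatellite length shares (9.78% + 3.42%) add up to the 13.2% overall figure.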
Abstract:
Given the commercial and ecological importance of the Asian paddle crab, Charybdis japonica, there is a clear need for genetic and molecular research on this species. Here, we present the complete mitochondrial genome sequence of C. japonica, determined by the long-polymerase chain reaction and primer walking sequencing method. The entire genome is 15,738 bp in length, encoding a standard set of 13 protein-coding genes, two ribosomal RNA genes, and 22 transfer RNA genes, plus the putative control region, which is typical for metazoans. The total A+T content of the genome is 69.2%, lower than that of other brachyuran crabs except for Callinectes sapidus. The gene order is identical to that of published marine brachyurans and differs from the ancestral pancrustacean order only in the position of the tRNA (His) gene. Phylogenetic analyses using the concatenated nucleotide and amino acid sequences of 13 protein-coding genes strongly support the monophyly of Dendrobranchiata and Pleocyemata, which is consistent with the previous taxonomic classification. However, the systematic status of Charybdis within subfamily Thalamitinae of family Portunidae is not supported. C. japonica, as the first species of Charybdis with a complete mitochondrial genome available, will provide important information on both the genomics and molecular ecology of the group.
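An A+T content figure such as the 69.2% reported above is a simple base-composition ratio. The sketch below shows one way to compute it. The function name is illustrative, and ignoring ambiguous bases such as N is an assumption, not a convention stated in the abstract. The toy sequence is made up and is not the C. japonica genome.

```python
def at_content(sequence: str) -> float:
    """Return the percentage of A and T among unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * (seq.count("A") + seq.count("T")) / counted


# Made-up 10-base example: 4 A + 4 T out of 10 bases.
toy = "ATATTAGCAT"
print(f"{at_content(toy):.1f}%")  # 80.0%
```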
Abstract:
We used ultra-deep sequencing to obtain tens of thousands of HIV-1 sequences from regions targeted by CD8+ T lymphocytes from longitudinal samples from three acutely infected subjects, and modeled viral evolution during the critical first weeks of infection. Previous studies suggested that a single virus established productive infection, but these conclusions were tempered because of limited sampling; now, we have greatly increased our confidence in this observation through modeling the observed earliest sample diversity based on vastly more extensive sampling. Conventional sequencing of HIV-1 from acute/early infection has shown different patterns of escape at different epitopes; we investigated the earliest escapes in exquisite detail. Over 3-6 weeks, ultradeep sequencing revealed that the virus explored an extraordinary array of potential escape routes in the process of evading the earliest CD8 T-lymphocyte responses--using 454 sequencing, we identified over 50 variant forms of each targeted epitope during early immune escape, while only 2-7 variants were detected in the same samples via conventional sequencing. In contrast to the diversity seen within epitopes, non-epitope regions, including the Envelope V3 region, which was sequenced as a control in each subject, displayed very low levels of variation. In early infection, in the regions sequenced, the consensus forms did not have a fitness advantage large enough to trigger reversion to consensus amino acids in the absence of immune pressure. In one subject, a genetic bottleneck was observed, with extensive diversity at the second time point narrowing to two dominant escape forms by the third time point, all within two months of infection. 
Traces of immune escape were observed in the earliest samples, suggesting that immune pressure is present and effective earlier than previously reported; quantifying the loss rate of the founder virus suggests a direct role for CD8 T-lymphocyte responses in viral containment after peak viremia. Dramatic shifts in the frequencies of epitope variants during the first weeks of infection revealed a complex interplay between viral fitness and immune escape.
Abstract:
The International Crocodilian Genomes Working Group (ICGWG) will sequence and assemble the American alligator (Alligator mississippiensis), saltwater crocodile (Crocodylus porosus) and Indian gharial (Gavialis gangeticus) genomes. The status of these projects and our planned analyses are described.
Abstract:
Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between regulatory proteins, such as transcription factors (TF) and nucleosomes, and the genome. Different high-throughput techniques have been invented to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, a single experimental technique often only provides partial and noisy information about the whole picture of protein-DNA interactions. Therefore, the overarching goal of this dissertation is to provide computational developments for jointly modeling different experimental datasets to achieve a holistic inference on the protein-DNA interaction landscape.
We first present a computational framework that can incorporate the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo method to maximize the function. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. Our framework is flexible and can easily incorporate other data sources. To demonstrate this flexibility, we use prior distributions to integrate experimentally measured protein concentrations.
We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has only been widely used to identify DNase hypersensitive sites, which tend to be open chromatin regulatory regions devoid of nucleosomes. We reveal for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach utilizes several effective strategies to extract nucleosome positioning signals from the noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps. We also show that the oscillatory DNase I digestion pattern is useful in revealing the nucleosome rotational context around TF binding sites.
Finally, we present a state-space model (SSM) for jointly modeling different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape. We also provide an efficient expectation-maximization algorithm to learn model parameters from data. We first show in simulation studies that the SSM can effectively recover underlying true protein binding configurations. We then apply the SSM to model real genomic data (both DNase-seq and MNase-seq data). Through incrementally increasing the types of genomic data in the SSM, we show that different data types can contribute complementary information for the inference of protein binding landscape and that the most accurate inference comes from modeling all available datasets.
This dissertation provides a foundation for future research by taking a step toward the genome-wide inference of protein-DNA interaction landscape through data integration.
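The dissertation abstract above mentions maximizing a correlation-based objective with a Markov chain Monte Carlo method. As a loose illustration of that general idea only, the sketch below runs a Metropolis-style random walk over a single parameter so that a toy predicted profile correlates well with an observed one. Everything here is hypothetical: the Gaussian-bump model, the function names, the temperature and the step size are assumptions for illustration and are not the dissertation's thermodynamic model or its actual objective.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def predicted_profile(center, length, width=5.0):
    """Toy model (assumption): a Gaussian occupancy bump centred at `center`."""
    return [math.exp(-((i - center) ** 2) / (2 * width ** 2)) for i in range(length)]


def metropolis_maximize(observed, steps=2000, temperature=0.01, seed=0):
    """Metropolis-style search over `center`; returns (best_center, best_score)."""
    rng = random.Random(seed)
    n = len(observed)
    current = rng.uniform(0, n - 1)
    current_score = pearson(predicted_profile(current, n), observed)
    best, best_score = current, current_score
    for _ in range(steps):
        # Gaussian random-walk proposal, clamped to the valid coordinate range.
        proposal = min(max(current + rng.gauss(0, 3.0), 0.0), n - 1.0)
        score = pearson(predicted_profile(proposal, n), observed)
        # Always accept uphill moves; accept downhill moves with Boltzmann probability.
        accept_prob = math.exp(min(0.0, (score - current_score) / temperature))
        if rng.random() < accept_prob:
            current, current_score = proposal, score
        if current_score > best_score:
            best, best_score = current, current_score
    return best, best_score


# Synthetic "observed" data generated from the same toy model with center 30.
observed = predicted_profile(30, 60)
best_center, best_score = metropolis_maximize(observed)
print(round(best_center, 1), round(best_score, 3))
```

Tracking the best state seen so far turns the sampler into an optimizer. A lower temperature makes the walk greedier, while a higher one lets it cross low-scoring regions more freely.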
Abstract:
Copepods of the genus Calanus are key zooplankton species in temperate to arctic marine ecosystems. Despite their ecological importance, species identification remains challenging. Furthermore, the recent report of hybrids among Calanus species highlights the need for diagnostic nuclear markers to efficiently identify parental species and hybrids. Using next-generation sequencing analysis of both the genome and transcriptome from two sibling species, Calanus finmarchicus and Calanus glacialis, we developed a panel of 12 nuclear insertion/deletion markers. All the markers showed species-specific amplicon length. Furthermore, most of the markers were successfully amplified in other Calanus species, allowing the molecular identification of Calanus helgolandicus, Calanus hyperboreus and Calanus marshallae.
Abstract:
Zooplankton play an important role in our oceans, in biogeochemical cycling and in providing a food source for commercially important fish larvae. However, difficulties in correctly identifying zooplankton hinder our understanding of their roles in marine ecosystem functioning, and can prevent detection of long-term changes in their community structure. The advent of massively parallel next-generation sequencing technology allows DNA sequence data to be recovered directly from whole community samples. Here we assess the ability of such sequencing to quantify richness and diversity of a mixed zooplankton assemblage from a productive time series site in the Western English Channel. Methodology/Principal Findings: Plankton net hauls (200 µm) were taken at the Western Channel Observatory station L4 in September 2010 and January 2011. These samples were analysed by microscopy and by metagenetic analysis of the 18S nuclear small subunit ribosomal RNA gene using the 454 pyrosequencing platform. Following quality control, a total of 419,041 sequences were obtained for all samples. The sequences clustered into 205 operational taxonomic units (OTUs) using a 97% similarity cut-off. Allocation of taxonomy by comparison with the National Center for Biotechnology Information database identified 135 OTUs to species level, 11 to genus level and 1 to order; <2.5% of sequences were classified as unknown. By comparison, a skilled microscopic analyst was able to routinely enumerate only 58 taxonomic groups. Conclusions: Metagenetics reveals a previously hidden taxonomic richness, especially for Copepoda and hard-to-identify meroplankton such as Bivalvia, Gastropoda and Polychaeta. It also reveals rare species and parasites. We conclude that next-generation sequencing of 18S amplicons is a powerful tool for elucidating the true diversity and species richness of zooplankton communities.
While this approach allows for broad diversity assessments of plankton it may become increasingly attractive in future if sequence reference libraries of accurately identified individuals are better populated.
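The zooplankton abstract above groups sequences into operational taxonomic units at a 97% similarity cut-off. The abstract does not say which clustering tool was used, so the following is only a minimal greedy-clustering sketch of the general idea. It assumes reads are already aligned to equal length and measures identity position by position, which is a strong simplification compared with real pipelines. All names and the toy reads are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length, pre-aligned reads."""
    if len(a) != len(b):
        raise ValueError("reads must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads: list[str], threshold: float = 0.97) -> list[list[str]]:
    """Assign each read to the first OTU whose seed it matches at >= threshold."""
    otus: list[list[str]] = []  # the first element of each OTU is its seed
    for read in reads:
        for otu in otus:
            if identity(read, otu[0]) >= threshold:
                otu.append(read)
                break
        else:
            # No existing seed is similar enough, so this read founds a new OTU.
            otus.append([read])
    return otus


# Toy 100-base reads: `near` differs from `seed` at 2 positions (98% identity),
# `far` differs at 8 positions (92% identity).
seed = "ACGT" * 25
near = "TT" + seed[2:]
far = "T" * 10 + seed[10:]
clusters = greedy_otus([seed, near, far])
print(len(clusters))  # 2
```

Because each read joins the first seed it matches, the result depends on input order. Real tools typically sort reads by abundance or length first for that reason.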
Abstract:
We have utilised polymorphic chloroplast microsatellites to analyse cytoplasmic relationships between accessions in the genera Triticum and Aegilops. Sequencing of PCR products revealed point mutations and insertions/deletions in addition to the standard repeat length expansion/contraction which most likely represent ancient synapomorphies. Phylogenetic analyses revealed three distinct groups of accessions. One of these contained all the non-Aegilops speltoides S-type cytoplasm species, another comprised almost exclusively A, C, D, M, N, T and U cytoplasm-type accessions and the third contained the polyploid Triticum species and all the Ae. speltoides accessions, further confirming that Ae. speltoides or a closely related but now extinct species was the original B-genome donor of cultivated polyploid wheat. Successive decreases in levels of genetic diversity due to domestication were also observed. Finally, we highlight the importance of elucidating longer-term evolutionary processes operating at microsatellite repeat loci.