956 results for Edman sequencing
Abstract:
Next-generation sequencing (NGS) technology has become a prominent tool in biological and biomedical research. However, NGS data analysis, including de novo assembly, mapping, and variant detection, is far from mature, and the high sequencing error rate is one of the major problems. To minimize the impact of sequencing errors, we developed a highly robust and efficient method, MTM, to correct errors in NGS reads. We demonstrated the effectiveness of MTM on both single-cell data with highly non-uniform coverage and normal data with uniformly high coverage, indicating that MTM’s performance does not depend on read coverage. MTM was also compared with Hammer and Quake, the best methods for correcting non-uniform and uniform data, respectively. For non-uniform data, MTM outperformed both Hammer and Quake. For uniform data, MTM performed better than Quake and comparably to Hammer. More accurate error correction with MTM also improved the quality of downstream analyses such as mapping and SNP detection. SNP calling is a major application of NGS technologies. However, the existence of sequencing errors complicates this process, especially for the low coverage (
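The abstract does not describe how MTM works internally. As a hedged illustration of the general idea behind count-based correctors such as Quake, the Python sketch below counts k-mers across all reads and flags the k-mers in a read that occur fewer than a cutoff number of times, since a sequencing error usually creates rare k-mers. The function names, k and the cutoff are hypothetical choices for illustration, not MTM's parameters.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def untrusted_positions(read, counts, k, cutoff):
    """Start positions of k-mers in `read` seen fewer than `cutoff` times overall."""
    return [i for i in range(len(read) - k + 1)
            if counts[read[i:i + k]] < cutoff]


# Toy data: five identical reads plus one read with a single substitution (A -> T at index 4).
reads = ["ACGTACGT"] * 5 + ["ACGTTCGT"]
counts = count_kmers(reads, k=4)
print(untrusted_positions("ACGTTCGT", counts, k=4, cutoff=2))
```

Every 4-mer overlapping the substituted base is rare, so positions 1 to 4 are flagged while the shared prefix is trusted; a corrector would then search for the base change that turns the flagged k-mers into trusted ones.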
Abstract:
My dissertation focuses on two aspects of RNA sequencing technology. The first is the methodology for modeling the overdispersion inherent in RNA-seq data for differential expression analysis. This aspect is addressed in three sections. The second aspect is the application of RNA-seq data to identify the CpG island methylator phenotype (CIMP) by integrating datasets of mRNA expression level and DNA methylation status. Section 1: The cost of DNA sequencing has fallen dramatically in the past decade. Consequently, genomic research increasingly depends on sequencing technology. However, it remains unclear how sequencing capacity influences the accuracy of mRNA expression measurement. We observe that accuracy improves as sequencing depth increases. To model the overdispersion, we use the beta-binomial distribution with a new parameter describing the dependency between overdispersion and sequencing depth. Our modified beta-binomial model performs better than the binomial or the pure beta-binomial model, with a lower false discovery rate. Section 2: Although a number of methods have been proposed to accurately analyze differential RNA expression at the gene level, modeling at the base-pair level is also required. Here, we find that at the base-pair level the overdispersion rate decreases as sequencing depth increases. We also propose four models and compare them with each other. As expected, our beta-binomial model with a dynamic overdispersion rate is shown to be superior. Section 3: We investigate biases in RNA-seq by exploring the measurement of an external control, spike-in RNA. This study is based on two datasets with spike-in controls obtained from a recent study. We observe a previously unreported bias in the measurement of the spike-in transcripts that arises from the influence of the sample transcripts in RNA-seq. We also find that this influence is related to the local sequence of the random hexamer used in priming.
We propose a model of the inequality between samples to correct this type of bias. Section 4: The expression of a gene can be turned off when its promoter is highly methylated. Several studies have reported a clear threshold effect in gene silencing mediated by DNA methylation. It is reasonable to assume that the thresholds are gene-specific. It is also of interest to investigate genes that are largely controlled by DNA methylation; these genes are called “L-shaped” genes. We develop a method to determine the DNA methylation threshold and identify a new CIMP of breast cancer (BRCA). In conclusion, we provide a detailed understanding of the relationship between the overdispersion rate and sequencing depth. We also reveal a new bias in RNA-seq and characterize its relationship with the local sequence. Finally, we develop a powerful method to dichotomize methylation status and use it to identify a new CIMP of breast cancer with distinct molecular characteristics and clinical features.
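Section 1 describes a beta-binomial model whose overdispersion depends on sequencing depth, but the abstract does not give the functional form. The sketch below, using only the Python standard library, adopts the common mean/intra-class-correlation parameterization (mean p, overdispersion rho, with alpha = p(1 - rho)/rho and beta = (1 - p)(1 - rho)/rho) and a hypothetical power-law link rho(n) = rho0 * n^(-gamma), where gamma = 0 recovers the ordinary beta-binomial. The names rho0 and gamma and the power-law form are assumptions for illustration, not the dissertation's model.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_comb(n, k):
    """Logarithm of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def beta_binomial_logpmf(k, n, p, rho):
    """log P(K = k) for a beta-binomial with mean n*p and overdispersion rho (0 < rho < 1)."""
    alpha = p * (1 - rho) / rho
    beta = (1 - p) * (1 - rho) / rho
    return log_comb(n, k) + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


def depth_dependent_rho(n, rho0, gamma):
    """Hypothetical link: overdispersion shrinks with depth n; gamma = 0 keeps it constant."""
    return rho0 * n ** (-gamma)


# Usage: the distribution of k out of n = 20 reads with a depth-adjusted overdispersion.
n, p = 20, 0.3
rho = depth_dependent_rho(n, rho0=0.5, gamma=0.5)
probs = [exp(beta_binomial_logpmf(k, n, p, rho)) for k in range(n + 1)]
print(round(sum(probs), 6))
```

Under this parameterization the variance is n * p * (1 - p) * [1 + (n - 1) * rho], so a rho that shrinks with depth moves the model toward the binomial at high coverage.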
Abstract:
Paracrine motogenic factors, including motility cytokines and extracellular matrix molecules secreted by normal cells, can stimulate metastatic cell invasion. For extracellular matrix molecules, both the intact molecules and their degradation products may exhibit these activities; in some cases, the activities of the degradation products are not shared by the intact molecules. We found that human peritumoral and lung fibroblasts secrete motility-stimulating activity for several recently established human sarcoma cell strains. The motility of lung metastasis-derived human SYN-1 sarcoma cells was preferentially stimulated by human lung and peritumoral fibroblast motility-stimulating factors (FMSFs). FMSFs were nondialyzable, susceptible to trypsin, and sensitive to dithiothreitol. Cycloheximide inhibited accumulation of FMSF activity in conditioned medium; however, addition of cycloheximide to the migration assay did not significantly affect motility-stimulating activity. Experiments with purified hepatocyte growth factor/scatter factor (HGF/SF) and rabbit anti-hHGF, together with RT-PCR analysis of HGF/SF mRNA expression in peritumoral and lung fibroblasts, indicated that FMSF activity was unrelated to HGF/SF. Partial purification of FMSF by gel exclusion chromatography revealed several peaks of activity, suggesting multiple FMSF molecules or complexes.
We purified the fibroblast motility-stimulating factor from human lung fibroblast-conditioned medium (HLF-CM) to apparent homogeneity by sequential heparin affinity chromatography and DEAE anion exchange chromatography. Lysylendopeptidase C digestion of FMSF, followed by reverse-phase HPLC purification and sequencing of the resulting peptides, identified it as an N-terminal fragment of human fibronectin. Purified FMSF stimulated predominantly chemotaxis, but also chemokinesis, of SYN-1 sarcoma cells and was chemotactic for a variety of human sarcoma cells, including fibrosarcoma, leiomyosarcoma, liposarcoma, synovial sarcoma and neurofibrosarcoma cells.
The motility-stimulating activity present in HLF-CM was completely eliminated by either neutralization or immunodepletion with a rabbit anti-human-fibronectin antibody, further confirming that the fibronectin fragment was the FMSF responsible for stimulating the motility of human soft tissue sarcoma cells. Since human soft tissue sarcomas have a distinctive hematogenous metastatic pattern (predominantly to the lung), FMSF may play a role in this process.
Abstract:
Background: Zooplankton play an important role in our oceans, both in biogeochemical cycling and as a food source for commercially important fish larvae. However, difficulties in correctly identifying zooplankton hinder our understanding of their roles in marine ecosystem functioning and can prevent detection of long-term changes in their community structure. The advent of massively parallel Next Generation Sequencing technology allows DNA sequence data to be recovered directly from whole community samples. Here we assess the ability of such sequencing to quantify the richness and diversity of a mixed zooplankton assemblage from a productive monitoring site in the Western English Channel. Methodology/Principal Findings: Replicate WP2 plankton net hauls (200 µm mesh) were taken at the Western Channel Observatory long-term monitoring station L4 in September 2010 and January 2011. These samples were analysed by microscopy and by metagenetic analysis of the 18S nuclear small subunit ribosomal RNA gene using the 454 pyrosequencing platform. Following quality control, a total of 419,042 sequences were obtained for all samples. The sequences clustered into 205 operational taxonomic units (OTUs) using a 97% similarity cut-off. Allocation of taxonomy by comparison with the National Centre for Biotechnology Information database identified 138 OTUs to species level, 11 to genus level and 1 to order level; <2.5% of sequences were classified as unknown. By comparison, a skilled microscopic analyst was able to routinely enumerate only 75 taxonomic groups. Conclusions: The percentages of OTUs assigned to major eukaryotic taxonomic groups broadly align between the metagenetic and morphological analyses, and both are dominated by Copepoda. However, metagenetics reveals a previously hidden taxonomic richness, especially for Copepoda and meroplankton such as Bivalvia, Gastropoda and Polychaeta. It also reveals rare species and parasites.
We conclude that Next Generation Sequencing of 18S amplicons is a powerful tool for estimating diversity and species richness of zooplankton communities.
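The 97% similarity cut-off can be illustrated with greedy, abundance-ordered centroid clustering, the general strategy behind tools such as CD-HIT and UCLUST. This Python sketch is a simplification, not the pipeline used in the study: it assumes the reads are already aligned to equal length and measures identity as the fraction of matching positions, whereas real pipelines compute identity from pairwise alignments. The function names are hypothetical.

```python
from collections import Counter


def identity(a, b):
    """Fraction of identical positions; assumes equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold=0.97):
    """Greedy centroid clustering: the most abundant unique sequences seed OTUs first.

    Returns a dict mapping each centroid to the unique sequences assigned to it.
    """
    abundance = Counter(seqs)
    centroids = []
    members = {}
    for seq, _ in abundance.most_common():
        for c in centroids:
            if identity(seq, c) >= threshold:
                members[c].append(seq)
                break
        else:
            # No existing centroid is similar enough, so this sequence founds a new OTU.
            centroids.append(seq)
            members[seq] = [seq]
    return members


# Usage: an abundant 100-bp sequence, a 1-mismatch variant and a 5-mismatch variant.
reads = ["A" * 100] * 3 + ["C" + "A" * 99, "C" * 5 + "A" * 95]
otus = greedy_otus(reads)
print(len(otus))
```

The one-mismatch variant (99% identity) joins the abundant centroid, while the five-mismatch variant (95% identity) falls below the cut-off and seeds a second OTU.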
Abstract:
Microorganisms play an important role in the transformation of material within the earth's crust. The storage of CO2 could affect the composition of inorganic and organic components in the reservoir, consequently influencing microbial activities. To study the microbially induced processes, together with the geochemical, petrophysical and mineralogical changes occurring during CO2 storage, long-term laboratory experiments under simulated reservoir P-T conditions were carried out. Clean inner core sections, obtained from the reservoir region at the CO2 storage site in Ketzin (Germany) at a depth of about 650 m, were incubated in high-pressure vessels together with sterile synthetic formation brine under in situ P-T conditions of 5.5 MPa and 40°C. A 16S rDNA-based fingerprinting method was used to identify the dominant species in DNA extracts of pristine sandstone samples. Members of the alpha- and beta-subdivisions of Proteobacteria and the Actinobacteria were identified. So far, sequences belonging to facultatively anaerobic, chemoheterotrophic bacteria (Burkholderia fungorum, Agrobacterium tumefaciens), which gain their energy from the oxidation of organic molecules, and to a genus also capable of chemolithoautotrophic growth (Hydrogenophaga) have been identified. During CO2 incubation, minor changes in the microbial community composition were observed. The majority of microbes were able to adapt to the changed conditions. During CO2 exposure, increased concentrations of Ca²⁺, K⁺, Mg²⁺ and SO₄²⁻ were observed. These increases are partly due to equilibration (i) between rock pore water and synthetic brine and (ii) between rock and brine, and are thus independent of CO2 exposure. However, the observed concentrations of Ca²⁺, K⁺ and Mg²⁺ are even higher than in the original reservoir fluid and therefore indicate mineral dissolution due to CO2 exposure.
Abstract:
Bacterial biofilms provide cues for the settlement of marine invertebrates such as coral larvae, and are therefore important for the resilience and recovery of coral reefs. This study aimed to better understand how ocean acidification may affect the community composition and diversity of bacterial biofilms on surfaces under naturally reduced pH conditions. Settlement tiles were deployed at coral reefs in Papua New Guinea along pH gradients created by two CO2 seeps, and upper and lower tile surfaces were sampled 5 and 13 months after deployment. Automated Ribosomal Intergenic Spacer Analysis (ARISA) was used to characterize more than 200 separate bacterial communities, complemented by amplicon sequencing of the bacterial 16S rRNA gene from 16 samples. The bacterial biofilm consisted predominantly of Alpha-, Gamma- and Deltaproteobacteria, as well as Cyanobacteria, Flavobacteriia and Cytophaga, whereas putative settlement-inducing taxa accounted for only a small fraction of the community. Bacterial biofilm composition was heterogeneous, with approximately 25% of operational taxonomic units shared between samples. Among the observed environmental parameters, pH had only a weak effect on community composition (R² ~ 1%) and did not affect community richness or evenness. In contrast, there were strong differences between upper and lower surfaces (which differ in light exposure and grazing intensity). There also appeared to be a strong interaction between bacterial biofilm composition and the macroscopic components of the tile community. Our results suggest that on mature settlement surfaces in situ, pH does not have a strong impact on the composition of bacterial biofilms. Other abiotic and biotic factors, such as light exposure and interactions with other organisms, may be more important in shaping bacterial biofilms than changes in seawater pH.
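The abstract does not state exactly how the "approximately 25% shared" figure was computed. One common way to express the share of OTUs two samples have in common is the Jaccard index on presence/absence, averaged over all sample pairs; the Python sketch below assumes that definition, and the sample dictionaries (OTU name to read count) are made-up toy data.

```python
from itertools import combinations


def present(sample):
    """OTUs with a nonzero read count in one sample (dict: OTU -> count)."""
    return {otu for otu, count in sample.items() if count > 0}


def jaccard(sample_a, sample_b):
    """Shared OTUs divided by OTUs observed in either sample."""
    a, b = present(sample_a), present(sample_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise_jaccard(samples):
    """Average Jaccard index over all unordered pairs of samples."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        raise ValueError("need at least two samples")
    return sum(jaccard(x, y) for x, y in pairs) / len(pairs)


# Usage with three toy samples; OTU3 has zero reads in s1 and so counts as absent there.
s1 = {"OTU1": 10, "OTU2": 3, "OTU3": 0, "OTU4": 1}
s2 = {"OTU1": 4, "OTU3": 2, "OTU5": 7}
s3 = {"OTU1": 2, "OTU2": 5}
print(mean_pairwise_jaccard([s1, s2, s3]))
```

Presence/absence ignores abundance; a study interested in relative abundances would more likely use a weighted dissimilarity such as Bray-Curtis.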
Abstract:
The pufferfish Fugu rubripes has a genome ≈7.5 times smaller than that of mammals but with a similar number of genes. Although conserved synteny has been demonstrated between pufferfish and mammals across some regions of the genome, there is some controversy as to the extent to which Fugu will be a useful model for the human genome, e.g., [Gilley, J., Armes, N. & Fried, M. (1997) Nature (London) 385, 305–306]. We report extensive conservation of synteny between a 1.5-Mb region of human chromosome 11 and <100 kb of the Fugu genome in three overlapping cosmids. Our findings support the idea that the majority of DNA in the region of human chromosome 11p13 is intergenic. Comparative analysis of three unrelated genes with quite different roles, WT1, RCN1, and PAX6, has revealed differences in their structural evolution. Whereas the human WT1 gene can generate 16 protein isoforms via a combination of alternative splicing, RNA editing, and alternative start site usage, our data predict that Fugu WT1 is capable of generating only two isoforms. This raises the question of the extent to which the evolution of WT1 isoforms is related to the evolution of the mammalian genitourinary system. In addition, this region of the Fugu genome shows much greater overall compaction than usual, yet significant noncoding homology is observed at the PAX6 locus, implying that comparative genomics has identified regulatory elements associated with this gene.
Abstract:
A loxP-transposon retrofitting strategy for generating large nested deletions from one end of the insert DNA in bacterial artificial chromosomes and P1 artificial chromosomes was described recently [Chatterjee, P. K. & Coren, J. S. (1997) Nucleic Acids Res. 25, 2205–2212]. In this report, we combine this procedure with direct sequencing of nested-deletion templates, using primers located in the transposon end, to illustrate its value for position-specific single-nucleotide polymorphism (SNP) discovery in chosen regions of large insert clones. A simple ampicillin sensitivity screen was developed to facilitate identification and recovery of deletion clones free of transduced transposon plasmid. This directed approach requires minimal DNA sequencing and no in vitro subclone library generation; positionally oriented SNPs are a consequence of the method. The procedure is used to discover new SNPs as well as to physically map those identified from random subcloned libraries or sequence databases. The deletion templates, positioned SNPs, and markers are also used to orient large insert clones into a contig. Each deletion clone can serve as a ready resource for future functional genomic studies because it carries a mammalian cell-specific antibiotic resistance gene from the transposon. Furthermore, the technique should be especially applicable to the analysis of genomes for which a full genome sequence or radiation hybrid cell lines are unavailable.
Abstract:
Pax proteins are a family of transcription factors with a highly conserved paired domain; many members also contain a paired-type homeodomain and/or an octapeptide. Nine mammalian Pax genes are known and classified into four subgroups: Pax-1/9, Pax-2/5/8, Pax-3/7, and Pax-4/6. Most of these genes are involved in nervous system development. In particular, Pax-6 is a key regulator that controls eye development in vertebrates and Drosophila. Although the Pax-4/6 subgroup seems to be more closely related to Pax-2/5/8 than to Pax-3/7 or Pax-1/9, its evolutionary origin is unknown. We therefore searched for a Pax-6 homolog and related genes in Cnidaria, the lowest animal phylum possessing a nervous system and eyes. A sea nettle (jellyfish) genomic library was constructed, and two pax genes (Pax-A and -B) were isolated and partially sequenced. Surprisingly, unlike most known Pax genes, the paired box in these two genes contains no intron. In addition, the complete cDNA sequences of hydra Pax-A and -B were obtained. Hydra Pax-B contains both the homeodomain and the octapeptide, whereas hydra Pax-A contains neither. DNA binding assays showed that the paired domains of sea nettle Pax-A and -B and hydra Pax-A bound to a Pax-5/6 site and a Pax-5 site, whereas the hydra Pax-B paired domain bound neither. An alignment of all available paired domain sequences revealed two highly conserved regions, which cover the DNA binding contact positions. Phylogenetic analysis showed that Pax-A and especially Pax-B were more closely related to Pax-2/5/8 and Pax-4/6 than to Pax-1/9 or Pax-3/7, and that the Pax genes can be classified into two supergroups: Pax-A/Pax-B/Pax-2/5/8/4/6 and Pax-1/9/3/7. From this analysis and the gene structure, we propose that modern Pax-4/6 and Pax-2/5/8 genes evolved from an ancestral gene similar to cnidarian Pax-B, which has both the homeodomain and the octapeptide.
Abstract:
Multiple-complete-digest mapping is a DNA mapping technique based on complete-restriction-digest fingerprints of a set of clones that provides highly redundant coverage of the mapping target. The maps assembled from these fingerprints order both the clones and the restriction fragments. Maps are coordinated across three enzymes in the examples presented. Starting with yeast artificial chromosome contigs from the 7q31.3 and 7p14 regions of the human genome, we have produced cosmid-based maps spanning more than one million base pairs. Each yeast artificial chromosome is first subcloned into cosmids at 15- to 30-fold redundancy. Complete-digest fragments are electrophoresed on agarose gels, poststained, and imaged on a fluorescent scanner. Aberrant clones that are not representative of the underlying genome are rejected in the map construction process. Almost every restriction fragment is ordered, allowing selection of minimal tiling paths with clone-to-clone overlaps of only a few thousand base pairs. These maps demonstrate the practicality of applying the experimental and software-based steps in multiple-complete-digest mapping to a target of significant size and complexity. We present evidence that the maps are sufficiently accurate to validate both the clones selected for sequencing and the sequence assemblies obtained once these clones have been sequenced by a “shotgun” method.
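The abstract does not spell out the map-assembly algorithm. As a hedged illustration of how complete-digest fingerprints can indicate that two clones overlap, the Python sketch below counts restriction fragments of matching size in two clones using a greedy two-pointer match on sorted sizes. Sizes in base pairs, the 1% relative tolerance and the function name are assumptions; real fingerprint-mapping software uses more careful statistics and, as described above, coordinates several enzymes.

```python
def shared_fragments(frags_a, frags_b, tol=0.01):
    """Count fragments of clone A matched one-to-one to fragments of clone B
    whose sizes agree within the relative tolerance `tol` (greedy matching)."""
    a = sorted(frags_a)
    b = sorted(frags_b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol * max(a[i], b[j]):
            # Sizes agree within tolerance: treat as the same fragment and advance both lists.
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared


# Usage: fragment sizes (bp) from one complete digest of two cosmids.
clone_a = [1200, 3400, 5100, 8000]
clone_b = [3410, 5080, 8000, 9500]
print(shared_fragments(clone_a, clone_b))
```

Three of the four fragments match within 1%, which would suggest that the two cosmids overlap; the unmatched end fragments (1200 bp and 9500 bp) are the kind of information that lets a mapper order the clones.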
Abstract:
An mAb was raised to the C5 phagosomal antigen in Paramecium multimicronucleatum. To determine its function, the cDNA and genomic DNA encoding C5 were cloned. This antigen consisted of 315 amino acid residues with a predicted molecular weight of 36,594, a value similar to that determined by SDS-PAGE. Sequence comparisons uncovered a low but significant homology with a Schizosaccharomyces pombe protein and the C-terminal half of the β-fructofuranosidase protein of Zymomonas mobilis. Lacking an obvious transmembrane domain or a possible signal sequence at the N terminus, C5 was predicted to be a soluble protein, whereas immunofluorescence data showed that it was present on the membranes of vesicles and digestive vacuoles (DVs). In cells that were minimally permeabilized but with intact DVs, C5 was found to be located on the cytosolic surface of the DV membranes. Immunoblotting of proteins from the purified and KCl-washed DVs showed that C5 was tightly bound to the DV membranes. Cryoelectron microscopy also confirmed that C5 was on the cytosolic surface of the discoidal vesicles, acidosomes, and lysosomes, organelles known to fuse with the membranes of the cytopharynx, the DVs of stages I (DV-I) and II (DV-II), respectively. Although C5 was concentrated more on the mature than on the young DV membranes, the striking observation was that the cytopharyngeal membrane that is derived from the discoidal vesicles was almost devoid of C5. Approximately 80% of the C5 was lost from the discoidal vesicle-derived membrane after this membrane fused with the cytopharyngeal membrane. Microinjection of the mAb to C5 greatly inhibited the fusion of the discoidal vesicles with the cytopharyngeal membrane and thus the incorporation of the discoidal vesicle membranes into the DV membranes. Taken together, these results suggest that C5 is a membrane protein that is involved in binding and/or fusion of the discoidal vesicles with the cytopharyngeal membrane that leads to DV formation.
Abstract:
We report automated DNA sequencing in 16-channel microchips. A microchip prefilled with sieving matrix is aligned on a heating plate affixed to a movable platform. Samples are loaded into sample reservoirs by using an eight-tip pipetting device, and the chip is docked with an array of electrodes in the focal plane of a four-color scanning detection system. Under computer control, high voltage is applied to the appropriate reservoirs in a programmed sequence that injects and separates the DNA samples. An integrated four-color confocal fluorescent detector automatically scans all 16 channels. The system routinely yields more than 450 bases in 15 min in all 16 channels. In the best case, using an automated base-calling program, 543 bases were called at an accuracy of >99%. Separations, including automated chip loading and sample injection, are normally completed in less than 18 min. The advantages of DNA sequencing on capillary electrophoresis chips include uniform signal intensity and tolerance of high DNA template concentration. To understand the fundamentals of these unique features, we developed a theoretical treatment of cross-channel chip injection that we call the differential concentration effect. We present experimental evidence consistent with the predictions of the theory.
Abstract:
A de novo sequencing program for proteins is described that uses tandem MS data from electron capture dissociation and collisionally activated dissociation of electrosprayed protein ions. Computer automation is used to convert the fragment ion mass values derived from these spectra into the most probable protein sequence, without distinguishing Leu/Ile. Minimal human input is necessary for the data reduction and interpretation. No extra chemistry is necessary to distinguish N- and C-terminal fragments in the mass spectra, as this is determined from the electron capture dissociation data. With parts-per-million mass accuracy (now available with higher-field Fourier transform MS instruments), the complete sequences of ubiquitin (8.6 kDa) and melittin (2.8 kDa) were predicted correctly by the program. The data available also provided 91% of the cytochrome c (12.4 kDa) sequence (essentially complete except for the tandem MS-resistant region K13–V20 that contains the cyclic heme). Uncorrected mass values from a 6-T instrument still gave 86% of the ubiquitin sequence, although Gln and Lys could not be distinguished. Extensive sequencing of larger proteins should be possible by applying the algorithm to pieces of ≈10-kDa size, such as products of limited proteolysis.
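To make the interpretation step concrete, the Python sketch below reads a sequence from a ladder of fragment masses belonging to one ion series: each mass gap between consecutive fragments is matched against monoisotopic amino acid residue masses. It is a simplification, not the published program: it ignores ion types (c, z, b, y) and terminal offsets, uses an absolute tolerance in daltons rather than ppm, and the ladder values are synthetic. Leu and Ile share one entry because they are isobaric, and a loose tolerance merges Gln (128.05858 Da) and Lys (128.09496 Da), echoing the Gln/Lys ambiguity reported for the uncorrected 6-T data.

```python
# Monoisotopic residue masses in Da; Leu and Ile are isobaric, so one entry covers both.
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L/I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(fragment_masses, tol_da=0.01):
    """Translate mass gaps between consecutive fragments of one ion series into residues.

    Each gap is reported as every residue within `tol_da` (joined with "/"),
    or "?" when no residue mass fits.
    """
    masses = sorted(fragment_masses)
    calls = []
    for lighter, heavier in zip(masses, masses[1:]):
        gap = heavier - lighter
        hits = [aa for aa, m in RESIDUE_MASSES.items() if abs(gap - m) <= tol_da]
        calls.append("/".join(hits) if hits else "?")
    return calls


# Usage: a synthetic ladder built by adding G, K and L to a 200.0 Da starting fragment.
ladder = [200.0, 257.02146, 385.11642, 498.20048]
print(read_ladder(ladder))               # tight tolerance separates Gln from Lys
print(read_ladder(ladder, tol_da=0.05))  # loose tolerance cannot
```

The 0.036 Da Gln/Lys difference is about 280 ppm of the residue mass, which is why high mass accuracy is what resolves it in practice.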