883 results for Genotyping by sequencing
Abstract:
Identification of host factors that interact with pathogens is crucial to an understanding of infectious disease, but direct screening for host mutations to aid in this task is not feasible in mammals. The nematode Caenorhabditis elegans is a genetically tractable alternative for investigating the pathogenic bacterium Pseudomonas aeruginosa. A P. aeruginosa toxin, produced at high cell density under control of the quorum-sensing regulators LasR and RhlR, rapidly and lethally paralyzes C. elegans. Loss-of-function mutations in C. elegans egl-9, a gene required for normal egg laying, confer strong resistance to the paralysis. Thus, activation of EGL-9 or of a pathway that includes it may lead to the paralysis. The molecular identity of egl-9 was determined by transformation rescue and DNA sequencing. A mammalian homologue of EGL-9 is expressed in tissues in which exposure to P. aeruginosa could have clinical effects.
Abstract:
We report here that wild-type Escherichia coli can grow on the chitin disaccharide, N,N′-diacetylchitobiose (GlcNAc)2, as the sole source of carbon. Transposon mutants were isolated that were unable to ferment (GlcNAc)2 but grew normally on the monosaccharide GlcNAc. One such mutant was used to screen a wild-type E. coli genomic cosmid library for restoration of (GlcNAc)2 fermentation. A partial sequence analysis of the isolated fragment mapped the clone to the (previously sequenced) E. coli genome between 39.0 and 39.2 min. The ORFs in this region had previously been assigned to a “cryptic” cellobiose utilization (cel) operon. We report here, however, that functional analysis of the operon, including growth and chemotaxis, reveals that it encodes a set of proteins that are not cryptic, but are induced by (GlcNAc)2 and catabolize the disaccharide. We therefore propose to rename the cel operon the chb (N,N′-diacetylchitobiose) operon, and to reassign the letter designations of its genes according to the functional characterization of the gene products, as follows: celA to chbB, celB to chbC, celC to chbA, celD to chbR, and celF to chbF. Furthermore, sequencing evidence indicates that the operon contains an additional gene of unknown function, to be designated chbG. Thus, the overall gene sequence is to be named chbBCARFG.
Abstract:
The human polyomavirus JC (JCV) causes the central nervous system demyelinating disease progressive multifocal leukoencephalopathy. Previously, we showed that 40% of Caucasians in the United States excrete JCV in the urine as detected by PCR. We have now studied 68 Navaho from New Mexico, 25 Flathead from Montana, and 29 Chamorro from Guam. By using PCR amplification of a fragment of the VP1 gene, JCV DNA was detected in the urine of 45 (66%) Navaho, 14 (56%) Flathead, and 20 (69%) Chamorro. Genotyping of viral DNAs in these cohorts by cycle sequencing showed predominantly type 2 (Asian), rather than type 1 (European). Type 1 is the major type in the United States and Hungary. Type 2 can be further subdivided into 2A, 2B, and 2C. Type 2A is found in China and Japan. Type 2B is a subtype related to the East Asian type, and is now found in Europe and the United States. The large majority (56–89%) of strains excreted by Native Americans and Pacific Islanders were the type 2A subtype, consistent with the origin of these strains in Asia. These findings indicate that JCV infection of Native Americans predates contact with Europeans, and likely predates migration of Amerind ancestors across the Bering land bridge around 12,000–30,000 years ago. If JCV had already differentiated into stable modern genotypes and subtypes prior to first settlement, the origin of JCV in humans may date from 50,000 to 100,000 years ago or more. We conclude that JCV may have coevolved with the human species, and that it provides a convenient marker for human migrations in both prehistoric and modern times.
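The detection rates quoted above follow directly from the reported counts. A minimal sketch that recomputes them, assuming whole-percent rounding; the variable and function names are illustrative and not taken from the study:

```python
# Counts reported in the abstract: (JCV-positive urine samples, individuals sampled)
cohorts = {
    "Navaho": (45, 68),
    "Flathead": (14, 25),
    "Chamorro": (20, 29),
}


def detection_rate(positive, total):
    """Return the share of positive samples as a whole-number percentage."""
    return round(100 * positive / total)


rates = {name: detection_rate(pos, n) for name, (pos, n) in cohorts.items()}
for name, rate in rates.items():
    print(f"{name}: {rate}%")
```

The recomputed values agree with the 66%, 56%, and 69% stated in the abstract.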
Abstract:
An mAb was raised to the C5 phagosomal antigen in Paramecium multimicronucleatum. To determine its function, the cDNA and genomic DNA encoding C5 were cloned. This antigen consisted of 315 amino acid residues with a predicted molecular weight of 36,594, a value similar to that determined by SDS-PAGE. Sequence comparisons uncovered a low but significant homology with a Schizosaccharomyces pombe protein and the C-terminal half of the β-fructofuranosidase protein of Zymomonas mobilis. Lacking an obvious transmembrane domain or a possible signal sequence at the N terminus, C5 was predicted to be a soluble protein, whereas immunofluorescence data showed that it was present on the membranes of vesicles and digestive vacuoles (DVs). In cells that were minimally permeabilized but with intact DVs, C5 was found to be located on the cytosolic surface of the DV membranes. Immunoblotting of proteins from the purified and KCl-washed DVs showed that C5 was tightly bound to the DV membranes. Cryoelectron microscopy also confirmed that C5 was on the cytosolic surface of the discoidal vesicles, acidosomes, and lysosomes, organelles known to fuse with the membranes of the cytopharynx, the DVs of stages I (DV-I) and II (DV-II), respectively. Although C5 was concentrated more on the mature than on the young DV membranes, the striking observation was that the cytopharyngeal membrane that is derived from the discoidal vesicles was almost devoid of C5. Approximately 80% of the C5 was lost from the discoidal vesicle-derived membrane after this membrane fused with the cytopharyngeal membrane. Microinjection of the mAb to C5 greatly inhibited the fusion of the discoidal vesicles with the cytopharyngeal membrane and thus the incorporation of the discoidal vesicle membranes into the DV membranes. Taken together, these results suggest that C5 is a membrane protein that is involved in binding and/or fusion of the discoidal vesicles with the cytopharyngeal membrane that leads to DV formation.
Abstract:
Current evidence indicates that methylation of cytosine in mammalian DNA is restricted to both strands of the symmetrical sequence CpG, although there have been sporadic reports that sequences other than CpG may also be methylated. We have used a dual-labeling nearest neighbor technique and bisulphite genomic sequencing methods to investigate the nearest neighbors of 5-methylcytosine residues in mammalian DNA. We find that embryonic stem cells, but not somatic tissues, have significant cytosine-5 methylation at CpA and, to a lesser extent, at CpT. As the expression of the de novo methyltransferase Dnmt3a correlates well with the presence of non-CpG methylation, we asked whether Dnmt3a might be responsible for this modification. Analysis of genomic methylation in transgenic Drosophila expressing Dnmt3a reveals that Dnmt3a is predominantly a CpG methylase but also is able to induce methylation at CpA and at CpT.
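The nearest-neighbor question behind bisulphite sequencing can be illustrated in a few lines. In bisulphite-treated DNA, unmethylated cytosines read as T while methylated cytosines still read as C, so a C that survives in the read marks a methylated position, and the next reference base gives its context (CpG, CpA, CpT, or CpC). The sketch below assumes a read already aligned base-for-base to the same strand of the reference; the function name and the toy sequences are hypothetical and do not come from the study.

```python
from collections import Counter


def methylated_contexts(reference, bisulfite_read):
    """Count methylated cytosines by their 3' nearest neighbor.

    A position is called methylated when the reference has C and the
    bisulphite-converted read still shows C (unconverted).
    """
    if len(reference) != len(bisulfite_read):
        raise ValueError("reference and read must be aligned and of equal length")
    counts = Counter()
    # The final base has no 3' neighbor within the sequence, so it is skipped.
    for i in range(len(reference) - 1):
        if reference[i] == "C" and bisulfite_read[i] == "C":
            counts["Cp" + reference[i + 1]] += 1
    return counts


# Toy example: cytosines at positions 1 (CpG) and 4 (CpA) stay C in the read;
# cytosines at positions 7 (CpT) and 10 (CpG) are converted to T.
reference = "ACGTCAGCTTCG"
read = "ACGTCAGTTTTG"
contexts = methylated_contexts(reference, read)
print(dict(contexts))
```

A real analysis would also need to handle the opposite strand, incomplete conversion, and sequencing errors, none of which this sketch attempts.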
Abstract:
We report automated DNA sequencing in 16-channel microchips. A microchip prefilled with sieving matrix is aligned on a heating plate affixed to a movable platform. Samples are loaded into sample reservoirs by using an eight-tip pipetting device, and the chip is docked with an array of electrodes in the focal plane of a four-color scanning detection system. Under computer control, high voltage is applied to the appropriate reservoirs in a programmed sequence that injects and separates the DNA samples. An integrated four-color confocal fluorescent detector automatically scans all 16 channels. The system routinely yields more than 450 bases in 15 min in all 16 channels. In the best case using an automated base-calling program, 543 bases have been called at an accuracy of >99%. Separations, including automated chip loading and sample injection, normally are completed in less than 18 min. The advantages of DNA sequencing on capillary electrophoresis chips include uniform signal intensity and tolerance of high DNA template concentration. To understand the fundamentals of these unique features we developed a theoretical treatment of cross-channel chip injection that we call the differential concentration effect. We present experimental evidence consistent with the predictions of the theory.
Abstract:
The function of many of the uncharacterized open reading frames discovered by genomic sequencing can be determined at the level of expressed gene products, the proteome. However, identifying the cognate gene from minute amounts of protein has been one of the major problems in molecular biology. Using yeast as an example, we demonstrate here that mass spectrometric protein identification is a general solution to this problem given a completely sequenced genome. As a first screen, our strategy uses automated laser desorption ionization mass spectrometry of the peptide mixtures produced by in-gel tryptic digestion of a protein. Up to 90% of proteins are identified by searching sequence data bases by lists of peptide masses obtained with high accuracy. The remaining proteins are identified by partially sequencing several peptides of the unseparated mixture by nanoelectrospray tandem mass spectrometry followed by data base searching with multiple peptide sequence tags. In blind trials, the method led to unambiguous identification in all cases. In the largest individual protein identification project to date, a total of 150 gel spots—many of them at subpicomole amounts—were successfully analyzed, greatly enlarging a yeast two-dimensional gel data base. More than 32 proteins were novel and matched to previously uncharacterized open reading frames in the yeast genome. This study establishes that mass spectrometry provides the required throughput, the certainty of identification, and the general applicability to serve as the method of choice to connect genome and proteome.
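The first-pass screen described above, searching a sequence database with a list of peptide masses, can be sketched as follows. This is an illustration under simplifying assumptions, not the authors' software: trypsin is modeled as cleaving after K or R unless the next residue is P, masses are neutral monoisotopic values with no modifications or missed cleavages, and the two-entry database and its ORF names are hypothetical.

```python
WATER = 18.01056  # monoisotopic mass of H2O added to each free peptide

# Standard monoisotopic residue masses (Da)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def tryptic_digest(protein):
    """Split a protein after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed, protein, tol=0.05):
    """Number of observed masses within tol Da of a theoretical peptide mass."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(protein)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)


def best_match(observed, database, tol=0.05):
    """Name of the database entry explaining the most observed masses."""
    return max(database, key=lambda name: count_matches(observed, database[name], tol))


# Hypothetical database and a simulated mass list from ORF_A
database = {"ORF_A": "MKWVTFRPLLK", "ORF_B": "GASPVTCLINK"}
observed = [peptide_mass("MK"), peptide_mass("WVTFRPLLK")]
print(best_match(observed, database))
```

Counting matches is the simplest possible score; production search engines weight matches by mass accuracy and protein size, which is outside the scope of this sketch.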
Abstract:
Heparin- and heparan sulfate-like glycosaminoglycans (HLGAGs) represent an important class of molecules that interact with and modulate the activity of growth factors, enzymes, and morphogens. Of the many biological functions of this class of molecules, one of the most important is its interaction with antithrombin III (AT-III). AT-III binding to a specific heparin pentasaccharide sequence, containing an unusual 3-O sulfate on an N-sulfated, 6-O sulfated glucosamine, increases 1,000-fold AT-III's ability to inhibit specific proteases in the coagulation cascade. In this manner, HLGAGs play an important biological and pharmacological role in the modulation of blood clotting. Recently, a sequencing methodology was developed to advance the study of structure-function relationships of this important class of molecules. This methodology combines a property-encoded nomenclature scheme to handle the large information content (properties) of HLGAGs, with matrix-assisted laser desorption ionization MS and enzymatic and chemical degradation as experimental constraints to rapidly sequence picomole quantities of HLGAG oligosaccharides. Using the above property-encoded nomenclature-matrix-assisted laser desorption ionization approach, we found that the sequence of the decasaccharide used in this study is ΔU2SHNS,6SI2SHNS,6SI2SHNS,6SIHNAc,6SGHNS,3S,6S (±DDD4–7). We confirmed our results by using integral glycan sequencing and one-dimensional proton NMR. Furthermore, we show that this approach is flexible and is able to derive sequence information on an oligosaccharide mixture. Thus, this methodology will both make possible the analysis of other unusual sequences in HLGAGs with important biological activity and provide the basis for the structural analysis of this pharmacologically important group of heparin/heparan sulfates.
Abstract:
The proliferation of various tumors is inhibited by the antagonists of growth hormone-releasing hormone (GHRH) in vitro and in vivo, but the receptors mediating the effects of GHRH antagonists have not been identified so far. Using an approach based on PCR, we detected two major splice variants (SVs) of mRNA for human GHRH receptor (GHRH-R) in human cancer cell lines, including LNCaP prostatic, MiaPaCa-2 pancreatic, MDA-MB-468 breast, OV-1063 ovarian, and H-69 small-cell lung carcinomas. In addition, high-affinity, low-capacity binding sites for GHRH antagonists were found on the membranes of cancer cell lines such as MiaPaCa-2 that are negative for the vasoactive intestinal peptide/pituitary adenylate cyclase-activating polypeptide receptor (VPAC-R) or lines such as LNCaP that are positive for VPAC-R. Sequence analysis of cDNAs revealed that the first three exons in SV1 and SV2 are replaced by a fragment of retained intron 3 having a new putative in-frame start codon. The rest of the coding region of SV1 is identical to that of human pituitary GHRH-R, whereas in SV2 exon 7 is spliced out, resulting in a 1-nt upstream frameshift, which leads to a premature stop codon in exon 8. The intronic sequence may encode a distinct 25-aa fragment of the N-terminal extracellular domain, which could serve as a proposed signal peptide. The continuation of the deduced protein sequence coded by exons 4–13 in SV1 is identical to that of pituitary GHRH-R. SV2 may encode a GHRH-R isoform truncated after the second transmembrane domain. Thus SVs of GHRH-Rs have now been identified in human extrapituitary cells. The findings support the view that distinct receptors are expressed on human cancer cells, which may mediate the antiproliferative effect of GHRH antagonists.
Abstract:
Syntax denotes a rule system that allows one to predict the sequencing of communication signals. Despite its significance for both human speech processing and animal acoustic communication, the representation of syntactic structure in the mammalian brain has not been studied electrophysiologically at the single-unit level. In the search for a neuronal correlate for syntax, we used playback of natural and temporally destructured complex species-specific communication calls—so-called composites—while recording extracellularly from neurons in a physiologically well defined area (the FM–FM area) of the mustached bat’s auditory cortex. Even though this area is known to be involved in the processing of target distance information for echolocation, we found that units in the FM–FM area were highly responsive to composites. The finding that neuronal responses were strongly affected by manipulation in the time domain of the natural composite structure lends support to the hypothesis that syntax processing in mammals occurs at least at the level of the nonprimary auditory cortex.
Abstract:
Transcription factors control eukaryotic polymerase II function by influencing the recruitment of multiprotein complexes to promoters and their subsequent integrated function. The complexity of the functional ‘transcriptosome’ has necessitated biochemical fractionation and subsequent protein sequencing on a grand scale to identify individual components. As a consequence, much is now known of the basal transcription complex. In contrast, less is known about the complexes formed at distal promoter elements. The c-fos SRE, for example, is known to bind Serum Response Factor (SRF) and ternary complex factors such as Elk-1. Their interaction with other factors at the SRE is implied but, to date, none have been identified. Here we describe the use of mass-spectrometric sequencing to identify six proteins, SRF, Elk-1 and four novel proteins, captured on SRE duplexes linked to magnetic beads. This approach is generally applicable to the characterisation of nucleic acid-bound protein complexes and the post-translational modification of their components.
Abstract:
Linkage and association analyses were performed to identify loci affecting disease susceptibility by scoring previously characterized sequence variations such as microsatellites and single nucleotide polymorphisms. Lack of markers in regions of interest, as well as difficulty in adapting various methods to high-throughput settings, often limits the effectiveness of the analyses. We have adapted the Escherichia coli mismatch detection system, employing the factors MutS, MutL and MutH, for use in PCR-based, automated, high-throughput genotyping and mutation detection of genomic DNA. Optimal sensitivity and signal-to-noise ratios were obtained in a straightforward fashion because the detection reaction proved to be principally dependent upon monovalent cation concentration and MutL concentration. Quantitative relationships of the optimal values of these parameters with length of the DNA test fragment were demonstrated, in support of the translocation model for the mechanism of action of these enzymes, rather than the molecular switch model. Thus, rapid, sequence-independent optimization was possible for each new genomic target region. Other factors potentially limiting the flexibility of mismatch scanning, such as positioning of dam recognition sites within the target fragment, have also been investigated. We developed several strategies, which can be easily adapted to automation, for limiting the analysis to intersample heteroduplexes. Thus, the principal barriers to the use of this methodology, which we have designated PCR candidate region mismatch scanning, in cost-effective, high-throughput settings have been removed.
Abstract:
The release of vast quantities of DNA sequence data by large-scale genome and expressed sequence tag (EST) projects underlines the need for efficient and inexpensive ways to link sequence databases with temporal and spatial expression profiles. Here we demonstrate the power of linking cDNA sequence data (including EST sequences) with transcript profiles revealed by cDNA-AFLP, a highly reproducible differential display method based on restriction enzyme digests and selective amplification under high stringency conditions. We have developed a computer program (GenEST) that predicts the sizes of virtual transcript-derived fragments (TDFs) of in silico-digested cDNA sequences retrieved from databases. The vast majority of the resulting virtual TDFs could be traced back among the thousands of TDFs displayed on cDNA-AFLP gels. Sequencing of the corresponding bands excised from cDNA-AFLP gels revealed no inconsistencies. As a consequence, cDNA sequence databases can be screened very efficiently to identify genes with relevant expression profiles. Conversely, it is possible to move from cDNA-AFLP gels to sequences in the databases. Using the restriction enzyme recognition sites, the primer extensions and the estimated TDF size as identifiers, the DNA sequence(s) corresponding to a TDF with an interesting expression pattern can be identified. In this paper we show examples in both directions by analyzing the plant parasitic nematode Globodera rostochiensis. Various novel pathogenicity factors were identified by combining ESTs from the infective stage juveniles with expression profiles of ∼4000 genes in five developmental stages produced by cDNA-AFLP.
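The idea of predicting virtual TDF sizes by in silico digestion can be illustrated with a short sketch. This is not GenEST itself, whose internals the abstract does not describe. It assumes a double digest in which only fragments flanked by one site of each enzyme are kept, uses EcoRI (G^AATTC) and MseI (T^TAA) as an example enzyme pair that the abstract does not name, and treats the adapter length added to each fragment as a free parameter (`adapter_total`, hypothetical).

```python
def cut_positions(seq, site, offset):
    """Return cut coordinates for every occurrence of a recognition site.

    offset is the distance from the start of the site to the cut point.
    """
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i + offset)
        i = seq.find(site, i + 1)
    return positions


def virtual_tdfs(cdna, enzyme_a, enzyme_b, adapter_total=0):
    """List (start, end, predicted_size) for fragments with one end per enzyme.

    enzyme_a and enzyme_b are (recognition_site, cut_offset) tuples.
    adapter_total is the assumed combined length contributed by both adapters.
    """
    cuts = sorted(
        [(p, "A") for p in cut_positions(cdna, *enzyme_a)]
        + [(p, "B") for p in cut_positions(cdna, *enzyme_b)]
    )
    tdfs = []
    # Adjacent cuts delimit one fragment; keep it only if the ends differ.
    for (left, left_enz), (right, right_enz) in zip(cuts, cuts[1:]):
        if left_enz != right_enz:
            tdfs.append((left, right, right - left + adapter_total))
    return tdfs


ECORI = ("GAATTC", 1)  # cuts G^AATTC
MSEI = ("TTAA", 1)     # cuts T^TAA

# Toy cDNA: one EcoRI site followed by two MseI sites
cdna = "CC" + "GAATTC" + "A" * 10 + "TTAA" + "G" * 5 + "TTAA" + "CC"
fragments = virtual_tdfs(cdna, ECORI, MSEI)
print(fragments)
```

Matching a predicted size to a gel band would additionally require the selective nucleotides of the primers and a tolerance for size estimation, as the abstract indicates when it lists the identifiers used.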
Abstract:
Thioredoxins are 12-kDa proteins functional in the regulation of cellular processes throughout the animal, plant, and microbial kingdoms. Growing evidence with seeds suggests that an h-type of thioredoxin, reduced by NADPH via NADP-thioredoxin reductase, reduces disulfide bonds of target proteins and thereby acts as a wakeup call in germination. A better understanding of the role of thioredoxin in seeds as well as other systems could be achieved if more were known about the target proteins. To this end, we have devised a strategy for the comprehensive identification of proteins targeted by thioredoxin. Tissue extracts incubated with reduced thioredoxin are treated with a fluorescent probe (monobromobimane) to label sulfhydryl groups. The newly labeled proteins are isolated by conventional two-dimensional electrophoresis: (i) nonreducing/reducing or (ii) isoelectric focusing/reducing SDS/PAGE. The isolated proteins are identified by amino acid sequencing. Each electrophoresis system offers an advantage: the first method reveals the specificity of thioredoxin in the reduction of intramolecular vs. intermolecular disulfide bonds, whereas the second method improves the separation of the labeled proteins. By application of both methods to peanut seed extracts, we isolated at least 20 thioredoxin targets and identified 5—three allergens (Ara h2, Ara h3, and Ara h6) and two proteins not known to occur in peanut (desiccation-related and seed maturation protein). These findings open the door to the identification of proteins targeted by thioredoxin in a wide range of systems, thereby enhancing our understanding of its function and extending its technological and medical applications.
Abstract:
The recent sequencing of several complete genomes has made it possible to track the evolution of large gene families by their genomic structure. Following the large-scale association of exons encoding domains with well defined functions in invertebrates could be useful in predicting the function of complex multidomain proteins in mammals produced by accretion of domains. With this objective, we have determined the genomic structure of the 14 genes in invertebrates and vertebrates that contain rel domains. The sequence encoding the rel domain is defined by intronic boundaries and has been recombined with at least three structurally and functionally distinct genomic sequences to generate coding sequences for: (i) the rel/Dorsal/NFκB proteins that are retained in the cytoplasm by IκB-like proteins; (ii) the NFATc proteins that sense calcium signals and undergo cytoplasmic-to-nuclear translocation in response to dephosphorylation by calcineurin; and (iii) the TonEBP tonicity-responsive proteins. Remarkably, a single exon in each NFATc family member encodes the entire Ca2+/calcineurin sensing region, including nuclear import/export, calcineurin-binding, and substrate regions. The Rel/Dorsal proteins and the TonEBP proteins are present in Drosophila but not Caenorhabditis elegans. On the other hand, the calcium-responsive NFATc proteins are present only in vertebrates, suggesting that the NFATc family is dedicated to functions specific to vertebrates such as a recombinational immune response, cardiovascular development, and vertebrate-specific aspects of the development and function of the nervous system.