51 results for Gene Set Enrichment
Abstract:
Clusters of orthologous groups [COGs; Tatusov, R. L., Koonin, E. V. & Lipman, D. J. (1997) Science 278, 631–637] were identified for a set of 13 completely sequenced herpesviruses. Each COG represented a family of gene products conserved across several herpes genomes. These families were defined without using an arbitrary threshold criterion based on sequence similarity. The COG technique was modified so that variable stringency in COG construction was possible. High stringencies identify a core set of highly conserved genes. Varying COG stringency reveals differences in the degree of conservation between functional classes of genes. The COG data were used to construct whole-genome phylogenetic trees based on gene content. These trees agree well with trees based on other methods and are robust when tested by bootstrap analysis. The COG data also were used to construct a reciprocal tree that clustered genes with similar phylogenetic profiles. This clustering may give clues to genes with related functions or with related histories of acquisition and loss during herpesvirus evolution.
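The gene-content comparison described above can be illustrated with a small sketch. It assumes COG membership is already known and uses the Jaccard distance between presence/absence profiles; both the distance measure and the toy names (`COG_A`, `g1`, and so on) are illustrative assumptions, not the measure or data used in the study. The same function is applied twice: once to genomes (gene content) and once to COGs (phylogenetic profiles), mirroring the whole-genome tree and the reciprocal tree.

```python
# Hypothetical toy data: each COG maps to the set of genomes that contain a member.
cog_members = {
    "COG_A": {"g1", "g2", "g3", "g4"},
    "COG_B": {"g1", "g2"},
    "COG_C": {"g3", "g4"},
    "COG_D": {"g1", "g2", "g3"},
}


def jaccard_distance(a, b):
    """1 - |intersection| / |union|; an assumed distance, chosen for simplicity."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def genome_content(members):
    """Invert COG -> genomes into genome -> set of COGs it contains."""
    content = {}
    for cog, genomes in members.items():
        for genome in genomes:
            content.setdefault(genome, set()).add(cog)
    return content


def pairwise_distances(profiles):
    """Distance for every unordered pair of profile names, keyed by sorted name pair."""
    names = sorted(profiles)
    return {
        (a, b): jaccard_distance(profiles[a], profiles[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Genomes compared by shared gene content (input for a gene-content tree).
genome_dist = pairwise_distances(genome_content(cog_members))

# COGs compared by the genomes they occur in (input for a reciprocal tree).
cog_dist = pairwise_distances(cog_members)
```

Either distance table could then be passed to any standard tree-building or hierarchical clustering method; that step is omitted here.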
Abstract:
We present an approach to map large numbers of Tc1 transposon insertions in the genome of Caenorhabditis elegans. Strains have been described that contain up to 500 polymorphic Tc1 insertions. From these we have cloned and shotgun sequenced over 2000 Tc1 flanks, resulting in an estimated set of 400 or more distinct Tc1 insertion alleles. Alignment of these sequences revealed a weak Tc1 insertion site consensus sequence that was symmetric around the invariant TA target site and reads CAYATATRTG. The Tc1 flanking sequences were compared with 40 Mbp of a C. elegans genome sequence. We found 151 insertions within the sequenced area, a density of ≈1 Tc1 insertion in every 265 kb. As the rest of the C. elegans genome sequence is obtained, remaining Tc1 alleles will fall into place. These mapped Tc1 insertions can serve two functions: (i) insertions in or near genes can be used to isolate deletion derivatives that have that gene mutated; and (ii) they represent a dense collection of polymorphic sequence-tagged sites. We demonstrate a strategy to use these Tc1 sequence-tagged sites in fine-mapping mutations.
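Two details of this abstract can be checked with a short sketch: the symmetry of the consensus CAYATATRTG around the central TA, and the reported insertion density. The code below expands the IUPAC codes (Y = C or T, R = A or G), confirms that the consensus equals its own reverse complement, and recomputes the density from the 40 Mbp and 151 insertions given in the text. The helper names and the test sequence are hypothetical. Because the abstract calls the consensus weak, an exact-match scan like this one would miss many real insertion sites.

```python
import re

# IUPAC nucleotide codes that appear in the consensus.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "[CT]", "R": "[AG]"}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "Y": "R", "R": "Y"}

CONSENSUS = "CAYATATRTG"  # from the abstract; the central TA is the target site


def reverse_complement(seq):
    """Reverse-complement a sequence, mapping ambiguity codes Y <-> R."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_sites(sequence, consensus=CONSENSUS):
    """Return 0-based start positions of all exact consensus matches.

    A lookahead is used so that overlapping matches are also reported.
    """
    body = "".join(IUPAC[base] for base in consensus)
    pattern = re.compile("(?=" + body + ")")
    return [match.start() for match in pattern.finditer(sequence)]


# A site that is symmetric around TA reads the same on both strands.
is_symmetric = reverse_complement(CONSENSUS) == CONSENSUS

# Density check: 151 insertions in 40 Mbp of sequence.
kb_per_insertion = 40_000_000 / 151 / 1000

# Hypothetical sequence containing one perfect match starting at position 2.
sites = find_sites("GGCACATATGTGAA")
```

The density works out to roughly 265 kb per insertion, consistent with the figure quoted in the abstract.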
Abstract:
The molecular mechanisms of pulmonary fibrosis are poorly understood. We have used oligonucleotide arrays to analyze the gene expression programs that underlie pulmonary fibrosis in response to bleomycin, a drug that causes lung inflammation and fibrosis, in two strains of susceptible mice (129 and C57BL/6). We then compared the gene expression patterns in these mice with 129 mice carrying a null mutation in the epithelial-restricted integrin β6 subunit (β6−/−), which develop inflammation but are protected from pulmonary fibrosis. Cluster analysis identified two distinct groups of genes involved in the inflammatory and fibrotic responses. Analysis of gene expression at multiple time points after bleomycin administration revealed sequential induction of subsets of genes that characterize each response. The availability of this comprehensive data set should accelerate the development of more effective strategies for intervention at the various stages in the development of fibrotic diseases of the lungs and other organs.
Abstract:
The role of the male gamete—the sperm cell—in the process of fertilization is to recognize, adhere to, and fuse with the female gamete. These highly specialized functions are expected to be controlled by activation of a unique set of genes. However, male gametic cells traditionally have been regarded as transcriptionally quiescent because of highly condensed chromatin and a very reduced amount of cytoplasm. Here, we provide evidence for male gamete-specific gene expression in flowering plants. We identified and characterized a gene, LGC1, which was shown to be expressed exclusively in the male gametic cells. The gene product of LGC1 was localized at the surface of male gametic cells, suggesting a possible role in sperm–egg interactions. These findings represent an important step toward defining the molecular mechanisms of male gamete development and the cellular processes involved in fertilization of flowering plants.
Abstract:
Analysis of previously published sets of DNA microarray gene expression data by singular value decomposition has uncovered underlying patterns or “characteristic modes” in their temporal profiles. These patterns contribute unequally to the structure of the expression profiles. Moreover, the essential features of a given set of expression profiles are captured using just a small number of characteristic modes. This leads to the striking conclusion that the transcriptional response of a genome is orchestrated in a few fundamental patterns of gene expression change. These patterns are both simple and robust, dominating the alterations in expression of genes throughout the genome. Moreover, the characteristic modes of gene expression change in response to environmental perturbations are similar in such distant organisms as yeast and human cells. This analysis reveals simple regularities in the seemingly complex transcriptional transitions of diverse cells to new states, and these provide insights into the operation of the underlying genetic networks.
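The idea of a "characteristic mode" can be sketched without a numerical library. The example below runs power iteration on a toy genes-by-time-points matrix to recover the leading right singular vector (the dominant temporal pattern) and reports the fraction of the total squared signal it captures. The data are invented: three genes follow scaled copies of one pattern and a fourth adds a small deviation. A full analysis would use a complete singular value decomposition; this only recovers the first mode.

```python
import math


def dominant_mode(matrix, iterations=200):
    """Power iteration on X^T X.

    Returns (largest singular value, unit-length temporal mode).
    Assumes the starting vector is not orthogonal to the dominant mode.
    """
    n_cols = len(matrix[0])
    v = [1.0] * n_cols
    for _ in range(iterations):
        # u = X v: projection of each gene's profile onto the current mode.
        u = [sum(x * w for x, w in zip(row, v)) for row in matrix]
        # v_new = X^T u: updated temporal pattern.
        v_new = [sum(row[j] * u_i for row, u_i in zip(matrix, u))
                 for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in v_new))
        if norm == 0.0:
            raise ValueError("matrix has no signal along the starting vector")
        v = [x / norm for x in v_new]
    u = [sum(x * w for x, w in zip(row, v)) for row in matrix]
    sigma = math.sqrt(sum(x * x for x in u))
    return sigma, v


def fraction_captured(matrix, sigma):
    """Share of the total squared signal explained by one mode."""
    total = sum(x * x for row in matrix for x in row)
    return sigma * sigma / total


# Toy expression matrix (rows: genes, columns: time points); values are invented.
pattern = [1.0, 2.0, 3.0, 2.0]
expression = [
    [2.0 * p for p in pattern],    # strongly induced gene
    [-1.0 * p for p in pattern],   # repressed gene, same temporal shape
    [0.5 * p for p in pattern],    # weakly induced gene
    [0.1, -0.1, 0.0, 0.1],         # small deviation from the shared pattern
]

sigma, mode = dominant_mode(expression)
fraction = fraction_captured(expression, sigma)
```

In this toy case a single mode accounts for nearly all of the signal, which is the kind of outcome the abstract reports for real expression data with a small number of modes.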
Abstract:
Analyses of complete genomes indicate that a massive prokaryotic gene transfer (or transfers) preceded the formation of the eukaryotic cell. In comparisons of the entire set of Methanococcus jannaschii genes with their orthologs from Escherichia coli, Synechocystis 6803, and the yeast Saccharomyces cerevisiae, it is shown that prokaryotic genomes consist of two different groups of genes. The deeper, diverging informational lineage codes for genes which function in translation, transcription, and replication, and also includes GTPases, vacuolar ATPase homologs, and most tRNA synthetases. The more recently diverging operational lineage codes for amino acid synthesis, the biosynthesis of cofactors, the cell envelope, energy metabolism, intermediary metabolism, fatty acid and phospholipid biosynthesis, nucleotide biosynthesis, and regulatory functions. In eukaryotes, the informational genes are most closely related to those of Methanococcus, whereas the majority of operational genes are most closely related to those of Escherichia, but some are closest to Methanococcus or to Synechocystis.
Abstract:
5′-End fragments of two genes encoding plastid-localized acetyl-CoA carboxylase (ACCase; EC 6.4.1.2) of wheat (Triticum aestivum) were cloned and sequenced. The sequences of the two genes, Acc-1,1 and Acc-1,2, are 89% identical. Their exon sequences are 98% identical. The amino acid sequence of the biotin carboxylase domain encoded by Acc-1,1 and Acc-1,2 is 93% identical with the maize plastid ACCase but only 80–84% identical with the cytosolic ACCases from other plants and from wheat. Four overlapping fragments of cDNA covering the entire coding region were cloned by PCR and sequenced. The wheat plastid ACCase ORF contains 2,311 amino acids with a predicted molecular mass of 255 kDa. A putative transit peptide is present at the N terminus. Comparison of the genomic and cDNA sequences revealed introns at conserved sites found in the genes of other plant multifunctional ACCases, including two introns absent from the wheat cytosolic ACCase genes. Transcription start sites of the plastid ACCase genes were estimated from the longest cDNA clones obtained by 5′-RACE (rapid amplification of cDNA ends). The untranslated leader sequence encoded by the Acc-1 genes is at least 130–170 nucleotides long and is interrupted by an intron. Southern analysis indicates the presence of only one copy of the gene in each ancestral chromosome set. The gene maps near the telomere on the short arm of chromosomes 2A, 2B, and 2D. Identification of three different cDNAs, two corresponding to genes Acc-1,1 and Acc-1,2, indicates that all three genes are transcriptionally active.
Abstract:
Expressed sequence tags (ESTs) are randomly sequenced cDNA clones. Currently, nearly 3 million human and 2 million mouse ESTs provide valuable resources that enable researchers to investigate the products of gene expression. The EST databases have proven to be useful tools for detecting homologous genes, for exon mapping, revealing differential splicing, etc. With the increasing availability of large amounts of poorly characterised eukaryotic (notably human) genomic sequence, ESTs have now become a vital tool for gene identification, sometimes yielding the only unambiguous evidence for the existence of a gene expression product. However, BLAST-based Web servers available to the general user have not kept pace with these developments and do not provide appropriate tools for querying EST databases with large highly spliced genes, often spanning 50 000–100 000 bases or more. Here we describe Gene2EST (http://woody.embl-heidelberg.de/gene2est/), a server that brings together a set of tools enabling efficient retrieval of ESTs matching large DNA queries and their subsequent analysis. RepeatMasker is used to mask dispersed repetitive sequences (such as Alu elements) in the query, BLAST2 for searching EST databases and Artemis for graphical display of the findings. Gene2EST combines these components into a Web resource targeted at the researcher who wishes to study one or a few genes to a high level of detail.
Abstract:
The transformation-associated recombination (TAR) cloning technique allows selective and accurate isolation of chromosomal regions and genes from complex genomes. The technique is based on in vivo recombination between genomic DNA and a linearized vector containing homologous sequences, or hooks, to the gene of interest. The recombination occurs during transformation of yeast spheroplasts that results in the generation of a yeast artificial chromosome (YAC) containing the gene of interest. To further enhance and refine the TAR cloning technology, we determined the minimal size of a specific hook required for gene isolation utilizing the Tg.AC mouse transgene as a targeted region. For this purpose a set of vectors containing a B1 repeat hook and a Tg.AC-specific hook of variable sizes (from 20 to 800 bp) was constructed and checked for efficiency of transgene isolation by a radial TAR cloning. When vectors with a specific hook that was ≥60 bp were utilized, ∼2% of transformants contained circular YACs with the Tg.AC transgene sequences. Efficiency of cloning dramatically decreased when the TAR vector contained a hook of 40 bp or less. Thus, the minimal length of a unique sequence required for gene isolation by TAR is ∼60 bp. No transgene-positive YAC clones were detected when an ARS element was incorporated into a vector, demonstrating that the absence of a yeast origin of replication in a vector is a prerequisite for efficient gene isolation by TAR cloning.
Abstract:
While genome sequencing projects are advancing rapidly, EST sequencing and analysis remains a primary research tool for the identification and categorization of gene sequences in a wide variety of species and an important resource for annotation of genomic sequence. The TIGR Gene Indices (http://www.tigr.org/tdb/tgi.shtml) are a collection of species-specific databases that use a highly refined protocol to analyze EST sequences in an attempt to identify the genes represented by that data and to provide additional information regarding those genes. Gene Indices are constructed by first clustering, then assembling EST and annotated gene sequences from GenBank for the targeted species. This process produces a set of unique, high-fidelity virtual transcripts, or Tentative Consensus (TC) sequences. The TC sequences can be used to provide putative genes with functional annotation, to link the transcripts to mapping and genomic sequence data, to provide links between orthologous and paralogous genes and as a resource for comparative sequence analysis.
Abstract:
We have systematically characterized gene expression patterns in 49 adult and embryonic mouse tissues by using cDNA microarrays with 18,816 mouse cDNAs. Cluster analysis defined sets of genes that were expressed ubiquitously or in similar groups of tissues such as digestive organs and muscle. Clustering of expression profiles was observed in embryonic brain, postnatal cerebellum, and adult olfactory bulb, reflecting similarities in neurogenesis and remodeling. Finally, clustering genes coding for known enzymes into 78 metabolic pathways revealed a surprising coordination of expression within each pathway among different tissues. On the other hand, a more detailed examination of glycolysis revealed tissue-specific differences in profiles of key regulatory enzymes. Thus, by surveying global gene expression by using microarrays with a large number of elements, we provide insights into the commonality and diversity of pathways responsible for the development and maintenance of the mammalian body plan.
Abstract:
Programmed cell death (PCD) during neuronal development and disease has been shown to require de novo RNA synthesis. However, the time course and regulation of target genes are poorly understood. By using a brain-biased array of over 7,500 cDNAs, we profiled this gene expression component of PCD in cerebellar granule neurons challenged separately by potassium withdrawal, combined potassium and serum withdrawal, and kainic acid administration. We found that hundreds of genes were significantly regulated in discrete waves, including known genes whose protein products are involved in PCD. A restricted set of genes was regulated by all models, providing evidence that signals inducing PCD can regulate large assemblages of genes (of which a restricted subset may be shared in multiple pathways).
Abstract:
We performed a genome-wide analysis of gene expression in primary human CD15+ myeloid progenitor cells. By using the serial analysis of gene expression (SAGE) technique, we obtained quantitative information for the expression of 37,519 unique SAGE-tag sequences. Of these unique tags, (i) 25% were detected at high and intermediate levels, whereas 75% were present as single copies, (ii) 53% of the tags matched known expressed sequences, 34% of which were matched to more than one known expressed sequence, and (iii) 47% of the tags had no matches and represent potentially novel genes. The correct genes were confirmed by application of the generation of longer cDNA fragments from SAGE tags for gene identification (GLGI) technique for high-copy tags with multiple matches. A set of genes known to be important in myeloid differentiation were expressed at various levels and used different spliced forms. This study provides a normal baseline for comparison of gene expression in myeloid diseases. The strategy of using SAGE and GLGI techniques in this study has broad applications to the genome-wide identification of expressed genes.
Abstract:
Typical general transcription factors, such as TATA binding protein and TFIIB, have not yet been identified in any member of the Trypanosomatidae family of parasitic protozoa. Interestingly, mRNA coding genes do not appear to have discrete transcriptional start sites, although in most cases they require an RNA polymerase that has the biochemical properties of eukaryotic RNA polymerase II. A discrete transcription initiation site may not be necessary for mRNA synthesis since the sequences upstream of each transcribed coding region are trimmed from the nascent transcript when a short m7G-capped RNA is added during mRNA maturation. This short 39 nt m7G-capped RNA, the spliced leader (SL) sequence, is expressed as an ∼100 nt long RNA from a set of reiterated, though independently transcribed, genes in the trypanosome genome. Punctuation of the 5′ end of mRNAs by a m7G cap-containing spliced leader is a developing theme in the lower eukaryotic world; organisms as diverse as Euglena and nematode worms, including Caenorhabditis elegans, utilize SL RNA in their mRNA maturation programs. Towards understanding the coordination of SL RNA and mRNA expression in trypanosomes, we have begun by characterizing SL RNA gene expression in the model trypanosome Leptomonas seymouri. Using a homologous in vitro transcription system, we demonstrate in this study that the SL RNA is transcribed by RNA polymerase II. During SL RNA transcription, accurate initiation is determined by an initiator element with a loose consensus of CYAC/AYR(+1). This element, as well as two additional basal promoter elements, is divergent in sequence from the basal transcription elements seen in other eukaryotic gene promoters. We show here that the in vitro transcription extract contains a binding activity that is specific for the initiator element and thus may participate in recruiting RNA polymerase II to the SL RNA gene promoter.
Abstract:
Aquatic photosynthetic organisms, including the green alga Chlamydomonas reinhardtii, induce a set of genes for a carbon-concentrating mechanism (CCM) to acclimate to CO2-limiting conditions. This acclimation is modulated by some mechanisms in the cell to sense CO2 availability. Previously, a high-CO2-requiring mutant C16 defective in an induction of the CCM was isolated from C. reinhardtii by gene tagging. By using this pleiotropic mutant, we isolated a nuclear regulatory gene, Ccm1, encoding a 699-aa hydrophilic protein with a putative zinc-finger motif in its N-terminal region and a Gln repeat characteristic of transcriptional activators. Introduction of Ccm1 into this mutant restored an active carbon transport through the CCM, development of a pyrenoid structure in the chloroplast, and induction of a set of CCM-related genes. That a 5,128-base Ccm1 transcript and also the translation product of 76 kDa were detected in both high- and low-CO2 conditions suggests that CCM1 might be modified posttranslationally. These data indicate that Ccm1 is essential to control the induction of CCM by sensing CO2 availability in Chlamydomonas cells. In addition, complementation assay and identification of the mutation site of another pleiotropic mutant, cia5, revealed that His-54 within the putative zinc-finger motif of the CCM1 is crucial to its regulatory function.