69 results for DNA-microarray data
in National Center for Biotechnology Information - NCBI
Abstract:
Methylation of cytosine in the 5 position of the pyrimidine ring is a major modification of the DNA in most organisms. In eukaryotes, the distribution and number of 5-methylcytosines (5mC) along the DNA is heritable but can also change with the developmental state of the cell and as a response to modifications of the environment. While DNA methylation probably has a number of functions, scientific interest has recently focused on the gene silencing effect methylation can have in eukaryotic cells. In particular, the discovery of changes in the methylation level during cancer development has increased the interest in this field. In the past, a vast amount of data has been generated with different levels of resolution ranging from 5mC content of total DNA to the methylation status of single nucleotides. We present here a database for DNA methylation data that attempts to unify these results in a common resource. The database is accessible via WWW (http://www.methdb.de). It stores information about the origin of the investigated sample and the experimental procedure, and contains the DNA methylation data. Query masks allow for searching for 5mC content, species, tissue, gene, sex, phenotype, sequence ID and DNA type. The output lists all available information including the relative gene expression level. DNA methylation patterns and methylation profiles are shown both as a graphical representation and as G/A/T/C/5mC-sequences or tables with sequence positions and methylation levels, respectively.
Abstract:
A statistical modeling approach is proposed for use in searching large microarray data sets for genes that have a transcriptional response to a stimulus. The approach is unrestricted with respect to the timing, magnitude or duration of the response, or the overall abundance of the transcript. The statistical model makes an accommodation for systematic heterogeneity in expression levels. Corresponding data analyses provide gene-specific information, and the approach provides a means for evaluating the statistical significance of such information. To illustrate this strategy we have derived a model to depict the profile expected for a periodically transcribed gene and used it to look for budding yeast transcripts that adhere to this profile. Using objective criteria, this method identifies 81% of the known periodic transcripts and 1,088 genes, which show significant periodicity in at least one of the three data sets analyzed. However, only one-quarter of these genes show significant oscillations in at least two data sets and can be classified as periodic with high confidence. The method provides estimates of the mean activation and deactivation times, induced and basal expression levels, and statistical measures of the precision of these estimates for each periodic transcript.
Abstract:
Precise classification of tumors is critically important for cancer diagnosis and treatment. It is also a scientifically challenging task. Recently, efforts have been made to use gene expression profiles to improve the precision of classification, with limited success. Using a published data set for purposes of comparison, we introduce a methodology based on classification trees and demonstrate that it is significantly more accurate for discriminating among distinct colon cancer tissues than other statistical approaches used heretofore. In addition, competing classification trees are displayed, which suggest that different genes may coregulate colon cancers.
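The core operation of a classification tree, choosing the gene and expression threshold that best separate the tissue classes, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the Gini impurity criterion, the function names, and the toy data are all assumptions made for the example.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(samples, labels):
    """Return (impurity, gene_index, threshold) for the best single split.

    samples: one list of expression values per tissue sample.
    The impurity is the size-weighted Gini impurity of the two child nodes.
    """
    best = None
    n = len(samples)
    for g in range(len(samples[0])):
        values = sorted(set(s[g] for s in samples))
        # Candidate thresholds lie midway between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [lab for s, lab in zip(samples, labels) if s[g] <= thr]
            right = [lab for s, lab in zip(samples, labels) if s[g] > thr]
            imp = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or imp < best[0]:
                best = (imp, g, thr)
    return best


# Hypothetical toy data: 4 samples, 2 genes; only gene 1 is informative.
toy_samples = [[5.0, 1.0], [5.0, 9.0], [6.0, 2.0], [6.0, 8.0]]
toy_labels = ["normal", "tumor", "normal", "tumor"]
root_split = best_split(toy_samples, toy_labels)
```

A full tree applies the same search recursively to each child node. Recording the runner-up splits at the root is one simple way to obtain competing trees of the kind the abstract mentions.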
Abstract:
We introduce a method of functionally classifying genes by using gene expression data from DNA microarray hybridization experiments. The method is based on the theory of support vector machines (SVMs). SVMs are considered a supervised computer learning method because they exploit prior knowledge of gene function to identify unknown genes of similar function from expression data. SVMs avoid several problems associated with unsupervised clustering methods, such as hierarchical clustering and self-organizing maps. SVMs have many mathematical features that make them attractive for gene expression analysis, including their flexibility in choosing a similarity function, sparseness of solution when dealing with large data sets, the ability to handle large feature spaces, and the ability to identify outliers. We test several SVMs that use different similarity metrics, as well as some other supervised learning methods, and find that the SVMs best identify sets of genes with a common function using expression data. Finally, we use SVMs to predict functional roles for uncharacterized yeast ORFs based on their expression data.
Abstract:
We describe the time evolution of gene expression levels by using a time translational matrix to predict future expression levels of genes based on their expression levels at some initial time. We deduce the time translational matrix for previously published DNA microarray gene expression data sets by modeling them within a linear framework by using the characteristic modes obtained by singular value decomposition. The resulting time translation matrix provides a measure of the relationships among the modes and governs their time evolution. We show that a truncated matrix linking just a few modes is a good approximation of the full time translation matrix. This finding suggests that the number of essential connections among the genes is small.
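The idea of a time translation matrix can be illustrated with a small sketch. It assumes the linear model x(t + Δt) = M x(t) for the amplitudes of two retained modes at equally spaced time points, and it estimates M by ordinary least squares. The restriction to two modes, the least-squares fit, and the function names are assumptions of this example, not details taken from the paper.

```python
def step(M, x):
    """Advance a 2-component mode vector by one time step: returns M x."""
    return (M[0][0] * x[0] + M[0][1] * x[1],
            M[1][0] * x[0] + M[1][1] * x[1])


def fit_translation_2x2(series):
    """Least-squares estimate of M from successive (x1, x2) mode amplitudes.

    Solves M = C G^-1, where G = sum x x^T over times t and
    C = sum x(t+1) x(t)^T. Requires G to be invertible.
    """
    X, Y = series[:-1], series[1:]
    g11 = sum(x[0] * x[0] for x in X)
    g12 = sum(x[0] * x[1] for x in X)
    g22 = sum(x[1] * x[1] for x in X)
    c11 = sum(y[0] * x[0] for x, y in zip(X, Y))
    c12 = sum(y[0] * x[1] for x, y in zip(X, Y))
    c21 = sum(y[1] * x[0] for x, y in zip(X, Y))
    c22 = sum(y[1] * x[1] for x, y in zip(X, Y))
    det = g11 * g22 - g12 * g12
    return [[(c11 * g22 - c12 * g12) / det, (c12 * g11 - c11 * g12) / det],
            [(c21 * g22 - c22 * g12) / det, (c22 * g11 - c21 * g12) / det]]


# Hypothetical example: generate a short trajectory from a known matrix,
# then recover that matrix from the trajectory alone.
M_true = [[0.5, 0.5], [-0.5, 0.5]]
trajectory = [(1.0, 0.0)]
for _ in range(3):
    trajectory.append(step(M_true, trajectory[-1]))
M_fit = fit_translation_2x2(trajectory)
```

Once M is estimated, repeated calls to `step` project the mode amplitudes forward from an initial time point, which is the sense in which the matrix "predicts" future expression levels.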
Abstract:
The Stanford Microarray Database (SMD) stores raw and normalized data from microarray experiments, and provides web interfaces for researchers to retrieve, analyze and visualize their data. The two immediate goals for SMD are to serve as a storage site for microarray data from ongoing research at Stanford University, and to facilitate the public dissemination of that data once published, or released by the researcher. Of paramount importance is the connection of microarray data with the biological data that pertains to the DNA deposited on the microarray (genes, clones etc.). SMD makes use of many public resources to connect expression information to the relevant biology, including SGD [Ball,C.A., Dolinski,K., Dwight,S.S., Harris,M.A., Issel-Tarver,L., Kasarskis,A., Scafe,C.R., Sherlock,G., Binkley,G., Jin,H. et al. (2000) Nucleic Acids Res., 28, 77–80], YPD and WormPD [Costanzo,M.C., Hogan,J.D., Cusick,M.E., Davis,B.P., Fancher,A.M., Hodges,P.E., Kondu,P., Lengieza,C., Lew-Smith,J.E., Lingner,C. et al. (2000) Nucleic Acids Res., 28, 73–76], Unigene [Wheeler,D.L., Chappey,C., Lash,A.E., Leipe,D.D., Madden,T.L., Schuler,G.D., Tatusova,T.A. and Rapp,B.A. (2000) Nucleic Acids Res., 28, 10–14], dbEST [Boguski,M.S., Lowe,T.M. and Tolstoshev,C.M. (1993) Nature Genet., 4, 332–333] and SWISS-PROT [Bairoch,A. and Apweiler,R. (2000) Nucleic Acids Res., 28, 45–48] and can be accessed at http://genome-www.stanford.edu/microarray.
Abstract:
Tuberculosis is a chronic infectious disease that is transmitted by cough-propelled droplets that carry the etiologic bacterium, Mycobacterium tuberculosis. Although currently available drugs kill most isolates of M. tuberculosis, strains resistant to each of these have emerged, and multiply resistant strains are increasingly widespread. The growing problem of drug resistance combined with a global incidence of seven million new cases per year underscores the urgent need for new antituberculosis therapies. The recent publication of the complete sequence of the M. tuberculosis genome has made possible, for the first time, a comprehensive genomic approach to the biology of this organism and to the drug discovery process. We used a DNA microarray containing 97% of the ORFs predicted from this sequence to monitor changes in M. tuberculosis gene expression in response to the antituberculous drug isoniazid. Here we show that isoniazid induced several genes that encode proteins physiologically relevant to the drug’s mode of action, including an operonic cluster of five genes encoding type II fatty acid synthase enzymes and fbpC, which encodes trehalose dimycolyl transferase. Other genes, not apparently within directly affected biosynthetic pathways, also were induced. These genes, efpA, fadE23, fadE24, and ahpC, likely mediate processes that are linked to the toxic consequences of the drug. Insights gained from this approach may define new drug targets and suggest new methods for identifying compounds that inhibit those targets.
Abstract:
A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression. The output is displayed graphically, conveying the clustering and the underlying expression data simultaneously in a form intuitive for biologists. We have found in the budding yeast Saccharomyces cerevisiae that clustering gene expression data groups together efficiently genes of known similar function, and we find a similar tendency in human data. Thus patterns seen in genome-wide expression experiments can be interpreted as indications of the status of cellular processes. Also, coexpression of genes of known function with poorly characterized or novel genes may provide a simple means of gaining leads to the functions of many genes for which information is not available currently.
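Arranging genes by similarity of expression pattern can be sketched with a small agglomerative clustering routine. The abstract says only that "standard statistical algorithms" are used, so the Pearson correlation similarity and average linkage chosen here are assumptions, as are the function names and the toy profiles.

```python
def pearson(a, b):
    """Pearson correlation of two equal-length expression profiles.

    Assumes neither profile is constant (non-zero variance).
    """
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def average_linkage(profiles):
    """Cluster genes bottom-up; returns the tree as nested tuples of names."""
    # Each node is (subtree, list of member gene names).
    nodes = [(name, [name]) for name in profiles]
    while len(nodes) > 1:
        best = None
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                sims = [pearson(profiles[a], profiles[b])
                        for a in nodes[i][1] for b in nodes[j][1]]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, i, j)
        _, i, j = best
        merged = ((nodes[i][0], nodes[j][0]), nodes[i][1] + nodes[j][1])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0][0]


# Hypothetical profiles: g1/g2 rise over time, g3/g4 fall.
toy_profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.0, 6.0, 4.0, 1.0],
}
tree = average_linkage(toy_profiles)
```

Reading the leaves of the resulting tree from left to right gives a gene ordering in which similar profiles sit next to each other, which is what makes a graphical display of the ordered expression matrix informative.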
Abstract:
Analysis of previously published sets of DNA microarray gene expression data by singular value decomposition has uncovered underlying patterns or “characteristic modes” in their temporal profiles. These patterns contribute unequally to the structure of the expression profiles. Moreover, the essential features of a given set of expression profiles are captured using just a small number of characteristic modes. This leads to the striking conclusion that the transcriptional response of a genome is orchestrated in a few fundamental patterns of gene expression change. These patterns are both simple and robust, dominating the alterations in expression of genes throughout the genome. Moreover, the characteristic modes of gene expression change in response to environmental perturbations are similar in such distant organisms as yeast and human cells. This analysis reveals simple regularities in the seemingly complex transcriptional transitions of diverse cells to new states, and these provide insights into the operation of the underlying genetic networks.
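The claim that a few characteristic modes capture most of the structure can be made concrete with a sketch that extracts only the dominant temporal mode and reports the fraction of total squared expression it explains. Power iteration is used here purely to keep the example dependency-free; it is an assumption of this sketch, and a library SVD routine would return all modes at once. Function names and data are hypothetical.

```python
def dominant_mode(rows, iterations=200):
    """Return (fraction_explained, mode) for a genes-by-timepoints matrix.

    Runs power iteration on A^T A, whose leading eigenvector is the first
    right singular vector of A (the dominant temporal mode). The start
    vector of all ones must not be orthogonal to that mode.
    """
    n_t = len(rows[0])
    v = [1.0] * n_t
    for _ in range(iterations):
        av = [sum(a * b for a, b in zip(row, v)) for row in rows]
        w = [sum(rows[i][j] * av[i] for i in range(len(rows)))
             for j in range(n_t)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Squared leading singular value = |A v|^2 for the unit vector v.
    av = [sum(a * b for a, b in zip(row, v)) for row in rows]
    sigma_sq = sum(x * x for x in av)
    total = sum(x * x for row in rows for x in row)
    return sigma_sq / total, v


# Hypothetical data: every gene follows the same temporal pattern (1, 2, 2)
# with a different amplitude, so a single mode explains everything.
toy_matrix = [[1.0, 2.0, 2.0], [2.0, 4.0, 4.0], [-1.0, -2.0, -2.0]]
fraction, mode = dominant_mode(toy_matrix)
```

For real data the fraction for the first mode would be below 1, and the same quantity computed for the second, third, and later modes shows how quickly the contributions fall off.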
Abstract:
The Enterococcus faecalis conjugative plasmid pAD1 (60 kb) encodes a mating response to the recipient-produced peptide sex pheromone cAD1. The response involves two key plasmid-encoded regulatory proteins: TraE1, which positively regulates all or most structural genes relating to conjugation, and TraA, which binds DNA and negatively regulates expression of traE1. In vitro studies that included development of a DNA-associated protein-tag affinity chromatography technique showed that TraA (37.9 kDa) binds directly to cAD1 near its carboxyl-terminal end and, as a consequence, loses its affinity for DNA. Analyses of genetically modified TraA proteins indicated that truncations within the carboxyl-terminal 9 residues significantly affected the specificity of peptide-directed association/dissociation of DNA. The data support earlier observations that transposon insertions near the 3′ end of traA eliminated the ability of cells to respond to cAD1.
Abstract:
The representational difference analysis (RDA) and other subtraction techniques are used to enrich sample-specific sequences by elimination of ubiquitous sequences existing in both the sample of interest (tester) and the subtraction partner (driver). While applying the RDA to genomic DNA of cutaneous lymphoma cells in order to identify tumor relevant alterations, we predominantly isolated repetitive sequences and artificial repeat-mediated fusion products of otherwise independent PCR fragments (PCR hybrids). Since these products severely interfered with the isolation of tester-specific fragments, we developed a considerably more robust and efficient approach, termed ligation-mediated subtraction (Limes). In first applications of Limes, genomic sequences and/or transcripts of genes involved in the regulation of transcription, such as transforming growth factor β stimulated clone 22 related gene (TSC-22R), cell death and cytokine production (caspase-1) or antigen presentation (HLA class II sequences), were found to be completely absent in a cutaneous lymphoma line. On the assumption that mutations in tumor-relevant genes can affect their transcription pattern, a protocol was developed and successfully applied that allows the identification of such sequences. Due to these results, Limes may substitute/supplement other subtraction/comparison techniques such as RDA or DNA microarray techniques in a variety of different research fields.
Abstract:
Gene expression profiling provides powerful analyses of transcriptional responses to cellular perturbation. In contrast to DNA array-based methods, reporter gene technology has been underused for this application. Here we describe a genomewide, genome-registered collection of Escherichia coli bioluminescent reporter gene fusions. DNA sequences from plasmid-borne, random fusions of E. coli chromosomal DNA to a Photorhabdus luminescens luxCDABE reporter allowed precise mapping of each fusion. The utility of this collection covering about 30% of the transcriptional units was tested by analyzing individual fusions representative of heat shock, SOS, OxyR, SoxRS, and cya/crp stress-responsive regulons. Each fusion strain responded as anticipated to environmental conditions known to activate the corresponding regulatory circuit. Thus, the collection mirrors E. coli's transcriptional wiring diagram. This genomewide collection of gene fusions provides an independent test of results from other gene expression analyses. Accordingly, a DNA microarray-based analysis of mitomycin C-treated E. coli indicated elevated expression of expected and unanticipated genes. Selected luxCDABE fusions corresponding to these up-regulated genes were used to confirm or contradict the DNA microarray results. The power of partnering gene fusion and DNA microarray technology to discover promoters and define operons was demonstrated when data from both suggested that a cluster of 20 genes encoding production of type I extracellular polysaccharide in E. coli form a single operon.
Abstract:
The release of vast quantities of DNA sequence data by large-scale genome and expressed sequence tag (EST) projects underlines the necessity for the development of efficient and inexpensive ways to link sequence databases with temporal and spatial expression profiles. Here we demonstrate the power of linking cDNA sequence data (including EST sequences) with transcript profiles revealed by cDNA-AFLP, a highly reproducible differential display method based on restriction enzyme digests and selective amplification under high stringency conditions. We have developed a computer program (GenEST) that predicts the sizes of virtual transcript-derived fragments (TDFs) of in silico-digested cDNA sequences retrieved from databases. The vast majority of the resulting virtual TDFs could be traced back among the thousands of TDFs displayed on cDNA-AFLP gels. Sequencing of the corresponding bands excised from cDNA-AFLP gels revealed no inconsistencies. As a consequence, cDNA sequence databases can be screened very efficiently to identify genes with relevant expression profiles. The other way round, it is possible to switch from cDNA-AFLP gels to sequences in the databases. Using the restriction enzyme recognition sites, the primer extensions and the estimated TDF size as identifiers, the DNA sequence(s) corresponding to a TDF with an interesting expression pattern can be identified. In this paper we show examples in both directions by analyzing the plant parasitic nematode Globodera rostochiensis. Various novel pathogenicity factors were identified by combining ESTs from the infective stage juveniles with expression profiles of ∼4000 genes in five developmental stages produced by cDNA-AFLP.
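The in silico digestion step can be sketched as follows. This is not the GenEST program itself: the enzyme pair (EcoRI and TaqI), the rule of keeping only fragments flanked by two different enzymes, and the `primer_extra` parameter standing in for adapter and primer length are all assumptions chosen for illustration.

```python
# Recognition site and cut offset (bases from site start) per enzyme.
# The enzyme pair is an illustrative assumption.
ENZYMES = {"EcoRI": ("GAATTC", 1), "TaqI": ("TCGA", 1)}


def cut_positions(seq, site, offset):
    """Return every cut position of one enzyme in seq (0-based)."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def virtual_tdfs(seq, enzymes=ENZYMES, primer_extra=0):
    """Predict sizes of virtual transcript-derived fragments (TDFs).

    Keeps only fragments bounded by cuts from two different enzymes and
    adds primer_extra bases to model adapters and primers (hypothetical).
    """
    seq = seq.upper()
    cuts = sorted((pos, name)
                  for name, (site, offset) in enzymes.items()
                  for pos in cut_positions(seq, site, offset))
    sizes = []
    for (p1, e1), (p2, e2) in zip(cuts, cuts[1:]):
        if e1 != e2:
            sizes.append(p2 - p1 + primer_extra)
    return sizes


# Hypothetical cDNA with one EcoRI site and one TaqI site.
toy_cdna = "AAAA" + "GAATTC" + "C" * 20 + "TCGA" + "GGGG"
toy_sizes = virtual_tdfs(toy_cdna)
```

Matching works in both directions described in the abstract: predicted sizes can be looked up among the bands on a gel, or an observed band size and enzyme pair can be used to filter database sequences for candidates.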
Abstract:
Microarray technology represents a potentially powerful method for identifying cell type- and regionally restricted genes expressed in the brain. Here we have combined a microarray analysis of differential gene expression among five selected brain regions, including the amygdala, cerebellum, hippocampus, olfactory bulb, and periaqueductal gray, with in situ hybridization. On average, 0.3% of the 34,000 genes interrogated were highly enriched in each of the five regions, relative to the others. In situ hybridization performed on a subset of amygdala-enriched genes confirmed in most cases the overall region-specificity predicted by the microarray data and identified additional sites of brain expression not examined on the microarrays. Strikingly, the majority of these genes exhibited boundaries of expression within the amygdala corresponding to cytoarchitectonically defined subnuclei. These results define a unique set of molecular markers for amygdaloid subnuclei and provide tools to genetically dissect their functional roles in different emotional behaviors.
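For scale, 0.3% of 34,000 genes corresponds to roughly 102 enriched genes per region. A simple way to flag such genes is sketched below. The fold-change criterion, the threshold of 3, and the function name are assumptions for illustration; the abstract does not state how "highly enriched" was defined.

```python
REGIONS = ["amygdala", "cerebellum", "hippocampus",
           "olfactory bulb", "periaqueductal gray"]

# 0.3% of the 34,000 genes interrogated, per region.
genes_per_region = round(34000 * 0.003)


def enriched_region(expr, min_fold=3.0):
    """Return (region, fold) if one region dominates all others, else None.

    expr maps region name to a positive expression intensity. The fold
    is the top region's intensity divided by the runner-up's.
    min_fold is a hypothetical cutoff.
    """
    top = max(expr, key=expr.get)
    runner_up = max(v for r, v in expr.items() if r != top)
    fold = expr[top] / runner_up
    return (top, fold) if fold >= min_fold else None


# Hypothetical intensities for two genes.
gene_a = dict(zip(REGIONS, [900.0, 100.0, 150.0, 120.0, 90.0]))
gene_b = dict(zip(REGIONS, [200.0, 180.0, 210.0, 190.0, 205.0]))
call_a = enriched_region(gene_a)
call_b = enriched_region(gene_b)
```

Comparing against the runner-up rather than the mean of the other regions is the stricter choice, since it requires the gene to stand out from every other region individually.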
Abstract:
A key step in the regulation of networks that control gene expression is the sequence-specific binding of transcription factors to their DNA recognition sites. A more complete understanding of these DNA–protein interactions will permit a more comprehensive and quantitative mapping of the regulatory pathways within cells, as well as a deeper understanding of the potential functions of individual genes regulated by newly identified DNA-binding sites. Here we describe a DNA microarray-based method to characterize sequence-specific DNA recognition by zinc-finger proteins. A phage display library, prepared by randomizing critical amino acid residues in the second of three fingers of the mouse Zif268 domain, provided a rich source of zinc-finger proteins with variant DNA-binding specificities. Microarrays containing all possible 3-bp binding sites for the variable zinc fingers permitted the quantitation of the binding site preferences of the entire library, pools of zinc fingers corresponding to different rounds of selection from this library, as well as individual Zif268 variants that were isolated from the library by using specific DNA sequences. The results demonstrate the feasibility of using DNA microarrays for genome-wide identification of putative transcription factor-binding sites.
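A microarray carrying all possible 3-bp binding sites needs 4^3 = 64 distinct triplets. The sketch below enumerates them and converts raw spot signals into relative binding preferences. The normalization by total signal and the function names are assumptions for illustration, and the example signal values are invented.

```python
from itertools import product

BASES = "ACGT"


def all_sites(k=3):
    """Return every DNA sequence of length k, in lexicographic order."""
    return ["".join(p) for p in product(BASES, repeat=k)]


def relative_preferences(signals):
    """Scale per-site signals so they sum to 1 (hypothetical normalization)."""
    total = sum(signals.values())
    return {site: value / total for site, value in signals.items()}


sites = all_sites(3)

# Invented signals for two sites, only to show the normalization.
toy_prefs = relative_preferences({"GCG": 3.0, "TGG": 1.0})
```

Because the site space is exhaustive, the preferences of a single variant, a selection pool, or the whole library can all be reported on the same 64-element axis and compared directly.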