954 results for Serial analysis of gene expression
Abstract:
Corticosteroids (aldosterone, cortisol/corticosterone) exert direct functional effects on cardiomyocytes. However, gene networks activated by corticosteroids in cardiomyocytes, as well as the involvement of the mineralocorticoid receptor (MR) vs the glucocorticoid receptor (GR) in these effects, remain largely unknown. Here we characterized the corticosteroid-dependent transcriptome in primary culture of neonatal mouse cardiomyocytes treated with 10⁻⁶ M aldosterone, a concentration predicted to occupy both MR and GR. Serial analysis of gene expression revealed 101 aldosterone-regulated genes. The MR/GR specificity was characterized for one regulated transcript, namely ecto-ADP-ribosyltransferase-3 (Art3). Using cardiomyocytes from GR(null/null) or MR(null/null) mice we demonstrate that in GR(null/null) cardiomyocytes the response is abrogated, but it is fully maintained in MR(null/null) cardiomyocytes. We conclude that Art3 expression is regulated exclusively via the GR. Our study identifies a new set of corticosteroid-regulated genes in cardiomyocytes and demonstrates a new approach to studying the selectivity of MR- vs GR-dependent effects.
Abstract:
Of all Pacific salmonids, Chinook salmon Oncorhynchus tshawytscha display the greatest variability in return times to freshwater. The molecular mechanisms of these differential return times have not been well described. Current methods, such as long serial analysis of gene expression (LongSAGE) and microarrays, allow gene expression to be analyzed for thousands of genes simultaneously. To investigate whether differential gene expression is observed between fall- and spring-run Chinook salmon from California's Central Valley, LongSAGE libraries were constructed. Three libraries containing between 25,512 and 29,372 sequenced tags (21 base pairs/tag) were generated using messenger RNA from the brains of adult Chinook salmon returning in fall and spring and from one ocean-caught Chinook salmon. Tags were annotated to genes using complementary DNA libraries from Atlantic salmon Salmo salar and rainbow trout O. mykiss. Differentially expressed genes, as estimated by differences in the number of sequence tags, were found in all pairwise comparisons of libraries (freshwater versus saltwater = 40 genes; fall versus spring = 11 genes; and spawning versus nonspawning = 51 genes). The gene for ependymin, an extracellular glycoprotein involved in behavioral plasticity in fish, exhibited the most differential expression among the three groupings. Reverse transcription polymerase chain reaction analysis verified the differential expression of ependymin between the fall- and spring-run samples. These LongSAGE libraries, the first reported for Chinook salmon, provide a window into the transcriptional changes during Chinook salmon return migration to freshwater and spawning and increase the amount of expressed sequence data.
Abstract:
BACKGROUND: An important challenge for transcript counting methods such as Serial Analysis of Gene Expression (SAGE), "Digital Northern" or Massively Parallel Signature Sequencing (MPSS) is to carry out statistical analyses that account for the within-class variability, i.e., variability due to the intrinsic biological differences among sampled individuals of the same class, and not only variability due to technical sampling error. RESULTS: We introduce a Bayesian model that accounts for the within-class variability by means of a mixture distribution. We show that the previously available approaches of aggregation in pools ("pseudo-libraries") and the Beta-Binomial model are particular cases of the mixture model. We illustrate our method with a brain tumor vs. normal comparison using SAGE data from public databases. We show examples of tags regarded as differentially expressed with high significance if the within-class variability is ignored, but clearly not so significant if one accounts for it. CONCLUSION: Using available information about biological replicates, one can transform a list of candidate transcripts showing differential expression into a more reliable one. Our method is freely available, under GPL/GNU copyleft, through a user-friendly web-based online tool or as R language scripts at the supplemental website.
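The contrast drawn in this abstract between pure sampling error and within-class variability can be illustrated with a small sketch. It is not the authors' Bayesian mixture model or their R scripts; it only compares a plain binomial model (one shared tag proportion, as in a pooled "pseudo-library") with the Beta-Binomial special case the abstract mentions. The function names, the Beta parameters and the tag counts are all hypothetical, chosen for illustration.

```python
from math import exp, lgamma, log


def log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def binomial_logpmf(k, n, p):
    """Sampling error only: every library shares the same tag proportion p."""
    return log_choose(n, k) + k * log(p) + (n - k) * log(1 - p)


def beta_binomial_logpmf(k, n, a, b):
    """Within-class variability: each library's proportion is drawn from Beta(a, b)."""
    return log_choose(n, k) + log_beta(k + a, n - k + b) - log_beta(a, b)


def variance_inflation(n, a, b):
    """Ratio of Beta-Binomial to binomial variance for the same mean a / (a + b)."""
    return (a + b + n) / (a + b + 1)


# Hypothetical example: 30 copies of one tag in a library of 10,000 tags.
k, n = 30, 10_000
a, b = 2.0, 998.0  # illustrative Beta parameters with mean proportion 0.002
print(exp(binomial_logpmf(k, n, a / (a + b))))
print(exp(beta_binomial_logpmf(k, n, a, b)))
print(variance_inflation(n, a, b))
```

Under these assumed parameters the Beta-Binomial count has a much larger variance than the binomial one, which is why a tag can look highly significant when replicate-to-replicate variation is ignored and far less so once it is modelled.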
Abstract:
We describe a genome-wide characterization of mRNA transcript levels in yeast grown on the fatty acid oleate, determined using Serial Analysis of Gene Expression (SAGE). Comparison of this SAGE library with that reported for glucose-grown cells revealed the dramatic adaptive response of yeast to a change in carbon source. A major fraction (>20%) of the 15,000 mRNA molecules in a yeast cell comprised differentially expressed transcripts, which were derived from only 2% of the total number of ∼6300 yeast genes. Most of the mRNAs that were differentially expressed code for enzymes or for other proteins participating in metabolism (e.g., metabolite transporters). In oleate-grown cells, this was exemplified by the huge increase of mRNAs encoding the peroxisomal β-oxidation enzymes required for degradation of fatty acids. The data provide evidence for the existence of redox shuttles across organellar membranes that involve peroxisomal, cytoplasmic, and mitochondrial enzymes. We also analyzed the mRNA profile of a mutant strain with deletions of the PIP2 and OAF1 genes, encoding transcription factors required for induction of genes encoding peroxisomal proteins. Induction of genes under the immediate control of these factors was abolished; other genes were up-regulated, indicating an adaptive response to the changed metabolism imposed by the genetic impairment. We describe a statistical method for analysis of data obtained by SAGE.
Abstract:
We have developed a technique called the generation of longer cDNA fragments from serial analysis of gene expression (SAGE) tags for gene identification (GLGI), to convert SAGE tags of 10 bases into their corresponding 3′ cDNA fragments covering hundreds of bases. A primer containing the 10-base SAGE tag is used as the sense primer, and a single-base anchored oligo(dT) primer is used as an antisense primer in PCR, together with Pfu DNA polymerase. By using this approach, a cDNA fragment extending from the SAGE tag toward the 3′ end of the corresponding sequence can be generated. Application of the GLGI technique can solve two critical issues in applying the SAGE technique: one is that a longer fragment corresponding to a SAGE tag, which has no match in databases, can be generated for further studies; the other is that the specific fragment corresponding to a SAGE tag can be identified from multiple sequences that match the same SAGE tag. The development of the GLGI method provides several potential applications. First, it provides a strategy for even wider application of the SAGE technique for quantitative analysis of global gene expression. Second, a combined application of SAGE/GLGI can be used to complete the catalogue of the expressed genes in human and in other eukaryotic species. Third, it can be used to identify the 3′ cDNA sequence from any exon within a gene. It can also be used to confirm the reality of exons predicted by bioinformatic tools in genomic sequences. Fourth, a combined application of SAGE/GLGI can be applied to define the 3′ boundary of expressed genes in the genomic sequences in human and in other eukaryotic genomes.
Abstract:
Neurotrophic factors such as nerve growth factor (NGF) promote a wide variety of responses in neurons, including differentiation, survival, plasticity, and repair. Such actions often require changes in gene expression. To identify the regulated genes and thereby to more fully understand the NGF mechanism, we carried out serial analysis of gene expression (SAGE) profiling of transcripts derived from rat PC12 cells before and after NGF-promoted neuronal differentiation. Multiple criteria supported the reliability of the profile. Approximately 157,000 SAGE tags were analyzed, representing at least 21,000 unique transcripts. Of these, nearly 800 were regulated by 6-fold or more in response to NGF. Approximately 150 of the regulated transcripts have been matched to named genes, the majority of which were not previously known to be NGF-responsive. Functional categorization of the regulated genes provides insight into the complex, integrated mechanism by which NGF promotes its multiple actions. It is anticipated that as genomic sequence information accrues the data derived here will continue to provide information about neurotrophic factor mechanisms.
Abstract:
Crotalus durissus rattlesnakes are responsible for the most lethal cases of snakebites in Brazil. The Crotalus durissus collilineatus subspecies is related to a great number of accidents in the Southeast and Central West regions, but few studies on its venom composition have been carried out to date. In an attempt to describe the transcriptional profile of the C. durissus collilineatus venom gland, we generated a cDNA library and the sequences obtained could be identified by similarity searches on existing databases. Out of 673 expressed sequence tags (ESTs), 489 produced readable sequences comprising 201 singletons and 47 clusters of two or more ESTs. One hundred and fifty reads (60.5%) produced significant hits to known sequences. The results showed a predominance of toxin-coding ESTs rather than transcripts coding for proteins involved in all cellular functions. The most frequent toxin was crotoxin, comprising 88% of toxin-coding sequences. Crotoxin B, a basic phospholipase A2 (PLA2) subunit of crotoxin, was represented in more variable forms compared with the non-enzymatic subunit (crotoxin A), and most sequences coding for this molecule were identified as the CB1 isoform from Crotalus durissus terrificus venom. Four percent of toxin-related sequences in this study were identified as growth factors, comprising five sequences for vascular endothelial growth factor (VEGF) and one for nerve growth factor (NGF) that showed 100% identity with C. durissus terrificus NGF. We also identified two clusters for metalloprotease from the PII class, comprising 3% of the toxins, and two for serine proteases, including gyroxin (2.5%). The remaining 2.5% of toxin-coding ESTs represent singletons identified as sequences homologous to cardiotoxin, convulxin, angiotensin-converting enzyme inhibitor and C-type natriuretic peptide, Ohanin, crotamin and PLA2 inhibitor. These results allowed the identification of the most common classes of toxins in C. durissus collilineatus snake venom, also showing some unknown classes for this subspecies and even for C. durissus species, such as cardiotoxins and VEGF. (C) 2009 Published by Elsevier Masson SAS.
Abstract:
SUMMARY: Large sets of data, such as expression profiles from many samples, require analytic tools to reduce their complexity. The Iterative Signature Algorithm (ISA) is a biclustering algorithm. It was designed to decompose a large set of data into so-called 'modules'. In the context of gene expression data, these modules consist of subsets of genes that exhibit a coherent expression profile only over a subset of microarray experiments. Genes and arrays may be attributed to multiple modules and the level of required coherence can be varied resulting in different 'resolutions' of the modular mapping. In this short note, we introduce two BioConductor software packages written in GNU R: The isa2 package includes an optimized implementation of the ISA and the eisa package provides a convenient interface to run the ISA, visualize its output and put the biclusters into biological context. Potential users of these packages are all R and BioConductor users dealing with tabular (e.g. gene expression) data. AVAILABILITY: http://www.unil.ch/cbg/ISA CONTACT: sven.bergmann@unil.ch
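The idea of a module (a subset of genes that is coherent over only a subset of experiments) can be sketched with an alternating thresholding loop in the spirit of the ISA. This is a deliberately simplified illustration and not the isa2 or eisa implementation: the z-score normalization, the averaging of scores, the one-sided thresholds, the function name `isa_sketch` and the toy matrix are all assumptions made here.

```python
from statistics import mean, pstdev


def zscore(values):
    """Centre and scale a list; a constant list maps to all zeros."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s > 0 else 0.0 for v in values]


def above_threshold(scores, t):
    """Indices whose score lies more than t standard deviations above the mean."""
    m, s = mean(scores), pstdev(scores)
    return {i for i, v in enumerate(scores) if v - m > t * s}


def isa_sketch(matrix, seed_genes, t_gene=1.0, t_cond=1.0, max_iter=50):
    """Alternate between scoring conditions and genes until the sets stop changing."""
    n_genes, n_conds = len(matrix), len(matrix[0])
    by_gene = [zscore(row) for row in matrix]  # each gene scaled across conditions
    by_cond = [zscore([matrix[g][c] for g in range(n_genes)])
               for c in range(n_conds)]        # each condition scaled across genes
    genes, conds = set(seed_genes), set()
    for _ in range(max_iter):
        # Score every condition by the average level of the current gene set.
        cond_scores = [mean(by_cond[c][g] for g in genes) for c in range(n_conds)]
        new_conds = above_threshold(cond_scores, t_cond)
        if not new_conds:
            return set(), set()
        # Score every gene by its average level over the selected conditions.
        gene_scores = [mean(by_gene[g][c] for c in new_conds) for g in range(n_genes)]
        new_genes = above_threshold(gene_scores, t_gene)
        if not new_genes:
            return set(), set()
        if new_genes == genes and new_conds == conds:
            break  # fixed point: a candidate module
        genes, conds = new_genes, new_conds
    return genes, conds


# Toy matrix (genes x arrays) with one planted module: genes 0-1 on arrays 0-1.
toy = [
    [10, 10, 0, 0, 0],
    [10, 10, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
print(isa_sketch(toy, seed_genes={0}))
```

Raising or lowering `t_gene` and `t_cond` in this sketch plays the role of the coherence level described in the abstract: stricter thresholds yield smaller, tighter modules, and different seeds may converge to different, possibly overlapping, modules.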
Abstract:
The distal parts of the renal tubule play a critical role in maintaining homeostasis of extracellular fluids. In this review, we present an in-depth analysis of microarray-based gene expression profiles available for microdissected mouse distal nephron segments, i.e., the distal convoluted tubule (DCT) and the connecting tubule (CNT), and for the cortical portion of the collecting duct (CCD; Zuber et al., Proc Natl Acad Sci USA 106:16523-16528, 2009). Classification of expressed transcripts in 14 major functional gene categories demonstrated that all principal proteins involved in maintaining the salt and water balance are represented by highly abundant transcripts. However, a significant number of transcripts belonging, for instance, to categories of G-protein-coupled receptors or serine/threonine kinases exhibit high expression levels but remain unassigned to a specific renal function. We also established a list of genes differentially expressed between the DCT/CNT and the CCD. This list is enriched by genes related to segment-specific transport functions and by transcription factors directing the development of the distal nephron or collecting ducts. Collectively, this in silico analysis provides comprehensive information about relative abundance and tissue specificity of the DCT/CNT and the CCD expressed transcripts and identifies new candidate genes for renal homeostasis.
Abstract:
ABSTRACT: BACKGROUND: The degree of conservation of gene expression between homologous organs largely remains an open question. Several recent studies reported some evidence in favor of such conservation. Most studies compute organs' similarity across all orthologous genes, whereas the expression levels of many genes are not informative about organ specificity. RESULTS: Here, we use a modularization algorithm to overcome this limitation through the identification of inter-species co-modules of organs and genes. We identify such co-modules using mouse and human microarray expression data. They are functionally coherent both in terms of genes and of organs from both organisms. We show that a large proportion of genes belonging to the same co-module are orthologous between mouse and human. Moreover, their zebrafish orthologs also tend to be expressed in the corresponding homologous organs. Notable exceptions to the general pattern of conservation are the testis and the olfactory bulb. Interestingly, some co-modules consist of single organs, while others combine several functionally related organs. For instance, amygdala, cerebral cortex, hypothalamus and spinal cord form a clearly discernible unit of expression, both in mouse and human. CONCLUSIONS: Our study provides a new framework for comparative analysis which will be applicable also to other sets of large-scale phenotypic data collected across different species.
Abstract:
During my PhD, my aim was to provide new tools to increase our capacity to analyse gene expression patterns, and to study on a large-scale basis the evolution of gene expression in animals. Gene expression patterns (when and where a gene is expressed) are a key feature in understanding gene function, notably in development. It appears clear now that the evolution of developmental processes and of phenotypes is shaped both by evolution at the coding sequence level, and at the gene expression level. Studying gene expression evolution in animals, with complex expression patterns over tissues and developmental time, is still challenging. No tools are available to routinely compare expression patterns between different species, with precision, and on a large-scale basis. Studies on gene expression evolution are therefore performed only on small gene datasets, or using imprecise descriptions of expression patterns. The aim of my PhD was thus to develop and use novel bioinformatics resources to study the evolution of gene expression. To this end, I developed the database Bgee (Base for Gene Expression Evolution). The approach of Bgee is to transform heterogeneous expression data (ESTs, microarrays, and in-situ hybridizations) into present/absent calls, and to annotate them to standard representations of anatomy and development of different species (anatomical ontologies). An extensive mapping between anatomies of species is then developed based on hypotheses of homology. These precise annotations to anatomies, and this extensive mapping between species, are the major assets of Bgee, and have required the involvement of many co-workers over the years. My main personal contribution is the development and the management of both the Bgee database and the web application. Bgee is now on its ninth release, and includes an important gene expression dataset for 5 species (human, mouse, drosophila, zebrafish, Xenopus), with the most data from mouse, human and zebrafish.
Using these three species, I have conducted an analysis of gene expression evolution after duplication in vertebrates. Gene duplication is thought to be a major source of novelty in evolution, and to contribute to speciation. It has been suggested that the evolution of gene expression patterns might contribute to the retention of duplicate genes. I performed a large-scale comparison of expression patterns of hundreds of duplicated genes to their singleton ortholog in an outgroup, including both small- and large-scale duplicates, in three vertebrate species (human, mouse and zebrafish), and using highly accurate descriptions of expression patterns. My results showed unexpectedly high rates of de novo acquisition of expression domains after duplication (neofunctionalization), at least as high as or higher than rates of partitioning of expression domains (subfunctionalization). I found differences in the evolution of expression of small- and large-scale duplicates, with small-scale duplicates more prone to neofunctionalization. Duplicates with neofunctionalization seemed to evolve under more relaxed selective pressure on the coding sequence. Finally, even with abundant and precise expression data, the majority fate I recovered was neither neo- nor subfunctionalization of expression domains, suggesting a major role for other mechanisms in duplicate gene retention.
Abstract:
INTRODUCTION: Breast cancer subtyping and prognosis have been studied extensively by gene expression profiling, resulting in disparate signatures with little overlap in their constituent genes. Although a previous study demonstrated a prognostic concordance among gene expression signatures, it was limited to only one dataset and did not fully elucidate how the different genes were related to one another nor did it examine the contribution of well-known biological processes of breast cancer tumorigenesis to their prognostic performance. METHOD: To address the above issues and to further validate these initial findings, we performed the largest meta-analysis of publicly available breast cancer gene expression and clinical data, which are comprised of 2,833 breast tumors. Gene coexpression modules of three key biological processes in breast cancer (namely, proliferation, estrogen receptor [ER], and HER2 signaling) were used to dissect the role of constituent genes of nine prognostic signatures. RESULTS: Using a meta-analytical approach, we consolidated the signatures associated with ER signaling, ERBB2 amplification, and proliferation. Previously published expression-based nomenclature of breast cancer 'intrinsic' subtypes can be mapped to the three modules, namely, the ER-/HER2- (basal-like), the HER2+ (HER2-like), and the low- and high-proliferation ER+/HER2- subtypes (luminal A and B). We showed that all nine prognostic signatures exhibited a similar prognostic performance in the entire dataset. Their prognostic abilities are due mostly to the detection of proliferation activity. Although ER- status (basal-like) and ERBB2+ expression status correspond to bad outcome, they seem to act through elevated expression of proliferation genes and thus contain only indirect information about prognosis. Clinical variables measuring the extent of tumor progression, such as tumor size and nodal status, still add independent prognostic information to proliferation genes. 
CONCLUSION: This meta-analysis unifies various results of previous gene expression studies in breast cancer. It reveals connections between traditional prognostic factors, expression-based subtyping, and prognostic signatures, highlighting the important role of proliferation in breast cancer prognosis.
Abstract:
The focus of my PhD research was the concept of modularity. In the last 15 years, modularity has become a classic term in different fields of biology. On the conceptual level, a module is a set of interacting elements that remain mostly independent from the elements outside of the module. I used modular analysis techniques to study gene expression evolution in vertebrates. In particular, I identified "natural" modules of gene expression in mouse and human, and I showed that expression of organ-specific and system-specific genes tends to be conserved between such distant vertebrates as mammals and fishes. Also with a modular approach, I studied patterns of developmental constraints on transcriptome evolution. I showed that neither of the two commonly accepted models of the evolution of embryonic development ("evo-devo") is exclusively valid. In particular, I found that the conservation of the sequences of regulatory regions is highest during mid-development of zebrafish, and thus it supports the "hourglass model". In contrast, events of gene duplication and new gene introduction are most rare in early development, which supports the "early conservation model". In addition to the biological insights on transcriptome evolution, I have also discussed in detail the advantages of modular approaches in large-scale data analysis. Moreover, I re-analyzed several studies (published in high-ranking journals), and showed that their conclusions do not hold up under a detailed analysis. This demonstrates that complex analysis of high-throughput data requires cooperation between biologists, bioinformaticians, and statisticians.
Abstract:
Computational biology is the research area that contributes to the analysis of biological data through the development of algorithms that address significant research problems. The data from molecular biology includes DNA, RNA, protein and gene expression data. Gene expression data provides the expression level of genes under different conditions. Gene expression is the process of transcribing the DNA sequence of a gene into mRNA sequences, which in turn are later translated into proteins. The number of copies of mRNA produced is called the expression level of a gene. Gene expression data is organized in the form of a matrix. Rows in the matrix represent genes and columns in the matrix represent experimental conditions. Experimental conditions can be different tissue types or time points. Entries in the gene expression matrix are real values. Through the analysis of gene expression data it is possible to determine the behavioral patterns of genes, such as similarity of their behavior, nature of their interaction, their respective contribution to the same pathways and so on. Similar expression patterns are exhibited by the genes participating in the same biological process. These patterns have immense relevance and application in bioinformatics and clinical research. These patterns are used in the medical domain to aid more accurate diagnosis, prognosis, treatment planning, drug discovery and protein network analysis. To identify various patterns from gene expression data, data mining techniques are essential. Clustering is an important data mining technique for the analysis of gene expression data. To overcome the problems associated with clustering, biclustering is introduced. Biclustering refers to simultaneous clustering of both rows and columns of a data matrix.
Clustering is a global model, whereas biclustering is a local model. Discovering local expression patterns is essential for identifying many genetic pathways that are not apparent otherwise. It is therefore necessary to move beyond the clustering paradigm towards developing approaches which are capable of discovering local patterns in gene expression data. A bicluster is a submatrix of the gene expression data matrix. The rows and columns in the submatrix need not be contiguous as in the gene expression data matrix. Biclusters are not disjoint. Computation of biclusters is costly because one will have to consider all the combinations of columns and rows in order to find out all the biclusters. The search space for the biclustering problem is 2^(m+n), where m and n are the number of genes and conditions respectively. Usually m+n is more than 3000. The biclustering problem is NP-hard. Biclustering is a powerful analytical tool for the biologist. The research reported in this thesis addresses the problem of biclustering. Ten algorithms are developed for the identification of coherent biclusters from gene expression data. All these algorithms make use of a measure called mean squared residue to search for biclusters. The objective here is to identify the biclusters of maximum size with the mean squared residue lower than a given threshold.
All these algorithms begin the search from tightly coregulated submatrices called the seeds. These seeds are generated by the K-Means clustering algorithm. The algorithms developed can be classified as constraint-based, greedy and metaheuristic. Constraint-based algorithms use one or more of the various constraints, namely the MSR threshold and the MSR difference threshold. The greedy approach makes a locally optimal choice at each stage with the objective of finding the global optimum. In metaheuristic approaches, Particle Swarm Optimization (PSO) and variants of the Greedy Randomized Adaptive Search Procedure (GRASP) are used for the identification of biclusters. These algorithms are applied to the Yeast and Lymphoma datasets. Biologically relevant and statistically significant biclusters are identified by all these algorithms and validated using the Gene Ontology database. All these algorithms are compared with some other biclustering algorithms. Algorithms developed in this work overcome some of the problems associated with the already existing algorithms. With the help of some of the algorithms developed in this work, biclusters with very high row variance, higher than that obtained by any other algorithm using mean squared residue, are identified from both the Yeast and Lymphoma datasets. Such biclusters, which show significant changes in expression level, are highly relevant biologically.
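The mean squared residue (MSR) that these algorithms threshold can be made concrete with a short sketch. It assumes the measure is the standard one introduced by Cheng and Church: for a bicluster with gene set I and condition set J, H(I, J) is the mean over the submatrix of (a_ij - a_iJ - a_Ij + a_IJ)^2, where a_iJ, a_Ij and a_IJ are the row, column and overall means of the submatrix. Row variance is taken here as the mean of (a_ij - a_iJ)^2. The function names and toy matrices are hypothetical, and none of the thesis's ten search algorithms is reproduced.

```python
def submatrix(matrix, rows, cols):
    """Extract the (not necessarily contiguous) rows and columns of a bicluster."""
    return [[matrix[i][j] for j in cols] for i in rows]


def mean_squared_residue(matrix, rows, cols):
    """MSR of the bicluster; 0 means a perfectly additive (coherent) pattern."""
    sub = submatrix(matrix, list(rows), list(cols))
    n_r, n_c = len(sub), len(sub[0])
    row_means = [sum(r) / n_c for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    overall = sum(row_means) / n_r
    total = 0.0
    for i in range(n_r):
        for j in range(n_c):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_r * n_c)


def row_variance(matrix, rows, cols):
    """Average squared deviation of entries from their row mean within the bicluster."""
    sub = submatrix(matrix, list(rows), list(cols))
    n_r, n_c = len(sub), len(sub[0])
    total = 0.0
    for r in sub:
        m = sum(r) / n_c
        total += sum((v - m) ** 2 for v in r)
    return total / (n_r * n_c)


# Toy expression matrix: rows are genes, columns are conditions.
expr = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 0.0],
    [4.0, 5.0, 6.0, 7.0],
    [8.0, 1.0, 5.0, 2.0],
]
print(mean_squared_residue(expr, rows=[0, 1, 2], cols=[0, 1, 2]))  # coherent block
print(mean_squared_residue(expr, rows=[0, 1, 2, 3], cols=[0, 1, 2, 3]))
print(row_variance(expr, rows=[0, 1, 2], cols=[0, 1, 2]))
```

A search procedure of the kind described above would grow a seed by adding rows and columns while the MSR stays below the chosen threshold. Checking row variance alongside MSR guards against trivial biclusters whose rows are nearly constant, which also have a low residue.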