919 results for Analysis of gene expression
Abstract:
DNA microarrays provide such a huge amount of data that unsupervised methods are required to reduce the dimension of the data set and to extract meaningful biological information. This work shows that Independent Component Analysis (ICA) is a promising approach for the analysis of genome-wide transcriptomic data. The paper first presents an overview of the most popular algorithms for performing ICA. These algorithms are then applied to a microarray breast-cancer data set. Some issues concerning the application of ICA and the evaluation of the biological relevance of the results are discussed. This study indicates that ICA significantly outperforms Principal Component Analysis (PCA).
Abstract:
Computational biology is the research area that contributes to the analysis of biological data through the development of algorithms that address significant research problems. Data from molecular biology include DNA, RNA, protein, and gene expression data. Gene expression data provide the expression levels of genes under different conditions. Gene expression is the process of transcribing the DNA sequence of a gene into mRNA sequences, which are in turn translated into proteins. The number of copies of mRNA produced is called the expression level of a gene. Gene expression data are organized as a matrix: rows represent genes and columns represent experimental conditions, which can be different tissue types or time points. Entries in the gene expression matrix are real values. Through the analysis of gene expression data it is possible to determine behavioral patterns of genes, such as the similarity of their behavior, the nature of their interactions, and their respective contributions to the same pathways. Genes participating in the same biological process exhibit similar expression patterns. These patterns have immense relevance and application in bioinformatics and clinical research. In the medical domain they aid more accurate diagnosis, prognosis, treatment planning, drug discovery, and protein network analysis. Data mining techniques are essential for identifying patterns in gene expression data. Clustering is an important data mining technique for the analysis of gene expression data. Biclustering was introduced to overcome the problems associated with clustering. Biclustering refers to the simultaneous clustering of both rows and columns of a data matrix.
Clustering is a global model, whereas biclustering is a local model. Discovering local expression patterns is essential for identifying many genetic pathways that are not otherwise apparent. It is therefore necessary to move beyond the clustering paradigm towards approaches capable of discovering local patterns in gene expression data. A bicluster is a submatrix of the gene expression data matrix. The rows and columns of the submatrix need not be contiguous in the gene expression data matrix. Biclusters are not disjoint. Computing biclusters is costly because all combinations of rows and columns must be considered in order to find all biclusters. The search space for the biclustering problem is of size 2^(m+n), where m and n are the numbers of genes and conditions, respectively. Usually m+n is more than 3000. The biclustering problem is NP-hard. Biclustering is a powerful analytical tool for the biologist. The research reported in this thesis addresses the problem of biclustering. Ten algorithms are developed for the identification of coherent biclusters from gene expression data. All of these algorithms use a measure called the mean squared residue (MSR) to search for biclusters. The objective is to identify biclusters of maximum size with a mean squared residue lower than a given threshold.
All of these algorithms begin the search from tightly coregulated submatrices called seeds, which are generated by the K-Means clustering algorithm. The algorithms developed can be classified as constraint-based, greedy, and metaheuristic. Constraint-based algorithms use one or more constraints, namely the MSR threshold and the MSR difference threshold. The greedy approach makes a locally optimal choice at each stage with the objective of finding the global optimum. In the metaheuristic approaches, Particle Swarm Optimization (PSO) and variants of the Greedy Randomized Adaptive Search Procedure (GRASP) are used to identify biclusters. These algorithms are implemented on the Yeast and Lymphoma datasets. All of the algorithms identify biologically relevant and statistically significant biclusters, which are validated against the Gene Ontology database. The algorithms are compared with some other biclustering algorithms and overcome some of the problems associated with existing ones. Some of the algorithms developed in this work identify, from both the Yeast and Lymphoma data sets, biclusters with very high row variance, higher than that obtained by any other algorithm using the mean squared residue. Such biclusters, which reflect significant changes in expression level, are highly relevant biologically.
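The two quantities this abstract relies on, the mean squared residue and the row variance of a bicluster, can be illustrated with a minimal sketch. It assumes the definition commonly used in the biclustering literature, where the residue of an entry is its value minus its row mean and column mean plus the submatrix mean; the abstract itself does not spell the formula out. The function names and the toy matrices are hypothetical and are not taken from the thesis.

```python
def _submatrix(matrix, rows, cols):
    """Extract the (possibly non-contiguous) rows and columns of a bicluster."""
    return [[matrix[i][j] for j in cols] for i in rows]


def mean_squared_residue(matrix, rows, cols):
    """Mean of squared residues a_ij - rowmean_i - colmean_j + overall_mean."""
    sub = _submatrix(matrix, rows, cols)
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


def row_variance(matrix, rows, cols):
    """Mean squared deviation of each entry from its own row mean."""
    sub = _submatrix(matrix, rows, cols)
    n_rows, n_cols = len(sub), len(sub[0])
    total = 0.0
    for r in sub:
        mean = sum(r) / n_cols
        total += sum((x - mean) ** 2 for x in r)
    return total / (n_rows * n_cols)


# Hypothetical toy data: rows differ only by an additive shift, so the
# bicluster is perfectly coherent even though expression levels change.
coherent = [[1.0, 2.0, 3.0],
            [2.0, 3.0, 4.0],
            [4.0, 5.0, 6.0]]
# Hypothetical toy data with opposing trends, which is not coherent.
incoherent = [[1.0, 2.0],
              [2.0, 1.0]]

msr_coherent = mean_squared_residue(coherent, [0, 1, 2], [0, 1, 2])
var_coherent = row_variance(coherent, [0, 1, 2], [0, 1, 2])
msr_incoherent = mean_squared_residue(incoherent, [0, 1], [0, 1])
```

Under this definition a low mean squared residue alone does not make a bicluster interesting, since a constant submatrix also scores zero. That is why a high row variance is reported alongside it.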
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
Background: An important challenge for transcript counting methods such as Serial Analysis of Gene Expression (SAGE), "Digital Northern" or Massively Parallel Signature Sequencing (MPSS) is to carry out statistical analyses that account for the within-class variability, i.e., variability due to the intrinsic biological differences among sampled individuals of the same class, and not only variability due to technical sampling error. Results: We introduce a Bayesian model that accounts for the within-class variability by means of a mixture distribution. We show that the previously available approaches of aggregation in pools ("pseudo-libraries") and the Beta-Binomial model are particular cases of the mixture model. We illustrate our method with a brain tumor vs. normal comparison using SAGE data from public databases. We show examples of tags regarded as differentially expressed with high significance if the within-class variability is ignored, but clearly not so significant if one accounts for it. Conclusion: Using available information about biological replicates, one can transform a list of candidate transcripts showing differential expression into a more reliable one. Our method is freely available, under GPL/GNU copyleft, through a user-friendly web-based on-line tool or as R language scripts at the supplemental web site.
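As a rough illustration of the Beta-Binomial model named in this abstract, the sketch below evaluates its probability mass function in its textbook form. The count of a tag in a library of n tags is binomial, with a library-specific proportion drawn from a Beta(alpha, beta) distribution. This is only the special case the abstract mentions, not the authors' mixture model, and the function names and parameter values are hypothetical.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, n, alpha, beta):
    """P(K = k) for a tag seen k times among n tags, when the tag proportion
    varies between libraries as Beta(alpha, beta)."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose
               + log_beta(k + alpha, n - k + beta)
               - log_beta(alpha, beta))


def beta_binomial_variance(n, alpha, beta):
    """Variance of the count; exceeds the binomial variance when n > 1."""
    p = alpha / (alpha + beta)
    overdispersion = (alpha + beta + n) / (alpha + beta + 1)
    return n * p * (1 - p) * overdispersion


# Hypothetical parameters: 20 sampled tags, mean proportion 2 / (2 + 5).
n_tags, a, b = 20, 2.0, 5.0
pmf = [beta_binomial_pmf(k, n_tags, a, b) for k in range(n_tags + 1)]
binomial_variance = n_tags * (a / (a + b)) * (1 - a / (a + b))
```

The variance function makes the point of the abstract concrete. For the same mean proportion, the count varies more than a plain binomial would predict, so ignoring within-class variability overstates significance.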
Abstract:
We describe a genome-wide characterization of mRNA transcript levels in yeast grown on the fatty acid oleate, determined using Serial Analysis of Gene Expression (SAGE). Comparison of this SAGE library with that reported for glucose grown cells revealed the dramatic adaptive response of yeast to a change in carbon source. A major fraction (>20%) of the 15,000 mRNA molecules in a yeast cell comprised differentially expressed transcripts, which were derived from only 2% of the total number of ∼6300 yeast genes. Most of the mRNAs that were differentially expressed code for enzymes or for other proteins participating in metabolism (e.g., metabolite transporters). In oleate-grown cells, this was exemplified by the huge increase of mRNAs encoding the peroxisomal β-oxidation enzymes required for degradation of fatty acids. The data provide evidence for the existence of redox shuttles across organellar membranes that involve peroxisomal, cytoplasmic, and mitochondrial enzymes. We also analyzed the mRNA profile of a mutant strain with deletions of the PIP2 and OAF1 genes, encoding transcription factors required for induction of genes encoding peroxisomal proteins. Induction of genes under the immediate control of these factors was abolished; other genes were up-regulated, indicating an adaptive response to the changed metabolism imposed by the genetic impairment. We describe a statistical method for analysis of data obtained by SAGE.
Abstract:
The molecular mechanisms of pulmonary fibrosis are poorly understood. We have used oligonucleotide arrays to analyze the gene expression programs that underlie pulmonary fibrosis in response to bleomycin, a drug that causes lung inflammation and fibrosis, in two strains of susceptible mice (129 and C57BL/6). We then compared the gene expression patterns in these mice with 129 mice carrying a null mutation in the epithelial-restricted integrin β6 subunit (β6−/−), which develop inflammation but are protected from pulmonary fibrosis. Cluster analysis identified two distinct groups of genes involved in the inflammatory and fibrotic responses. Analysis of gene expression at multiple time points after bleomycin administration revealed sequential induction of subsets of genes that characterize each response. The availability of this comprehensive data set should accelerate the development of more effective strategies for intervention at the various stages in the development of fibrotic diseases of the lungs and other organs.
Abstract:
We have developed a technique called the generation of longer cDNA fragments from serial analysis of gene expression (SAGE) tags for gene identification (GLGI), to convert SAGE tags of 10 bases into their corresponding 3′ cDNA fragments covering hundreds of bases. A primer containing the 10-base SAGE tag is used as the sense primer, and a single-base anchored oligo(dT) primer is used as the antisense primer in PCR, together with Pfu DNA polymerase. By using this approach, a cDNA fragment extending from the SAGE tag toward the 3′ end of the corresponding sequence can be generated. Application of the GLGI technique can solve two critical issues in applying the SAGE technique: one is that a longer fragment corresponding to a SAGE tag that has no match in databases can be generated for further studies; the other is that the specific fragment corresponding to a SAGE tag can be identified from multiple sequences that match the same SAGE tag. The development of the GLGI method provides several potential applications. First, it provides a strategy for even wider application of the SAGE technique for quantitative analysis of global gene expression. Second, a combined application of SAGE/GLGI can be used to complete the catalogue of the expressed genes in human and in other eukaryotic species. Third, it can be used to identify the 3′ cDNA sequence from any exon within a gene. It can also be used to confirm the reality of exons predicted by bioinformatic tools in genomic sequences. Fourth, a combined application of SAGE/GLGI can be applied to define the 3′ boundary of expressed genes in the genomic sequences of human and other eukaryotic genomes.
Abstract:
As the study of microbes moves into the era of functional genomics, there is an increasing need for molecular tools for the analysis of a wide diversity of microorganisms. Currently, biological study of many prokaryotes of agricultural, medical, and fundamental scientific interest is limited by the lack of adequate genetic tools. We report the application of the bacterial artificial chromosome (BAC) vector to prokaryotic biology as a powerful approach to address this need. We constructed a BAC library in Escherichia coli from genomic DNA of the Gram-positive bacterium Bacillus cereus. This library provides 5.75-fold coverage of the B. cereus genome, with an average insert size of 98 kb. To determine the extent of heterologous expression of B. cereus genes in the library, we screened it for expression of several B. cereus activities in the E. coli host. Clones expressing 6 of 10 activities tested were identified in the library, namely, ampicillin resistance, zwittermicin A resistance, esculin hydrolysis, hemolysis, orange pigment production, and lecithinase activity. We analyzed selected BAC clones genetically to rapidly identify specific B. cereus loci. These results suggest that BAC libraries will provide a powerful approach for studying gene expression from diverse prokaryotes.
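The fold-coverage figure quoted in this abstract follows from simple arithmetic: total cloned DNA divided by genome size. The sketch below shows that calculation together with the standard estimate of the chance that a given locus is present in at least one clone. Only the 98 kb average insert size comes from the abstract. The clone count and genome size are illustrative assumptions chosen to give a coverage close to the reported 5.75-fold, and the function names are hypothetical.

```python
def fold_coverage(n_clones, avg_insert_kb, genome_kb):
    """Total cloned DNA expressed as multiples of the genome size."""
    return n_clones * avg_insert_kb / genome_kb


def probability_locus_present(n_clones, avg_insert_kb, genome_kb):
    """Chance that a given locus is in at least one clone, treating each
    insert as an independent random sample of the genome."""
    fraction_per_clone = avg_insert_kb / genome_kb
    return 1.0 - (1.0 - fraction_per_clone) ** n_clones


# 98 kb is from the abstract; the other two values are assumptions.
ASSUMED_CLONES = 323
ASSUMED_GENOME_KB = 5500

coverage = fold_coverage(ASSUMED_CLONES, 98, ASSUMED_GENOME_KB)
p_present = probability_locus_present(ASSUMED_CLONES, 98, ASSUMED_GENOME_KB)
```

With coverage near six-fold, this estimate puts the chance of missing any single locus well below one percent, which is why a library of this depth is suitable for activity screens.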
Abstract:
Neurotrophic factors such as nerve growth factor (NGF) promote a wide variety of responses in neurons, including differentiation, survival, plasticity, and repair. Such actions often require changes in gene expression. To identify the regulated genes and thereby to more fully understand the NGF mechanism, we carried out serial analysis of gene expression (SAGE) profiling of transcripts derived from rat PC12 cells before and after NGF-promoted neuronal differentiation. Multiple criteria supported the reliability of the profile. Approximately 157,000 SAGE tags were analyzed, representing at least 21,000 unique transcripts. Of these, nearly 800 were regulated by 6-fold or more in response to NGF. Approximately 150 of the regulated transcripts have been matched to named genes, the majority of which were not previously known to be NGF-responsive. Functional categorization of the regulated genes provides insight into the complex, integrated mechanism by which NGF promotes its multiple actions. It is anticipated that as genomic sequence information accrues the data derived here will continue to provide information about neurotrophic factor mechanisms.
Abstract:
Early detection is an effective means of reducing cancer mortality. Here, we describe a highly sensitive high-throughput screen that can identify panels of markers for the early detection of solid tumor cells disseminated in peripheral blood. The method is a two-step combination of differential display and high-sensitivity cDNA arrays. In a primary screen, differential display identified 170 candidate marker genes differentially expressed between breast tumor cells and normal breast epithelial cells. In a secondary screen, high-sensitivity arrays assessed expression levels of these genes in 48 blood samples, 22 from healthy volunteers and 26 from breast cancer patients. Cluster analysis identified a group of 12 genes that were elevated in the blood of cancer patients. Permutation analysis of individual genes defined five core genes (P ≤ 0.05, permax test). As a group, the 12 genes generally distinguished accurately between healthy volunteers and patients with breast cancer. Mean expression levels of the 12 genes were elevated in 77% (10 of 13) of untreated invasive cancer patients, whereas cluster analysis correctly classified volunteers and patients (P = 0.0022, Fisher's exact test). Quantitative real-time PCR confirmed array results and indicated that the sensitivity of the assay (1:2 × 10⁸ transcripts) was sufficient to detect disseminated solid tumor cells in blood. Expression-based blood assays developed with the screening approach described here have the potential to detect and classify solid tumor cells originating from virtually any primary site in the body.
Abstract:
The transcriptional effects of deregulated myc gene overexpression are implicated in tumorigenesis in a spectrum of experimental and naturally occurring neoplasms. In follicles of the chicken bursa of Fabricius, myc induction of B-cell neoplasia requires a target cell population present during early bursal development and progresses through preneoplastic transformed follicles to metastatic lymphomas. We developed a chicken immune system cDNA microarray to analyze broad changes in gene expression that occur during normal embryonic B-cell development and during myc-induced neoplastic transformation in the bursa. The number of mRNAs showing at least 3-fold change was greater during myc-induced lymphomagenesis than during normal development, and hierarchical cluster analysis of expression patterns revealed that levels of several hundred mRNAs varied in concert with levels of myc overexpression. A set of 41 mRNAs were most consistently elevated in myc-overexpressing preneoplastic and neoplastic cells, most involved in processes thought to be subject to regulation by Myc. The mRNAs for another cluster of genes were overexpressed in neoplasia independent of myc expression level, including a small subset with the expression signature of embryonic bursal lymphocytes. Overexpression of myc, and some of the genes overexpressed with myc, may be important for generation of preneoplastic transformed follicles. However, expression profiles of late metastatic tumors showed a large variation in concert with myc expression levels, and some showed minimal myc overexpression. Therefore, high-level myc overexpression may be more important in the early induction of these lymphomas than in maintenance of late-stage metastases.