977 results for GENE-ONTOLOGY
Abstract:
Background: The number of genome-wide association studies (GWAS) has increased rapidly in the past couple of years, leading to the identification of genes associated with various diseases. The next step in translating these findings into biomedically useful information is to elucidate how these genes act. However, GWAS often implicate genes whose functions are currently unknown; for example, MYEOV, ANKLE1, TMEM45B and ORAOV1 have been associated with breast cancer, but their molecular functions are unknown. Results: We carried out Bayesian inference of Gene Ontology (GO) term annotations of genes by exploiting the directed acyclic graph structure of GO and the network of protein-protein interactions (PPIs). The approach rests on the premise that two proteins that interact biophysically are in physical proximity to each other, possess complementary molecular functions, and play roles in related biological processes. Predicted GO terms were ranked by their relative association scores. The approach was evaluated quantitatively by plotting precision against recall, and F-scores (the harmonic mean of precision and recall) against varying thresholds. At a threshold of ~30 (the top 30 GO terms in the ranked list), precisions of ~58% and ~40% were obtained for protein localization and function, respectively. We compared the approach with function prediction based on semantic similarity among nodes in an ontology, with those similarities incorporated into a k-nearest-neighbor classifier, and our results compared favorably. Conclusions: This approach was applied to predict the cellular component and molecular function GO terms of all human proteins that have interacting partners with at least one known GO annotation. The list of predictions is available at http://severus.dbmi.pitt.edu/engo/GOPRED.html.
We present the algorithm, the evaluations and the results of the computational predictions, especially for genes that GWAS have associated with diseases, which are of translational interest.
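The evaluation above ranks predicted GO terms and reports precision, recall and F-score when the top-k terms are treated as predictions. The sketch below shows that calculation for a single protein. It assumes a ranked list of predicted terms and a set of known annotations. The term labels are hypothetical placeholders, and this is not the authors' code.

```python
def precision_recall_at_k(ranked_terms, true_terms, k):
    """Precision, recall and F-score when the top-k ranked terms are called positive."""
    predicted = set(ranked_terms[:k])
    true_terms = set(true_terms)
    hits = len(predicted & true_terms)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(true_terms) if true_terms else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-score: harmonic mean of precision and recall
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


def f_score_curve(ranked_terms, true_terms, thresholds):
    """F-score at each rank threshold (e.g. top 10, 20, 30 terms)."""
    return {k: precision_recall_at_k(ranked_terms, true_terms, k)[2] for k in thresholds}


# Hypothetical ranked predictions and known annotations for one protein
ranked = ["GO_A", "GO_B", "GO_C", "GO_D"]
known = {"GO_A", "GO_C", "GO_E"}
print(precision_recall_at_k(ranked, known, 2))
print(f_score_curve(ranked, known, [1, 2, 3]))
```

Averaging these per-protein values over many proteins gives curves like those described. Whether the original work used micro- or macro-averaging is not stated in the abstract.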
Abstract:
Here we introduce PIGOK, a computer database that allows rapid retrieval of physicochemical properties, Gene Ontology, and Kyoto Encyclopedia of Genes and Genomes information for a protein or a list of proteins. We applied PIGOK to analyze Schizosaccharomyces pombe proteins showing differential expression under oxidative stress and identified their biological functions and pathways. The database is available on the Internet at http://pc4-133.ludwig.ucl.ac.uk/pigok.html.
Abstract:
Staphylococcus aureus (S. aureus) is a prominent human and livestock pathogen that is widely investigated using omic technologies. Critically, the relevant resources have limited availability, low visibility or are scattered, so robust network and statistical contextualisation of the resulting data is generally under-represented. Here, we present novel meta-analyses of freely accessible molecular network and gene ontology annotation resources for interpreting S. aureus omics data. We then apply the gene ontology annotation resources to publicly available data. This demonstrates their value and ability (or lack thereof) to summarise and statistically interpret the emergent properties of gene expression and protein abundance changes. The analysis provides simple metrics for network selection. It also shows how available gene ontology annotations are, and what impact the choice of annotation source can have on the contextualisation of bacterial omics data.
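Statistical interpretation of expression or abundance changes with GO annotations is commonly done with an over-representation test. The sketch below uses a one-sided hypergeometric test. That choice is an assumption, since the abstract does not name the statistic used. The counts are purely illustrative.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """One-sided over-representation p-value, P(X >= k), X ~ Hypergeometric(N, K, n).

    N: annotated genes in the background (e.g. all annotated genes of a genome)
    K: background genes annotated with the GO term
    n: genes in the study list (e.g. differentially expressed genes)
    k: study-list genes annotated with the GO term
    Assumption: a hypergeometric test is used; other tests are possible.
    """
    total = comb(N, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


# Illustrative counts only: 12 of 200 study genes carry a term held by 40 of 2600 genes
print(hypergeom_enrichment_p(2600, 40, 200, 12))
```

In practice, one such p-value is computed per GO term, followed by a multiple-testing correction. Which annotation source supplies K and N is exactly the choice whose impact the abstract examines.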
Abstract:
In the present study, we identified a novel asthma susceptibility gene, NPSR1 (neuropeptide S receptor 1), on chromosome 7p14.3 using a positional cloning strategy. An earlier significant linkage result in Finnish asthma families from Kainuu was confirmed in two independent cohorts: asthma families from Quebec, Canada, and allergy families from North Karelia, Finland. The linkage region was narrowed down to a 133-kb segment by a hierarchical genotyping method. The observed 77-kb haplotype block contained 7 haplotypes and showed a similar pattern of risk and non-risk haplotypes in all three populations studied. All seven haplotypes occur in all three populations at frequencies > 2%. Significantly elevated relative risks were detected for elevated total IgE (immunoglobulin E) or asthma, with risk effects of the gene variants ranging from 1.4 to 2.5. NPSR1 belongs to the G protein-coupled receptor (GPCR) family, with a topology of seven transmembrane domains. NPSR1 has 9 exons, and its two main transcripts, A and B, encode proteins of 371 and 377 amino acids, respectively. We detected low but ubiquitous expression of NPSR1-B in various tissues and endogenous cell lines, whereas NPSR1-A has a more restricted expression pattern. Both isoforms were expressed in the lung epithelium. We observed aberrant expression levels of NPSR1-B in smooth muscle of asthmatic bronchi compared with healthy bronchi. In an experimental mouse model, induced lung inflammation resulted in elevated Npsr1 levels. Furthermore, we demonstrated that activation of NPSR1 by its endogenous agonist, neuropeptide S (NPS), significantly inhibited the growth of stable cell lines overexpressing NPSR1-A (NPSR1-A cells). To determine which target genes are regulated by the NPS-NPSR1 pathway, we stimulated NPSR1-A cells with NPS. Differentially expressed genes were then identified using the Affymetrix HGU133Plus2 GeneChip.
A total of 104 genes were significantly up-regulated and 42 down-regulated 6 h after NPS administration. The up-regulated genes included many neuronal genes and some putative susceptibility genes for respiratory disorders. In Gene Ontology enrichment analysis, the biological process terms cell proliferation, morphogenesis and immune response were among the most altered. Quantitative reverse-transcriptase PCR confirmed the up-regulation of four genes: matrix metallopeptidase 10 (MMP10), INHBA (activin A), interleukin 8 (IL8) and EPH receptor A2 (EPHA2). In conclusion, we identified a novel asthma susceptibility gene, NPSR1, on chromosome 7p14.3. NPS-NPSR1 represents a novel pathway that regulates cell proliferation and immune responses and thus may be functionally relevant to the pathogenesis of asthma.
Abstract:
Background: Recent research on glioblastoma (GBM) has focused on deriving gene signatures that predict prognosis. The present study evaluated the mRNA expression of selected genes and correlated it with outcome to arrive at a prognostic gene signature. Methods: Patients with GBM (n = 123) were prospectively recruited, treated with a uniform protocol and followed up. Expression of 175 genes in GBM tissue was determined using qRT-PCR. A supervised principal component analysis was performed, followed by derivation of a gene signature. The signature was independently validated using TCGA data. Gene Ontology and KEGG pathway analyses were carried out on patients from the TCGA cohort. Results: A 14-gene signature was identified that predicted outcome in GBM. A weighted gene (WG) score was an independent predictor of survival in multivariate analysis, both in the present cohort (HR = 2.507; B = 0.919; p < 0.001) and in the TCGA cohort. Risk stratification by the standardized WG score classified patients into low- and high-risk groups. These groups predicted survival both in our cohort (p < 0.001) and in the TCGA cohort (p = 0.001). Pathway analysis of the most differentially regulated genes (n = 76) between the low- and high-risk groups showed that activated inflammatory/immune response pathways and the mesenchymal subtype were associated with the high-risk group. Conclusion: We have identified a 14-gene expression signature that can predict survival in GBM patients. A network analysis revealed activation of the inflammatory response pathway specifically in the high-risk group. These findings may have implications for understanding gliomagenesis, developing targeted therapies and selecting high-risk cancer patients for alternative adjuvant therapies.
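The weighted gene (WG) score condenses the signature genes into one number per patient, which is then standardized and used for risk stratification. The sketch below rests on three assumptions that the abstract does not specify:

- the score is a weighted sum of expression values;
- the weights are supplied externally (for example, model coefficients);
- patients are split at the median of the standardized score.

The gene names, weights and values are hypothetical.

```python
from statistics import mean, median, pstdev


def wg_scores(expression, weights):
    """Weighted sum of signature-gene expression for each patient.

    expression: {patient: {gene: value}}; weights: {gene: weight}
    Assumption: a linear weighted sum; the study's exact weighting is not given.
    """
    return {
        patient: sum(weights[g] * values[g] for g in weights)
        for patient, values in expression.items()
    }


def stratify(scores):
    """Standardize scores to z-scores and label patients above the median as high risk.

    Assumption: a median cut-off on the standardized score.
    """
    mu = mean(scores.values())
    sd = pstdev(scores.values())
    if sd == 0:
        raise ValueError("all scores are identical; cannot standardize")
    z = {p: (s - mu) / sd for p, s in scores.items()}
    cut = median(z.values())
    labels = {p: ("high" if v > cut else "low") for p, v in z.items()}
    return labels, z


# Hypothetical two-gene signature and four patients
weights = {"GENE1": 0.5, "GENE2": -0.3}
expression = {
    "P1": {"GENE1": 2, "GENE2": 1},
    "P2": {"GENE1": 4, "GENE2": 0},
    "P3": {"GENE1": 1, "GENE2": 2},
    "P4": {"GENE1": 3, "GENE2": 3},
}
labels, z = stratify(wg_scores(expression, weights))
print(labels)
```

In a real analysis the cut-off would be fixed on the training cohort and applied unchanged to a validation cohort such as TCGA.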
Abstract:
MOTIVATION: Synthetic lethal interactions represent pairs of genes whose individual mutations are not lethal, while the double mutation of both genes does incur lethality. Several studies have shown a correlation between functional similarity of genes and their distances in networks based on synthetic lethal interactions. However, there is a lack of algorithms for predicting gene function from synthetic lethality interaction networks. RESULTS: In this article, we present a novel technique called kernelROD for gene function prediction from synthetic lethal interaction networks based on kernel machines. We apply our novel algorithm to Gene Ontology functional annotation prediction in yeast. Our experiments show that our method leads to improved gene function prediction compared with state-of-the-art competitors and that combining genetic and congruence networks leads to a further improvement in prediction accuracy.
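The abstract does not give the details of kernelROD. To illustrate the general idea of kernel-based function prediction on a gene network, the sketch below swaps in a different, well-known technique. It builds a truncated matrix-exponential kernel exp(βA) on a toy synthetic-lethal network. Each unannotated gene is then scored by its summed kernel similarity to genes already carrying a given GO term. The network, gene names, β and the truncation depth are all hypothetical.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of lists."""
    inner, cols = len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(X))]


def exp_kernel(adj, beta=0.5, terms=10):
    """Truncated series for exp(beta * A) = sum_t (beta * A)^t / t!.

    Not kernelROD: a generic graph kernel used here for illustration only.
    """
    n = len(adj)
    B = [[beta * adj[i][j] for j in range(n)] for i in range(n)]
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # t = 0
    K = [row[:] for row in term]
    for t in range(1, terms + 1):
        # Next term of the series: previous term times beta*A, divided by t
        term = [[x / t for x in row] for row in matmul(term, B)]
        K = [[K[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return K


def score_unannotated(genes, edges, annotated, beta=0.5):
    """Score each unannotated gene by summed kernel similarity to annotated genes."""
    idx = {g: i for i, g in enumerate(genes)}
    n = len(genes)
    adj = [[0.0] * n for _ in range(n)]
    for a, b in edges:
        adj[idx[a]][idx[b]] = adj[idx[b]][idx[a]] = 1.0
    K = exp_kernel(adj, beta)
    return {g: sum(K[idx[g]][idx[a]] for a in annotated)
            for g in genes if g not in annotated}


# Hypothetical network: a chain g1-g2-g3 and a separate pair g4-g5; g1 has the GO term
genes = ["g1", "g2", "g3", "g4", "g5"]
edges = [("g1", "g2"), ("g2", "g3"), ("g4", "g5")]
print(score_unannotated(genes, edges, {"g1"}))
```

Direct neighbours of annotated genes score highest, more distant genes score lower, and genes in disconnected components score zero. The real method and its combination of genetic and congruence networks may differ substantially.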
Abstract:
The skeleton is of fundamental importance in research in comparative vertebrate morphology, paleontology, biomechanics, developmental biology, and systematics. Motivated by research questions that require computational access to and comparative reasoning across the diverse skeletal phenotypes of vertebrates, we developed a module of anatomical concepts for the skeletal system, the Vertebrate Skeletal Anatomy Ontology (VSAO), to accommodate and unify the existing skeletal terminologies for the species-specific (mouse, the frog Xenopus, zebrafish) and multispecies (teleost, amphibian) vertebrate anatomy ontologies. Previous differences between these terminologies prevented even simple queries across databases pertaining to vertebrate morphology. This module of upper-level and specific skeletal terms currently includes 223 defined terms and 179 synonyms that integrate skeletal cells, tissues, biological processes, organs (skeletal elements such as bones and cartilages), and subdivisions of the skeletal system. The VSAO is designed to integrate with other ontologies, including the Common Anatomy Reference Ontology (CARO), Gene Ontology (GO), Uberon, and Cell Ontology (CL), and it is freely available to the community to be updated with additional terms required for research. Its structure accommodates anatomical variation among vertebrate species in development, structure, and composition. Annotation of diverse vertebrate phenotypes with this ontology will enable novel inquiries across the full spectrum of phenotypic diversity.
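Cross-database queries over an anatomy ontology such as VSAO usually rely on its subsumption hierarchy. A query for a general term should also retrieve annotations made to its more specific descendants. The sketch below shows this on a toy is_a graph. The labels are illustrative and are not actual VSAO identifiers or relations. Real ontologies also use other relations, such as part_of, which this sketch ignores.

```python
def ancestors(term, parents):
    """All terms reachable from `term` by following is_a edges upward (term excluded)."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def query(term, annotations, parents):
    """Entities annotated to `term` itself or to any of its descendants."""
    return {entity for entity, t in annotations
            if t == term or term in ancestors(t, parents)}


# Hypothetical is_a hierarchy (child -> list of parents) and phenotype annotations
parents = {
    "femur": ["bone element"],
    "bone element": ["skeletal element"],
    "cartilage element": ["skeletal element"],
    "skeletal element": ["anatomical structure"],
}
annotations = [
    ("mouse femur phenotype", "femur"),
    ("zebrafish cartilage phenotype", "cartilage element"),
    ("frog skin phenotype", "skin"),
]
print(query("skeletal element", annotations, parents))
```

The same upward propagation, often called the true-path rule, is what allows annotations from different species-specific terminologies to be compared once they share upper-level terms.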
Abstract:
Empirically derived phenotypic measurements have the potential to enhance gene-finding efforts in schizophrenia (SCZ). Previous research based on factor analyses of symptoms has typically included schizoaffective cases. Deriving factor loadings from an analysis of narrowly defined schizophrenia cases only could yield more sensitive factor scores for gene pathway and gene ontology analyses. Using an Irish family sample, this study (1) factor-analyzed clinician-rated Operational Criteria Checklist items in cases with schizophrenia only, (2) scored the full sample based on these factor loadings, and (3) performed genome-wide association, gene-based and gene-pathway analyses of these SCZ-based symptom factors (final N = 507). Three factors emerged from the analysis of the schizophrenia cases: a manic, a depressive and a positive symptom factor. In gene-based analyses of these factors, multiple genes had q < 0.01. Of particular interest are the findings for PTPRG and WBP1L, both previously implicated by the Psychiatric Genomics Consortium study of SCZ. Our results suggest that variants in these genes might also act as modifiers of SCZ symptoms. Gene pathway analyses of the first factor indicated over-representation of glutamatergic transmission, GABA-A receptor and cyclic GMP pathways. These results suggest that these pathways may differentially influence affective symptom presentation in schizophrenia.
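The gene-based results above are thresholded on q-values (q < 0.01). The Benjamini-Hochberg step-up procedure is one common way to turn a list of p-values into such false-discovery-rate-adjusted values. The abstract does not say which FDR method was used, so this is an assumption, and the p-values below are made up.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order.

    Assumption: BH step-up FDR control; the study's exact method is not stated.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down; the running minimum keeps q monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Made-up gene-level p-values
pvals = [0.01, 0.04, 0.03, 0.005]
qvals = bh_qvalues(pvals)
significant = [i for i, q in enumerate(qvals) if q < 0.01]
print(qvals, significant)
```

With so few tests, none of these made-up genes passes q < 0.01. In a genome-wide gene-based analysis, m is in the tens of thousands, and the same procedure applies unchanged.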
Abstract:
One of the major challenges in systems biology is to understand how a biological system responds to external perturbations or internal signalling under different biological conditions. Genome-wide transcriptomic profiling of cellular systems under various chemical perturbations reveals certain features of the chemicals through their transcriptomic expression profiles. The insights obtained may help to establish connections between human diseases, associated genes and therapeutic drugs. The main objective of this study was to systematically analyse cellular gene expression data under various drug treatments to elucidate drug-feature-specific transcriptomic signatures. We first extracted drug-related information (drug features) from the textual descriptions of DrugBank entries using text-mining techniques. We then proposed a novel statistical method based on orthogonal least squares learning, which obtains drug-feature-specific signatures by integrating gene expression with DrugBank data. To obtain robust signatures from noisy input datasets, we applied a stringent ensemble approach combining three techniques: resampling, leave-one-out cross validation and aggregation. Validation experiments showed that the proposed method can extract biologically meaningful drug-feature-specific gene expression signatures. Regulatory network analysis showed that most of the signature genes are connected to common hub genes. Gene Ontology analysis further showed that these hub genes are related to general drug metabolism. Each set of genes has relatively few interactions with other sets, indicating the modular nature of each signature and its drug-feature specificity. Gene Ontology analysis also showed that each set of drug feature (DF)-specific genes was indeed enriched in biological processes related to the drug feature.
These experiments demonstrate the potential of the method for predicting certain features of new drugs from their transcriptomic profiles, providing a useful methodological framework and a valuable resource for drug development and characterization.
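The drug-feature-specific signatures are obtained with orthogonal least squares (OLS) learning, whose exact formulation is not given in the abstract. The sketch below illustrates the classic OLS forward-selection step. It orthogonalizes candidate regressors (here, genes) by Gram-Schmidt and greedily picks the one with the largest error reduction ratio (ERR) with respect to a target vector. The target stands for a hypothetical drug-feature indicator across samples. The data are made up, and the ensemble steps (resampling, leave-one-out cross validation, aggregation) are not shown.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def ols_forward_select(X, y, n_select):
    """Greedy orthogonal least squares forward selection.

    X: dict mapping a candidate name (e.g. a gene) to its values across samples
    y: target vector across the same samples (hypothetical drug-feature profile)
    Returns a list of (name, error_reduction_ratio) in selection order.
    """
    yy = dot(y, y)
    basis = []        # orthogonalized vectors of the candidates selected so far
    chosen = set()
    selected = []
    for _ in range(n_select):
        best = None
        for name, x in X.items():
            if name in chosen:
                continue
            # Gram-Schmidt step: remove components along already selected directions
            w = list(x)
            for q in basis:
                coef = dot(w, q) / dot(q, q)
                w = [wi - coef * qi for wi, qi in zip(w, q)]
            ww = dot(w, w)
            if ww < 1e-12:
                continue  # (numerically) collinear with the selected set: adds nothing
            # Error reduction ratio: share of the target's energy explained by w
            err = dot(w, y) ** 2 / (ww * yy)
            if best is None or err > best[1]:
                best = (name, err, w)
        if best is None:
            break
        chosen.add(best[0])
        selected.append((best[0], best[1]))
        basis.append(best[2])
    return selected


# Made-up expression profiles over four samples
X = {"geneA": [1, 2, 3, 4], "geneB": [1, 0, 1, 0], "geneC": [2, 4, 6, 8]}
y = [2, 2, 4, 4]
print(ols_forward_select(X, y, 2))
```

Here geneA is picked first. In the second round geneC, a multiple of geneA, is skipped as collinear, and geneB is picked. The two ERRs sum to 1 because y lies exactly in the span of geneA and geneB.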
A simple genetic basis for complex social behaviour mediates widespread gene expression differences.
Abstract:
A remarkable social polymorphism is controlled by a single Mendelian factor in the fire ant Solenopsis invicta. A genomic element marked by the gene Gp-9 determines whether workers tolerate one or many fertile queens in their colony. Gp-9 was recently shown to be part of a supergene with two nonrecombining variants, SB and Sb. SB/SB and SB/Sb queens differ in how they initiate new colonies, and in many physiological traits, for example odour and maturation rate. To understand how a single genetic element can affect all these traits, we used a microarray to compare gene expression patterns between SB/SB and SB/Sb queens of three different age classes: 1-day-old unmated queens, 11-day-old unmated queens and mated, fully reproductive queens collected from mature field colonies. The number of genes that were differentially expressed between SB/SB and SB/Sb queens of the same age class was smallest in 1-day-old queens, maximal in 11-day-old queens and intermediate in reproductive queens. Gene ontology analysis showed that SB/SB queens upregulate reproductive genes faster than SB/Sb queens. For all age classes, genes inside the supergene were overrepresented among the differentially expressed genes. Consistent with the hypothesized greater number of transposons in the Sb supergene, 13 transposon genes were upregulated in SB/Sb queens. Viral genes were also upregulated in SB/Sb mature queens, consistent with the known greater parasite load in colonies headed by SB/Sb queens compared with colonies headed by SB/SB queens. Eighteen differentially expressed genes between reproductive queens were involved in chemical signalling. Our results suggest that many genes in the supergene are involved in regulating social organization and queen phenotypes in fire ants.
Abstract:
Neuropathic pain (NP) results from damage to or dysfunction of the peripheral or central nervous system. Its symptoms are severe and difficult to treat, so targeting the mechanisms of NP and how they translate into symptoms may offer a better therapeutic approach. After peripheral injury, neuronal hyperexcitability develops in the dorsal root ganglia (DRG) and in the dorsal horn (DH) of the spinal cord. We aimed to identify transcriptional changes in injured nociceptors and in adjacent non-injured nociceptors. We also examined the corresponding laminae I and II of the DH that receive their inputs. We studied these changes one week after injury in the spared nerve injury (SNI) model of NP. We used laser capture microdissection (LCM) to procure specific cell types (enriched in injured or non-injured nociceptors) and the corresponding laminae, combined with microarray transcriptional analysis. In addition, we studied the functional properties and currents of sodium channels (Nav1s) in injured and neighbouring non-injured DRG neurons. Microarray analysis revealed significant, distinct expression patterns both peripherally, between injured and non-injured DRG neurons, and centrally, between the projection areas of injured and non-injured neurons.
We observed changes in injured nociceptors (1561 genes; >1.5-fold change, >77% of pairwise comparisons) and in their corresponding DH laminae (618 genes). Non-injured nociceptors showed only minor changes (60 genes), although their corresponding DH laminae changed substantially (459 genes). At the periphery, Gene Ontology analysis showed the involvement of multiple biological processes in injured neurons, such as signal transduction, cytoskeleton organization and stress responses. In contrast, functional over-representation in non-injured neurons was found only for metabolic or developmental processes. In the superficial laminae of the dorsal horn, we found changes in immune and inflammatory processes in the DH area receiving inputs from injured neurons. The DH area corresponding to non-injured neurons instead showed changes associated with synaptic organization and transmission. Further transcriptional analysis of Nav1s indicated several changes in injured neurons. Functional analyses of Nav1s revealed no difference in tetrodotoxin-sensitive (TTX-S) current densities between injured and non-injured neurons, despite changes in the mRNA levels of TTX-S Nav1 subunits. The voltage dependence of steady-state inactivation of tetrodotoxin-resistant (TTX-R) currents was shifted to more positive potentials in both injured and non-injured neurons. The rate of recovery from inactivation of TTX-S and TTX-R currents was accelerated in injured neurons. These changes may alter neuronal electrogenesis. Taken together, these findings suggest that different mechanisms operate in injured neurons and in adjacent non-injured ones. Moreover, central changes after injury are probably driven differently depending on whether a DH area receives inputs from injured or non-injured neurons. These distinct and complex modulations may contribute to NP.
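Differentially expressed genes were called with a fold-change threshold (>1.5-fold) combined with a pairwise-comparison criterion (>77%). The abstract does not define the pairwise criterion precisely. The sketch below assumes it means the fraction of all injured-versus-control replicate pairs in which the gene changes by more than the fold-change threshold in the same direction. The replicate values are hypothetical and are taken to be on a linear (not log) scale.

```python
from itertools import product


def passes_filter(case, control, fold=1.5, min_fraction=0.77):
    """True if a gene changes by > `fold` in one direction in > min_fraction of pairs.

    case, control: replicate expression values (linear scale) for one gene.
    Assumption: "pairwise comparison" = every case replicate vs every control replicate.
    """
    pairs = list(product(case, control))
    up = sum(1 for a, b in pairs if a > fold * b)
    down = sum(1 for a, b in pairs if b > fold * a)
    # Require consistency in a single direction (all up or all down)
    return max(up, down) / len(pairs) > min_fraction


# Hypothetical replicates for three genes
print(passes_filter([300, 320, 280], [100, 110, 90]))   # consistent up-regulation
print(passes_filter([100, 105, 95], [100, 98, 102]))    # no change
print(passes_filter([200, 100], [100, 100]))            # change in only half of the pairs
```

Requiring agreement across most replicate pairs, rather than only on mean values, guards against a single outlying replicate driving the fold change.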
Abstract:
Dinoflagellates are unicellular eukaryotes found in both freshwater and marine environments. They are best known for causing toxic algal blooms called 'red tides', for their symbiosis with corals, and for their substantial contribution to carbon fixation in the oceans. At the molecular level, they are also known for their unique nuclear characteristics. Their chromosomes generally contain an immense amount of DNA, packaged and condensed in a liquid-crystalline state rather than into nucleosomes. Nuclear-encoded genes are often present in multiple copies arranged in tandem, and no transcriptional regulatory element, including the TATA box, has yet been observed. The unique chromatin organization of dinoflagellates suggests that these organisms need different strategies to control gene expression. In this study, I addressed this problem using the photosynthetic dinoflagellate Lingulodinium polyedrum as a model. L. polyedrum is of particular interest because it has several circadian (daily) rhythms. To date, all studies of gene expression during circadian changes have shown regulation at the translational level. In my research, I combined transcriptomic, proteomic and phosphoproteomic approaches with biochemical studies. The aim was to provide insight into the mechanics of gene regulation in dinoflagellates, with an emphasis on the importance of phosphorylation in the circadian system of L. polyedrum. The absence of histone proteins and nucleosomes is a peculiarity of dinoflagellates. Using RNA-Seq technology, I found complete sequences encoding histones and histone-modifying enzymes.
L. polyedrum therefore expresses conserved histone-coding sequences, but the protein levels are below the detection limit of Western blot immunodetection. The RNA-Seq data were also used to generate a transcriptome, that is, a list of the genes expressed by L. polyedrum. A sequence homology search was first performed to classify the transcripts into various categories (Gene Ontology; GO). This analysis revealed a low abundance of transcription factors, among which sequences containing a Cold Shock domain were surprisingly predominant. In L. polyedrum, many genes are repeated in tandem. RNA-Seq sequences were aligned with genomic copies of tandemly arranged genes to examine whether polycistronic transcripts are present. Polycistronic transcription has been proposed to explain the lack of promoter elements in the intergenic regions of these genes. This analysis also showed very high conservation of the coding sequences of the tandemly arranged genes. The transcriptome was also used to help identify proteins sequenced by mass spectrometry. A phosphoprotein-enriched fraction proved particularly well suited to high-throughput analysis. We compared the phosphoproteomes from two different times of day. A large proportion of the proteins whose phosphorylation state varies over time belong to the RNA-binding and translation categories. The transcriptome was also used to define the repertoire of kinases present in L. polyedrum. This repertoire was then used to classify the phosphorylated peptides that are potential targets of these kinases.
Several peptides identified as phosphorylated by Casein Kinase 2 (CK2) come from various RNA-binding proteins. CK2 is a kinase known to be involved in the eukaryotic circadian clock. Several prokaryotic and eukaryotic systems use Cold Shock domain proteins to modulate gene expression. To assess whether some of the many such proteins identified in the transcriptome might do the same in L. polyedrum, we examined the response of cells to cold temperatures. Cold temperatures rapidly induced encystment, a state in which the cells become metabolically inactive in order to withstand unfavourable environmental conditions. Changes in the phosphoprotein profile appear to be the major factor causing cyst formation. Phosphosites predicted to be phosphorylated by CK2 are the most strongly reduced class in cysts. This is an interesting finding, because the bioluminescence rhythm confirms that the clock is stopped in the cyst.
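The phosphopeptides were classified according to the kinases that could target them. As an illustration only, the sketch below flags phosphosites matching the widely cited minimal CK2 consensus, S/T-x-x-D/E: an acidic residue three positions C-terminal to the phosphorylated serine or threonine. The classification in this work was based on the kinase repertoire defined from the transcriptome and may have used different or richer rules. The peptide and site positions are made up.

```python
def ck2_candidate_sites(peptide, phospho_positions):
    """Return the 0-based phosphosite positions that match the S/T-x-x-D/E motif.

    Assumption: minimal CK2 consensus only; real kinase assignment is more complex.
    """
    return [
        pos for pos in phospho_positions
        if peptide[pos] in "ST"            # phosphorylated residue is Ser or Thr
        and pos + 3 < len(peptide)         # motif must fit inside the peptide
        and peptide[pos + 3] in "DE"       # acidic residue at position +3
    ]


# Made-up peptide with three observed phosphosites
print(ck2_candidate_sites("AKSPEEDLTRS", [2, 8, 10]))
```

Counting matching sites per sample, for example in motile cells versus cysts, would give the kind of class-level comparison described above.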
Abstract:
Computational biology is the research area that contributes to the analysis of biological data by developing algorithms that address significant research problems. Data from molecular biology include DNA, RNA, protein and gene expression data. Gene expression data give the expression levels of genes under different conditions. Gene expression is the process of transcribing the DNA sequence of a gene into mRNA, which is later translated into protein. The number of mRNA copies produced is called the expression level of a gene. Gene expression data are organized as a matrix whose rows represent genes and whose columns represent experimental conditions, such as different tissue types or time points. Entries in the gene expression matrix are real values. Analysis of gene expression data makes it possible to determine behavioral patterns of genes, such as the similarity of their behavior, the nature of their interactions and their respective contributions to the same pathways. Genes participating in the same biological process exhibit similar expression patterns. These patterns have immense relevance in bioinformatics and clinical research. In the medical domain, they aid more accurate diagnosis, prognosis and treatment planning, as well as drug discovery and protein network analysis. Data mining techniques are essential for identifying such patterns in gene expression data. Clustering is an important data mining technique for analyzing gene expression data, and biclustering was introduced to overcome the problems associated with clustering. Biclustering refers to the simultaneous clustering of both the rows and the columns of a data matrix.
Clustering is a global model, whereas biclustering is a local one. Discovering local expression patterns is essential for identifying many genetic pathways that are not otherwise apparent. It is therefore necessary to move beyond the clustering paradigm towards approaches capable of discovering local patterns in gene expression data. A bicluster is a submatrix of the gene expression data matrix, and its rows and columns need not be contiguous in the original matrix. Biclusters need not be disjoint. Computing biclusters is costly because finding all of them requires considering every combination of rows and columns. The search space of the biclustering problem is 2^(m+n), where m and n are the numbers of genes and conditions, respectively, and m+n usually exceeds 3000. The biclustering problem is NP-hard. Biclustering is nevertheless a powerful analytical tool for biologists. The research reported in this thesis addresses the biclustering problem. Ten algorithms are developed for identifying coherent biclusters in gene expression data. All of them use a measure called the mean squared residue (MSR) to search for biclusters. The objective is to identify biclusters of maximum size whose mean squared residue is below a given threshold.
All of these algorithms begin the search from tightly co-regulated submatrices called seeds, which are generated by the K-Means clustering algorithm. The algorithms developed can be classified as constraint-based, greedy and metaheuristic. Constraint-based algorithms use one or more constraints, namely the MSR threshold and the MSR difference threshold. The greedy approach makes a locally optimal choice at each stage with the objective of finding the global optimum. The metaheuristic approaches use Particle Swarm Optimization (PSO) and variants of the Greedy Randomized Adaptive Search Procedure (GRASP) to identify biclusters. The algorithms are applied to the Yeast and Lymphoma datasets. All of them identify biologically relevant and statistically significant biclusters, which are validated using the Gene Ontology database. The algorithms are compared with several other biclustering algorithms, and they overcome some of the problems of existing algorithms. Some of them identify biclusters with very high row variance from both the Yeast and Lymphoma datasets. This row variance exceeds that reported by any other algorithm that uses the mean squared residue. Such biclusters, which show significant changes in expression level, are highly relevant biologically.
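All of the algorithms above score candidate biclusters with the mean squared residue, and row variance is used to compare results. For a bicluster with row set I and column set J, the Cheng and Church definition is:

MSR(I, J) = 1/(|I||J|) * sum over i in I, j in J of (a_ij - a_iJ - a_Ij + a_IJ)^2

Here a_iJ, a_Ij and a_IJ are the row, column and overall means of the submatrix.

The sketch below computes MSR and row variance for a submatrix selected by (possibly non-contiguous) row and column indices. It also includes a deliberately simplified, hypothetical greedy step that grows a seed by adding columns while the MSR stays at or below a threshold δ. It is not one of the thesis's ten algorithms, and the toy matrix is made up.

```python
def submatrix(data, rows, cols):
    """Rows and columns need not be contiguous in the full matrix."""
    return [[data[i][j] for j in cols] for i in rows]


def mean_squared_residue(data, rows, cols):
    """Cheng-Church mean squared residue of the bicluster (rows, cols)."""
    sub = submatrix(data, rows, cols)
    n_r, n_c = len(sub), len(sub[0])
    row_means = [sum(r) / n_c for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    overall = sum(row_means) / n_r
    return sum(
        (sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(n_r) for j in range(n_c)
    ) / (n_r * n_c)


def row_variance(data, rows, cols):
    """Mean squared deviation of entries from their row means (flat rows score low)."""
    sub = submatrix(data, rows, cols)
    n_c = len(sub[0])
    return sum(sum((x - sum(r) / n_c) ** 2 for x in r) for r in sub) / (len(sub) * n_c)


def grow_columns(data, rows, cols, delta):
    """Hypothetical simplified greedy step: add columns while MSR <= delta."""
    cols = list(cols)
    for j in range(len(data[0])):
        if j not in cols and mean_squared_residue(data, rows, cols + [j]) <= delta:
            cols.append(j)
    return cols


# Toy matrix: rows 0 and 1 shift together (additive pattern) across columns 0-2
data = [[1, 2, 3, 9],
        [11, 12, 13, 0],
        [5, 1, 7, 2]]
print(mean_squared_residue(data, [0, 1], [0, 1, 2]))
print(grow_columns(data, [0, 1], [0, 1], delta=0.0))
```

A perfectly additive bicluster has MSR 0, however far apart its rows are in absolute level. Row variance is therefore reported alongside MSR, so that trivially flat biclusters are not favoured.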