Digital Library

15 results for "Information Gene"

in the Digital Library of Intellectual Production of the University of São Paulo

Assessing the gain of biological data integration in gene networks inference

Relevance: 30.00%

Abstract:

Background: A current challenge in gene annotation is to define gene function in the context of the network of relationships rather than in terms of single genes. The inference of gene networks (GNs) has emerged as an approach to better understand the biology of a system and to study how the components of such a network interact with each other and keep their functions stable. However, there are generally insufficient data to accurately recover GNs from expression levels, leading to the curse of dimensionality, in which the number of variables exceeds the number of samples. One way to mitigate this problem is to integrate biological data, instead of using only the expression profiles, in the inference process. The use of diverse types of biological information in inference methods has increased significantly, with the aim of better recovering the connections between genes and reducing false positives. What makes this strategy attractive is the possibility of confirming known connections through the included biological data, and of discovering new relationships between genes from the expression data. Although several data integration studies have improved the performance of network inference methods, the actual contribution of each type of biological information to the improvement obtained is not clear. Methods: We propose a methodology to include biological information in an inference algorithm in order to assess the prediction gain from using biological information and expression profiles together. We also evaluated and compared the gain from adding four types of biological information: (a) protein-protein interaction, (b) Rosetta stone fusion proteins, (c) KEGG and (d) KEGG+GO. Results and conclusions: This work presents a first comparison of the gain from using prior biological information in the inference of GNs, considering a eukaryotic organism (P. falciparum).
Our results indicate that information based on direct interaction can produce a larger gain than data about less specific relationships such as GO or KEGG. Also, as expected, the results show that the use of biological information is a very important approach for improving inference. We also compared the gain in the inference of the global network with that for the hubs only. The results indicate that the use of biological information can improve the identification of the most connected proteins.
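One simple way to see how prior biological information can change the edges recovered from expression data is to blend an expression-based score with a prior confidence score and compare precision against a reference network. The sketch below is a minimal illustration under assumptions: a linear blend of absolute Pearson correlation with a prior in [0, 1] (for example from PPI data), and precision among the top-k edges as the measure of gain. The function names, the `weight` parameter and the toy data are hypothetical and do not reproduce the inference algorithm used in the study.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation; 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def score_edges(expr, prior, weight):
    """Score each gene pair as (1 - weight) * |corr| + weight * prior.

    expr: gene -> expression profile (all the same length)
    prior: frozenset({g1, g2}) -> confidence in [0, 1] (hypothetical format)
    weight: contribution of prior information (0 = expression only)
    """
    scores = {}
    for g1, g2 in combinations(sorted(expr), 2):
        pair = frozenset((g1, g2))
        r = abs(pearson(expr[g1], expr[g2]))
        scores[pair] = (1 - weight) * r + weight * prior.get(pair, 0.0)
    return scores

def precision_at_k(scores, true_edges, k):
    """Fraction of the k highest-scoring pairs that are true edges."""
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(1 for edge in top if edge in true_edges) / k

def integration_gain(expr, prior, true_edges, k, weight=0.5):
    """Precision with prior information minus precision with expression only."""
    base = precision_at_k(score_edges(expr, prior, 0.0), true_edges, k)
    integrated = precision_at_k(score_edges(expr, prior, weight), true_edges, k)
    return integrated - base

# Toy data: A-C are spuriously co-expressed; A-B is the true, PPI-supported edge.
expr = {"A": [1, 2, 3, 4, 5], "B": [1, 3, 2, 5, 4],
        "C": [2, 4, 6, 8, 10], "D": [2, 1, 3, 1, 2]}
prior = {frozenset({"A", "B"}): 1.0}
truth = {frozenset({"A", "B"})}
print(integration_gain(expr, prior, truth, k=1))  # 1.0
```

Restricting `true_edges` and the scored pairs to edges touching high-degree genes would give the analogous hub-only comparison.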

Gene expression profiles displayed by peripheral blood mononuclear cells from patients with type 2 diabetes mellitus focusing on biological processes implicated on the pathogenesis of the disease

Relevance: 30.00%

Abstract:

Patients with type 2 diabetes mellitus (T2DM) exhibit insulin resistance associated with obesity and an inflammatory response, in addition to an increased level of oxidative DNA damage as a consequence of the hyperglycemic condition and the generation of reactive oxygen species (ROS). To provide information on the mechanisms involved in the pathophysiology of T2DM, we analyzed the transcriptional expression patterns of peripheral blood mononuclear cells (PBMCs) from patients with T2DM compared with non-diabetic subjects, investigating several biological processes: inflammatory and immune responses, responses to oxidative stress and hypoxia, fatty acid processing, and DNA repair. PBMCs were obtained from 20 T2DM patients and eight non-diabetic subjects. Total RNA was hybridized to Agilent Whole Human Genome 4x44K one-color oligo microarrays. Microarray data were analyzed using the GeneSpring GX 11.0 software (Agilent). We used the BRB-ArrayTools software (gene set analysis, GSA) to investigate significant gene sets and the Genomica tool to study a possible influence of clinical features on gene expression profiles. We showed that PBMCs from T2DM patients presented significant changes in gene expression, exhibiting 1320 differentially expressed genes compared with the control group. A large number of these genes were involved in biological processes implicated in the pathogenesis of T2DM. Among the genes with high fold-change values, the up-regulated ones were associated with fatty acid metabolism and protection against lipid-induced oxidative stress, while the down-regulated ones were implicated in the suppression of pro-inflammatory cytokine production and in DNA repair. Moreover, we identified two significant signaling pathways: adipocytokine, related to insulin resistance; and ceramide, related to oxidative stress and induction of apoptosis.
In addition, expression profiles were not influenced by patient features such as age, gender, obesity, pre/post-menopause status, neuropathy, glycemia, and HbA1c percentage. Hence, by studying the expression profiles of PBMCs, we identified quantitative and qualitative differences and similarities between T2DM patients and non-diabetic individuals, contributing new perspectives for a better understanding of the disease. (C) 2012 Elsevier B.V. All rights reserved.
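The abstract sorts differentially expressed genes by fold change into up- and down-regulated groups. A minimal sketch of that split, assuming linear-scale intensities, fold change taken as the ratio of group means, and a hypothetical two-fold cut-off (|log2 FC| >= 1). This is not GeneSpring's procedure; a real analysis would also apply a statistical test with multiple-testing correction.

```python
from math import log2
from statistics import mean

def log2_fold_change(case, control):
    """log2 of mean(case) / mean(control) for linear-scale intensities."""
    return log2(mean(case) / mean(control))

def high_fold_change(expr_case, expr_control, threshold=1.0):
    """Split genes with |log2 FC| >= threshold into up- and down-regulated lists."""
    up, down = [], []
    for gene in expr_case:
        lfc = log2_fold_change(expr_case[gene], expr_control[gene])
        if lfc >= threshold:
            up.append(gene)
        elif lfc <= -threshold:
            down.append(gene)
    return sorted(up), sorted(down)

# Hypothetical intensities for three genes (patients vs. controls).
case = {"G1": [8, 8], "G2": [1, 1], "G3": [4, 4]}
control = {"G1": [2, 2], "G2": [4, 4], "G3": [4, 4]}
print(high_fold_change(case, control))  # (['G1'], ['G2'])
```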

The neuroimmune changes induced by cohabitation with an Ehrlich tumor-bearing cage mate rely on olfactory information

Relevance: 30.00%

Abstract:

Cohabitation for 14 days with Ehrlich tumor-bearing mice was shown to increase locomotor activity, to decrease hypothalamic noradrenaline (NA) levels, to increase NA turnover, and to decrease both innate immune responses and the animals' resistance to tumor growth. Cage mates of B16F10 melanoma-bearing mice were also reported to show neuroimmune changes. Chemosignals released by Ehrlich tumor-bearing mice have been reported to be relevant for the changes in neutrophil activity induced by cohabitation. The present experiment was designed to further analyze the effects of odor cues on the neuroimmune changes induced by cohabitation with a sick cage mate. Specifically, the relevance of chemosignals released by an Ehrlich tumor-bearing mouse was assessed on the following: behavior (open field and plus maze); hypothalamic NA levels and turnover; plasma adrenaline (A) and NA levels; and host resistance to tumor growth. To meet these objectives, devices specifically constructed to analyze the influence of chemosignals released from tumor-bearing mice were employed. The results show that deprivation of the odor cues released by Ehrlich tumor-bearing mice reversed the behavioral, neurochemical and immune changes induced by cohabitation. Mice use scents for intraspecies communication in many social contexts. Tumors produce volatile organic compounds that are released into the atmosphere through breath, sweat, and urine. Our results strongly suggest that volatile compounds released by Ehrlich tumor-injected mice are perceived by their conspecifics, inducing the neuroimmune changes reported for cohabitation with a sick companion. (C) 2011 Elsevier Inc. All rights reserved.

Identifying regulational alterations in gene regulatory networks by state space representation of vector autoregressive models and variational annealing

Relevance: 30.00%

Abstract:

Background: In analyzing the effects of cell treatments such as drug dosing, identifying changes in gene network structure between normal and treated cells is a key task. One possible way to identify these changes is to compare the structures of networks estimated separately from data on normal and treated cells. However, this approach usually fails to estimate accurate gene networks because of the limited length of time series data and measurement noise. Thus, approaches are needed that use time series data from both conditions efficiently to identify changes in regulation. Methods: We propose a new statistical approach that is based on the state space representation of the vector autoregressive model and estimates gene networks under two different conditions in order to identify changes in regulation between the conditions. In the mathematical model of our approach, hidden binary variables are newly introduced to indicate the presence of regulations under each condition. The hidden binary variables enable efficient use of the data: data from both conditions are used for regulations common to both, whereas only the corresponding condition's data are used for condition-specific regulations. In addition, the similarity of the networks under the two conditions is automatically taken into account through the design of the potential function for the hidden binary variables. For the estimation of the hidden binary variables, we derive a new variational annealing method that searches for the configuration of the binary variables maximizing the marginal likelihood. Results: For the performance evaluation, we use time series data from two topologically similar synthetic networks, and confirm that our proposed approach estimates commonly existing regulations as well as changes in regulation with higher coverage and precision than other existing approaches in almost all experimental settings.
For a real data application, our proposed approach is applied to time series data from normal human lung cells and from human lung cells treated by stimulating EGF receptors and dosing with the anticancer drug Gefitinib. In the treated lung cells, a cancer cell condition is simulated by the stimulation of EGF receptors, but this effect would be expected to be counteracted by the selective inhibition of EGF receptors by Gefitinib. However, gene expression profiles actually differ between the conditions, and the genes related to the identified changes are considered possible off-targets of Gefitinib. Conclusions: On synthetically generated time series data, our proposed approach identifies changes in regulation more accurately than existing methods. By applying the proposed approach to time series data on normal and treated human lung cells, candidate off-target genes of Gefitinib were found. According to published clinical information, one of these genes may be related to a factor in interstitial pneumonia, which is a known side effect of Gefitinib.
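The role of the hidden binary variables can be illustrated with a lag-1 vector autoregressive model in which a 0/1 indicator per condition switches each regulation on or off. This is a sketch under stated assumptions: a VAR(1), coefficient values shared across conditions, and masks that are known rather than estimated. The authors' state space formulation, potential function and variational annealing search are not reproduced, and all names and values below are hypothetical.

```python
import random

def simulate_var1(coef, mask, steps, noise_sd=0.1, seed=0):
    """Simulate x_t = (coef * mask) x_{t-1} + e_t for one condition.

    coef[i][j]: strength of regulation j -> i (hypothetical values)
    mask[i][j]: 1 if regulation j -> i is present in this condition, else 0
    """
    rng = random.Random(seed)
    n = len(coef)
    x = [[rng.gauss(0, 1) for _ in range(n)]]
    for _ in range(steps - 1):
        prev = x[-1]
        x.append([
            sum(coef[i][j] * mask[i][j] * prev[j] for j in range(n))
            + rng.gauss(0, noise_sd)
            for i in range(n)
        ])
    return x

def regulation_changes(mask_normal, mask_treated):
    """List (target, regulator, change) for regulations present in only one condition."""
    changes = []
    for i, (row_n, row_t) in enumerate(zip(mask_normal, mask_treated)):
        for j, (a, b) in enumerate(zip(row_n, row_t)):
            if a and not b:
                changes.append((i, j, "lost"))
            elif b and not a:
                changes.append((i, j, "gained"))
    return changes

# Hypothetical 3-gene example: regulation 0 -> 1 is switched off by the treatment.
coef = [[0.5, 0.0, 0.0], [0.8, 0.5, 0.0], [0.0, 0.6, 0.5]]
normal = [[1, 0, 0], [1, 1, 0], [0, 1, 1]]
treated = [[1, 0, 0], [0, 1, 0], [0, 1, 1]]
normal_series = simulate_var1(coef, normal, steps=20, seed=1)
treated_series = simulate_var1(coef, treated, steps=20, seed=2)
print(regulation_changes(normal, treated))  # [(1, 0, 'lost')]
```

Regulations whose indicator is 1 in both masks are the ones for which data from both conditions would be pooled.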

Impact of hypoxia on IGF-I, IGF-II, IGFBP-3, ALS and IGFBP-1 regulation and on IGF1R gene expression in children

Relevance: 30.00%

Abstract:

Hypoxia is one of many factors involved in the regulation of the IGF system. However, no information is available regarding the regulation of the IGF system by acute hypoxia in humans. Objective: The aim of this study was to evaluate the effect of acute hypoxia on the IGF system of children. Design: Twenty-seven previously healthy children (14 boys and 13 girls) aged 15 days to 9.5 years were studied in two different situations: during a hypoxemic state (HS) due to acute respiratory distress and after full recovery to a normoxemic state (NS). In both situations, oxygen saturation was assessed with a pulse oximeter, and blood samples were collected for the determination of serum IGF-I, IGF-II, IGFBP-1, IGFBP-3, ALS and insulin by ELISA, of GH by fluoroimmunometric assay, and for the analysis of IGF1R gene expression in peripheral lymphocytes by quantitative real-time PCR. Data were paired and analyzed with the non-parametric Wilcoxon test. Results: Oxygen saturation was significantly lower during HS than in NS (P<0.0001). IGF-I and IGF-II levels were lower during HS than in NS (P<0.0001 and P=0.0004, respectively). IGFBP-3 levels were also lower in HS than in NS (P=0.0002), while ALS and basal GH levels were higher during HS (P=0.0015 and P=0.014, respectively). Moreover, IGFBP-1 levels were higher during HS than in NS (P=0.004). No difference was found in insulin levels. IGF1R mRNA expression, calculated as 2^(-ΔΔCt), was higher during HS than in NS (P=0.03). Conclusion: These results confirm a role of hypoxia in the regulation of the IGF system in humans as well. This effect could be direct on the liver and/or mediated by GH, and it is not restricted to hepatocytes but involves other cell types. During acute hypoxia, a combination of alterations usually associated with reduced IGF action was observed. The higher expression of IGF1R mRNA may reflect an up-regulation of the transcriptional process. (C) 2012 Elsevier Ltd. All rights reserved.
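IGF1R expression is reported with the comparative Ct method, 2^(-ΔΔCt). A minimal sketch of that calculation, assuming roughly 100% amplification efficiency for both genes and treating the normoxemic sample as the calibrator. The Ct values are hypothetical, and the reference gene used in the study is not named in the abstract.

```python
def relative_expression(ct_target_test, ct_ref_test, ct_target_cal, ct_ref_cal):
    """Comparative Ct (2^-ΔΔCt) fold change of a target gene in a test sample
    relative to a calibrator sample, each normalized to a reference gene."""
    delta_test = ct_target_test - ct_ref_test  # ΔCt in the test sample (e.g. HS)
    delta_cal = ct_target_cal - ct_ref_cal     # ΔCt in the calibrator (e.g. NS)
    delta_delta = delta_test - delta_cal
    return 2 ** (-delta_delta)

# Hypothetical Ct values: the target amplifies one cycle earlier in HS.
print(relative_expression(24.0, 18.0, 25.0, 18.0))  # 2.0
```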

rpb2 is a reliable reference gene for quantitative gene expression analysis in the dermatophyte Trichophyton rubrum

Relevance: 30.00%

Abstract:

The selection of reference genes used for data normalization to quantify gene expression by real-time PCR (qRT-PCR) is crucial for the accuracy of this technique. In spite of this, little information regarding such genes is available for qRT-PCR gene expression analyses in pathogenic fungi. Thus, we investigated the suitability of eight candidate reference genes in isolates of the human dermatophyte Trichophyton rubrum subjected to several environmental challenges, such as drug exposure, interaction with human nail and skin, and heat stress. The stability of these genes was determined with the geNorm, NormFinder and BestKeeper programs. The gene with the most stable expression in the majority of the conditions tested was rpb2 (DNA-dependent RNA polymerase II), which was validated in three T. rubrum strains. Moreover, the combination of the rpb2 and chs1 (chitin synthase) genes provided the most reliable qRT-PCR data normalization in T. rubrum under a broad range of biological conditions. To the best of our knowledge, this is the first report on the selection of reference genes for qRT-PCR data normalization in dermatophytes, and the results of these studies should permit further analysis of gene expression under several experimental conditions with improved accuracy and reliability.
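Of the three programs, geNorm ranks candidate reference genes by a stability value M: for each gene, the average standard deviation of its log2 expression ratio with every other candidate across samples, where a lower M means more stable expression. A minimal sketch of one M computation, assuming linear-scale relative quantities. The full geNorm procedure iteratively removes the least stable gene and recomputes M, which is omitted here, and the gene names and values are hypothetical.

```python
from math import log2
from statistics import mean, stdev

def genorm_m(expr):
    """geNorm stability value M for each candidate reference gene.

    expr: gene -> relative quantities (linear scale), one per sample,
          in the same sample order for every gene.
    """
    m = {}
    for j in expr:
        variations = [
            stdev([log2(a / b) for a, b in zip(expr[j], expr[k])])
            for k in expr
            if k != j
        ]
        m[j] = mean(variations)
    return m

# Hypothetical data: g1 and g2 co-vary perfectly; g3 does not follow them.
print(genorm_m({"g1": [1, 2, 4], "g2": [2, 4, 8], "g3": [1, 1, 1]}))
# {'g1': 0.5, 'g2': 0.5, 'g3': 1.0}
```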

Using Prior Information from the Medical Literature in GWAS of Oral Cancer Identifies Novel Susceptibility Variant on Chromosome 4 - the AdAPT Method

Relevance: 30.00%

Abstract:

Background: Genome-wide association studies (GWAS) require large sample sizes to obtain adequate statistical power, but it may be possible to increase power by incorporating complementary data. In this study we investigated the feasibility of automatically retrieving information from the medical literature and leveraging this information in GWAS. Methods: We developed a method, Adjusting Association Priors with Text (AdAPT), that searches PubMed abstracts for pre-assigned keywords and key concepts and uses this information to assign each single nucleotide polymorphism (SNP) a prior probability of association with the phenotype of interest. Association results from a GWAS can subsequently be ranked in the context of these priors using the Bayes False Discovery Probability (BFDP) framework. We initially tested AdAPT by comparing the rankings of known susceptibility alleles in a previous lung cancer GWAS, and subsequently applied it in a two-phase GWAS of oral cancer. Results: Known lung cancer susceptibility SNPs were consistently ranked higher by AdAPT BFDPs than by p-values. In the oral cancer GWAS, we sought to replicate the top five SNPs as ranked by AdAPT BFDPs, of which rs991316, located in the ADH gene region of 4q23, displayed a statistically significant association with oral cancer risk in the replication phase (per-rare-allele log-additive p-value [p(trend)] = 2.5 × 10^-3). The combined OR for having one additional rare allele was 0.83 (95% CI: 0.76-0.90), and this association was independent of previously identified susceptibility SNPs in this gene region that are associated with overall upper aerodigestive tract (UADT) cancer. We also investigated whether rs991316 was associated with other UADT cancers, but no additional association signal was found.
Conclusion: This study highlights the potential utility of systematically incorporating prior knowledge from the medical literature in genome-wide analyses using the AdAPT methodology. AdAPT is available online (url: http://services.gate.ac.uk/lld/gwas/service/config).
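The BFDP ranking can be sketched with Wakefield's approximate Bayes factor, which combines a SNP's estimated log odds ratio, its standard error and a normal prior variance W under the alternative, and then weights the result by the SNP's prior probability of association. In this sketch, setting W so that 95% of prior odds ratios fall below 1.5, and the two prior probabilities compared, are assumptions. How AdAPT converts literature evidence into priors is not reproduced. The OR and CI reused below are the ones reported for rs991316.

```python
from math import exp, log, sqrt

def se_from_ci(lower, upper, z=1.96):
    """Standard error of log(OR) recovered from a 95% CI on the OR scale."""
    return (log(upper) - log(lower)) / (2 * z)

def prior_variance(or_upper=1.5, z=1.96):
    """Prior variance W on log(OR) placing 95% of prior mass below `or_upper` (assumed)."""
    return (log(or_upper) / z) ** 2

def approximate_bayes_factor(beta_hat, se, w):
    """Wakefield's approximate Bayes factor of H0 (no association) over H1."""
    v = se ** 2
    z2 = (beta_hat / se) ** 2
    return sqrt((v + w) / v) * exp(-0.5 * z2 * w / (v + w))

def bfdp(beta_hat, se, prior_prob, w):
    """Bayesian false discovery probability given a prior probability of association."""
    abf = approximate_bayes_factor(beta_hat, se, w)
    prior_odds_null = (1 - prior_prob) / prior_prob
    return abf * prior_odds_null / (abf * prior_odds_null + 1)

# Same association estimate, two hypothetical literature-based priors.
se = se_from_ci(0.76, 0.90)
w = prior_variance()
for prior in (0.001, 0.01):
    print(prior, round(bfdp(log(0.83), se, prior, w), 3))
```

A higher literature-based prior lowers the BFDP for the same data, which is how text-derived priors can reorder SNPs relative to a p-value ranking.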

Mannose-binding lectin and MBL-associated serine protease-2 gene polymorphisms in a Brazilian population from Rio de Janeiro

Relevance: 30.00%

Abstract:

Mannose-binding lectin (MBL) is a protein able to bind to carbohydrate patterns on pathogen membranes; upon MBL binding, its associated serine protease, MBL-associated serine protease 2 (MASP2), is autoactivated, promoting activation of complement via the lectin pathway. For both the MBL2 and MASP2 genes, polymorphism frequencies vary greatly between ethnic groups, and this has to be carefully considered when performing genetic studies. While polymorphisms in the MBL-encoding gene (MBL2) have been associated, depending on ethnicity, with several diseases in different populations, little is known about the distribution of MASP2 gene polymorphisms in human populations. The aim of our study was thus to determine the frequencies of MBL2 (exon 1 and promoter) and MASP2 (p.D371Y) polymorphisms in a Brazilian population from Rio de Janeiro. A total of 294 blood donor samples were genotyped for 27 polymorphisms in the MBL2 gene by direct sequencing of a region spanning from the promoter polymorphism H/L rs11003125 to the rs1800451 polymorphism (at codon 57 in the first exon of the gene). Genotyping for MASP2 p.D371Y was carried out using fluorogenic probes. To our knowledge, this is the first study reporting the prevalence of the MASP2 p.D371Y polymorphism in a Brazilian population. The C allele frequency (39%) is intermediate between the 14% reported in Europeans and the 90% reported in Sub-Saharan Africans. MBL2 polymorphism frequencies were quite comparable to those previously reported for admixed Brazilians. Both the MBL2 and MASP2 polymorphism frequencies reported in our study for the admixed Brazilian population are intermediate between those reported in Europeans and Africans, reflecting the ethnic composition of the southern Brazilian population, estimated to derive from an admixture of Caucasian (31%), African (34%) and Native American (33%) populations.
In conclusion, our population genetic study describes the frequencies of MBL2 and MASP2 functional SNPs in a population from Rio de Janeiro, adding new information on the distribution of these SNPs in a previously unanalysed Brazilian population and thus providing a new genetic tool for evaluating the association of MBL2 and MASP2 functional SNPs with diseases in Brazil, with particular emphasis on the state of Rio de Janeiro.
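Allele frequencies such as the 39% reported for the MASP2 C allele come from counting allele copies across genotyped individuals. A minimal sketch with hypothetical genotype counts (not the study's data), together with the Hardy-Weinberg expected genotype counts that are often checked alongside them.

```python
def allele_frequency(genotype_counts, allele):
    """Frequency of `allele` from biallelic genotype counts,
    e.g. {"GG": n1, "GC": n2, "CC": n3}; each genotype is two characters."""
    total_alleles = 2 * sum(genotype_counts.values())
    copies = sum(g.count(allele) * n for g, n in genotype_counts.items())
    return copies / total_alleles

def hardy_weinberg_expected(p, n):
    """Expected counts (homozygous p, heterozygous, homozygous q) for n individuals."""
    q = 1 - p
    return p * p * n, 2 * p * q * n, q * q * n

# Hypothetical counts for 100 individuals.
counts = {"GG": 50, "GC": 40, "CC": 10}
print(allele_frequency(counts, "C"))  # 0.3
```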

Vitrification of primordial germ cells using whole embryos for gene-banking in loach, Misgurnus anguillicaudatus

Relevance: 30.00%

Abstract:

In gene-banking, primordial germ cells (PGCs), which are the embryonic precursors of germ cells, are useful for cryopreservation because PGCs have the potential to differentiate into both eggs and sperm via germ-line chimeras. Here, we have established vitrification methods for PGC cryopreservation using 12- to 17-somite stage embryos of the loach, Misgurnus anguillicaudatus, which were dechorionated, deyolked and injected with green fluorescent protein (GFP)-nos1 3'UTR mRNA to visualize their PGCs. To optimize the cryopreservation medium for vitrification, the toxicity of cryoprotectants was analyzed. Different concentrations (2, 3, 4 and 5 M) of dimethyl sulfoxide (DMSO), methanol (MeOH), ethylene glycol (EG) and propylene glycol (PG) were tested as cryoprotectants. Of these, 5 M DMSO showed significantly high toxicity. Based on this information, three combinations, DMP (2 M (14.2% [v/v]) DMSO, 2 M (8.1% [v/v]) MeOH and 2 M (14.4% [v/v]) PG), DP (2 M (14.2% [v/v]) DMSO and 4 M (28.7% [v/v]) PG) and DE (2.1 M (15% [v/v]) DMSO and 2.7 M (15% [v/v]) EG), were evaluated for their toxicity and their efficacy for PGC cryopreservation using two types of equilibration step: direct immersion in the cryopreservation medium (one-step) and serial exposure to half and then full concentrations of the cryopreservation medium (two-step). Viable PGCs were obtained from post-thaw embryos cryopreserved with DP and DE using both one- and two-step equilibration. Although DP showed the highest toxicity, it gave the highest survival rate of embryonic cells after cryopreservation. When PGCs recovered from vitrified embryos were transplanted into host embryos at the blastula stage, the transplanted PGCs were able to migrate to the host genital ridge in the same way as endogenous PGCs. This suggests that our methods could be useful for creating germ-line chimeras for the production of gametes from PGCs of cryopreserved embryos.
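The percent (v/v) values quoted next to each cryoprotectant concentration can be cross-checked, assuming the concentrations are molar (mol/L): molarity times molar mass gives grams per litre, dividing by the density of the neat liquid gives millilitres per litre, and dividing by 10 gives % (v/v). The densities below are approximate room-temperature values and are assumptions. Volume changes on mixing are ignored.

```python
def percent_v_v(molarity, molar_mass, density):
    """Approximate % (v/v) of a neat liquid cryoprotectant at a given molarity.

    molarity: mol/L; molar_mass: g/mol; density: g/mL.
    mol/L * g/mol = g/L; divided by g/mL = mL per L; divided by 10 = mL per 100 mL.
    """
    return molarity * molar_mass / density / 10

# Approximate molar masses (g/mol) and densities (g/mL); assumed values.
CRYOPROTECTANTS = {
    "DMSO": (78.13, 1.10),
    "MeOH": (32.04, 0.792),
    "EG": (62.07, 1.113),
}

print(round(percent_v_v(2, *CRYOPROTECTANTS["DMSO"]), 1))  # 14.2
```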

Bayesian model accounting for within-class biological variability in Serial Analysis of Gene Expression (SAGE)

Relevance: 30.00%

Abstract:

Background: An important challenge for transcript counting methods such as Serial Analysis of Gene Expression (SAGE), "Digital Northern" or Massively Parallel Signature Sequencing (MPSS) is to carry out statistical analyses that account for within-class variability, i.e., variability due to intrinsic biological differences among sampled individuals of the same class, and not only variability due to technical sampling error. Results: We introduce a Bayesian model that accounts for within-class variability by means of a mixture distribution. We show that the previously available approaches of aggregation in pools ("pseudo-libraries") and the Beta-Binomial model are particular cases of the mixture model. We illustrate our method with a brain tumor vs. normal comparison using SAGE data from public databases. We show examples of tags regarded as differentially expressed with high significance if within-class variability is ignored, but clearly less significant once it is accounted for. Conclusion: Using available information about biological replicates, one can transform a list of candidate transcripts showing differential expression into a more reliable one. Our method is freely available under the GPL/GNU copyleft license, through a user-friendly web-based online tool or as R scripts on a supplementary website.
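The Beta-Binomial model, one of the special cases mentioned, shows why ignoring within-class variability overstates significance. If a tag's true proportion varies between individuals as Beta(alpha, beta), its count variance is inflated relative to the binomial variance from technical sampling alone. The sketch below covers only this special case, not the authors' full mixture model, and the parameter values are hypothetical.

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_pmf(k, n, alpha, beta):
    """P(k copies of a tag among n sequenced tags) when the tag's true proportion
    varies between individuals of the same class as Beta(alpha, beta)."""
    return comb(n, k) * exp(log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))

def binomial_variance(n, p):
    """Count variance from technical sampling alone."""
    return n * p * (1 - p)

def beta_binomial_variance(n, alpha, beta):
    """Count variance including within-class biological variability."""
    p = alpha / (alpha + beta)
    return n * p * (1 - p) * (alpha + beta + n) / (alpha + beta + 1)

# Hypothetical tag with mean proportion 0.1 in a library of 20 tags.
print(binomial_variance(20, 0.1), beta_binomial_variance(20, 2.0, 18.0))
```

As alpha + beta grows with the mean proportion held fixed, the Beta-Binomial variance approaches the binomial one, i.e. the case of no biological variability.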

Modeling gene expression regulatory networks with the sparse vector autoregressive model

Relevance: 30.00%

Abstract:

Background: To understand the molecular mechanisms underlying important biological processes, a detailed description of the networks of gene products involved is required. In order to define and understand such molecular networks, several statistical methods have been proposed in the literature to estimate gene regulatory networks from time-series microarray data. However, several problems still need to be overcome. Firstly, information flow needs to be inferred, in addition to the correlation between genes. Secondly, we usually try to identify large networks from a large number of genes (parameters) originating from a smaller number of microarray experiments (samples). In this situation, which is rather frequent in bioinformatics, it is difficult to perform statistical tests using methods that model large gene-gene networks. In addition, most models rely on dimension reduction using clustering techniques; therefore, the resulting network is not a gene-gene network but a module-module network. Here, we present the Sparse Vector Autoregressive (SVAR) model as a solution to these problems. Results: We have applied the SVAR model to estimate gene regulatory networks based on gene expression profiles obtained from time-series microarray experiments. Through extensive simulations applying the SVAR method to artificial regulatory networks, we show that SVAR can infer true positive edges even when the number of samples is smaller than the number of genes. Moreover, it is possible to control for false positives, a significant advantage compared with other methods described in the literature, which are based on ranks or score functions. By applying SVAR to actual HeLa cell cycle gene expression data, we were able to identify well-known transcription factor targets.
Conclusion: The proposed SVAR method is able to model gene regulatory networks in the frequent situations in which the number of samples is lower than the number of genes, making it possible to naturally infer partial Granger causalities without any a priori information. In addition, we present a statistical test to control the false discovery rate, which was not previously possible using other gene regulatory network models.
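One common way to obtain a sparse VAR is to regress each gene's value at time t on all genes at time t-1 with an L1 (lasso) penalty, so that most coefficients are exactly zero and a nonzero A[i][j] suggests that gene j Granger-causes gene i. The sketch below uses plain coordinate-descent lasso with a fixed penalty on a toy 3-gene series. Whether this matches the authors' SVAR estimator is an assumption, and their false discovery rate test is not reproduced.

```python
import random

def soft_threshold(value, lam):
    """Shrink value toward zero by lam (the lasso proximal step)."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso(X, y, lam, sweeps=100):
    """Coordinate descent for (1 / 2n) * ||y - Xb||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    residual = list(y)
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            rho = sum(X[i][j] * (residual[i] + X[i][j] * b[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                b[j] = new
    return b

def sparse_var1(series, lam):
    """Sparse VAR(1): row i of the result holds the estimated regulators of gene i."""
    X = series[:-1]
    n_genes = len(series[0])
    return [lasso(X, [row[i] for row in series[1:]], lam) for i in range(n_genes)]

def simulate_example(steps=500, seed=1):
    """Toy series (hypothetical): gene 0 drives gene 1 with a one-step delay;
    genes 0 and 2 are independent noise."""
    rng = random.Random(seed)
    x0 = [rng.gauss(0, 1) for _ in range(steps)]
    x2 = [rng.gauss(0, 1) for _ in range(steps)]
    x1 = [0.0] + [0.8 * x0[t - 1] + rng.gauss(0, 0.1) for t in range(1, steps)]
    return [[x0[t], x1[t], x2[t]] for t in range(steps)]

A = sparse_var1(simulate_example(), lam=0.1)
print([round(c, 2) for c in A[1]])  # regulators of gene 1
```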

Multivariate gene expression analysis reveals functional connectivity changes between normal/tumoral prostates

Relevance: 30.00%

Abstract:

Background: Prostate cancer is a leading cause of death in the male population; therefore, a comprehensive study of the genes and molecular networks involved in the tumoral prostate process is necessary. In order to understand the biological processes behind potential biomarkers, we analyzed a set of 57 cDNA microarrays containing ~25,000 genes. Results: Principal Component Analysis (PCA) combined with Maximum-entropy Linear Discriminant Analysis (MLDA) was applied in order to identify the genes with the most discriminative information between normal and tumoral prostatic tissues. Data analysis was carried out using three different approaches: (i) differences in gene expression levels between normal and tumoral conditions from a univariate point of view; (ii) in a multivariate fashion using MLDA; and (iii) with a dependence network approach. Our results show that malignant transformation in prostatic tissue is more related to functional connectivity changes in the dependence networks than to differential gene expression. The MYLK, KLK2, KLK3, HAN11, LTF, CSRP1 and TGM4 genes presented significant changes in their functional connectivity between normal and tumoral conditions and were also classified by our discriminant analysis as the top seven most informative genes for the prostate cancer genesis process. Moreover, among the identified genes we found classically known biomarkers and genes closely related to tumoral prostate, such as KLK3 and KLK2, as well as several other potential ones. Conclusion: We have demonstrated that changes in functional connectivity may be implicit in the biological process that renders some genes more informative for discriminating between normal and tumoral conditions. By using the proposed method, MLDA, to analyze the multivariate characteristics of genes, it was possible to capture the changes in dependence networks that are related to cell transformation.
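The idea that a gene can change its connectivity without changing its expression level can be illustrated with a simple proxy: each gene's weighted degree (sum of absolute Pearson correlations with all other genes), computed separately in normal and tumor samples. This proxy is a swapped-in illustration, not the MLDA or dependence-network method of the study, and the data are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def connectivity(expr):
    """Weighted degree of each gene: sum of |correlation| with every other gene."""
    genes = list(expr)
    return {g: sum(abs(pearson(expr[g], expr[h])) for h in genes if h != g)
            for g in genes}

def connectivity_change(expr_normal, expr_tumor):
    """Tumor minus normal connectivity for each gene (same genes in both inputs)."""
    normal = connectivity(expr_normal)
    tumor = connectivity(expr_tumor)
    return {g: tumor[g] - normal[g] for g in normal}

# Hypothetical samples: gene A keeps the same profile but loses its partner B.
normal = {"A": [1, 2, 3, 4], "B": [1, 2, 3, 4], "C": [1, -1, -1, 1]}
tumor = {"A": [1, 2, 3, 4], "B": [1, -1, -1, 1], "C": [1, -1, -1, 1]}
print(connectivity_change(normal, tumor))
```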

Functional clustering of time series gene expression data by Granger causality

Relevance: 30.00%

Abstract:

Background: A common approach to time series gene expression data analysis is the clustering of genes with similar expression patterns over time. Clustered gene expression profiles point to the joint contribution of groups of genes to a particular cellular process. However, since genes belong to intricate networks, features other than comparable expression patterns should provide additional information for the identification of functionally similar genes. Results: In this study we perform gene clustering through the identification of Granger causality between and within sets of time series gene expression data. Granger causality is based on the idea that the cause of an event cannot come after its consequence. Conclusions: This kind of analysis can be used as a complementary approach to functional clustering, in which genes are clustered not solely on the basis of expression similarity but also on their topological proximity, built according to the intensity of Granger causality among them.
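In its simplest form, x Granger-causes y if past values of x improve the prediction of y beyond what y's own past provides. A minimal lag-1 sketch compares the residual sum of squares of y_t ~ y_(t-1) with that of y_t ~ y_(t-1) + x_(t-1) through an F statistic. The single lag, the toy data and the function names are assumptions. A matrix of such pairwise statistics could serve as the input for clustering, which is not shown.

```python
import random

def ols_rss(columns, y):
    """Residual sum of squares of the least-squares fit y ~ 1 + columns."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    a = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(p):
            if r != c:
                factor = a[r][c] / a[c][c]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[c])]
    b = [a[i][p] / a[i][i] for i in range(p)]
    return sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2
               for row, yi in zip(X, y))

def granger_f(x, y):
    """F statistic for 'x Granger-causes y' with one lag."""
    target = y[1:]
    rss_restricted = ols_rss([y[:-1]], target)
    rss_full = ols_rss([y[:-1], x[:-1]], target)
    df_resid = len(target) - 3  # parameters: intercept, y_{t-1}, x_{t-1}
    return (rss_restricted - rss_full) / (rss_full / df_resid)

def simulate_driven_pair(steps=200, seed=0):
    """Toy series (hypothetical) in which x drives y with a one-step delay."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(steps)]
    y = [0.0]
    for t in range(1, steps):
        y.append(0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.gauss(0, 0.2))
    return x, y

x, y = simulate_driven_pair()
print(round(granger_f(x, y), 1), round(granger_f(y, x), 1))
```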

Functional markers for gene mapping and genetic diversity studies in sugarcane

Relevance: 30.00%

Abstract:

Background: The database of sugarcane expressed sequence tags (ESTs) offers a great opportunity for developing molecular markers that are directly associated with important agronomic traits. The development of new EST-SSR markers provides an important tool for genetic analysis. In sugarcane breeding programs, functional markers can be used to accelerate the process and select important agronomic traits, especially in the mapping of quantitative trait loci (QTL) and of plant resistance to pathogens, or qualitative resistance loci (QRL). The aim of this work was to develop new simple sequence repeat (SSR) markers in sugarcane using the sugarcane expressed sequence tag database (SUCEST). Findings: A total of 365 EST-SSR molecular markers with trinucleotide motifs were developed and evaluated in a collection of 18 sugarcane genotypes (15 varieties and 3 species). In total, 287 of the EST-SSR markers amplified fragments of the expected size and were polymorphic in the analyzed sugarcane varieties. The number of alleles ranged from 2 to 18, with an average of 6 alleles per locus, while polymorphism information content (PIC) values ranged from 0.21 to 0.92, with an average of 0.69. The discrimination power (DP) was high for the majority of the EST-SSRs, with an average value of 0.80. Among the markers characterized in this study, some are of particular interest: those related to bacterial defense responses, to the generation of precursor metabolites and energy, and to carbohydrate metabolic processes. Conclusions: The EST-SSR markers presented in this work can be efficiently used for genetic mapping studies of segregating sugarcane populations. Their high PIC and DP values, together with their association with functional regions of the genome, should facilitate QTL identification and marker-assisted selection, making them an important tool for sugarcane breeding programs.
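Polymorphism information content is usually computed from allele frequencies with the formula of Botstein et al. (1980), PIC = 1 - Σ p_i² - Σ_(i<j) 2 p_i² p_j². Some studies instead report expected heterozygosity, 1 - Σ p_i², under the same name, and which definition was used here is not stated in the abstract. A minimal sketch of both, with hypothetical allele frequencies.

```python
def pic(freqs):
    """Polymorphism information content (Botstein et al., 1980):
    PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    correction = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1 - homozygosity - correction

def expected_heterozygosity(freqs):
    """1 - sum(p_i^2); some studies report this quantity as 'PIC'."""
    return 1 - sum(p * p for p in freqs)

# Hypothetical locus with four equally frequent alleles.
print(round(pic([0.25] * 4), 4))  # 0.7031
```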

Semantic integration of gene expression analysis tools and data sources using software connectors

Relevance: 30.00%

Abstract:

Background: The study and analysis of gene expression measurements is the primary focus of functional genomics. Once expression data are available, biologists are faced with the task of extracting (new) knowledge associated with the underlying biological phenomenon. Most often, to perform this task, biologists execute a number of analysis activities on the available gene expression dataset rather than a single analysis activity. Integrating heterogeneous tools and data sources into an integrated analysis environment is a challenging and error-prone task. Semantic integration enables the assignment of unambiguous meanings to data shared among different applications in an integrated environment, allowing data to be exchanged in a semantically consistent and meaningful way. This work aims at developing an ontology-based methodology for the semantic integration of gene expression analysis tools and data sources. The proposed methodology relies on software connectors to support not only access to heterogeneous data sources but also the definition of transformation rules on exchanged data. Results: We have studied the different challenges involved in the integration of computer systems and the role software connectors play in this task. We have also studied a number of gene expression technologies, analysis tools and related ontologies in order to devise basic integration scenarios and propose a reference ontology for the gene expression domain. Then, we defined a number of activities and associated guidelines prescribing how connectors should be developed. Finally, we applied the proposed methodology in the construction of three different integration scenarios involving the use of different tools for the analysis of different types of gene expression data.
Conclusions: The proposed methodology facilitates the development of connectors capable of semantically integrating different gene expression analysis tools and data sources. The methodology can be used in the development of connectors supporting both simple and nontrivial processing requirements, thus ensuring accurate data exchange and correct interpretation of the exchanged data.
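The two responsibilities the abstract assigns to a connector, mapping source data onto shared ontology terms and applying transformation rules to exchanged data, can be sketched as a small class. The design, the field names and the "ref:" term identifiers are all hypothetical. They stand in for the authors' reference ontology, which is not described in enough detail to reproduce.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

@dataclass
class Connector:
    """Maps records from one tool or data source onto reference-ontology terms,
    applying an optional transformation rule per field (hypothetical design)."""
    field_to_term: Dict[str, str]
    rules: Dict[str, Callable[[Any], Any]] = field(default_factory=dict)

    def translate(self, record: Dict[str, Any]) -> Dict[str, Any]:
        out = {}
        for source_field, value in record.items():
            term = self.field_to_term.get(source_field)
            if term is None:
                continue  # fields without a semantic mapping are not exchanged
            rule = self.rules.get(source_field, lambda v: v)
            out[term] = rule(value)
        return out

# Hypothetical source exporting signal intensities as strings.
connector = Connector(
    field_to_term={"probe_id": "ref:Reporter", "signal": "ref:Intensity"},
    rules={"signal": float},
)
print(connector.translate({"probe_id": "P1", "signal": "12.5", "comment": "x"}))
# {'ref:Reporter': 'P1', 'ref:Intensity': 12.5}
```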