965 results for MICROARRAY DATA
Abstract:
Background: Statistical analysis of DNA microarray data provides a valuable diagnostic tool for the investigation of genetic components of diseases. To take advantage of the multitude of available data sets and analysis methods, it is desirable to combine both different algorithms and data from different studies. Applying ensemble learning, consensus clustering and cross-study normalization methods for this purpose in an almost fully automated process and linking different analysis modules together under a single interface would simplify many microarray analysis tasks. Results: We present ArrayMining.net, a web-application for microarray analysis that provides easy access to a wide choice of feature selection, clustering, prediction, gene set analysis and cross-study normalization methods. In contrast to other microarray-related web-tools, multiple algorithms and data sets for an analysis task can be combined using ensemble feature selection, ensemble prediction, consensus clustering and cross-platform data integration. By interlinking different analysis tools in a modular fashion, new exploratory routes become available, e.g. ensemble sample classification using features obtained from a gene set analysis and data from multiple studies. The analysis is further simplified by automatic parameter selection mechanisms and linkage to web tools and databases for functional annotation and literature mining. Conclusion: ArrayMining.net is a free web-application for microarray analysis combining a broad choice of algorithms based on ensemble and consensus methods, using automatic parameter selection and integration with annotation databases.
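The ensemble feature selection idea mentioned above can be illustrated with a small rank-aggregation sketch. This is not ArrayMining.net's implementation, which the abstract does not describe. It assumes each selection method returns a complete ranking of the same genes, and it combines them by summed rank position (a Borda-style rule chosen here for illustration). The function name `aggregate_rankings` and the gene labels are hypothetical.

```python
def aggregate_rankings(rankings):
    """Combine several gene rankings (best gene first) into one consensus ranking.

    Assumption: every ranking lists exactly the same genes. Genes are ordered
    by the sum of their positions across rankings; ties are broken by name.
    """
    genes = set(rankings[0])
    for ranking in rankings:
        if set(ranking) != genes or len(ranking) != len(genes):
            raise ValueError("all rankings must contain the same genes exactly once")

    totals = {gene: 0 for gene in genes}
    for ranking in rankings:
        for position, gene in enumerate(ranking):
            totals[gene] += position  # lower total = consistently ranked higher

    return sorted(genes, key=lambda gene: (totals[gene], gene))


# Three hypothetical selection methods ranking the same three genes.
consensus = aggregate_rankings([
    ["A", "B", "C"],
    ["B", "A", "C"],
    ["A", "C", "B"],
])
print(consensus)
```

A summed-rank rule is only one of several possible aggregation schemes; it is used here because it needs no tuning parameters.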
Abstract:
Xylella fastidiosa is a Gram-negative plant pathogen causing many economically important diseases, and analyses of completely sequenced X. fastidiosa genome strains allowed the identification of many prophage-like elements and possibly phage remnants, accounting for up to 15% of the genome composition. To better evaluate the recent evolution of the X. fastidiosa chromosome backbone among distinct pathovars, the number and location of prophage-like regions on two finished genomes (9a5c and Temecula1), and in two candidate molecules (Ann1 and Dixon) were assessed. Based on comparative best bidirectional hit analyses, the majority (51%) of the predicted genes in the X. fastidiosa prophage-like regions are related to structural phage genes belonging to the Siphoviridae family. Electron micrographs reveal the existence of putative viral particles with similar morphology to lambda phages in the bacterial cell in planta. Moreover, analysis of microarray data indicates that the 9a5c strain cultivated under stress conditions presents enhanced expression of phage anti-repressor genes, suggesting switches from the lysogenic to the lytic cycle of phages under stress-induced situations. Furthermore, virulence-associated proteins and toxins are found within these prophage-like elements, thus suggesting an important role in host adaptation. Finally, clustering analyses of phage integrase genes based on multiple alignment patterns reveal that they group in five lineages, all possessing a tyrosine recombinase catalytic domain, and phylogenetically close to other integrases found in phages that are genetic mosaics and able to perform generalized and specialized transduction. Integration sites and tRNA association are also evidenced. In summary, we present comparative and experimental evidence supporting the association and contribution of phage activity to the differentiation of Xylella genomes.
Abstract:
Background: Microarray techniques have become an important tool for the investigation of genetic relationships and the assignment of different phenotypes. Since microarrays are still very expensive, most of the experiments are performed with small samples. This paper introduces a method to quantify dependency between data series composed of few sample points. The method is used to construct gene co-expression subnetworks of highly significant edges. Results: The results shown here are for an adapted subset of a Saccharomyces cerevisiae gene expression data set with low temporal resolution and poor statistics. The method reveals common transcription factors with a high confidence level and allows the construction of subnetworks with high biological relevance that reveal characteristic features of the processes driving the organism's adaptations to specific environmental conditions. Conclusion: Our method allows a reliable and sophisticated analysis of microarray data even under severe constraints. The utilization of systems biology improves the biologists' ability to elucidate the mechanisms underlying cellular processes and to formulate new hypotheses.
Abstract:
Background: There are several studies in the literature depicting measurement error in gene expression data and also several others about regulatory network models. However, only a small fraction describes a combination of measurement error in mathematical regulatory networks and shows how to identify these networks under different rates of noise. Results: This article investigates the effects of measurement error on the estimation of the parameters in regulatory networks. Simulation studies indicate that, in both time series (dependent) and non-time series (independent) data, the measurement error strongly affects the estimated parameters of the regulatory network models, biasing them as predicted by the theory. Moreover, when testing the parameters of the regulatory network models, p-values computed by ignoring the measurement error are not reliable, since the rate of false positives is not controlled under the null hypothesis. In order to overcome these problems, we present an improved version of the Ordinary Least Squares estimator in independent (regression models) and dependent (autoregressive models) data when the variables are subject to noise. Moreover, measurement error estimation procedures for microarrays are also described. Simulation results also show that both corrected methods perform better than the standard ones (i.e., ignoring measurement error). The proposed methodologies are illustrated using microarray data from lung cancer patients and mouse liver time series data. Conclusions: Measurement error dangerously affects the identification of regulatory network models; thus, it must be reduced or taken into account in order to avoid erroneous conclusions. This could be one of the reasons for high biological false positive rates identified in actual regulatory network models.
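The bias described above can be reproduced with a small simulation. The sketch below is not the estimator proposed in the article, whose details the abstract does not give. It uses the classical method-of-moments correction for simple regression, and it assumes the measurement-error variance of the predictor (`noise_var`) is known and that only the predictor is measured with error. All function names and simulation settings are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def covariance(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def naive_slope(x_obs, y):
    """Ordinary least squares slope that ignores measurement error in x."""
    return covariance(x_obs, y) / variance(x_obs)


def corrected_slope(x_obs, y, noise_var):
    """Slope corrected for additive measurement error of known variance in x."""
    denom = variance(x_obs) - noise_var
    if denom <= 0:
        raise ValueError("noise variance exceeds the observed variance of x")
    return covariance(x_obs, y) / denom


def simulate(n=20000, beta=2.0, noise_var=1.0, seed=0):
    """Simulate y = beta * x_true + e, with x observed as x_true + u."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta * x + rng.gauss(0.0, 0.5) for x in x_true]
    x_obs = [x + rng.gauss(0.0, noise_var ** 0.5) for x in x_true]
    return x_obs, y


x_obs, y = simulate()
print("naive slope:    ", naive_slope(x_obs, y))
print("corrected slope:", corrected_slope(x_obs, y, noise_var=1.0))
```

With a true slope of 2 and equal signal and noise variance, theory predicts that the naive slope shrinks toward about half the true value, while the corrected slope should land near 2, up to sampling error.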
Abstract:
Moniliophthora perniciosa is a hemibiotrophic fungus that causes witches' broom disease (WBD) in cacao. Marked dimorphism characterizes this fungus, showing a monokaryotic or biotrophic phase that causes disease symptoms and a later dikaryotic or saprotrophic phase. A combined strategy of DNA microarray, expressed sequence tag, and real-time reverse-transcriptase polymerase chain reaction analyses was employed to analyze differences between these two fungal stages in vitro. In all, 1,131 putative genes were hybridized with cDNA from different phases, resulting in 189 differentially expressed genes, and 4,595 reads were clustered, producing 1,534 unigenes. The analysis of these genes, which represent approximately 21% of the total genes, indicates that the biotrophic-like phase undergoes carbon and nitrogen catabolite repression that correlates to the expression of phytopathogenicity genes. Moreover, downregulation of mitochondrial oxidative phosphorylation and the presence of a putative ngr1 of Saccharomyces cerevisiae could help explain its lower growth rate. In contrast, the saprotrophic mycelium expresses genes related to the metabolism of hexoses, ammonia, and oxidative phosphorylation, which could explain its faster growth. Antifungal toxins were upregulated and could prevent the colonization by competing fungi. This work significantly contributes to our understanding of the molecular mechanisms of WBD and, to our knowledge, is the first to analyze differential gene expression of the different phases of a hemibiotrophic fungus.
Abstract:
The Down syndrome (DS) immune phenotype is characterized by thymus hypotrophy, higher propensity to organ-specific autoimmune disorders, and higher susceptibility to infections, among other features. Considering that AIRE (autoimmune regulator) is located on 21q22.3, we analyzed protein and gene expression in surgically removed thymuses from 14 DS patients with congenital heart defects, who were compared with 42 age-matched controls with heart anomaly as an isolated malformation. Immunohistochemistry revealed 70.48 ± 49.59 AIRE-positive cells/mm² in DS versus 154.70 ± 61.16 AIRE-positive cells/mm² in controls (p < 0.0001), and quantitative PCR as well as DNA microarray data confirmed those results. The number of FOXP3-positive cells/mm² was equivalent in both groups. Thymus transcriptome analysis showed 407 genes significantly hypoexpressed in DS, most of which were related, according to network transcriptional analysis (FunNet), to cell division and to immunity. Immune response-related genes included those involved in 1) Ag processing and presentation (HLA-DQB1, HLA-DRB3, CD1A, CD1B, CD1C, ERAP) and 2) thymic T cell differentiation (IL2RG, RAG2, CD3D, CD3E, PRDX2, CDK6) and selection (SH2D1A, CD74). It is noteworthy that relevant AIRE-partner genes, such as TOP2A, LAMNB1, and NUP93, were found hypoexpressed in DNA microarrays and quantitative real-time PCR analyses. These findings on global thymic hypofunction in DS revealed molecular mechanisms underlying DS immune phenotype and strongly suggest that DS immune abnormalities are present since early development, rather than being a consequence of precocious aging, as widely hypothesized. Thus, DS should be considered as a non-monogenic primary immunodeficiency. The Journal of Immunology, 2011, 187: 3422-3430.
Abstract:
Chagas disease, characterized by acute myocarditis and chronic cardiomyopathy, is caused by infection with the protozoan parasite Trypanosoma cruzi. We sought to identify genes altered during the development of parasite-induced cardiomyopathy. Microarrays containing 27,400 sequence-verified mouse cDNAs were used to analyze global gene expression changes in the myocardium of a murine model of chagasic cardiomyopathy. Changes in gene expression were determined as the acute stage of infection developed into the chronic stage. This analysis was performed on the hearts of male CD-1 mice infected with trypomastigotes of T. cruzi (Brazil strain). At each interval we compared infected and uninfected mice and confirmed the microarray data with dye reversal. We identified eight distinct categories of mRNAs that were differentially regulated during infection and identified dysregulation of several key genes. These data may provide insight into the pathogenesis of chagasic cardiomyopathy and provide new targets for intervention. (c) 2008 Elsevier Inc. All rights reserved.
Abstract:
Objectives: To evaluate the gene expression profile of fibroblasts from affected and non-affected skin of systemic sclerosis (SSc) patients and from controls. Materials and methods: Labeled cDNA from fibroblast cultures from forearm (affected) and axillary (non-affected) skin from six diffuse SSc patients, from three normal controls, and from MOLT-4/HEp-2/normal fibroblasts (reference pool) was probed in microarrays generated with 4193 human cDNAs from the IMAGE Consortium. Microarray images were converted into numerical data and gene expression was calculated as the ratio between fibroblast cDNA (Cy5) and reference pool cDNA (Cy3) data and analyzed with the R environment/Aroma, Cluster, Tree View, and SAM software. Differential expression was confirmed by real-time PCR for a set of selected genes. Results: Eighty-eight genes were up-regulated and 241 genes down-regulated in SSc fibroblasts. Gene expression correlation was strong between affected and non-affected fibroblast samples from the same patient (r>0.8), moderate among fibroblasts from all patients (r=0.72) and among fibroblasts from all controls (r=0.70), and modest among fibroblasts from patients and controls (r=0.55). The differential expression was confirmed by real-time PCR for all selected genes. Conclusions: Fibroblasts from affected and non-affected skin of SSc patients shared a similar abnormal gene expression profile, suggesting that the widespread molecular disturbance in SSc fibroblasts is more sensitive than histological and clinical alterations. Novel molecular elements potentially involved in SSc pathogenesis were identified.
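The two quantities used above, the Cy5/Cy3 expression ratio and the correlation between samples, can be sketched as follows. The abstract states only that expression was calculated as a ratio, so taking the base-2 logarithm is an assumption made here because it is a common convention for two-colour arrays. The intensity values and function names are hypothetical, and no normalization step is shown.

```python
import math


def log2_ratios(cy5, cy3):
    """Per-gene log2(Cy5 / Cy3); assumes strictly positive intensities."""
    return [math.log2(sample / reference) for sample, reference in zip(cy5, cy3)]


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical intensities for four genes in two samples hybridized
# against the same reference pool.
reference = [100.0, 200.0, 400.0, 800.0]
affected = log2_ratios([400.0, 200.0, 200.0, 1600.0], reference)
non_affected = log2_ratios([350.0, 220.0, 190.0, 1500.0], reference)
print(pearson(affected, non_affected))
```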
Abstract:
Urinary bladder cancer is the fourth most common malignancy in the Western world. Transitional cell carcinoma (TCC) is the most common subtype, accounting for about 90% of all bladder cancers. The TP53 gene plays an essential role in the regulation of the cell cycle and apoptosis and therefore contributes to cellular transformation and malignancy; however, little is known about the differential gene expression patterns in human tumors that present with the wild-type or mutated TP53 gene. Therefore, because gene profiling can provide new insights into the molecular biology of bladder cancer, the present study aimed to compare the molecular profiles of bladder cancer cell lines with different TP53 alleles, including the wild type (RT4) and two mutants (5637, with mutations in codons 280 and 72; and T24, a TP53 allele encoding an in-frame deletion of tyrosine 126). Unsupervised hierarchical clustering and gene networks were constructed based on data generated by cDNA microarrays using mRNA from the three cell lines. Differentially expressed genes related to the cell cycle, cell division, cell death, and cell proliferation were observed in the three cell lines. However, the cDNA microarray data did not cluster cell lines based on their TP53 allele. The gene profiles of the RT4 cells were more similar to those of T24 than to those of the 5637 cells. While the deregulation of both the cell cycle and the apoptotic pathways was particularly related to TCC, these alterations were not associated with the TP53 status.
Abstract:
Gene expression profiling by cDNA microarrays during murine thymus ontogeny has contributed to dissecting the large-scale molecular genetics of T cell maturation. Gene profiling, although useful for characterizing the thymus developmental phases and identifying the differentially expressed genes, does not permit the determination of possible interactions between genes. In order to reconstruct genetic interactions, at the RNA level, within thymocyte differentiation, a pair of microarrays containing a total of 1,576 cDNA sequences derived from the IMAGE MTB library was applied to samples of developing thymuses (14-17 days of gestation). The data were analyzed using the GeneNetwork program. Genes that were previously identified as differentially expressed during thymus ontogeny showed their relationships with several other genes. The present method enabled the detection of gene nodes coding for proteins implicated in the calcium signaling pathway, such as Prrg2 and Stxbp3, and in protein transport toward the cell membrane, such as Gosr2. The results demonstrate the feasibility of reconstructing networks based on cDNA microarray gene expression determinations, contributing to a clearer understanding of the complex interactions between genes involved in thymus/thymocyte development.
Abstract:
Background: A common task in analyzing microarray data is to determine which genes are differentially expressed across two (or more) kinds of tissue samples or samples subjected to experimental conditions. Several statistical methods have been proposed to accomplish this goal, generally based on measures of distance between classes. It is well known that biological samples are heterogeneous because of factors such as molecular subtypes or genetic background that are often unknown to the experimenter. For instance, in experiments which involve molecular classification of tumors it is important to identify significant subtypes of cancer. Bimodal or multimodal distributions often reflect the presence of subsample mixtures. Consequently, there can be genes differentially expressed on sample subgroups which are missed if usual statistical approaches are used. In this paper we propose a new graphical tool which not only identifies genes with up- and down-regulation, but also genes with differential expression in different subclasses, which are usually missed if current statistical methods are used. This tool is based on two measures of distance between samples, namely the overlapping coefficient (OVL) between two densities and the area under the receiver operating characteristic (ROC) curve. The methodology proposed here was implemented in the open-source R software. Results: This method was applied to a publicly available dataset, as well as to a simulated dataset. We compared our results with the ones obtained using some of the standard methods for detecting differentially expressed genes, namely Welch t-statistic, fold change (FC), rank products (RP), average difference (AD), weighted average difference (WAD), moderated t-statistic (modT), intensity-based moderated t-statistic (ibmT), significance analysis of microarrays (samT) and area under the ROC curve (AUC).
On both datasets all differentially expressed genes with bimodal or multimodal distributions were not selected by all standard selection procedures. We also compared our results with (i) area between ROC curve and rising area (ABCR) and (ii) the test for not proper ROC curves (TNRC). We found our methodology more comprehensive, because it detects both bimodal and multimodal distributions and different variances can be considered on both samples. Another advantage of our method is that we can analyze graphically the behavior of different kinds of differentially expressed genes. Conclusion: Our results indicate that the arrow plot represents a new flexible and useful tool for the analysis of gene expression profiles from microarrays.
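The two measures behind the arrow plot, OVL and AUC, can be sketched for a single gene as below. This is not the authors' R implementation. The OVL here is estimated from shared-bin histograms, which is a simplification assumed for illustration (a kernel density estimate would be another option), and the AUC is computed as the proportion of sample pairs in which the second group exceeds the first. The function names and the toy expression values are hypothetical.

```python
def auc(group_a, group_b):
    """Empirical AUC: P(b > a) + 0.5 * P(b == a) over all pairs (a, b)."""
    wins = 0.0
    for a in group_a:
        for b in group_b:
            if b > a:
                wins += 1.0
            elif b == a:
                wins += 0.5
    return wins / (len(group_a) * len(group_b))


def ovl_histogram(group_a, group_b, bins=10):
    """Histogram estimate of the overlapping coefficient between two samples.

    Both samples are binned on a common grid; the estimate is the sum over
    bins of the smaller of the two relative frequencies (1 = identical,
    0 = no overlap at this bin resolution).
    """
    lo = min(min(group_a), min(group_b))
    hi = max(max(group_a), max(group_b))
    if hi == lo:
        return 1.0
    width = (hi - lo) / bins

    def frequencies(values):
        counts = [0] * bins
        for v in values:
            index = min(int((v - lo) / width), bins - 1)
            counts[index] += 1
        return [c / len(values) for c in counts]

    freq_a, freq_b = frequencies(group_a), frequencies(group_b)
    return sum(min(p, q) for p, q in zip(freq_a, freq_b))


# Toy gene: controls cluster near 0, while the second group splits into a
# low and a high subgroup (a bimodal pattern).
controls = [0.0, 0.1, -0.1, 0.2, -0.2, 0.05]
mixed = [-3.0, -3.1, -2.9, 3.0, 3.1, 2.9]
print("AUC:", auc(controls, mixed))
print("OVL:", ovl_histogram(controls, mixed))
```

In this toy case the AUC sits at 0.5, so an AUC-only screen would see no separation, while the OVL is near 0 and flags that the two distributions barely overlap. That combination is the kind of gene the abstract says is missed by standard methods.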
Abstract:
Many learning problems require handling high-dimensional datasets with a relatively small number of instances. Learning algorithms are thus confronted with the curse of dimensionality, and need to address it in order to be effective. Examples of these types of data include the bag-of-words representation in text classification problems and gene expression data for tumor detection/classification. Usually, among the high number of features characterizing the instances, many may be irrelevant (or even detrimental) for the learning tasks. It is thus clear that there is a need for adequate techniques for feature representation, reduction, and selection, to improve both the classification accuracy and the memory requirements. In this paper, we propose combined unsupervised feature discretization and feature selection techniques, suitable for medium- and high-dimensional datasets. The experimental results on several standard datasets, with both sparse and dense features, show the efficiency of the proposed techniques as well as improvements over previous related techniques.
Abstract:
BACKGROUND: Zebrafish is a clinically-relevant model of heart regeneration. Unlike mammals, it has a remarkable heart repair capacity after injury, and promises novel translational applications. Amputation and cryoinjury models are key research tools for understanding injury response and regeneration in vivo. An understanding of the transcriptional responses following injury is needed to identify key players of heart tissue repair, as well as potential targets for boosting this property in humans. RESULTS: We investigated amputation and cryoinjury in vivo models of heart damage in the zebrafish through unbiased, integrative analyses of independent molecular datasets. To detect genes with potential biological roles, we derived computational prediction models with microarray data from heart amputation experiments. We focused on a top-ranked set of genes highly activated in the early post-injury stage, whose activity was further verified in independent microarray datasets. Next, we performed independent validations of expression responses with qPCR in a cryoinjury model. Across in vivo models, the top candidates showed highly concordant responses at 1 and 3 days post-injury, which highlights the predictive power of our analysis strategies and the possible biological relevance of these genes. Top candidates are significantly involved in cell fate specification and differentiation, and include heart failure markers such as periostin, as well as potential new targets for heart regeneration. For example, ptgis and ca2 were overexpressed, while usp2a, a regulator of the p53 pathway, was down-regulated in our in vivo models. Interestingly, a high activity of ptgis and ca2 has been previously observed in failing hearts from rats and humans. CONCLUSIONS: We identified genes with potential critical roles in the response to cardiac damage in the zebrafish. Their transcriptional activities are reproducible in different in vivo models of cardiac injury.
Abstract:
BACKGROUND: The Nuclear Factor I (NFI) family of DNA binding proteins (also called CCAAT box transcription factors or CTF) is involved in both DNA replication and gene expression regulation. Using chromatin immuno-precipitation and high throughput sequencing (ChIP-Seq), we performed a genome-wide mapping of NFI DNA binding sites in primary mouse embryonic fibroblasts. RESULTS: We found that in vivo and in vitro NFI DNA binding specificities are indistinguishable, as in vivo ChIP-Seq NFI binding sites matched predictions based on previously established position weight matrix models of its in vitro binding specificity. Combining ChIP-Seq with mRNA profiling data, we found that NFI preferentially associates with highly expressed genes that it up-regulates, while binding sites were under-represented at expressed but unregulated genes. Genomic binding also correlated with markers of transcribed genes such as histone modifications H3K4me3 and H3K36me3, even outside of annotated transcribed loci, implicating NFI in the control of the deposition of these modifications. Positional correlation between + and - strand ChIP-Seq tags revealed that, in contrast to other transcription factors, NFI associates with a nucleosomal length of cleavage-resistant DNA, suggesting an interaction with positioned nucleosomes. In addition, NFI binding prominently occurred at boundaries displaying discontinuities in histone modifications specific to expressed and silent chromatin, such as loci subject to parental allele-specific imprinted expression. CONCLUSIONS: Our data thus suggest that NFI nucleosomal interaction may contribute to the partitioning of distinct chromatin domains and to epigenetic gene expression regulation. NFI ChIP-Seq and input control DNA data were deposited at the Gene Expression Omnibus (GEO) repository under accession number GSE15844. Gene expression microarray data for mouse embryonic fibroblasts are available under GEO accession number GSE15871.
Abstract:
Dysregulation of intestinal epithelial cell performance is associated with an array of pathologies whose onset mechanisms are incompletely understood. While whole-genomics approaches have been valuable for studying the molecular basis of several intestinal diseases, a thorough analysis of gene expression along the healthy gastrointestinal tract is still lacking. The aim of this study was to map gene expression in gastrointestinal regions of healthy human adults and to implement a procedure for microarray data analysis that would allow its use as a reference when screening for pathological deviations. We analyzed the gene expression signature of antrum, duodenum, jejunum, ileum, and transverse colon biopsies using a biostatistical method based on a multivariate and univariate approach to identify region-selective genes. One hundred sixty-six genes were found responsible for distinguishing the five regions considered. Nineteen had never been described in the GI tract, including a semaphorin probably implicated in pathogen invasion and six novel genes. Moreover, by crossing these genes with those retrieved from an existing data set of gene expression in the intestine of ulcerative colitis and Crohn's disease patients, we identified genes that might be biomarkers of Crohn's and/or ulcerative colitis in ileum and/or colon. These include CLCA4 and SLC26A2, both implicated in ion transport. This study furnishes the first map of gene expression along the healthy human gastrointestinal tract. Furthermore, the approach implemented here, and validated by retrieving known gene profiles, allowed the identification of promising new leads in both healthy and disease states.