948 results for cDNA microarray analysis
Abstract:
Microarray technology provides a high-throughput technique for studying gene expression. Microarrays can help us diagnose different types of cancer, understand biological processes, assess host responses to drugs and pathogens, find markers for specific diseases, and much more. Microarray experiments generate large amounts of data, so effective data processing and analysis are critical for making reliable inferences from the data. The first part of the dissertation addresses the problem of finding an optimal set of genes (biomarkers) to classify a set of samples as diseased or normal. Three statistical gene selection methods (GS, GS-NR, and GS-PCA) were developed to identify a set of genes that best differentiate between samples. A comparative study of different classification tools was performed, and the best combinations of gene selection methods and classifiers for multi-class cancer classification were identified. For most of the benchmark cancer data sets, the gene selection method proposed in this dissertation, GS, outperformed other gene selection methods. The classifiers based on Random Forests, neural network ensembles, and K-nearest neighbor (KNN) showed consistently good performance. A striking commonality among these classifiers is that they all use a committee-based approach, suggesting that ensemble classification methods are superior. The same biological problem may be studied at different research labs and/or performed using different lab protocols or samples. In such situations, it is important to combine results from these efforts. The second part of the dissertation addresses the problem of pooling the results from different independent experiments to obtain improved results. Four statistical pooling techniques (the Fisher inverse chi-square method, the Logit method, Stouffer's Z transform method, and the Liptak-Stouffer weighted Z-method) were investigated in this dissertation. These pooling techniques were applied to the problem of identifying cell cycle-regulated genes in two different yeast species. As a result, improved sets of cell cycle-regulated genes were identified. The last part of the dissertation explores the effectiveness of wavelet data transforms for the task of clustering. Discrete wavelet transforms, with an appropriate choice of wavelet bases, were shown to be effective in producing clusters that were biologically more meaningful.
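To make the pooling idea concrete, here is a minimal sketch of two of the techniques named in the abstract: the Fisher inverse chi-square method and the (optionally weighted) Stouffer Z method. It is not the dissertation's code. The function names are hypothetical, and the sketch assumes independent p-values strictly between 0 and 1 and uses only the Python standard library.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def fisher_combine(p_values):
    """Pool independent p-values with Fisher's inverse chi-square method.

    The statistic -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the null. For even degrees of freedom the
    survival function has a closed form, so no external library is needed.
    Assumes every p-value lies strictly between 0 and 1.
    """
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # statistic / 2
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j  # builds (half ** j) / j!
        total += term
    return math.exp(-half) * total


def stouffer_combine(p_values, weights=None):
    """Pool independent one-sided p-values with Stouffer's Z method.

    With weights supplied this is the Liptak-Stouffer weighted variant.
    Assumes every p-value lies strictly between 0 and 1.
    """
    if weights is None:
        weights = [1.0] * len(p_values)
    z_scores = [_STD_NORMAL.inv_cdf(1.0 - p) for p in p_values]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return 1.0 - _STD_NORMAL.cdf(numerator / denominator)
```

For a single gene measured in several independent experiments, one would pass that gene's per-experiment p-values to either function. How the weights are chosen (for example, by sample size) is a design decision that the abstract does not specify.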
Abstract:
Background: Understanding transcriptional regulation through genome-wide microarray studies can help unravel complex relationships between genes. Attempts to standardize the annotation of microarray data include the Minimum Information About a Microarray Experiment (MIAME) recommendations, the MAGE-ML format for data interchange, and the use of controlled vocabularies or ontologies. Existing software systems for microarray data analysis implement these standards only partially and are often hard to use and extend. Integration of genomic annotation data and other sources of external knowledge using open standards is therefore a key requirement for future integrated analysis systems. Results: The EMMA 2 software has been designed to resolve shortcomings with respect to full MAGE-ML and ontology support and makes use of modern data integration techniques. We present a software system that features comprehensive data analysis functions for spotted arrays and for the most common synthesized oligo arrays, such as Agilent, Affymetrix and NimbleGen. The system is based on the full MAGE object model. Analysis functionality is based on R and Bioconductor packages and can make use of a compute cluster for distributed services. Conclusion: Our model-driven approach for automatically implementing a full MAGE object model provides high flexibility and compatibility. Data integration via SOAP-based web services is advantageous in a distributed client-server environment, as the collaborative analysis of microarray data is becoming increasingly relevant in international research consortia. The adequacy of the EMMA 2 software design and implementation has been demonstrated by its application in many distributed functional genomics projects. Its scalability makes the current architecture suited for extension towards future transcriptomics methods based on high-throughput sequencing approaches, which have much higher computational requirements than microarrays.
Abstract:
BACKGROUND Lactococcus garvieae is a bacterial pathogen that affects different animal species in addition to humans. Despite the widespread distribution and emerging clinical significance of L. garvieae in both veterinary and human medicine, there is almost a complete lack of knowledge about the genetic content of this microorganism. In the present study, the genomic content of L. garvieae CECT 4531 was analysed using bioinformatics tools and microarray-based comparative genomic hybridization (CGH) experiments. Lactococcus lactis subsp. lactis IL1403 and Streptococcus pneumoniae TIGR4 were used as reference microorganisms. RESULTS The combination and integration of in silico analyses and in vitro CGH experiments, performed in comparison with the reference microorganisms, allowed establishment of an inter-species hybridization framework with a detection threshold based on a sequence similarity of ≥ 70%. With this threshold value, 267 genes were identified as having an analogue in L. garvieae, most of which (n = 258) have been documented for the first time in this pathogen. Most of the genes are related to ribosomal, sugar metabolism or energy conversion systems. Some of the identified genes, such as als and mycA, could be involved in the pathogenesis of L. garvieae infections. CONCLUSIONS In this study, we identified 267 genes that were potentially present in L. garvieae CECT 4531. Some of the identified genes could be involved in the pathogenesis of L. garvieae infections. These results provide the first insight into the genome content of L. garvieae.
Abstract:
We assessed associations between steroid receptors (estrogen-alpha, estrogen-beta, androgen receptor and progesterone receptor), HER2 status, triple-negative epithelial ovarian cancer (ERα-/PR-/HER2-; TNEOC) status and survival in women with epithelial ovarian cancer. The study included 152 women with primary epithelial ovarian cancer. Steroid receptor and HER2 status was determined by immunohistochemistry. Disease-free and overall survival were calculated and compared with steroid receptor and HER2 status as well as clinicopathological features using the Cox Proportional Hazards model. The mean follow-up period was 43.6 months (interquartile range = 41.4 months). Serous tumors were the most common (44% of patients), followed by mucinous (23%), endometrioid (9%), mixed (9%), undifferentiated (8.5%) and clear cell tumors (5.3%). ER-alpha staining was associated with grade II-III tumors. Progesterone receptor staining was positively associated with a Body Mass Index ≥ 25. Androgen receptor positivity was higher in serous tumors. In stand-alone analysis of receptor contribution to survival, estrogen-alpha positivity was associated with greater disease-free survival. However, there was no significant association between steroid receptor expression, HER2 status, or TNEOC status and overall survival. Although estrogen-alpha, androgen receptor, progesterone receptor and HER2 status were associated with key clinical features of the women and pathological characteristics of the tumors, these associations were not implicated in survival. Interestingly, women with TNEOC seem to fare the same as their counterparts with non-TNEOC.
Abstract:
The phytopathogenic fungus Moniliophthora perniciosa (Stahel) Aime & Phillips-Mora, causal agent of witches' broom disease of cocoa, causes extensive damage to cocoa production in Brazil. Molecular studies have attempted to identify genes that play important roles in fungal survival and virulence. In this study, sequences deposited in the M. perniciosa Genome Sequencing Project database were analyzed to identify potential biological targets. For the first time, the ergosterol biosynthetic pathway in M. perniciosa was studied, and the lanosterol 14α-demethylase gene (ERG11), which encodes the main enzyme of this pathway and is a target for fungicides, was cloned and molecularly characterized, and its phylogeny was analyzed. ERG11 genomic DNA and cDNA were characterized, and sequence analysis of the ERG11 protein identified highly conserved domains typical of this enzyme, such as SRS1, SRS4, EXXR and the heme-binding region (HBR). Comparison of the protein sequences and phylogenetic analysis revealed that the M. perniciosa enzyme was most closely related to that of Coprinopsis cinerea.
Abstract:
cDNA arrays are a powerful tool for discovering gene expression patterns. Nylon arrays have the advantage that they can be re-used several times. A key issue in high-throughput gene expression analysis is sensitivity. In the case of nylon arrays, signal detection can be affected by the plastic bags used to keep membranes humid. In this study, we evaluated the effect of five types of plastics on radioactive transmittance, the number of genes with a signal above the background, and data variability. A polyethylene plastic bag 69 μm thick had a strong shielding effect that blocked 68.7% of the radioactive signal. The shielding effect on transmittance decreased the number of detected genes and increased the data variability. Thinner plastics gave better results. Although plastics made from polyvinylidene chloride, polyvinyl chloride (both 13 μm thick) and polyethylene (29 and 7 μm thick) showed different levels of transmittance, they all gave similarly good performance. Polyvinylidene chloride and polyethylene 29 μm thick were the plastics of choice because of their easy handling. For other types of plastics, it is advisable to run a simple check on their performance in order to obtain the maximum information from nylon cDNA arrays.
Abstract:
Xylella fastidiosa genome sequencing has generated valuable data by identifying genes acting either on metabolic pathways or in associated pathogenicity and virulence. Based on available information on these genes, new strategies for studying their expression patterns, such as microarray technology, were employed. A total of 2,600 primer pairs were synthesized and then used to generate fragments by PCR. The arrays were hybridized against cDNAs that were labeled during reverse transcription reactions and obtained from bacteria grown under two different conditions (liquid XDM2 and liquid BCYE). All data were statistically analyzed to verify which genes were differentially expressed. In addition to exploring conditions for X. fastidiosa genome-wide transcriptome analysis, the present work observed the differential expression of several classes of genes (energy, protein, amino acid and nucleotide metabolism, transport, degradation of substances, toxins and hypothetical proteins, among others). Understanding which genes are expressed in these two media will be useful in comprehending the metabolic characteristics of X. fastidiosa and in evaluating how important certain genes are for the functioning and survival of these bacteria in plants.
Abstract:
The pathogenic fungus Fusarium graminearum is an ongoing threat to agriculture, causing losses in grain yield and quality in diverse crops. Substantial progress has been made in the identification of genes involved in the suppression of phytopathogens by antagonistic microorganisms; however, limited information is available regarding the responses of plant pathogens to these biocontrol agents. Gene expression analysis was used to identify differentially expressed transcripts of the fungal plant pathogen F. graminearum under the antagonistic effect of the bacterium Pantoea agglomerans. A macroarray was constructed using 1014 transcripts from an F. graminearum cDNA library. Probes consisted of the cDNA of F. graminearum grown in the presence and in the absence of P. agglomerans. Twenty-nine genes were either up-regulated (19) or down-regulated (10) during interaction with the antagonist bacterium. Genes encoding proteins associated with fungal defense and/or virulence or with nutritional and oxidative stress responses were induced. The repressed genes coded for a zinc finger protein associated with cell division, proteins containing cellular signaling domains, respiratory chain proteins, and chaperone-type proteins. These data give molecular and biochemical evidence of the response of F. graminearum to an antagonist and could help develop effective biocontrol procedures for pathogenic plant fungi.
Abstract:
Background: The post-genomic era has brought new challenges regarding the understanding of the organization and function of the human genome. Many of these challenges are centered on the meaning of differential gene regulation under distinct biological conditions and can be addressed by analyzing the Multiple Differential Expression (MDE) of genes associated with normal and abnormal biological processes. Currently, MDE analyses are limited to the usual methods of differential expression, which were initially designed for paired analysis. Results: We propose a web platform named ProbFAST for MDE analysis, which uses Bayesian inference to identify key genes that are intuitively prioritized by means of probabilities. A simulation study revealed that our method performs better than other approaches, and when it was applied to public expression data, we demonstrated its flexibility in obtaining relevant genes biologically associated with normal and abnormal biological processes. Conclusions: ProbFAST is a freely accessible web-based application that enables MDE analysis on a global scale. It offers an efficient methodological approach for MDE analysis of a set of genes that are turned on and off, related to functional information, during the evolution of a tumor or tissue differentiation. The ProbFAST server can be accessed at http://gdm.fmrp.usp.br/probfast.
Abstract:
Thanks to recent advances in molecular biology, allied to an ever-increasing amount of experimental data, the functional state of thousands of genes can now be extracted simultaneously by using methods such as cDNA microarrays and RNA-Seq. Particularly important related investigations are the modeling and identification of gene regulatory networks from expression data sets. Such knowledge is fundamental for many applications, such as disease treatment, therapeutic intervention strategies and drug design, as well as for planning new high-throughput experiments. Methods have been developed for gene network modeling and identification from expression profiles. However, an important open problem is how to validate such approaches and their results. This work presents an objective approach for validation of gene network modeling and identification which comprises the following three main aspects: (1) Artificial Gene Network (AGN) model generation through theoretical models of complex networks, which is used to simulate temporal expression data; (2) a computational method for gene network identification from the simulated data, which is founded on a feature selection approach where a target gene is fixed and the expression profile is observed for all other genes in order to identify a relevant subset of predictors; and (3) validation of the identified AGN-based network through comparison with the original network. The proposed framework allows several types of AGNs to be generated and used in order to simulate temporal expression data. The results of the network identification method can then be compared to the original network in order to estimate its properties and accuracy. Some of the most important theoretical models of complex networks have been assessed: the uniformly-random Erdős-Rényi (ER), the small-world Watts-Strogatz (WS), the scale-free Barabási-Albert (BA), and geographical networks (GG). The experimental results indicate that the inference method was sensitive to variation in the average degree k, with its network recovery rate decreasing as k increased. The signal size was important for the accuracy of the inference method in network identification, which presented very good results with small expression profiles. However, the adopted inference method was not sensitive enough to recognize distinct structures of interaction among genes, presenting similar behavior when applied to different network topologies. In summary, the proposed framework, though simple, was adequate for the validation of the inferred networks by identifying some properties of the evaluated method, and it can be extended to other inference methods.
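Steps (1) and (3) of the validation framework can be illustrated with a small sketch: generate an Erdős-Rényi network with a chosen average degree k, then compare an inferred edge set against it. This is only an illustration under stated assumptions. The function names are hypothetical, the network is treated as undirected for simplicity, and precision and recall stand in for whatever recovery measure the authors actually used.

```python
import random


def erdos_renyi_edges(n_genes, avg_degree, seed=0):
    """Return the edge set of an undirected Erdős-Rényi random network.

    Each of the n*(n-1)/2 possible edges is included independently with
    probability p = avg_degree / (n_genes - 1), so the expected degree of
    every node equals avg_degree.
    """
    rng = random.Random(seed)
    p = avg_degree / (n_genes - 1)
    return {
        (i, j)
        for i in range(n_genes)
        for j in range(i + 1, n_genes)
        if rng.random() < p
    }


def edge_recovery(true_edges, inferred_edges):
    """Compare an inferred network with the original one.

    Returns (precision, recall): the fraction of inferred edges that are
    real, and the fraction of real edges that were recovered.
    """
    true_positives = len(true_edges & inferred_edges)
    precision = true_positives / len(inferred_edges) if inferred_edges else 0.0
    recall = true_positives / len(true_edges) if true_edges else 0.0
    return precision, recall
```

In a full pipeline, the generated network would drive a simulation of temporal expression data, an inference method would propose edges from that data, and `edge_recovery` would score the proposal against the known ground truth.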
Abstract:
Alternative splicing of gene transcripts greatly expands the functional capacity of the genome, and certain splice isoforms may indicate specific disease states such as cancer. Splice junction microarrays interrogate thousands of splice junctions, but data analysis is difficult and error-prone because of the increased complexity compared to differential gene expression analysis. We present Rank Change Detection (RCD) as a method to identify differential splicing events based upon a straightforward probabilistic model comparing the over- or underrepresentation of two or more competing isoforms. RCD has advantages over commonly used methods because it is robust to false positive errors due to nonlinear trends in microarray measurements. Further, RCD does not depend on prior knowledge of splice isoforms, yet it takes advantage of the inherent structure of mutually exclusive junctions, and it is conceptually generalizable to other types of splicing arrays or RNA-Seq. RCD specifically identifies the biologically important cases when a splice junction becomes more or less prevalent compared to other mutually exclusive junctions. The example data are from different cell lines of glioblastoma tumors assayed with Agilent microarrays.
Abstract:
Deficiencies of complement proteins of the classical pathway are strongly associated with the development of autoimmune diseases. Deficiency of C1r has been observed to occur concomitantly with deficiency in C1s, and 9 out of 15 reported cases presented systemic lupus erythematosus (SLE). Here, we describe a family in which all four children are deficient in C1s but only two of them developed SLE. Hemolytic activity mediated by the alternative and the lectin pathways was normal, but classical pathway activation was absent in all children's sera. C1s was undetectable, while in the parents' sera it was lower than in the normal controls. The levels of C1r observed in the siblings' and parents' sera were lower than in the control, while the concentrations of other complement proteins (C3, C4, MBL and MASP-2) were normal in all family members. Impairment of C1s synthesis was observed in the patients' fibroblasts when analyzed by confocal microscopy. We show that all four siblings are homozygous for a mutation at position 938 in exon 6 of the C1s cDNA that creates a premature stop codon. Our investigations also revealed the presence of previously uncharacterized splice variants of C1s mRNA transcripts in normal human cells. These variants are derived from the skipping of exon 3 and from the use of an alternative 3′ splice site within intron 1, which increases the size of exon 2 by 87 nucleotides. (c) 2007 Elsevier Ltd. All rights reserved.
Abstract:
Calcineurin plays an important role in the control of cell morphology and virulence in fungi. Calcineurin is a serine/threonine-specific protein phosphatase heterodimer consisting of a catalytic subunit A and a regulatory subunit B. A mutant of Aspergillus fumigatus lacking the calcineurin A (calA) catalytic subunit exhibited defective hyphal morphology related to apical extension and branching growth, which resulted in drastically decreased filamentation. Here, we investigated which pathways are influenced by A. fumigatus calcineurin during proliferation by comparatively determining the transcriptional profile of A. fumigatus wild type and ΔcalA mutant strains. Our results showed that the mitochondrial copy number is reduced in the ΔcalA mutant strain, and the mutant has increased alternative oxidase (aoxA) mRNA accumulation and activity. Furthermore, we identified four genes that encode transcription factors that have increased mRNA expression in the ΔcalA mutant. Deletion mutants for these transcription factors had reduced susceptibility to itraconazole, caspofungin, and sodium dodecyl sulfate (SDS). (C) 2009 Elsevier Inc. All rights reserved.
Abstract:
Crotalus durissus rattlesnakes are responsible for the most lethal cases of snakebite in Brazil. The subspecies Crotalus durissus collilineatus is related to a great number of accidents in the Southeast and Central-West regions, but few studies on its venom composition have been carried out to date. In an attempt to describe the transcriptional profile of the C. durissus collilineatus venom gland, we generated a cDNA library, and the sequences obtained were identified by similarity searches against existing databases. Out of 673 expressed sequence tags (ESTs), 489 produced readable sequences comprising 201 singletons and 47 clusters of two or more ESTs. One hundred and fifty reads (60.5%) produced significant hits to known sequences. The results showed a predominance of toxin-coding ESTs over transcripts coding for proteins involved in all cellular functions. The most frequent toxin was crotoxin, comprising 88% of toxin-coding sequences. Crotoxin B, a basic phospholipase A(2) (PLA(2)) subunit of crotoxin, was represented in more variable forms than the non-enzymatic subunit (crotoxin A), and most sequences coding for this molecule were identified as the CB1 isoform from Crotalus durissus terrificus venom. Four percent of toxin-related sequences in this study were identified as growth factors, comprising five sequences for vascular endothelial growth factor (VEGF) and one for nerve growth factor (NGF) that showed 100% identity with C. durissus terrificus NGF. We also identified two clusters for metalloprotease from the PII class, comprising 3% of the toxins, and two for serine proteases, including gyroxin (2.5%). The remaining 2.5% of toxin-coding ESTs represent singletons identified as sequences homologous to cardiotoxin, convulxin, angiotensin-converting enzyme inhibitor and C-type natriuretic peptide, Ohanin, crotamin and PLA(2) inhibitor. These results allowed the identification of the most common classes of toxins in C. durissus collilineatus snake venom, and also revealed some classes previously unknown for this subspecies and even for the C. durissus species, such as cardiotoxins and VEGF. (C) 2009 Published by Elsevier Masson SAS.
Abstract:
In the context of cancer diagnosis and treatment, we consider the problem of constructing an accurate prediction rule on the basis of a relatively small number of tumor tissue samples of known type containing the expression data on very many (possibly thousands of) genes. Recently, results have been presented in the literature suggesting that it is possible to construct a prediction rule from only a few genes such that it has a negligible prediction error rate. However, in these results the test error or the leave-one-out cross-validated error is calculated without allowance for the selection bias. There is no allowance either because the rule is tested on tissue samples that were used in the first instance to select the genes being used in the rule, or because the cross-validation of the rule is not external to the selection process; that is, gene selection is not performed in training the rule at each stage of the cross-validation process. We describe how in practice the selection bias can be assessed and corrected for by either performing a cross-validation or applying the bootstrap external to the selection process. We recommend using 10-fold rather than leave-one-out cross-validation, and concerning the bootstrap, we suggest using the so-called .632+ bootstrap error estimate designed to handle overfitted prediction rules. Using two published data sets, we demonstrate that when correction is made for the selection bias, the cross-validated error is no longer zero for a subset of only a few genes.
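The difference between internal and external cross-validation can be sketched in a few lines. The code below is an illustration, not the authors' procedure: the gene ranking (absolute difference of class means), the nearest-centroid classifier, and all function names are assumptions chosen to keep the example free of dependencies. The only point it demonstrates is where gene selection happens relative to the fold split.

```python
import random


def select_genes(X, y, n_genes):
    """Rank genes by absolute difference of the two class means.

    Uses only the samples passed in, which is what makes the external
    procedure unbiased when it is called on training folds alone.
    """
    def mean_diff(g):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        return abs(sum(a) / len(a) - sum(b) / len(b))
    return sorted(range(len(X[0])), key=mean_diff, reverse=True)[:n_genes]


def predict(X_train, y_train, x, genes):
    """Nearest-centroid prediction restricted to the selected genes."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(y_train)):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        centroid = [sum(r[g] for r in rows) / len(rows) for g in genes]
        dist = sum((x[g] - c) ** 2 for g, c in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cv_error(X, y, n_genes, n_folds=10, external=True, seed=0):
    """k-fold cross-validated error rate.

    external=True  : genes are re-selected inside every training fold.
    external=False : genes are selected once from all samples, which
                     leaks test information (the selection bias).
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    genes_all = select_genes(X, y, n_genes)  # biased choice, all samples
    errors = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train = [i for i in idx if i not in held_out]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        genes = select_genes(X_tr, y_tr, n_genes) if external else genes_all
        errors += sum(predict(X_tr, y_tr, X[i], genes) != y[i] for i in test_idx)
    return errors / len(X)
```

On data with many genes and no real class signal, calling `cv_error` with `external=False` will typically report an optimistically low error, while `external=True` should stay near chance level. The sketch assumes both classes appear in every training fold.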