916 results for EXPRESSION DATA


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Analyzing large-scale gene expression data is a labor-intensive and time-consuming process. To make data analysis easier, we developed a set of pipelines for rapid processing and analysis of poplar gene expression data for knowledge discovery. Of the pipelines developed, the differentially expressed genes (DEGs) pipeline is designed to identify biologically important genes that are differentially expressed at one of multiple time points or conditions. The pathway analysis pipeline is designed to identify differentially expressed metabolic pathways. The protein domain enrichment pipeline identifies the enriched protein domains present in the DEGs. Finally, the Gene Ontology (GO) enrichment analysis pipeline identifies the enriched GO terms in the DEGs. Our pipeline tools can analyze both microarray and high-throughput sequencing gene expression data. These two types of data are obtained by different technologies. Microarray technology measures gene expression levels via microarray chips, collections of microscopic DNA spots attached to a solid (glass) surface, whereas high-throughput sequencing, also called next-generation sequencing, is a newer technology that measures gene expression levels by directly sequencing mRNAs and obtaining each mRNA's copy number in cells or tissues. We also developed a web portal (http://sys.bio.mtu.edu/) that makes all pipelines available to the public so that users can analyze their own gene expression data. In addition to the analyses mentioned above, the portal can also perform GO hierarchy analysis, i.e. construct GO trees using a list of GO terms as input.
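
The abstract does not say which statistic the GO enrichment pipeline uses. A common choice for this kind of analysis is a one-sided hypergeometric test, and the sketch below assumes that choice purely for illustration. The function name and all parameter names are hypothetical and are not taken from the portal.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_annotated, n_degs, n_overlap):
    """One-sided hypergeometric p-value for term enrichment (assumed test).

    n_background: genes in the background set
    n_annotated:  background genes annotated with the GO term
    n_degs:       differentially expressed genes (the "draw")
    n_overlap:    DEGs annotated with the GO term

    Returns P(X >= n_overlap) when n_degs genes are drawn without
    replacement from the background.
    """
    total = comb(n_background, n_degs)
    upper = min(n_annotated, n_degs)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10 background genes, 5 carry the term,
# 5 DEGs, all 5 of which carry the term.
p_value = hypergeom_enrichment_p(10, 5, 5, 5)
```

In practice one such p-value would be computed per GO term and then corrected for multiple testing; that step is omitted here.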

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Background: Findings from the phase 3 First-Line ErbituX in lung cancer (FLEX) study showed that the addition of cetuximab to first-line chemotherapy significantly improved overall survival compared with chemotherapy alone (hazard ratio [HR] 0·871, 95% CI 0·762-0·996; p=0·044) in patients with advanced non-small-cell lung cancer (NSCLC). To define patients benefiting most from cetuximab, we studied the association of tumour EGFR expression level with clinical outcome in FLEX study patients. Methods: We used prospectively collected tumour EGFR expression data to generate an immunohistochemistry score for FLEX study patients on a continuous scale of 0-300. We used response data to select an outcome-based discriminatory threshold immunohistochemistry score for EGFR expression of 200. Treatment outcome was analysed in patients with low (immunohistochemistry score <200) and high (≥200) tumour EGFR expression. The primary endpoint in the FLEX study was overall survival. We analysed patients from the FLEX intention-to-treat (ITT) population. The FLEX study is registered with ClinicalTrials.gov, number NCT00148798. Findings: Tumour EGFR immunohistochemistry data were available for 1121 of 1125 (99·6%) patients from the FLEX study ITT population. High EGFR expression was scored for 345 (31%) evaluable patients and low for 776 (69%) patients. For patients in the high EGFR expression group, overall survival was longer in the chemotherapy plus cetuximab group than in the chemotherapy alone group (median 12·0 months [95% CI 10·2-15·2] vs 9·6 months [7·6-10·6]; HR 0·73, 0·58-0·93; p=0·011), with no meaningful increase in side-effects. We recorded no corresponding survival benefit for patients in the low EGFR expression group (median 9·8 months [8·9-12·2] vs 10·3 months [9·2-11·5]; HR 0·99, 0·84-1·16; p=0·88). 
A treatment interaction test assessing the difference in the HRs for overall survival between the EGFR expression groups suggested a predictive value for EGFR expression (p=0·044). Interpretation: High EGFR expression is a tumour biomarker that can predict survival benefit from the addition of cetuximab to first-line chemotherapy in patients with advanced NSCLC. Assessment of EGFR expression could offer a personalised treatment approach in this setting. Funding: Merck KGaA. © 2012 Elsevier Ltd.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Microarray data analysis is a data mining tool used to extract meaningful information hidden in biological data. One of the major focuses of microarray data analysis is the reconstruction of gene regulatory networks, which may provide a broader understanding of the functioning of complex cellular systems. Since cancer is a genetic disease arising from abnormal gene function, identifying cancerous genes and the regulatory pathways they control will provide a better platform for understanding tumor formation and development. The major focus of this thesis is to understand the regulation of genes responsible for the development of cancer, particularly colorectal cancer, by analyzing microarray expression data. In this thesis, four computational algorithms, namely a fuzzy logic algorithm, a modified genetic algorithm, a dynamic neural fuzzy network and a Takagi-Sugeno-Kang-type recurrent neural fuzzy network, are used to extract cancer-specific gene regulatory networks from a plasma RNA dataset of colorectal cancer patients. Plasma RNA is highly attractive for cancer analysis since it requires only a small amount of blood and can be collected repeatedly at any time, allowing the analysis of disease progression and treatment response.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Background: Expression microarrays are increasingly used to obtain large-scale transcriptomic information on a wide range of biological samples. Nevertheless, there is still much debate on the best ways to process data, to design experiments and to analyse the output. Furthermore, many of the more sophisticated mathematical approaches to data analysis in the literature remain inaccessible to much of the biological research community. In this study we examine ways of extracting and analysing a large data set obtained using the Agilent long oligonucleotide transcriptomics platform, applied to a set of human macrophage and dendritic cell samples. Results: We describe and validate a series of data extraction, transformation and normalisation steps which are implemented via a new R function. Analysis of replicate normalised reference data demonstrates that intra-array variability is small (only around 2% of the mean log signal), while interarray variability from replicate array measurements has a standard deviation (SD) of around 0.5 log2 units (6% of mean). The common practice of working with ratios of Cy5/Cy3 signal offers little further improvement in terms of reducing error. Comparison to expression data obtained using Arabidopsis samples demonstrates that the large number of genes in each sample showing a low level of transcription reflects the real complexity of the cellular transcriptome. Multidimensional scaling is used to show that the processed data identify an underlying structure which reflects some of the key biological variables which define the data set. This structure is robust, allowing reliable comparison of samples collected over a number of years and by a variety of operators. Conclusions: This study outlines a robust and easily implemented pipeline for extracting, transforming, normalising and visualising transcriptomic array data from the Agilent expression platform. 
The analysis is used to obtain quantitative estimates of the SD arising from experimental (non-biological) intra- and interarray variability, and for a lower threshold for determining whether an individual gene is expressed. The study provides a reliable basis for further, more extensive studies of the systems biology of eukaryotic cells.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

A common interest in gene expression data analysis is to identify, from a large pool of candidate genes, the genes that present significant changes in expression levels between a treatment and a control biological condition. Usually, this is done using a test statistic and a cutoff value that separate differentially from nondifferentially expressed genes. In this paper, we propose a Bayesian approach to identify differentially expressed genes by sequentially calculating credibility intervals from predictive densities, which are constructed using the sampled mean treatment effect from all genes in the study, excluding the treatment effect of genes previously identified with statistical evidence for difference. We compare our Bayesian approach with the standard ones based on the t-test and modified t-tests via a simulation study, using small sample sizes, which are common in gene expression data analysis. The results provide evidence that the proposed approach performs better than the standard ones, especially for cases with mean differences and increases in treatment variance relative to control variance. We also apply the methodologies to a well-known publicly available data set on the bacterium Escherichia coli.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Background: The study and analysis of gene expression measurements is the primary focus of functional genomics. Once expression data is available, biologists are faced with the task of extracting (new) knowledge associated with the underlying biological phenomenon. Most often, in order to perform this task, biologists execute a number of analysis activities on the available gene expression dataset rather than a single analysis activity. The integration of heterogeneous tools and data sources to create an integrated analysis environment represents a challenging and error-prone task. Semantic integration enables the assignment of unambiguous meanings to data shared among different applications in an integrated environment, allowing the exchange of data in a semantically consistent and meaningful way. This work aims at developing an ontology-based methodology for the semantic integration of gene expression analysis tools and data sources. The proposed methodology relies on software connectors to support not only the access to heterogeneous data sources but also the definition of transformation rules on exchanged data. Results: We have studied the different challenges involved in the integration of computer systems and the role software connectors play in this task. We have also studied a number of gene expression technologies, analysis tools and related ontologies in order to devise basic integration scenarios and propose a reference ontology for the gene expression domain. Then, we have defined a number of activities and associated guidelines to prescribe how the development of connectors should be carried out. Finally, we have applied the proposed methodology in the construction of three different integration scenarios involving the use of different tools for the analysis of different types of gene expression data. 
Conclusions: The proposed methodology facilitates the development of connectors capable of semantically integrating different gene expression analysis tools and data sources. The methodology can be used in the development of connectors supporting both simple and nontrivial processing requirements, thus assuring accurate data exchange and information interpretation from exchanged data.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

In the past decade, the advent of efficient genome sequencing tools and high-throughput experimental biotechnology has led to enormous progress in the life sciences. Among the most important innovations is microarray technology. It allows the expression of thousands of genes to be quantified simultaneously by measuring the hybridization from a tissue of interest to probes on a small glass or plastic slide. The characteristics of these data include a fair amount of random noise, a predictor dimension in the thousands, and a sample size in the dozens. One of the most exciting areas to which microarray technology has been applied is the challenge of deciphering complex diseases such as cancer. In these studies, samples are taken from two or more groups of individuals with heterogeneous phenotypes, pathologies, or clinical outcomes. These samples are hybridized to microarrays in an effort to find a small number of genes that are strongly correlated with the group of individuals. Even though methods to analyse the data are now well developed and close to reaching a standard organization (through the efforts of dedicated international projects such as the Microarray Gene Expression Data (MGED) Society [1]), it is not infrequent to come across a clinician's question for which no compelling statistical method exists. The contribution of this dissertation to deciphering disease is the development of new approaches aimed at handling open problems posed by clinicians in specific experimental designs. In Chapter 1, starting from a necessary biological introduction, we review microarray technologies and all the important steps involved in an experiment, from the production of the array to quality controls, ending with the preprocessing steps that are used in the data analysis in the rest of the dissertation. 
In Chapter 2, a critical review of standard analysis methods is provided, stressing most of the problems that [...] Chapter 3 introduces a method to address the issue of unbalanced design in microarray experiments. In microarray experiments, experimental design is a crucial starting point for obtaining reasonable results. In a two-class problem, an equal or similar number of samples should be collected for the two classes. However, in some cases, e.g. rare pathologies, the approach to be taken is less evident. We propose to address this issue by applying a modified version of SAM [2]. MultiSAM consists of a reiterated application of a SAM analysis, comparing the less populated class (LPC) with 1,000 random samplings of the same size from the more populated class (MPC). A list of the differentially expressed genes is generated for each SAM application. After 1,000 reiterations, each single probe is given a "score" ranging from 0 to 1,000 based on its recurrence in the 1,000 lists as differentially expressed. The performance of MultiSAM was compared to the performance of SAM and LIMMA [3] over two simulated data sets generated via beta and exponential distributions. The results of all three algorithms over low-noise data sets seem acceptable. However, on a real unbalanced two-channel data set regarding Chronic Lymphocytic Leukemia, LIMMA finds no significant probe, SAM finds 23 significantly changed probes but cannot separate the two classes, while MultiSAM finds 122 probes with score >300 and separates the data into two clusters by hierarchical clustering. We also report extra-assay validation in terms of differentially expressed genes. Although standard algorithms perform well over low-noise simulated data sets, MultiSAM seems to be the only one able to reveal subtle differences in gene expression profiles on real unbalanced data. In Chapter 4, a method to address similarity evaluation in a three-class problem by means of the Relevance Vector Machine [4] is described. 
In fact, when looking at microarray data in a prognostic and diagnostic clinical framework, differences are not the only thing that can play a crucial role. In some cases, similarities can give useful and sometimes even more important information. The goal, given three classes, could be to establish, with a certain level of confidence, whether the third one is similar to the first or the second one. In this work we show that the Relevance Vector Machine (RVM) [2] could be a possible solution to the limitations of standard supervised classification. In fact, RVM offers many advantages compared, for example, with its well-known precursor (Support Vector Machine - SVM [3]). Among these advantages, the estimate of the posterior probability of class membership represents a key feature for addressing the similarity issue. This is a highly important, but often overlooked, option of any practical pattern recognition system. We focused on a three-class tumor grade problem, with 67 samples of grade 1 (G1), 54 samples of grade 3 (G3) and 100 samples of grade 2 (G2). The goal is to find a model able to separate G1 from G3, then evaluate the third class G2 as a test set to obtain the probability for samples of G2 to be members of class G1 or class G3. The analysis showed that breast cancer samples of grade 2 have a molecular profile more similar to breast cancer samples of grade 1. This result had been conjectured in the literature, but no measure of significance had been given before.
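
The resampling scheme described for MultiSAM (repeated balanced comparisons of the smaller class against random subsets of the larger class, with a per-probe recurrence score) can be sketched as follows. This is only an illustration under stated assumptions: a Welch t-statistic with a fixed cutoff stands in for the SAM analysis, which is not reproduced here, and the function names, the cutoff and the data layout are all hypothetical.

```python
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch t-statistic; an assumed stand-in for the SAM statistic."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def resampling_scores(lpc, mpc, n_iter=1000, t_cutoff=3.0, seed=0):
    """Count, per probe, how often it is called differentially expressed.

    lpc, mpc: lists with one inner list of values per probe, for the less
    and more populated class respectively (at least 2 samples per class).
    Each iteration draws as many MPC samples as there are LPC samples.
    """
    rng = random.Random(seed)
    n_small = len(lpc[0])
    n_large = len(mpc[0])
    scores = [0] * len(lpc)
    for _ in range(n_iter):
        # The same sample columns are used for every probe in one iteration.
        cols = rng.sample(range(n_large), n_small)
        for i, (small, large) in enumerate(zip(lpc, mpc)):
            subset = [large[c] for c in cols]
            if abs(welch_t(small, subset)) >= t_cutoff:
                scores[i] += 1
    return scores


# Hypothetical toy data: probe 0 differs strongly between classes,
# probe 1 does not.
lpc = [[10.0, 10.1, 9.9, 10.2], [1.0, 1.1, 0.9, 1.2]]
mpc = [
    [0.0, 0.1, -0.1, 0.2, 0.05, -0.05, 0.15, -0.15, 0.1, 0.0],
    [1.0, 1.1, 0.9, 1.2, 1.0, 1.1, 0.9, 1.2, 1.0, 1.1],
]
scores = resampling_scores(lpc, mpc, n_iter=50)
```

Probes could then be ranked or thresholded on the score, in the way the abstract reports a threshold of >300 out of 1,000 iterations.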

Relevance:

70.00% 70.00%

Publisher:

Abstract:

BACKGROUND: Previous studies in our laboratory have shown associations of specific nuclear receptor gene variants with sporadic breast cancer. In order to investigate these findings further, we conducted the present study to determine whether expression levels of the progesterone and glucocorticoid nuclear receptor genes vary in different breast cancer grades. METHODS: RNA was extracted from paraffin-embedded archival breast tumour tissue and converted into cDNA. Sample cDNA underwent PCR using labelled primers to enable quantitation of mRNA expression. Expression data were normalized against the 18S ribosomal gene multiplex and analyzed using analysis of variance. RESULTS: Analysis of variance indicated a variable level of expression of both genes with regard to breast cancer grade (P = 0.00033 for glucocorticoid receptor and P = 0.023 for progesterone receptor). CONCLUSION: Statistical analysis indicated that expression of the progesterone nuclear receptor is elevated in late grade breast cancer tissue.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

OBJECTIVE: This study explored gene expression differences in predicting response to chemoradiotherapy in esophageal cancer. PURPOSE: A major pathological response to neoadjuvant chemoradiation is observed in about 40% of esophageal cancer patients and is associated with favorable outcomes. However, patients with tumors of similar histology, differentiation, and stage can have vastly different responses to the same neoadjuvant therapy. This dichotomy may be due to differences in the molecular genetic environment of the tumor cells. BACKGROUND DATA: Diagnostic biopsies were obtained from a training cohort of esophageal cancer patients (13), and extracted RNA was hybridized to genome expression microarrays. The resulting gene expression data were verified by qRT-PCR. In a larger, independent validation cohort (27), we examined differential gene expression by qRT-PCR. The ability of differentially regulated genes to predict response to therapy was assessed in a multivariate leave-one-out cross-validation model. RESULTS: Although 411 genes were differentially expressed between normal and tumor tissue, only 103 genes were altered between responder and non-responder tumors, and 67 genes were differentially expressed >2-fold. These included genes previously reported in esophageal cancer and a number of novel genes. In the validation cohort, 8 of 12 selected genes were significantly different between the response groups. In the predictive model, 5 of 8 genes could predict response to therapy with 95% accuracy in a subset (74%) of patients. CONCLUSIONS: This study has identified a gene microarray pattern and a set of genes associated with response to neoadjuvant chemoradiation in esophageal cancer. The potential of these genes as biomarkers of response to treatment warrants further investigation. Copyright © 2009 by Lippincott Williams & Wilkins.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Relative abundance data is common in the life sciences, but appreciation that it needs special analysis and interpretation is scarce. Correlation is popular as a statistical measure of pairwise association but should not be used on data that carry only relative information. Using timecourse yeast gene expression data, we show how correlation of relative abundances can lead to conclusions opposite to those drawn from absolute abundances, and that its value changes when different components are included in the analysis. Once all absolute information has been removed, only a subset of those associations will reliably endure in the remaining relative data, specifically, associations where pairs of values behave proportionally across observations. We propose a new statistic φ to describe the strength of proportionality between two variables and demonstrate how it can be straightforwardly used instead of correlation as the basis of familiar analyses and visualization methods.
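
The abstract names the proportionality statistic φ but does not give its formula. The sketch below assumes the definition φ(x, y) = var(log(x/y)) / var(log x), so that φ is 0 when the two variables are exactly proportional across observations. Treat that definition, the function name and the toy numbers as assumptions for illustration; no log-ratio transformation of the full composition is applied here.

```python
from math import log
from statistics import variance


def phi(x, y):
    """Assumed proportionality statistic: var(log(x/y)) / var(log x).

    x, y: positive abundances of two components over the same observations.
    Values near 0 indicate that x and y vary proportionally.
    """
    log_x = [log(v) for v in x]
    log_ratio = [log(a / b) for a, b in zip(x, y)]
    return variance(log_ratio) / variance(log_x)


# Hypothetical abundances: y_prop is exactly 3 * x, y_anti moves opposite to x.
x = [1.0, 2.0, 4.0, 8.0]
y_prop = [3.0, 6.0, 12.0, 24.0]
y_anti = [8.0, 4.0, 2.0, 1.0]

phi_prop = phi(x, y_prop)  # log-ratio is constant, so its variance is zero
phi_anti = phi(x, y_anti)  # log-ratio varies more than log x itself
```

Note that, under this definition, φ is not symmetric in x and y because the denominator uses only the variance of log x.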

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Objective: To identify differentially expressed genes in peripheral blood mononuclear cells (PBMCs) from patients with ankylosing spondylitis (AS) compared with healthy individuals. Methods: RNA was extracted from PBMCs collected from 18 patients with active disease and 18 gender-matched and age-matched controls. Expression profiles of these cells were determined using microarray. Candidate genes with differential expressions were confirmed in the same samples using quantitative reverse transcription-PCR (qRT-PCR). These genes were then validated in a different sample cohort of 35 patients with AS and 18 controls by qRT-PCR. Results: Microarray analysis identified 452 genes detected with 485 probes which were differentially expressed between patients with AS and controls. Underexpression of NR4A2, tumour necrosis factor AIP3 (TNFAIP3) and CD69 was confirmed. These genes were further validated in a different sample group in which the patients with AS had a wider range of disease activity. Predictive algorithms were also developed from the expression data using receiver-operating characteristic curves, which demonstrated that the three candidate genes have ∼80% power to predict AS according to their expression levels. Conclusions: The findings show differences in global gene expression patterns between patients with AS and controls, suggesting an immunosuppressive phenotype in the patients. Furthermore, downregulated expression of three immune-related genes was confirmed. These candidate genes were also shown to be strong predictive markers for AS.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Background: Rhipicephalus (Boophilus) microplus evades the host's haemostatic system through a complex protein array secreted into tick saliva. Serine protease inhibitors (serpins) constitute an important component of saliva and are represented by a large protease inhibitor family in Ixodidae. These secreted and non-secreted inhibitors modulate diverse and essential proteases involved in different physiological processes. Methods: The identification of R. microplus serpin sequences was performed through a web-based bioinformatics environment called Yabi. The database search was conducted on BmiGi V1, BmiGi V2.1, five SSH libraries, Australian tick transcriptome libraries and RmiTR V1 using bioinformatics methods. Semi-quantitative PCR was carried out using different adult tissues and tick development stages. The cDNAs of four identified R. microplus serpins were cloned and expressed in Pichia pastoris in order to determine the biological targets of these serpins using protease inhibition assays. Results: Four of the twenty-two serpins identified in our analysis are new R. microplus serpins, which were named RmS-19 to RmS-22. The analyses of DNA and predicted amino acid sequences showed high conservation of the R. microplus serpin sequences. The expression data suggested ubiquitous expression of RmS except for RmS-6 and RmS-14, which were expressed only in nymphs and adult female ovaries, respectively. RmS-19 and RmS-20 were expressed in all tissue samples analysed, showing their important role in both parasitic and non-parasitic stages of R. microplus development. RmS-21 was not detected in ovaries and RmS-22 was not identified in ovary and nymph samples, but both were expressed in the rest of the samples analysed. Four expressed recombinant serpins showed protease-specific inhibition for Chymotrypsin (RmS-1 and RmS-6), Chymotrypsin / Elastase (RmS-3) and Thrombin (RmS-15). 
Conclusion: This study constitutes an important contribution and improvement to the knowledge about the physiologic role of R. microplus serpins during the host-tick interaction.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

This thesis studies human gene expression space using high-throughput gene expression data from DNA microarrays. In molecular biology, high-throughput techniques allow numerical measurements of the expression of tens of thousands of genes simultaneously. In a single study, this data is traditionally obtained from a limited number of sample types with a small number of replicates. For organism-wide analysis, this data has been largely unavailable and the global structure of the human transcriptome has remained unknown. This thesis introduces a human transcriptome map of different biological entities and an analysis of its general structure. The map is constructed from gene expression data from the two largest public microarray data repositories, GEO and ArrayExpress. The creation of this map contributed to the development of ArrayExpress by identifying and retrofitting the previously unusable and missing data and by improving access to its data. It also contributed to the creation of several new tools for microarray data manipulation and the establishment of data exchange between GEO and ArrayExpress. The data integration for the global map required the creation of a new large ontology of human cell types, disease states, organism parts and cell lines. The ontology was used in a new text mining and decision tree based method for automatic conversion of human-readable free-text microarray data annotations into a categorised format. Data comparability, and minimisation of the systematic measurement errors that are characteristic of each laboratory in this large cross-laboratory integrated dataset, were ensured by computing a range of microarray data quality metrics and excluding incomparable data. The structure of a global map of human gene expression was then explored by principal component analysis and hierarchical clustering, using heuristics and help from another purpose-built sample ontology. 
A preface and motivation to the construction and analysis of a global map of human gene expression is given by the analysis of two microarray datasets of human malignant melanoma. The analysis of these sets incorporates an indirect comparison of statistical methods for finding differentially expressed genes and points to the need to study gene expression on a global level.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

The time of the large sequencing projects has enabled unprecedented possibilities of investigating more complex aspects of living organisms. Among the high-throughput technologies based on the genomic sequences, the DNA microarrays are widely used for many purposes, including the measurement of the relative quantity of the messenger RNAs. However, the reliability of microarrays has been strongly doubted as robust analysis of the complex microarray output data has been developed only after the technology had already been spread in the community. An objective of this study consisted of increasing the performance of microarrays, and was measured by the successful validation of the results by independent techniques. To this end, emphasis has been given to the possibility of selecting candidate genes with remarkable biological significance within specific experimental design. Along with literature evidence, the re-annotation of the probes and model-based normalization algorithms were found to be beneficial when analyzing Affymetrix GeneChip data. Typically, the analysis of microarrays aims at selecting genes whose expression is significantly different in different conditions followed by grouping them in functional categories, enabling a biological interpretation of the results. Another approach investigates the global differences in the expression of functionally related groups of genes. Here, this technique has been effective in discovering patterns related to temporal changes during infection of human cells. Another aspect explored in this thesis is related to the possibility of combining independent gene expression data for creating a catalog of genes that are selectively expressed in healthy human tissues. Not all the genes present in human cells are active; some involved in basic activities (named housekeeping genes) are expressed ubiquitously. 
Other genes (named tissue-selective genes) provide more specific functions and are expressed preferentially in certain cell types or tissues. Defining the tissue-selective genes is also important, as these genes can cause disease with a phenotype in the tissues where they are expressed. The hypothesis that gene expression could be used as a measure of the relatedness of tissues has also been confirmed. Microarray experiments provide long lists of candidate genes that are often difficult to interpret and prioritize. Extending the power of microarray results is possible by inferring the relationships of genes under certain conditions. Gene transcription is constantly regulated by the coordinated binding of proteins, named transcription factors, to specific portions of the gene's promoter sequence. In this study, the analysis of promoters from groups of candidate genes has been used for predicting gene networks and highlighting modules of transcription factors playing a central role in the regulation of their transcription. Specific modules have been found regulating the expression of genes selectively expressed in the hippocampus, an area of the brain having a central role in Major Depressive Disorder. Similarly, gene networks derived from microarray results have elucidated aspects of the development of the mesencephalon, another region of the brain, which is involved in Parkinson's disease.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Extrapulmonary manifestations constitute 15 to 20% of tuberculosis cases, with lymph node tuberculosis (LNTB) as the most common form of infection. However, diagnosis and treatment advances are hindered by lack of understanding of LNTB biology. To identify host response, Mycobacterium tuberculosis infected lymph nodes from LNTB patients were studied by means of transcriptomics and quantitative proteomics analyses. The selected targets obtained by comparative analyses were validated by quantitative PCR and immunohistochemistry. This approach provided expression data for 8,728 transcripts and 102 proteins, differentially regulated in the infected human lymph node. Enhanced inflammation with upregulation of T-helper1-related genes, combined with marked dysregulation of matrix metalloproteinases, indicates tissue damage due to high immunoactivity at infected niche. This expression signature was accompanied by significant upregulation of an immunoregulatory gene, leukotriene A4 hydrolase, at both transcript and protein levels. Comparative transcriptional analyses revealed LNTB-specific perturbations. In contrast to pulmonary TB-associated increase in lipid metabolism, genes involved in fatty-acid metabolism were found to be downregulated in LNTB suggesting differential lipid metabolic signature. This study investigates the tissue molecular signature of LNTB patients for the first time and presents findings that indicate the possible mechanism of disease pathology through dysregulation of inflammatory and tissue-repair processes.