916 results for EXPRESSION DATA
Abstract:
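The explosive growth of gene expression data urgently calls for automatic, effective data analysis tools. Cluster analysis has become a powerful tool for analyzing gene expression data and extracting biological information from it. To mine gene expression data more effectively, many improved versions of traditional clustering algorithms, as well as new clustering algorithms, have been proposed in recent years. This paper first briefly introduces how gene expression data are acquired and represented, and then systematically reviews the clustering algorithms applied to gene expression data analysis in recent years. According to the clustering objective, the algorithms are divided into gene-based clustering, sample-based clustering, and two-way clustering. For each category, the biological meaning and the main difficulties are described, and the basic principles, advantages, and disadvantages of the individual algorithms are discussed in detail. Finally, current clustering methods for gene expression data are summarized and future development trends are discussed.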
Abstract:
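Based on the characteristics of gene expression data, a high-accuracy density-based clustering algorithm, DENGENE, is proposed. By defining a consistency test and introducing peak points to improve the search direction, DENGENE handles gene expression data better. To evaluate its performance, the algorithm was tested on two widely used test datasets, namely brewer's yeast (Saccharomyces cerevisiae) gene expression datasets. The experimental results show that, compared with five model-based algorithms, the CAST algorithm, K-means clustering, and others, DENGENE achieves significant improvements in noise filtering and clustering accuracy.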
Abstract:
In this paper, we present an approach to discretizing multivariate continuous data while learning the structure of a graphical model. We derive the joint scoring function from the principle of predictive accuracy, which inherently ensures the optimal trade-off between goodness of fit and model complexity (including the number of discretization levels). Using the so-called finest grid implied by the data, our scoring function depends only on the number of data points in the various discretization levels. Not only can it be computed efficiently, but it is also independent of the metric used in the continuous space. Our experiments with gene expression data show that discretization plays a crucial role regarding the resulting network structure.
Abstract:
Rowland, J.J. (2003) Model Selection Methodology in Supervised Learning with Evolutionary Computation. BioSystems 72, 1-2, pp 187-196, Nov
Abstract:
Human adipose stem cells (hASCs) can differentiate into a variety of phenotypes. Native extracellular matrix (e.g., demineralized bone matrix or small intestinal submucosa) can influence the growth and differentiation of stem cells. The hypothesis of this study was that a novel ligament-derived matrix (LDM) would enhance expression of a ligamentous phenotype in hASCs compared to collagen gel alone. LDM prepared using phosphate-buffered saline or 0.1% peracetic acid was mixed with collagen gel (COL) and was evaluated for its ability to induce proliferation, differentiation, and extracellular matrix synthesis in hASCs over 28 days in culture at different seeding densities (0, 0.25 × 10^6, 1 × 10^6, or 2 × 10^6 hASC/mL). Biochemical and gene expression data were analyzed using analysis of variance. Fisher's least significant difference test was used to determine differences between treatments following analysis of variance. hASCs in either LDM or COL demonstrated changes in gene expression consistent with ligament development. hASCs cultured with LDM demonstrated more dsDNA content, sulfated-glycosaminoglycan accumulation, and type I and III collagen synthesis, and released more sulfated-glycosaminoglycan and collagen into the medium compared to hASCs in COL (p
Abstract:
BACKGROUND: A major challenge in oncology is the selection of the most effective chemotherapeutic agents for individual patients, while the administration of ineffective chemotherapy increases mortality and decreases quality of life in cancer patients. This emphasizes the need to evaluate every patient's probability of responding to each chemotherapeutic agent and limiting the agents used to those most likely to be effective. METHODS AND RESULTS: Using gene expression data on the NCI-60 and corresponding drug sensitivity, mRNA and microRNA profiles were developed representing sensitivity to individual chemotherapeutic agents. The mRNA signatures were tested in an independent cohort of 133 breast cancer patients treated with the TFAC (paclitaxel, 5-fluorouracil, adriamycin, and cyclophosphamide) chemotherapy regimen. To further dissect the biology of resistance, we applied signatures of oncogenic pathway activation and performed hierarchical clustering. We then used mRNA signatures of chemotherapy sensitivity to identify alternative therapeutics for patients resistant to TFAC. Profiles from mRNA and microRNA expression data represent distinct biologic mechanisms of resistance to common cytotoxic agents. The individual mRNA signatures were validated in an independent dataset of breast tumors (P = 0.002, NPV = 82%). When the accuracy of the signatures was analyzed based on molecular variables, the predictive ability was found to be greater in basal-like than non basal-like patients (P = 0.03 and P = 0.06). Samples from patients with co-activated Myc and E2F represented the cohort with the lowest percentage (8%) of responders. Using mRNA signatures of sensitivity to other cytotoxic agents, we predict that TFAC non-responders are more likely to be sensitive to docetaxel (P = 0.04), representing a viable alternative therapy. 
CONCLUSIONS: Our results suggest that the optimal strategy for chemotherapy sensitivity prediction integrates molecular variables such as ER and HER2 status with corresponding microRNA and mRNA expression profiles. Importantly, we also present evidence to support the concept that analysis of molecular variables can present a rational strategy to identifying alternative therapeutic opportunities.
Abstract:
While genome-wide gene expression data are generated at an increasing rate, the repertoire of approaches for pattern discovery in these data is still limited. Identifying subtle patterns of interest in large amounts of data (tens of thousands of profiles) associated with a certain level of noise remains a challenge. A microarray time series was recently generated to study the transcriptional program of the mouse segmentation clock, a biological oscillator associated with the periodic formation of the segments of the body axis. A method related to Fourier analysis, the Lomb-Scargle periodogram, was used to detect periodic profiles in the dataset, leading to the identification of a novel set of cyclic genes associated with the segmentation clock. Here, we applied to the same microarray time series dataset four distinct mathematical methods to identify significant patterns in gene expression profiles. These methods are called: Phase consistency, Address reduction, Cyclohedron test and Stable persistence, and are based on different conceptual frameworks that are either hypothesis- or data-driven. Some of the methods, unlike Fourier transforms, are not dependent on the assumption of periodicity of the pattern of interest. Remarkably, these methods identified blindly the expression profiles of known cyclic genes as the most significant patterns in the dataset. Many candidate genes predicted by more than one approach appeared to be true positive cyclic genes and will be of particular interest for future research. In addition, these methods predicted novel candidate cyclic genes that were consistent with previous biological knowledge and experimental validation in mouse embryos. 
Our results demonstrate the utility of these novel pattern detection strategies, notably for detection of periodic profiles, and suggest that combining several distinct mathematical approaches to analyze microarray datasets is a valuable strategy for identifying genes that exhibit novel, interesting transcriptional patterns.
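As a rough illustration of the Lomb-Scargle periodogram mentioned in this abstract, the following Python sketch scores a grid of candidate periods for one unevenly sampled expression profile. It is a minimal sketch of the classical normalized periodogram, not the authors' implementation: the function name `lomb_scargle`, the sampling times, the noise-free synthetic profile with a 2-hour period, and the grid of candidate periods are all assumptions made for illustration.

```python
import math


def lomb_scargle(times, values, omegas):
    """Classical normalized Lomb-Scargle power at each angular frequency."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    power = []
    for w in omegas:
        # Time offset tau makes the sine and cosine terms orthogonal
        # on the (possibly uneven) sampling grid.
        s2 = sum(math.sin(2 * w * t) for t in times)
        c2 = sum(math.cos(2 * w * t) for t in times)
        tau = math.atan2(s2, c2) / (2 * w)
        cos_t = [math.cos(w * (t - tau)) for t in times]
        sin_t = [math.sin(w * (t - tau)) for t in times]
        yc = sum((v - mean) * c for v, c in zip(values, cos_t))
        ys = sum((v - mean) * s for v, s in zip(values, sin_t))
        cc = sum(c * c for c in cos_t)
        ss = sum(s * s for s in sin_t)
        p = 0.0
        if cc > 1e-12:  # guard against a degenerate sampling grid
            p += yc * yc / cc
        if ss > 1e-12:
            p += ys * ys / ss
        power.append(p / (2 * var))
    return power


# Hypothetical, unevenly spaced sampling times (hours) and a synthetic,
# noise-free profile oscillating with an assumed 2-hour period.
times = [0.0, 0.4, 0.9, 1.3, 2.1, 2.6, 3.2, 3.7, 4.5, 5.1, 5.6, 6.3, 6.8, 7.4, 7.9]
values = [math.sin(math.pi * t) for t in times]

# Candidate periods from 1.0 to 4.0 hours in 0.25-hour steps (assumed grid).
periods = [1.0 + 0.25 * k for k in range(13)]
power = lomb_scargle(times, values, [2 * math.pi / p for p in periods])
best_period = periods[power.index(max(power))]
print("candidate period with highest power:", best_period)
```

In practice one would compute this for every profile in the dataset and rank profiles by peak power. Assessing significance (for example by permutation) is a separate step that this sketch does not cover.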
Abstract:
Determining how information flows along anatomical brain pathways is a fundamental requirement for understanding how animals perceive their environments, learn, and behave. Attempts to reveal such neural information flow have been made using linear computational methods, but neural interactions are known to be nonlinear. Here, we demonstrate that a dynamic Bayesian network (DBN) inference algorithm we originally developed to infer nonlinear transcriptional regulatory networks from gene expression data collected with microarrays is also successful at inferring nonlinear neural information flow networks from electrophysiology data collected with microelectrode arrays. The inferred networks we recover from the songbird auditory pathway are correctly restricted to a subset of known anatomical paths, are consistent with timing of the system, and reveal both the importance of reciprocal feedback in auditory processing and greater information flow to higher-order auditory areas when birds hear natural as opposed to synthetic sounds. A linear method applied to the same data incorrectly produces networks with information flow to non-neural tissue and over paths known not to exist. To our knowledge, this study represents the first biologically validated demonstration of an algorithm to successfully infer neural information flow networks.
Abstract:
Estimation of the skeleton of a directed acyclic graph (DAG) is of great importance for understanding the underlying DAG, and causal effects can be assessed from the skeleton when the DAG is not identifiable. We propose a novel method named PenPC to estimate the skeleton of a high-dimensional DAG by a two-step approach. We first estimate the nonzero entries of a concentration matrix using penalized regression, and then fix the difference between the concentration matrix and the skeleton by evaluating a set of conditional independence hypotheses. For high-dimensional problems where the number of vertices p is on a polynomial or exponential scale of the sample size n, we study the asymptotic properties of PenPC on two types of graphs: traditional random graphs where all the vertices have the same expected number of neighbors, and scale-free graphs where a few vertices may have a large number of neighbors. As illustrated by extensive simulations and applications to gene expression data of cancer patients, PenPC has higher sensitivity and specificity than the state-of-the-art method, the PC-stable algorithm.
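The second step described above, evaluating conditional independence hypotheses, can be illustrated with a partial-correlation test based on Fisher's z-transform, which is a common choice for Gaussian data. The abstract does not say which test PenPC uses, so the sketch below is an assumption rather than the authors' procedure. The helper names, the toy chain X -> Z -> Y, and the 1.96 cutoff are all hypothetical, and the penalized-regression first step is not shown.

```python
import math


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given z."""
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def fisher_z_stat(r, n, cond_size):
    """Absolute Fisher z statistic for a (partial) correlation r."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    return abs(math.atanh(r)) * math.sqrt(n - cond_size - 3)


# Toy chain X -> Z -> Y. The noise vectors are built to be exactly
# orthogonal to x and to each other, so the outcome is deterministic.
base_x = [1, 1, 1, 1, -1, -1, -1, -1]
base_e1 = [1, 1, -1, -1, 1, 1, -1, -1]
base_e2 = [1, -1, 1, -1, 1, -1, 1, -1]
x = base_x * 8  # repeat the pattern to obtain n = 64 samples
z = [a + b for a, b in zip(x, base_e1 * 8)]
y = [a + b for a, b in zip(z, base_e2 * 8)]
n = len(x)

CUTOFF = 1.96  # assumed two-sided 5% threshold
marginal_dependent = fisher_z_stat(corr(x, y), n, 0) > CUTOFF
dependent_given_z = fisher_z_stat(partial_corr(x, y, z), n, 1) > CUTOFF
print("X-Y dependent marginally:", marginal_dependent)
print("X-Y dependent given Z:", dependent_given_z)
```

In this toy example X and Y are correlated marginally, but the dependence vanishes once Z is conditioned on, so a skeleton-estimation procedure of this kind would drop the X-Y edge while keeping X-Z and Z-Y.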
Abstract:
Clustering analysis of data from DNA microarray hybridization studies is an essential task for identifying biologically relevant groups of genes. The attribute cluster algorithm (ACA) has provided an attractive way to group and select meaningful genes. However, ACA needs considerable prior knowledge about the genes to set the number of clusters. In practical applications, if the number of clusters is misspecified, the performance of ACA deteriorates rapidly. In fact, setting this number is very demanding because such prior knowledge is scarce. We propose the Cooperative Competition Cluster Algorithm (CCCA) in this paper. In the algorithm, we assume that both cooperation and competition exist simultaneously between clusters in the process of clustering. By using this principle of Cooperative Competition, the number of clusters can be found in the process of clustering. Experimental results on a synthetic dataset and on gene expression data are presented. The results show that CCCA can choose the number of clusters automatically and performs very well relative to competing methods.
Abstract:
In this paper, we introduce a method to detect pathological pathways of a disease. We aim to identify biological processes rather than single genes affected by chronic fatigue syndrome (CFS). So far, CFS has neither diagnostic clinical signals nor abnormalities that could be diagnosed by laboratory examinations. It is also unclear whether CFS represents one disease or can be subdivided into different categories. We use information from clinical trials, the gene ontology (GO) database as well as gene expression data to identify undirected dependency graphs (UDGs) representing biological processes according to the GO database. The structural comparison of UDGs of sick versus non-sick patients allows us to make predictions about the modification of pathways due to pathogenesis.
Abstract:
Motivation: The inference of regulatory networks from large-scale expression data holds great promise because of the potentially causal interpretation of these networks. However, owing to the difficulty of establishing reliable methods based on observational data, there is so far only incomplete knowledge about the possibilities and limitations of such inference methods in this context.
Results: In this article, we conduct a statistical analysis investigating differences and similarities of four network inference algorithms, ARACNE, CLR, MRNET and RN, with respect to local network-based measures. We employ ensemble methods that allow us to assess the inferability down to the level of individual edges. Our analysis reveals the bias of these inference methods with respect to the inference of various network components and, hence, provides guidance in the interpretation of inferred regulatory networks from expression data. Further, as an application, we predict the total number of regulatory interactions in human B cells and hypothesize about the role of Myc and its targets regarding molecular information processing.
Abstract:
Background
Inferring gene regulatory networks from large-scale expression data is an important problem that has received much attention in recent years. These networks have the potential to provide insights into causal molecular interactions of biological processes. Hence, from a methodological point of view, reliable estimation methods based on observational data are needed to approach this problem practically.
Results
In this paper, we introduce a novel gene regulatory network inference (GRNI) algorithm, called C3NET. We compare C3NET with four well-known methods, ARACNE, CLR, MRNET and RN, by conducting in-depth numerical ensemble simulations, and we also demonstrate for biological expression data from E. coli that C3NET performs consistently better than the best known GRNI methods in the literature. In addition, it has a low computational complexity. Since C3NET is based on estimates of mutual information values in conjunction with a maximization step, our numerical investigations demonstrate that our inference algorithm exploits causal structural information in the data efficiently.
Conclusions
For systems biology to succeed in the long run, it is of crucial importance to establish methods that extract large-scale gene networks from high-throughput data that reflect the underlying causal interactions among genes or gene products. Our method can contribute to this endeavor by demonstrating that an inference algorithm with a neat design not only permits a more intuitive and possibly biological interpretation of its working mechanism but can also yield superior results.
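The idea summarized in the Results above, mutual information estimates combined with a maximization step, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions and not the published C3NET implementation: the equal-frequency discretization, the plug-in mutual information estimator, the fixed `threshold` standing in for a significance test, and the toy gene names and values are all hypothetical.

```python
import math
from collections import Counter


def discretize(values, bins=3):
    """Equal-frequency (rank-based) binning of one expression profile."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * bins // len(values)
    return labels


def mutual_information(x, y):
    """Plug-in mutual information (in nats) of two discrete sequences."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(
        c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items()
    )


def max_mi_network(expr, threshold):
    """For each gene, keep only its strongest edge if it exceeds threshold."""
    disc = {g: discretize(v) for g, v in expr.items()}
    edges = set()
    for g in expr:
        best, best_mi = None, threshold
        for h in expr:
            if h == g:
                continue
            mi = mutual_information(disc[g], disc[h])
            if mi > best_mi:  # maximization step over candidate neighbors
                best, best_mi = h, mi
        if best is not None:
            edges.add(frozenset((g, best)))
    return edges


# Toy data: geneB is a monotone function of geneA; geneC is arranged so
# that its bins are independent of geneA's bins.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
expr = {
    "geneA": gene_a,
    "geneB": [v * v for v in gene_a],
    "geneC": [1.0, 4.0, 7.0, 2.0, 5.0, 8.0, 3.0, 6.0, 9.0],
}
edges = max_mi_network(expr, threshold=0.1)
print(sorted(sorted(e) for e in edges))
```

Because each gene contributes at most one edge, the resulting network has at most as many edges as genes, which keeps the output sparse. A real analysis would replace the fixed threshold with a proper significance test on the mutual information values.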
Abstract:
This study examined variations in gene expression between FFPE blocks within tumors of individual patients. Microarray data were used to measure tumor heterogeneity within and between patients and disease states. Data were used to determine the number of samples needed to power biomarker discovery studies. Bias and variation in gene expression were assessed at the intrapatient and interpatient levels and between adenocarcinoma and squamous samples. A mixed-model analysis of variance was fitted to gene expression data and model signatures to assess the statistical significance of observed variations within and between samples and disease states. Sample size analysis, adjusted for sample heterogeneity, was used to determine the number of samples required to support biomarker discovery studies. Variation in gene expression was observed between blocks taken from a single patient. However, this variation was considerably less than differences between histological characteristics. This degree of block-to-block variation still permits biomarker discovery using either macrodissected tumors or whole FFPE sections, provided that intratumor heterogeneity is taken into account. Failure to consider intratumor heterogeneity may result in underpowered biomarker studies, which in turn may lead to either the generation of longer gene signatures or the inability to identify a viable biomarker. Moreover, the results of this study indicate that a single biopsy sample is suitable for applying a biomarker in non-small-cell lung cancer. © 2012 American Society for Investigative Pathology and the Association for Molecular Pathology.
Abstract:
Background: MicroRNAs (miRNAs) are a class of small RNA molecules that regulate expression of specific mRNA targets. They can be released from cells, often encapsulated within extracellular vesicles (EVs), and therefore have the potential to mediate intercellular communication. It has been suggested that certain miRNAs may be selectively exported, although the mechanism has yet to be identified. Manipulation of the miRNA content of EVs will be important for future therapeutic applications. We therefore wished to assess which endogenous miRNAs are enriched in EVs and how effectively an overexpressed miRNA would be exported.
Results: Small RNA libraries from HEK293T cells and vesicles before or after transfection with a vector for miR-146a overexpression were analysed by deep sequencing. A subset of miRNAs was found to be enriched in EVs; pathway analysis of their predicted target genes suggests a potential role in regulation of endocytosis. RT-qPCR in additional cell types and analysis of publicly available data revealed that many of these miRNAs tend to be widely preferentially exported. Whilst overexpressed miR-146a was highly enriched both in transfected cells and their EVs, the cellular:EV ratios of endogenous miRNAs were not grossly altered. MiR-451 was consistently the most highly exported miRNA in many different cell types. Intriguingly, Argonaute2 (Ago2) is required for miR-451 maturation, and knockout of Ago2 has been shown to decrease expression of other preferentially exported miRNAs (e.g., miR-150 and miR-142-3p).
Conclusion: The global expression data provided by deep sequencing confirms that specific miRNAs are enriched in EVs released by HEK293T cells. Observation of similar patterns in a range of cell types suggests that a common mechanism for selective miRNA export may exist.