3 results for Inference process
in the Biblioteca Digital da Produção Intelectual da Universidade de São Paulo
Abstract:
Background: A current challenge in gene annotation is to define gene function in the context of a network of relationships rather than for single genes in isolation. The inference of gene networks (GNs) has emerged as an approach to better understand the biology of the system and to study how the components of the network interact with each other and keep their functions stable. In general, however, there are not enough data to accurately recover GNs from expression levels, which leads to the curse of dimensionality: the number of variables exceeds the number of samples. One way to mitigate this problem is to integrate other biological data into the inference process instead of using only the expression profiles. The use of several types of biological information in inference methods has increased significantly in recent years, with the aim of better recovering the connections between genes and reducing false positives. What makes this strategy attractive is that known connections can be confirmed through the included biological data, while new relationships between genes can still be discovered from the expression data. Although several works on data integration have improved the performance of network inference methods, the actual contribution of each type of biological information to the improvement is not clear. Methods: We propose a methodology for including biological information in an inference algorithm in order to assess the prediction gain obtained by using biological information and expression profiles together. We also evaluated and compared the gain from adding four types of biological information: (a) protein-protein interaction, (b) Rosetta stone fusion proteins, (c) KEGG and (d) KEGG+GO. Results and conclusions: This work presents a first comparison of the gain from using prior biological information in the inference of GNs, considering the eukaryotic organism P. falciparum. 
Our results indicate that information based on direct interaction can produce a larger gain than data describing a less specific relationship, such as GO or KEGG. As expected, the results also show that the use of biological information is an important way to improve the inference. We further compared the gain in the inference of the global network with the gain for the hubs alone. The results indicate that the use of biological information can improve the identification of the most connected proteins.
Abstract:
Background: A popular model for gene regulatory networks is the Boolean network model. In this paper, we propose an algorithm to analyze gene regulatory interactions using the Boolean network model and time-series data. The Boolean network considered here is restricted, in the sense that only a subset of all possible Boolean functions is allowed. We exploit some mathematical properties of these restricted Boolean networks in order to avoid a full search. The problem is modeled as a Constraint Satisfaction Problem (CSP), and CSP techniques are used to solve it. Results: We applied the proposed algorithm to two data sets. The first is an artificial data set obtained from a model of the budding yeast cell cycle. The second is derived from experiments performed on HeLa cells. The results show that some interactions can be fully or at least partially determined under the Boolean model considered. Conclusions: The proposed algorithm can be used as a first step in the detection of gene/protein interactions. It is able to infer gene relationships from time-series gene expression data, and this inference process can be aided by available a priori knowledge.
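To make the notion of a "fully or partially determined" interaction concrete, the following is a minimal sketch of a consistency check on binarized time-series data. It is not the CSP algorithm of the paper, and it does not model the restricted class of Boolean functions, which the abstract does not specify. It only tests, by brute force, which small sets of candidate regulators are compatible with some Boolean function for a target gene. The function name `consistent_regulator_sets`, the parameter `max_inputs`, and the toy data are all hypothetical.

```python
from itertools import combinations


def consistent_regulator_sets(series, target, max_inputs=2):
    """Return regulator sets compatible with the observed updates of `target`.

    `series` is a list of binary state tuples, one per time point.
    A regulator set is kept if the target's next value is a function of the
    regulators' current values, i.e. the same input pattern never leads to
    two different outputs. (Hypothetical helper, not the paper's method.)
    """
    n_genes = len(series[0])
    found = []
    for k in range(1, max_inputs + 1):
        for regs in combinations(range(n_genes), k):
            table = {}  # input pattern -> observed next value of the target
            ok = True
            for now, nxt in zip(series, series[1:]):
                key = tuple(now[g] for g in regs)
                # setdefault records the first output seen for this pattern;
                # a later, different output is a contradiction.
                if table.setdefault(key, nxt[target]) != nxt[target]:
                    ok = False
                    break
            if ok:
                found.append(regs)
    return found


# Made-up toy series for three genes (x0, x1, x2); gene 2 was generated
# with the rule x2(t+1) = x0(t) AND NOT x1(t).
toy_series = [
    (1, 0, 0),
    (1, 1, 1),
    (0, 1, 0),
    (0, 0, 0),
    (1, 0, 0),
    (1, 1, 1),
]

result = consistent_regulator_sets(toy_series, target=2)
print(result)
```

On this toy series, no single gene explains the target, and the two-gene sets that survive are (0, 1) and (0, 2). Gene 0 appears in every surviving set, so its influence is determined, while the second regulator remains ambiguous with this little data. Prior knowledge could be used to discard one of the two candidates.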
Abstract:
In this article we introduce a three-parameter extension of the bivariate exponential-geometric (BEG) law (Kozubowski and Panorska, 2005) [4]. We refer to this new distribution as the bivariate gamma-geometric (BGG) law. A bivariate random vector (X, N) follows the BGG law if N has a geometric distribution and X may be represented (in law) as a sum of N independent and identically distributed gamma variables, where these variables are independent of N. Statistical properties such as the moment generating and characteristic functions, moments and the variance-covariance matrix are provided. The marginal and conditional laws are also studied. We show that the BGG distribution is infinitely divisible, just as the BEG model is. Further, we provide alternative representations for the BGG distribution and show that it enjoys a geometric stability property. Maximum likelihood estimation and inference are discussed, and a reparametrization is proposed in order to obtain orthogonality of the parameters. We present an application to a real data set where our model provides a better fit than the BEG model. Our bivariate distribution induces a bivariate Lévy process with correlated gamma and negative binomial processes, which extends the bivariate Lévy motion proposed by Kozubowski et al. (2008) [6]. The marginals of our Lévy motion are a mixture of gamma and negative binomial processes, and we name it the BMixGNB motion. Basic properties such as stochastic self-similarity and the covariance matrix of the process are presented. The bivariate distribution of our BMixGNB process at a fixed time is also studied and some results are derived, including a discussion of maximum likelihood estimation and inference. (C) 2012 Elsevier Inc. All rights reserved.
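The stochastic representation of the BGG law can be illustrated with a short simulation. This is a minimal sketch under stated assumptions, not the authors' code. The abstract does not give the parameterization, so the sketch assumes that N is geometric on {1, 2, ...} with success probability p (0 < p < 1) and that the summands are gamma with shape alpha and rate beta. The function name `sample_bgg` and the parameter names are hypothetical.

```python
import math
import random


def sample_bgg(alpha, beta, p, rng=random):
    """Draw one (X, N) pair from the assumed BGG representation.

    Assumptions (not stated in the abstract): N is geometric on {1, 2, ...}
    with success probability p in (0, 1); each summand is Gamma with
    shape `alpha` and rate `beta`, independent of N.
    """
    # Inverse-transform sampling of N: P(N > k) = (1 - p) ** k.
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    n = int(math.log(u) / math.log(1.0 - p)) + 1
    # X is the sum of n iid gamma variables; gammavariate takes a scale
    # parameter, so the rate beta is passed as 1 / beta.
    x = sum(rng.gammavariate(alpha, 1.0 / beta) for _ in range(n))
    return x, n


rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [sample_bgg(2.0, 3.0, 0.4, rng) for _ in range(50000)]
mean_x = sum(x for x, _ in draws) / len(draws)
mean_n = sum(n for _, n in draws) / len(draws)
print(mean_x, mean_n)
```

Under these assumptions, E[N] = 1/p and E[X] = alpha/(beta p), so the two sample means should land near 2.5 and 1.67 for the parameters used here. Because X is built from the same N that is returned, the pair is positively correlated, which is the dependence the bivariate law is designed to capture.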