13 results for text vector space model
in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo
Abstract:
Background: To understand the molecular mechanisms underlying important biological processes, a detailed description of the gene product networks involved is required. In order to define and understand such molecular networks, some statistical methods are proposed in the literature to estimate gene regulatory networks from time-series microarray data. However, several problems still need to be overcome. Firstly, information flow needs to be inferred, in addition to the correlation between genes. Secondly, we usually try to identify large networks from a large number of genes (parameters) originating from a smaller number of microarray experiments (samples). Due to this situation, which is rather frequent in Bioinformatics, it is difficult to perform statistical tests using methods that model large gene-gene networks. In addition, most of the models are based on dimension reduction using clustering techniques; therefore, the resulting network is not a gene-gene network but a module-module network. Here, we present the Sparse Vector Autoregressive (SVAR) model as a solution to these problems. Results: We have applied the Sparse Vector Autoregressive model to estimate gene regulatory networks based on gene expression profiles obtained from time-series microarray experiments. Through extensive simulations, by applying the SVAR method to artificial regulatory networks, we show that SVAR can infer true positive edges even under conditions in which the number of samples is smaller than the number of genes. Moreover, it is possible to control for false positives, a significant advantage when compared to other methods described in the literature, which are based on ranks or score functions. By applying SVAR to actual HeLa cell cycle gene expression data, we were able to identify well-known transcription factor targets.
Conclusion: The proposed SVAR method is able to model gene regulatory networks in frequent situations in which the number of samples is lower than the number of genes, making it possible to naturally infer partial Granger causalities without any a priori information. In addition, we present a statistical test to control the false discovery rate, which was not previously possible using other gene regulatory network models.
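To make the idea of a sparse vector autoregressive fit concrete, the following is a minimal sketch of a first-order model x_t = A x_{t-1} + e_t in which each row of A is estimated by L1-penalised regression. The choice of an L1 penalty solved by coordinate descent is an assumption made for illustration; the abstract does not state which estimator the authors use, and the false discovery rate test it mentions is not reproduced here. All function names and the toy network are hypothetical.

```python
import random


def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(X[t][j] ** 2 for t in range(n)) / n for j in range(p)]
    resid = list(y)  # residual y - Xb, with b starting at zero
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes b[j].
            rho = sum(X[t][j] * (resid[t] + X[t][j] * b[j]) for t in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            if new_bj != b[j]:
                delta = new_bj - b[j]
                for t in range(n):
                    resid[t] -= X[t][j] * delta
                b[j] = new_bj
    return b


def sparse_var1(series, lam):
    """Estimate A row by row; a nonzero A[i][j] suggests an edge j -> i."""
    X = series[:-1]  # expression at time t-1
    p = len(series[0])
    return [lasso_cd(X, [row[i] for row in series[1:]], lam) for i in range(p)]


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical toy network: gene 0 drives gene 1; gene 2 is independent.
    a_true = [[0.5, 0.0, 0.0], [0.8, 0.0, 0.0], [0.0, 0.0, 0.3]]
    x = [1.0, 0.0, 0.0]
    series = [x]
    for _ in range(60):
        x = [sum(a_true[i][j] * x[j] for j in range(3)) + random.gauss(0, 0.1)
             for i in range(3)]
        series.append(x)
    for row in sparse_var1(series, lam=0.01):
        print([round(v, 2) for v in row])
```

Because the penalty sets many coefficients exactly to zero, a fit of this kind remains defined when there are fewer time points than genes, which is the regime the abstract emphasises.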
Abstract:
XML similarity evaluation has become a central issue in the database and information communities, its applications ranging over document clustering, version control, data integration and ranked retrieval. Various algorithms for comparing hierarchically structured data, XML documents in particular, have been proposed in the literature. Most of them make use of techniques for finding the edit distance between tree structures, XML documents being commonly modeled as Ordered Labeled Trees. Yet, a thorough investigation of current approaches led us to identify several similarity aspects, i.e., sub-tree related structural and semantic similarities, which are not sufficiently addressed while comparing XML documents. In this paper, we provide an integrated and fine-grained comparison framework to deal with both structural and semantic similarities in XML documents (detecting the occurrences and repetitions of structurally and semantically similar sub-trees), and to allow the end-user to adjust the comparison process according to her requirements. Our framework consists of four main modules for (i) discovering the structural commonalities between sub-trees, (ii) identifying sub-tree semantic resemblances, (iii) computing tree-based edit operations costs, and (iv) computing tree edit distance. Experimental results demonstrate higher comparison accuracy with respect to alternative methods, while timing experiments reflect the impact of semantic similarity on overall system performance.
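As a rough illustration of tree edit distance on XML documents modeled as ordered labeled trees, the sketch below computes a simplified top-down distance in which the only operations are relabeling a node and inserting or deleting a whole subtree, each node costing 1. This is a deliberately reduced variant chosen for brevity, not the framework described in the abstract; in particular it ignores semantic similarity and repeated sub-tree detection. The function names and unit costs are assumptions.

```python
import xml.etree.ElementTree as ET


def size(node):
    """Number of nodes in the subtree rooted at node."""
    return 1 + sum(size(child) for child in node)


def tree_distance(a, b):
    """Top-down edit distance between two ordered labeled trees.

    Allowed operations (assumed unit cost per node): relabel a node,
    insert a whole subtree, delete a whole subtree.
    """
    relabel = 0 if a.tag == b.tag else 1
    ca, cb = list(a), list(b)
    m, n = len(ca), len(cb)
    # d[i][j]: cost of turning the first i children of a into the first j of b.
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + size(ca[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + size(cb[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + size(ca[i - 1]),          # delete subtree
                d[i][j - 1] + size(cb[j - 1]),          # insert subtree
                d[i - 1][j - 1] + tree_distance(ca[i - 1], cb[j - 1]),
            )
    return relabel + d[m][n]


if __name__ == "__main__":
    doc1 = ET.fromstring("<a><b/><c/></a>")
    doc2 = ET.fromstring("<a><b/><d/></a>")
    print(tree_distance(doc1, doc2))  # one relabel: c -> d
```

A semantic component of the kind the abstract describes could replace the 0/1 relabel cost with a graded label-similarity score, at the price of extra computation per node pair.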
Abstract:
Background: In the analysis of effects by cell treatment such as drug dosing, identifying changes on gene network structures between normal and treated cells is a key task. A possible way for identifying the changes is to compare structures of networks estimated from data on normal and treated cells separately. However, this approach usually fails to estimate accurate gene networks due to the limited length of time series data and measurement noise. Thus, approaches that identify changes on regulations by using time series data on both conditions in an efficient manner are demanded. Methods: We propose a new statistical approach that is based on the state space representation of the vector autoregressive model and estimates gene networks on two different conditions in order to identify changes on regulations between the conditions. In the mathematical model of our approach, hidden binary variables are newly introduced to indicate the presence of regulations on each condition. The use of the hidden binary variables enables an efficient data usage; data on both conditions are used for commonly existing regulations, while for condition specific regulations corresponding data are only applied. Also, the similarity of networks on two conditions is automatically considered from the design of the potential function for the hidden binary variables. For the estimation of the hidden binary variables, we derive a new variational annealing method that searches the configuration of the binary variables maximizing the marginal likelihood. Results: For the performance evaluation, we use time series data from two topologically similar synthetic networks, and confirm that our proposed approach estimates commonly existing regulations as well as changes on regulations with higher coverage and precision than other existing approaches in almost all the experimental settings. 
For a real data application, our proposed approach is applied to time series data from normal Human lung cells and Human lung cells treated by stimulating EGF-receptors and dosing an anticancer drug termed Gefitinib. In the treated lung cells, a cancer cell condition is simulated by the stimulation of EGF-receptors, but the effect would be counteracted due to the selective inhibition of EGF-receptors by Gefitinib. However, gene expression profiles are actually different between the conditions, and the genes related to the identified changes are considered as possible off-targets of Gefitinib. Conclusions: From the synthetically generated time series data, our proposed approach can identify changes on regulations more accurately than existing methods. By applying the proposed approach to the time series data on normal and treated Human lung cells, candidates of off-target genes of Gefitinib are found. According to the published clinical information, one of the genes can be related to a factor of interstitial pneumonia, which is known as a side effect of Gefitinib.
Abstract:
We review recent visualization techniques aimed at supporting tasks that require the analysis of text documents, from approaches targeted at visually summarizing the relevant content of a single document to those aimed at assisting exploratory investigation of whole collections of documents. Techniques are organized considering their target input material (either single texts or collections of texts) and their focus, which may be on displaying content, emphasizing relevant relationships, highlighting the temporal evolution of a document or collection, or helping users to handle results from a query posed to a search engine. We describe the approaches adopted by distinct techniques and briefly review the strategies they employ to obtain meaningful text models, discuss how they extract the information required to produce representative visualizations, the tasks they intend to support and the interaction issues involved, and their strengths and limitations. Finally, we show a summary of techniques, highlighting their goals and distinguishing characteristics. We also briefly discuss some open problems and research directions in the fields of visual text mining and text analytics.
Abstract:
Model predictive control (MPC) applications in the process industry usually deal with process systems that show time delays (dead times) between the system inputs and outputs. Also, in many industrial applications of MPC, integrating outputs resulting from liquid level control or recycle streams need to be considered as controlled outputs. Conventional MPC packages can be applied to time-delay systems but stability of the closed loop system will depend on the tuning parameters of the controller and cannot be guaranteed even in the nominal case. In this work, a state space model based on the analytical step response model is extended to the case of integrating time systems with time delays. This model is applied to the development of two versions of a nominally stable MPC, which is designed for the practical scenario in which one has targets for some of the inputs and/or outputs that may be unreachable and zone control (or interval tracking) for the remaining outputs. The controller is tested through simulation of a multivariable industrial reactor system. (C) 2012 Elsevier Ltd. All rights reserved.
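To show what an integrating output with dead time looks like in step-response form, here is a small sketch that samples the unit-step response of the transfer function K e^{-theta s} / s, which is K (t - theta) for t >= theta and zero before that. The pure-integrator transfer function, the parameter names, and the assumption that the dead time is a whole number of sampling periods are illustrative choices, not details taken from the abstract.

```python
def integrating_step_response(gain, dead_time_steps, sample_time, n_steps):
    """Sampled unit-step response of gain * exp(-theta*s) / s.

    theta is assumed to equal dead_time_steps * sample_time.
    """
    response = []
    for k in range(n_steps):
        elapsed = k - dead_time_steps  # samples since the dead time expired
        response.append(gain * sample_time * elapsed if elapsed > 0 else 0.0)
    return response


if __name__ == "__main__":
    coeffs = integrating_step_response(gain=2.0, dead_time_steps=3,
                                       sample_time=0.5, n_steps=8)
    print(coeffs)
    # Increments between consecutive samples settle to gain * sample_time.
    print([b - a for a, b in zip(coeffs, coeffs[1:])])
```

The response never settles to a constant, so a step-response model truncated at a finite horizon cannot represent it directly; this is one reason integrating outputs need separate treatment in step-response-based formulations.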
Abstract:
Despite their generality, conventional Volterra filters are inadequate for some applications, due to the huge number of parameters that may be needed for accurate modelling. When a state-space model of the target system is known, this can be assessed by computing its kernels, which also provides valuable information for choosing an adequate alternate Volterra filter structure, if necessary, and is useful for validating parameter estimation procedures. In this letter, we derive expressions for the kernels by using the Carleman bilinearization method, for which an efficient algorithm is given. Simulation results are presented, which confirm the usefulness of the proposed approach.
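The parameter growth mentioned above can be seen in a minimal sketch of a second-order Volterra filter, where the output combines a linear kernel h1 with a quadratic kernel h2 over the last M input samples. The code assumes full (non-symmetrised) kernels and zero initial conditions; the function names are hypothetical, and the Carleman bilinearization step from the abstract is not implemented.

```python
def volterra2_output(x, h1, h2):
    """Output of a second-order Volterra filter with memory M = len(h1).

    y[n] = sum_i h1[i]*x[n-i] + sum_i sum_j h2[i][j]*x[n-i]*x[n-j]
    Samples before the start of x are taken as zero.
    """
    memory = len(h1)
    y = []
    for n in range(len(x)):
        past = [x[n - i] if n - i >= 0 else 0.0 for i in range(memory)]
        linear = sum(h1[i] * past[i] for i in range(memory))
        quadratic = sum(h2[i][j] * past[i] * past[j]
                        for i in range(memory) for j in range(memory))
        y.append(linear + quadratic)
    return y


def kernel_parameter_count(memory, order):
    """Coefficients in full kernels of orders 1..order: sum of memory**p."""
    return sum(memory ** p for p in range(1, order + 1))


if __name__ == "__main__":
    print(volterra2_output([1.0, 2.0], [1.0, 0.5], [[0.25, 0.0], [0.0, 0.0]]))
    for order in (1, 2, 3):
        print(order, kernel_parameter_count(memory=10, order=order))
```

With a memory of 10 samples the count rises from 10 to 110 to 1110 as the order goes from 1 to 3, which illustrates why alternative filter structures become attractive.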
Abstract:
Let $G = \mathbb{Z}_{p^k}$ be a cyclic group of prime power order and let $V$ and $W$ be orthogonal representations of $G$ with $V^G = W^G = \{0\}$. Let $S(V)$ be the sphere of $V$ and suppose $f\colon S(V) \to W$ is a $G$-equivariant mapping. We give an estimate for the dimension of the set $f^{-1}\{0\}$ in terms of $V$ and $W$. This extends the Bourgin-Yang version of the Borsuk-Ulam theorem to this class of groups. Using this estimate, we also estimate the size of the $G$-coincidences set of a continuous map from $S(V)$ into a real vector space $W'$.
Abstract:
Bol algebras appear as the tangent algebras of Bol loops. A (left) Bol algebra is a vector space equipped with a binary operation $[a, b]$ and a ternary operation $\{a, b, c\}$ that satisfy five defining identities. If $A$ is a left or right alternative algebra then $A^b$ is a Bol algebra, where $[a, b] := ab - ba$ is the commutator and $\{a, b, c\} := \langle b, c, a \rangle$ is the Jordan associator. A special identity is an identity satisfied by $A^b$ for all right alternative algebras $A$, but not satisfied by the free Bol algebra. We show that there are no special identities of degree $\le 7$, but there are special identities of degree 8. We obtain all the special identities of degree 8 in partition six-two. (C) 2011 Elsevier Inc. All rights reserved.
Abstract:
A subspace representation of a poset $S = \{s_1, \dots, s_t\}$ is given by a system $(V; V_1, \dots, V_t)$ consisting of a vector space $V$ and its subspaces $V_i$ such that $V_i \subseteq V_j$ if $s_i \prec s_j$. For each real-valued vector $\chi = (\chi_1, \dots, \chi_t)$ with positive components, we define a unitary $\chi$-representation of $S$ as a system $(U; U_1, \dots, U_t)$ that consists of a unitary space $U$ and its subspaces $U_i$ such that $U_i \subseteq U_j$ if $s_i \prec s_j$ and satisfies $\chi_1 P_1 + \dots + \chi_t P_t = 1$, in which $P_i$ is the orthogonal projection onto $U_i$. We prove that $S$ has a finite number of unitarily nonequivalent indecomposable $\chi$-representations for each weight $\chi$ if and only if $S$ has a finite number of nonequivalent indecomposable subspace representations; that is, if and only if $S$ does not contain any of Kleiner's critical posets. (C) 2012 Elsevier Inc. All rights reserved.
Abstract:
Gelfand and Ponomarev [I.M. Gelfand, V.A. Ponomarev, Remarks on the classification of a pair of commuting linear transformations in a finite dimensional vector space, Funct. Anal. Appl. 3 (1969) 325-326] proved that the problem of classifying pairs of commuting linear operators contains the problem of classifying k-tuples of linear operators for any k. We prove an analogous statement for semilinear operators. (C) 2011 Elsevier Inc. All rights reserved.
Abstract:
We employ the approach of stochastic dynamics to describe the dissemination of vector-borne diseases such as dengue, and we focus our attention on the characterization of the threshold of the epidemic. The coexistence space comprises two representative spatial structures for both human and mosquito populations. The human population has its evolution described by a process that is similar to the Susceptible-Infected-Recovered (SIR) dynamics. The population of mosquitoes follows a dynamic of the type of the Susceptible-Infected-Susceptible (SIS) model. The coexistence space is a bipartite lattice constituted by two structures representing the human and mosquito populations. We develop a truncation scheme to solve the evolution equations for the densities and the two-site correlations from which we get the threshold of the disease and the reproductive ratio. We present a precise definition of the reproductive ratio which reveals the importance of the correlations developed in the early stage of the disease. According to our definition, the reproductive ratio is directly related to the conditional probability of the occurrence of a susceptible human (mosquito) given the presence in the neighborhood of an infected mosquito (human). The threshold of the epidemic as well as the phase transition between the epidemic and the non-epidemic states are also obtained by performing Monte Carlo simulations. References: [1] David R. de Souza, Tânia Tomé, Suani R. T. Pinho, Florisneide R. Barreto and Mário J. de Oliveira, Phys. Rev. E 87, 012709 (2013). [2] D. R. de Souza, T. Tomé and R. M. Ziff, J. Stat. Mech. P03006 (2011).
Abstract:
Background & aims: The boundaries between the categories of body composition provided by vectorial analysis of bioimpedance are not well defined. In this paper, fuzzy sets theory was used for modeling such uncertainty. Methods: An Italian database with 179 cases aged 18-70 years was divided randomly into development (n = 20) and testing (n = 159) samples. Of the 159 records in the testing sample, 99 had an unequivocal diagnosis. Resistance/height and reactance/height were the input variables in the model. Output variables were the seven categories of body composition of vectorial analysis. For each case the linguistic model estimated the membership degree of each impedance category. The Kappa statistic was used to compare these results with the previously established diagnoses. This demanded singling out one among the output set of seven categories of membership degrees. This procedure (defuzzification rule) established that the category with the highest membership degree should be the most likely category for the case. Results: The fuzzy model showed a good fit to the development sample. Excellent agreement was achieved between the defuzzified impedance diagnoses and the clinical diagnoses in the testing sample (Kappa = 0.85, p < 0.001). Conclusions: The fuzzy linguistic model was found to be in good agreement with clinical diagnoses. If the whole model output is considered, information on the extent to which each BIVA category is present can better inform clinical practice, with an enlarged nosological framework and diverse therapeutic strategies. (C) 2012 Elsevier Ltd and European Society for Clinical Nutrition and Metabolism. All rights reserved.
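The defuzzification rule and the agreement measure described above can be sketched in a few lines: pick the category with the highest membership degree, then compare the resulting labels with reference diagnoses using Cohen's kappa. The category names and membership values below are hypothetical placeholders, since the abstract does not list the seven categories, and the unweighted form of kappa is assumed.

```python
from collections import Counter


def defuzzify(memberships):
    """Return the category with the highest membership degree."""
    return max(memberships, key=memberships.get)


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two equally long label sequences.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical membership degrees for one case.
    case = {"category_1": 0.2, "category_2": 0.7, "category_3": 0.1}
    print(defuzzify(case))
    model = ["x", "x", "y", "y"]
    clinic = ["x", "x", "y", "x"]
    print(cohen_kappa(model, clinic))
```

Keeping the full membership dictionary rather than only the winning label preserves the graded information that the conclusions of the abstract point to.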
Abstract:
In this work we present an agent-based model for the spread of tuberculosis where the individuals can be infected with either drug-susceptible or drug-resistant strains and can also receive a treatment. The dynamics of the model and the role of each one of the parameters are explained. The whole set of parameters is explored to check their importance in the numerical simulation results. The model captures the beneficial impact of the adequate treatment on the prevalence of tuberculosis. Nevertheless, depending on the treatment parameters range, it also captures the emergence of drug resistance. Drug resistance emergence is particularly likely to occur for parameter values corresponding to less efficacious treatment, as usually found in developing countries.