175 resultados para Applied Mathematics|Computer Engineering|Computer science

em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Thanks to recent advances in molecular biology, allied to an ever increasing amount of experimental data, the functional state of thousands of genes can now be extracted simultaneously by using methods such as cDNA microarrays and RNA-Seq. Particularly important related investigations are the modeling and identification of gene regulatory networks from expression data sets. Such a knowledge is fundamental for many applications, such as disease treatment, therapeutic intervention strategies and drugs design, as well as for planning high-throughput new experiments. Methods have been developed for gene networks modeling and identification from expression profiles. However, an important open problem regards how to validate such approaches and its results. This work presents an objective approach for validation of gene network modeling and identification which comprises the following three main aspects: (1) Artificial Gene Networks (AGNs) model generation through theoretical models of complex networks, which is used to simulate temporal expression data; (2) a computational method for gene network identification from the simulated data, which is founded on a feature selection approach where a target gene is fixed and the expression profile is observed for all other genes in order to identify a relevant subset of predictors; and (3) validation of the identified AGN-based network through comparison with the original network. The proposed framework allows several types of AGNs to be generated and used in order to simulate temporal expression data. The results of the network identification method can then be compared to the original network in order to estimate its properties and accuracy. Some of the most important theoretical models of complex networks have been assessed: the uniformly-random Erdos-Renyi (ER), the small-world Watts-Strogatz (WS), the scale-free Barabasi-Albert (BA), and geographical networks (GG). The experimental results indicate that the inference method was sensitive to average degree k variation, decreasing its network recovery rate with the increase of k. The signal size was important for the inference method to get better accuracy in the network identification rate, presenting very good results with small expression profiles. However, the adopted inference method was not sensible to recognize distinct structures of interaction among genes, presenting a similar behavior when applied to different network topologies. In summary, the proposed framework, though simple, was adequate for the validation of the inferred networks by identifying some properties of the evaluated method, which can be extended to other inference methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Motivation: Understanding the patterns of association between polymorphisms at different loci in a population ( linkage disequilibrium, LD) is of fundamental importance in various genetic studies. Many coefficients were proposed for measuring the degree of LD, but they provide only a static view of the current LD structure. Generative models (GMs) were proposed to go beyond these measures, giving not only a description of the actual LD structure but also a tool to help understanding the process that generated such structure. GMs based in coalescent theory have been the most appealing because they link LD to evolutionary factors. Nevertheless, the inference and parameter estimation of such models is still computationally challenging. Results: We present a more practical method to build GM that describe LD. The method is based on learning weighted Bayesian network structures from haplotype data, extracting equivalence structure classes and using them to model LD. The results obtained in public data from the HapMap database showed that the method is a promising tool for modeling LD. The associations represented by the learned models are correlated with the traditional measure of LD D`. The method was able to represent LD blocks found by standard tools. The granularity of the association blocks and the readability of the models can be controlled in the method. The results suggest that the causality information gained by our method can be useful to tell about the conservability of the genetic markers and to guide the selection of subset of representative markers.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Motivation: DNA assembly programs classically perform an all-against-all comparison of reads to identify overlaps, followed by a multiple sequence alignment and generation of a consensus sequence. If the aim is to assemble a particular segment, instead of a whole genome or transcriptome, a target-specific assembly is a more sensible approach. GenSeed is a Perl program that implements a seed-driven recursive assembly consisting of cycles comprising a similarity search, read selection and assembly. The iterative process results in a progressive extension of the original seed sequence. GenSeed was tested and validated on many applications, including the reconstruction of nuclear genes or segments, full-length transcripts, and extrachromosomal genomes. The robustness of the method was confirmed through the use of a variety of DNA and protein seeds, including short sequences derived from SAGE and proteome projects.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The study of pharmacokinetic properties (PK) is of great importance in drug discovery and development. In the present work, PK/DB (a new freely available database for PK) was designed with the aim of creating robust databases for pharmacokinetic studies and in silico absorption, distribution, metabolism and excretion (ADME) prediction. Comprehensive, web-based and easy to access, PK/DB manages 1203 compounds which represent 2973 pharmacokinetic measurements, including five models for in silico ADME prediction (human intestinal absorption, human oral bioavailability, plasma protein binding, bloodbrain barrier and water solubility).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We propose a likelihood ratio test ( LRT) with Bartlett correction in order to identify Granger causality between sets of time series gene expression data. The performance of the proposed test is compared to a previously published bootstrapbased approach. LRT is shown to be significantly faster and statistically powerful even within non- Normal distributions. An R package named gGranger containing an implementation for both Granger causality identification tests is also provided.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We describe AMIN (Amidase N-terminal domain), a novel protein domain found specifically in bacterial periplasmic proteins. AMIN domains are widely distributed among peptidoglycan hydrolases and transporter protein families. Based on experimental data, contextual information and phyletic profiles, we suggest that AMIN domains mediate the targeting of periplasmic or extracellular proteins to specific regions of the bacterial envelope.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The taxonomy of the N(2)-fixing bacteria belonging to the genus Bradyrhizobium is still poorly refined, mainly due to conflicting results obtained by the analysis of the phenotypic and genotypic properties. This paper presents an application of a method aiming at the identification of possible new clusters within a Brazilian collection of 119 Bradryrhizobium strains showing phenotypic characteristics of B. japonicum and B. elkanii. The stability was studied as a function of the number of restriction enzymes used in the RFLP-PCR analysis of three ribosomal regions with three restriction enzymes per region. The method proposed here uses Clustering algorithms with distances calculated by average-linkage clustering. Introducing perturbations using sub-sampling techniques makes the stability analysis. The method showed efficacy in the grouping of the species B. japonicum and B. elkanii. Furthermore, two new clusters were clearly defined, indicating possible new species, and sub-clusters within each detected cluster. (C) 2008 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the analysis of stability of a variant of the Crank-Nicolson (C-N) method for the heat equation on a staggered grid a class of non-symmetric matrices appear that have an interesting property: their eigenvalues are all real and lie within the unit circle. In this note we shall show how this class of matrices is derived from the C-N method and prove that their eigenvalues are inside [-1, 1] for all values of m (the order of the matrix) and all values of a positive parameter a, the stability parameter sigma. As the order of the matrix is general, and the parameter sigma lies on the positive real line this class of matrices turns out to be quite general and could be of interest as a test set for eigenvalue solvers, especially as examples of very large matrices. (C) 2010 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dynamic Time Warping (DTW), a pattern matching technique traditionally used for restricted vocabulary speech recognition, is based on a temporal alignment of the input signal with the template models. The principal drawback of DTW is its high computational cost as the lengths of the signals increase. This paper shows extended results over our previously published conference paper, which introduces an optimized version of the DTW I hat is based on the Discrete Wavelet Transform (DWT). (C) 2008 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we present a new wavelet-based algorithm for low-cost computation of the cepstrum. It can be used for real time precise pitch determination in automatic speech and speaker recognition systems. Many wavelet families are examined to determine the one that works best. The results confirm the efficacy and accuracy of the proposed technique for pitch extraction. (C) 2008 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We study the following problem. Given two sequences x and y over a finite alphabet, find a repetition-free longest common subsequence of x and y. We show several algorithmic results, a computational complexity result, and we describe a preliminary experimental study based on the proposed algorithms. We also show that this problem is APX-hard. (C) 2009 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

For a fixed family F of graphs, an F-packing in a graph G is a set of pairwise vertex-disjoint subgraphs of G, each isomorphic to an element of F. Finding an F-packing that maximizes the number of covered edges is a natural generalization of the maximum matching problem, which is just F = {K(2)}. In this paper we provide new approximation algorithms and hardness results for the K(r)-packing problem where K(r) = {K(2), K(3,) . . . , K(r)}. We show that already for r = 3 the K(r)-packing problem is APX-complete, and, in fact, we show that it remains so even for graphs with maximum degree 4. On the positive side, we give an approximation algorithm with approximation ratio at most 2 for every fixed r. For r = 3, 4, 5 we obtain better approximations. For r = 3 we obtain a simple 3/2-approximation, achieving a known ratio that follows from a more involved algorithm of Halldorsson. For r = 4, we obtain a (3/2 + epsilon)-approximation, and for r = 5 we obtain a (25/14 + epsilon)-approximation. (C) 2008 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Let M = (V, E, A) be a mixed graph with vertex set V, edge set E and arc set A. A cycle cover of M is a family C = {C(1), ... , C(k)} of cycles of M such that each edge/arc of M belongs to at least one cycle in C. The weight of C is Sigma(k)(i=1) vertical bar C(i)vertical bar. The minimum cycle cover problem is the following: given a strongly connected mixed graph M without bridges, find a cycle cover of M with weight as small as possible. The Chinese postman problem is: given a strongly connected mixed graph M, find a minimum length closed walk using all edges and arcs of M. These problems are NP-hard. We show that they can be solved in polynomial time if M has bounded tree-width. (C) 2008 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Given two strings A and B of lengths n(a) and n(b), n(a) <= n(b), respectively, the all-substrings longest common subsequence (ALCS) problem obtains, for every substring B` of B, the length of the longest string that is a subsequence of both A and B. The ALCS problem has many applications, such as finding approximate tandem repeats in strings, solving the circular alignment of two strings and finding the alignment of one string with several others that have a common substring. We present an algorithm to prepare the basic data structure for ALCS queries that takes O(n(a)n(b)) time and O(n(a) + n(b)) space. After this preparation, it is possible to build that allows any LCS length to be retrieved in constant time. Some trade-offs between the space required and a matrix of size O(n(b)(2)) the querying time are discussed. To our knowledge, this is the first algorithm in the literature for the ALCS problem. (C) 2007 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Since the computer viruses pose a serious problem to individual and corporative computer systems, a lot of effort has been dedicated to study how to avoid their deleterious actions, trying to create anti-virus programs acting as vaccines in personal computers or in strategic network nodes. Another way to combat viruses propagation is to establish preventive policies based on the whole operation of a system that can be modeled with population models, similar to those that are used in epidemiological studies. Here, a modified version of the SIR (Susceptible-Infected-Removed) model is presented and how its parameters are related to network characteristics is explained. Then, disease-free and endemic equilibrium points are calculated, stability and bifurcation conditions are derived and some numerical simulations are shown. The relations among the model parameters in the several bifurcation conditions allow a network design minimizing viruses risks. (C) 2009 Elsevier Inc. All rights reserved.