69 resultados para inference problem

em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Chagas disease is still a major public health problem in Latin America. Its causative agent, Trypanosoma cruzi, can be typed into three major groups, T. cruzi I, T. cruzi II and hybrids. These groups each have specific genetic characteristics and epidemiological distributions. Several highly virulent strains are found in the hybrid group; their origin is still a matter of debate. The null hypothesis is that the hybrids are of polyphyletic origin, evolving independently from various hybridization events. The alternative hypothesis is that all extant hybrid strains originated from a single hybridization event. We sequenced both alleles of genes encoding EF-1 alpha, actin and SSU rDNA of 26 T. cruzi strains and DHFR-TS and TR of 12 strains. This information was used for network genealogy analysis and Bayesian phylogenies. We found T. cruzi I and T. cruzi II to be monophyletic and that all hybrids had different combinations of T. cruzi I and T. cruzi II haplotypes plus hybrid-specific haplotypes. Bootstrap values (networks) and posterior probabilities (Bayesian phylogenies) of clades supporting the monophyly of hybrids were far below the 95% confidence interval, indicating that the hybrid group is polyphyletic. We hypothesize that T. cruzi I and T. cruzi II are two different species and that the hybrids are extant representatives of independent events of genome hybridization, which sporadically have sufficient fitness to impact on the epidemiology of Chagas disease.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Hepatitis B is a worldwide health problem affecting about 2 billion people and more than 350 million are chronic carriers of the virus. Nine HBV genotypes (A to I) have been described. The geographical distribution of HBV genotypes is not completely understood due to the limited number of samples from some parts of the world. One such example is Colombia, in which few studies have described the HBV genotypes. In this study, we characterized HBV genotypes in 143 HBsAg-positive volunteer blood donors from Colombia. A fragment of 1306 bp partially comprising HBsAg and the DNA polymerase coding regions (S/POL) was amplified and sequenced. Bayesian phylogenetic analyses were conducted using the Markov Chain Monte Carlo (MCMC) approach to obtain the maximum clade credibility (MCC) tree using BEAST v.1.5.3. Of all samples, 68 were positive and 52 were successfully sequenced. Genotype F was the most prevalent in this population (77%) - subgenotypes F3 (75%) and Fib (2%). Genotype G (7.7%) and subgenotype A2 (15.3%) were also found. Genotype G sequence analysis suggests distinct introductions of this genotype in the country. Furthermore, we estimated the time of the most recent common ancestor (TMRCA) for each HBV/F subgenotype and also for Colombian F3 sequences using two different datasets: (i) 77 sequences comprising 1306 bp of S/POL region and (ii) 283 sequences comprising 681 bp of S/POL region. We also used two other previously estimated evolutionary rates: (i) 2.60 x 10(-4) s/s/y and (ii) 1.5 x 10(-5) s/s/y. Here we report the HBV genotypes circulating in Colombia and estimated the TMRCA for the four different subgenotypes of genotype F. (C) 2010 Elsevier B.V. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

For many learning tasks the duration of the data collection can be greater than the time scale for changes of the underlying data distribution. The question we ask is how to include the information that data are aging. Ad hoc methods to achieve this include the use of validity windows that prevent the learning machine from making inferences based on old data. This introduces the problem of how to define the size of validity windows. In this brief, a new adaptive Bayesian inspired algorithm is presented for learning drifting concepts. It uses the analogy of validity windows in an adaptive Bayesian way to incorporate changes in the data distribution over time. We apply a theoretical approach based on information geometry to the classification problem and measure its performance in simulations. The uncertainty about the appropriate size of the memory windows is dealt with in a Bayesian manner by integrating over the distribution of the adaptive window size. Thus, the posterior distribution of the weights may develop algebraic tails. The learning algorithm results from tracking the mean and variance of the posterior distribution of the weights. It was found that the algebraic tails of this posterior distribution give the learning algorithm the ability to cope with an evolving environment by permitting the escape from local traps.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this article, we introduce a semi-parametric Bayesian approach based on Dirichlet process priors for the discrete calibration problem in binomial regression models. An interesting topic is the dosimetry problem related to the dose-response model. A hierarchical formulation is provided so that a Markov chain Monte Carlo approach is developed. The methodology is applied to simulated and real data.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper addresses the capacitated lot sizing problem (CLSP) with a single stage composed of multiple plants, items and periods with setup carry-over among the periods. The CLSP is well studied and many heuristics have been proposed to solve it. Nevertheless, few researches explored the multi-plant capacitated lot sizing problem (MPCLSP), which means that few solution methods were proposed to solve it. Furthermore, to our knowledge, no study of the MPCLSP with setup carry-over was found in the literature. This paper presents a mathematical model and a GRASP (Greedy Randomized Adaptive Search Procedure) with path relinking to the MPCLSP with setup carry-over. This solution method is an extension and adaptation of a previously adopted methodology without the setup carry-over. Computational tests showed that the improvement of the setup carry-over is significant in terms of the solution value with a low increase in computational time.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this work we study the problem of modeling identification of a population employing a discrete dynamic model based on the Richards growth model. The population is subjected to interventions due to consumption, such as hunting or farming animals. The model identification allows us to estimate the probability or the average time for a population number to reach a certain level. The parameter inference for these models are obtained with the use of the likelihood profile technique as developed in this paper. The identification method here developed can be applied to evaluate the productivity of animal husbandry or to evaluate the risk of extinction of autochthon populations. It is applied to data of the Brazilian beef cattle herd population, and the the population number to reach a certain goal level is investigated.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Introduction: Work disability is a major consequence of rheumatoid arthritis (RA), associated not only with traditional disease activity variables, but also more significantly with demographic, functional, occupational, and societal variables. Recent reports suggest that the use of biologic agents offers potential for reduced work disability rates, but the conclusions are based on surrogate disease activity measures derived from studies primarily from Western countries. Methods: The Quantitative Standard Monitoring of Patients with RA (QUEST-RA) multinational database of 8,039 patients in 86 sites in 32 countries, 16 with high gross domestic product (GDP) (>24K US dollars (USD) per capita) and 16 low-GDP countries (<11K USD), was analyzed for work and disability status at onset and over the course of RA and clinical status of patients who continued working or had stopped working in high-GDP versus low-GDP countries according to all RA Core Data Set measures. Associations of work disability status with RA Core Data Set variables and indices were analyzed using descriptive statistics and regression analyses. Results: At the time of first symptoms, 86% of men (range 57%-100% among countries) and 64% (19%-87%) of women <65 years were working. More than one third (37%) of these patients reported subsequent work disability because of RA. Among 1,756 patients whose symptoms had begun during the 2000s, the probabilities of continuing to work were 80% (95% confidence interval (CI) 78%-82%) at 2 years and 68% (95% CI 65%-71%) at 5 years, with similar patterns in high-GDP and low-GDP countries. Patients who continued working versus stopped working had significantly better clinical status for all clinical status measures and patient self-report scores, with similar patterns in high-GDP and low-GDP countries. However, patients who had stopped working in high-GDP countries had better clinical status than patients who continued working in low-GDP countries. The most significant identifier of work disability in all subgroups was Health Assessment Questionnaire (HAQ) functional disability score. Conclusions: Work disability rates remain high among people with RA during this millennium. In low-GDP countries, people remain working with high levels of disability and disease activity. Cultural and economic differences between societies affect work disability as an outcome measure for RA.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Aims. An analytical solution for the discrepancy between observed core-like profiles and predicted cusp profiles in dark matter halos is studied. Methods. We calculate the distribution function for Navarro-Frenk-White halos and extract energy from the distribution, taking into account the effects of baryonic physics processes. Results. We show with a simple argument that we can reproduce the evolution of a cusp to a flat density profile by a decrease of the initial potential energy.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The energy spectrum of an electron confined in a quantum dot (QD) with a three-dimensional anisotropic parabolic potential in a tilted magnetic field was found analytically. The theory describes exactly the mixing of in-plane and out-of-plane motions of an electron caused by a tilted magnetic field, which could be seen, for example, in the level anticrossing. For charged QDs in a tilted magnetic field we predict three strong resonant lines in the far-infrared-absorption spectra.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Thanks to recent advances in molecular biology, allied to an ever increasing amount of experimental data, the functional state of thousands of genes can now be extracted simultaneously by using methods such as cDNA microarrays and RNA-Seq. Particularly important related investigations are the modeling and identification of gene regulatory networks from expression data sets. Such a knowledge is fundamental for many applications, such as disease treatment, therapeutic intervention strategies and drugs design, as well as for planning high-throughput new experiments. Methods have been developed for gene networks modeling and identification from expression profiles. However, an important open problem regards how to validate such approaches and its results. This work presents an objective approach for validation of gene network modeling and identification which comprises the following three main aspects: (1) Artificial Gene Networks (AGNs) model generation through theoretical models of complex networks, which is used to simulate temporal expression data; (2) a computational method for gene network identification from the simulated data, which is founded on a feature selection approach where a target gene is fixed and the expression profile is observed for all other genes in order to identify a relevant subset of predictors; and (3) validation of the identified AGN-based network through comparison with the original network. The proposed framework allows several types of AGNs to be generated and used in order to simulate temporal expression data. The results of the network identification method can then be compared to the original network in order to estimate its properties and accuracy. Some of the most important theoretical models of complex networks have been assessed: the uniformly-random Erdos-Renyi (ER), the small-world Watts-Strogatz (WS), the scale-free Barabasi-Albert (BA), and geographical networks (GG). The experimental results indicate that the inference method was sensitive to average degree k variation, decreasing its network recovery rate with the increase of k. The signal size was important for the inference method to get better accuracy in the network identification rate, presenting very good results with small expression profiles. However, the adopted inference method was not sensible to recognize distinct structures of interaction among genes, presenting a similar behavior when applied to different network topologies. In summary, the proposed framework, though simple, was adequate for the validation of the inferred networks by identifying some properties of the evaluated method, which can be extended to other inference methods.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background: The inference of gene regulatory networks (GRNs) from large-scale expression profiles is one of the most challenging problems of Systems Biology nowadays. Many techniques and models have been proposed for this task. However, it is not generally possible to recover the original topology with great accuracy, mainly due to the short time series data in face of the high complexity of the networks and the intrinsic noise of the expression measurements. In order to improve the accuracy of GRNs inference methods based on entropy (mutual information), a new criterion function is here proposed. Results: In this paper we introduce the use of generalized entropy proposed by Tsallis, for the inference of GRNs from time series expression profiles. The inference process is based on a feature selection approach and the conditional entropy is applied as criterion function. In order to assess the proposed methodology, the algorithm is applied to recover the network topology from temporal expressions generated by an artificial gene network (AGN) model as well as from the DREAM challenge. The adopted AGN is based on theoretical models of complex networks and its gene transference function is obtained from random drawing on the set of possible Boolean functions, thus creating its dynamics. On the other hand, DREAM time series data presents variation of network size and its topologies are based on real networks. The dynamics are generated by continuous differential equations with noise and perturbation. By adopting both data sources, it is possible to estimate the average quality of the inference with respect to different network topologies, transfer functions and network sizes. Conclusions: A remarkable improvement of accuracy was observed in the experimental results by reducing the number of false connections in the inferred topology by the non-Shannon entropy. The obtained best free parameter of the Tsallis entropy was on average in the range 2.5 <= q <= 3.5 (hence, subextensive entropy), which opens new perspectives for GRNs inference methods based on information theory and for investigation of the nonextensivity of such networks. The inference algorithm and criterion function proposed here were implemented and included in the DimReduction software, which is freely available at http://sourceforge.net/projects/dimreduction and http://code.google.com/p/dimreduction/.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background: Identifying local similarity between two or more sequences, or identifying repeats occurring at least twice in a sequence, is an essential part in the analysis of biological sequences and of their phylogenetic relationship. Finding such fragments while allowing for a certain number of insertions, deletions, and substitutions, is however known to be a computationally expensive task, and consequently exact methods can usually not be applied in practice. Results: The filter TUIUIU that we introduce in this paper provides a possible solution to this problem. It can be used as a preprocessing step to any multiple alignment or repeats inference method, eliminating a possibly large fraction of the input that is guaranteed not to contain any approximate repeat. It consists in the verification of several strong necessary conditions that can be checked in a fast way. We implemented three versions of the filter. The first is simply a straightforward extension to the case of multiple sequences of an application of conditions already existing in the literature. The second uses a stronger condition which, as our results show, enable to filter sensibly more with negligible (if any) additional time. The third version uses an additional condition and pushes the sensibility of the filter even further with a non negligible additional time in many circumstances; our experiments show that it is particularly useful with large error rates. The latter version was applied as a preprocessing of a multiple alignment tool, obtaining an overall time (filter plus alignment) on average 63 and at best 530 times smaller than before (direct alignment), with in most cases a better quality alignment. Conclusion: To the best of our knowledge, TUIUIU is the first filter designed for multiple repeats and for dealing with error rates greater than 10% of the repeats length.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This article presents maximum likelihood estimators (MLEs) and log-likelihood ratio (LLR) tests for the eigenvalues and eigenvectors of Gaussian random symmetric matrices of arbitrary dimension, where the observations are independent repeated samples from one or two populations. These inference problems are relevant in the analysis of diffusion tensor imaging data and polarized cosmic background radiation data, where the observations are, respectively, 3 x 3 and 2 x 2 symmetric positive definite matrices. The parameter sets involved in the inference problems for eigenvalues and eigenvectors are subsets of Euclidean space that are either affine subspaces, embedded submanifolds that are invariant under orthogonal transformations or polyhedral convex cones. We show that for a class of sets that includes the ones considered in this paper, the MLEs of the mean parameter do not depend on the covariance parameters if and only if the covariance structure is orthogonally invariant. Closed-form expressions for the MLEs and the associated LLRs are derived for this covariance structure.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Efficient automatic protein classification is of central importance in genomic annotation. As an independent way to check the reliability of the classification, we propose a statistical approach to test if two sets of protein domain sequences coming from two families of the Pfam database are significantly different. We model protein sequences as realizations of Variable Length Markov Chains (VLMC) and we use the context trees as a signature of each protein family. Our approach is based on a Kolmogorov-Smirnov-type goodness-of-fit test proposed by Balding et at. [Limit theorems for sequences of random trees (2008), DOI: 10.1007/s11749-008-0092-z]. The test statistic is a supremum over the space of trees of a function of the two samples; its computation grows, in principle, exponentially fast with the maximal number of nodes of the potential trees. We show how to transform this problem into a max-flow over a related graph which can be solved using a Ford-Fulkerson algorithm in polynomial time on that number. We apply the test to 10 randomly chosen protein domain families from the seed of Pfam-A database (high quality, manually curated families). The test shows that the distributions of context trees coming from different families are significantly different. We emphasize that this is a novel mathematical approach to validate the automatic clustering of sequences in any context. We also study the performance of the test via simulations on Galton-Watson related processes.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The width of a closed convex subset of n-dimensional Euclidean space is the distance between two parallel supporting hyperplanes. The Blaschke-Lebesgue problem consists of minimizing the volume in the class of convex sets of fixed constant width and is still open in dimension n >= 3. In this paper we describe a necessary condition that the minimizer of the Blaschke-Lebesgue must satisfy in dimension n = 3: we prove that the smooth components of the boundary of the minimizer have their smaller principal curvature constant and therefore are either spherical caps or pieces of tubes (canal surfaces).