13 resultados para Probabilistic latent semantic model
em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo
Resumo:
We propose a new general Bayesian latent class model for evaluation of the performance of multiple diagnostic tests in situations in which no gold standard test exists based on a computationally intensive approach. The modeling represents an interesting and suitable alternative to models with complex structures that involve the general case of several conditionally independent diagnostic tests, covariates, and strata with different disease prevalences. The technique of stratifying the population according to different disease prevalence rates does not add further marked complexity to the modeling, but it makes the model more flexible and interpretable. To illustrate the general model proposed, we evaluate the performance of six diagnostic screening tests for Chagas disease considering some epidemiological variables. Serology at the time of donation (negative, positive, inconclusive) was considered as a factor of stratification in the model. The general model with stratification of the population performed better in comparison with its concurrents without stratification. The group formed by the testing laboratory Biomanguinhos FIOCRUZ-kit (c-ELISA and rec-ELISA) is the best option in the confirmation process by presenting false-negative rate of 0.0002% from the serial scheme. We are 100% sure that the donor is healthy when these two tests have negative results and he is chagasic when they have positive results.
Resumo:
OBJECTIVE: The frequent occurrence of inconclusive serology in blood banks and the absence of a gold standard test for Chagas'disease led us to examine the efficacy of the blood culture test and five commercial tests (ELISA, IIF, HAI, c-ELISA, rec-ELISA) used in screening blood donors for Chagas disease, as well as to investigate the prevalence of Trypanosoma cruzi infection among donors with inconclusive serology screening in respect to some epidemiological variables. METHODS: To obtain estimates of interest we considered a Bayesian latent class model with inclusion of covariates from the logit link. RESULTS: A better performance was observed with some categories of epidemiological variables. In addition, all pairs of tests (excluding the blood culture test) presented as good alternatives for both screening (sensitivity > 99.96% in parallel testing) and for confirmation (specificity > 99.93% in serial testing) of Chagas disease. The prevalence of 13.30% observed in the stratum of donors with inconclusive serology, means that probably most of these are non-reactive serology. In addition, depending on the level of specific epidemiological variables, the absence of infection can be predicted with a probability of 100% in this group from the pairs of tests using parallel testing. CONCLUSION: The epidemiological variables can lead to improved test results and thus assist in the clarification of inconclusive serology screening results. Moreover, all combinations of pairs using the five commercial tests are good alternatives to confirm results.
Resumo:
In this paper, a new family of survival distributions is presented. It is derived by considering that the latent number of failure causes follows a Poisson distribution and the time for these causes to be activated follows an exponential distribution. Three different activation schemes are also considered. Moreover, we propose the inclusion of covariates in the model formulation in order to study their effect on the expected value of the number of causes and on the failure rate function. Inferential procedure based on the maximum likelihood method is discussed and evaluated via simulation. The developed methodology is illustrated on a real data set on ovarian cancer.
Resumo:
Item response theory (IRT) comprises a set of statistical models which are useful in many fields, especially when there is an interest in studying latent variables (or latent traits). Usually such latent traits are assumed to be random variables and a convenient distribution is assigned to them. A very common choice for such a distribution has been the standard normal. Recently, Azevedo et al. [Bayesian inference for a skew-normal IRT model under the centred parameterization, Comput. Stat. Data Anal. 55 (2011), pp. 353-365] proposed a skew-normal distribution under the centred parameterization (SNCP) as had been studied in [R. B. Arellano-Valle and A. Azzalini, The centred parametrization for the multivariate skew-normal distribution, J. Multivariate Anal. 99(7) (2008), pp. 1362-1382], to model the latent trait distribution. This approach allows one to represent any asymmetric behaviour concerning the latent trait distribution. Also, they developed a Metropolis-Hastings within the Gibbs sampling (MHWGS) algorithm based on the density of the SNCP. They showed that the algorithm recovers all parameters properly. Their results indicated that, in the presence of asymmetry, the proposed model and the estimation algorithm perform better than the usual model and estimation methods. Our main goal in this paper is to propose another type of MHWGS algorithm based on a stochastic representation (hierarchical structure) of the SNCP studied in [N. Henze, A probabilistic representation of the skew-normal distribution, Scand. J. Statist. 13 (1986), pp. 271-275]. Our algorithm has only one Metropolis-Hastings step, in opposition to the algorithm developed by Azevedo et al., which has two such steps. This not only makes the implementation easier but also reduces the number of proposal densities to be used, which can be a problem in the implementation of MHWGS algorithms, as can be seen in [R.J. Patz and B.W. Junker, A straightforward approach to Markov Chain Monte Carlo methods for item response models, J. Educ. Behav. Stat. 24(2) (1999), pp. 146-178; R. J. Patz and B. W. Junker, The applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses, J. Educ. Behav. Stat. 24(4) (1999), pp. 342-366; A. Gelman, G.O. Roberts, and W.R. Gilks, Efficient Metropolis jumping rules, Bayesian Stat. 5 (1996), pp. 599-607]. Moreover, we consider a modified beta prior (which generalizes the one considered in [3]) and a Jeffreys prior for the asymmetry parameter. Furthermore, we study the sensitivity of such priors as well as the use of different kernel densities for this parameter. Finally, we assess the impact of the number of examinees, number of items and the asymmetry level on the parameter recovery. Results of the simulation study indicated that our approach performed equally as well as that in [3], in terms of parameter recovery, mainly using the Jeffreys prior. Also, they indicated that the asymmetry level has the highest impact on parameter recovery, even though it is relatively small. A real data analysis is considered jointly with the development of model fitting assessment tools. The results are compared with the ones obtained by Azevedo et al. The results indicate that using the hierarchical approach allows us to implement MCMC algorithms more easily, it facilitates diagnosis of the convergence and also it can be very useful to fit more complex skew IRT models.
Resumo:
This paper addresses the numerical solution of random crack propagation problems using the coupling boundary element method (BEM) and reliability algorithms. Crack propagation phenomenon is efficiently modelled using BEM, due to its mesh reduction features. The BEM model is based on the dual BEM formulation, in which singular and hyper-singular integral equations are adopted to construct the system of algebraic equations. Two reliability algorithms are coupled with BEM model. The first is the well known response surface method, in which local, adaptive polynomial approximations of the mechanical response are constructed in search of the design point. Different experiment designs and adaptive schemes are considered. The alternative approach direct coupling, in which the limit state function remains implicit and its gradients are calculated directly from the numerical mechanical response, is also considered. The performance of both coupling methods is compared in application to some crack propagation problems. The investigation shows that direct coupling scheme converged for all problems studied, irrespective of the problem nonlinearity. The computational cost of direct coupling has shown to be a fraction of the cost of response surface solutions, regardless of experiment design or adaptive scheme considered. (C) 2012 Elsevier Ltd. All rights reserved.
Resumo:
Model diagnostics is an integral part of model determination and an important part of the model diagnostics is residual analysis. We adapt and implement residuals considered in the literature for the probit, logistic and skew-probit links under binary regression. New latent residuals for the skew-probit link are proposed here. We have detected the presence of outliers using the residuals proposed here for different models in a simulated dataset and a real medical dataset.
Resumo:
Scientists predict that global agricultural lands will expand over the next few decades due to increasing demands for food production and an exponential increase in crop-based biofuel production. These changes in land use will greatly impact biogeochemical and biogeophysical cycles across the globe. It is therefore important to develop models that can accurately simulate the interactions between the atmosphere and important crops. In this study, we develop and validate a new process-based sugarcane model (included as a module within the Agro-IBIS dynamic agro-ecosystem model) which can be applied at multiple spatial scales. At site level, the model systematically under/overestimated the daily sensible/latent heat flux (by -10.5% and 14.8%, H and E, respectively) when compared against the micrometeorological observations from southeast Brazil. The model underestimated ET (relative bias between -10.1% and 12.5%) when compared against an agro-meteorological field experiment from northeast Australia. At the regional level, the model accurately simulated average yield for the four largest mesoregions (clusters of municipalities) in the state of Sao Paulo, Brazil, over a period of 16 years, with a yield relative bias of -0.68% to 1.08%. Finally, the simulated annual average sugarcane yield over 31 years for the state of Louisiana (US) had a low relative bias (-2.67%), but exhibited a lower interannual variability than the observed yields.
Resumo:
We study a probabilistic model of interacting spins indexed by elements of a finite subset of the d-dimensional integer lattice, da parts per thousand yen1. Conditions of time reversibility are examined. It is shown that the model equilibrium distribution converges to a limit distribution as the indexing set expands to the whole lattice. The occupied site percolation problem is solved for the limit distribution. Two models with similar dynamics are also discussed.
Resumo:
Fraud is a global problem that has required more attention due to an accentuated expansion of modern technology and communication. When statistical techniques are used to detect fraud, whether a fraud detection model is accurate enough in order to provide correct classification of the case as a fraudulent or legitimate is a critical factor. In this context, the concept of bootstrap aggregating (bagging) arises. The basic idea is to generate multiple classifiers by obtaining the predicted values from the adjusted models to several replicated datasets and then combining them into a single predictive classification in order to improve the classification accuracy. In this paper, for the first time, we aim to present a pioneer study of the performance of the discrete and continuous k-dependence probabilistic networks within the context of bagging predictors classification. Via a large simulation study and various real datasets, we discovered that the probabilistic networks are a strong modeling option with high predictive capacity and with a high increment using the bagging procedure when compared to traditional techniques. (C) 2012 Elsevier Ltd. All rights reserved.
Resumo:
In this article, we propose a new Bayesian flexible cure rate survival model, which generalises the stochastic model of Klebanov et al. [Klebanov LB, Rachev ST and Yakovlev AY. A stochastic-model of radiation carcinogenesis - latent time distributions and their properties. Math Biosci 1993; 113: 51-75], and has much in common with the destructive model formulated by Rodrigues et al. [Rodrigues J, de Castro M, Balakrishnan N and Cancho VG. Destructive weighted Poisson cure rate models. Technical Report, Universidade Federal de Sao Carlos, Sao Carlos-SP. Brazil, 2009 (accepted in Lifetime Data Analysis)]. In our approach, the accumulated number of lesions or altered cells follows a compound weighted Poisson distribution. This model is more flexible than the promotion time cure model in terms of dispersion. Moreover, it possesses an interesting and realistic interpretation of the biological mechanism of the occurrence of the event of interest as it includes a destructive process of tumour cells after an initial treatment or the capacity of an individual exposed to irradiation to repair altered cells that results in cancer induction. In other words, what is recorded is only the damaged portion of the original number of altered cells not eliminated by the treatment or repaired by the repair system of an individual. Markov Chain Monte Carlo (MCMC) methods are then used to develop Bayesian inference for the proposed model. Also, some discussions on the model selection and an illustration with a cutaneous melanoma data set analysed by Rodrigues et al. [Rodrigues J, de Castro M, Balakrishnan N and Cancho VG. Destructive weighted Poisson cure rate models. Technical Report, Universidade Federal de Sao Carlos, Sao Carlos-SP. Brazil, 2009 (accepted in Lifetime Data Analysis)] are presented.
A Robust Structural PGN Model for Control of Cell-Cycle Progression Stabilized by Negative Feedbacks
Resumo:
The cell division cycle comprises a sequence of phenomena controlled by a stable and robust genetic network. We applied a probabilistic genetic network (PGN) to construct a hypothetical model with a dynamical behavior displaying the degree of robustness typical of the biological cell cycle. The structure of our PGN model was inspired in well-established biological facts such as the existence of integrator subsystems, negative and positive feedback loops, and redundant signaling pathways. Our model represents genes interactions as stochastic processes and presents strong robustness in the presence of moderate noise and parameters fluctuations. A recently published deterministic yeast cell-cycle model does not perform as well as our PGN model, even upon moderate noise conditions. In addition, self stimulatory mechanisms can give our PGN model the possibility of having a pacemaker activity similar to the observed in the oscillatory embryonic cell cycle.
Resumo:
Semi-qualitative probabilistic networks (SQPNs) merge two important graphical model formalisms: Bayesian networks and qualitative probabilistic networks. They provade a very Complexity of inferences in polytree-shaped semi-qualitative probabilistic networks and qualitative probabilistic networks. They provide a very general modeling framework by allowing the combination of numeric and qualitative assessments over a discrete domain, and can be compactly encoded by exploiting the same factorization of joint probability distributions that are behind the bayesian networks. This paper explores the computational complexity of semi-qualitative probabilistic networks, and takes the polytree-shaped networks as its main target. We show that the inference problem is coNP-Complete for binary polytrees with multiple observed nodes. We also show that interferences can be performed in time linear in the number of nodes if there is a single observed node. Because our proof is construtive, we obtain an efficient linear time algorithm for SQPNs under such assumptions. To the best of our knowledge, this is the first exact polynominal-time algorithm for SQPn. Together these results provide a clear picture of the inferential complexity in polytree-shaped SQPNs.
Resumo:
Due to the growing interest in social networks, link prediction has received significant attention. Link prediction is mostly based on graph-based features, with some recent approaches focusing on domain semantics. We propose algorithms for link prediction that use a probabilistic ontology to enhance the analysis of the domain and the unavoidable uncertainty in the task (the ontology is specified in the probabilistic description logic crALC). The scalability of the approach is investigated, through a combination of semantic assumptions and graph-based features. We evaluate empirically our proposal, and compare it with standard solutions in the literature.