6 results for Probabilistic choice models

in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo


Relevance:

30.00%

Abstract:

Patterns of species interactions affect the dynamics of food webs. An important component of species interactions that is rarely considered with respect to food webs is the strengths of interactions, which may affect both structure and dynamics. In natural systems, these strengths are variable, and can be quantified as probability distributions. We examined how variation in strengths of interactions can be described hierarchically, and how this variation impacts the structure of species interactions in predator-prey networks, both of which are important components of ecological food webs. The stable isotope ratios of predator and prey species may be particularly useful for quantifying this variability, and we show how these data can be used to build probabilistic predator-prey networks. Moreover, the distribution of variation in strengths among interactions can be estimated from a limited number of observations. This distribution informs network structure, especially the key role of dietary specialization, which may be useful for predicting structural properties in systems that are difficult to observe. Finally, using three mammalian predator-prey networks (two African and one Canadian) quantified from stable isotope data, we show that exclusion of link-strength variability results in biased estimates of nestedness and modularity within food webs, whereas the inclusion of body size constraints only marginally increases the predictive accuracy of the isotope-based network. We find that modularity is the consequence of strong link-strengths in both African systems, while nestedness is not significantly present in any of the three predator-prey networks.

Relevance:

30.00%

Abstract:

Item response theory (IRT) comprises a set of statistical models which are useful in many fields, especially when there is an interest in studying latent variables (or latent traits). Usually such latent traits are assumed to be random variables and a convenient distribution is assigned to them. A very common choice for such a distribution has been the standard normal. Recently, Azevedo et al. [Bayesian inference for a skew-normal IRT model under the centred parameterization, Comput. Stat. Data Anal. 55 (2011), pp. 353-365] proposed a skew-normal distribution under the centred parameterization (SNCP), as studied in [R. B. Arellano-Valle and A. Azzalini, The centred parametrization for the multivariate skew-normal distribution, J. Multivariate Anal. 99(7) (2008), pp. 1362-1382], to model the latent trait distribution. This approach allows one to represent any asymmetric behaviour concerning the latent trait distribution. Also, they developed a Metropolis-Hastings within Gibbs sampling (MHWGS) algorithm based on the density of the SNCP. They showed that the algorithm recovers all parameters properly. Their results indicated that, in the presence of asymmetry, the proposed model and the estimation algorithm perform better than the usual model and estimation methods. Our main goal in this paper is to propose another type of MHWGS algorithm based on a stochastic representation (hierarchical structure) of the SNCP studied in [N. Henze, A probabilistic representation of the skew-normal distribution, Scand. J. Statist. 13 (1986), pp. 271-275]. Our algorithm has only one Metropolis-Hastings step, in contrast to the algorithm developed by Azevedo et al., which has two such steps. This not only makes the implementation easier but also reduces the number of proposal densities to be used, which can be a problem in the implementation of MHWGS algorithms, as can be seen in [R. J. Patz and B. W. Junker, A straightforward approach to Markov Chain Monte Carlo methods for item response models, J. Educ. Behav. Stat. 24(2) (1999), pp. 146-178; R. J. Patz and B. W. Junker, The applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses, J. Educ. Behav. Stat. 24(4) (1999), pp. 342-366; A. Gelman, G. O. Roberts, and W. R. Gilks, Efficient Metropolis jumping rules, Bayesian Stat. 5 (1996), pp. 599-607]. Moreover, we consider a modified beta prior (which generalizes the one considered in [3]) and a Jeffreys prior for the asymmetry parameter. Furthermore, we study the sensitivity of such priors as well as the use of different kernel densities for this parameter. Finally, we assess the impact of the number of examinees, number of items and the asymmetry level on the parameter recovery. Results of the simulation study indicated that our approach performed as well as that in [3], in terms of parameter recovery, mainly using the Jeffreys prior. Also, they indicated that the asymmetry level has the highest impact on parameter recovery, even though it is relatively small. A real data analysis is considered jointly with the development of model fitting assessment tools. The results are compared with the ones obtained by Azevedo et al. The results indicate that the hierarchical approach makes MCMC algorithms easier to implement, facilitates convergence diagnosis, and can be very useful for fitting more complex skew IRT models.
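The stochastic representation mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard form of Henze's representation, Z = δ|U0| + sqrt(1 − δ²)·U1 with U0 and U1 independent standard normals, and then centres and scales Z so that it has mean 0 and variance 1. The function names and the value δ = 0.9 are illustrative assumptions, not taken from the paper, and this is only the sampling step, not the authors' MHWGS algorithm.

```python
import math
import random

def sample_skew_normal(delta, rng):
    """Draw one skew-normal variate via Henze's representation (assumed form)."""
    u0 = abs(rng.gauss(0.0, 1.0))  # half-normal component
    u1 = rng.gauss(0.0, 1.0)       # independent standard normal component
    return delta * u0 + math.sqrt(1.0 - delta ** 2) * u1

def sample_centred(delta, rng):
    """Standardize the draw so it has mean 0 and variance 1."""
    mean = delta * math.sqrt(2.0 / math.pi)
    sd = math.sqrt(1.0 - 2.0 * delta ** 2 / math.pi)
    return (sample_skew_normal(delta, rng) - mean) / sd

rng = random.Random(0)  # fixed seed so the sketch is reproducible
draws = [sample_centred(0.9, rng) for _ in range(20000)]  # delta = 0.9 is hypothetical

sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / len(draws)
third_moment = sum((d - sample_mean) ** 3 for d in draws) / len(draws)
print(sample_mean, sample_var, third_moment)
```

With a positive δ the sample mean should be close to 0, the variance close to 1, and the third central moment positive, reflecting right skew in the simulated latent trait.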

Relevance:

30.00%

Abstract:

The choice of an appropriate family of linear models for the analysis of longitudinal data is often a matter of concern for practitioners. To attenuate such difficulties, we discuss some issues that emerge when analyzing this type of data via a practical example involving pretest-posttest longitudinal data. In particular, we consider log-normal linear mixed models (LNLMM), generalized linear mixed models (GLMM), and models based on generalized estimating equations (GEE). We show how some special features of the data, like a nonconstant coefficient of variation, may be handled in the three approaches and evaluate their performance with respect to the magnitude of standard errors of interpretable and comparable parameters. We also show how different diagnostic tools may be employed to identify outliers and comment on available software. We conclude by noting that the results are similar, but that GEE-based models may be preferable when the goal is to compare the marginal expected responses.

Relevance:

30.00%

Abstract:

Fraud is a global problem that has required more attention due to the accelerated expansion of modern technology and communication. When statistical techniques are used to detect fraud, a critical factor is whether the detection model is accurate enough to classify a case correctly as fraudulent or legitimate. In this context, the concept of bootstrap aggregating (bagging) arises. The basic idea is to generate multiple classifiers by obtaining the predicted values from models fitted to several replicated datasets and then combining them into a single predictive classification in order to improve the classification accuracy. In this paper, we present a pioneering study of the performance of the discrete and continuous k-dependence probabilistic networks within the context of bagging predictors classification. Via a large simulation study and various real datasets, we found that the probabilistic networks are a strong modeling option with high predictive capacity, and that they gain substantially from the bagging procedure when compared to traditional techniques. (C) 2012 Elsevier Ltd. All rights reserved.
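The bagging idea described in this abstract can be sketched in a few lines. The base learner here is a one-feature threshold classifier, swapped in purely for brevity; it is not the k-dependence probabilistic network studied in the paper, and the synthetic data, function names, and the choice of 25 models are all assumptions for illustration.

```python
import random
from collections import Counter

def fit_stump(data):
    """Fit a threshold classifier on (x, label) pairs: predict 1 when x > threshold."""
    best_threshold, best_errors = None, None
    for threshold, _ in data:
        errors = sum((x > threshold) != bool(y) for x, y in data)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def bagged_fit(data, n_models, rng):
    """Fit one stump per bootstrap replicate of the data."""
    models = []
    for _ in range(n_models):
        replicate = [rng.choice(data) for _ in data]  # sample with replacement
        models.append(fit_stump(replicate))
    return models

def bagged_predict(models, x):
    """Combine the individual predictions by majority vote."""
    votes = Counter(int(x > threshold) for threshold in models)
    return votes.most_common(1)[0][0]

rng = random.Random(42)
# Synthetic data: label 0 = legitimate (centred at 0), label 1 = fraudulent (centred at 3).
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(100)]
data += [(rng.gauss(3.0, 1.0), 1) for _ in range(100)]

models = bagged_fit(data, n_models=25, rng=rng)  # odd count avoids tied votes
print(bagged_predict(models, -2.0), bagged_predict(models, 5.0))
```

Each replicate yields a slightly different threshold, and the vote smooths out that variability, which is the mechanism bagging relies on to improve classification accuracy.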

Relevance:

30.00%

Abstract:

Background: A large number of probabilistic models used in sequence analysis assign non-zero probability values to most input sequences. The most common way to decide whether a given probability is sufficient is Bayesian binary classification, where the probability of the model characterizing the sequence family of interest is compared to that of an alternative probability model. A null model can serve as the alternative model. This is the scoring technique used by sequence analysis tools such as HMMER, SAM and INFERNAL. The most prevalent null models are position-independent residue distributions that include the uniform distribution, the genomic distribution, the family-specific distribution and the target sequence distribution. This paper presents a study to evaluate the impact of the choice of a null model on the final result of classifications. In particular, we are interested in minimizing the number of false predictions in a classification. This is a crucial issue for reducing the costs of biological validation.

Results: For all the tests, the target null model presented the lowest number of false positives when random sequences were used as a test. The study was performed on DNA sequences using GC content as the measure of content bias, but the results should also be valid for protein sequences. To broaden the application of the results, the study was performed using randomly generated sequences. Previous studies were performed on amino acid sequences, using only one probabilistic model (HMM) and a specific benchmark, and lack more general conclusions about the performance of null models. Finally, a benchmark test with P. falciparum confirmed these results.

Conclusions: Of the evaluated models, the best suited for classification are the uniform model and the target model. However, the use of the uniform model presents a GC bias that can cause more false positives for candidate sequences with extreme compositional bias, a characteristic not described in previous studies. In these cases the target model is more dependable for biological validation due to its higher specificity.
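The effect of the null model on scoring can be illustrated with a log-odds sketch. It assumes a position-independent family model rather than the HMMs used by the tools named above, and the family composition, the pseudocount, and the candidate sequence are all hypothetical values chosen for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def uniform_null(_sequence):
    """Uniform null model: every base has probability 0.25."""
    return {base: 0.25 for base in ALPHABET}

def target_null(sequence, pseudocount=1.0):
    """Target null model: base composition of the sequence being scored."""
    counts = Counter(sequence)
    total = len(sequence) + pseudocount * len(ALPHABET)
    return {base: (counts[base] + pseudocount) / total for base in ALPHABET}

def log_odds(sequence, model, null):
    """Sum of per-base log2 ratios between the family model and the null model."""
    return sum(math.log2(model[base] / null[base]) for base in sequence)

# Hypothetical GC-rich family model (position-independent for simplicity).
family_model = {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}
# Hypothetical candidate with extreme GC content and no other family signal.
candidate = "GCGCGGCCGCGCGGCC"

score_uniform = log_odds(candidate, family_model, uniform_null(candidate))
score_target = log_odds(candidate, family_model, target_null(candidate))
print(score_uniform, score_target)
```

Under the uniform null the candidate scores well above zero simply because it is GC-rich, whereas the target null absorbs the composition and the score drops below zero, which mirrors the GC bias described in the conclusions.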

Relevance:

30.00%

Abstract:

Structural durability is an important criterion that must be evaluated for every type of structure. For reinforced concrete members, the chloride diffusion process is widely used to evaluate durability, especially when these structures are built in aggressive atmospheres. Chloride ingress triggers the corrosion of the reinforcement; therefore, by modelling this phenomenon, both the corrosion process and the structural durability can be better evaluated. Corrosion begins when a threshold level of chloride concentration is reached at the reinforcing steel bars. Despite the robustness of several models proposed in the literature, deterministic approaches fail to predict the corrosion initiation time accurately due to the inherent randomness observed in this process. In this regard, structural durability can be more realistically represented using probabilistic approaches. This paper addresses the probabilistic analysis of corrosion initiation time in reinforced concrete structures exposed to chloride penetration. The chloride penetration is modelled using Fick's law of diffusion. This law simulates the chloride diffusion process considering time-dependent effects. The probability of failure is calculated using Monte Carlo simulation and the first order reliability method, with a direct coupling approach. Some examples are considered in order to study these phenomena. Moreover, a simplified method is proposed to determine optimal values for concrete cover.
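The Monte Carlo part of this approach can be sketched as follows. The sketch assumes the common closed-form solution of Fick's second law with a constant diffusion coefficient, C(x, t) = Cs·erfc(x / (2·sqrt(D·t))), which is a simplification of the time-dependent model the abstract refers to. All distributions and parameter values are hypothetical, and the first order reliability method is not shown.

```python
import math
import random

def chloride_concentration(depth_m, time_years, surface_conc, diffusion_m2_per_year):
    """Closed-form solution of Fick's second law with constant diffusion coefficient."""
    arg = depth_m / (2.0 * math.sqrt(diffusion_m2_per_year * time_years))
    return surface_conc * math.erfc(arg)

def probability_of_initiation(time_years, mean_cover_m, n_samples, rng):
    """Estimate P(chloride at the rebar >= threshold) by plain Monte Carlo sampling."""
    failures = 0
    for _ in range(n_samples):
        # All distributions below are hypothetical illustrations.
        cover = max(rng.gauss(mean_cover_m, 0.005), 0.001)      # concrete cover, m
        diffusion = rng.lognormvariate(math.log(3.0e-5), 0.3)   # m^2 per year
        surface = rng.lognormvariate(math.log(3.5), 0.2)        # kg/m^3
        threshold = rng.gauss(0.9, 0.1)                         # kg/m^3
        if chloride_concentration(cover, time_years, surface, diffusion) >= threshold:
            failures += 1
    return failures / n_samples

p_thin_50 = probability_of_initiation(50.0, 0.04, 5000, random.Random(1))
p_thick_50 = probability_of_initiation(50.0, 0.08, 5000, random.Random(1))
p_thin_10 = probability_of_initiation(10.0, 0.04, 5000, random.Random(1))
print(p_thin_50, p_thick_50, p_thin_10)
```

Comparing the estimates for two cover depths and two exposure times shows the trade-off that a cover optimization would weigh: a thicker cover or a shorter exposure lowers the estimated probability of corrosion initiation.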