998 results for Dirichlet distribution


Relevance:

100.00%

Publisher:

Abstract:

The algebraic-geometric structure of the simplex, known as Aitchison geometry, is used to look at the Dirichlet family of distributions from a new perspective. A classical Dirichlet density function is expressed with respect to the Lebesgue measure on real space. We propose here to replace this measure with the Aitchison measure on the simplex, and we study some properties and characteristic measures of the resulting density.

Relevance:

100.00%

Publisher:

Abstract:

Many phase II clinical studies in oncology use a two-stage frequentist design such as Simon's optimal design. However, they share a common logistical problem regarding patient accrual at the interim analysis. Strictly speaking, patient accrual at the end of the first stage may have to be suspended until all patients have had an event, success or failure. For example, when the study endpoint is six-month progression-free survival, patient accrual has to be stopped until all outcomes from stage I are observed. However, study investigators may be concerned about suspending accrual after the first stage because of the loss of accrual momentum during this hiatus. We propose a two-stage phase II design that resolves the patient accrual problem caused by an interim analysis and can be used as an alternative to frequentist two-stage phase II designs in oncology.

Relevance:

100.00%

Publisher:

Abstract:

The Dirichlet distribution is a multivariate generalization of the Beta distribution. It is an important multivariate continuous distribution in probability and statistics. In this report, we review the Dirichlet distribution and study its properties, including statistical and information-theoretic quantities involving this distribution. Also, relationships between the Dirichlet distribution and other distributions are discussed. There are some different ways to think about generating random variables with a Dirichlet distribution. The stick-breaking approach and the Pólya urn method are discussed. In Bayesian statistics, the Dirichlet distribution and the generalized Dirichlet distribution can both be a conjugate prior for the Multinomial distribution. The Dirichlet distribution has many applications in different fields. We focus on the unsupervised learning of a finite mixture model based on the Dirichlet distribution. The Initialization Algorithm and Dirichlet Mixture Estimation Algorithm are both reviewed for estimating the parameters of a Dirichlet mixture. Three experimental results are shown for the estimation of artificial histograms, summarization of image databases and human skin detection.
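The stick-breaking construction mentioned in this abstract can be sketched as follows. This is a minimal illustration rather than the report's own algorithm: the function name `dirichlet_stick_breaking` and the example parameters are assumptions, and only the Python standard library is used. Each step breaks off a Beta-distributed fraction of the remaining stick, with the second Beta parameter equal to the sum of the Dirichlet parameters not yet used.

```python
import random


def dirichlet_stick_breaking(alpha, rng=random):
    """Draw one sample from Dirichlet(alpha) by finite stick-breaking.

    Hypothetical helper for illustration; alpha is a list of positive floats.
    """
    remaining = 1.0  # length of the stick that has not been assigned yet
    sample = []
    for i, a in enumerate(alpha[:-1]):
        tail = sum(alpha[i + 1:])  # total weight of the components still to come
        v = rng.betavariate(a, tail)  # fraction of the remaining stick to break off
        sample.append(remaining * v)
        remaining *= 1.0 - v
    sample.append(remaining)  # the last component takes what is left
    return sample


if __name__ == "__main__":
    random.seed(0)
    print(dirichlet_stick_breaking([2.0, 3.0, 5.0]))
```

Averaged over many draws, the components should approach `alpha[i] / sum(alpha)`, which is a quick sanity check on the construction.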

Relevance:

80.00%

Publisher:

Abstract:

The Dirichlet family owes its privileged status within simplex distributions to its ease of interpretation and good mathematical properties. In particular, we recall fundamental properties for the analysis of compositional data such as closure under amalgamation and subcomposition. From a probabilistic point of view, it is characterised (uniquely) by a variety of independence relationships, which make it indisputably the reference model for expressing the nontrivial idea of substantial independence for compositions. Indeed, its well-known inadequacy as a general model for compositional data stems from this independence structure together with the poverty of its parametrisation. In this paper a new class of distributions (called Flexible Dirichlet), capable of handling various dependence structures and containing the Dirichlet as a special case, is presented. The new model exhibits a considerably richer parametrisation which, for example, allows the means and (part of) the variance-covariance matrix to be modelled separately. Moreover, such a model preserves some good mathematical properties of the Dirichlet, i.e. closure under amalgamation and subcomposition, with new parameters simply related to the parent composition parameters. Furthermore, the joint and conditional distributions of subcompositions and relative totals can be expressed as simple mixtures of two Flexible Dirichlet distributions. The basis generating the Flexible Dirichlet, though keeping compositional invariance, shows a dependence structure which allows various forms of partitional dependence to be contemplated by the model (e.g. non-neutrality, subcompositional dependence and subcompositional non-invariance), independence cases being identified by suitable parameter configurations. In particular, within this model substantial independence among subsets of components of the composition naturally occurs when the subsets have a Dirichlet distribution.

Relevance:

80.00%

Publisher:

Abstract:

Latent class analysis (LCA) and latent class regression (LCR) are widely used for modeling multivariate categorical outcomes in social sciences and biomedical studies. Standard analyses assume data from different respondents to be mutually independent, which excludes application of the methods to familial and other designs in which participants are clustered. In this paper, we develop a multilevel latent class model, in which subpopulation mixing probabilities are treated as random effects that vary among clusters according to a common Dirichlet distribution. We apply the Expectation-Maximization (EM) algorithm for model fitting by maximum likelihood (ML). This approach works well, but is computationally intensive when either the number of classes or the cluster size is large. We propose a maximum pairwise likelihood (MPL) approach via a modified EM algorithm for this case. We also show that a simple latent class analysis, combined with robust standard errors, provides another consistent, robust, but less efficient inferential procedure. Simulation studies suggest that the three methods work well in finite samples, and that the MPL estimates often enjoy precision comparable to that of the ML estimates. We apply our methods to the analysis of comorbid symptoms in the Obsessive Compulsive Disorder study. Our models' random effects structure has a more straightforward interpretation than those of competing methods, and thus should usefully augment the tools available for latent class analysis of multilevel data.

Relevance:

60.00%

Publisher:

Abstract:

A comment on the article “Local sensitivity analysis for compositional data with application to soil texture in hydrologic modelling” written by L. Loosvelt and co-authors. The present comment centers on three specific points. The first relates to the fact that the authors avoid the use of ilr-coordinates. The second refers to some generalization of sensitivity analysis when input parameters are compositional. The third tries to show that the role of the Dirichlet distribution in the sensitivity analysis is irrelevant.

Relevance:

60.00%

Publisher:

Abstract:

We describe and evaluate a new estimator of the effective population size (N-e), a critical parameter in evolutionary and conservation biology. This new "SummStat" N-e estimator is based upon the use of summary statistics in an approximate Bayesian computation framework to infer N-e. Simulations of a Wright-Fisher population with known N-e show that the SummStat estimator is useful across a realistic range of individuals and loci sampled, generations between samples, and N-e values. We also address the paucity of information about the relative performance of N-e estimators by comparing the SummStat estimator to two recently developed likelihood-based estimators and a traditional moment-based estimator. The SummStat estimator is the least biased of the four estimators compared. In 32 of 36 parameter combinations investigated using initial allele frequencies drawn from a Dirichlet distribution, it has the lowest bias. The relative mean square error (RMSE) of the SummStat estimator was generally intermediate to the others. All of the estimators had RMSE > 1 when small samples (n = 20, five loci) were collected a generation apart. In contrast, when samples were separated by three or more generations and N-e was less than or equal to 50, the SummStat and likelihood-based estimators all had greatly reduced RMSE. Under the conditions simulated, SummStat confidence intervals were more conservative than those of the likelihood-based estimators and more likely to include the true N-e. The greatest strength of the SummStat estimator is its flexible structure. This flexibility allows it to incorporate any potentially informative summary statistic from population genetic data.

Relevance:

60.00%

Publisher:

Abstract:

The application of forecast ensembles to probabilistic weather prediction has spurred considerable interest in their evaluation. Such ensembles are commonly interpreted as Monte Carlo ensembles meaning that the ensemble members are perceived as random draws from a distribution. Under this interpretation, a reasonable property to ask for is statistical consistency, which demands that the ensemble members and the verification behave like draws from the same distribution. A widely used technique to assess statistical consistency of a historical dataset is the rank histogram, which uses as a criterion the number of times that the verification falls between pairs of members of the ordered ensemble. Ensemble evaluation is rendered more specific by stratification, which means that ensembles that satisfy a certain condition (e.g., a certain meteorological regime) are evaluated separately. Fundamental relationships between Monte Carlo ensembles, their rank histograms, and random sampling from the probability simplex according to the Dirichlet distribution are pointed out. Furthermore, the possible benefits and complications of ensemble stratification are discussed. The main conclusion is that a stratified Monte Carlo ensemble might appear inconsistent with the verification even though the original (unstratified) ensemble is consistent. The apparent inconsistency is merely a result of stratification. Stratified rank histograms are thus not necessarily flat. This result is demonstrated by perfect ensemble simulations and supplemented by mathematical arguments. Possible methods to avoid or remove artifacts that stratification induces in the rank histogram are suggested.
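The rank histogram described in this abstract can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not the paper's experiment: it uses a standard normal distribution for both the ensemble members and the verification (a "perfect" unstratified ensemble), and the function name `rank_histogram` is hypothetical. For such an ensemble the verification is equally likely to fall in any of the `n_members + 1` rank bins, so the histogram should be roughly flat.

```python
import random


def rank_histogram(n_cases, n_members, seed=0):
    """Count how often the verification falls at each rank of the ordered ensemble.

    Members and verification are drawn from the same N(0, 1) distribution,
    which is an illustrative assumption for a statistically consistent ensemble.
    """
    rng = random.Random(seed)
    counts = [0] * (n_members + 1)
    for _ in range(n_cases):
        members = [rng.gauss(0.0, 1.0) for _ in range(n_members)]
        verification = rng.gauss(0.0, 1.0)
        # Rank = number of ensemble members lying below the verification.
        rank = sum(1 for m in members if m < verification)
        counts[rank] += 1
    return counts


if __name__ == "__main__":
    print(rank_histogram(n_cases=20000, n_members=4))
```

Stratifying the cases on a condition that depends on the ensemble itself would be one way to explore the non-flat histograms the abstract warns about; that step is not shown here.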

Relevance:

60.00%

Publisher:

Abstract:

In this paper, we develop a method, termed the Interaction Distribution (ID) method, for analysis of quantitative ecological network data. In many cases, quantitative network data sets are under-sampled, i.e. many interactions are poorly sampled or remain unobserved. Hence, the output of statistical analyses may fail to differentiate between patterns that are statistical artefacts and those which are real characteristics of ecological networks. The ID method can support assessment and inference of under-sampled ecological network data. In the current paper, we illustrate and discuss the ID method based on the properties of plant-animal pollination data sets of flower visitation frequencies. However, the ID method may be applied to other types of ecological networks. The method can supplement existing network analyses based on two definitions of the underlying probabilities for each combination of pollinator and plant species: (1) $p_{i,j}$, the probability for a visit made by the i’th pollinator species to take place on the j’th plant species; (2) $q_{i,j}$, the probability for a visit received by the j’th plant species to be made by the i’th pollinator. The method applies the Dirichlet distribution to estimate these two probabilities, based on a given empirical data set. The estimated mean values for $p_{i,j}$ and $q_{i,j}$ reflect the relative differences between recorded numbers of visits for different pollinator and plant species, and the estimated uncertainty of $p_{i,j}$ and $q_{i,j}$ decreases with higher numbers of recorded visits.
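The way a Dirichlet distribution turns visit counts into estimated probabilities with uncertainty can be sketched as below. The abstract does not state which prior the ID method uses, so the symmetric prior weight `prior=1.0`, the function name, and the example counts are all assumptions made for illustration. With counts c_j and prior weight a, the posterior is Dirichlet(c_j + a), whose marginal means and standard deviations have closed forms.

```python
def dirichlet_posterior_summary(counts, prior=1.0):
    """Posterior mean and standard deviation of each category probability.

    Assumes a symmetric Dirichlet prior with weight `prior` per category
    (an illustrative choice, not taken from the paper).
    """
    alpha = [c + prior for c in counts]  # posterior Dirichlet parameters
    total = sum(alpha)
    means = [a / total for a in alpha]
    # Marginal variance of a Dirichlet component: m * (1 - m) / (total + 1).
    sds = [(m * (1.0 - m) / (total + 1.0)) ** 0.5 for m in means]
    return means, sds


if __name__ == "__main__":
    # Hypothetical visits by one pollinator species to three plant species.
    few_visits = dirichlet_posterior_summary([3, 1, 0])
    many_visits = dirichlet_posterior_summary([30, 10, 0])
    print(few_visits)
    print(many_visits)
```

Comparing the two calls shows the behaviour the abstract describes: the means track the relative visit counts, and the standard deviations shrink as more visits are recorded.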

Relevance:

60.00%

Publisher:

Abstract:

This project explores the main methods used to develop systems that recommend items to users, and applies the content-based recommendation method in a prototype system whose purpose is to recommend books to users. This paper presents the most popular methods for creating systems capable of providing items (products) according to user preferences, such as collaborative filtering and content-based filtering. It also points out different techniques that can be applied to calculate the similarity between two entities, whether items or users, such as Pearson’s method, the cosine of two vectors and, more recently, a proposal to use a Bayesian system under a Dirichlet distribution. In addition, this work goes through various points in the design of an online application, or website, dealing not only with algorithm-oriented issues but also with the choice of development tools and techniques to improve the user’s experience. The tools used for the development of the page are listed, and web design is also discussed in order to emphasize the importance of the layout of the application. At the end, some examples of recommender systems are presented for curiosity, learning and research purposes.

Relevance:

60.00%

Publisher:

Abstract:

Archaeozoological mortality profiles have been used to infer site-specific subsistence strategies. There is however no common agreement on the best way to present these profiles and confidence intervals around age class proportions. In order to deal with these issues, we propose the use of the Dirichlet distribution and present a new approach to perform age-at-death multivariate graphical comparisons. We demonstrate the efficiency of this approach using domestic sheep/goat dental remains from 10 Cardial sites (Early Neolithic) located in South France and the Iberian Peninsula. We show that the Dirichlet distribution in age-at-death analysis can be used: (i) to generate Bayesian credible intervals around each age class of a mortality profile, even when not all age classes are observed; and (ii) to create 95% kernel density contours around each age-at-death frequency distribution when multiple sites are compared using correspondence analysis. The statistical procedure we present is applicable to the analysis of any categorical count data and particularly well-suited to archaeological data (e.g. potsherds, arrow heads) where sample sizes are typically small.
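Point (i) of this abstract, credible intervals for every age class even when some classes have zero observations, can be sketched with a small Monte Carlo routine. The uniform prior (`prior=1.0`), the function name, the number of draws and the example counts are assumptions for illustration; the paper's exact prior and procedure are not given in the abstract. Dirichlet draws are obtained by normalising independent Gamma variates, a standard construction.

```python
import random


def dirichlet_credible_intervals(counts, prior=1.0, draws=10000, level=0.95, seed=0):
    """Equal-tailed credible interval for each category proportion.

    Samples from the Dirichlet(counts + prior) posterior; the prior weight
    is an illustrative assumption.
    """
    rng = random.Random(seed)
    alpha = [c + prior for c in counts]
    samples = [[] for _ in alpha]
    for _ in range(draws):
        gammas = [rng.gammavariate(a, 1.0) for a in alpha]
        total = sum(gammas)
        for k, g in enumerate(gammas):
            samples[k].append(g / total)  # normalised Gammas are Dirichlet distributed
    lo_idx = int((1.0 - level) / 2.0 * draws)
    hi_idx = int((1.0 + level) / 2.0 * draws) - 1
    intervals = []
    for column in samples:
        column.sort()
        intervals.append((column[lo_idx], column[hi_idx]))
    return intervals


if __name__ == "__main__":
    # Hypothetical age-at-death counts; the third age class was never observed.
    for lo, hi in dirichlet_credible_intervals([12, 5, 0, 3]):
        print(f"{lo:.3f} - {hi:.3f}")
```

The unobserved class still receives a non-degenerate interval whose upper bound is above zero, which is the property the abstract highlights for small archaeological samples.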

Relevance:

30.00%

Publisher:

Abstract:

The main subject of this thesis is the distribution of prime numbers in arithmetic progressions, that is, primes of the form $qn+a$, with $a$ and $q$ fixed integers and $n=1,2,3,\dots$ The thesis also compares different arithmetic sequences with respect to their behaviour in arithmetic progressions. It is divided into four chapters and contains three articles. The first chapter is an invitation to analytic number theory, followed by a review of the tools that will be used later. This introduction also includes some research results, which we thought it worthwhile to include along the way. The second chapter contains the article \emph{Inequities in the Shanks-Rényi prime number race: an asymptotic formula for the densities}, which is the result of joint research with Professor Greg Martin. The goal of this article is to study a phenomenon called the <>, which is observed in <>. Chebyshev observed that there seem to be more primes of the form $4n+3$ than of the form $4n+1$. More generally, Rubinstein and Sarnak showed the existence of a quantity $\delta(q;a,b)$, which denotes the probability of having more primes of the form $qn+a$ than of the form $qn+b$. In this article we prove an asymptotic formula for $\delta(q;a,b)$ that can be of arbitrary order of precision (in terms of negative powers of $q$). We also present numerical results that support our formulas. The third chapter contains the article \emph{Residue classes containing an unexpected number of primes}. The goal is to fix an integer $a\neq 0$ and then study the distribution of primes of the form $qn+a$, on average over $q$. We show that the integer $a$ fixed at the outset has a strong influence on this distribution, and that there are in fact certain arithmetic progressions containing fewer primes than others. This phenomenon is rather surprising, given the prime number theorem for arithmetic progressions, which states that primes are equidistributed among the residue classes $\bmod q$. The fourth chapter contains the article \emph{The influence of the first term of an arithmetic progression}. In this article we are interested in irregularities similar to those observed in the third chapter, but for more general arithmetic sequences. Specifically, we study sequences such as the integers expressible as a sum of two squares, the values of a binary quadratic form, prime $k$-tuples and integers without small prime factors. We prove that in each of these examples, as well as in a large class of arithmetic sequences, there are irregularities in the arithmetic progressions $a\bmod q$, with $a$ fixed and on average over $q$.

Relevance:

30.00%

Publisher:

Abstract:

We solve two inverse spectral problems for star graphs of Stieltjes strings with Dirichlet and Neumann boundary conditions, respectively, at a selected vertex called the root. The root is either the central vertex or, in the more challenging problem, a pendant vertex of the star graph. At all other pendant vertices Dirichlet conditions are imposed; at the central vertex, at which a mass may be placed, continuity and Kirchhoff conditions are assumed. We derive conditions on two sets of real numbers to be the spectra of the above Dirichlet and Neumann problems. Our solution for the inverse problems is constructive: we establish algorithms to recover the mass distribution on the star graph (i.e. the point masses and lengths of subintervals between them) from these two spectra and from the lengths of the separate strings. If the root is a pendant vertex, the two spectra uniquely determine the parameters on the main string (i.e. the string incident to the root) if the length of the main string is known. The mass distribution on the other edges need not be unique; the reason for this is the non-uniqueness caused by the non-strict interlacing of the given data in the case when the root is the central vertex. Finally, we relate our results to tree-patterned matrix inverse problems.

Relevance:

30.00%

Publisher:

Abstract:

The Dirichlet process mixture model (DPMM) is a ubiquitous, flexible Bayesian nonparametric statistical model. However, full probabilistic inference in this model is analytically intractable, so that computationally intensive techniques such as Gibbs sampling are required. As a result, DPMM-based methods, which have considerable potential, are restricted to applications in which computational resources and time for inference are plentiful. For example, they would not be practical for digital signal processing on embedded hardware, where computational resources are at a serious premium. Here, we develop a simplified yet statistically rigorous approximate maximum a-posteriori (MAP) inference algorithm for DPMMs. This algorithm is as simple as DP-means clustering and solves the MAP problem as well as Gibbs sampling, while requiring only a fraction of the computational effort. (For freely available code that implements the MAP-DP algorithm for Gaussian mixtures see http://www.maxlittle.net/.) Unlike related small variance asymptotics (SVA), our method is non-degenerate and so inherits the “rich get richer” property of the Dirichlet process. It also retains a non-degenerate closed-form likelihood which enables out-of-sample calculations and the use of standard tools such as cross-validation. We illustrate the benefits of our algorithm on a range of examples and contrast it with variational, SVA and sampling approaches in terms of both computational complexity and clustering performance. We demonstrate the wide applicability of our approach by presenting an approximate MAP inference method for the infinite hidden Markov model whose performance contrasts favorably with a recently proposed hybrid SVA approach. 
Similarly, we show how our algorithm can be applied to a semiparametric mixed-effects regression model where the random effects distribution is modelled using an infinite mixture model, as used in longitudinal progression modelling in population health science. Finally, we propose directions for future research on approximate MAP inference in Bayesian nonparametrics.

Relevance:

20.00%

Publisher:

Abstract:

Although various abutment connections and materials have recently been introduced, insufficient data exist regarding the effect of stress distribution on their mechanical performance. The purpose of this study was to investigate the effect of different abutment materials and platform connections on stress distribution in single anterior implant-supported restorations with the finite element method. Nine experimental groups were modeled from the combination of 3 platform connections (external hexagon, internal hexagon, and Morse tapered) and 3 abutment materials (titanium, zirconia, and hybrid) as follows: external hexagon-titanium, external hexagon-zirconia, external hexagon-hybrid, internal hexagon-titanium, internal hexagon-zirconia, internal hexagon-hybrid, Morse tapered-titanium, Morse tapered-zirconia, and Morse tapered-hybrid. Finite element models consisted of a 4×13-mm implant, anatomic abutment, and lithium disilicate central incisor crown cemented over the abutment. The 49 N occlusal loading was applied in 6 steps to simulate the incisal guidance. Equivalent von Mises stress (σvM) was used for both the qualitative and quantitative evaluation of the implant and abutment in all the groups and the maximum (σmax) and minimum (σmin) principal stresses for the numerical comparison of the zirconia parts. The highest abutment σvM occurred in the Morse-tapered groups and the lowest in the external hexagon-hybrid, internal hexagon-titanium, and internal hexagon-hybrid groups. The σmax and σmin values were lower in the hybrid groups than in the zirconia groups. The stress distribution concentrated in the abutment-implant interface in all the groups, regardless of the platform connection or abutment material. The platform connection influenced the stress on abutments more than the abutment material. The stress values for implants were similar among different platform connections, but greater stress concentrations were observed in internal connections.