578 resultados para Dirichlet-multinomial
Resumo:
Testing for differences within data sets is an important issue across various applications. Our work is primarily motivated by the analysis of microbiomial composition, which has been increasingly relevant and important with the rise of DNA sequencing. We first review classical frequentist tests that are commonly used in tackling such problems. We then propose a Bayesian Dirichlet-multinomial framework for modeling the metagenomic data and for testing underlying differences between the samples. A parametric Dirichlet-multinomial model uses an intuitive hierarchical structure that allows for flexibility in characterizing both the within-group variation and the cross-group difference and provides very interpretable parameters. A computational method for evaluating the marginal likelihoods under the null and alternative hypotheses is also given. Through simulations, we show that our Bayesian model performs competitively against frequentist counterparts. We illustrate the method through analyzing metagenomic applications using the Human Microbiome Project data.
Resumo:
Abstract Traditionally, the common reserving methods used by the non-life actuaries are based on the assumption that future claims are going to behave in the same way as they did in the past. There are two main sources of variability in the processus of development of the claims: the variability of the speed with which the claims are settled and the variability between the severity of the claims from different accident years. High changes in these processes will generate distortions in the estimation of the claims reserves. The main objective of this thesis is to provide an indicator which firstly identifies and quantifies these two influences and secondly to determine which model is adequate for a specific situation. Two stochastic models were analysed and the predictive distributions of the future claims were obtained. The main advantage of the stochastic models is that they provide measures of variability of the reserves estimates. The first model (PDM) combines one conjugate family Dirichlet - Multinomial with the Poisson distribution. The second model (NBDM) improves the first one by combining two conjugate families Poisson -Gamma (for distribution of the ultimate amounts) and Dirichlet Multinomial (for distribution of the incremental claims payments). It was found that the second model allows to find the speed variability in the reporting process and development of the claims severity as function of two above mentioned distributions' parameters. These are the shape parameter of the Gamma distribution and the Dirichlet parameter. Depending on the relation between them we can decide on the adequacy of the claims reserve estimation method. The parameters have been estimated by the Methods of Moments and Maximum Likelihood. The results were tested using chosen simulation data and then using real data originating from the three lines of business: Property/Casualty, General Liability, and Accident Insurance. These data include different developments and specificities. The outcome of the thesis shows that when the Dirichlet parameter is greater than the shape parameter of the Gamma, resulting in a model with positive correlation between the past and future claims payments, suggests the Chain-Ladder method as appropriate for the claims reserve estimation. In terms of claims reserves, if the cumulated payments are high the positive correlation will imply high expectations for the future payments resulting in high claims reserves estimates. The negative correlation appears when the Dirichlet parameter is lower than the shape parameter of the Gamma, meaning low expected future payments for the same high observed cumulated payments. This corresponds to the situation when claims are reported rapidly and fewer claims remain expected subsequently. The extreme case appears in the situation when all claims are reported at the same time leading to expectations for the future payments of zero or equal to the aggregated amount of the ultimate paid claims. For this latter case, the Chain-Ladder is not recommended.
Resumo:
The Dirichlet distribution is a multivariate generalization of the Beta distribution. It is an important multivariate continuous distribution in probability and statistics. In this report, we review the Dirichlet distribution and study its properties, including statistical and information-theoretic quantities involving this distribution. Also, relationships between the Dirichlet distribution and other distributions are discussed. There are some different ways to think about generating random variables with a Dirichlet distribution. The stick-breaking approach and the Pólya urn method are discussed. In Bayesian statistics, the Dirichlet distribution and the generalized Dirichlet distribution can both be a conjugate prior for the Multinomial distribution. The Dirichlet distribution has many applications in different fields. We focus on the unsupervised learning of a finite mixture model based on the Dirichlet distribution. The Initialization Algorithm and Dirichlet Mixture Estimation Algorithm are both reviewed for estimating the parameters of a Dirichlet mixture. Three experimental results are shown for the estimation of artificial histograms, summarization of image databases and human skin detection.
Resumo:
This paper is an elaboration of the DECA algorithm [1] to blindly unmix hyperspectral data. The underlying mixing model is linear, meaning that each pixel is a linear mixture of the endmembers signatures weighted by the correspondent abundance fractions. The proposed method, as DECA, is tailored to highly mixed mixtures in which the geometric based approaches fail to identify the simplex of minimum volume enclosing the observed spectral vectors. We resort then to a statitistical framework, where the abundance fractions are modeled as mixtures of Dirichlet densities, thus enforcing the constraints on abundance fractions imposed by the acquisition process, namely non-negativity and constant sum. With respect to DECA, we introduce two improvements: 1) the number of Dirichlet modes are inferred based on the minimum description length (MDL) principle; 2) The generalized expectation maximization (GEM) algorithm we adopt to infer the model parameters is improved by using alternating minimization and augmented Lagrangian methods to compute the mixing matrix. The effectiveness of the proposed algorithm is illustrated with simulated and read data.
Resumo:
We discuss existence and multiplicity of positive solutions of the Dirichlet problem for the quasilinear ordinary differential equation-(u' / root 1 - u'(2))' = f(t, u). Depending on the behaviour of f = f(t, s) near s = 0, we prove the existence of either one, or two, or three, or infinitely many positive solutions. In general, the positivity of f is not required. All results are obtained by reduction to an equivalent non-singular problem to which variational or topological methods apply in a classical fashion.
Resumo:
We study the existence and multiplicity of positive radial solutions of the Dirichlet problem for the Minkowski-curvature equation { -div(del upsilon/root 1-vertical bar del upsilon vertical bar(2)) in B-R, upsilon=0 on partial derivative B-R,B- where B-R is a ball in R-N (N >= 2). According to the behaviour off = f (r, s) near s = 0, we prove the existence of either one, two or three positive solutions. All results are obtained by reduction to an equivalent non-singular one-dimensional problem, to which variational methods can be applied in a standard way.
Resumo:
This paper introduces a new unsupervised hyperspectral unmixing method conceived to linear but highly mixed hyperspectral data sets, in which the simplex of minimum volume, usually estimated by the purely geometrically based algorithms, is far way from the true simplex associated with the endmembers. The proposed method, an extension of our previous studies, resorts to the statistical framework. The abundance fraction prior is a mixture of Dirichlet densities, thus automatically enforcing the constraints on the abundance fractions imposed by the acquisition process, namely, nonnegativity and sum-to-one. A cyclic minimization algorithm is developed where the following are observed: 1) The number of Dirichlet modes is inferred based on the minimum description length principle; 2) a generalized expectation maximization algorithm is derived to infer the model parameters; and 3) a sequence of augmented Lagrangian-based optimizations is used to compute the signatures of the endmembers. Experiments on simulated and real data are presented to show the effectiveness of the proposed algorithm in unmixing problems beyond the reach of the geometrically based state-of-the-art competitors.
Resumo:
telligence applications for the banking industry. Searches were performed in relevant journals resulting in 219 articles published between 2002 and 2013. To analyze such a large number of manuscripts, text mining techniques were used in pursuit for relevant terms on both business intelligence and banking domains. Moreover, the latent Dirichlet allocation modeling was used in or- der to group articles in several relevant topics. The analysis was conducted using a dictionary of terms belonging to both banking and business intelli- gence domains. Such procedure allowed for the identification of relationships between terms and topics grouping articles, enabling to emerge hypotheses regarding research directions. To confirm such hypotheses, relevant articles were collected and scrutinized, allowing to validate the text mining proce- dure. The results show that credit in banking is clearly the main application trend, particularly predicting risk and thus supporting credit approval or de- nial. There is also a relevant interest in bankruptcy and fraud prediction. Customer retention seems to be associated, although weakly, with targeting, justifying bank offers to reduce churn. In addition, a large number of ar- ticles focused more on business intelligence techniques and its applications, using the banking industry just for evaluation, thus, not clearly acclaiming for benefits in the banking business. By identifying these current research topics, this study also highlights opportunities for future research.
Resumo:
Magdeburg, Univ., Fak. für Mathematik, Diss., 2015
Resumo:
"Vegeu el resum a l'inici del document del fitxer adjunt."
Resumo:
We prove existence theorems for the Dirichlet problem for hypersurfaces of constant special Lagrangian curvature in Hadamard manifolds. The first results are obtained using the continuity method and approximation and then refined using two iterations of the Perron method. The a-priori estimates used in the continuity method are valid in any ambient manifold.
Resumo:
"Vegeu el resum a l'inici del document del fitxer adjunt."
Resumo:
The statistical analysis of literary style is the part of stylometry that compares measurable characteristicsin a text that are rarely controlled by the author, with those in other texts. When thegoal is to settle authorship questions, these characteristics should relate to the author’s style andnot to the genre, epoch or editor, and they should be such that their variation between authors islarger than the variation within comparable texts from the same author.For an overview of the literature on stylometry and some of the techniques involved, see for exampleMosteller and Wallace (1964, 82), Herdan (1964), Morton (1978), Holmes (1985), Oakes (1998) orLebart, Salem and Berry (1998).Tirant lo Blanc, a chivalry book, is the main work in catalan literature and it was hailed to be“the best book of its kind in the world” by Cervantes in Don Quixote. Considered by writterslike Vargas Llosa or Damaso Alonso to be the first modern novel in Europe, it has been translatedseveral times into Spanish, Italian and French, with modern English translations by Rosenthal(1996) and La Fontaine (1993). The main body of this book was written between 1460 and 1465,but it was not printed until 1490.There is an intense and long lasting debate around its authorship sprouting from its first edition,where its introduction states that the whole book is the work of Martorell (1413?-1468), while atthe end it is stated that the last one fourth of the book is by Galba (?-1490), after the death ofMartorell. Some of the authors that support the theory of single authorship are Riquer (1990),Chiner (1993) and Badia (1993), while some of those supporting the double authorship are Riquer(1947), Coromines (1956) and Ferrando (1995). For an overview of this debate, see Riquer (1990).Neither of the two candidate authors left any text comparable to the one under study, and thereforediscriminant analysis can not be used to help classify chapters by author. By using sample textsencompassing about ten percent of the book, and looking at word length and at the use of 44conjunctions, prepositions and articles, Ginebra and Cabos (1998) detect heterogeneities that mightindicate the existence of two authors. By analyzing the diversity of the vocabulary, Riba andGinebra (2000) estimates that stylistic boundary to be near chapter 383.Following the lead of the extensive literature, this paper looks into word length, the use of the mostfrequent words and into the use of vowels in each chapter of the book. Given that the featuresselected are categorical, that leads to three contingency tables of ordered rows and therefore tothree sequences of multinomial observations.Section 2 explores these sequences graphically, observing a clear shift in their distribution. Section 3describes the problem of the estimation of a suden change-point in those sequences, in the followingsections we propose various ways to estimate change-points in multinomial sequences; the methodin section 4 involves fitting models for polytomous data, the one in Section 5 fits gamma modelsonto the sequence of Chi-square distances between each row profiles and the average profile, theone in Section 6 fits models onto the sequence of values taken by the first component of thecorrespondence analysis as well as onto sequences of other summary measures like the averageword length. In Section 7 we fit models onto the marginal binomial sequences to identify thefeatures that distinguish the chapters before and after that boundary. Most methods rely heavilyon the use of generalized linear models
Resumo:
The authors focus on one of the methods for connection acceptance control (CAC) in an ATM network: the convolution approach. With the aim of reducing the cost in terms of calculation and storage requirements, they propose the use of the multinomial distribution function. This permits direct computation of the associated probabilities of the instantaneous bandwidth requirements. This in turn makes possible a simple deconvolution process. Moreover, under certain conditions additional improvements may be achieved