997 resultados para multinomial distribution


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The authors focus on one of the methods for connection acceptance control (CAC) in an ATM network: the convolution approach. With the aim of reducing the cost in terms of calculation and storage requirements, they propose the use of the multinomial distribution function. This permits direct computation of the associated probabilities of the instantaneous bandwidth requirements. This in turn makes possible a simple deconvolution process. Moreover, under certain conditions additional improvements may be achieved

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The authors focus on one of the methods for connection acceptance control (CAC) in an ATM network: the convolution approach. With the aim of reducing the cost in terms of calculation and storage requirements, they propose the use of the multinomial distribution function. This permits direct computation of the associated probabilities of the instantaneous bandwidth requirements. This in turn makes possible a simple deconvolution process. Moreover, under certain conditions additional improvements may be achieved

Relevância:

70.00% 70.00%

Publicador:

Resumo:

A novel test of spatial independence of the distribution of crystals or phases in rocksbased on compositional statistics is introduced. It improves and generalizes the commonjoins-count statistics known from map analysis in geographic information systems.Assigning phases independently to objects in RD is modelled by a single-trial multinomialrandom function Z(x), where the probabilities of phases add to one and areexplicitly modelled as compositions in the K-part simplex SK. Thus, apparent inconsistenciesof the tests based on the conventional joins{count statistics and their possiblycontradictory interpretations are avoided. In practical applications we assume that theprobabilities of phases do not depend on the location but are identical everywhere inthe domain of de nition. Thus, the model involves the sum of r independent identicalmultinomial distributed 1-trial random variables which is an r-trial multinomialdistributed random variable. The probabilities of the distribution of the r counts canbe considered as a composition in the Q-part simplex SQ. They span the so calledHardy-Weinberg manifold H that is proved to be a K-1-affine subspace of SQ. This isa generalisation of the well-known Hardy-Weinberg law of genetics. If the assignmentof phases accounts for some kind of spatial dependence, then the r-trial probabilitiesdo not remain on H. This suggests the use of the Aitchison distance between observedprobabilities to H to test dependence. Moreover, when there is a spatial uctuation ofthe multinomial probabilities, the observed r-trial probabilities move on H. This shiftcan be used as to check for these uctuations. A practical procedure and an algorithmto perform the test have been developed. Some cases applied to simulated and realdata are presented.Key words: Spatial distribution of crystals in rocks, spatial distribution of phases,joins-count statistics, multinomial distribution, Hardy-Weinberg law, Hardy-Weinbergmanifold, Aitchison geometry

Relevância:

70.00% 70.00%

Publicador:

Resumo:

A novel test of spatial independence of the distribution of crystals or phases in rocks based on compositional statistics is introduced. It improves and generalizes the common joins-count statistics known from map analysis in geographic information systems. Assigning phases independently to objects in RD is modelled by a single-trial multinomial random function Z(x), where the probabilities of phases add to one and are explicitly modelled as compositions in the K-part simplex SK. Thus, apparent inconsistencies of the tests based on the conventional joins{count statistics and their possibly contradictory interpretations are avoided. In practical applications we assume that the probabilities of phases do not depend on the location but are identical everywhere in the domain of de nition. Thus, the model involves the sum of r independent identical multinomial distributed 1-trial random variables which is an r-trial multinomial distributed random variable. The probabilities of the distribution of the r counts can be considered as a composition in the Q-part simplex SQ. They span the so called Hardy-Weinberg manifold H that is proved to be a K-1-affine subspace of SQ. This is a generalisation of the well-known Hardy-Weinberg law of genetics. If the assignment of phases accounts for some kind of spatial dependence, then the r-trial probabilities do not remain on H. This suggests the use of the Aitchison distance between observed probabilities to H to test dependence. Moreover, when there is a spatial uctuation of the multinomial probabilities, the observed r-trial probabilities move on H. This shift can be used as to check for these uctuations. A practical procedure and an algorithm to perform the test have been developed. Some cases applied to simulated and real data are presented. Key words: Spatial distribution of crystals in rocks, spatial distribution of phases, joins-count statistics, multinomial distribution, Hardy-Weinberg law, Hardy-Weinberg manifold, Aitchison geometry

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The Dirichlet distribution is a multivariate generalization of the Beta distribution. It is an important multivariate continuous distribution in probability and statistics. In this report, we review the Dirichlet distribution and study its properties, including statistical and information-theoretic quantities involving this distribution. Also, relationships between the Dirichlet distribution and other distributions are discussed. There are some different ways to think about generating random variables with a Dirichlet distribution. The stick-breaking approach and the Pólya urn method are discussed. In Bayesian statistics, the Dirichlet distribution and the generalized Dirichlet distribution can both be a conjugate prior for the Multinomial distribution. The Dirichlet distribution has many applications in different fields. We focus on the unsupervised learning of a finite mixture model based on the Dirichlet distribution. The Initialization Algorithm and Dirichlet Mixture Estimation Algorithm are both reviewed for estimating the parameters of a Dirichlet mixture. Three experimental results are shown for the estimation of artificial histograms, summarization of image databases and human skin detection.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The quantitative estimation of Sea Surface Temperatures from fossils assemblages is afundamental issue in palaeoclimatic and paleooceanographic investigations. TheModern Analogue Technique, a widely adopted method based on direct comparison offossil assemblages with modern coretop samples, was revised with the aim ofconforming it to compositional data analysis. The new CODAMAT method wasdeveloped by adopting the Aitchison metric as distance measure. Modern coretopdatasets are characterised by a large amount of zeros. The zero replacement was carriedout by adopting a Bayesian approach to the zero replacement, based on a posteriorestimation of the parameter of the multinomial distribution. The number of modernanalogues from which reconstructing the SST was determined by means of a multipleapproach by considering the Proxies correlation matrix, Standardized Residual Sum ofSquares and Mean Squared Distance. This new CODAMAT method was applied to theplanktonic foraminiferal assemblages of a core recovered in the Tyrrhenian Sea.Kew words: Modern analogues, Aitchison distance, Proxies correlation matrix,Standardized Residual Sum of Squares

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper focuses on one of the methods for bandwidth allocation in an ATM network: the convolution approach. The convolution approach permits an accurate study of the system load in statistical terms by accumulated calculations, since probabilistic results of the bandwidth allocation can be obtained. Nevertheless, the convolution approach has a high cost in terms of calculation and storage requirements. This aspect makes real-time calculations difficult, so many authors do not consider this approach. With the aim of reducing the cost we propose to use the multinomial distribution function: the enhanced convolution approach (ECA). This permits direct computation of the associated probabilities of the instantaneous bandwidth requirements and makes a simple deconvolution process possible. The ECA is used in connection acceptance control, and some results are presented

Relevância:

60.00% 60.00%

Publicador:

Resumo:

OBJECTIVE: To better understand the structure of the Patient Assessment of Chronic Illness Care (PACIC) instrument. More specifically to test all published validation models, using one single data set and appropriate statistical tools. DESIGN: Validation study using data from cross-sectional survey. PARTICIPANTS: A population-based sample of non-institutionalized adults with diabetes residing in Switzerland (canton of Vaud). MAIN OUTCOME MEASURE: French version of the 20-items PACIC instrument (5-point response scale). We conducted validation analyses using confirmatory factor analysis (CFA). The original five-dimension model and other published models were tested with three types of CFA: based on (i) a Pearson estimator of variance-covariance matrix, (ii) a polychoric correlation matrix and (iii) a likelihood estimation with a multinomial distribution for the manifest variables. All models were assessed using loadings and goodness-of-fit measures. RESULTS: The analytical sample included 406 patients. Mean age was 64.4 years and 59% were men. Median of item responses varied between 1 and 4 (range 1-5), and range of missing values was between 5.7 and 12.3%. Strong floor and ceiling effects were present. Even though loadings of the tested models were relatively high, the only model showing acceptable fit was the 11-item single-dimension model. PACIC was associated with the expected variables of the field. CONCLUSIONS: Our results showed that the model considering 11 items in a single dimension exhibited the best fit for our data. A single score, in complement to the consideration of single-item results, might be used instead of the five dimensions usually described.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

L'objectif de cette thèse est de présenter différentes applications du programme de recherche de calcul conditionnel distribué. On espère que ces applications, ainsi que la théorie présentée ici, mènera à une solution générale du problème d'intelligence artificielle, en particulier en ce qui a trait à la nécessité d'efficience. La vision du calcul conditionnel distribué consiste à accélérer l'évaluation et l'entraînement de modèles profonds, ce qui est très différent de l'objectif usuel d'améliorer sa capacité de généralisation et d'optimisation. Le travail présenté ici a des liens étroits avec les modèles de type mélange d'experts. Dans le chapitre 2, nous présentons un nouvel algorithme d'apprentissage profond qui utilise une forme simple d'apprentissage par renforcement sur un modèle d'arbre de décisions à base de réseau de neurones. Nous démontrons la nécessité d'une contrainte d'équilibre pour maintenir la distribution d'exemples aux experts uniforme et empêcher les monopoles. Pour rendre le calcul efficient, l'entrainement et l'évaluation sont contraints à être éparse en utilisant un routeur échantillonnant des experts d'une distribution multinomiale étant donné un exemple. Dans le chapitre 3, nous présentons un nouveau modèle profond constitué d'une représentation éparse divisée en segments d'experts. Un modèle de langue à base de réseau de neurones est construit à partir des transformations éparses entre ces segments. L'opération éparse par bloc est implémentée pour utilisation sur des cartes graphiques. Sa vitesse est comparée à deux opérations denses du même calibre pour démontrer le gain réel de calcul qui peut être obtenu. Un modèle profond utilisant des opérations éparses contrôlées par un routeur distinct des experts est entraîné sur un ensemble de données d'un milliard de mots. Un nouvel algorithme de partitionnement de données est appliqué sur un ensemble de mots pour hiérarchiser la couche de sortie d'un modèle de langage, la rendant ainsi beaucoup plus efficiente. Le travail présenté dans cette thèse est au centre de la vision de calcul conditionnel distribué émis par Yoshua Bengio. Elle tente d'appliquer la recherche dans le domaine des mélanges d'experts aux modèles profonds pour améliorer leur vitesse ainsi que leur capacité d'optimisation. Nous croyons que la théorie et les expériences de cette thèse sont une étape importante sur la voie du calcul conditionnel distribué car elle cadre bien le problème, surtout en ce qui concerne la compétitivité des systèmes d'experts.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The quantitative estimation of Sea Surface Temperatures from fossils assemblages is a fundamental issue in palaeoclimatic and paleooceanographic investigations. The Modern Analogue Technique, a widely adopted method based on direct comparison of fossil assemblages with modern coretop samples, was revised with the aim of conforming it to compositional data analysis. The new CODAMAT method was developed by adopting the Aitchison metric as distance measure. Modern coretop datasets are characterised by a large amount of zeros. The zero replacement was carried out by adopting a Bayesian approach to the zero replacement, based on a posterior estimation of the parameter of the multinomial distribution. The number of modern analogues from which reconstructing the SST was determined by means of a multiple approach by considering the Proxies correlation matrix, Standardized Residual Sum of Squares and Mean Squared Distance. This new CODAMAT method was applied to the planktonic foraminiferal assemblages of a core recovered in the Tyrrhenian Sea. Kew words: Modern analogues, Aitchison distance, Proxies correlation matrix, Standardized Residual Sum of Squares

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper focuses on one of the methods for bandwidth allocation in an ATM network: the convolution approach. The convolution approach permits an accurate study of the system load in statistical terms by accumulated calculations, since probabilistic results of the bandwidth allocation can be obtained. Nevertheless, the convolution approach has a high cost in terms of calculation and storage requirements. This aspect makes real-time calculations difficult, so many authors do not consider this approach. With the aim of reducing the cost we propose to use the multinomial distribution function: the enhanced convolution approach (ECA). This permits direct computation of the associated probabilities of the instantaneous bandwidth requirements and makes a simple deconvolution process possible. The ECA is used in connection acceptance control, and some results are presented

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The characteristics of service independence and flexibility of ATM networks make the control problems of such networks very critical. One of the main challenges in ATM networks is to design traffic control mechanisms that enable both economically efficient use of the network resources and desired quality of service to higher layer applications. Window flow control mechanisms of traditional packet switched networks are not well suited to real time services, at the speeds envisaged for the future networks. In this work, the utilisation of the Probability of Congestion (PC) as a bandwidth decision parameter is presented. The validity of PC utilisation is compared with QOS parameters in buffer-less environments when only the cell loss ratio (CLR) parameter is relevant. The convolution algorithm is a good solution for CAC in ATM networks with small buffers. If the source characteristics are known, the actual CLR can be very well estimated. Furthermore, this estimation is always conservative, allowing the retention of the network performance guarantees. Several experiments have been carried out and investigated to explain the deviation between the proposed method and the simulation. Time parameters for burst length and different buffer sizes have been considered. Experiments to confine the limits of the burst length with respect to the buffer size conclude that a minimum buffer size is necessary to achieve adequate cell contention. Note that propagation delay is a no dismiss limit for long distance and interactive communications, then small buffer must be used in order to minimise delay. Under previous premises, the convolution approach is the most accurate method used in bandwidth allocation. This method gives enough accuracy in both homogeneous and heterogeneous networks. But, the convolution approach has a considerable computation cost and a high number of accumulated calculations. To overcome this drawbacks, a new method of evaluation is analysed: the Enhanced Convolution Approach (ECA). In ECA, traffic is grouped in classes of identical parameters. By using the multinomial distribution function instead of the formula-based convolution, a partial state corresponding to each class of traffic is obtained. Finally, the global state probabilities are evaluated by multi-convolution of the partial results. This method avoids accumulated calculations and saves storage requirements, specially in complex scenarios. Sorting is the dominant factor for the formula-based convolution, whereas cost evaluation is the dominant factor for the enhanced convolution. A set of cut-off mechanisms are introduced to reduce the complexity of the ECA evaluation. The ECA also computes the CLR for each j-class of traffic (CLRj), an expression for the CLRj evaluation is also presented. We can conclude that by combining the ECA method with cut-off mechanisms, utilisation of ECA in real-time CAC environments as a single level scheme is always possible.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A class of multi-process models is developed for collections of time indexed count data. Autocorrelation in counts is achieved with dynamic models for the natural parameter of the binomial distribution. In addition to modeling binomial time series, the framework includes dynamic models for multinomial and Poisson time series. Markov chain Monte Carlo (MCMC) and Po ́lya-Gamma data augmentation (Polson et al., 2013) are critical for fitting multi-process models of counts. To facilitate computation when the counts are high, a Gaussian approximation to the P ́olya- Gamma random variable is developed.

Three applied analyses are presented to explore the utility and versatility of the framework. The first analysis develops a model for complex dynamic behavior of themes in collections of text documents. Documents are modeled as a “bag of words”, and the multinomial distribution is used to characterize uncertainty in the vocabulary terms appearing in each document. State-space models for the natural parameters of the multinomial distribution induce autocorrelation in themes and their proportional representation in the corpus over time.

The second analysis develops a dynamic mixed membership model for Poisson counts. The model is applied to a collection of time series which record neuron level firing patterns in rhesus monkeys. The monkey is exposed to two sounds simultaneously, and Gaussian processes are used to smoothly model the time-varying rate at which the neuron’s firing pattern fluctuates between features associated with each sound in isolation.

The third analysis presents a switching dynamic generalized linear model for the time-varying home run totals of professional baseball players. The model endows each player with an age specific latent natural ability class and a performance enhancing drug (PED) use indicator. As players age, they randomly transition through a sequence of ability classes in a manner consistent with traditional aging patterns. When the performance of the player significantly deviates from the expected aging pattern, he is identified as a player whose performance is consistent with PED use.

All three models provide a mechanism for sharing information across related series locally in time. The models are fit with variations on the P ́olya-Gamma Gibbs sampler, MCMC convergence diagnostics are developed, and reproducible inference is emphasized throughout the dissertation.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The statistical analysis of literary style is the part of stylometry that compares measurable characteristicsin a text that are rarely controlled by the author, with those in other texts. When thegoal is to settle authorship questions, these characteristics should relate to the author’s style andnot to the genre, epoch or editor, and they should be such that their variation between authors islarger than the variation within comparable texts from the same author.For an overview of the literature on stylometry and some of the techniques involved, see for exampleMosteller and Wallace (1964, 82), Herdan (1964), Morton (1978), Holmes (1985), Oakes (1998) orLebart, Salem and Berry (1998).Tirant lo Blanc, a chivalry book, is the main work in catalan literature and it was hailed to be“the best book of its kind in the world” by Cervantes in Don Quixote. Considered by writterslike Vargas Llosa or Damaso Alonso to be the first modern novel in Europe, it has been translatedseveral times into Spanish, Italian and French, with modern English translations by Rosenthal(1996) and La Fontaine (1993). The main body of this book was written between 1460 and 1465,but it was not printed until 1490.There is an intense and long lasting debate around its authorship sprouting from its first edition,where its introduction states that the whole book is the work of Martorell (1413?-1468), while atthe end it is stated that the last one fourth of the book is by Galba (?-1490), after the death ofMartorell. Some of the authors that support the theory of single authorship are Riquer (1990),Chiner (1993) and Badia (1993), while some of those supporting the double authorship are Riquer(1947), Coromines (1956) and Ferrando (1995). For an overview of this debate, see Riquer (1990).Neither of the two candidate authors left any text comparable to the one under study, and thereforediscriminant analysis can not be used to help classify chapters by author. By using sample textsencompassing about ten percent of the book, and looking at word length and at the use of 44conjunctions, prepositions and articles, Ginebra and Cabos (1998) detect heterogeneities that mightindicate the existence of two authors. By analyzing the diversity of the vocabulary, Riba andGinebra (2000) estimates that stylistic boundary to be near chapter 383.Following the lead of the extensive literature, this paper looks into word length, the use of the mostfrequent words and into the use of vowels in each chapter of the book. Given that the featuresselected are categorical, that leads to three contingency tables of ordered rows and therefore tothree sequences of multinomial observations.Section 2 explores these sequences graphically, observing a clear shift in their distribution. Section 3describes the problem of the estimation of a suden change-point in those sequences, in the followingsections we propose various ways to estimate change-points in multinomial sequences; the methodin section 4 involves fitting models for polytomous data, the one in Section 5 fits gamma modelsonto the sequence of Chi-square distances between each row profiles and the average profile, theone in Section 6 fits models onto the sequence of values taken by the first component of thecorrespondence analysis as well as onto sequences of other summary measures like the averageword length. In Section 7 we fit models onto the marginal binomial sequences to identify thefeatures that distinguish the chapters before and after that boundary. Most methods rely heavilyon the use of generalized linear models