9 resultados para multinomial logit
em Universitat de Girona, Spain
Resumo:
The application of compositional data analysis through log ratio trans- formations corresponds to a multinomial logit model for the shares themselves. This model is characterized by the property of Independence of Irrelevant Alter- natives (IIA). IIA states that the odds ratio in this case the ratio of shares is invariant to the addition or deletion of outcomes to the problem. It is exactly this invariance of the ratio that underlies the commonly used zero replacement procedure in compositional data analysis. In this paper we investigate using the nested logit model that does not embody IIA and an associated zero replacement procedure and compare its performance with that of the more usual approach of using the multinomial logit model. Our comparisons exploit a data set that com- bines voting data by electoral division with corresponding census data for each division for the 2001 Federal election in Australia
Resumo:
The statistical analysis of literary style is the part of stylometry that compares measurable characteristics in a text that are rarely controlled by the author, with those in other texts. When the goal is to settle authorship questions, these characteristics should relate to the author’s style and not to the genre, epoch or editor, and they should be such that their variation between authors is larger than the variation within comparable texts from the same author. For an overview of the literature on stylometry and some of the techniques involved, see for example Mosteller and Wallace (1964, 82), Herdan (1964), Morton (1978), Holmes (1985), Oakes (1998) or Lebart, Salem and Berry (1998). Tirant lo Blanc, a chivalry book, is the main work in catalan literature and it was hailed to be “the best book of its kind in the world” by Cervantes in Don Quixote. Considered by writters like Vargas Llosa or Damaso Alonso to be the first modern novel in Europe, it has been translated several times into Spanish, Italian and French, with modern English translations by Rosenthal (1996) and La Fontaine (1993). The main body of this book was written between 1460 and 1465, but it was not printed until 1490. There is an intense and long lasting debate around its authorship sprouting from its first edition, where its introduction states that the whole book is the work of Martorell (1413?-1468), while at the end it is stated that the last one fourth of the book is by Galba (?-1490), after the death of Martorell. Some of the authors that support the theory of single authorship are Riquer (1990), Chiner (1993) and Badia (1993), while some of those supporting the double authorship are Riquer (1947), Coromines (1956) and Ferrando (1995). For an overview of this debate, see Riquer (1990). Neither of the two candidate authors left any text comparable to the one under study, and therefore discriminant analysis can not be used to help classify chapters by author. By using sample texts encompassing about ten percent of the book, and looking at word length and at the use of 44 conjunctions, prepositions and articles, Ginebra and Cabos (1998) detect heterogeneities that might indicate the existence of two authors. By analyzing the diversity of the vocabulary, Riba and Ginebra (2000) estimates that stylistic boundary to be near chapter 383. Following the lead of the extensive literature, this paper looks into word length, the use of the most frequent words and into the use of vowels in each chapter of the book. Given that the features selected are categorical, that leads to three contingency tables of ordered rows and therefore to three sequences of multinomial observations. Section 2 explores these sequences graphically, observing a clear shift in their distribution. Section 3 describes the problem of the estimation of a suden change-point in those sequences, in the following sections we propose various ways to estimate change-points in multinomial sequences; the method in section 4 involves fitting models for polytomous data, the one in Section 5 fits gamma models onto the sequence of Chi-square distances between each row profiles and the average profile, the one in Section 6 fits models onto the sequence of values taken by the first component of the correspondence analysis as well as onto sequences of other summary measures like the average word length. In Section 7 we fit models onto the marginal binomial sequences to identify the features that distinguish the chapters before and after that boundary. Most methods rely heavily on the use of generalized linear models
Resumo:
The authors focus on one of the methods for connection acceptance control (CAC) in an ATM network: the convolution approach. With the aim of reducing the cost in terms of calculation and storage requirements, they propose the use of the multinomial distribution function. This permits direct computation of the associated probabilities of the instantaneous bandwidth requirements. This in turn makes possible a simple deconvolution process. Moreover, under certain conditions additional improvements may be achieved
Resumo:
A novel test of spatial independence of the distribution of crystals or phases in rocks based on compositional statistics is introduced. It improves and generalizes the common joins-count statistics known from map analysis in geographic information systems. Assigning phases independently to objects in RD is modelled by a single-trial multinomial random function Z(x), where the probabilities of phases add to one and are explicitly modelled as compositions in the K-part simplex SK. Thus, apparent inconsistencies of the tests based on the conventional joins{count statistics and their possibly contradictory interpretations are avoided. In practical applications we assume that the probabilities of phases do not depend on the location but are identical everywhere in the domain of de nition. Thus, the model involves the sum of r independent identical multinomial distributed 1-trial random variables which is an r-trial multinomial distributed random variable. The probabilities of the distribution of the r counts can be considered as a composition in the Q-part simplex SQ. They span the so called Hardy-Weinberg manifold H that is proved to be a K-1-affine subspace of SQ. This is a generalisation of the well-known Hardy-Weinberg law of genetics. If the assignment of phases accounts for some kind of spatial dependence, then the r-trial probabilities do not remain on H. This suggests the use of the Aitchison distance between observed probabilities to H to test dependence. Moreover, when there is a spatial uctuation of the multinomial probabilities, the observed r-trial probabilities move on H. This shift can be used as to check for these uctuations. A practical procedure and an algorithm to perform the test have been developed. Some cases applied to simulated and real data are presented. Key words: Spatial distribution of crystals in rocks, spatial distribution of phases, joins-count statistics, multinomial distribution, Hardy-Weinberg law, Hardy-Weinberg manifold, Aitchison geometry
Resumo:
The quantitative estimation of Sea Surface Temperatures from fossils assemblages is a fundamental issue in palaeoclimatic and paleooceanographic investigations. The Modern Analogue Technique, a widely adopted method based on direct comparison of fossil assemblages with modern coretop samples, was revised with the aim of conforming it to compositional data analysis. The new CODAMAT method was developed by adopting the Aitchison metric as distance measure. Modern coretop datasets are characterised by a large amount of zeros. The zero replacement was carried out by adopting a Bayesian approach to the zero replacement, based on a posterior estimation of the parameter of the multinomial distribution. The number of modern analogues from which reconstructing the SST was determined by means of a multiple approach by considering the Proxies correlation matrix, Standardized Residual Sum of Squares and Mean Squared Distance. This new CODAMAT method was applied to the planktonic foraminiferal assemblages of a core recovered in the Tyrrhenian Sea. Kew words: Modern analogues, Aitchison distance, Proxies correlation matrix, Standardized Residual Sum of Squares
Resumo:
This paper focuses on one of the methods for bandwidth allocation in an ATM network: the convolution approach. The convolution approach permits an accurate study of the system load in statistical terms by accumulated calculations, since probabilistic results of the bandwidth allocation can be obtained. Nevertheless, the convolution approach has a high cost in terms of calculation and storage requirements. This aspect makes real-time calculations difficult, so many authors do not consider this approach. With the aim of reducing the cost we propose to use the multinomial distribution function: the enhanced convolution approach (ECA). This permits direct computation of the associated probabilities of the instantaneous bandwidth requirements and makes a simple deconvolution process possible. The ECA is used in connection acceptance control, and some results are presented
Resumo:
Aquesta tesi estudia com estimar la distribució de les variables regionalitzades l'espai mostral i l'escala de les quals admeten una estructura d'espai Euclidià. Apliquem el principi del treball en coordenades: triem una base ortonormal, fem estadística sobre les coordenades de les dades, i apliquem els output a la base per tal de recuperar un resultat en el mateix espai original. Aplicant-ho a les variables regionalitzades, obtenim una aproximació única consistent, que generalitza les conegudes propietats de les tècniques de kriging a diversos espais mostrals: dades reals, positives o composicionals (vectors de components positives amb suma constant) són tractades com casos particulars. D'aquesta manera, es generalitza la geostadística lineal, i s'ofereix solucions a coneguts problemes de la no-lineal, tot adaptant la mesura i els criteris de representativitat (i.e., mitjanes) a les dades tractades. L'estimador per a dades positives coincideix amb una mitjana geomètrica ponderada, equivalent a l'estimació de la mediana, sense cap dels problemes del clàssic kriging lognormal. El cas composicional ofereix solucions equivalents, però a més permet estimar vectors de probabilitat multinomial. Amb una aproximació bayesiana preliminar, el kriging de composicions esdevé també una alternativa consistent al kriging indicador. Aquesta tècnica s'empra per estimar funcions de probabilitat de variables qualsevol, malgrat que sovint ofereix estimacions negatives, cosa que s'evita amb l'alternativa proposada. La utilitat d'aquest conjunt de tècniques es comprova estudiant la contaminació per amoníac a una estació de control automàtic de la qualitat de l'aigua de la conca de la Tordera, i es conclou que només fent servir les tècniques proposades hom pot detectar en quins instants l'amoni es transforma en amoníac en una concentració superior a la legalment permesa.
Resumo:
La tesi doctoral és un estudi de l'evolució del paisatge de les closes de l'Alt Empordà entre els anys 1957 i 2001. Dins el marc teòric s'hi explora el concepte de paisatge cultural agrari tradicional, es fa una introducció de les aportacions que fa la geografia històrica a l'estudi de les closes arreu d'Europa i es presenten els fonaments de l'ecologia del paisatge. La recerca es basa en la fotointerpretació de fotografies aèries ampliades i l'aplicació d'índexs espacials propis de l'ecologia del paisatge per a l'anàlisi de l'evolució de l'estructura del paisatge. Es complementa amb la realització d'entrevistes personals als gestors del paisatge per tal de conèixer com han dut a terme la gestió de les closes al llarg del període estudiat. Es combinen ambdós tipus d'informació a través de la tècnica estadística de la regressió logística multinomial per tal de descobrir el funcionament de les interaccions entre aquests àmbits.
Resumo:
The characteristics of service independence and flexibility of ATM networks make the control problems of such networks very critical. One of the main challenges in ATM networks is to design traffic control mechanisms that enable both economically efficient use of the network resources and desired quality of service to higher layer applications. Window flow control mechanisms of traditional packet switched networks are not well suited to real time services, at the speeds envisaged for the future networks. In this work, the utilisation of the Probability of Congestion (PC) as a bandwidth decision parameter is presented. The validity of PC utilisation is compared with QOS parameters in buffer-less environments when only the cell loss ratio (CLR) parameter is relevant. The convolution algorithm is a good solution for CAC in ATM networks with small buffers. If the source characteristics are known, the actual CLR can be very well estimated. Furthermore, this estimation is always conservative, allowing the retention of the network performance guarantees. Several experiments have been carried out and investigated to explain the deviation between the proposed method and the simulation. Time parameters for burst length and different buffer sizes have been considered. Experiments to confine the limits of the burst length with respect to the buffer size conclude that a minimum buffer size is necessary to achieve adequate cell contention. Note that propagation delay is a no dismiss limit for long distance and interactive communications, then small buffer must be used in order to minimise delay. Under previous premises, the convolution approach is the most accurate method used in bandwidth allocation. This method gives enough accuracy in both homogeneous and heterogeneous networks. But, the convolution approach has a considerable computation cost and a high number of accumulated calculations. To overcome this drawbacks, a new method of evaluation is analysed: the Enhanced Convolution Approach (ECA). In ECA, traffic is grouped in classes of identical parameters. By using the multinomial distribution function instead of the formula-based convolution, a partial state corresponding to each class of traffic is obtained. Finally, the global state probabilities are evaluated by multi-convolution of the partial results. This method avoids accumulated calculations and saves storage requirements, specially in complex scenarios. Sorting is the dominant factor for the formula-based convolution, whereas cost evaluation is the dominant factor for the enhanced convolution. A set of cut-off mechanisms are introduced to reduce the complexity of the ECA evaluation. The ECA also computes the CLR for each j-class of traffic (CLRj), an expression for the CLRj evaluation is also presented. We can conclude that by combining the ECA method with cut-off mechanisms, utilisation of ECA in real-time CAC environments as a single level scheme is always possible.