4 results for Data anonymization and sanitization

at Universitat de Girona, Spain


Relevance:

100.00%

Abstract:

The application of compositional data analysis through log-ratio transformations corresponds to a multinomial logit model for the shares themselves. This model is characterized by the property of Independence of Irrelevant Alternatives (IIA). IIA states that the odds ratio (in this case, the ratio of shares) is invariant to the addition or deletion of outcomes. It is exactly this invariance of the ratio that underlies the commonly used zero replacement procedure in compositional data analysis. In this paper we investigate the nested logit model, which does not embody IIA, together with an associated zero replacement procedure, and compare its performance with that of the more usual approach based on the multinomial logit model. Our comparisons exploit a data set that combines voting data by electoral division with corresponding census data for each division for the 2001 Federal election in Australia.
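The IIA property and its link to zero replacement can be checked numerically. The following Python sketch uses made-up utilities (not the Australian election data) to show that, under a multinomial logit model, deleting one outcome leaves the ratio of the remaining shares unchanged. It then applies a multiplicative zero replacement, which we assume is the "commonly used" procedure referred to above (the abstract does not name it); the function names and the replacement value `delta` are illustrative. The nested logit alternative studied in the paper is not implemented here.

```python
import numpy as np

def mnl_shares(utilities):
    """Multinomial logit shares: s_j = exp(u_j) / sum_k exp(u_k)."""
    u = np.asarray(utilities, dtype=float)
    e = np.exp(u - u.max())  # subtract the max for numerical stability
    return e / e.sum()

def multiplicative_replacement(x, delta=1e-3):
    """Replace zeros by delta and shrink the non-zero parts by a common factor.

    Assumes x is a closed composition (parts sum to 1). Because every
    non-zero part is multiplied by the same factor, their ratios are kept.
    """
    x = np.asarray(x, dtype=float)
    zero = x == 0
    out = x * (1.0 - delta * zero.sum())
    out[zero] = delta
    return out

# IIA: deleting outcome 2 leaves the ratio of shares 0 and 1 unchanged.
u = np.array([0.5, 1.2, -0.3, 0.8])  # illustrative utilities, not real data
s_all = mnl_shares(u)
s_drop = mnl_shares(u[[0, 1, 3]])
print(s_all[0] / s_all[1], s_drop[0] / s_drop[1])  # both equal exp(-0.7)

# Zero replacement that relies on the same ratio invariance.
x = np.array([0.40, 0.0, 0.35, 0.25])
r = multiplicative_replacement(x, delta=0.005)
print(r.sum(), x[0] / x[2], r[0] / r[2])
```

The design choice mirrors the argument in the abstract: the replacement only makes sense if ratios among the observed parts are meaningful regardless of which other parts are present, which is exactly what IIA guarantees.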

Relevance:

100.00%

Abstract:

This thesis studies how to estimate the distribution of regionalized variables whose sample space and scale admit a Euclidean space structure. We apply the principle of working in coordinates: we choose an orthonormal basis, do statistics on the coordinates of the data, and apply the outputs to the basis in order to recover a result in the original space. Applying this to regionalized variables yields a single consistent approach that generalizes the well-known properties of kriging techniques to several sample spaces: real, positive or compositional data (vectors of positive components with a constant sum) are treated as particular cases. In this way linear geostatistics is generalized, and solutions are offered to well-known problems of nonlinear geostatistics, by adapting the measure and the criteria of representativeness (i.e., means) to the data being treated. The estimator for positive data coincides with a weighted geometric mean, equivalent to estimating the median, without any of the problems of classical lognormal kriging. The compositional case offers equivalent solutions, but in addition it allows multinomial probability vectors to be estimated. With a preliminary Bayesian approach, kriging of compositions also becomes a consistent alternative to indicator kriging. Indicator kriging is used to estimate probability functions of arbitrary variables, although it often yields negative estimates, a problem that the proposed alternative avoids. The usefulness of this set of techniques is verified by studying ammonia pollution at an automatic water-quality monitoring station in the Tordera basin, and it is concluded that only the proposed techniques make it possible to detect the times at which ammonium is converted into ammonia at a concentration above the legal limit.
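The principle of working in coordinates can be sketched for the positive and compositional cases. In this minimal Python sketch the kriging weights are taken as given (hypothetical values summing to one, as in ordinary kriging) rather than solved from a variogram, and the Helmert-type orthonormal ilr basis is one assumed choice among many; the sample values are illustrative, not the Tordera data. For positive data the logarithm is the single coordinate, so the back-transformed estimate is a weighted geometric mean; for compositions, the weighted mean of the ilr coordinates back-transforms to a closed weighted geometric mean.

```python
import numpy as np

def helmert_basis(D):
    """Orthonormal basis of the clr hyperplane, one basis vector per row (D-1, D)."""
    V = np.zeros((D - 1, D))
    for i in range(1, D):
        V[i - 1, :i] = 1.0 / i
        V[i - 1, i] = -1.0
        V[i - 1] *= np.sqrt(i / (i + 1.0))
    return V

def clr(x):
    """Centred log-ratio: log of each part minus the mean log of its row."""
    lx = np.log(np.asarray(x, dtype=float))
    return lx - lx.mean(axis=-1, keepdims=True)

def ilr(x, V):
    """Coordinates of x with respect to the orthonormal basis V."""
    return clr(x) @ V.T

def ilr_inv(z, V):
    """Map coordinates back to the simplex (exp of the clr vector, then closure)."""
    y = np.exp(z @ V)
    return y / y.sum(axis=-1, keepdims=True)

def estimate_positive(values, weights):
    """Positive data: average in log coordinates, then exponentiate.
    With weights summing to one this is a weighted geometric mean."""
    return float(np.exp(np.dot(weights, np.log(values))))

def estimate_composition(samples, weights, V):
    """Compositions: average the ilr coordinates, then map back to the simplex."""
    z_hat = np.asarray(weights) @ ilr(samples, V)
    return ilr_inv(z_hat, V)

# Hypothetical kriging weights for three neighbouring observations (sum to 1).
w = np.array([0.5, 0.3, 0.2])

conc = np.array([0.8, 2.5, 1.1])           # illustrative positive values
print(estimate_positive(conc, w))          # weighted geometric mean

comps = np.array([[0.70, 0.20, 0.10],      # illustrative 3-part compositions
                  [0.50, 0.30, 0.20],
                  [0.60, 0.25, 0.15]])
V = helmert_basis(3)
print(estimate_composition(comps, w, V))   # lies in the simplex, sums to 1
```

Because every estimate is built in coordinates and mapped back, positive estimates can never be negative and compositional estimates always sum to one, which is the property the abstract contrasts with indicator kriging.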

Relevance:

100.00%

Abstract:

Developments in the statistical analysis of compositional data over the last two decades have made possible a much deeper exploration of the nature of variability, and of the possible processes associated with compositional data sets from many disciplines. In this paper we concentrate on geochemical data sets. First we explain how hypotheses of compositional variability may be formulated within the natural sample space, the unit simplex, including useful hypotheses of subcompositional discrimination and specific perturbational change. Then, using standard methodology such as generalised likelihood ratio tests, we develop statistical tools that allow the systematic investigation of a complete lattice of such hypotheses. Some of these tests are simple adaptations of existing multivariate tests, but others require special construction. We comment on the use of graphical methods in compositional data analysis and on the ordination of specimens. The recent development of the concept of compositional processes is then explained, together with the necessary tools for a staying-in-the-simplex approach, namely compositional singular value decompositions. All these statistical techniques are illustrated for a substantial compositional data set, consisting of 209 major-oxide and rare-element compositions of metamorphosed limestones from the Northeast and Central Highlands of Scotland. Finally, we point out a number of unresolved problems in the statistical analysis of compositional processes.
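Perturbation, the clr transformation and a compositional singular value decomposition can be sketched in a few lines of Python. The data below are a synthetic perturbational process x(t) = x0 ⊕ (t ⊙ p), not the 209 Scottish limestone specimens, and the function names are illustrative. Because the clr coordinates of such a process lie on a straight line, the centred clr matrix has rank one: only the first singular value is non-zero, and the first right singular vector is proportional to clr(p).

```python
import numpy as np

def closure(x):
    """Rescale each row so that its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def perturb(x, p):
    """Perturbation x ⊕ p: componentwise product followed by closure."""
    return closure(np.asarray(x, dtype=float) * np.asarray(p, dtype=float))

def clr(x):
    """Centred log-ratio transform, row by row."""
    lx = np.log(np.asarray(x, dtype=float))
    return lx - lx.mean(axis=-1, keepdims=True)

def compositional_svd(X):
    """SVD of the column-centred clr matrix (the basis of a compositional biplot)."""
    Z = clr(closure(X))
    Zc = Z - Z.mean(axis=0)
    return np.linalg.svd(Zc, full_matrices=False)

# Synthetic process: start at x0 and apply the perturbation p "t times".
x0 = closure([0.5, 0.2, 0.2, 0.1])
p = closure([1.0, 1.5, 0.8, 1.2])
t = np.linspace(0.0, 3.0, 7)
X = np.array([perturb(x0, closure(p ** ti)) for ti in t])  # t ⊙ p = C(p^t)

U, s, Vt = compositional_svd(X)
print(np.round(s, 6))  # only the first singular value is (numerically) non-zero
```

Working in clr coordinates keeps every operation inside the simplex: perturbation becomes addition, powering becomes scalar multiplication, and the SVD then reveals the dimensionality of the underlying compositional process.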

Relevance:

100.00%

Abstract:

In a seminal paper, Aitchison and Lauder (1985) introduced classical kernel density estimation techniques in the context of compositional data analysis. They gave two options for the choice of the kernel to be used in the kernel estimator. One of these kernels is based on the use of the alr transformation on the simplex S^D jointly with the normal distribution on R^(D-1). However, these authors themselves recognized that this method has some deficiencies. A method for overcoming these difficulties, based on recent developments in compositional data analysis and multivariate kernel estimation theory and combining the ilr transformation with the normal density with a full bandwidth matrix, was recently proposed in Martín-Fernández, Chacón and Mateu-Figueras (2006). Here we present an extensive simulation study that compares both methods in practice, thus exploring the finite-sample behaviour of both estimators.
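The ilr-based estimator can be sketched as a Gaussian kernel density estimate computed in ilr coordinates with a full bandwidth matrix. This is a minimal sketch, not the authors' implementation: the Helmert-type ilr basis is an assumed choice, the bandwidth defaults to a normal-reference (normal-scale) rule rather than whatever selector the cited work uses, and the returned value is the density of the ilr coordinates, i.e. the density with respect to the Aitchison measure on the simplex. The data are random Dirichlet draws, not the simulation design of the study.

```python
import numpy as np

def ilr(x):
    """ilr coordinates with an assumed Helmert-type orthonormal basis."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    D = x.shape[1]
    V = np.zeros((D - 1, D))
    for i in range(1, D):
        V[i - 1, :i] = 1.0 / i
        V[i - 1, i] = -1.0
        V[i - 1] *= np.sqrt(i / (i + 1.0))
    lx = np.log(x)
    return (lx - lx.mean(axis=1, keepdims=True)) @ V.T

def normal_reference_bandwidth(Y):
    """Full bandwidth matrix from the normal-scale rule:
    H = (4 / (d + 2))^(2 / (d + 4)) * n^(-2 / (d + 4)) * S."""
    n, d = Y.shape
    S = np.atleast_2d(np.cov(Y, rowvar=False))
    return (4.0 / (d + 2)) ** (2.0 / (d + 4)) * n ** (-2.0 / (d + 4)) * S

def kde_ilr(data, points, H=None):
    """Gaussian KDE with full bandwidth H, evaluated in ilr coordinates."""
    Y = ilr(data)
    P = ilr(points)
    d = Y.shape[1]
    if H is None:
        H = normal_reference_bandwidth(Y)
    H = np.atleast_2d(H)
    Hinv = np.linalg.inv(H)
    norm = (2.0 * np.pi) ** (-d / 2.0) / np.sqrt(np.linalg.det(H))
    diff = P[:, None, :] - Y[None, :, :]                 # shape (m, n, d)
    q = np.einsum("mnd,de,mne->mn", diff, Hinv, diff)    # Mahalanobis terms
    return norm * np.exp(-0.5 * q).mean(axis=1)

# Illustrative data: 200 random 3-part compositions (not from the study).
rng = np.random.default_rng(0)
data = rng.dirichlet([4.0, 3.0, 2.0], size=200)
f = kde_ilr(data, data[:5])
print(f)
```

Swapping `ilr` for an alr transform and restricting `H` to a scalar multiple of a fixed matrix would give something closer to the earlier alr-based kernel, which is the kind of comparison the simulation study addresses.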