10 results for Missing values
at Universitat de Girona, Spain
Abstract:
As stated in Aitchison (1986), a proper study of relative variation in a compositional data set should be based on logratios, and working with logratios rules out zeros. Nevertheless, zero observations may well be present in real data sets, either because the corresponding part is completely absent (essential zeros) or because it is below the detection limit (rounded zeros). Because the second kind of zero is usually understood as “a trace too small to measure”, it seems reasonable to replace such zeros by a suitable small value, and this has been the traditional approach. As noted by, e.g., Tauber (1999) and Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000), the principal problem in compositional data analysis is related to rounded zeros. One should be careful to use a replacement strategy that does not seriously distort the general structure of the data. In particular, the covariance structure of the parts involved, and thus the metric properties, should be preserved; otherwise, further analysis of subpopulations could be misleading. From this point of view, a non-parametric imputation method is introduced in Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000). This method is analyzed in depth by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2003), where it is shown that the theoretical drawbacks of the additive zero replacement method proposed in Aitchison (1986) can be overcome by a new multiplicative approach applied to the non-zero parts of a composition. The new approach has reasonable properties from a compositional point of view. In particular, it is “natural” in the sense that it recovers the “true” composition if the replacement values are identical to the missing values, and it is coherent with the basic operations on the simplex. This coherence implies that the covariance structure of subcompositions with no zeros is preserved.
As a generalization of the multiplicative replacement, the same paper introduces a substitution method for missing values in compositional data sets.
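The multiplicative replacement of rounded zeros can be sketched in a few lines. This is a minimal Python illustration, not code from the cited papers: it assumes the composition `x` is already closed to a constant `total` (e.g. 1 or 100) and that the caller supplies a replacement value `delta[j]` for every part, of which only the entries at zero positions are used; the function name and argument layout are hypothetical. Each rounded zero is set to its δ_j, and every non-zero part is multiplied by 1 - (sum of the δ_k over the zero parts) / total, so the closure constant is kept.

```python
from typing import List, Sequence


def multiplicative_replacement(x: Sequence[float], delta: Sequence[float],
                               total: float = 1.0) -> List[float]:
    """Replace rounded zeros by small values and rescale the non-zero parts
    multiplicatively. Assumes (hypothetically) that sum(x) == total."""
    if len(x) != len(delta):
        raise ValueError("x and delta must have the same length")
    # Total mass given to the imputed zeros.
    imputed = sum(d for xi, d in zip(x, delta) if xi == 0)
    if imputed >= total:
        raise ValueError("replacement values are too large for the closure constant")
    # Common factor applied to every non-zero part, so ratios among them are unchanged.
    scale = 1.0 - imputed / total
    return [d if xi == 0 else xi * scale for xi, d in zip(x, delta)]
```

Because all non-zero parts are scaled by the same factor, the ratios among them, and hence the logratio covariance of any zero-free subcomposition, are unchanged; this is the coherence property described in the abstract.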
Abstract:
R (http://www.r-project.org/) is ‘GNU S’, a language and environment for statistical computing and graphics. Many classical and modern statistical techniques have been implemented in this environment, and many of them are supplied as packages. There are 8 standard packages, and many more are available through the CRAN family of Internet sites (http://cran.r-project.org). We have started to develop a library of R functions to support the analysis of mixtures, and our goal is a MixeR package for compositional data analysis that provides support for: operations on compositions (perturbation and power multiplication, subcomposition with or without residuals, centering of the data, computing Aitchison, Euclidean and Bhattacharyya distances, compositional Kullback-Leibler divergence, etc.); graphical presentation of compositions in ternary diagrams and tetrahedrons with additional features (barycenter, geometric mean of the data set, percentile lines, marking and coloring of subsets of the data set and their geometric means, annotation of individual data points in the set ...); dealing with zeros and missing values in compositional data sets, with R procedures for simple and multiplicative replacement strategies; and time series analysis of compositional data. We present the current status of MixeR development and illustrate its use on selected data sets.
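Several of the operations listed for MixeR have compact definitions in Aitchison geometry. The sketch below is plain Python, not MixeR's R interface, and all function names are hypothetical: perturbation and power multiplication are the part-wise product and power followed by closure, the Aitchison distance is the Euclidean distance between centred logratio (clr) vectors, and centering perturbs every composition by the inverse of the closed geometric mean of the data set. All parts are assumed to be strictly positive.

```python
import math
from typing import List, Sequence


def closure(x: Sequence[float], total: float = 1.0) -> List[float]:
    """Rescale strictly positive parts so that they sum to `total`."""
    s = sum(x)
    return [total * xi / s for xi in x]


def perturb(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Perturbation: part-wise product followed by closure."""
    return closure([a * b for a, b in zip(x, y)])


def power(x: Sequence[float], alpha: float) -> List[float]:
    """Power multiplication: part-wise power followed by closure."""
    return closure([a ** alpha for a in x])


def clr(x: Sequence[float]) -> List[float]:
    """Centred logratio: log of each part minus the mean of the logs."""
    logs = [math.log(a) for a in x]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def aitchison_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Aitchison distance: Euclidean distance between clr vectors."""
    return math.dist(clr(x), clr(y))


def center(data: Sequence[Sequence[float]]) -> List[List[float]]:
    """Perturb every composition by the inverse of the closed geometric mean."""
    n, n_parts = len(data), len(data[0])
    gmean = closure([math.exp(sum(math.log(row[j]) for row in data) / n)
                     for j in range(n_parts)])
    inverse = [1.0 / g for g in gmean]
    return [perturb(row, inverse) for row in data]
```

For any composition `p`, `aitchison_distance(perturb(p, x), perturb(p, y))` equals `aitchison_distance(x, y)`: the distance is invariant under perturbation, a property the Euclidean distance on raw proportions lacks.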
Abstract:
The R package “compositions” is a tool for advanced compositional analysis. Its basic functionality has been conceptually improved and now includes facilities to work with and represent ilr bases built from balances, as well as an elaborate subsystem for dealing with several kinds of irregular data: (rounded or structural) zeros, incomplete observations and outliers. The general approach to these irregularities is based on subcompositions: for an irregular datum, one can distinguish a “regular” subcomposition (where all parts are actually observed and the datum behaves typically) and a “problematic” subcomposition (with the unobserved, zero or rounded parts, or where the datum shows erratic or atypical behaviour). Systematic classification schemes are proposed for both outliers and missing values (including zeros), focusing on the nature of the irregularities in the datum's subcomposition(s). To compute statistics with values missing at random and with structural zeros, a projection approach is implemented: a given datum contributes to the estimation of the desired parameters only on the subcomposition where it was observed. For data sets with values below the detection limit, two different approaches are provided: the well-known imputation technique and the projection approach. To compute statistics in the presence of outliers, robust statistics based on the minimum covariance determinant approach are adapted to the characteristics of compositional data. The outlier classification is based on four different models of outlier occurrence and on Monte Carlo-based tests for their characterization. Furthermore, the package provides special plots that help to understand the nature of outliers in the data set.
Keywords: coda-dendrogram, lost values, MAR, missing data, MCD estimator, robustness, rounded zeros
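The ilr bases built from balances can be illustrated with a short sketch. Assuming a sequential binary partition (SBP) encoded as rows of +1, -1 and 0 (numerator group, denominator group, part not involved), each ilr coordinate is the balance sqrt(rs/(r+s)) · ln(g(x₊)/g(x₋)), where g is the geometric mean and r and s are the sizes of the two groups. The helpers below are hypothetical plain Python, not the API of the “compositions” package.

```python
import math
from typing import List, Sequence


def balance(x: Sequence[float], num: Sequence[int], den: Sequence[int]) -> float:
    """Balance between the parts indexed by `num` and those indexed by `den`."""
    r, s = len(num), len(den)
    if r == 0 or s == 0:
        raise ValueError("both groups of a balance must be non-empty")
    log_g_num = sum(math.log(x[i]) for i in num) / r  # log geometric mean, numerator group
    log_g_den = sum(math.log(x[i]) for i in den) / s  # log geometric mean, denominator group
    return math.sqrt(r * s / (r + s)) * (log_g_num - log_g_den)


def ilr_from_sbp(x: Sequence[float], sbp: Sequence[Sequence[int]]) -> List[float]:
    """One balance per SBP row: +1 = numerator, -1 = denominator, 0 = not involved."""
    coords = []
    for row in sbp:
        num = [j for j, v in enumerate(row) if v > 0]
        den = [j for j, v in enumerate(row) if v < 0]
        coords.append(balance(x, num, den))
    return coords
```

With a valid SBP the balances form an orthonormal basis, so the sum of squared ilr coordinates equals the squared norm of the clr vector (for example, for `x = [0.2, 0.3, 0.5]` with `sbp = [[1, 1, -1], [1, -1, 0]]`).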
Abstract:
All of the imputation techniques usually applied to replace values below the detection limit in compositional data sets have adverse effects on the variability. In this work we propose a modification of the EM algorithm that is applied using the additive log-ratio transformation. This new strategy is applied to a compositional data set, and the results are compared with those of the usual imputation techniques.
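The EM modification itself is not detailed in this abstract, so it is not reproduced here. What can be sketched safely is the additive log-ratio (alr) transformation it works with, its inverse, and the way a detection limit carries over to alr space: if part j is below its detection limit DL_j and the divisor part x_D is observed, then its alr coordinate ln(x_j / x_D) lies below ln(DL_j / x_D). This is a minimal Python sketch with hypothetical function names; detection limits and the divisor part are assumed to be expressed in the same units.

```python
import math
from typing import List, Sequence


def alr(x: Sequence[float], d: int = -1) -> List[float]:
    """Additive log-ratio: log of each part over the divisor part x[d]."""
    d = d % len(x)
    return [math.log(xi / x[d]) for i, xi in enumerate(x) if i != d]


def alr_inv(y: Sequence[float], d: int = -1, total: float = 1.0) -> List[float]:
    """Inverse alr: exponentiate, reinsert the divisor part as 1, then close."""
    n_parts = len(y) + 1
    d = d % n_parts
    parts = [math.exp(v) for v in y]
    parts.insert(d, 1.0)
    s = sum(parts)
    return [total * p / s for p in parts]


def alr_upper_bound(detection_limit: float, divisor_value: float) -> float:
    """Upper bound on the alr coordinate of a part reported below its detection limit
    (hypothetical helper; both arguments in the same units)."""
    return math.log(detection_limit / divisor_value)
```

An EM-type imputation in alr space could use `alr_upper_bound` to check that an imputed coordinate stays below the detection limit before mapping the data back with `alr_inv`; how the authors enforce this is not stated in the abstract.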
Abstract:
In this research we explore several aspects of quality of life in young people, working with factors such as self-esteem, locus of control, perceived social support and values, among others. We examine the correlations among the factors that influence the values and life satisfaction of adolescents aged 12-16. Furthermore, we analyze the data obtained from the children, on the one hand, and from their parents, on the other; we explore the relationships between the factors and consider the agreements and discrepancies between the responses of parents and their offspring.
Abstract:
The main objective of this paper is to develop a methodology that takes into account the human factor extracted from the database used by recommender systems and that makes it possible to solve the specific problems of prediction and recommendation. In this work, we propose to extract the user's human values scale from the users' database in order to improve suitability in open environments such as recommender systems. For this purpose, the methodology is applied to the user's data after the user has interacted with the system. The methodology is illustrated with a case study.
Abstract:
In this paper we set out a confirmatory factor analysis model relating the values that adolescents and their parents aspire to for the child's future. We address a problem that arises when collecting parents' answers and analysing paired data from parents and their child: in some families only one parent answers, while in others both parents meet to answer together. To account for differences between one-parent and two-parent responses, we follow a multiple-group structural equation modelling approach. Some significant differences emerged between the two-parent and one-parent answering groups. We observed only weak relationships between parents' and children's values.
Abstract:
Given an observed test statistic and its degrees of freedom, one may compute the observed P value with most statistical packages. It is unknown to what extent test statistics and P values are congruent in published medical papers. Methods: We checked the congruence of the statistical results reported in all the papers of volumes 409–412 of Nature (2001) and in a random sample of 63 results from volumes 322–323 of BMJ (2001). We also tested whether the frequencies of the last digit of a sample of 610 test statistics deviated from a uniform distribution (i.e., equally probable digits). Results: 11.6% (21 of 181) and 11.1% (7 of 63) of the statistical results published in Nature and BMJ, respectively, during 2001 were incongruent, probably mostly due to rounding, transcription or type-setting errors. At least one such error appeared in 38% of the Nature papers and 25% of the BMJ papers. In 12% of the cases, the significance level might change by one or more orders of magnitude. The frequencies of the last digit of the statistics deviated from the uniform distribution and suggested digit preference in rounding and reporting. Conclusions: This incongruence of test statistics and P values is further evidence that statistical practice is generally poor, even in the most renowned scientific journals, and that the quality of papers should be more closely controlled and valued.
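The congruence check described above amounts to recomputing the P value from the reported statistic while allowing for rounding of both numbers. The minimal Python sketch below handles only two-sided z (standard normal) statistics, an assumption made for brevity: the study also covered other test statistics, which would need their own distribution functions. It also computes Pearson's chi-square statistic for uniformity of last digits (10 digits, so 9 degrees of freedom; the 5% critical value is 16.919). All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable

CHI2_CRIT_DF9_ALPHA05 = 16.919  # upper 5% point of the chi-square distribution, 9 df


def two_sided_p_from_z(z: float) -> float:
    """Two-sided P value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def is_congruent(z: float, z_decimals: int, reported_p: float, p_decimals: int) -> bool:
    """Is the reported P compatible with the reported z, given that both were rounded?"""
    h_z = 0.5 * 10 ** -z_decimals  # half a unit in the last reported digit of z
    h_p = 0.5 * 10 ** -p_decimals  # half a unit in the last reported digit of P
    z_abs = abs(z)
    # P decreases as |z| grows, so the true |z| in [z_abs - h_z, z_abs + h_z]
    # yields P values in [p_low, p_high].
    p_low = two_sided_p_from_z(z_abs + h_z)
    p_high = two_sided_p_from_z(max(z_abs - h_z, 0.0))
    return p_low - h_p <= reported_p <= p_high + h_p


def last_digit_chi2(reported: Iterable[str]) -> float:
    """Pearson chi-square statistic for uniformity of the last digits of reported values."""
    digits = [s.strip()[-1] for s in reported if s.strip()[-1:].isdigit()]
    if not digits:
        raise ValueError("no digits to test")
    counts = Counter(digits)
    expected = len(digits) / 10
    return sum((counts.get(str(d), 0) - expected) ** 2 / expected for d in range(10))
```

For example, `is_congruent(1.96, 2, 0.05, 2)` returns `True`, while a reported P of 0.01 for the same z does not. Digits are read from the reported strings rather than from parsed floats, because converting to a float would drop trailing zeros such as the final 0 in `2.30`.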
Abstract:
In recent years, Artificial Intelligence has helped to solve problems encountered in the performance of tasks by computing units, whether the computers are distributed to interact with one another or operate in any other environment (Distributed Artificial Intelligence). Information Technologies enable novel solutions to specific problems by applying findings from diverse research areas. Our work is aimed at creating user models through a multidisciplinary approach that draws on the principles of psychology, distributed artificial intelligence and machine learning to build user models in open environments; one of these is Ambient Intelligence based on User Models with incremental and distributed learning functions (known as the Smart User Model). Building on these user models, we direct this research toward acquiring important user characteristics that determine the user's dominant value scale in the topics in which he or she is most interested, developing a methodology to obtain the user's Human Values Scale with respect to his or her objective, subjective and emotional characteristics (particularly in Recommender Systems). One area that has received little research is the inclusion of the human values scale in information systems. Recommender Systems, User Models and Information Systems take into account only the user's preferences and emotions [Velásquez, 1996, 1997; Goldspink, 2000; Conte and Paolucci, 2001; Urban and Schmidt, 2001; Dal Forno and Merlone, 2001, 2002; Berkovsky et al., 2007c]. Therefore, the main focus of our research is the creation of a methodology that allows a human values scale to be generated for the user from the user model.
We present results obtained from a case study using objective, subjective and emotional characteristics in the banking and restaurant service domains, where the methodology proposed in this research was tested. In this thesis, the main contributions are: the development of a methodology that, given a user model with objective, subjective and emotional attributes, obtains the user's Human Values Scale. The proposed methodology is based on the use of existing applications, in which all the connections among users, agents and domains are characterized by these particularities and attributes; therefore, no extra effort is required from the user.