6 results for monotone missing data
at Universitat de Girona, Spain
Abstract:
Customer satisfaction and retention are key issues for organizations in today’s competitive marketplace. As such, much research and revenue have been invested in developing accurate ways of assessing consumer satisfaction at both the macro (national) and micro (organizational) levels, facilitating comparisons in performance both within and between industries. Since the instigation of the national customer satisfaction indices (CSI), partial least squares (PLS) has been used to estimate the CSI models in preference to structural equation models (SEM) because PLS does not rely on strict assumptions about the data. However, this choice was based upon some misconceptions about the use of SEM and does not take into consideration more recent advances in SEM, including estimation methods that are robust to non-normality and missing data. In this paper, the SEM and PLS approaches were compared by evaluating perceptions of the Isle of Man Post Office products and customer service using a CSI format. The new robust SEM procedures were found to be advantageous over PLS. Product quality was found to be the only driver of customer satisfaction, while image and satisfaction were the only predictors of loyalty, thus arguing for the specificity of postal services.
Abstract:
The R package “compositions” is a tool for advanced compositional analysis. Its basic functionality has seen some conceptual improvement, now containing some facilities to work with and represent ilr bases built from balances, and an elaborate subsystem for dealing with several kinds of irregular data: (rounded or structural) zeros, incomplete observations and outliers. The general approach to these irregularities is based on subcompositions: for an irregular datum, one can distinguish a “regular” subcomposition (where all parts are actually observed and the datum behaves typically) and a “problematic” subcomposition (with the unobserved, zero or rounded parts, or else where the datum shows erratic or atypical behaviour). Systematic classification schemes are proposed for both outliers and missing values (including zeros), focusing on the nature of the irregularities in the datum's subcomposition(s). To compute statistics with values missing at random and structural zeros, a projection approach is implemented: a given datum contributes to the estimation of the desired parameters only on the subcomposition where it was observed. For data sets with values below the detection limit, two different approaches are provided: the well-known imputation technique, and also the projection approach. To compute statistics in the presence of outliers, robust statistics are adapted to the characteristics of compositional data, based on the minimum covariance determinant approach. The outlier classification is based on four different models of outlier occurrence and Monte-Carlo-based tests for their characterization. Furthermore, the package provides special plots that help to understand the nature of outliers in the data set. Keywords: coda-dendrogram, lost values, MAR, missing data, MCD estimator, robustness, rounded zeros
Abstract:
The aim of this thesis is to predict the performance of PhD students at the University of Girona from the students' personal (background), attitudinal and social network characteristics. The population studied consists of third- and fourth-year PhD students and their thesis supervisors. To collect the data, a web questionnaire was designed, with its advantages specified and with some traditional problems of non-coverage and non-response taken into account. A web questionnaire was chosen because of the complexity of social network questions. Through a series of instructions, the electronic questionnaire reduces response time and makes responding less burdensome. The web questionnaire is also self-administered, which, according to the literature, yields more honest answers than an interviewer-administered questionnaire. The quality of social network questions in web questionnaires is analysed for egocentric data. To this end, the reliability and validity of this type of question are computed, for the first time by means of the Multitrait Multimethod model. Because the data are egocentric, they can be regarded as hierarchical, and for the first time a multilevel Multitrait Multimethod model is used. Reliability and validity can be obtained at the individual level (within group component) or at the group level (between group component), and they are used to carry out a meta-analysis with other European universities in order to analyse certain design characteristics of the questionnaire. These characteristics concern whether social network questions asked in web questionnaires are more reliable and valid when asked "by questions" or "by alters", whether all the frequency labels for the items are shown or only the first and last ones, and whether it is better for the questionnaire design to be in colour or in black and white.
The quality of the social network as a whole is also analysed; in this specific case, the networks are the university's research groups. The problems of missing data in complete networks are addressed. A new alternative to the typical solutions of the egocentric network or proxy respondents is proposed. This new alternative, which we have named the "Nosduocentered Network", is based on two central actors in a network. In the estimated regression models, the "Nosduocentered Network" has more predictive power for the performance of PhD students than the egocentric network. In addition, the correlations of the attitudinal variables are corrected for attenuation owing to the small sample size. Finally, regressions are run on each of the three types of variables (background, attitudinal and social network), and the types are then combined to analyse which best predicts the performance (measured by academic publications) of PhD students. The results indicate that the academic performance of PhD students depends on personal (background) and attitudinal variables. The results are also compared with those of other published studies.
Abstract:
This analysis was stimulated by the real data analysis problem of household expenditure data. The full dataset contains expenditure data for a sample of 1224 households. The expenditure is broken down at 2 hierarchical levels: 9 major levels (e.g. housing, food, utilities etc.) and 92 minor levels. There are also 5 factors and 5 covariates at the household level. Not surprisingly, there are a small number of zeros at the major level, but many zeros at the minor level. The question is how best to model the zeros. Clearly, models that try to add a small amount to the zero terms are not appropriate in general as at least some of the zeros are clearly structural, e.g. alcohol/tobacco for households that are teetotal. The key question then is how to build suitable conditional models. For example, is the sub-composition of spending excluding alcohol/tobacco similar for teetotal and non-teetotal households? In other words, we are looking for sub-compositional independence. Also, what determines whether a household is teetotal? Can we assume that it is independent of the composition? In general, whether teetotal will clearly depend on the household level variables, so we need to be able to model this dependence. The other tricky question is that with zeros on more than one component, we need to be able to model dependence and independence of zeros on the different components. Lastly, while some zeros are structural, others may not be, for example, for expenditure on durables, it may be chance as to whether a particular household spends money on durables within the sample period. This would clearly be distinguishable if we had longitudinal data, but may still be distinguishable by looking at the distribution, on the assumption that random zeros will usually be for situations where any non-zero expenditure is not small. 
While this analysis is based on economic data, the ideas carry over to many other situations, including geological data, where minerals may be missing for structural reasons (similar to alcohol), or missing because they occur only in random regions which may be missed in a sample (similar to the durables).
Abstract:
As stated in Aitchison (1986), a proper study of relative variation in a compositional data set should be based on logratios, and dealing with logratios excludes dealing with zeros. Nevertheless, it is clear that zero observations might be present in real data sets, either because the corresponding part is completely absent –essential zeros– or because it is below detection limit –rounded zeros. Because the second kind of zeros is usually understood as “a trace too small to measure”, it seems reasonable to replace them by a suitable small value, and this has been the traditional approach. As stated, e.g. by Tauber (1999) and by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000), the principal problem in compositional data analysis is related to rounded zeros. One should be careful to use a replacement strategy that does not seriously distort the general structure of the data. In particular, the covariance structure of the involved parts –and thus the metric properties– should be preserved, as otherwise further analysis on subpopulations could be misleading. Following this point of view, a non-parametric imputation method is introduced in Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000). This method is analyzed in depth by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2003) where it is shown that the theoretical drawbacks of the additive zero replacement method proposed in Aitchison (1986) can be overcome using a new multiplicative approach on the non-zero parts of a composition. The new approach has reasonable properties from a compositional point of view. In particular, it is “natural” in the sense that it recovers the “true” composition if replacement values are identical to the missing values, and it is coherent with the basic operations on the simplex. This coherence implies that the covariance structure of subcompositions with no zeros is preserved. 
As a generalization of the multiplicative replacement, the same paper introduces a substitution method for missing values in compositional data sets.
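The multiplicative replacement described in this abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' code: the function name `multiplicative_replacement`, the single shared replacement value `delta`, and the example composition are all assumptions made for the sketch. Each zero part is set to `delta`, and each non-zero part is multiplied by the same factor so that the composition still sums to `total`.

```python
def multiplicative_replacement(x, delta, total=1.0):
    """Replace zero parts of a composition by `delta` (hypothetical helper).

    `x` is a sequence of non-negative parts summing to `total`.
    Non-zero parts are all shrunk by one common factor, so the ratios
    between them are left unchanged.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    zero_mask = [part == 0 for part in x]
    # Total mass handed to the parts that were zero.
    imputed_mass = delta * sum(zero_mask)
    if imputed_mass >= total:
        raise ValueError("delta is too large for this composition")
    # Common shrink factor applied to every non-zero part.
    shrink = 1.0 - imputed_mass / total
    return [delta if is_zero else part * shrink
            for part, is_zero in zip(x, zero_mask)]


# Illustrative data only: a four-part composition with one rounded zero.
composition = [0.0, 0.2, 0.3, 0.5]
replaced = multiplicative_replacement(composition, delta=0.01)
print(replaced)
```

Because every non-zero part is scaled by the same factor, the ratios within a subcomposition that had no zeros are unchanged, which is the property the abstract describes as preserving the covariance structure of such subcompositions.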
Abstract:
R (http://www.r-project.org/) is ‘GNU S’, a language and environment for statistical computing and graphics. Many classical and modern statistical techniques have been implemented in the base environment, but many others are supplied as packages. There are 8 standard packages, and many more are available through the CRAN family of Internet sites (http://cran.r-project.org). We started to develop a library of functions in R to support the analysis of mixtures, and our goal is a MixeR package for compositional data analysis that provides support for: operations on compositions (perturbation and power multiplication, subcomposition with or without residuals, centering of the data, computing Aitchison’s, Euclidean and Bhattacharyya distances, compositional Kullback-Leibler divergence, etc.); graphical presentation of compositions in ternary diagrams and tetrahedrons with additional features (barycenter, geometric mean of the data set, percentile lines, marking and coloring of subsets of the data set and their geometric means, notation of individual data in the set . . .); dealing with zeros and missing values in compositional data sets, with R procedures for simple and multiplicative replacement strategies; and the time series analysis of compositional data. We will present the current status of MixeR development and illustrate its use on selected data sets.
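For readers unfamiliar with the operations listed in this abstract, the following sketch shows perturbation, power multiplication and the Aitchison distance in plain Python. It is not the MixeR API, which the abstract does not describe: the function names `closure`, `perturb`, `power`, `clr` and `aitchison_distance` and the example values are assumptions for illustration. The distance is computed through the centred log-ratio transform, and all parts are assumed strictly positive.

```python
import math


def closure(x, total=1.0):
    """Rescale positive parts so that they sum to `total`."""
    s = sum(x)
    return [total * part / s for part in x]


def perturb(x, y):
    """Perturbation: part-wise product followed by closure."""
    return closure([a * b for a, b in zip(x, y)])


def power(x, alpha):
    """Power multiplication: part-wise power followed by closure."""
    return closure([part ** alpha for part in x])


def clr(x):
    """Centred log-ratio transform (log of each part minus the mean log)."""
    logs = [math.log(part) for part in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(x, y):
    """Euclidean distance between the clr transforms of x and y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))


# Illustrative three-part compositions (made-up values).
x = closure([1.0, 2.0, 3.0])
y = closure([3.0, 1.0, 4.0])
shift = closure([2.0, 1.0, 1.0])
print(aitchison_distance(x, y))
print(aitchison_distance(perturb(x, shift), perturb(y, shift)))
```

The two printed distances agree up to rounding, since the Aitchison distance is unchanged when both compositions are perturbed by the same composition.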