Olegario Fernández Baños's Tratado de Estadística was the first book on mathematical statistics in the modern sense to be published in Spain. Earlier statistics books had been written for the course in Geography and Industrial and Commercial Statistics at the Schools of Commerce and for the course in Political Economy at the Faculties of Law. Textbooks for those courses generally covered administrative topics, descriptions of the statistical methods in use, and the application of statistics to Spain.


Using the linear regression model in data analysis requires an objective understanding and a working command of the functional relationship between variables, the coefficients of determination and correlation, and hypothesis testing, which are the fundamental pillars for verifying and interpreting the model's statistical significance within the given confidence interval. The topics related to the linear regression model, regression analysis, the use of the regression equation as a tool for estimation and prediction, and residual analysis are presented with reference to real problems drawn from economics, business administration, and health, using the Excel® spreadsheet as the supporting platform. This teaching module covers the theoretical elements of linear regression analysis, a statistical technique used to study the relationship between deterministic or random variables that arise from some type of research in which the behavior of two variables, one dependent and one independent, is analyzed. Scatter plots are used to show the possible behavior of the variables (direct linear, inverse linear, direct nonlinear, or inverse nonlinear), with the aim of developing in the reader the interpretive and propositional skills needed to appreciate fully the importance of inferential statistics in the working life of professionals in the economic, administrative, and health sciences.
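As a rough illustration of the quantities named above, the following sketch fits a simple least-squares line and reports the correlation coefficient, the coefficient of determination, and the residuals. It uses plain Python rather than the Excel® spreadsheet the module relies on, and the function name `fit_line` and the five data points are invented for illustration only.

```python
import math


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r, r ** 2  # r ** 2 is the coefficient of determination


# Hypothetical data, not taken from the module.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, r, r2 = fit_line(x, y)
# Residuals: observed value minus the value predicted by the fitted line.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.4f} R^2={r2:.4f}")
```

With an intercept in the model the residuals sum to zero, which is a quick sanity check before any residual plot is examined.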


Development of a questionnaire that records pupils' responses to the different content areas of mathematics in the First Cycle of Primary Education. The questionnaire is designed as a power test grounded in teaching practice. It incorporates contributions from various professionals and trends in the teaching process. It aims to identify pupils' weaknesses in each of the thematic blocks and content types that make up the mathematics curriculum for the First Cycle of Primary Education. The questionnaire was administered to pupils in the Region of Murcia according to the territorial distribution of the Consejería de Educación y Cultura. Once data from the sample of 682 pupils were available, the questionnaires were analyzed starting from the assumptions of Item Response Theory, a collection of mathematical models that seek to establish, by means of a statistical function, the probability that a subject answers an item correctly or incorrectly. The theory is tied not to theories of intelligence but to technical problems arising from test construction and to mathematical statistics. An exploratory factor analysis was carried out to test the starting hypothesis. Once it was confirmed, the corresponding validity studies were conducted and the questionnaire's technical data sheet was drawn up. The hypothesis was that mathematical competence has a multifactorial structure, with factors linked to numerical aspects, heuristic components, and aspects related to spatio-temporal organization. A Principal Component Analysis was performed to determine the number of components that can explain most of the covariation among the items.
Three components were found. The operative component refers to competence in handling algorithms and applying them to problem solving. The estimative component refers to competence in estimation and measurement, as well as in locating objects by relative position and recognizing shapes and figures. The local-domain component refers to competence in handling the place value of the digits of a number, with regard to mastery of the half-line of natural numbers. In view of the results, mathematical competence is expressed in terms of these components. The author presents psycho-pedagogical contributions to the teaching of mathematics in the First Cycle of Primary Education that derive from the results of the research.
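The abstract does not say which Item Response Theory model was fitted. Purely as an illustration of what "a statistical function giving the probability of a correct answer" can look like, the sketch below uses the two-parameter logistic form, which is an assumption here; the names `p_correct`, `theta`, `a`, and `b` are hypothetical.

```python
import math


def p_correct(theta, a, b):
    """Two-parameter logistic item response function (assumed model).

    theta: ability of the subject
    a:     discrimination of the item
    b:     difficulty of the item
    Returns the probability that the subject answers the item correctly.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# A subject whose ability equals the item difficulty has an even chance.
print(p_correct(theta=0.0, a=1.0, b=0.0))

# For a fixed item, the probability increases with ability.
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(p_correct(theta, a=1.2, b=0.5), 3))
```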


Customer satisfaction and retention are key issues for organizations in today’s competitive marketplace. As such, much research and revenue have been invested in developing accurate ways of assessing consumer satisfaction at both the macro (national) and micro (organizational) level, facilitating comparisons in performance both within and between industries. Since the instigation of the national customer satisfaction indices (CSI), partial least squares (PLS) has been used to estimate the CSI models in preference to structural equation models (SEM) because PLS does not rely on strict assumptions about the data. However, this choice was based upon some misconceptions about the use of SEM and does not take into consideration more recent advances in SEM, including estimation methods that are robust to non-normality and missing data. In this paper, both SEM and PLS approaches were compared by evaluating perceptions of the Isle of Man Post Office products and customer service using a CSI format. The new robust SEM procedures were found to be advantageous over PLS. Product quality was found to be the only driver of customer satisfaction, while image and satisfaction were the only predictors of loyalty, thus arguing for the specificity of postal services.


Observations in daily practice are sometimes registered as positive values larger than a given threshold α. The sample space is in this case the interval (α,+∞), α > 0, which can be structured as a real Euclidean space in different ways. This fact opens the door to alternative statistical models depending not only on the assumed distribution function, but also on the metric which is considered as appropriate, i.e. the way differences are measured, and thus variability.
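One possible way to give (α,+∞) a Euclidean structure, offered here only as an assumed example since the abstract does not list the structures it considers, is to use the coordinate ln(x − α) and measure differences in that coordinate. The function names `coord` and `distance` are hypothetical.

```python
import math


def coord(x, alpha):
    """Map x in (alpha, +inf) to a real coordinate (assumed choice: log of the excess)."""
    if x <= alpha:
        raise ValueError("x must be strictly larger than the threshold alpha")
    return math.log(x - alpha)


def distance(x, y, alpha):
    """Distance induced by the coordinate above: relative, not absolute, differences."""
    return abs(coord(x, alpha) - coord(y, alpha))


alpha = 1.0
# Under this metric, doubling the excess over alpha is always the same step.
print(distance(2.0, 3.0, alpha))    # excess goes from 1 to 2
print(distance(11.0, 21.0, alpha))  # excess goes from 10 to 20
```

Under the ordinary absolute-difference metric the second pair would be ten times farther apart than the first, which is the kind of modelling choice the abstract refers to.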


This paper is a first draft of the principle of statistical modelling on coordinates. Several causes, which would take too long to detail, have led to this situation close to the deadline for submitting papers to CODAWORK’03. The main one is the fast development of the approach over the last months, which has made previous drafts appear obsolete. The present paper contains the essential parts of the state of the art of this approach from my point of view. I would like to acknowledge many clarifying discussions with the group of people working in this field in Girona, Barcelona, Carrick Castle, Firenze, Berlin, Göttingen, and Freiberg. They have given a lot of suggestions and ideas. Nevertheless, there might still be errors or unclear aspects, which are exclusively my fault. I hope this contribution serves as a basis for further discussions and new developments.


One of the tantalising remaining problems in compositional data analysis lies in how to deal with data sets in which there are components which are essential zeros. By an essential zero we mean a component which is truly zero, not something recorded as zero simply because the experimental design or the measuring instrument has not been sufficiently sensitive to detect a trace of the part. Such essential zeros occur in many compositional situations, such as household budget patterns, time budgets, palaeontological zonation studies, and ecological abundance studies. Devices such as nonzero replacement and amalgamation are almost invariably ad hoc and unsuccessful in such situations. From consideration of such examples it seems sensible to build up a model in two stages, the first determining where the zeros will occur and the second how the unit available is distributed among the non-zero parts. In this paper we suggest two such models, an independent binomial conditional logistic normal model and a hierarchical dependent binomial conditional logistic normal model. The compositional data in such modelling consist of an incidence matrix and a conditional compositional matrix. Interesting statistical problems arise, such as the question of estimability of parameters, the nature of the computational process for the estimation of both the incidence and compositional parameters caused by the complexity of the subcompositional structure, the formation of meaningful hypotheses, and the devising of suitable testing methodology within a lattice of such essential zero-compositional hypotheses. The methodology is illustrated by application to both simulated and real compositional data.
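The two-stage idea can be sketched as a small simulation. This is not the authors' model specification: the independent normal log-weights, the rejection of all-zero incidence patterns, and every name and parameter value below are assumptions made for illustration.

```python
import math
import random


def simulate_composition(presence_probs, mu, sigma, rng):
    """Draw one composition with essential zeros in two stages (illustrative only)."""
    n_parts = len(presence_probs)
    # Stage 1: independent Bernoulli draws decide which parts are present.
    # Assumption: patterns with no part present are redrawn, since a
    # composition needs at least one non-zero part.
    while True:
        incidence = [1 if rng.random() < p else 0 for p in presence_probs]
        if sum(incidence) >= 1:
            break
    # Stage 2: distribute the unit among the present parts. Closing
    # exponentiated normal draws gives a logistic normal composition on
    # the subcomposition of present parts.
    weights = [
        math.exp(rng.gauss(mu[i], sigma)) if incidence[i] else 0.0
        for i in range(n_parts)
    ]
    total = sum(weights)
    composition = [w / total for w in weights]
    return incidence, composition


rng = random.Random(0)  # fixed seed so the run is repeatable
presence_probs = [0.9, 0.6, 0.3, 0.8]  # hypothetical incidence probabilities
mu = [0.0, 0.5, -0.5, 1.0]             # hypothetical log-scale means
rows = [simulate_composition(presence_probs, mu, 0.4, rng) for _ in range(5)]
for incidence, composition in rows:
    print(incidence, [round(c, 3) for c in composition])
```

The list of `incidence` vectors plays the role of the incidence matrix, and the list of `composition` vectors that of the conditional compositional matrix.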


The Aitchison vector space structure for the simplex is generalized to a Hilbert space structure A2(P) for distributions and likelihoods on arbitrary spaces. Central notions of statistics, such as Information or Likelihood, can be identified in the algebraic structure of A2(P) together with their corresponding notions in compositional data analysis, such as Aitchison distance or centered log ratio transform. In this way very elaborate aspects of mathematical statistics can be understood easily in the light of a simple vector space structure and of compositional data analysis. For example, combinations of statistical information, such as Bayesian updating or the combination of likelihood and robust M-estimation functions, are simple additions/perturbations in A2(Pprior). Weighting observations corresponds to a weighted addition of the corresponding evidence. Likelihood based statistics for general exponential families turns out to have a particularly easy interpretation in terms of A2(P). Regular exponential families form finite dimensional linear subspaces of A2(P) and they correspond to finite dimensional subspaces formed by their posterior in the dual information space A2(Pprior). The Aitchison norm can be identified with mean Fisher information. The closing constant itself is identified with a generalization of the cumulant function and shown to be Kullback-Leibler's directed information. Fisher information is the local geometry of the manifold induced by the A2(P) derivative of the Kullback-Leibler information, and the space A2(P) can therefore be seen as the tangential geometry of statistical inference at the distribution P. The discussion of A2(P) valued random variables, such as estimation functions or likelihoods, gives a further interpretation of Fisher information as the expected squared norm of evidence and a scale free understanding of unbiased reasoning.
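In the finite case, where a distribution over a handful of hypotheses is just a composition, the statement that Bayesian updating is a perturbation can be checked directly. The sketch below covers only this discrete special case, not the general space A2(P), and the prior and likelihood values are invented.

```python
import math


def closure(v):
    """Rescale positive values so that they sum to one."""
    total = sum(v)
    return [x / total for x in v]


def perturb(p, q):
    """Aitchison perturbation: componentwise product followed by closure."""
    return closure([a * b for a, b in zip(p, q)])


def clr(p):
    """Centered log ratio transform: logs minus their mean."""
    logs = [math.log(x) for x in p]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical prior over three hypotheses and the likelihood of one observation.
prior = [0.5, 0.3, 0.2]
likelihood = [0.1, 0.4, 0.8]

# Bayes' rule: posterior is proportional to prior times likelihood.
posterior = perturb(prior, likelihood)

# In clr coordinates the same update is a plain vector addition.
clr_sum = [a + b for a, b in zip(clr(prior), clr(likelihood))]
print([round(x, 4) for x in posterior])
print([round(x, 4) for x in clr(posterior)])
print([round(x, 4) for x in clr_sum])
```

The last two printed vectors agree, because the clr transform ignores the normalizing constant that the closure introduces.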


One of the disadvantages of old age is that there is more past than future: this, however, may be turned into an advantage if the wealth of experience and, hopefully, wisdom gained in the past can be reflected upon and throw some light on possible future trends. To an extent, then, this talk is necessarily personal, certainly nostalgic, but also self-critical and inquisitive about our understanding of the discipline of statistics. A number of almost philosophical themes will run through the talk: search for appropriate modelling in relation to the real problem envisaged, emphasis on sensible balances between simplicity and complexity, the relative roles of theory and practice, the nature of communication of inferential ideas to the statistical layman, the inter-related roles of teaching, consultation and research. A list of keywords might be: identification of sample space and its mathematical structure, choices between transform and stay, the role of parametric modelling, the role of a sample space metric, the underused hypothesis lattice, the nature of compositional change, particularly in relation to the modelling of processes. While the main theme will be relevance to compositional data analysis we shall point to substantial implications for general multivariate analysis arising from experience of the development of compositional data analysis…


The biplot has proved to be a powerful descriptive and analytical tool in many areas of applications of statistics. For compositional data the necessary theoretical adaptation has been provided, with illustrative applications, by Aitchison (1990) and Aitchison and Greenacre (2002). These papers were restricted to the interpretation of simple compositional data sets. In many situations the problem has to be described in some form of conditional modelling. For example, in a clinical trial where interest is in how patients’ steroid metabolite compositions may change as a result of different treatment regimes, interest is in relating the compositions after treatment to the compositions before treatment and the nature of the treatments applied. To study this through a biplot technique requires the development of some form of conditional compositional biplot. This is the purpose of this paper. We choose as a motivating application an analysis of the 1992 US Presidential Election, where interest may be in how the three-part composition, the percentage division among the three candidates (Bush, Clinton and Perot) of the presidential vote in each state, depends on the ethnic composition and on the urban-rural composition of the state. The methodology of conditional compositional biplots is first developed and a detailed interpretation of the 1992 US Presidential Election provided. We use a second application involving the conditional variability of tektite mineral compositions with respect to major oxide compositions to demonstrate some hazards of simplistic interpretation of biplots. Finally we conjecture on further possible applications of conditional compositional biplots.


The use of orthonormal coordinates in the simplex and, particularly, balance coordinates, has suggested the use of a dendrogram for the exploratory analysis of compositional data. The dendrogram is based on a sequential binary partition of a compositional vector into groups of parts. At each step of a partition, one group of parts is divided into two new groups, and a balancing axis in the simplex between both groups is defined. The set of balancing axes constitutes an orthonormal basis, and the projections of the sample on them are orthogonal coordinates. They can be represented in a dendrogram-like graph showing: (a) the way of grouping parts of the compositional vector; (b) the explanatory role of each subcomposition generated in the partition process; (c) the decomposition of the total variance into balance components associated with each binary partition; (d) a box-plot of each balance. This representation helps in interpreting balance coordinates, in identifying the most explanatory coordinates, and in describing the whole sample in a single diagram independently of the number of parts of the sample.
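A minimal sketch of how balance coordinates follow from a sequential binary partition is given below. It uses the usual balance formula, a normalized log ratio of the geometric means of the two groups; the four-part composition and the particular partition are hypothetical.

```python
import math


def balance(x, plus, minus):
    """Balance between two groups of parts, given as lists of indices into x."""
    r, s = len(plus), len(minus)
    mean_log_plus = sum(math.log(x[i]) for i in plus) / r
    mean_log_minus = sum(math.log(x[i]) for i in minus) / s
    # Normalizing factor that makes the balancing axis a unit vector.
    return math.sqrt(r * s / (r + s)) * (mean_log_plus - mean_log_minus)


# Hypothetical sequential binary partition of four parts:
# first {0, 1} against {2, 3}, then 0 against 1, then 2 against 3.
partition = [([0, 1], [2, 3]), ([0], [1]), ([2], [3])]

x = [0.1, 0.2, 0.3, 0.4]
balances = [balance(x, plus, minus) for plus, minus in partition]
print([round(b, 4) for b in balances])

# Because the balancing axes are orthonormal, the squared balances add up
# to the squared Aitchison norm, computed here from centered log ratios.
logs = [math.log(v) for v in x]
mean_log = sum(logs) / len(logs)
squared_norm = sum((v - mean_log) ** 2 for v in logs)
print(round(sum(b * b for b in balances), 6), round(squared_norm, 6))
```

The same additivity, applied to variances across a sample, is what allows the dendrogram to display the decomposition of total variance by partition step.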


The application of compositional data analysis through log ratio transformations corresponds to a multinomial logit model for the shares themselves. This model is characterized by the property of Independence of Irrelevant Alternatives (IIA). IIA states that the odds ratio, in this case the ratio of shares, is invariant to the addition or deletion of outcomes to the problem. It is exactly this invariance of the ratio that underlies the commonly used zero replacement procedure in compositional data analysis. In this paper we investigate using the nested logit model, which does not embody IIA, and an associated zero replacement procedure, and compare its performance with that of the more usual approach of using the multinomial logit model. Our comparisons exploit a data set that combines voting data by electoral division with corresponding census data for each division for the 2001 Federal election in Australia.
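The IIA property of the multinomial logit model can be seen in a few lines. In the sketch below the alternatives and their utilities are invented for illustration; dropping one alternative changes the shares but leaves the ratio between the remaining two untouched.

```python
import math


def shares(utilities):
    """Multinomial logit shares: exponentiated utilities, normalized to sum to one."""
    largest = max(utilities.values())  # subtracted for numerical stability only
    exp_u = {name: math.exp(u - largest) for name, u in utilities.items()}
    total = sum(exp_u.values())
    return {name: value / total for name, value in exp_u.items()}


# Hypothetical utilities for three alternatives.
utilities = {"A": 1.0, "B": 0.5, "C": -0.2}

full = shares(utilities)
reduced = shares({name: u for name, u in utilities.items() if name != "C"})

ratio_full = full["A"] / full["B"]
ratio_reduced = reduced["A"] / reduced["B"]
print(round(ratio_full, 6), round(ratio_reduced, 6))
```

Both ratios equal exp(1.0 − 0.5), whatever else is in the choice set. A nested logit model relaxes exactly this behavior for alternatives that sit in different nests.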