920 results for Multivariate data analysis
Abstract:
Hydrogeological research usually includes statistical studies devised to elucidate the mean background state, characterise relationships among different hydrochemical parameters, and show the influence of human activities. These goals are achieved either by means of a statistical approach or by mixing models between end-members. Compositional data analysis has proved to be effective with the first approach, but there is no commonly accepted solution to the end-member problem in a compositional framework. We present here a possible solution based on factor analysis of compositions, illustrated with a case study. We find two factors on the compositional biplot by fitting two non-centered orthogonal axes to the most representative variables. Each of these axes defines a subcomposition, grouping those variables that lie nearest to it. With each subcomposition a log-contrast is computed and rewritten as an equilibrium equation. These two factors can be interpreted as the isometric log-ratio (ilr) coordinates of three hidden components, which can be plotted in a ternary diagram. These hidden components might be interpreted as end-members. We have analysed 14 molarities at 31 sampling stations along the Llobregat River and its tributaries, measured monthly over two years. We have obtained a biplot explaining 57% of the total variance, from which we have extracted two factors: factor G, reflecting geological background enhanced by potash mining; and factor A, essentially controlled by urban and/or farming wastewater. Graphical representation of these two factors allows us to identify three extreme samples, corresponding to pristine waters, potash mining influence and urban sewage influence. To confirm this, we have available analyses of the diffuse and widespread point sources identified in the area: springs, potash mining leachates, sewage, and fertilisers.
Each of these sources shows a clear link with one of the extreme samples, except fertilisers, owing to the heterogeneity of their composition. This approach is a useful tool to distinguish end-members and characterise them, an issue generally difficult to solve. It is worth noting that the end-member composition cannot be fully estimated but only characterised through log-ratio relationships among components. Moreover, the influence of each end-member in a given sample must be evaluated relative to the other samples. These limitations are intrinsic to the relative nature of compositional data.
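The step from two ilr coordinates to three hidden components plotted in a ternary diagram can be illustrated with a minimal sketch. The abstract does not say which orthonormal basis was used, so the basis below (and the function names `ilr3` and `ilr3_inv`) are assumptions chosen only for illustration.

```python
import math

def closure(parts):
    """Rescale positive parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]

def ilr3(x):
    """ilr coordinates of a 3-part composition (assumed basis, not the paper's)."""
    x1, x2, x3 = x
    z1 = math.log(x1 / x2) / math.sqrt(2)
    z2 = math.sqrt(2 / 3) * math.log(math.sqrt(x1 * x2) / x3)
    return z1, z2

def ilr3_inv(z1, z2):
    """Map two ilr coordinates back to a 3-part composition (ternary point)."""
    # clr vector = z1 * e1 + z2 * e2, with e1 = (1, -1, 0)/sqrt(2)
    # and e2 = (1, 1, -2)/sqrt(6).
    c1 = z1 / math.sqrt(2) + z2 / math.sqrt(6)
    c2 = -z1 / math.sqrt(2) + z2 / math.sqrt(6)
    c3 = -2 * z2 / math.sqrt(6)
    return closure([math.exp(c1), math.exp(c2), math.exp(c3)])
```

Under this assumed basis, a pair of factor scores such as (G, A) would be passed to `ilr3_inv` to obtain the relative weights of the three hidden components; only ratios between components are recovered, in line with the limitation stated in the abstract.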
Abstract:
In standard multivariate statistical analysis, common hypotheses of interest concern changes in mean vectors and subvectors. In compositional data analysis it is now well established that compositional change is most readily described in terms of the simplicial operation of perturbation and that subcompositions replace the marginal concept of subvectors. To motivate the statistical developments of this paper we present two challenging compositional problems from food production processes. Against this background the relevance of perturbations and subcompositions can be clearly seen. Moreover, we can identify a number of hypotheses of interest involving the specification of particular perturbations or differences between perturbations, and also hypotheses of subcompositional stability. We identify the two problems as being the counterparts, in the jargon of standard multivariate analysis, of the analysis of paired comparison or split plot experiments and of separate sample comparative experiments. We then develop appropriate estimation and testing procedures for a complete lattice of relevant compositional hypotheses.
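The two operations this abstract relies on, perturbation and subcomposition, can be sketched as follows. This is a minimal illustration under the usual definitions in compositional data analysis; the function names are hypothetical and no estimation or testing procedure from the paper is reproduced.

```python
def closure(parts):
    """Rescale positive parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]

def perturb(x, p):
    """Perturbation: component-wise product followed by closure."""
    return closure([a * b for a, b in zip(x, p)])

def perturbation_difference(x, y):
    """The perturbation p that carries x to y, i.e. perturb(x, p) == y."""
    return closure([b / a for a, b in zip(x, y)])

def subcomposition(x, indices):
    """Keep only the selected parts and re-close them."""
    return closure([x[i] for i in indices])
```

In a paired setting, `perturbation_difference(before, after)` plays the role that the difference of two mean vectors plays in standard multivariate analysis.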
Abstract:
The statistical analysis of compositional data should be carried out using log-ratios of parts, which are difficult to use correctly in standard statistical packages. For this reason a freeware package named CoDaPack was created. This software implements most of the basic statistical methods suitable for compositional data. In this paper we describe the new version of the package, now called CoDaPack3D. It is developed in Visual Basic for Applications (associated with Excel©), Visual Basic and OpenGL, and it is oriented towards users with minimal knowledge of computers, with the aim of being simple and easy to use. This new version includes new graphical output in 2D and 3D. These outputs can be zoomed and, in 3D, rotated. A customization menu is also included, and outputs can be saved in JPEG format. The new version also includes interactive help, and all dialog windows have been improved to make them easier to use. To use CoDaPack one has to open Excel© and enter the data in a standard spreadsheet. These should be organized as a matrix where Excel© rows correspond to the observations and columns to the parts. The user executes macros that return numerical or graphical results. There are two kinds of numerical results, new variables and descriptive statistics, and both appear on the same sheet. Graphical output appears in independent windows. In the present version there are 8 menus, with a total of 38 submenus which, after some dialogue, directly call the corresponding macro. The dialogues ask the user to input the variables and further parameters needed, as well as where to put the results. The web site http://ima.udg.es/CoDaPack contains this freeware package, and only Microsoft Excel© under Microsoft Windows© is required to run the software. Key words: Compositional data analysis, Software
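The data layout described here (rows are observations, columns are parts) and the two kinds of numerical result (new variables and descriptive statistics) can be illustrated with a short sketch. This is not CoDaPack code; the centred log-ratio and the compositional centre are standard log-ratio quantities picked as assumed examples of each kind of result.

```python
import math

def closure(parts):
    """Rescale positive parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]

def clr(row):
    """New variables: centred log-ratio of one observation (one row)."""
    logs = [math.log(v) for v in row]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def centre(data):
    """Descriptive statistic: closed vector of column-wise geometric means."""
    n_obs = len(data)
    n_parts = len(data[0])
    geo_means = [
        math.exp(sum(math.log(row[j]) for row in data) / n_obs)
        for j in range(n_parts)
    ]
    return closure(geo_means)
```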
Abstract:
The aim of this talk is to convince the reader that there are a lot of interesting statistical problems in present-day life science data analysis which seem ultimately connected with compositional statistics. Key words: SAGE, cDNA microarrays, (1D-)NMR, virus quasispecies
Abstract:
The theory of compositional data analysis is often focused on the composition only. However, in practical applications we often treat a composition together with covariables measured on some other scale. This contribution systematically gathers and develops statistical tools for this situation. For instance, for the graphical display of the dependence of a composition on a categorical variable, a colored set of ternary diagrams might be a good idea for a first look at the data, but it will quickly hide important aspects if the composition has many parts or takes extreme values. On the other hand, colored scatterplots of ilr components may not be very instructive for the analyst if the conventional, black-box ilr is used. Thinking in terms of the Euclidean structure of the simplex, we suggest setting up appropriate projections, which on the one hand show the compositional geometry and on the other hand are still comprehensible to a non-expert analyst and readable for all locations and scales of the data. This is done, for example, by defining special balance displays with carefully selected axes. Following this idea, we need to systematically ask how to display, explore, describe, and test the relation to complementary or explanatory data of categorical, real, ratio or again compositional scales. This contribution shows that it is sufficient to use some basic concepts and very few advanced tools from multivariate statistics (principal covariances, multivariate linear models, trellis or parallel plots, etc.) to build appropriate procedures for all these combinations of scales. This has some fundamental implications for their software implementation, and for how they might be taught to analysts who are not already experts in multivariate analysis.
Abstract:
The quantitative estimation of Sea Surface Temperatures (SST) from fossil assemblages is a fundamental issue in palaeoclimatic and palaeoceanographic investigations. The Modern Analogue Technique, a widely adopted method based on direct comparison of fossil assemblages with modern coretop samples, was revised with the aim of conforming it to compositional data analysis. The new CODAMAT method was developed by adopting the Aitchison metric as the distance measure. Modern coretop datasets are characterised by a large number of zeros. Zeros were replaced using a Bayesian approach based on a posterior estimate of the parameter of the multinomial distribution. The number of modern analogues from which to reconstruct the SST was determined by a multiple approach considering the Proxies correlation matrix, the Standardized Residual Sum of Squares and the Mean Squared Distance. This new CODAMAT method was applied to the planktonic foraminiferal assemblages of a core recovered in the Tyrrhenian Sea. Key words: Modern analogues, Aitchison distance, Proxies correlation matrix, Standardized Residual Sum of Squares
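The two compositional ingredients named in this abstract, zero replacement and the Aitchison distance, can be sketched as below. The abstract does not give the prior or any other detail of its Bayesian replacement, so the uniform Dirichlet prior, the `strength` parameter and all function names are assumptions for illustration only; this is not the CODAMAT implementation.

```python
import math

def replace_zeros(counts, strength=None):
    """Bayesian-multiplicative zero replacement for one vector of counts.

    Assumption: uniform Dirichlet prior with total weight `strength`
    (defaults to the number of parts).
    """
    n = sum(counts)
    d = len(counts)
    s = d if strength is None else strength
    # Posterior mean of the multinomial probability for a part counted as zero.
    zero_value = (1.0 / d) * s / (n + s)
    n_zeros = sum(1 for c in counts if c == 0)
    # Shrink the observed proportions so that the result still sums to 1.
    shrink = 1.0 - n_zeros * zero_value
    return [zero_value if c == 0 else (c / n) * shrink for c in counts]

def clr(x):
    """Centred log-ratio transform of a strictly positive composition."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison_distance(x, y):
    """Euclidean distance between the clr transforms of x and y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))

def nearest_analogues(fossil, modern, k):
    """Indices of the k modern samples closest to the fossil sample."""
    order = sorted(
        range(len(modern)),
        key=lambda i: aitchison_distance(fossil, modern[i]),
    )
    return order[:k]
```

An SST estimate would then be some average of the temperatures attached to the selected analogues; how many analogues to keep is the question the abstract addresses with its three criteria.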
Abstract:
Self-organizing maps (Kohonen 1997) are a type of artificial neural network developed to explore patterns in high-dimensional multivariate data. The conventional version of the algorithm uses the Euclidean metric when adapting the model vectors, which in theory renders the whole methodology incompatible with non-Euclidean geometries. In this contribution we explore the two main aspects of the problem: 1. Whether the conventional approach using the Euclidean metric can yield valid results with compositional data. 2. Whether a modification of the conventional approach, replacing vector sum and scalar multiplication by the canonical operators in the simplex (i.e. perturbation and powering), can converge to an adequate solution. Preliminary tests showed that both methodologies can be used on compositional data. However, the modified version of the algorithm performs worse than the conventional version, in particular when the data are pathological. Moreover, the conventional approach converges faster to a solution when the data are "well-behaved". Key words: Self Organizing Map; Artificial Neural networks; Compositional data
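The difference between the two variants comes down to how a model vector is moved towards a data point. The sketch below contrasts the conventional Euclidean update with one that uses perturbation and powering instead of vector sum and scalar multiplication. It is an illustrative reading of the abstract, not the authors' code; `rate` stands for the product of learning rate and neighbourhood weight, and all names are hypothetical.

```python
def closure(parts):
    """Rescale positive parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]

def perturb(x, y):
    """Simplex analogue of vector addition."""
    return closure([a * b for a, b in zip(x, y)])

def power(x, a):
    """Simplex analogue of multiplication by the scalar a."""
    return closure([v ** a for v in x])

def euclidean_update(m, x, rate):
    """Conventional SOM step: m + rate * (x - m)."""
    return [mi + rate * (xi - mi) for mi, xi in zip(m, x)]

def simplex_update(m, x, rate):
    """Modified step: m perturbed by the powered difference between x and m."""
    difference = perturb(x, [1.0 / mi for mi in m])  # x "minus" m in the simplex
    return perturb(m, power(difference, rate))
```

With `rate = 0.5` the Euclidean step lands on the arithmetic midpoint of `m` and `x`, while the simplex step lands on their closed geometric midpoint.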
Abstract:
Functional Data Analysis (FDA) deals with samples where a whole function is observed for each individual. A particular case of FDA is when the observed functions are density functions, which are also an example of infinite-dimensional compositional data. In this work we compare several methods of dimensionality reduction for this particular type of data: functional principal components analysis (PCA), with or without a previous data transformation, and multidimensional scaling (MDS) for different inter-density distances, one of them taking into account the compositional nature of density functions. The different methods are applied to both artificial and real data (household income distributions).
Abstract:
In this paper we examine the problem of compositional data from a different starting point. Chemical compositional data, as used in provenance studies on archaeological materials, will be approached from the standpoint of measurement theory. The results will show, in a very intuitive way, that chemical data can only be treated by using the approach developed for compositional data. It will be shown that compositional data analysis is a particular case in projective geometry, when the projective coordinates are in the positive orthant and have the properties of logarithmic interval metrics. Moreover, it will be shown that this approach can be extended to a very large number of applications, including shape analysis. This will be exemplified with a case study on the architecture of Early Christian churches dating back to the 5th-7th centuries AD.
Abstract:
Notes for class discussion and exercise
Abstract:
Slides and Handouts for class introducing some of the concepts associated with the analysis of qualitative data
Abstract:
Introduction: Total intravenous anaesthesia (TIVA) is widely used and reported in the literature as a technique for reducing the response to laryngoscopy and intubation, for inducing and maintaining adequate anaesthesia, and for achieving better haemodynamic stability and post-anaesthetic recovery; however, few studies compare the use of TIVA to determine whether the pharmacokinetic profile differs according to the patient's gender. Objective: To describe differences by gender in pharmacokinetics and in the times to awakening and to transfer to the post-anaesthesia care unit (discharge) in patients receiving TIVA with remifentanil and propofol, guided by Stangraf. Methods: Analytical cross-sectional observational study of patients undergoing surgery under TIVA at the Hospital Occidente de Kennedy from June 2013 to January 2014. Using SPSS version 20 for Windows, the data were analysed with the Kolmogorov-Smirnov, Shapiro-Wilk and Mann-Whitney U tests. A p-value below 0.05 was accepted as statistically significant. Results: Normality tests were applied, and no statistically significant differences were found between genders. The awakening time was 9.36 minutes for women and 11.26 minutes for men. Discharge times were 10.71 minutes for women and 12.82 minutes for men. Discussion: Awakening and discharge times do not differ between women and men in the patients analysed. Additional studies across population groups with different pharmacokinetic conditions are required to corroborate these data.
Abstract:
This study carried out a predictive analysis of the occurrence of adverse events among patients of a health care provider institution (IPS) in Bogotá, Mederi Hospital Universitario de Barrios Unidos (HUBU), during 2013, in relation to hospital efficiency indicators (hospital occupancy rate, number of hospital discharges, average length of hospital stay, number of emergency department discharges, average length of stay in the emergency department). The data were exported to an analysis matrix; qualitative variables were presented as absolute and relative frequencies, and quantitative variables (age, length of stay) as means and standard deviations. The adverse event and hospital efficiency data were grouped into a new matrix to allow predictive analysis, and this matrix was exported to the statistical modelling software Eviews 6.5. Multivariate predictive models were specified for the variable number of adverse events with respect to the hospital efficiency indicators, and probabilities of occurrence, correlation analysis and multicollinearity were estimated; the results were presented in estimation tables for each model. The analysis was restricted to preventable and non-preventable adverse events, with information obtained from an information system that records the factors related to the occurrence of adverse health events through the health event reporting system, reports in clinical records, individual reports, reports by service, data analysis and case studies; the hospital efficiency data for the same period were extracted in the same way.
The analysis and management of adverse events aims to establish strategies for continuous improvement and for analysing results against the efficiency indicators, enabling intervention on the operational risk factors of the services of the Hospital Universitario de Barrios Unidos (HUBU) that are related to adverse events in patient care. In particular, according to the results obtained, the focus should be on managing patient discharges, in order to align with and strengthen patient safety policies and provide comprehensive, high-quality and efficient care, reducing complaints about care, billing disallowances (glosas) and legal risks, in line with the predictive model studied.
Abstract:
Dependence between financial series is a fundamental parameter in the estimation of risk models. Value at Risk (VaR) is one of the most important measures used in the administration and management of financial risk. Several methods currently exist for estimating it, such as historical simulation, which assumes no distribution for the returns of the risk factors or assets, and parametric methods, which assume normally distributed returns. This paper introduces copula theory as a measure of dependence between series and estimates an ARMA-GARCH-copula model to calculate the Value at Risk of a portfolio composed of two financial series, the Dollar-Peso and Euro-Peso exchange rates. The results show that VaR estimation by means of copulas is more accurate than the traditional methods.
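The two traditional VaR estimators mentioned as baselines, historical simulation and the parametric normal method, can be sketched as follows. This is only an illustration of those baselines under textbook definitions; the function names and the `level` parameter are assumptions, and the ARMA-GARCH-copula model of the paper is not reproduced here.

```python
import math
from statistics import NormalDist, mean, stdev

def historical_var(returns, level=0.99):
    """Historical-simulation VaR: empirical quantile of the losses.

    No distribution is assumed for the returns.
    """
    losses = sorted(-r for r in returns)
    # Smallest loss whose empirical distribution function reaches `level`.
    k = math.ceil(level * len(losses)) - 1
    return losses[k]

def normal_var(returns, level=0.99):
    """Parametric VaR assuming normally distributed returns."""
    mu = mean(returns)
    sigma = stdev(returns)
    # Negative of the (1 - level) quantile of the fitted normal distribution.
    return -(mu + sigma * NormalDist().inv_cdf(1 - level))
```

Both functions report VaR as a positive loss for a single return series; a portfolio version would first combine the two exchange-rate returns using the portfolio weights.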
Abstract:
Eye tracking has become a prominent technique in the evaluation of user interaction and behaviour with study objects in defined contexts. Common eye tracking data representation techniques offer valuable input regarding user interaction and eye gaze behaviour, namely through the measurement of fixations and saccades. However, these and other techniques may be insufficient for representing the acquired data in specific studies, namely because of the complexity of the study object being analysed. This paper contributes a summary of data representation and information visualization techniques used in data analysis within different contexts (advertising, websites, television news and video games). Additionally, it presents several methodological approaches that resulted from studies completed or under development at CETAC.MEDIA - Communication Sciences and Technologies Research Centre. In the studies described, traditional data representation techniques were insufficient. As a result, new approaches were necessary, and new forms of representing data, based on common techniques, were developed with the objective of improving communication and information strategies. For each of these studies, a brief summary of its contribution to its respective area is presented, as well as the data representation techniques used and some of the results obtained.