990 results for multidimensional data
Abstract:
In this paper, we apply multidimensional scaling (MDS) and parametric similarity indices (PSI) to the analysis of complex systems (CS). Each CS is viewed as a dynamical system, exhibiting an output time-series to be interpreted as a manifestation of its behavior. First, we adopt a sliding window to sample the original data into several consecutive time periods. Second, we define a given PSI for tracking pieces of data; we then compare the windows for different values of the parameter and generate the corresponding MDS maps of 'points'. Third, we use Procrustes analysis to linearly transform the MDS charts for maximum superposition and to build a global MDS map of 'shapes'. This final plot captures the time evolution of the phenomena and is sensitive to the PSI adopted. The generalized correlation, the Minkowski distance and four entropy-based indices are tested. The proposed approach is applied to the Dow Jones Industrial Average stock market index and the Europe Brent Spot Price FOB time-series.
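A minimal sketch of this windowed pipeline, assuming a single daily time series and using a correlation-based distance and a Minkowski distance as two of the possible PSIs; the window width, step and function names are illustrative choices, not the paper's settings.

```python
import numpy as np
from sklearn.manifold import MDS
from scipy.spatial import procrustes
from scipy.spatial.distance import pdist, squareform

def sliding_windows(series, width, step):
    """Split a time series into consecutive windows of equal width."""
    return np.array([series[i:i + width]
                     for i in range(0, len(series) - width + 1, step)])

def mds_map(dist, seed=0):
    """Embed a precomputed distance matrix into the plane with metric MDS."""
    return MDS(n_components=2, dissimilarity="precomputed",
               random_state=seed).fit_transform(dist)

# Synthetic series standing in for a market index (e.g. daily closes).
series = np.cumsum(np.random.default_rng(0).normal(size=2000))
windows = sliding_windows(series, width=250, step=50)

# Two candidate PSIs: a correlation-based distance and a Minkowski distance.
d_corr = np.clip(1.0 - np.corrcoef(windows), 0.0, None)
d_mink = squareform(pdist(windows, metric="minkowski", p=3))

chart_corr = mds_map(d_corr)
chart_mink = mds_map(d_mink)

# Procrustes analysis aligns the two MDS charts for maximum superposition.
aligned_a, aligned_b, disparity = procrustes(chart_corr, chart_mink)
print(f"residual disparity after alignment: {disparity:.4f}")
```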
Abstract:
Complex industrial plants exhibit multiple interactions among smaller parts and with human operators. Failure in one part can propagate across subsystem boundaries, causing a serious disaster. This paper analyzes industrial accident data series from the perspective of dynamical systems. First, we process real-world data and show that the statistics of the number of fatalities reveal features that are well described by power law (PL) distributions. For the early years, the data reveal double PL behavior, while, for more recent time periods, a single PL fits the empirical data better. Second, we analyze the entropy of the data series statistics over time. Third, we use the Kullback–Leibler divergence to compare the empirical data, together with multidimensional scaling (MDS) techniques for data analysis and visualization. Entropy-based analysis is adopted to assess complexity, with the advantage of yielding a single parameter to express relationships within the data. The classical and the generalized (fractional) entropy and Kullback–Leibler divergence are used. The generalized measures allow a clear identification of patterns embedded in the data.
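A minimal sketch of the classical entropy and Kullback–Leibler computations on binned fatality counts; the bin counts below are hypothetical, and the generalized (fractional) variants used in the paper are not reproduced.

```python
import numpy as np

def shannon_entropy(p):
    """Classical Shannon entropy of a discrete distribution (natural log)."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) between discrete distributions."""
    mask = (p > 0) & (q > 0)
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

# Hypothetical accident counts binned by number of fatalities, two periods.
early  = np.array([120, 60, 25, 10, 4, 1], dtype=float)
recent = np.array([200, 80, 30,  8, 2, 1], dtype=float)
p, q = early / early.sum(), recent / recent.sum()

print(f"H(early)  = {shannon_entropy(p):.3f}")
print(f"H(recent) = {shannon_entropy(q):.3f}")
print(f"D(early || recent) = {kl_divergence(p, q):.3f}")
```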
Abstract:
This paper studies the impact of energy upon electricity markets using Multidimensional Scaling (MDS). Data from major energy and electricity markets are considered. Several maps produced by MDS are presented and discussed, revealing that this method is useful for understanding the correlations between the markets. Furthermore, the results help electricity market agents hedge against Market Clearing Price (MCP) volatility.
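A minimal sketch of an MDS map built from correlations between market price series, in which strongly correlated markets plot close together; the market names and the synthetic return series are placeholders, not the data used in the paper.

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(6)
markets = ["Brent", "NatGas", "Coal", "MIBEL", "NordPool", "EPEX"]
returns = rng.multivariate_normal(
    mean=np.zeros(6),
    cov=0.5 * np.eye(6) + 0.5,     # positively correlated toy markets
    size=500)

corr = np.corrcoef(returns, rowvar=False)
dist = np.sqrt(2.0 * (1.0 - corr))  # a standard correlation-based distance

coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dist)
for name, (x, y) in zip(markets, coords):
    print(f"{name:>9s}: ({x:+.2f}, {y:+.2f})")
```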
Abstract:
Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.
Abstract:
Doctoral thesis in Health Sciences.
Abstract:
This paper establishes a general framework for metric scaling of any distance measure between individuals based on a rectangular individuals-by-variables data matrix. The method allows visualization of both individuals and variables as well as preserving all the good properties of principal axis methods such as principal components and correspondence analysis, based on the singular-value decomposition, including the decomposition of variance into components along principal axes which provide the numerical diagnostics known as contributions. The idea is inspired by the chi-square distance in correspondence analysis which weights each coordinate by an amount calculated from the margins of the data table. In weighted metric multidimensional scaling (WMDS) we allow these weights to be unknown parameters which are estimated from the data to maximize the fit to the original distances. Once this extra weight-estimation step is accomplished, the procedure follows the classical path in decomposing a matrix and displaying its rows and columns in biplots.
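A minimal sketch of the weight-estimation idea, assuming synthetic data and a city-block target distance: variable weights are chosen so that weighted Euclidean distances between rows approximate the given distances, after which the usual singular-value machinery yields row and column coordinates. The optimizer and loss below are illustrative choices, not necessarily those of the paper.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))                        # individuals-by-variables
target = squareform(pdist(X, metric="cityblock"))   # any distance to be scaled

def weighted_distances(w):
    """Euclidean distances after scaling each variable by sqrt(w_j)."""
    return squareform(pdist(X * np.sqrt(w), metric="euclidean"))

def stress(w):
    diff = weighted_distances(w) - target
    return np.sum(diff ** 2)

res = minimize(stress, x0=np.ones(X.shape[1]),
               bounds=[(1e-8, None)] * X.shape[1], method="L-BFGS-B")
w_hat = res.x

# With the weights fixed, an SVD of the centred, weighted matrix gives the
# row and column coordinates displayed in a biplot.
Xw = (X - X.mean(axis=0)) * np.sqrt(w_hat)
U, s, Vt = np.linalg.svd(Xw, full_matrices=False)
row_coords = U[:, :2] * s[:2]
col_coords = Vt[:2].T
```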
Abstract:
The class of Schoenberg transformations, embedding Euclidean distances into higher dimensional Euclidean spaces, is presented, and derived from theorems on positive definite and conditionally negative definite matrices. Original results on the arc lengths, angles and curvature of the transformations are proposed, and visualized on artificial data sets by classical multidimensional scaling. A distance-based discriminant algorithm and a robust multidimensional centroid estimate illustrate the theory, closely connected to the Gaussian kernels of Machine Learning.
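A minimal sketch using one simple member of this class of transformations, the power map applied entrywise to squared Euclidean distances with exponent 0 < q ≤ 1, followed by classical (Torgerson) multidimensional scaling of the transformed distances; the data set and the chosen exponents are illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def classical_mds(sq_dist, k=2):
    """Torgerson double-centering followed by an eigendecomposition."""
    n = sq_dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ sq_dist @ J
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:k]
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0))

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))                  # artificial data set
D2 = squareform(pdist(X, "sqeuclidean"))       # squared Euclidean distances

# q = 1 reproduces the original configuration; smaller q bends the cloud.
configurations = {q: classical_mds(D2 ** q) for q in (1.0, 0.5, 0.25)}
```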
Abstract:
We propose a multivariate approach to the study of geographic species distribution which does not require absence data. Building on Hutchinson's concept of the ecological niche, this factor analysis compares, in the multidimensional space of ecological variables, the distribution of the localities where the focal species was observed to a reference set describing the whole study area. The first factor extracted maximizes the marginality of the focal species, defined as the ecological distance between the species optimum and the mean habitat within the reference area. The other factors maximize the specialization of this focal species, defined as the ratio of the ecological variance in mean habitat to that observed for the focal species. Eigenvectors and eigenvalues are readily interpreted and can be used to build habitat-suitability maps. This approach is recommended in situations where absence data are not available (many data banks), unreliable (most cryptic or rare species), or meaningless (invaders). We provide an illustration and validation of the method for the alpine ibex, a species reintroduced in Switzerland which presumably has not yet recolonized its entire range.
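A minimal sketch of the marginality component only, assuming a grid of cells described by ecological variables and a presence mask for the focal species; the specialization factors and the full eigen-decomposition are not reproduced, and the data and scaling convention are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
global_cells  = rng.normal(size=(5000, 4))        # all cells of the study area
presence_mask = rng.random(5000) < 0.05           # cells where the species was seen
species_cells = global_cells[presence_mask]

g_mean, g_std = global_cells.mean(axis=0), global_cells.std(axis=0)
s_mean = species_cells.mean(axis=0)

# Marginality: distance between the species optimum and the mean habitat,
# expressed in units of the global standard deviation of each variable.
marginality_vector = (s_mean - g_mean) / g_std
overall_marginality = np.linalg.norm(marginality_vector) / 1.96  # common scaling convention

print("per-variable marginality:", np.round(marginality_vector, 3))
print("overall marginality     :", round(overall_marginality, 3))
```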
Abstract:
We have investigated the phenomenon of deprivation in contemporary Switzerland through the adoption of a multidimensional, dynamic approach. By applying Self Organizing Maps (SOM) to a set of 33 non-monetary indicators from the 2009 wave of the Swiss Household Panel (SHP), we identified 13 prototypical forms (or clusters) of well-being, financial vulnerability, psycho-physiological fragility and deprivation within a topological space. New data from the previous waves (2003 to 2008) were then classified by the SOM model, making it possible to estimate the weight of the different clusters over time and reconstruct the dynamics of stability and mobility of individuals within the map. Looking at the transition probabilities between year t and year t+1, we observed that the paths of mobility which catalyze the largest number of observations are those connecting clusters that are adjacent on the topological space.
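A minimal sketch of this workflow, assuming the third-party MiniSom package and standardized indicator matrices for two consecutive waves; the grid size, training length and the identification of each SOM node with its own cluster are simplifications rather than the study's settings.

```python
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(3)
n_people, n_indicators = 400, 33
wave_2009 = rng.normal(size=(n_people, n_indicators))          # training wave
wave_2008 = wave_2009 + rng.normal(scale=0.3, size=wave_2009.shape)

som = MiniSom(4, 4, n_indicators, sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(wave_2009, num_iteration=5000)                 # fit on the 2009 wave

def assign(data):
    """Map each respondent to a flat index of their best-matching SOM node."""
    return np.array([r * 4 + c for r, c in (som.winner(x) for x in data)])

labels_t, labels_t1 = assign(wave_2008), assign(wave_2009)

# Transition probabilities between year t and year t+1.
n_clusters = 16
counts = np.zeros((n_clusters, n_clusters))
for a, b in zip(labels_t, labels_t1):
    counts[a, b] += 1
transition = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
```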
Abstract:
Modern societies depend increasingly on computer systems, and development teams are therefore under growing pressure to produce high-quality software. Many companies use quality models, suites of programs that analyze and assess the quality of other programs, but building quality models is difficult because several questions remain unanswered in the literature. We studied quality-modelling practices at a large company and identified three dimensions where additional research is desirable: support for the subjectivity of quality, techniques for tracking quality as software evolves, and the composition of quality across different levels of abstraction. Regarding subjectivity, we proposed the use of Bayesian models because they can handle ambiguous data. We applied our models to the problem of detecting design defects. In a study of two open-source systems, we found that our approach outperforms the rule-based techniques described in the state of the art. To support software evolution, we considered the scores produced by a quality model as signals that can be analyzed with data-mining techniques to identify patterns of quality evolution; we studied how design defects appear in and disappear from software. Software is typically designed as a hierarchy of components, but quality models do not take this organization into account. In the last part of the dissertation, we present a two-level quality model. These models have three parts: a model at the component level, a model that assesses the importance of each component, and another that assesses the quality of a composite by combining the quality of its components. The approach was tested on predicting change-prone classes from the quality of their methods, and we found that our two-level models identify change-prone classes better. Finally, we applied our two-level models to assessing the navigability of web sites from the quality of their pages; our models were able to distinguish between very high-quality sites and randomly chosen ones. Throughout the dissertation we present not only theoretical problems and their solutions, but we also conducted experiments to demonstrate the advantages and limitations of our solutions. Our results indicate that the state of the art can be improved along the three dimensions presented. In particular, our work on quality composition and importance modelling is the first to target this problem. We believe that our two-level models are an interesting starting point for further research.
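A minimal sketch of the two-level idea described above: a component-level model scores each method, an importance model weights the methods, and a composite model combines them into a class-level score. The scoring and importance functions below are illustrative stand-ins, not the dissertation's models.

```python
from dataclasses import dataclass

@dataclass
class Method:
    lines_of_code: int
    cyclomatic_complexity: int
    call_count: int          # how often the method is invoked elsewhere

def method_quality(m: Method) -> float:
    """Component-level model: a crude score in [0, 1], higher is better."""
    penalty = 0.01 * m.lines_of_code + 0.05 * m.cyclomatic_complexity
    return max(0.0, 1.0 - penalty)

def method_importance(m: Method) -> float:
    """Importance model: heavily used methods weigh more in the composite."""
    return 1.0 + m.call_count

def class_quality(methods: list) -> float:
    """Composite model: importance-weighted mean of component scores."""
    total_weight = sum(method_importance(m) for m in methods)
    return sum(method_importance(m) * method_quality(m) for m in methods) / total_weight

cls = [Method(40, 6, 12), Method(200, 25, 1), Method(15, 2, 30)]
print(f"class-level quality: {class_quality(cls):.2f}")
```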
Abstract:
In most psychological tests and questionnaires, a test score is obtained by taking the sum of the item scores. In virtually all cases where the test or questionnaire contains multidimensional forced-choice items, this traditional scoring method is also applied. We argue that the summation of scores obtained with multidimensional forced-choice items produces uninterpretable test scores. Therefore, we propose three alternative scoring methods: a weak and a strict rank preserving scoring method, which both allow an ordinal interpretation of test scores; and a ratio preserving scoring method, which allows a proportional interpretation of test scores. Each proposed scoring method yields an index for each respondent indicating the degree to which the response pattern is inconsistent. Analysis of real data showed that with respect to rank preservation, the weak and strict rank preserving methods resulted in lower inconsistency indices than the traditional scoring method; with respect to ratio preservation, the ratio preserving scoring method resulted in lower inconsistency indices than the traditional scoring method.
Abstract:
Functional Data Analysis (FDA) deals with samples where a whole function is observed for each individual. A particular case of FDA is when the observed functions are density functions, which are also an example of infinite-dimensional compositional data. In this work we compare several methods of dimensionality reduction for this particular type of data: functional principal component analysis (PCA), with or without a previous data transformation, and multidimensional scaling (MDS) for different inter-density distances, one of them taking into account the compositional nature of density functions. The different methods are applied to both artificial and real data (household income distributions).
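A minimal sketch of two of the dimensionality-reduction routes compared above, applied to densities discretized on a grid: PCA after a centred log-ratio (clr) transformation, which respects the compositional nature of densities, and classical MDS on a simple inter-density L2 distance. The grid and the synthetic densities stand in for household income distributions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.decomposition import PCA
from sklearn.manifold import MDS
from scipy.spatial.distance import pdist, squareform

grid = np.linspace(-4, 6, 200)
rng = np.random.default_rng(4)
densities = np.array([norm.pdf(grid, loc=mu, scale=s)
                      for mu, s in zip(rng.normal(1, 1, 50),
                                       rng.uniform(0.5, 2, 50))])
densities /= densities.sum(axis=1, keepdims=True)     # renormalize on the grid

# Route 1: clr transform + PCA (a discretized stand-in for functional PCA).
clr = np.log(densities) - np.log(densities).mean(axis=1, keepdims=True)
scores_pca = PCA(n_components=2).fit_transform(clr)

# Route 2: MDS on pairwise L2 distances between the raw density curves.
dist = squareform(pdist(densities, metric="euclidean"))
scores_mds = MDS(n_components=2, dissimilarity="precomputed",
                 random_state=0).fit_transform(dist)
```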
Abstract:
Many multivariate methods that are apparently distinct can be linked by introducing one or more parameters in their definition. Methods that can be linked in this way are correspondence analysis, unweighted or weighted logratio analysis (the latter also known as "spectral mapping"), nonsymmetric correspondence analysis, principal component analysis (with and without logarithmic transformation of the data) and multidimensional scaling. In this presentation I will show how several of these methods, which are frequently used in compositional data analysis, may be linked through parametrizations such as power transformations, linear transformations and convex linear combinations. Since the methods of interest here all lead to visual maps of data, a "movie" can be made where the linking parameter is allowed to vary in small steps: the results are recalculated "frame by frame" and one can see the smooth change from one method to another. Several of these "movies" will be shown, giving a deeper insight into the similarities and differences between these methods.
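A minimal sketch of one such parametrized link: a Box-Cox style power transformation bridges principal component analysis of the raw data (alpha = 1) and of the log-transformed data (alpha approaching 0), and recomputing the map over a grid of alpha values gives the "frames" of the movie described above. The data set and the particular link shown are illustrative, not the presentation's examples.

```python
import numpy as np
from sklearn.decomposition import PCA

def box_cox(X, alpha):
    """Box-Cox power transform, tending to log(X) as alpha goes to 0."""
    return np.log(X) if alpha == 0 else (X ** alpha - 1.0) / alpha

rng = np.random.default_rng(5)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(60, 8))   # strictly positive data

frames = []
for alpha in np.linspace(1.0, 0.0, 11):                # 1.0, 0.9, ..., 0.0
    scores = PCA(n_components=2).fit_transform(box_cox(X, alpha))
    frames.append((alpha, scores))                     # one map per frame
```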