14 resultados para Automatic Analysis of Multivariate Categorical Data Sets
em Instituto Politécnico do Porto, Portugal
Resumo:
Complex industrial plants exhibit multiple interactions among smaller parts and with human operators. Failure in one part can propagate across subsystem boundaries causing a serious disaster. This paper analyzes the industrial accident data series in the perspective of dynamical systems. First, we process real world data and show that the statistics of the number of fatalities reveal features that are well described by power law (PL) distributions. For early years, the data reveal double PL behavior, while, for more recent time periods, a single PL fits better into the experimental data. Second, we analyze the entropy of the data series statistics over time. Third, we use the Kullback–Leibler divergence to compare the empirical data and multidimensional scaling (MDS) techniques for data analysis and visualization. Entropy-based analysis is adopted to assess complexity, having the advantage of yielding a single parameter to express relationships between the data. The classical and the generalized (fractional) entropy and Kullback–Leibler divergence are used. The generalized measures allow a clear identification of patterns embedded in the data.
Resumo:
Proceedings of the 12th Conference on 'Dynamical Systems -Theory and Applications'
Resumo:
This study aims to optimize the water quality monitoring of a polluted watercourse (Leça River, Portugal) through the principal component analysis (PCA) and cluster analysis (CA). These statistical methodologies were applied to physicochemical, bacteriological and ecotoxicological data (with the marine bacterium Vibrio fischeri and the green alga Chlorella vulgaris) obtained with the analysis of water samples monthly collected at seven monitoring sites and during five campaigns (February, May, June, August, and September 2006). The results of some variables were assigned to water quality classes according to national guidelines. Chemical and bacteriological quality data led to classify Leça River water quality as “bad” or “very bad”. PCA and CA identified monitoring sites with similar pollution pattern, giving to site 1 (located in the upstream stretch of the river) a distinct feature from all other sampling sites downstream. Ecotoxicity results corroborated this classification thus revealing differences in space and time. The present study includes not only physical, chemical and bacteriological but also ecotoxicological parameters, which broadens new perspectives in river water characterization. Moreover, the application of PCA and CA is very useful to optimize water quality monitoring networks, defining the minimum number of sites and their location. Thus, these tools can support appropriate management decisions.
Resumo:
Beyond the classical statistical approaches (determination of basic statistics, regression analysis, ANOVA, etc.) a new set of applications of different statistical techniques has increasingly gained relevance in the analysis, processing and interpretation of data concerning the characteristics of forest soils. This is possible to be seen in some of the recent publications in the context of Multivariate Statistics. These new methods require additional care that is not always included or refered in some approaches. In the particular case of geostatistical data applications it is necessary, besides to geo-reference all the data acquisition, to collect the samples in regular grids and in sufficient quantity so that the variograms can reflect the spatial distribution of soil properties in a representative manner. In the case of the great majority of Multivariate Statistics techniques (Principal Component Analysis, Correspondence Analysis, Cluster Analysis, etc.) despite the fact they do not require in most cases the assumption of normal distribution, they however need a proper and rigorous strategy for its utilization. In this work, some reflections about these methodologies and, in particular, about the main constraints that often occur during the information collecting process and about the various linking possibilities of these different techniques will be presented. At the end, illustrations of some particular cases of the applications of these statistical methods will also be presented.
Resumo:
This paper analyzes musical opus from the point of view of two mathematical tools, namely the entropy and the multidimensional scaling (MDS). The Fourier analysis reveals a fractional dynamics, but the time rhythm variations are diluted along the spectrum. The combination of time-window entropy and MDS copes with the time characteristics and is well suited to treat a large volume of data. The experiments focus on a large number of compositions classified along three sets of musical styles, namely “Classical”, “Jazz”, and “Pop & Rock” compositions. Without lack of generality, the present study describes the application of the tools and the sets of musical compositions in a methodology leading to clear conclusions, but extensions to other possibilities are straightforward. The results reveal significant differences in the musical styles, demonstrating the feasibility of the proposed strategy and motivating further developments toward a dynamical analysis of musical compositions.
Resumo:
Catastrophic events, such as wars and terrorist attacks, tornadoes and hurricanes, earthquakes, tsunamis, floods and landslides, are always accompanied by a large number of casualties. The size distribution of these casualties has separately been shown to follow approximate power law (PL) distributions. In this paper, we analyze the statistical distributions of the number of victims of catastrophic phenomena, in particular, terrorism, and find double PL behavior. This means that the data sets are better approximated by two PLs instead of a single one. We plot the PL parameters, corresponding to several events, and observe an interesting pattern in the charts, where the lines that connect each pair of points defining the double PLs are almost parallel to each other. A complementary data analysis is performed by means of the computation of the entropy. The results reveal relationships hidden in the data that may trigger a future comprehensive explanation of this type of phenomena.
Resumo:
This paper reports on the analysis of tidal breathing patterns measured during noninvasive forced oscillation lung function tests in six individual groups. The three adult groups were healthy, with prediagnosed chronic obstructive pulmonary disease, and with prediagnosed kyphoscoliosis, respectively. The three children groups were healthy, with prediagnosed asthma, and with prediagnosed cystic fibrosis, respectively. The analysis is applied to the pressure–volume curves and the pseudophaseplane loop by means of the box-counting method, which gives a measure of the area within each loop. The objective was to verify if there exists a link between the area of the loops, power-law patterns, and alterations in the respiratory structure with disease. We obtained statistically significant variations between the data sets corresponding to the six groups of patients, showing also the existence of power-law patterns. Our findings support the idea that the respiratory system changes with disease in terms of airway geometry and tissue parameters, leading, in turn, to variations in the fractal dimension of the respiratory tree and its dynamics.
Resumo:
The goal of this study is the analysis of the dynamical properties of financial data series from worldwide stock market indexes during the period 2000–2009. We analyze, under a regional criterium, ten main indexes at a daily time horizon. The methods and algorithms that have been explored for the description of dynamical phenomena become an effective background in the analysis of economical data. We start by applying the classical concepts of signal analysis, fractional Fourier transform, and methods of fractional calculus. In a second phase we adopt the multidimensional scaling approach. Stock market indexes are examples of complex interacting systems for which a huge amount of data exists. Therefore, these indexes, viewed from a different perspectives, lead to new classification patterns.
Resumo:
The industrial activity is inevitably associated with a certain degradation of the environmental quality, because is not possible to guarantee that a manufacturing process can be totally innocuous. The eco-efficiency concept is globally accepted as a philosophy of entreprise management, that encourages the companies to become more competitive, innovative and environmentally responsible by promoting the link between its companies objectives for excellence and its objectives of environmental excellence issues. This link imposes the creation of an organizational methodology where the performance of the company is concordant with the sustainable development. The main propose of this project is to apply the concept of eco-efficiency to the particular case of the metallurgical and metal workshop industries through the development of the particular indicators needed and to produce a manual of procedures for implementation of the accurate solution.
Resumo:
This paper reports on the analysis of tidal breathing patterns measured during noninvasive forced oscillation lung function tests in six individual groups. The three adult groups were healthy, with prediagnosed chronic obstructive pulmonary disease, and with prediagnosed kyphoscoliosis, respectively. The three children groups were healthy, with prediagnosed asthma, and with prediagnosed cystic fibrosis, respectively. The analysis is applied to the pressure-volume curves and the pseudophase-plane loop by means of the box-counting method, which gives a measure of the area within each loop. The objective was to verify if there exists a link between the area of the loops, power-law patterns, and alterations in the respiratory structure with disease. We obtained statistically significant variations between the data sets corresponding to the six groups of patients, showing also the existence of power-law patterns. Our findings support the idea that the respiratory system changes with disease in terms of airway geometry and tissue parameters, leading, in turn, to variations in the fractal dimension of the respiratory tree and its dynamics.
Resumo:
Electric power networks, namely distribution networks, have been suffering several changes during the last years due to changes in the power systems operation, towards the implementation of smart grids. Several approaches to the operation of the resources have been introduced, as the case of demand response, making use of the new capabilities of the smart grids. In the initial levels of the smart grids implementation reduced amounts of data are generated, namely consumption data. The methodology proposed in the present paper makes use of demand response consumers’ performance evaluation methods to determine the expected consumption for a given consumer. Then, potential commercial losses are identified using monthly historic consumption data. Real consumption data is used in the case study to demonstrate the application of the proposed method.
Resumo:
This paper studies the statistical distributions of worldwide earthquakes from year 1963 up to year 2012. A Cartesian grid, dividing Earth into geographic regions, is considered. Entropy and the Jensen–Shannon divergence are used to analyze and compare real-world data. Hierarchical clustering and multi-dimensional scaling techniques are adopted for data visualization. Entropy-based indices have the advantage of leading to a single parameter expressing the relationships between the seismic data. Classical and generalized (fractional) entropy and Jensen–Shannon divergence are tested. The generalized measures lead to a clear identification of patterns embedded in the data and contribute to better understand earthquake distributions.
Resumo:
New arguments proving that successive (repeated) measurements have a memory and actually remember each other are presented. The recognition of this peculiarity can change essentially the existing paradigm associated with conventional observation in behavior of different complex systems and lead towards the application of an intermediate model (IM). This IM can provide a very accurate fit of the measured data in terms of the Prony's decomposition. This decomposition, in turn, contains a small set of the fitting parameters relatively to the number of initial data points and allows comparing the measured data in cases where the “best fit” model based on some specific physical principles is absent. As an example, we consider two X-ray diffractometers (defined in paper as A- (“cheap”) and B- (“expensive”) that are used after their proper calibration for the measuring of the same substance (corundum a-Al2O3). The amplitude-frequency response (AFR) obtained in the frame of the Prony's decomposition can be used for comparison of the spectra recorded from (A) and (B) - X-ray diffractometers (XRDs) for calibration and other practical purposes. We prove also that the Fourier decomposition can be adapted to “ideal” experiment without memory while the Prony's decomposition corresponds to real measurement and can be fitted in the frame of the IM in this case. New statistical parameters describing the properties of experimental equipment (irrespective to their internal “filling”) are found. The suggested approach is rather general and can be used for calibration and comparison of different complex dynamical systems in practical purposes.
Resumo:
Atmospheric temperatures characterize Earth as a slow dynamics spatiotemporal system, revealing long-memory and complex behavior. Temperature time series of 54 worldwide geographic locations are considered as representative of the Earth weather dynamics. These data are then interpreted as the time evolution of a set of state space variables describing a complex system. The data are analyzed by means of multidimensional scaling (MDS), and the fractional state space portrait (fSSP). A centennial perspective covering the period from 1910 to 2012 allows MDS to identify similarities among different Earth’s locations. The multivariate mutual information is proposed to determine the “optimal” order of the time derivative for the fSSP representation. The fSSP emerges as a valuable alternative for visualizing system dynamics.