15 results for Genetic clustering analysis
at Instituto Politécnico do Porto, Portugal
Abstract:
Seismic data are difficult to analyze, and classical mathematical tools show strong limitations in exposing hidden relationships between earthquakes. In this paper, we study earthquake phenomena from the perspective of complex systems. Global seismic data covering the period from 1962 to 2011 are analyzed. The events, characterized by their magnitude, geographic location and time of occurrence, are divided into groups, either according to the Flinn-Engdahl (F-E) seismic regions of Earth or using a rectangular grid based on latitude and longitude coordinates. Two methods of analysis are considered and compared. In the first method, the distributions of magnitudes are approximated by Gutenberg-Richter (G-R) distributions and the fitted parameters are used to reveal the relationships among regions. In the second method, the mutual information is calculated and adopted as a measure of similarity between regions. In both cases, clustering analysis is used to generate visualization maps that provide an intuitive and useful representation of the complex relationships present in the seismic data. Such relationships might not be perceived on classical geographic maps. The generated charts are therefore a valid alternative to other visualization tools for understanding the global behavior of earthquakes.
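The G-R approximation mentioned above can be illustrated with a minimal sketch. The abstract does not say how the distributions are fitted, so the maximum-likelihood estimator below (usually attributed to Aki) is an assumption, and the function names and the threshold `m_min` are hypothetical. It assumes the relation log10 N(M) = a - b*M, where N(M) counts events of magnitude at least M.

```python
import math


def gr_b_value(magnitudes, m_min):
    """Maximum-likelihood estimate of the G-R b-value for events >= m_min.

    Assumption: continuous magnitudes above a completeness threshold m_min.
    """
    sample = [m for m in magnitudes if m >= m_min]
    if not sample:
        raise ValueError("no events at or above m_min")
    mean_excess = sum(sample) / len(sample) - m_min
    if mean_excess <= 0:
        raise ValueError("all events sit exactly at m_min")
    return math.log10(math.e) / mean_excess


def gr_a_value(magnitudes, m_min, b):
    """a-value chosen so that log10 N(m_min) = a - b * m_min holds exactly."""
    n = sum(1 for m in magnitudes if m >= m_min)
    if n == 0:
        raise ValueError("no events at or above m_min")
    return math.log10(n) + b * m_min


if __name__ == "__main__":
    # Hypothetical magnitudes for one region; not data from the paper.
    region = [4.0, 4.2, 4.5, 4.1, 5.3, 4.8, 4.0, 6.1]
    b = gr_b_value(region, m_min=4.0)
    print(b, gr_a_value(region, 4.0, b))
```

Each region would then be summarized by its (a, b) pair, which is one plausible input for the clustering step the abstract describes.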
Abstract:
This paper presents five different clustering methods to identify typical load profiles of medium voltage (MV) electricity consumers. These methods are intended to be used in a smart grid environment to extract useful knowledge about customers' behaviour. The obtained knowledge can support a decision tool, not only for utilities but also for consumers. Utilities can use load profiles to identify the aspects that cause system load peaks and to develop specific contracts with their customers. The framework presented in the paper consists of several steps, namely data pre-processing, application of the clustering algorithms, and evaluation of the quality of the partition, which is supported by cluster validity indices. The process ends with the analysis of the discovered knowledge. To validate the proposed framework, a case study with a real database of 208 MV consumers is used.
Abstract:
This paper proposes a methodology based on data mining techniques to support the analysis of zonal prices in real transmission networks. The methodology uses clustering algorithms to group the buses into typical classes, each comprising a set of buses with similar Locational Marginal Price (LMP) values. Two clustering algorithms are used to determine the LMP clusters: two-step and K-means. Adequacy measurement indices are used to evaluate the quality of the partition and to identify the best-performing algorithm. The paper includes a case study that uses an LMP database from the California ISO (CAISO) to identify zonal prices.
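As a rough illustration of the K-means step, the sketch below groups buses by a single scalar LMP value each. This is a simplification chosen for brevity: the abstract does not state how buses are represented, and the function name, the quantile-based initialization and the sample prices are all assumptions.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on scalar values.

    Initial centroids are k evenly spaced order statistics, which keeps the
    result deterministic (an assumption made for this sketch).
    """
    if k < 1 or k > len(values):
        raise ValueError("k must be between 1 and len(values)")
    ordered = sorted(values)
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical LMP values in $/MWh, one per bus.
    lmp = [30.1, 30.4, 29.8, 45.2, 44.9, 45.5]
    print(kmeans_1d(lmp, k=2))
```

Buses that share a label would form one candidate price zone; the number of zones `k` has to be chosen separately, for example with the adequacy indices the abstract mentions.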
Abstract:
This paper aims to study the relationships between chromosomal DNA sequences of twenty species. We propose a methodology combining DNA-based word frequency histograms, correlation methods, and an MDS technique to visualize structural information underlying chromosomes (CRs) and species. Four statistical measures are tested (Minkowski, Cosine, Pearson product-moment, and Kendall τ rank correlations) to analyze the information content of 421 nuclear CRs from twenty species. The proposed methodology is built on mathematical tools and allows the analysis and visualization of very large amounts of stream data, like DNA sequences, with almost no assumptions other than the predefined DNA “word length.” This methodology is able to produce comprehensible three-dimensional visualizations of CR clustering and related spatial and structural patterns. The results of the four test correlation scenarios show that the high-level information clusterings produced by the MDS tool are qualitatively similar, with small variations due to the characteristics of each correlation method, and that the clusterings are a consequence of the input data and not artifacts of the method.
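The word-histogram idea can be sketched as follows. The abstract does not say whether words overlap or how ambiguous bases are handled, so the choices here (overlapping words, words outside the ACGT alphabet ignored) and the function names are assumptions. Only the cosine measure is shown.

```python
from collections import Counter
from itertools import product
import math


def word_histogram(sequence, word_length):
    """Relative frequencies of overlapping words over the alphabet ACGT.

    Words containing any other symbol (for example N) are ignored.
    """
    seq = sequence.upper()
    counts = Counter(seq[i:i + word_length] for i in range(len(seq) - word_length + 1))
    words = ["".join(p) for p in product("ACGT", repeat=word_length)]
    total = sum(counts[w] for w in words)
    if total == 0:
        raise ValueError("sequence contains no valid words of this length")
    return [counts[w] / total for w in words]


def cosine_similarity(p, q):
    """Cosine of the angle between two histograms of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    if norm == 0:
        raise ValueError("histograms must be non-zero")
    return dot / norm


if __name__ == "__main__":
    # Toy sequences standing in for chromosomes; not data from the paper.
    h1 = word_histogram("ACGTACGTTTGA", 2)
    h2 = word_histogram("ACGTTCGATTGA", 2)
    print(cosine_similarity(h1, h2))
```

A matrix of pairwise similarities between all chromosomes is the kind of input an MDS tool would take to produce the maps described above.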
Abstract:
This paper studies musical works from the point of view of three mathematical tools: entropy, pseudo phase plane (PPP), and multidimensional scaling (MDS). The experiments analyze ten sets of different musical styles. First, for each musical composition, the PPP is produced using the time series lags captured by the average mutual information. Second, the MDS technique is used to unravel hidden relationships between the musical styles. The MDS is calculated based on two alternative metrics obtained from the PPP, namely the average mutual information and the fractal dimension. The results reveal significant differences between the musical styles, demonstrating the feasibility of the proposed strategy and motivating further developments towards a dynamical analysis of musical sounds.
Abstract:
The paper formulates a genetic algorithm that evolves two types of objects in a plane. The fitness function promotes a relationship between the objects that is optimal when some kind of interface between them occurs. Furthermore, the algorithm adopts a hexagonal tessellation of the two-dimensional space to model neighbourhoods efficiently. The genetic algorithm produces special patterns that resemble those revealed in percolation phenomena or in the symbiosis found in lichens. Besides the analysis of the spatial layout, the time evolution is modelled by adopting a distance measure and by modelling in the Fourier domain from the perspective of fractional calculus. The results reveal a consistent and easy-to-interpret set of model parameters for distinct operating conditions.
Abstract:
This paper analyses forest fires from the perspective of dynamical systems. Forest fires exhibit complex correlations in size, space and time, revealing features often present in complex systems, such as the absence of a characteristic length-scale, or the emergence of long-range correlations and persistent memory. This study addresses a public domain forest fire catalogue containing information on events in Portugal during the period from 1980 to 2012. The data are analysed on an annual basis, modelling the occurrences as sequences of Dirac impulses with amplitude proportional to the burnt area. First, we consider mutual information to correlate annual patterns. We use visualization trees, generated by hierarchical clustering algorithms, to compare and extract relationships among the data. Second, we adopt the Multidimensional Scaling (MDS) visualization tool. MDS generates maps where each object corresponds to a point. Objects that are perceived to be similar to each other are placed close together on the map, forming clusters. The results are analysed in order to extract relationships among the data and to identify forest fire patterns.
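The use of mutual information to correlate annual patterns can be sketched as below. The abstract does not describe how the series are discretized, so the equal-width binning, the number of bins and both function names are assumptions made for illustration.

```python
from collections import Counter
import math


def discretize(values, bins):
    """Map continuous values to integer bin indices using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys):
    """Mutual information in bits between two equally long symbol sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # I(X;Y) = sum over observed pairs of p(x,y) * log2(p(x,y) / (p(x) p(y)))
    return sum((c / n) * math.log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


if __name__ == "__main__":
    # Hypothetical burnt-area series (arbitrary units) for two years.
    year_a = [0.0, 1.0, 8.0, 9.5, 0.5, 7.0]
    year_b = [0.2, 0.8, 9.0, 8.5, 1.0, 6.0]
    print(mutual_information(discretize(year_a, 2), discretize(year_b, 2)))
```

Computing this value for every pair of years gives a similarity matrix, which is one plausible input for the hierarchical clustering and MDS steps the abstract describes.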
Abstract:
In this paper we analyze the behavior of tornado time series in the U.S. from the perspective of dynamical systems. A tornado is a violently rotating column of air extending from a cumulonimbus cloud down to the ground. Such phenomena reveal features that are well described by power law functions and unveil characteristics found in systems with long-range memory effects. Tornado time series are viewed as the output of a complex system and are interpreted as a manifestation of its dynamics. Tornadoes are modeled as sequences of Dirac impulses with amplitude proportional to the event size. First, a collection of time series spanning 64 years is analyzed in the frequency domain by means of the Fourier transform. The amplitude spectra are approximated by power law functions and their parameters are read as an underlying signature of the system dynamics. Second, the concept of circular time is adopted and the collective behavior of tornadoes is analyzed. Clustering techniques are then adopted to identify and visualize the emerging patterns.
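The spectral step can be illustrated with a minimal sketch: compute the amplitude spectrum of an impulse sequence, then fit a power law by least squares in log-log coordinates. The abstract does not specify the fitting procedure or frequency range, so both are assumptions here, as are the function names. The direct O(N²) transform is used only to keep the example dependency-free.

```python
import cmath
import math


def amplitude_spectrum(signal):
    """DFT magnitudes at frequency indices 1..N/2, as (index, magnitude) pairs."""
    n = len(signal)
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        spectrum.append((k, abs(coeff)))
    return spectrum


def fit_power_law(points):
    """Least-squares fit of y = a * f**b in log-log coordinates; returns (a, b)."""
    logs = [(math.log(f), math.log(y)) for f, y in points if f > 0 and y > 0]
    if len(logs) < 2:
        raise ValueError("need at least two positive points")
    n = len(logs)
    mean_x = sum(x for x, _ in logs) / n
    mean_y = sum(y for _, y in logs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in logs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in logs)
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


if __name__ == "__main__":
    # Hypothetical impulse train: event sizes at the time steps where they occur.
    events = [0, 3, 0, 0, 1, 0, 5, 0, 0, 0, 2, 0, 0, 4, 0, 1]
    print(fit_power_law(amplitude_spectrum(events)))
```

The pair (a, b) obtained for each time series plays the role of the signature mentioned in the abstract and could be passed to a clustering routine.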
Abstract:
This paper studies forest fires from the perspective of dynamical systems. Burnt area, precipitation and atmospheric temperatures are interpreted as state variables of a complex system and the correlations between them are investigated by means of different mathematical tools. First, we use mutual information to reveal potential relationships in the data. Second, we adopt the state space portrait to characterize the system’s behavior. Third, we compare the annual state space curves and apply clustering and visualization tools to unveil long-range patterns. We use forest fire data for Portugal, covering the years 1980–2003. The territory is divided into two regions (North and South), characterized by different climates and vegetation. The adopted methodology represents a new viewpoint in the context of forest fires, shedding light on a complex phenomenon that needs to be better understood in order to mitigate its devastating consequences, at both economic and environmental levels.
Abstract:
This paper studies the statistical distributions of worldwide earthquakes from 1963 to 2012. A Cartesian grid, dividing Earth into geographic regions, is considered. Entropy and the Jensen–Shannon divergence are used to analyze and compare real-world data. Hierarchical clustering and multidimensional scaling techniques are adopted for data visualization. Entropy-based indices have the advantage of leading to a single parameter expressing the relationships between the seismic data. Classical and generalized (fractional) entropy and Jensen–Shannon divergence are tested. The generalized measures lead to a clear identification of patterns embedded in the data and contribute to a better understanding of earthquake distributions.
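For reference, the classical Jensen–Shannon divergence between two discrete distributions can be computed as in the sketch below. Only the classical form is shown; the generalized (fractional) variants tested in the paper are not reproduced, and the function names and sample histograms are assumptions.

```python
import math


def shannon_entropy(p):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def jensen_shannon_divergence(p, q):
    """Classical JSD in bits: H((p + q) / 2) - (H(p) + H(q)) / 2."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    mixture = [(a + b) / 2 for a, b in zip(p, q)]
    return shannon_entropy(mixture) - (shannon_entropy(p) + shannon_entropy(q)) / 2


if __name__ == "__main__":
    # Hypothetical magnitude histograms for two grid cells, already normalized.
    cell_a = [0.70, 0.20, 0.08, 0.02]
    cell_b = [0.55, 0.30, 0.10, 0.05]
    print(jensen_shannon_divergence(cell_a, cell_b))
```

With base-2 logarithms the value lies between 0 (identical distributions) and 1 (disjoint supports), so it gives one number per pair of regions, in line with the single-parameter property noted in the abstract.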
Abstract:
Background: Hippocampal neurogenesis has been suggested as a downstream event of the mechanism of action of antidepressants (AD) and might explain the lag time between AD administration and the therapeutic effect. Despite the widespread use of AD in the context of Major Depressive Disorder (MDD), there are no reliable biomarkers of treatment response phenotypes, and a significant proportion of patients display Treatment Resistant Depression (TRD). The Fas/FasL system is one of the best-known death-receptor mediated cell signaling systems and is recognized to regulate cell proliferation and tumor cell growth. Recently this pathway has been described as being involved in neurogenesis and neuroplasticity.
Methods: Since the FAS -670A>G and FASL -844T>C functional polymorphisms have never been evaluated in the context of depression and antidepressant therapy, we genotyped FAS -670A>G and FASL -844T>C in a subset of 80 MDD patients to evaluate their role in antidepressant treatment response phenotypes.
Results: We found that the presence of the FAS -670G allele was associated with poor antidepressant prognosis (relapse or TRD: OR=6.200; 95% CI: [1.875–20.499]; p=0.001), and we observed that patients carrying this allele have a higher risk of developing TRD (OR=10.895; 95% CI: [1.362–87.135]; p=0.008). Moreover, multivariate analysis adjusted for potential confounders showed that patients carrying the G allele have a higher risk of early relapse (HR=3.827; 95% CI: [1.072–13.659]; p=0.039). FAS mRNA levels were down-regulated among G carriers, whose genotypes were more common in TRD patients. No association was found between the FASL -844T>C genetic polymorphism and any treatment phenotype.
Limitations: Small sample size. Patients used antidepressants with different mechanisms of action.
Conclusion: To the best of our knowledge, this is the first study to evaluate the role of the FAS functional polymorphism in the outcome of antidepressant therapy. This preliminary report associates the FAS -670A>G genetic polymorphism with Treatment Resistant Depression and with time to relapse. The current results may be explained by the recently recognized role of Fas in neurogenesis and/or neuroplasticity.
Abstract:
In this paper we study several natural and man-made complex phenomena from the perspective of dynamical systems. For each class of phenomena, the system outputs are time-series records obtained under identical conditions. The time series are viewed as manifestations of the system behavior and are processed to analyze the system dynamics. First, we use the Fourier transform to process the data and we approximate the amplitude spectra by means of power law (PL) functions. We interpret the PL parameters as a phenomenological signature of the system dynamics. Second, we adopt the techniques of non-hierarchical clustering and multidimensional scaling to visualize hidden relationships between the complex phenomena. Third, we propose a vector field based analogy to interpret the patterns unveiled by the PL parameters.
Abstract:
The last 40 years of the world economy are analyzed by means of computer visualization methods. Multidimensional scaling and the hierarchical clustering tree techniques are used. The current Western downturn in favor of Asian partners may still be reversed in the coming decades.
Abstract:
Atmospheric temperatures characterize Earth as a slow-dynamics spatiotemporal system, revealing long-memory and complex behavior. Temperature time series of 54 worldwide geographic locations are considered as representative of the Earth weather dynamics. These data are then interpreted as the time evolution of a set of state space variables describing a complex system. The data are analyzed by means of multidimensional scaling (MDS) and the fractional state space portrait (fSSP). A centennial perspective covering the period from 1910 to 2012 allows MDS to identify similarities among different locations on Earth. The multivariate mutual information is proposed to determine the “optimal” order of the time derivative for the fSSP representation. The fSSP emerges as a valuable alternative for visualizing system dynamics.