13 results for Two-step Cluster Analysis

at Instituto Politécnico do Porto, Portugal


Relevance: 100.00%

Abstract:

In this work, cluster analysis is applied to a real dataset of biological features of several Portuguese reservoirs. All statistical analysis is done using the R statistical software. Several distance metrics and clustering methods were explored, among them the combination of the Euclidean metric with Ward's hierarchical method. Although this combination was not the best in terms of internal and stability validation, it was still a good solution and gave results that were easy to interpret for the problem at hand.
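
For concreteness, a minimal R sketch of the combination retained above, Euclidean distance with Ward's hierarchical method (the data used here are a placeholder for the reservoir feature matrix, not the paper's dataset):

set.seed(1)
x <- scale(iris[, 1:4])  # placeholder for the matrix of biological features

# Euclidean distances combined with Ward's hierarchical method
tree <- hclust(dist(x, method = "euclidean"), method = "ward.D2")
clusters <- cutree(tree, k = 3)  # cut the dendrogram into 3 groups
table(clusters)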

Relevance: 100.00%

Abstract:

A methodology based on data mining techniques to support the analysis of zonal prices in real transmission networks is proposed in this paper. The methodology uses clustering algorithms to group the buses into typical classes, each containing a set of buses with similar locational marginal price (LMP) values. Two different clustering algorithms were used to determine the LMP clusters: the two-step and k-means algorithms. Adequacy measurement indices are used to evaluate the quality of each partition and to identify the best-performing algorithm. The paper includes a case study that uses an LMP database from the California ISO (CAISO) to identify zonal prices.
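
As an illustration of the clustering-plus-validation step, a hedged R sketch that groups buses by LMP value with k-means (the SPSS-style two-step algorithm has no base-R implementation, so only k-means is shown) and scores the partition with the average silhouette width, one common adequacy index; the LMP values are synthetic:

library(cluster)  # provides silhouette(); ships with standard R installations
set.seed(3)
lmp <- c(rnorm(40, 30, 2), rnorm(40, 45, 2), rnorm(20, 60, 3))  # synthetic bus LMPs ($/MWh)

km <- kmeans(matrix(lmp), centers = 3, nstart = 10)

# Adequacy index: mean silhouette width (closer to 1 = better-separated zones)
sil <- silhouette(km$cluster, dist(matrix(lmp)))
mean(sil[, "sil_width"])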

Relevance: 100.00%

Abstract:

The flow rates of drying and nebulizing gas, the heat block and desolvation line temperatures, and the interface voltage are potentially influential electrospray ionization parameters, as they may enhance the sensitivity of the mass spectrometer. The conditions giving the highest sensitivity for 13 pharmaceuticals were explored. First, a Plackett-Burman design was implemented to screen significant factors, and it was concluded that interface voltage and nebulizing gas flow were the only factors that influence the intensity signal for all pharmaceuticals. This fractional factorial design was then projected onto a full 2^2 factorial design with center points. The lack-of-fit test proved significant, so a central composite face-centered design was conducted. Finally, stepwise multiple linear regression and a subsequent optimization step were carried out. Two main drug clusters were found with respect to the signal intensities across all runs of the augmented factorial design. p-Aminophenol, salicylic acid, and nimesulide constitute one cluster, as they show much higher sensitivity than the remaining drugs. The other cluster is more homogeneous, with some sub-clusters comprising one pharmaceutical and its respective metabolite. Instrumental signal increased when both significant factors increased, with the maximum signal occurring when both coded factors are set at level +1. It was also found that, for most of the pharmaceuticals, interface voltage influences the instrument's intensity more than the nebulizing gas flow rate; the only exceptions are nimesulide, where the relative importance of the factors is reversed, and salicylic acid, where both factors influence the instrumental signal equally.
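
To make the screening-and-modelling sequence concrete, a hedged R sketch of a 2^2 full factorial in coded units with center points, followed by stepwise selection on the fitted model (the response values are invented for illustration and do not come from the paper):

# 2^2 full factorial in coded units (-1/+1) plus three center points
design <- data.frame(
  voltage = c(-1, 1, -1, 1, 0, 0, 0),  # coded interface voltage
  gas     = c(-1, -1, 1, 1, 0, 0, 0)   # coded nebulizing gas flow
)
design$signal <- c(210, 480, 300, 700, 400, 415, 395)  # invented intensities

# Main effects plus interaction, then stepwise selection by AIC
fit <- step(lm(signal ~ voltage * gas, data = design), trace = 0)
summary(fit)

# Predicted signal with both coded factors at level +1, where the paper
# reports the maximum instrumental signal
predict(fit, newdata = data.frame(voltage = 1, gas = 1))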

Relevance: 100.00%

Abstract:

Seismic data is difficult to analyze, and classical mathematical tools reveal strong limitations in exposing hidden relationships between earthquakes. In this paper, we study earthquake phenomena from the perspective of complex systems. Global seismic data covering the period from 1962 to 2011 is analyzed. The events, characterized by their magnitude, geographic location, and time of occurrence, are divided into groups, either according to the Flinn-Engdahl (F-E) seismic regions of the Earth or using a rectangular grid based on latitude and longitude coordinates. Two methods of analysis are considered and compared in this study. In the first method, the distributions of magnitudes are approximated by Gutenberg-Richter (G-R) distributions, and the fitted parameters are used to reveal the relationships among regions. In the second method, the mutual information is calculated and adopted as a measure of similarity between regions. In both cases, clustering analysis is used to generate visualization maps, providing an intuitive and useful representation of the complex relationships present in the seismic data. Such relationships might not be perceived on classical geographic maps, so the generated charts are a valid alternative to other visualization tools for understanding the global behavior of earthquakes.
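
As a sketch of the first method, a minimal R estimate of the Gutenberg-Richter parameters for one region, obtained by a least-squares fit of the cumulative counts (the catalogue below is synthetic; real studies often prefer maximum-likelihood b-value estimators):

set.seed(4)
mags <- round(rexp(2000, rate = log(10)) + 4.0, 1)  # synthetic catalogue with b close to 1

m <- seq(4.0, max(mags), by = 0.1)
logN <- log10(sapply(m, function(mm) sum(mags >= mm)))  # cumulative frequency

# Gutenberg-Richter law: log10 N(>= m) = a - b*m
fit <- lm(logN ~ m)
c(a = coef(fit)[[1]], b = -coef(fit)[[2]])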

Relevance: 100.00%

Abstract:

Beyond the classical statistical approaches (determination of basic statistics, regression analysis, ANOVA, etc.), a new set of applications of different statistical techniques has increasingly gained relevance in the analysis, processing, and interpretation of data on the characteristics of forest soils. This can be seen in recent publications in the context of multivariate statistics. These newer methods require additional care that is not always taken, or even referred to, in some approaches. In the particular case of geostatistical applications it is necessary, besides geo-referencing all data acquisition, to collect the samples on regular grids and in sufficient quantity, so that the variograms can reflect the spatial distribution of soil properties in a representative manner. As for the great majority of multivariate techniques (principal component analysis, correspondence analysis, cluster analysis, etc.), although in most cases they do not require the assumption of a normal distribution, they nevertheless demand a proper and rigorous strategy for their use. In this work, some reflections on these methodologies are presented, in particular on the main constraints that often arise during data collection and on the various ways these different techniques can be linked. Finally, illustrations of some particular applications of these statistical methods are also presented.
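
As one illustration of the multivariate techniques mentioned, a minimal R sketch of principal component analysis on a soil-property matrix (the data are invented; note that variables measured in different units must be standardized, hence scale. = TRUE):

set.seed(5)
soil <- data.frame(pH        = rnorm(30, 5.5, 0.4),
                   organic_C = rnorm(30, 2.1, 0.5),
                   total_N   = rnorm(30, 0.15, 0.03),
                   clay      = rnorm(30, 22, 6))  # invented soil properties

pca <- prcomp(soil, scale. = TRUE)
summary(pca)   # proportion of variance explained by each component
pca$rotation   # loadings: which soil properties drive each component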

Relevance: 100.00%

Abstract:

This study aims to optimize the water quality monitoring of a polluted watercourse (Leça River, Portugal) through principal component analysis (PCA) and cluster analysis (CA). These statistical methodologies were applied to physicochemical, bacteriological, and ecotoxicological data (with the marine bacterium Vibrio fischeri and the green alga Chlorella vulgaris) obtained from the analysis of water samples collected monthly at seven monitoring sites over five campaigns (February, May, June, August, and September 2006). The results of some variables were assigned to water quality classes according to national guidelines. Chemical and bacteriological quality data led to classifying the Leça River water quality as "bad" or "very bad". PCA and CA identified monitoring sites with similar pollution patterns, distinguishing site 1 (located in the upstream stretch of the river) from all the sampling sites downstream. Ecotoxicity results corroborated this classification, revealing differences in space and time. The present study includes not only physical, chemical, and bacteriological but also ecotoxicological parameters, which opens new perspectives in river water characterization. Moreover, the application of PCA and CA is very useful for optimizing water quality monitoring networks, defining the minimum number of sites and their location; these tools can thus support appropriate management decisions.
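
A hedged R sketch of the PCA-plus-cluster-analysis workflow used to compare monitoring sites (the seven-site matrix below is invented; rows are sites, columns are averaged water-quality variables):

set.seed(6)
sites <- matrix(rnorm(7 * 5), nrow = 7,
                dimnames = list(paste0("site", 1:7),
                                c("BOD", "NH4", "PO4", "coliforms", "toxicity")))

pca <- prcomp(sites, scale. = TRUE)  # extracts the dominant pollution gradients
scores <- pca$x[, 1:2]               # site coordinates on the first two components

# Cluster analysis on the leading scores groups sites with similar pollution
tree <- hclust(dist(scores), method = "ward.D2")
cutree(tree, k = 2)  # e.g. the upstream site versus the downstream group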

Relevance: 100.00%

Abstract:

While fractional calculus (FC) is as old as integer calculus, its application has been mainly restricted to mathematics. However, many real systems are better described by FC equations than by integer-order models. FC is a suitable tool for describing systems characterised by their fractal nature, long-term memory and chaotic behaviour. It is a promising methodology for failure analysis and modelling, since the behaviour of a failing system depends on factors that increase the model's complexity. This paper explores the proficiency of FC in modelling complex behaviour by tuning only a few parameters. The work proposes a novel two-step strategy for diagnosis: first, modelling common failure conditions and, second, comparing these models with real machine signals and using the difference to feed a computational classifier. Our proposal is validated using an electrical motor coupled with a mechanical gear reducer.
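
A hedged R sketch of the Grünwald-Letnikov approximation of a fractional derivative, the kind of FC building block on which such failure models rest (the signal and the order alpha are illustrative):

# Grünwald-Letnikov approximation for a signal x sampled with step h:
# D^alpha x(t) ~ h^(-alpha) * sum_k (-1)^k * choose(alpha, k) * x(t - k*h)
gl_deriv <- function(x, alpha, h) {
  n <- length(x)
  w <- (-1)^(0:(n - 1)) * choose(alpha, 0:(n - 1))  # GL weights (choose accepts real alpha)
  sapply(1:n, function(t) sum(w[1:t] * x[t:1]) / h^alpha)
}

h <- 0.01
t <- seq(0, 1, by = h)
gl_deriv(t, alpha = 0.5, h = h)[length(t)]  # half-derivative of f(t) = t at t = 1: ~2/sqrt(pi)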

Relevance: 100.00%

Abstract:

The Matosinhos Refinery is one of Galp Energia's industrial complexes. Its industrial wastewater treatment plant (ETARI), internally designated Unit 7000, comprises four treatment stages: pre-treatment, physico-chemical treatment, biological treatment and post-treatment. Given how these stages are interconnected, optimizing each of them is essential. The goals of this work were to identify problems and/or improvement opportunities in the pre-treatment, physico-chemical treatment and post-treatment, and above all to optimize the biological treatment of the ETARI. In the pre-treatment it was found that the separation of oils and sludges was not effective, since these two phases form emulsions. The addition of demulsifying agents was suggested as a solution, but proved economically unviable. As an alternative, techniques for treating the generated emulsion were suggested, such as solvent extraction, centrifugation, ultrasound and microwaves. In the physico-chemical treatment it was found that the dissolved-air saturation unit was controlled on the basis of the operators' visual inspection, which can lead to operating conditions far from the optimum for this treatment. An optimization study of this unit was therefore suggested, aiming to determine the optimal air/solids ratio for this effluent. Furthermore, coagulant consumption increased by about -- % over the last year, so a feasibility study of electrocoagulation as a replacement for the existing coagulation system was suggested. In the post-treatment, the filter-washing process was identified as the step that could be optimized. A preliminary study concluded that continuously washing one filter per shift improved filter performance. It was also found that introducing compressed air into the wash water promotes greater removal of debris from the sand bed; however, this practice seems to affect filter performance negatively. In the biological treatment, problems were identified with the hydraulic retention time of biological treatment II, which showed high variability; although identified, this problem proved difficult to solve. It was also found that dissolved oxygen was not monitored, so the installation of a dissolved-oxygen probe in a low-turbulence zone of the aeration tank was suggested. Oxygen was found to be homogeneously distributed throughout the aeration tank; an attempt was made to identify the factors influencing this parameter but, given the high variability of the effluent and of the treatment conditions, this was not possible. It was further found that phosphate dosing for biological treatment II was rather inefficient, since on -- % of the days low phosphate levels were observed in the mixed liquor (< - mg/L). It was therefore proposed to replace the current gravity dosing system with a dosing pump. In addition, the consumption of this nutrient increased significantly over the last year (about --%), a situation found to be related to an increase in the microbial population over this period.
It was possible to relate the frequent appearance of sludge at the surface of the secondary clarifiers to sudden increases in conductivity, so storing the effluent in the storm basins was suggested for these situations. Nitrogen removal was found to be practically ineffective, since the conversion of ammoniacal nitrogen into nitrates was very low; the use of bio-augmentation or the conversion of the activated-sludge system into a two-stage system was therefore suggested. Finally, the temperature of the effluent entering the ETARI was found to be rather high for biological treatment (approximately --º C), so the installation of a temperature probe in the aeration tank was suggested in order to control the mixed-liquor temperature more effectively. Still concerning the biological treatment, a set of tools aimed at its optimized operation was developed, and several improvement suggestions were presented: using the sludge volume index as an indicator of sludge quality instead of the sludge percentage; a set of flowcharts to guide field operators in troubleshooting; an "operating window" intended as a support guide for operation; and frequent monitoring of the sludge age and of the food-to-microorganism ratio, as sketched below.
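
As a hedged illustration of the two indicators proposed above, a minimal R sketch computing the sludge volume index (SVI) and the food-to-microorganism (F/M) ratio from their standard textbook definitions (all variable names and sample values are hypothetical):

# SVI (mL/g): 30-minute settled sludge volume (mL/L) over MLSS (g/L)
svi <- function(settled_mL_per_L, mlss_g_per_L) {
  settled_mL_per_L / mlss_g_per_L
}

# F/M (d^-1): influent BOD load over the biomass (MLVSS) in the aeration tank
fm_ratio <- function(flow_m3_d, bod_mg_L, tank_m3, mlvss_mg_L) {
  (flow_m3_d * bod_mg_L) / (tank_m3 * mlvss_mg_L)
}

svi(300, 2.5)                    # 120 mL/g: reasonable settling
fm_ratio(5000, 250, 4000, 3000)  # ~0.10 d^-1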

Relevance: 100.00%

Abstract:

A definition of medium voltage (MV) load diagrams was made, based on the knowledge discovery in databases (KDD) process. Clustering techniques were used to support agents in the electric power retail markets in obtaining specific knowledge of their customers' consumption habits. Each customer class resulting from the clustering operation is represented by its load diagram. The Two-step clustering algorithm and the WEACS approach based on evidence accumulation (EAC) were applied to electricity consumption data from a utility client's database in order to form the customer classes and to find a set of representative consumption patterns. The WEACS approach is a clustering ensemble combination approach that uses subsampling and weights the partitions differently in the co-association matrix. As a complementary step to the WEACS approach, all the final data partitions produced by the different variations of the method are combined, and the Ward Link algorithm is used to obtain the final data partition. Experimental results showed that the WEACS approach led to better accuracy than many other clustering approaches; in this study it also separated the customer population better than the Two-step clustering algorithm.
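
A minimal R sketch of the evidence-accumulation idea underlying WEACS (without the subsampling and partition weighting that are specific to WEACS): N k-means partitions vote into a co-association matrix, which is then cut with Ward-linkage hierarchical clustering. The data and parameter choices are placeholders:

set.seed(7)
x <- scale(iris[, 1:4])  # stand-in for the load-diagram feature matrix
n <- nrow(x); N <- 50
coassoc <- matrix(0, n, n)

# Evidence accumulation: every partition votes for the pairs it puts together
for (i in 1:N) {
  labels <- kmeans(x, centers = sample(2:6, 1))$cluster
  coassoc <- coassoc + outer(labels, labels, "==")
}
coassoc <- coassoc / N  # fraction of partitions in which each pair co-occurs

# Ward linkage on the induced dissimilarity yields the final partition
final <- cutree(hclust(as.dist(1 - coassoc), method = "ward.D2"), k = 3)
table(final)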

Relevance: 100.00%

Abstract:

With the electricity market liberalization, distribution and retail companies are looking for better market strategies based on adequate information about the consumption patterns of their electricity consumers. A fair insight into consumers' behavior will permit the definition of specific contract aspects based on the different consumption patterns. In order to form the different consumer classes and find a set of representative consumption patterns, we use electricity consumption data from a utility client's database and two approaches: the Two-step clustering algorithm and the WEACS approach based on evidence accumulation (EAC) for combining partitions in a clustering ensemble. While EAC uses a voting mechanism to produce a co-association matrix based on the pairwise associations obtained from N partitions, each partition having equal weight in the combination process, the WEACS approach uses subsampling and weights the partitions differently. As a complementary step to WEACS, we combine the partitions obtained in the WEACS approach with the ALL clustering ensemble construction method and use the Ward Link algorithm to obtain the final data partition. The characterization of the obtained consumer clusters was performed with the C5.0 classification algorithm. Experimental results showed that the WEACS approach leads to better results than many other clustering approaches.
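
To illustrate the characterization step, a sketch using the C50 package's C5.0 classifier to describe clusters with decision rules (the clustering and data here are placeholders, and C50 must be installed separately; rpart would be a base-R alternative):

# install.packages("C50")  # C5.0 decision trees
library(C50)
set.seed(8)

x <- scale(iris[, 1:4])  # stand-in for consumption-pattern features
cluster <- factor(kmeans(x, centers = 3)$cluster)

# A C5.0 tree predicting cluster membership from the features gives
# human-readable rules that characterize each consumer class
model <- C5.0(x = as.data.frame(x), y = cluster)
summary(model)  # prints the induced rule set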

Relevance: 100.00%

Abstract:

With the electricity market liberalization, distribution and retail companies are looking for better market strategies based on adequate information about the consumption patterns of their electricity customers. In this environment all consumers are free to choose their electricity supplier, and a fair insight into customers' behaviour will permit the definition of specific contract aspects based on the different consumption patterns. In this paper, Data Mining (DM) techniques are applied to electricity consumption data from a utility client's database. To form the different customer classes and find a set of representative consumption patterns, we have used the Two-Step algorithm, which combines a pre-clustering stage with hierarchical clustering. Each consumer class is represented by the load profile resulting from the clustering operation. Next, to characterize each consumer class, a classification model is constructed with the C5.0 classification algorithm.
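
A minimal R sketch of the two-stage idea behind the Two-Step algorithm (the real algorithm builds a CF-tree in its first pass; here, as a rough stand-in, k-means produces many small pre-clusters whose centres are then merged hierarchically; the data are placeholders):

set.seed(9)
x <- scale(iris[, 1:4])  # stand-in for the consumption data

# Stage 1: pre-cluster into many small groups (stand-in for the CF-tree pass)
pre <- kmeans(x, centers = 20, nstart = 5)

# Stage 2: hierarchical clustering of the pre-cluster centres
tree <- hclust(dist(pre$centers), method = "ward.D2")
groups <- cutree(tree, k = 3)

# Map each observation to its final class through its pre-cluster
final <- groups[pre$cluster]
table(final)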

Relevance: 100.00%

Abstract:

In recent decades, all over the world, competition in the electric power sector has deeply changed the way this sector's agents play their roles. In most countries, deregulation of the electric sector was conducted in stages, beginning with the clients of higher voltage levels and larger electricity consumption, and later extended to all electricity consumers. Sector liberalization and the operation of competitive electricity markets were expected to lower prices and improve quality of service, leading to greater consumer satisfaction. Transmission and distribution remain noncompetitive business areas, due to the large infrastructure investments required; however, the industry has yet to clearly establish the best business model for transmission in a competitive environment. After generation, electricity needs to be delivered to the electrical system nodes where demand requires it, taking into consideration transmission constraints and electrical losses. If the amount of power flowing through a certain line is close to or surpasses its safety limits, then cheap but distant generation might have to be replaced by more expensive, closer generation to reduce the exceeded power flows. In a congested area, the optimal price of electricity rises to the marginal cost of the local generation, or to the level needed to ration demand to the amount of available electricity. Even without congestion, some power is lost in the transmission system through heat dissipation, so prices reflect that it is more expensive to supply electricity at the far end of a heavily loaded line than close to a generation site. Locational marginal prices (LMPs), resulting from bidding competition, represent electrical and economic values at nodes or in areas and may provide economic indicator signals to the market agents. This article proposes a data-mining-based methodology that helps characterize zonal prices in real power transmission networks. To test our methodology, we used an LMP database from the California Independent System Operator (CAISO, a nonprofit public benefit corporation charged with operating the majority of California's high-voltage wholesale power grid) for 2009 to identify economic zones. To group the buses into typical classes, each representing a set of buses with similar LMP values, we used the two-step and k-means clustering algorithms. By analyzing the various LMP components, our goal was to extract knowledge to support the ISO in investment and network-expansion planning.
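
For reference, the standard decomposition that underlies the "various LMP components" mentioned above can be written as follows (the notation is assumed here, not taken from the article):

\[
  \mathrm{LMP}_i = \lambda_{\mathrm{energy}} + \lambda_{\mathrm{cong},\,i} + \lambda_{\mathrm{loss},\,i}
\]

where \lambda_{\mathrm{energy}} is the system-wide marginal energy price, while the congestion and loss components vary from bus to bus, so clustering buses by these components groups nodes that face similar network conditions.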

Relevance: 100.00%

Abstract:

Master's degree in Electrical Engineering – Electrical Power Systems