971 results for C-MEANS
Abstract:
In this paper a methodology for integrated multivariate monitoring and control of biological wastewater treatment plants during extreme events is presented. To monitor the process, on-line dynamic principal component analysis (PCA) is performed on the process data to extract the principal components that represent the underlying mechanisms of the process. Fuzzy c-means (FCM) clustering is used to classify the operational state. Performing clustering on the PCA scores reduces the computational burden and increases robustness through noise attenuation. The class-membership information from FCM is used to derive adequate control set points for the local control loops. The methodology is illustrated by a simulation study of a biological wastewater treatment plant, on which disturbances of various types are imposed. The results show that the methodology can be used to determine and co-ordinate control actions in order to shift the control objective and improve the effluent quality.
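The classification step described above can be sketched as follows. This is a minimal illustration of standard fuzzy c-means, not the authors' implementation: it assumes the PCA scores have already been computed, and the function name `fcm`, the sample `scores`, the fuzzifier `m = 2` and the stopping tolerance are all hypothetical choices made for the example.

```python
import math
import random


def fcm(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on a list of equal-length tuples.

    Returns (centers, memberships); memberships[i][k] is the degree to
    which point i belongs to cluster k, and each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])

    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u^m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)
            ))
        # Membership update from the ratios of distances to the centers.
        shift = 0.0
        for i in range(n):
            dist = [max(math.dist(points[i], centers[k]), 1e-12) for k in range(c)]
            new_row = [
                1.0 / sum((dist[k] / dist[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                for k in range(c)
            ]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:
            break
    return centers, u


# Hypothetical two-dimensional PCA scores from two operating regimes.
scores = [(0.1, 0.2), (0.0, -0.1), (-0.2, 0.1), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers, memberships = fcm(scores, c=2)
# Operational state = cluster with the highest membership degree.
states = [max(range(2), key=lambda k: row[k]) for row in memberships]
```

In a supervisory scheme of the kind the abstract describes, the membership rows (rather than the hard `states`) could be used to blend set points between operating regimes; how the authors do this is not specified in the abstract.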
Abstract:
Clustering is a very important task in data mining, image processing and pattern recognition. One of the most popular clustering algorithms is Fuzzy C-Means (FCM). This thesis proposes a new way of calculating the cluster centers within the FCM procedure, called ckMeans, and applies it to some variants of FCM, in particular those that use other distances. The goal of this change is to reduce the number of iterations and the processing time of these algorithms without affecting the quality of the partition, or even to improve the number of correct classifications in some cases. We also developed an algorithm based on ckMeans that handles interval data using interval membership degrees. This algorithm represents the data without converting intervals into point values, as other extensions of FCM that deal with interval data do. To validate the proposed methodologies, the ckMeans, K-Means and FCM algorithms were compared under three different distances (K-Means is included because the proposed method of calculating the centers is similar to it), using several well-known data sets. The results of Interval ckMeans were also compared with those of other clustering algorithms on an interval data set containing the minimum and maximum monthly temperatures for a given year in 37 cities distributed across continents.
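For orientation, the standard FCM updates that this thesis modifies can be written as below. The notation ($x_i$ for data points, $v_k$ for centers, $u_{ik}$ for memberships, $m > 1$ for the fuzzifier) is the conventional one and is an assumption here; the ckMeans center formula itself is not given in the abstract and is therefore not reproduced.

```latex
% Objective minimised by standard FCM (N points, c clusters, fuzzifier m > 1)
J_m = \sum_{i=1}^{N} \sum_{k=1}^{c} u_{ik}^{m} \, \lVert x_i - v_k \rVert^2,
\qquad \sum_{k=1}^{c} u_{ik} = 1 \ \text{for every } i

% Center update (the step ckMeans replaces)
v_k = \frac{\sum_{i=1}^{N} u_{ik}^{m} \, x_i}{\sum_{i=1}^{N} u_{ik}^{m}}

% Membership update
u_{ik} = \left[ \sum_{j=1}^{c}
  \left( \frac{\lVert x_i - v_k \rVert}{\lVert x_i - v_j \rVert} \right)^{\frac{2}{m-1}}
\right]^{-1}
```

Variants "that use other distances" swap the Euclidean norm in these expressions for another distance measure.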
Abstract:
Data clustering is applied in various fields such as data mining, image processing and pattern recognition. Clustering algorithms split a data set into clusters such that elements within the same cluster have a high degree of similarity, while elements belonging to different clusters have a high degree of dissimilarity. The Fuzzy C-Means algorithm (FCM) is the fuzzy clustering algorithm most used and discussed in the literature. The performance of FCM is strongly affected by the selection of the initial cluster centers, so choosing a good set of initial centers is very important. However, in FCM the initial centers are chosen randomly, which makes it difficult to find a good set. This paper proposes three new methods for obtaining the initial cluster centers deterministically for the FCM algorithm; they can also be used in variants of FCM. In this work these initialization methods were applied to the ckMeans variant. With the proposed methods, we intend to obtain a set of initial centers that are close to the real cluster centers. With these new initialization approaches, the aim is to reduce the number of iterations these algorithms need to converge, and their processing time, without affecting the quality of the clusters, or even to improve that quality in some cases. Accordingly, cluster validation indices were used to measure the quality of the clusters obtained by the modified FCM and ckMeans algorithms with the proposed initialization methods when applied to various data sets.
Abstract:
Salamanca has been considered among the most polluted cities in Mexico. The vehicle fleet, industry and emissions produced by agriculture, as well as orography and climatic characteristics, have contributed to the increase in the concentration of particulate matter less than 10 μm in diameter (PM10). In this work, a Multilayer Perceptron Neural Network has been used to predict the pollutant concentration one hour ahead. The database used to train the Neural Network consists of historical time series of meteorological variables (wind speed, wind direction, temperature and relative humidity) and air pollutant concentrations of PM10. Before the prediction, the Fuzzy c-Means clustering algorithm has been applied in order to find relationships among the pollutant and the meteorological variables. These relationships provide additional information that is used for the prediction. Our experiments with the proposed system show the importance of this set of meteorological variables for the prediction of PM10 concentrations, and the efficiency of the neural network. The performance is estimated using the Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). The results showed that the information obtained in the clustering step allows a prediction one hour ahead using data from the past 2 hours.
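The two error measures named in this abstract can be computed as in the short sketch below. The sample values are hypothetical hourly PM10 concentrations invented for illustration; they are not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root Mean Square Error: square root of the mean squared residual."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


def mae(observed, predicted):
    """Mean Absolute Error: mean of the absolute residuals."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical observed and one-hour-ahead predicted PM10 values.
observed = [40.0, 55.0, 62.0, 48.0]
predicted = [42.0, 50.0, 65.0, 48.0]

error_rmse = rmse(observed, predicted)
error_mae = mae(observed, predicted)
```

RMSE penalises large individual misses more heavily than MAE, which is one reason the two are commonly reported together.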
Abstract:
Salamanca, situated in the center of Mexico, is among the cities that suffer most from air pollution in the country. The vehicle fleet and industry, as well as orography and climatic characteristics, have contributed to the increase in the concentration of sulphur dioxide (SO2). In this work, a Multilayer Perceptron Neural Network has been used to predict the pollutant concentration one hour ahead. The database used to train the Neural Network consists of historical time series of meteorological variables and air pollutant concentrations of SO2. Before the prediction, the Fuzzy c-Means and K-means clustering algorithms have been applied in order to find relationships among the pollutant and the meteorological variables. Our experiments with the proposed system show the importance of this set of meteorological variables for the prediction of SO2 concentrations, and the efficiency of the neural network. The performance is estimated using the Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). The results showed that the information obtained in the clustering step allows a prediction one hour ahead using data from the past 2 hours.
Abstract:
Descriptions of vegetation communities are often based on vague semantic terms describing species presence and dominance. For this reason, some researchers advocate the use of fuzzy sets in the statistical classification of plant species data into communities. In this study, spatially referenced vegetation abundance values collected from Greek phrygana were analysed by ordination (DECORANA), and classified on the resulting axes using fuzzy c-means to yield a point data-set representing local memberships in characteristic plant communities. The fuzzy clusters matched vegetation communities noted in the field, which tended to grade into one another, rather than occupying discrete patches. The fuzzy set representation of the community exploited the strengths of detrended correspondence analysis while retaining richer information than a TWINSPAN classification of the same data. Thus, in the absence of phytosociological benchmarks, meaningful and manageable habitat information could be derived from complex, multivariate species data. We also analysed the influence of the reliability of different surveyors' field observations by multiple sampling at a selected sample location. We show that the impact of surveyor error was more severe in the Boolean than the fuzzy classification. © 2007 Springer.
Abstract:
Segmentation is an important step in many medical imaging applications and a variety of image segmentation techniques exist. One group of segmentation algorithms is based on clustering concepts. In this article we investigate several fuzzy c-means based clustering algorithms and their application to medical image segmentation. In particular we evaluate the conventional hard c-means (HCM) and fuzzy c-means (FCM) approaches as well as three computationally more efficient derivatives of fuzzy c-means: fast FCM with random sampling, fast generalised FCM, and a new anisotropic mean shift based FCM. © 2010 by IJTS, ISDER.
Abstract:
Virtually every sector of business and industry that uses computing, including financial analysis, search engines, and electronic commerce, incorporates Big Data analysis into its business model. Sophisticated clustering algorithms are popular for deducing the nature of data by assigning labels to unlabeled data. We address two main challenges in Big Data. First, by definition, the volume of Big Data is too large to be loaded into a computer’s memory (this volume changes based on the computer used or available, but there is always a data set that is too large for any computer). Second, in real-time applications, the velocity of new incoming data prevents historical data from being stored and future data from being accessed. Therefore, we propose our Streaming Kernel Fuzzy c-Means (stKFCM) algorithm, which significantly reduces both computational complexity and space complexity. The proposed stKFCM requires only O(n^2) memory, where n is the (predetermined) size of a data subset (or data chunk) at each time step, which makes this algorithm truly scalable (as n can be chosen based on the available memory). Furthermore, only 2n^2 elements of the full N × N kernel matrix (where N >> n) need to be calculated at each time step, which reduces both the computation time for producing the kernel elements and the complexity of the FCM algorithm. Empirical results show that stKFCM, even with relatively small n, can provide clustering performance as accurate as kernel fuzzy c-means run on the entire data set while achieving a significant speedup.
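The memory argument can be made concrete with a small sketch. It rests on an assumption that the abstract does not confirm: that the 2n^2 kernel elements per time step are the n × n block within the current chunk plus an n × n block between the current chunk and the previous one. The Gaussian kernel, the function names and the sample data are likewise hypothetical, and the clustering update itself is omitted.

```python
import math


def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) kernel between two points given as tuples."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def kernel_block(rows, cols, gamma=0.5):
    """Kernel values between every point in rows and every point in cols."""
    return [[rbf(r, c, gamma) for c in cols] for r in rows]


def stream_kernel_blocks(stream, n):
    """Yield (K_current_current, K_current_previous) for each chunk of size n.

    Only the current and previous chunks are held at any time, so at most
    2 * n * n kernel elements are computed per step (assumed reading).
    """
    previous = None
    for start in range(0, len(stream), n):
        chunk = stream[start:start + n]
        k_cc = kernel_block(chunk, chunk)
        k_cp = kernel_block(chunk, previous) if previous is not None else None
        yield k_cc, k_cp
        previous = chunk


points = [(float(i), 0.0) for i in range(8)]  # N = 8 hypothetical samples
n = 2                                         # chunk size
per_step = []
for k_cc, k_cp in stream_kernel_blocks(points, n):
    count = sum(len(row) for row in k_cc)
    if k_cp is not None:
        count += sum(len(row) for row in k_cp)
    per_step.append(count)
```

Here no step computes more than 2n^2 = 8 kernel elements, against the 64 elements of the full 8 × 8 kernel matrix; the gap widens quickly as N grows with n fixed.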
Abstract:
PURPOSE: To objectively characterize different heart tissues from functional and viability images provided by composite-strain-encoding (C-SENC) MRI. MATERIALS AND METHODS: C-SENC is a new MRI technique for simultaneously acquiring cardiac functional and viability images. In this work, an unsupervised multi-stage fuzzy clustering method is proposed to identify different heart tissues in the C-SENC images. The method is based on sequential application of the fuzzy c-means (FCM) and iterative self-organizing data (ISODATA) clustering algorithms. The proposed method is tested on simulated heart images and on images from nine patients with and without myocardial infarction (MI). The resulting clustered images are compared with MRI delayed-enhancement (DE) viability images for determining MI. Also, Bland-Altman analysis is conducted between the two methods. RESULTS: Normal myocardium, infarcted myocardium, and blood are correctly identified using the proposed method. The clustered images correctly identified 90 +/- 4% of the pixels defined as infarct in the DE images. In addition, 89 +/- 5% of the pixels defined as infarct in the clustered images were also defined as infarct in DE images. The Bland-Altman results show no bias between the two methods in identifying MI. CONCLUSION: The proposed technique allows for objectively identifying divergent heart tissues, which would be potentially important for clinical decision-making in patients with MI.
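A Bland-Altman comparison of two measurement methods reduces to the mean difference (the bias) and the 95% limits of agreement. The sketch below assumes paired per-patient measurements; the values are hypothetical infarct sizes invented for illustration, not data from the study, and the function name is an assumption.

```python
import statistics


def bland_altman(method_a, method_b):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias is the mean of the paired differences; the limits of agreement
    are bias -/+ 1.96 sample standard deviations of those differences.
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical infarct size (% of myocardium) from the two methods.
clustered = [12.0, 20.0, 8.0, 15.0, 25.0]
delayed_enhancement = [11.0, 21.0, 9.0, 14.0, 25.0]

bias, lower, upper = bland_altman(clustered, delayed_enhancement)
```

A bias close to zero with narrow limits is what "no bias between the two methods" refers to in a Bland-Altman analysis.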
Abstract:
Large parity-violating longitudinal single-spin asymmetries $A_L^{e^+} = 0.86^{+0.30}_{-0.14}$ and $A_L^{e^-} = 0.88^{+0.12}_{-0.71}$ are observed for inclusive high transverse momentum electrons and positrons in polarized $p+p$ collisions at a center-of-mass energy of $\sqrt{s} = 500$ GeV with the PHENIX detector at RHIC. These $e^{\pm}$ come mainly from the decay of $W^{\pm}$ and $Z^0$ bosons, and their asymmetries directly demonstrate parity violation in the couplings of the $W^{\pm}$ to the light quarks. The observed electron and positron yields were used to estimate $W^{\pm}$ boson production cross sections for the $e^{\pm}$ channels of $\sigma(pp \to W^+ X) \times \mathrm{BR}(W^+ \to e^+ \nu_e) = 144.1 \pm 21.2\,(\mathrm{stat})\,^{+3.4}_{-10.3}\,(\mathrm{syst}) \pm 21.6\,(\mathrm{norm})$ pb and $\sigma(pp \to W^- X) \times \mathrm{BR}(W^- \to e^- \bar{\nu}_e) = 31.7 \pm 12.1\,(\mathrm{stat})\,^{+10.1}_{-8.2}\,(\mathrm{syst}) \pm 4.8\,(\mathrm{norm})$ pb.
Abstract:
This paper presents the design and implementation of an embedded soft sensor, i.e., a generic and autonomous hardware module, which can be applied to many complex plants, wherein a certain variable cannot be directly measured. It is implemented based on a fuzzy identification algorithm called "Limited Rules", employed to model continuous nonlinear processes. The fuzzy model has a Takagi-Sugeno-Kang structure and the premise parameters are defined based on the Fuzzy C-Means (FCM) clustering algorithm. The firmware contains the soft sensor and it runs online, estimating the target variable from other available variables. Tests have been performed using a simulated pH neutralization plant. The results of the embedded soft sensor have been considered satisfactory. A complete embedded inferential control system is also presented, including a soft sensor and a PID controller. (c) 2007, ISA. Published by Elsevier Ltd. All rights reserved.
Abstract:
In this paper an approach to extreme event control in wastewater treatment plant operation by use of automatic supervisory control is discussed. The framework presented is based on the fact that different operational conditions manifest themselves as clusters in a multivariate measurement space. These clusters are identified and linked to specific and corresponding events by use of principal component analysis and fuzzy c-means clustering. A reduced system model is assigned to each type of extreme event and used to calculate appropriate local controller set points. In earlier work we have shown that this approach is applicable to wastewater treatment control using look-up tables to determine current set points. In this work we focus on the automatic determination of appropriate set points by use of steady state and dynamic predictions. The performance of a relatively simple steady-state supervisory controller is compared with that of a model predictive supervisory controller. A look-up table approach is also included in the comparison, as it provides a simple and robust alternative to the steady-state and model predictive controllers. The methodology is illustrated in a simulation study.
Abstract:
This work aimed to use fuzzy logic to generate management zones in agricultural and environmental settings. One application used the fuzzy C-means method to generate management zones for papaya cultivation in a commercial plantation located in São Mateus-ES, based on soil sampling and chemical analyses, considering the attributes P, K, Ca, Mg and base saturation (V%). Fuzzy logic was also applied to develop and run a decision-support procedure involving multicriteria analysis, generating maps of suitability for public use and for conservation in the Cachoeira da Fumaça State Park (Parque Estadual da Cachoeira da Fumaça), in the municipality of Alegre-ES. The factors considered were the location of the waterfall, land use, water resources, trails, access points, infrastructure and the slope of the area, and a Geographic Information System approach was used to analyse and combine the database. The management zones generated made it possible to explain the spatial variability of the soil attributes in the papaya study area. The similarity between zones generated from different attributes varied, but the data were influenced mainly by the attributes P and V. The zoning of the Conservation Unit made it possible to select the areas most suitable for ecotourism, which were found near the waterfall and along trails in reforestation and Atlantic Forest zones. The areas most in need of conservation measures are located near the waterfall and the park facilities, owing to the greater anthropic pressure on these sites. Pasture areas also stood out, because they are at a stage of natural regeneration.
The results indicate areas with the same papaya production potential or, in the environmental application, areas that require greater care for ecotourism use and for preservation; they provide a basis for decision-making aimed at better use of the area.