937 resultados para Agrupamento de dados. Fuzzy C-Means. Inicialização dos centros de grupos. Índices de validação


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Data clustering is applied to various fields such as data mining, image processing and pattern recognition technique. Clustering algorithms splits a data set into clusters such that elements within the same cluster have a high degree of similarity, while elements belonging to different clusters have a high degree of dissimilarity. The Fuzzy C-Means Algorithm (FCM) is a fuzzy clustering algorithm most used and discussed in the literature. The performance of the FCM is strongly affected by the selection of the initial centers of the clusters. Therefore, the choice of a good set of initial cluster centers is very important for the performance of the algorithm. However, in FCM, the choice of initial centers is made randomly, making it difficult to find a good set. This paper proposes three new methods to obtain initial cluster centers, deterministically, the FCM algorithm, and can also be used in variants of the FCM. In this work these initialization methods were applied in variant ckMeans.With the proposed methods, we intend to obtain a set of initial centers which are close to the real cluster centers. With these new approaches startup if you want to reduce the number of iterations to converge these algorithms and processing time without affecting the quality of the cluster or even improve the quality in some cases. Accordingly, cluster validation indices were used to measure the quality of the clusters obtained by the modified FCM and ckMeans algorithms with the proposed initialization methods when applied to various data sets

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Clustering data is a very important task in data mining, image processing and pattern recognition problems. One of the most popular clustering algorithms is the Fuzzy C-Means (FCM). This thesis proposes to implement a new way of calculating the cluster centers in the procedure of FCM algorithm which are called ckMeans, and in some variants of FCM, in particular, here we apply it for those variants that use other distances. The goal of this change is to reduce the number of iterations and processing time of these algorithms without affecting the quality of the partition, or even to improve the number of correct classifications in some cases. Also, we developed an algorithm based on ckMeans to manipulate interval data considering interval membership degrees. This algorithm allows the representation of data without converting interval data into punctual ones, as it happens to other extensions of FCM that deal with interval data. In order to validate the proposed methodologies it was made a comparison between a clustering for ckMeans, K-Means and FCM algorithms (since the algorithm proposed in this paper to calculate the centers is similar to the K-Means) considering three different distances. We used several known databases. In this case, the results of Interval ckMeans were compared with the results of other clustering algorithms when applied to an interval database with minimum and maximum temperature of the month for a given year, referring to 37 cities distributed across continents

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper a methodology for integrated multivariate monitoring and control of biological wastewater treatment plants during extreme events is presented. To monitor the process, on-line dynamic principal component analysis (PCA) is performed on the process data to extract the principal components that represent the underlying mechanisms of the process. Fuzzy c-means (FCM) clustering is used to classify the operational state. Performing clustering on scores from PCA solves computational problems as well as increases robustness due to noise attenuation. The class-membership information from FCM is used to derive adequate control set points for the local control loops. The methodology is illustrated by a simulation study of a biological wastewater treatment plant, on which disturbances of various types are imposed. The results show that the methodology can be used to determine and co-ordinate control actions in order to shift the control objective and improve the effluent quality.

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Salamanca has been considered among the most polluted cities in Mexico. The vehicular park, the industry and the emissions produced by agriculture, as well as orography and climatic characteristics have propitiated the increment in pollutant concentration of Particulate Matter less than 10 μg/m3 in diameter (PM10). In this work, a Multilayer Perceptron Neural Network has been used to make the prediction of an hour ahead of pollutant concentration. A database used to train the Neural Network corresponds to historical time series of meteorological variables (wind speed, wind direction, temperature and relative humidity) and air pollutant concentrations of PM10. Before the prediction, Fuzzy c-Means clustering algorithm have been implemented in order to find relationship among pollutant and meteorological variables. These relationship help us to get additional information that will be used for predicting. Our experiments with the proposed system show the importance of this set of meteorological variables on the prediction of PM10 pollutant concentrations and the neural network efficiency. The performance estimation is determined using the Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). The results shown that the information obtained in the clustering step allows a prediction of an hour ahead, with data from past 2 hours

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Salamanca, situated in center of Mexico is among the cities which suffer most from the air pollution in Mexico. The vehicular park and the industry, as well as orography and climatic characteristics have propitiated the increment in pollutant concentration of Sulphur Dioxide (SO2). In this work, a Multilayer Perceptron Neural Network has been used to make the prediction of an hour ahead of pollutant concentration. A database used to train the Neural Network corresponds to historical time series of meteorological variables and air pollutant concentrations of SO2. Before the prediction, Fuzzy c-Means and K-means clustering algorithms have been implemented in order to find relationship among pollutant and meteorological variables. Our experiments with the proposed system show the importance of this set of meteorological variables on the prediction of SO2 pollutant concentrations and the neural network efficiency. The performance estimation is determined using the Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). The results showed that the information obtained in the clustering step allows a prediction of an hour ahead, with data from past 2 hours.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Descriptions of vegetation communities are often based on vague semantic terms describing species presence and dominance. For this reason, some researchers advocate the use of fuzzy sets in the statistical classification of plant species data into communities. In this study, spatially referenced vegetation abundance values collected from Greek phrygana were analysed by ordination (DECORANA), and classified on the resulting axes using fuzzy c-means to yield a point data-set representing local memberships in characteristic plant communities. The fuzzy clusters matched vegetation communities noted in the field, which tended to grade into one another, rather than occupying discrete patches. The fuzzy set representation of the community exploited the strengths of detrended correspondence analysis while retaining richer information than a TWINSPAN classification of the same data. Thus, in the absence of phytosociological benchmarks, meaningful and manageable habitat information could be derived from complex, multivariate species data. We also analysed the influence of the reliability of different surveyors' field observations by multiple sampling at a selected sample location. We show that the impact of surveyor error was more severe in the Boolean than the fuzzy classification. © 2007 Springer.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Segmentation is an important step in many medical imaging applications and a variety of image segmentation techniques exist. One group of segmentation algorithms is based on clustering concepts. In this article we investigate several fuzzy c-means based clustering algorithms and their application to medical image segmentation. In particular we evaluate the conventional hard c-means (HCM) and fuzzy c-means (FCM) approaches as well as three computationally more efficient derivatives of fuzzy c-means: fast FCM with random sampling, fast generalised FCM, and a new anisotropic mean shift based FCM. © 2010 by IJTS, ISDER.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Virtually every sector of business and industry that uses computing, including financial analysis, search engines, and electronic commerce, incorporate Big Data analysis into their business model. Sophisticated clustering algorithms are popular for deducing the nature of data by assigning labels to unlabeled data. We address two main challenges in Big Data. First, by definition, the volume of Big Data is too large to be loaded into a computer’s memory (this volume changes based on the computer used or available, but there is always a data set that is too large for any computer). Second, in real-time applications, the velocity of new incoming data prevents historical data from being stored and future data from being accessed. Therefore, we propose our Streaming Kernel Fuzzy c-Means (stKFCM) algorithm, which reduces both computational complexity and space complexity significantly. The proposed stKFCM only requires O(n2) memory where n is the (predetermined) size of a data subset (or data chunk) at each time step, which makes this algorithm truly scalable (as n can be chosen based on the available memory). Furthermore, only 2n2 elements of the full N × N (where N >> n) kernel matrix need to be calculated at each time-step, thus reducing both the computation time in producing the kernel elements and also the complexity of the FCM algorithm. Empirical results show that stKFCM, even with relatively very small n, can provide clustering performance as accurately as kernel fuzzy c-means run on the entire data set while achieving a significant speedup.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Symbolic Data Analysis (SDA) main aims to provide tools for reducing large databases to extract knowledge and provide techniques to describe the unit of such data in complex units, as such, interval or histogram. The objective of this work is to extend classical clustering methods for symbolic interval data based on interval-based distance. The main advantage of using an interval-based distance for interval-based data lies on the fact that it preserves the underlying imprecision on intervals which is usually lost when real-valued distances are applied. This work includes an approach allow existing indices to be adapted to interval context. The proposed methods with interval-based distances are compared with distances punctual existing literature through experiments with simulated data and real data interval

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Trabalho apresentado no âmbito do Mestrado em Engenharia Informática, como requisito parcial para obtenção do grau de Mestre em Engenharia Informática

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Este trabalho teve como objetivo utilizar a lógica fuzzy para geração de zonas de manejo, na área agrária e ambiental. Uma das aplicações consistiu da utilização do método fuzzy C-means, para geração de zonas de manejo para a cultura do mamoeiro, em um plantio comercial localizado em São Mateus-ES, com base em determinações realizadas através de amostragens e análises químicas do solo, considerando os atributos: P, K, Ca, Mg, e Saturação por bases (V%). Aplicou-se também a lógica fuzzy para desenvolver e executar um procedimento para dar suporte ao processo de tomada de decisões, envolvendo análise multicritério, gerando mapas de adequabilidade ao uso público e a conservação no Parque Estadual da Cachoeira da Fumaça, no município de Alegre-ES, considerando como fatores a localização da cachoeira, o uso do solo, os recursos hídricos, as trilhas, os locais de acessos, a infraestrutura, a declividade da área, e utilizando a abordagem de Sistema de Informações Geográficas para análise e combinação da base de dados. A partir das zonas de manejo geradas, foi possível explicar a variabilidade espacial dos atributos do solo na área de estudo da cultura do mamoeiro, e observa-se que as similaridades entre as zonas geradas, a partir de diferentes atributos, mostrou variação, mas observa-se uma influência nos dados, principalmente pelos atributos P e V. A partir do zoneamento da Unidade de Conservação foi possível selecionar áreas mais aptas ao ecoturismo, sendo encontradas próximas da cachoeira, trilhas em zonas de reflorestamento e de Mata Atlântica. Quanto às áreas propensas a medidas de conservação localizam-se próximas à cachoeira e às estruturas do parque, devido à maior pressão antrópica exercida nesses locais. Outras áreas que se destacaram, foram as áreas de pastagem, por estarem em estágio de regeneração natural. Os resultados indicam áreas de mesmo potencial de produção do mamoeiro, ou quando aplicado à área ambiental, áreas que devem receber maior cuidado para utilização por ecoturismo e para preservação e servem de base para a tomada de decisões, visando melhor aproveitamento da área.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Mestrado em Engenharia Informática

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Com a crescente geração, armazenamento e disseminação da informação nos últimos anos, o anterior problema de falta de informação transformou-se num problema de extracção do conhecimento útil a partir da informação disponível. As representações visuais da informação abstracta têm sido utilizadas para auxiliar a interpretação os dados e para revelar padrões de outra forma escondidos. A visualização de informação procura aumentar a cognição humana aproveitando as capacidades visuais humanas, de forma a tornar perceptível a informação abstracta, fornecendo os meios necessários para que um humano possa absorver quantidades crescentes de informação, com as suas capacidades de percepção. O objectivo das técnicas de agrupamento de dados consiste na divisão de um conjunto de dados em vários grupos, em que dados semelhantes são colocados no mesmo grupo e dados dissemelhantes em grupos diferentes. Mais especificamente, o agrupamento de dados com restrições tem o intuito de incorporar conhecimento a priori no processo de agrupamento de dados, com o objectivo de aumentar a qualidade do agrupamento de dados e, simultaneamente, encontrar soluções apropriadas a tarefas e interesses específicos. Nesta dissertação é estudado a abordagem de Agrupamento de Dados Visual Interactivo que permite ao utilizador, através da interacção com uma representação visual da informação, incorporar o seu conhecimento prévio acerca do domínio de dados, de forma a influenciar o agrupamento resultante para satisfazer os seus objectivos. Esta abordagem combina e estende técnicas de visualização interactiva de informação, desenho de grafos de forças direccionadas e agrupamento de dados com restrições. Com o propósito de avaliar o desempenho de diferentes estratégias de interacção com o utilizador, são efectuados estudos comparativos utilizando conjuntos de dados sintéticos e reais.