859 results for Fuzzy c-means algorithm
Abstract:
Data clustering is applied in various fields, such as data mining, image processing and pattern recognition. Clustering algorithms split a data set into clusters such that elements within the same cluster have a high degree of similarity, while elements belonging to different clusters have a high degree of dissimilarity. The Fuzzy C-Means (FCM) algorithm is one of the most used and discussed fuzzy clustering algorithms in the literature. The performance of FCM is strongly affected by the selection of the initial cluster centers, so choosing a good set of initial centers is very important. However, in FCM the initial centers are chosen randomly, which makes it difficult to find a good set. This paper proposes three new methods for obtaining initial cluster centers deterministically for the FCM algorithm; they can also be used in variants of FCM. In this work, these initialization methods were applied to the ckMeans variant. With the proposed methods, we intend to obtain a set of initial centers that are close to the real cluster centers. With these new initialization approaches, we aim to reduce the number of iterations these algorithms need to converge and their processing time without affecting the quality of the clustering, or even to improve the quality in some cases. Accordingly, cluster validation indices were used to measure the quality of the clusters obtained by the modified FCM and ckMeans algorithms with the proposed initialization methods when applied to various data sets.
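To make the role of the initial centers concrete, the following is a minimal sketch of the standard FCM iteration in plain Python. It is not the paper's method: the three proposed initialization methods are not described in the abstract, so the function simply takes the initial centers as an argument. The function name `fcm`, the fuzzifier default `m=2.0`, the stopping tolerance, and the toy data are all illustrative assumptions.

```python
import math

def fcm(points, centers, m=2.0, max_iter=100, tol=1e-6):
    """Standard fuzzy c-means, started from caller-supplied centers."""
    centers = [list(c) for c in centers]
    dim = len(points[0])
    for iteration in range(1, max_iter + 1):
        # Membership update: u[i][j] is the degree of point i in cluster j.
        u = []
        for p in points:
            d = [math.dist(p, c) for c in centers]
            if 0.0 in d:
                # The point coincides with a center: full membership there.
                row = [1.0 if dj == 0.0 else 0.0 for dj in d]
                s = sum(row)
                row = [r / s for r in row]
            else:
                inv = [dj ** (-2.0 / (m - 1.0)) for dj in d]
                s = sum(inv)
                row = [v / s for v in inv]
            u.append(row)
        # Center update: mean of the points weighted by u^m.
        new_centers = []
        for j in range(len(centers)):
            w = [row[j] ** m for row in u]
            total = sum(w)
            new_centers.append(
                [sum(wi * p[k] for wi, p in zip(w, points)) / total
                 for k in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    # u holds the memberships computed in the last iteration.
    return centers, u, iteration

# Toy data (made up): two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
centers, memberships, n_iter = fcm(points, centers=[(0, 0), (10, 10)])
print(centers, n_iter)
```

Because the starting centers are an explicit argument, any deterministic initialization scheme can be plugged in and its effect on `n_iter` compared against a random start.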
Abstract:
Clustering data is a very important task in data mining, image processing and pattern recognition problems. One of the most popular clustering algorithms is Fuzzy C-Means (FCM). This thesis proposes a new way of calculating the cluster centers in the FCM procedure, called ckMeans, and applies it to some variants of FCM, in particular those that use other distances. The goal of this change is to reduce the number of iterations and the processing time of these algorithms without affecting the quality of the partition, or even to improve the number of correct classifications in some cases. We also developed an algorithm based on ckMeans to handle interval data using interval membership degrees. This algorithm allows the data to be represented without converting interval data into point data, as happens in other extensions of FCM that deal with interval data. To validate the proposed methodologies, the ckMeans, K-Means and FCM algorithms were compared (since the algorithm proposed in this work to calculate the centers is similar to K-Means) considering three different distances. We used several well-known databases. For the interval data, the results of Interval ckMeans were compared with the results of other clustering algorithms when applied to an interval database containing the monthly minimum and maximum temperatures for a given year for 37 cities distributed across continents.
Abstract:
Salamanca has been considered among the most polluted cities in Mexico. The vehicle fleet, industry and the emissions produced by agriculture, as well as orography and climatic characteristics, have contributed to the increase in the concentration of particulate matter less than 10 μm in diameter (PM10). In this work, a Multilayer Perceptron Neural Network has been used to predict the pollutant concentration one hour ahead. The database used to train the neural network consists of historical time series of meteorological variables (wind speed, wind direction, temperature and relative humidity) and PM10 air pollutant concentrations. Before the prediction, the Fuzzy c-Means clustering algorithm has been applied in order to find relationships among pollutant and meteorological variables. These relationships provide additional information that is used for prediction. Our experiments with the proposed system show the importance of this set of meteorological variables for the prediction of PM10 pollutant concentrations and the efficiency of the neural network. Performance is estimated using the Root Mean Square Error (RMSE) and the Mean Absolute Error (MAE). The results show that the information obtained in the clustering step allows a prediction one hour ahead using data from the past 2 hours.
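For reference, the two error measures named above can be computed as in the short sketch below. The function names and the sample concentration values are made up for illustration and do not come from the study.

```python
import math

def rmse(observed, predicted):
    """Root Mean Square Error between two equally long sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mae(observed, predicted):
    """Mean Absolute Error between two equally long sequences."""
    absolute = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)

# Hypothetical hourly PM10 values (observed vs. predicted), illustration only.
observed = [40.0, 55.0, 60.0, 48.0]
predicted = [42.0, 50.0, 63.0, 48.0]
print(rmse(observed, predicted), mae(observed, predicted))
```

RMSE penalizes large errors more heavily than MAE, which is one reason the two are often reported together.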
Abstract:
Virtually every sector of business and industry that uses computing, including financial analysis, search engines, and electronic commerce, incorporates Big Data analysis into its business model. Sophisticated clustering algorithms are popular for deducing the nature of data by assigning labels to unlabeled data. We address two main challenges in Big Data. First, by definition, the volume of Big Data is too large to be loaded into a computer’s memory (this volume changes based on the computer used or available, but there is always a data set that is too large for any computer). Second, in real-time applications, the velocity of new incoming data prevents historical data from being stored and future data from being accessed. Therefore, we propose our Streaming Kernel Fuzzy c-Means (stKFCM) algorithm, which significantly reduces both computational complexity and space complexity. The proposed stKFCM only requires O(n^2) memory, where n is the (predetermined) size of a data subset (or data chunk) at each time step, which makes this algorithm truly scalable (as n can be chosen based on the available memory). Furthermore, only 2n^2 elements of the full N × N (where N >> n) kernel matrix need to be calculated at each time step, thus reducing both the computation time in producing the kernel elements and the complexity of the FCM algorithm. Empirical results show that stKFCM, even with relatively small n, can provide clustering performance comparable to that of kernel fuzzy c-means run on the entire data set while achieving a significant speedup.
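The scale of the saving can be illustrated with simple arithmetic. The sketch below only counts kernel elements using the figures quoted in the abstract (N × N for the full matrix, 2n^2 per time step); the values of N and n, the function names, and the assumption that the stream is processed in N/n non-overlapping chunks are illustrative and not taken from the paper.

```python
def full_kernel_elements(N):
    """Elements in the full N x N kernel matrix."""
    return N * N

def streaming_kernel_elements_per_step(n):
    """Kernel elements computed per time step, as quoted: 2 * n^2."""
    return 2 * n * n

# Hypothetical sizes: one million points processed in chunks of one thousand.
N, n = 1_000_000, 1_000
steps = N // n  # assumes non-overlapping chunks

full = full_kernel_elements(N)
per_step = streaming_kernel_elements_per_step(n)
over_stream = steps * per_step

print(f"full kernel matrix:   {full:,} elements")
print(f"per time step:        {per_step:,} elements")
print(f"over the whole stream: {over_stream:,} elements")
```

Under these assumptions the total computed over the stream is 2nN elements, so the reduction factor relative to the full matrix is N / (2n).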
Abstract:
In this paper a methodology for integrated multivariate monitoring and control of biological wastewater treatment plants during extreme events is presented. To monitor the process, on-line dynamic principal component analysis (PCA) is performed on the process data to extract the principal components that represent the underlying mechanisms of the process. Fuzzy c-means (FCM) clustering is used to classify the operational state. Performing clustering on scores from PCA solves computational problems as well as increases robustness due to noise attenuation. The class-membership information from FCM is used to derive adequate control set points for the local control loops. The methodology is illustrated by a simulation study of a biological wastewater treatment plant, on which disturbances of various types are imposed. The results show that the methodology can be used to determine and co-ordinate control actions in order to shift the control objective and improve the effluent quality.
Abstract:
Salamanca, situated in the center of Mexico, is among the cities that suffer most from air pollution in the country. The vehicle fleet and industry, as well as orography and climatic characteristics, have contributed to the increase in the concentration of sulphur dioxide (SO2). In this work, a Multilayer Perceptron Neural Network has been used to predict the pollutant concentration one hour ahead. The database used to train the neural network consists of historical time series of meteorological variables and SO2 air pollutant concentrations. Before the prediction, the Fuzzy c-Means and K-means clustering algorithms have been applied in order to find relationships among pollutant and meteorological variables. Our experiments with the proposed system show the importance of this set of meteorological variables for the prediction of SO2 pollutant concentrations and the efficiency of the neural network. Performance is estimated using the Root Mean Square Error (RMSE) and the Mean Absolute Error (MAE). The results showed that the information obtained in the clustering step allows a prediction one hour ahead using data from the past 2 hours.
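One common way to set up a "one hour ahead from the past 2 hours" prediction is to slice the hourly series into lagged input windows and targets. The sketch below shows that framing for a single variable; the function name `make_lagged_samples`, its parameters, and the sample values are hypothetical, and the abstract does not say how the authors actually built their training samples.

```python
def make_lagged_samples(series, n_lags=2, horizon=1):
    """Pair each window of n_lags consecutive values with the value
    observed `horizon` steps after the end of the window."""
    samples = []
    for t in range(n_lags - 1, len(series) - horizon):
        features = series[t - n_lags + 1 : t + 1]  # values at t-1 and t
        target = series[t + horizon]               # value at t+1
        samples.append((features, target))
    return samples

# Made-up hourly SO2 readings, for illustration only.
so2 = [12.0, 15.0, 14.0, 18.0, 21.0]
samples = make_lagged_samples(so2)
for features, target in samples:
    print(features, "->", target)
```

With several variables (pollutant plus meteorological measurements), each entry of `series` could hold a tuple of readings for that hour, and the window would then carry all of them.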
Abstract:
Descriptions of vegetation communities are often based on vague semantic terms describing species presence and dominance. For this reason, some researchers advocate the use of fuzzy sets in the statistical classification of plant species data into communities. In this study, spatially referenced vegetation abundance values collected from Greek phrygana were analysed by ordination (DECORANA), and classified on the resulting axes using fuzzy c-means to yield a point data-set representing local memberships in characteristic plant communities. The fuzzy clusters matched vegetation communities noted in the field, which tended to grade into one another, rather than occupying discrete patches. The fuzzy set representation of the community exploited the strengths of detrended correspondence analysis while retaining richer information than a TWINSPAN classification of the same data. Thus, in the absence of phytosociological benchmarks, meaningful and manageable habitat information could be derived from complex, multivariate species data. We also analysed the influence of the reliability of different surveyors' field observations by multiple sampling at a selected sample location. We show that the impact of surveyor error was more severe in the Boolean than the fuzzy classification. © 2007 Springer.
Abstract:
Segmentation is an important step in many medical imaging applications and a variety of image segmentation techniques exist. One group of segmentation algorithms is based on clustering concepts. In this article we investigate several fuzzy c-means based clustering algorithms and their application to medical image segmentation. In particular we evaluate the conventional hard c-means (HCM) and fuzzy c-means (FCM) approaches as well as three computationally more efficient derivatives of fuzzy c-means: fast FCM with random sampling, fast generalised FCM, and a new anisotropic mean shift based FCM. © 2010 by IJTS, ISDER.
Abstract:
In 2000 the European Statistical Office published the guidelines for developing the Harmonized European Time Use Surveys system. Under this unified framework, the first Time Use Survey of national scope was conducted in Spain during 2002–03. The aim of these surveys is to understand human behavior and people's lifestyles. Time allocation data are compositional in nature, that is, they are subject to non-negativity and constant-sum constraints. Thus, standard multivariate techniques cannot be applied directly to analyze them. The goal of this work is to identify homogeneous Spanish Autonomous Communities with regard to the typical activity pattern of their respective populations. To this end, a fuzzy clustering approach is followed. Rather than the hard partitioning of classical clustering, where each object is allocated to a single group, fuzzy methods identify overlapping groups of objects by allowing them to belong to more than one group. Specifically, the probabilistic fuzzy c-means algorithm is suitably adapted to deal with the Spanish Time Use Survey microdata. As a result, a map distinguishing Autonomous Communities with similar activity patterns is drawn. Key words: Time use data; Fuzzy clustering; FCM; simplex space; Aitchison distance
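The key words mention the Aitchison distance, which is the usual replacement for the Euclidean distance on compositional data: it equals the Euclidean distance between centered log-ratio (clr) transformed compositions. The sketch below shows that definition only; how the authors embed it in fuzzy c-means is not stated in the abstract, and the function names and the example time budgets are illustrative assumptions. All parts must be strictly positive.

```python
import math

def closure(x):
    """Rescale a vector of positive parts so that it sums to 1."""
    total = sum(x)
    return [v / total for v in x]

def clr(x):
    """Centered log-ratio transform of a composition."""
    logs = [math.log(v) for v in closure(x)]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def aitchison_distance(x, y):
    """Euclidean distance between the clr-transformed compositions."""
    return math.dist(clr(x), clr(y))

# Hypothetical daily time budgets in hours: (sleep, work, leisure).
a = [8.0, 8.0, 8.0]
b = [7.0, 10.0, 7.0]
print(aitchison_distance(a, b))
```

Because the distance depends only on ratios between parts, expressing the same budget in hours or in fractions of a day gives the same result.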
Abstract:
Zonal management in vineyards requires the prior delineation of stable yield zones within the parcel. Among the different methodologies used for zone delineation, cluster analysis of yield data from several years is one of the possibilities cited in the scientific literature. However, there are reasonable doubts concerning the cluster algorithm to be used and the number of zones to be delineated within a field. In this paper two different cluster algorithms (k-means and fuzzy c-means) have been compared using grape yield data from three successive years (2002, 2003 and 2004) for a ‘Pinot Noir’ vineyard parcel. The final choice of the recommended algorithm has been linked to obtaining a stable pattern of spatial yield distribution and to allowing the delineation of compact, average-sized areas. The general recommendation is to use reclassified maps of two clusters or yield classes (low yield zone and high yield zone); consequently, site-specific vineyard management should be based on the prior delineation of just two different zones or sub-parcels. Both tested algorithms are good options for this purpose. However, the fuzzy c-means algorithm allows a better zoning of the parcel, forming more compact areas with more balanced zonal differences over time.
Abstract:
Various equipment and methodologies have been developed to make precision agriculture feasible, especially considering the high cost of its implementation and sampling. An interesting possibility is to define management zones, which divide production areas into smaller zones that can be treated differently, serving as a source of recommendation and analysis. Thus, this trial used physical and chemical soil properties and yield to generate management zones and to determine whether they can be used for recommendation and analysis. Management zones were generated by the Fuzzy C-Means algorithm and evaluated by calculating the reduction of variance and performing means tests. The division of the area into two management zones was considered appropriate because it presented distinct averages for most soil properties and yield. The methodology allowed the generation of management zones that can serve as a source of recommendation and soil analysis; although the relative efficiency showed a reduced variance for all attributes when the area was divided into three sub-regions, the ANOVA did not show significant differences among the management zones.
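The abstract does not give the formula used for the reduction of variance. A definition often used when evaluating management zones compares the area-weighted within-zone variance with the variance of the whole field: VR = 100 × (1 − Σ W_i V_i / V_field), where W_i is the share of samples in zone i. The sketch below implements that assumed definition; the function name, the use of population variance, and the sample data are illustrative.

```python
from statistics import pvariance

def variance_reduction(values, zones):
    """Percentage reduction in variance obtained by splitting the field
    into zones (assumed definition, weighted by share of samples)."""
    field_variance = pvariance(values)
    n = len(values)
    weighted_zone_variance = 0.0
    for zone in set(zones):
        zone_values = [v for v, label in zip(values, zones) if label == zone]
        weight = len(zone_values) / n
        weighted_zone_variance += weight * pvariance(zone_values)
    return 100.0 * (1.0 - weighted_zone_variance / field_variance)

# Made-up yield samples (t/ha) with a two-zone labelling, illustration only.
yields = [2.0, 2.2, 2.1, 3.9, 4.1, 4.0]
labels = [0, 0, 0, 1, 1, 1]
print(variance_reduction(yields, labels))
```

A value near 100 means the zones capture almost all of the field's variability; a value near 0 means the zoning explains little.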
Abstract:
Precision agriculture (PA) allows farmers to identify and address variations in an agricultural field. Management zones (MZs) make PA more feasible and economical. The most important method for defining MZs is the fuzzy C-means algorithm, but selecting the variables to use as input layers in the fuzzy process is problematic. BAZZI et al. (2013) used Moran’s bivariate spatial autocorrelation statistic to identify variables that are spatially correlated with yield. They proposed that all redundant variables be eliminated and that the remaining variables be considered appropriate for the MZ generation process. Thus, the objective of this work, a case study, was to test the hypothesis that redundant variables can harm the MZ delineation process. This work was conducted in a 19.6-ha commercial field, and 15 MZ designs were generated by a fuzzy C-means algorithm and divided into two to five classes. Each design used a different composition of variables, including copper, silt, clay, and altitude. Some combinations of these variables produced superior MZs. None of the variable combinations produced statistically better performance than the MZ generated with no redundant variables. Thus, the other redundant variables can be discarded. The design with all variables did not provide a greater separation and organization of data among MZ classes and was not recommended.
Abstract:
Atmospheric particles influence the climate through processes such as scattering, reflection and absorption. In addition, a fraction of the aerosol particles acts as cloud condensation nuclei (CCN), which affect the optical properties and the backscattering power of clouds and, consequently, the radiation budget. Whether an aerosol particle has the properties of a cloud condensation nucleus depends primarily on particle size and chemical composition. Therefore, single-particle laser ablation mass spectrometry was applied, a method that allows a size-resolved chemical analysis of individual particles and is intended to contribute to the understanding of the multiphase chemical processes taking place within the cloud. In this work, the single-particle mass spectrometer ALABAMA (Aircraft-based Laser Ablation Aerosol Mass Spectrometer) was used to characterize atmospheric aerosol and cloud residual particles. In addition, an optical particle counter was operated to analyze particle size and number concentration. To determine a suitable evaluation method for automatically sorting the single-particle mass spectra into groups of similar-looking spectra, the two algorithms k-means and fuzzy c-means were tested for correctness. It turned out that neither algorithm delivered error-free results, which depends, among other things, on the starting conditions. Fuzzy c-means, however, delivered more reliable results. Furthermore, the mass spectra were analyzed on the basis of characteristic chemical features (nitrate, sulfate, metals). In autumn 2010 the field campaign HCCT (Hill Cap Cloud Thuringia) took place in the Thuringian Forest, during which the change in aerosol particles passing through an orographic cloud and the processes occurring within the cloud were investigated.
A comparison of the chemical composition of background aerosol and cloud residual particles showed that the relative fractions of mass spectra of the particle types soot and amines were elevated for cloud residual particles. This can be explained by the good CCN activity of soot particles internally mixed with nitrate and sulfate, and by a favored transition of amine compounds from the gas phase into the particle phase at high relative humidity and low temperatures, respectively. Furthermore, it turned out that more than 99% of the background aerosol particles were already internally mixed with nitrate and/or sulfate. A detailed analysis of the mixing state of the aerosol particles showed that both the nitrate content and the sulfate content of the particles increased on passing through the cloud.