891 resultados para Fuzzy C-Means Clustering
Resumo:
Radial basis function networks can be trained quickly using linear optimisation once centres and other associated parameters have been initialised. The authors propose a small adjustment to a well accepted initialisation algorithm which improves the network accuracy over a range of problems. The algorithm is described and results are presented.
Resumo:
The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. This work proposes a fully decentralised algorithm (Epidemic K-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against the state of the art distributed K-Means algorithms based on sampling methods. The experimental analysis confirms that the proposed algorithm is a practical and accurate distributed K-Means implementation for networked systems of very large and extreme scale.
Resumo:
The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks, such as massively parallel processors and clusters of workstations. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. The lack of scalable and fault tolerant global communication and synchronisation methods in large-scale systems has hindered the adoption of the K-Means algorithm for applications in large networked systems such as wireless sensor networks, peer-to-peer systems and mobile ad hoc networks. This work proposes a fully distributed K-Means algorithm (EpidemicK-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against the state of the art sampling methods and shows that the proposed method overcomes the limitations of the sampling-based approaches for skewed clusters distributions. The experimental analysis confirms that the proposed algorithm is very accurate and fault tolerant under unreliable network conditions (message loss and node failures) and is suitable for asynchronous networks of very large and extreme scale.
Resumo:
There is a family of well-known external clustering validity indexes to measure the degree of compatibility or similarity between two hard partitions of a given data set, including partitions with different numbers of categories. A unified, fully equivalent set-theoretic formulation for an important class of such indexes was derived and extended to the fuzzy domain in a previous work by the author [Campello, R.J.G.B., 2007. A fuzzy extension of the Rand index and other related indexes for clustering and classification assessment. Pattern Recognition Lett., 28, 833-841]. However, the proposed fuzzy set-theoretic formulation is not valid as a general approach for comparing two fuzzy partitions of data. Instead, it is an approach for comparing a fuzzy partition against a hard referential partition of the data into mutually disjoint categories. In this paper, generalized external indexes for comparing two data partitions with overlapping categories are introduced. These indexes can be used as general measures for comparing two partitions of the same data set into overlapping categories. An important issue that is seldom touched in the literature is also addressed in the paper, namely, how to compare two partitions of different subsamples of data. A number of pedagogical examples and three simulation experiments are presented and analyzed in details. A review of recent related work compiled from the literature is also provided. (c) 2010 Elsevier B.V. All rights reserved.
Resumo:
Image segmentation is one of the image processing problems that deserves special attention from the scientific community. This work studies unsupervised methods to clustering and pattern recognition applicable to medical image segmentation. Natural Computing based methods have shown very attractive in such tasks and are studied here as a way to verify it's applicability in medical image segmentation. This work treats to implement the following methods: GKA (Genetic K-means Algorithm), GFCMA (Genetic FCM Algorithm), PSOKA (PSO and K-means based Clustering Algorithm) and PSOFCM (PSO and FCM based Clustering Algorithm). Besides, as a way to evaluate the results given by the algorithms, clustering validity indexes are used as quantitative measure. Visual and qualitative evaluations are realized also, mainly using data given by the BrainWeb brain simulator as ground truth
Resumo:
Atmosphärische Aerosolpartikel wirken in vielerlei Hinsicht auf die Menschen und die Umwelt ein. Eine genaue Charakterisierung der Partikel hilft deren Wirken zu verstehen und dessen Folgen einzuschätzen. Partikel können hinsichtlich ihrer Größe, ihrer Form und ihrer chemischen Zusammensetzung charakterisiert werden. Mit der Laserablationsmassenspektrometrie ist es möglich die Größe und die chemische Zusammensetzung einzelner Aerosolpartikel zu bestimmen. Im Rahmen dieser Arbeit wurde das SPLAT (Single Particle Laser Ablation Time-of-flight mass spectrometer) zur besseren Analyse insbesondere von atmosphärischen Aerosolpartikeln weiterentwickelt. Der Aerosoleinlass wurde dahingehend optimiert, einen möglichst weiten Partikelgrößenbereich (80 nm - 3 µm) in das SPLAT zu transferieren und zu einem feinen Strahl zu bündeln. Eine neue Beschreibung für die Beziehung der Partikelgröße zu ihrer Geschwindigkeit im Vakuum wurde gefunden. Die Justage des Einlasses wurde mithilfe von Schrittmotoren automatisiert. Die optische Detektion der Partikel wurde so verbessert, dass Partikel mit einer Größe < 100 nm erfasst werden können. Aufbauend auf der optischen Detektion und der automatischen Verkippung des Einlasses wurde eine neue Methode zur Charakterisierung des Partikelstrahls entwickelt. Die Steuerelektronik des SPLAT wurde verbessert, so dass die maximale Analysefrequenz nur durch den Ablationslaser begrenzt wird, der höchsten mit etwa 10 Hz ablatieren kann. Durch eine Optimierung des Vakuumsystems wurde der Ionenverlust im Massenspektrometer um den Faktor 4 verringert.rnrnNeben den hardwareseitigen Weiterentwicklungen des SPLAT bestand ein Großteil dieser Arbeit in der Konzipierung und Implementierung einer Softwarelösung zur Analyse der mit dem SPLAT gewonnenen Rohdaten. CRISP (Concise Retrieval of Information from Single Particles) ist ein auf IGOR PRO (Wavemetrics, USA) aufbauendes Softwarepaket, das die effiziente Auswertung der Einzelpartikel Rohdaten erlaubt. CRISP enthält einen neu entwickelten Algorithmus zur automatischen Massenkalibration jedes einzelnen Massenspektrums, inklusive der Unterdrückung von Rauschen und von Problemen mit Signalen die ein intensives Tailing aufweisen. CRISP stellt Methoden zur automatischen Klassifizierung der Partikel zur Verfügung. Implementiert sind k-means, fuzzy-c-means und eine Form der hierarchischen Einteilung auf Basis eines minimal aufspannenden Baumes. CRISP bietet die Möglichkeit die Daten vorzubehandeln, damit die automatische Einteilung der Partikel schneller abläuft und die Ergebnisse eine höhere Qualität aufweisen. Daneben kann CRISP auf einfache Art und Weise Partikel anhand vorgebener Kriterien sortieren. Die CRISP zugrundeliegende Daten- und Infrastruktur wurde in Hinblick auf Wartung und Erweiterbarkeit erstellt. rnrnIm Rahmen der Arbeit wurde das SPLAT in mehreren Kampagnen erfolgreich eingesetzt und die Fähigkeiten von CRISP konnten anhand der gewonnen Datensätze gezeigt werden.rnrnDas SPLAT ist nun in der Lage effizient im Feldeinsatz zur Charakterisierung des atmosphärischen Aerosols betrieben zu werden, während CRISP eine schnelle und gezielte Auswertung der Daten ermöglicht.
Resumo:
Atmosphärische Partikel beeinflussen das Klima durch Prozesse wie Streuung, Reflexion und Absorption. Zusätzlich fungiert ein Teil der Aerosolpartikel als Wolkenkondensationskeime (CCN), die sich auf die optischen Eigenschaften sowie die Rückstreukraft der Wolken und folglich den Strahlungshaushalt auswirken. Ob ein Aerosolpartikel Eigenschaften eines Wolkenkondensationskeims aufweist, ist vor allem von der Partikelgröße sowie der chemischen Zusammensetzung abhängig. Daher wurde die Methode der Einzelpartikel-Laserablations-Massenspektrometrie angewandt, die eine größenaufgelöste chemische Analyse von Einzelpartikeln erlaubt und zum Verständnis der ablaufenden multiphasenchemischen Prozesse innerhalb der Wolke beitragen soll.rnIm Rahmen dieser Arbeit wurde zur Charakterisierung von atmosphärischem Aerosol sowie von Wolkenresidualpartikel das Einzelpartikel-Massenspektrometer ALABAMA (Aircraft-based Laser Ablation Aerosol Mass Spectrometer) verwendet. Zusätzlich wurde zur Analyse der Partikelgröße sowie der Anzahlkonzentration ein optischer Partikelzähler betrieben. rnZur Bestimmung einer geeigneten Auswertemethode, die die Einzelpartikelmassenspektren automatisch in Gruppen ähnlich aussehender Spektren sortieren soll, wurden die beiden Algorithmen k-means und fuzzy c-means auf ihrer Richtigkeit überprüft. Es stellte sich heraus, dass beide Algorithmen keine fehlerfreien Ergebnisse lieferten, was u.a. von den Startbedingungen abhängig ist. Der fuzzy c-means lieferte jedoch zuverlässigere Ergebnisse. Darüber hinaus wurden die Massenspektren anhand auftretender charakteristischer chemischer Merkmale (Nitrat, Sulfat, Metalle) analysiert.rnIm Herbst 2010 fand die Feldkampagne HCCT (Hill Cap Cloud Thuringia) im Thüringer Wald statt, bei der die Veränderung von Aerosolpartikeln beim Passieren einer orographischen Wolke sowie ablaufende Prozesse innerhalb der Wolke untersucht wurden. Ein Vergleich der chemischen Zusammensetzung von Hintergrundaerosol und Wolkenresidualpartikeln zeigte, dass die relativen Anteile von Massenspektren der Partikeltypen Ruß und Amine für Wolkenresidualpartikel erhöht waren. Dies lässt sich durch eine gute CCN-Aktivität der intern gemischten Rußpartikel mit Nitrat und Sulfat bzw. auf einen begünstigten Übergang der Aminverbindungen aus der Gas- in die Partikelphase bei hohen relativen Luftfeuchten und tiefen Temperaturen erklären. Darüber hinaus stellte sich heraus, dass bereits mehr als 99% der Partikel des Hintergrundaerosols intern mit Nitrat und/oder Sulfat gemischt waren. Eine detaillierte Analyse des Mischungszustands der Aerosolpartikel zeigte, dass sich sowohl der Nitratgehalt als auch der Sulfatgehalt der Partikel beim Passieren der Wolke erhöhte. rn
Resumo:
Analisi riguardante la tenacizzazione della matrice di laminati compositi. Lo scopo è quello di aumentare la resistenza alla frattura di modo I e, a tal proposito, sono stati modificati gli interstrati di alcuni provini tramite l’introduzione di strati, di diverso spessore, di nanofibre in polivinilidenfluoruro (PVDF). La valutazione di tale metodo di rinforzo è stata eseguita servendosi di dati ottenuti tramite prove sperimentali svolte in laboratorio direttamente dal sottoscritto, che si è occupato dell’elaborazione dei dati servendosi di tecniche e algoritmi di recente ideazione. La necessità primaria per cui si cerca di rinforzare la matrice risiede nel problema più sentito dei laminati compositi in opera da molto tempo: la delaminazione. Oltre a verificare le proprietà meccaniche dei provini modificati sottoponendoli a test DCB, si è utilizzata una tecnica basata sulle emissioni acustiche per comprendere più approfonditamente l’inizio della delaminazione e i meccanismi di rottura che si verificano durante le prove. Quest’ultimi sono illustrati servendosi di un algoritmo di clustering, detto Fuzzy C-means, tramite il quale è stato possibile identificare ogni segnale come appartenente o meno ad un determinato modo di rottura. I risultati mostrano che il PVDF, applicato nelle modalità esposte, è in grado di aumentare la resistenza alla frattura di modo I divenendo contemporaneamente causa di un diverso modo di propagazione della frattura. Infine l’elaborato presenta alcune micrografie delle superfici di rottura, le quali appoggiano i risultati ottenuti nelle precedenti fasi di analisi.
Resumo:
Fuzzy community detection is to identify fuzzy communities in a network, which are groups of vertices in the network such that the membership of a vertex in one community is in [0,1] and that the sum of memberships of vertices in all communities equals to 1. Fuzzy communities are pervasive in social networks, but only a few works have been done for fuzzy community detection. Recently, a one-step forward extension of Newman’s Modularity, the most popular quality function for disjoint community detection, results into the Generalized Modularity (GM) that demonstrates good performance in finding well-known fuzzy communities. Thus, GMis chosen as the quality function in our research. We first propose a generalized fuzzy t-norm modularity to investigate the effect of different fuzzy intersection operators on fuzzy community detection, since the introduction of a fuzzy intersection operation is made feasible by GM. The experimental results show that the Yager operator with a proper parameter value performs better than the product operator in revealing community structure. Then, we focus on how to find optimal fuzzy communities in a network by directly maximizing GM, which we call it Fuzzy Modularity Maximization (FMM) problem. The effort on FMM problem results into the major contribution of this thesis, an efficient and effective GM-based fuzzy community detection method that could automatically discover a fuzzy partition of a network when it is appropriate, which is much better than fuzzy partitions found by existing fuzzy community detection methods, and a crisp partition of a network when appropriate, which is competitive with partitions resulted from the best disjoint community detections up to now. We address FMM problem by iteratively solving a sub-problem called One-Step Modularity Maximization (OSMM). We present two approaches for solving this iterative procedure: a tree-based global optimizer called Find Best Leaf Node (FBLN) and a heuristic-based local optimizer. The OSMM problem is based on a simplified quadratic knapsack problem that can be solved in linear time; thus, a solution of OSMM can be found in linear time. Since the OSMM algorithm is called within FBLN recursively and the structure of the search tree is non-deterministic, we can see that the FMM/FBLN algorithm runs in a time complexity of at least O (n2). So, we also propose several highly efficient and very effective heuristic algorithms namely FMM/H algorithms. We compared our proposed FMM/H algorithms with two state-of-the-art community detection methods, modified MULTICUT Spectral Fuzzy c-Means (MSFCM) and Genetic Algorithm with a Local Search strategy (GALS), on 10 real-world data sets. The experimental results suggest that the H2 variant of FMM/H is the best performing version. The H2 algorithm is very competitive with GALS in producing maximum modularity partitions and performs much better than MSFCM. On all the 10 data sets, H2 is also 2-3 orders of magnitude faster than GALS. Furthermore, by adopting a simply modified version of the H2 algorithm as a mutation operator, we designed a genetic algorithm for fuzzy community detection, namely GAFCD, where elite selection and early termination are applied. The crossover operator is designed to make GAFCD converge fast and to enhance GAFCD’s ability of jumping out of local minimums. Experimental results on all the data sets show that GAFCD uncovers better community structure than GALS.
Resumo:
Intensity non-uniformity (bias field) correction, contextual constraints over spatial intensity distribution and non-spherical cluster's shape in the feature space are incorporated into the fuzzy c-means (FCM) for segmentation of three-dimensional multi-spectral MR images. The bias field is modeled by a linear combination of smooth polynomial basis functions for fast computation in the clustering iterations. Regularization terms for the neighborhood continuity of either intensity or membership are added into the FCM cost functions. Since the feature space is not isotropic, distance measures, other than the Euclidean distance, are used to account for the shape and volumetric effects of clusters in the feature space. The performance of segmentation is improved by combining the adaptive FCM scheme with the criteria used in Gustafson-Kessel (G-K) and Gath-Geva (G-G) algorithms through the inclusion of the cluster scatter measure. The performance of this integrated approach is quantitatively evaluated on normal MR brain images using the similarity measures. The improvement in the quality of segmentation obtained with our method is also demonstrated by comparing our results with those produced by FSL (FMRIB Software Library), a software package that is commonly used for tissue classification.
Resumo:
This study subdivides the Potter Cove, King George Island, Antarctica, into seafloor regions using multivariate statistical methods. These regions are categories used for comparing, contrasting and quantifying biogeochemical processes and biodiversity between ocean regions geographically but also regions under development within the scope of global change. The division obtained is characterized by the dominating components and interpreted in terms of ruling environmental conditions. The analysis includes in total 42 different environmental variables, interpolated based on samples taken during Australian summer seasons 2010/2011 and 2011/2012. The statistical errors of several interpolation methods (e.g. IDW, Indicator, Ordinary and Co-Kriging) with changing settings have been compared and the most reasonable method has been applied. The multivariate mathematical procedures used are regionalized classification via k means cluster analysis, canonical-correlation analysis and multidimensional scaling. Canonical-correlation analysis identifies the influencing factors in the different parts of the cove. Several methods for the identification of the optimum number of clusters have been tested and 4, 7, 10 as well as 12 were identified as reasonable numbers for clustering the Potter Cove. Especially the results of 10 and 12 clusters identify marine-influenced regions which can be clearly separated from those determined by the geological catchment area and the ones dominated by river discharge.
Resumo:
Abstract Air pollution is a big threat and a phenomenon that has a specific impact on human health, in addition, changes that occur in the chemical composition of the atmosphere can change the weather and cause acid rain or ozone destruction. Those are phenomena of global importance. The World Health Organization (WHO) considerates air pollution as one of the most important global priorities. Salamanca, Gto., Mexico has been ranked as one of the most polluted cities in this country. The industry of the area led to a major economic development and rapid population growth in the second half of the twentieth century. The impact in the air quality is important and significant efforts have been made to measure the concentrations of pollutants. The main pollution sources are locally based plants in the chemical and power generation sectors. The registered concerning pollutants are Sulphur Dioxide (SO2) and particles on the order of ∼10 micrometers or less (PM10). The prediction in the concentration of those pollutants can be a powerful tool in order to take preventive measures such as the reduction of emissions and alerting the affected population. In this PhD thesis we propose a model to predict concentrations of pollutants SO2 and PM10 for each monitoring booth in the Atmospheric Monitoring Network Salamanca (REDMAS - for its spanish acronym). The proposed models consider the use of meteorological variables as factors influencing the concentration of pollutants. The information used along this work is the current real data from REDMAS. In the proposed model, Artificial Neural Networks (ANN) combined with clustering algorithms are used. The type of ANN used is the Multilayer Perceptron with a hidden layer, using separate structures for the prediction of each pollutant. The meteorological variables used for prediction were: Wind Direction (WD), wind speed (WS), Temperature (T) and relative humidity (RH). Clustering algorithms, K-means and Fuzzy C-means, are used to find relationships between air pollutants and weather variables under consideration, which are added as input of the RNA. Those relationships provide information to the ANN in order to obtain the prediction of the pollutants. The results of the model proposed in this work are compared with the results of a multivariate linear regression and multilayer perceptron neural network. The evaluation of the prediction is calculated with the mean absolute error, the root mean square error, the correlation coefficient and the index of agreement. The results show the importance of meteorological variables in the prediction of the concentration of the pollutants SO2 and PM10 in the city of Salamanca, Gto., Mexico. The results show that the proposed model perform better than multivariate linear regression and multilayer perceptron neural network. The models implemented for each monitoring booth have the ability to make predictions of air quality that can be used in a system of real-time forecasting and human health impact analysis. Among the main results of the development of this thesis we can cite: A model based on artificial neural network combined with clustering algorithms for prediction with a hour ahead of the concentration of each pollutant (SO2 and PM10) is proposed. A different model was designed for each pollutant and for each of the three monitoring booths of the REDMAS. A model to predict the average of pollutant concentration in the next 24 hours of pollutants SO2 and PM10 is proposed, based on artificial neural network combined with clustering algorithms. Model was designed for each booth of the REDMAS and each pollutant separately. Resumen La contaminación atmosférica es una amenaza aguda, constituye un fenómeno que tiene particular incidencia sobre la salud del hombre. Los cambios que se producen en la composición química de la atmósfera pueden cambiar el clima, producir lluvia ácida o destruir el ozono, fenómenos todos ellos de una gran importancia global. La Organización Mundial de la Salud (OMS) considera la contaminación atmosférica como una de las más importantes prioridades mundiales. Salamanca, Gto., México; ha sido catalogada como una de las ciudades más contaminadas en este país. La industria de la zona propició un importante desarrollo económico y un crecimiento acelerado de la población en la segunda mitad del siglo XX. Las afectaciones en el aire son graves y se han hecho importantes esfuerzos por medir las concentraciones de los contaminantes. Las principales fuentes de contaminación son fuentes fijas como industrias químicas y de generación eléctrica. Los contaminantes que se han registrado como preocupantes son el Bióxido de Azufre (SO2) y las Partículas Menores a 10 micrómetros (PM10). La predicción de las concentraciones de estos contaminantes puede ser una potente herramienta que permita tomar medidas preventivas como reducción de emisiones a la atmósfera y alertar a la población afectada. En la presente tesis doctoral se propone un modelo de predicción de concentraci ón de los contaminantes más críticos SO2 y PM10 para cada caseta de monitorización de la Red de Monitorización Atmosférica de Salamanca (REDMAS). Los modelos propuestos plantean el uso de las variables meteorol ógicas como factores que influyen en la concentración de los contaminantes. La información utilizada durante el desarrollo de este trabajo corresponde a datos reales obtenidos de la REDMAS. En el Modelo Propuesto (MP) se aplican Redes Neuronales Artificiales (RNA) combinadas con algoritmos de agrupamiento. La RNA utilizada es el Perceptrón Multicapa con una capa oculta, utilizando estructuras independientes para la predicción de cada contaminante. Las variables meteorológicas disponibles para realizar la predicción fueron: Dirección de Viento (DV), Velocidad de Viento (VV), Temperatura (T) y Humedad Relativa (HR). Los algoritmos de agrupamiento K-means y Fuzzy C-means son utilizados para encontrar relaciones existentes entre los contaminantes atmosféricos en estudio y las variables meteorológicas. Dichas relaciones aportan información a las RNA para obtener la predicción de los contaminantes, la cual es agregada como entrada de las RNA. Los resultados del modelo propuesto en este trabajo son comparados con los resultados de una Regresión Lineal Multivariable (RLM) y un Perceptrón Multicapa (MLP). La evaluación de la predicción se realiza con el Error Medio Absoluto, la Raíz del Error Cuadrático Medio, el coeficiente de correlación y el índice de acuerdo. Los resultados obtenidos muestran la importancia de las variables meteorológicas en la predicción de la concentración de los contaminantes SO2 y PM10 en la ciudad de Salamanca, Gto., México. Los resultados muestran que el MP predice mejor la concentración de los contaminantes SO2 y PM10 que los modelos RLM y MLP. Los modelos implementados para cada caseta de monitorizaci ón tienen la capacidad para realizar predicciones de calidad del aire, estos modelos pueden ser implementados en un sistema que permita realizar la predicción en tiempo real y analizar el impacto en la salud de la población. Entre los principales resultados obtenidos del desarrollo de esta tesis podemos citar: Se propone un modelo basado en una red neuronal artificial combinado con algoritmos de agrupamiento para la predicción con una hora de anticipaci ón de la concentración de cada contaminante (SO2 y PM10). Se diseñó un modelo diferente para cada contaminante y para cada una de las tres casetas de monitorización de la REDMAS. Se propone un modelo de predicción del promedio de la concentración de las próximas 24 horas de los contaminantes SO2 y PM10, basado en una red neuronal artificial combinado con algoritmos de agrupamiento. Se diseñó un modelo para cada caseta de monitorización de la REDMAS y para cada contaminante por separado.
Resumo:
The image by Computed Tomography is a non-invasive alternative for observing soil structures, mainly pore space. The pore space correspond in soil data to empty or free space in the sense that no material is present there but only fluids, the fluid transport depend of pore spaces in soil, for this reason is important identify the regions that correspond to pore zones. In this paper we present a methodology in order to detect pore space and solid soil based on the synergy of the image processing, pattern recognition and artificial intelligence. The mathematical morphology is an image processing technique used for the purpose of image enhancement. In order to find pixels groups with a similar gray level intensity, or more or less homogeneous groups, a novel image sub-segmentation based on a Possibilistic Fuzzy c-Means (PFCM) clustering algorithm was used. The Artificial Neural Networks (ANNs) are very efficient for demanding large scale and generic pattern recognition applications for this reason finally a classifier based on artificial neural network is applied in order to classify soil images in two classes, pore space and solid soil respectively.