938 resultados para k-means clustering
Resumo:
Thesis submitted in the fulfillment of the requirements for the Degree of Master in Biomedical Engineering
Resumo:
Dissertação para obtenção do Grau de Mestre em Engenharia Biomédica
Resumo:
One among the most influential and popular data mining methods is the k-Means algorithm for cluster analysis. Techniques for improving the efficiency of k-Means have been largely explored in two main directions. The amount of computation can be significantly reduced by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree). These techniques allow to reduce the number of distance computations the algorithm performs at each iteration. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing issue. Three solutions have been developed and tested. Two approaches are based on a static partitioning of the data set and a third solution incorporates a dynamic load balancing policy.
Resumo:
Radial basis functions can be combined into a network structure that has several advantages over conventional neural network solutions. However, to operate effectively the number and positions of the basis function centres must be carefully selected. Although no rigorous algorithm exists for this purpose, several heuristic methods have been suggested. In this paper a new method is proposed in which radial basis function centres are selected by the mean-tracking clustering algorithm. The mean-tracking algorithm is compared with k means clustering and it is shown that it achieves significantly better results in terms of radial basis function performance. As well as being computationally simpler, the mean-tracking algorithm in general selects better centre positions, thus providing the radial basis functions with better modelling accuracy
Resumo:
LEÃO, Adriano de Castro; DÓRIA NETO, Adrião Duarte; SOUSA, Maria Bernardete Cordeiro de. New developmental stages for common marmosets (Callithrix jacchus) using mass and age variables obtained by K-means algorithm and self-organizing maps (SOM). Computers in Biology and Medicine, v. 39, p. 853-859, 2009
Resumo:
In this work we propose an image acquisition and processing methodology (framework) developed for performance in-field grapes and leaves detection and quantification, based on a six step methodology: 1) image segmentation through Fuzzy C-Means with Gustafson Kessel (FCM-GK) clustering; 2) obtaining of FCM-GK outputs (centroids) for acting as seeding for K-Means clustering; 3) Identification of the clusters generated by K-Means using a Support Vector Machine (SVM) classifier. 4) Performance of morphological operations over the grapes and leaves clusters in order to fill holes and to eliminate small pixels clusters; 5)Creation of a mosaic image by Scale-Invariant Feature Transform (SIFT) in order to avoid overlapping between images; 6) Calculation of the areas of leaves and grapes and finding of the centroids in the grape bunches. Image data are collected using a colour camera fixed to a mobile platform. This platform was developed to give a stabilized surface to guarantee that the images were acquired parallel to de vineyard rows. In this way, the platform avoids the distortion of the images that lead to poor estimation of the areas. Our preliminary results are promissory, although they still have shown that it is necessary to implement a camera stabilization system to avoid undesired camera movements, and also a parallel processing procedure in order to speed up the mosaicking process.
Resumo:
Salamanca, situated in center of Mexico is among the cities which suffer most from the air pollution in Mexico. The vehicular park and the industry, as well as orography and climatic characteristics have propitiated the increment in pollutant concentration of Sulphur Dioxide (SO2). In this work, a Multilayer Perceptron Neural Network has been used to make the prediction of an hour ahead of pollutant concentration. A database used to train the Neural Network corresponds to historical time series of meteorological variables and air pollutant concentrations of SO2. Before the prediction, Fuzzy c-Means and K-means clustering algorithms have been implemented in order to find relationship among pollutant and meteorological variables. Our experiments with the proposed system show the importance of this set of meteorological variables on the prediction of SO2 pollutant concentrations and the neural network efficiency. The performance estimation is determined using the Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). The results showed that the information obtained in the clustering step allows a prediction of an hour ahead, with data from past 2 hours.
Resumo:
The left hind foot.-- The f̓raid cat.-- The consolation prize.-- The first high janitor.-- Family ties.-- The ten-share horse.-- A chariot of fire.
Resumo:
In this paper we present an efficient k-Means clustering algorithm for two dimensional data. The proposed algorithm re-organizes dataset into a form of nested binary tree*. Data items are compared at each node with only two nearest means with respect to each dimension and assigned to the one that has the closer mean. The main intuition of our research is as follows: We build the nested binary tree. Then we scan the data in raster order by in-order traversal of the tree. Lastly we compare data item at each node to the only two nearest means to assign the value to the intendant cluster. In this way we are able to save the computational cost significantly by reducing the number of comparisons with means and also by the least use to Euclidian distance formula. Our results showed that our method can perform clustering operation much faster than the classical ones. © Springer-Verlag Berlin Heidelberg 2005
Resumo:
Descriptions of vegetation communities are often based on vague semantic terms describing species presence and dominance. For this reason, some researchers advocate the use of fuzzy sets in the statistical classification of plant species data into communities. In this study, spatially referenced vegetation abundance values collected from Greek phrygana were analysed by ordination (DECORANA), and classified on the resulting axes using fuzzy c-means to yield a point data-set representing local memberships in characteristic plant communities. The fuzzy clusters matched vegetation communities noted in the field, which tended to grade into one another, rather than occupying discrete patches. The fuzzy set representation of the community exploited the strengths of detrended correspondence analysis while retaining richer information than a TWINSPAN classification of the same data. Thus, in the absence of phytosociological benchmarks, meaningful and manageable habitat information could be derived from complex, multivariate species data. We also analysed the influence of the reliability of different surveyors' field observations by multiple sampling at a selected sample location. We show that the impact of surveyor error was more severe in the Boolean than the fuzzy classification. © 2007 Springer.
Resumo:
Clustering algorithms, pattern mining techniques and associated quality metrics emerged as reliable methods for modeling learners’ performance, comprehension and interaction in given educational scenarios. The specificity of available data such as missing values, extreme values or outliers, creates a challenge to extract significant user models from an educational perspective. In this paper we introduce a pattern detection mechanism with-in our data analytics tool based on k-means clustering and on SSE, silhouette, Dunn index and Xi-Beni index quality metrics. Experiments performed on a dataset obtained from our online e-learning platform show that the extracted interaction patterns were representative in classifying learners. Furthermore, the performed monitoring activities created a strong basis for generating automatic feedback to learners in terms of their course participation, while relying on their previous performance. In addition, our analysis introduces automatic triggers that highlight learners who will potentially fail the course, enabling tutors to take timely actions.
Resumo:
LEÃO, Adriano de Castro; DÓRIA NETO, Adrião Duarte; SOUSA, Maria Bernardete Cordeiro de. New developmental stages for common marmosets (Callithrix jacchus) using mass and age variables obtained by K-means algorithm and self-organizing maps (SOM). Computers in Biology and Medicine, v. 39, p. 853-859, 2009
Resumo:
Resources created at the University of Southampton for the module Remote Sensing for Earth Observation
Resumo:
LEÃO, Adriano de Castro; DÓRIA NETO, Adrião Duarte; SOUSA, Maria Bernardete Cordeiro de. New developmental stages for common marmosets (Callithrix jacchus) using mass and age variables obtained by K-means algorithm and self-organizing maps (SOM). Computers in Biology and Medicine, v. 39, p. 853-859, 2009
Resumo:
L'esperimento ATLAS, come gli altri esperimenti che operano al Large Hadron Collider, produce Petabytes di dati ogni anno, che devono poi essere archiviati ed elaborati. Inoltre gli esperimenti si sono proposti di rendere accessibili questi dati in tutto il mondo. In risposta a questi bisogni è stato progettato il Worldwide LHC Computing Grid che combina la potenza di calcolo e le capacità di archiviazione di più di 170 siti sparsi in tutto il mondo. Nella maggior parte dei siti del WLCG sono state sviluppate tecnologie per la gestione dello storage, che si occupano anche della gestione delle richieste da parte degli utenti e del trasferimento dei dati. Questi sistemi registrano le proprie attività in logfiles, ricchi di informazioni utili agli operatori per individuare un problema in caso di malfunzionamento del sistema. In previsione di un maggiore flusso di dati nei prossimi anni si sta lavorando per rendere questi siti ancora più affidabili e uno dei possibili modi per farlo è lo sviluppo di un sistema in grado di analizzare i file di log autonomamente e individuare le anomalie che preannunciano un malfunzionamento. Per arrivare a realizzare questo sistema si deve prima individuare il metodo più adatto per l'analisi dei file di log. In questa tesi viene studiato un approccio al problema che utilizza l'intelligenza artificiale per analizzare i logfiles, più nello specifico viene studiato l'approccio che utilizza dell'algoritmo di clustering K-means.