980 resultados para Algorithm clustering


Relevância:

40.00% 40.00%

Publicador:

Resumo:

Industrial applications of computer vision sometimes require detection of atypical objects that occur as small groups of pixels in digital images. These objects are difficult to single out because they are small and randomly distributed. In this work we propose an image segmentation method using the novel Ant System-based Clustering Algorithm (ASCA). ASCA models the foraging behaviour of ants, which move through the data space searching for high data-density regions, and leave pheromone trails on their path. The pheromone map is used to identify the exact number of clusters, and assign the pixels to these clusters using the pheromone gradient. We applied ASCA to detection of microcalcifications in digital mammograms and compared its performance with state-of-the-art clustering algorithms such as 1D Self-Organizing Map, k-Means, Fuzzy c-Means and Possibilistic Fuzzy c-Means. The main advantage of ASCA is that the number of clusters needs not to be known a priori. The experimental results show that ASCA is more efficient than the other algorithms in detecting small clusters of atypical data.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This paper proposes an adaptive algorithm for clustering cumulative probability distribution functions (c.p.d.f.) of a continuous random variable, observed in different populations, into the minimum homogeneous clusters, making no parametric assumptions about the c.p.d.f.’s. The distance function for clustering c.p.d.f.’s that is proposed is based on the Kolmogorov–Smirnov two sample statistic. This test is able to detect differences in position, dispersion or shape of the c.p.d.f.’s. In our context, this statistic allows us to cluster the recorded data with a homogeneity criterion based on the whole distribution of each data set, and to decide whether it is necessary to add more clusters or not. In this sense, the proposed algorithm is adaptive as it automatically increases the number of clusters only as necessary; therefore, there is no need to fix in advance the number of clusters. The output of the algorithm are the common c.p.d.f. of all observed data in the cluster (the centroid) and, for each cluster, the Kolmogorov–Smirnov statistic between the centroid and the most distant c.p.d.f. The proposed algorithm has been used for a large data set of solar global irradiation spectra distributions. The results obtained enable to reduce all the information of more than 270,000 c.p.d.f.’s in only 6 different clusters that correspond to 6 different c.p.d.f.’s.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In this paper we present an efficient k-Means clustering algorithm for two dimensional data. The proposed algorithm re-organizes dataset into a form of nested binary tree*. Data items are compared at each node with only two nearest means with respect to each dimension and assigned to the one that has the closer mean. The main intuition of our research is as follows: We build the nested binary tree. Then we scan the data in raster order by in-order traversal of the tree. Lastly we compare data item at each node to the only two nearest means to assign the value to the intendant cluster. In this way we are able to save the computational cost significantly by reducing the number of comparisons with means and also by the least use to Euclidian distance formula. Our results showed that our method can perform clustering operation much faster than the classical ones. © Springer-Verlag Berlin Heidelberg 2005

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Emerging vehicular comfort applications pose a host of completely new set of requirements such as maintaining end-to-end connectivity, packet routing, and reliable communication for internet access while on the move. One of the biggest challenges is to provide good quality of service (QoS) such as low packet delay while coping with the fast topological changes. In this paper, we propose a clustering algorithm based on minimal path loss ratio (MPLR) which should help in spectrum efficiency and reduce data congestion in the network. The vehicular nodes which experience minimal path loss are selected as the cluster heads. The performance of the MPLR clustering algorithm is calculated by rate of change of cluster heads, average number of clusters and average cluster size. Vehicular traffic models derived from the Traffic Wales data are fed as input to the motorway simulator. A mathematical analysis for the rate of change of cluster head is derived which validates the MPLR algorithm and is compared with the simulated results. The mathematical and simulated results are in good agreement indicating the stability of the algorithm and the accuracy of the simulator. The MPLR system is also compared with V2R system with MPLR system performing better. © 2013 IEEE.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The Multiple Pheromone Ant Clustering Algorithm (MPACA) models the collective behaviour of ants to find clusters in data and to assign objects to the most appropriate class. It is an ant colony optimisation approach that uses pheromones to mark paths linking objects that are similar and potentially members of the same cluster or class. Its novelty is in the way it uses separate pheromones for each descriptive attribute of the object rather than a single pheromone representing the whole object. Ants that encounter other ants frequently enough can combine the attribute values they are detecting, which enables the MPACA to learn influential variable interactions. This paper applies the model to real-world data from two domains. One is logistics, focusing on resource allocation rather than the more traditional vehicle-routing problem. The other is mental-health risk assessment. The task for the MPACA in each domain was to predict class membership where the classes for the logistics domain were the levels of demand on haulage company resources and the mental-health classes were levels of suicide risk. Results on these noisy real-world data were promising, demonstrating the ability of the MPACA to find patterns in the data with accuracy comparable to more traditional linear regression models. © 2013 Polish Information Processing Society.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Ant colony optimisation algorithms model the way ants use pheromones for marking paths to important locations in their environment. Pheromone traces are picked up, followed, and reinforced by other ants but also evaporate over time. Optimal paths attract more pheromone and less useful paths fade away. The main innovation of the proposed Multiple Pheromone Ant Clustering Algorithm (MPACA) is to mark objects using many pheromones, one for each value of each attribute describing the objects in multidimensional space. Every object has one or more ants assigned to each attribute value and the ants then try to find other objects with matching values, depositing pheromone traces that link them. Encounters between ants are used to determine when ants should combine their features to look for conjunctions and whether they should belong to the same colony. This paper explains the algorithm and explores its potential effectiveness for cluster analysis. © 2014 Springer International Publishing Switzerland.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Emerging vehicular comfort applications pose a host of completely new set of requirements such as maintaining end-to-end connectivity, packet routing, and reliable communication for internet access while on the move. One of the biggest challenges is to provide good quality of service (QoS) such as low packet delay while coping with the fast topological changes. In this paper, we propose a clustering algorithm based on minimal path loss ratio (MPLR) which should help in spectrum efficiency and reduce data congestion in the network. The vehicular nodes which experience minimal path loss are selected as the cluster heads. The performance of the MPLR clustering algorithm is calculated by rate of change of cluster heads, average number of clusters and average cluster size. Vehicular traffic models derived from the Traffic Wales data are fed as input to the motorway simulator. A mathematical analysis for the rate of change of cluster head is derived which validates the MPLR algorithm and is compared with the simulated results. The mathematical and simulated results are in good agreement indicating the stability of the algorithm and the accuracy of the simulator. The MPLR system is also compared with V2R system with MPLR system performing better. © 2013 IEEE.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Ant Colony Optimisation algorithms mimic the way ants use pheromones for marking paths to important locations. Pheromone traces are followed and reinforced by other ants, but also evaporate over time. As a consequence, optimal paths attract more pheromone, whilst the less useful paths fade away. In the Multiple Pheromone Ant Clustering Algorithm (MPACA), ants detect features of objects represented as nodes within graph space. Each node has one or more ants assigned to each feature. Ants attempt to locate nodes with matching feature values, depositing pheromone traces on the way. This use of multiple pheromone values is a key innovation. Ants record other ant encounters, keeping a record of the features and colony membership of ants. The recorded values determine when ants should combine their features to look for conjunctions and whether they should merge into colonies. This ability to detect and deposit pheromone representative of feature combinations, and the resulting colony formation, renders the algorithm a powerful clustering tool. The MPACA operates as follows: (i) initially each node has ants assigned to each feature; (ii) ants roam the graph space searching for nodes with matching features; (iii) when departing matching nodes, ants deposit pheromones to inform other ants that the path goes to a node with the associated feature values; (iv) ant feature encounters are counted each time an ant arrives at a node; (v) if the feature encounters exceed a threshold value, feature combination occurs; (vi) a similar mechanism is used for colony merging. The model varies from traditional ACO in that: (i) a modified pheromone-driven movement mechanism is used; (ii) ants learn feature combinations and deposit multiple pheromone scents accordingly; (iii) ants merge into colonies, the basis of cluster formation. The MPACA is evaluated over synthetic and real-world datasets and its performance compares favourably with alternative approaches.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Traffic subarea division is vital for traffic system management and traffic network analysis in intelligent transportation systems (ITSs). Since existing methods may not be suitable for big traffic data processing, this paper presents a MapReduce-based Parallel Three-Phase K -Means (Par3PKM) algorithm for solving traffic subarea division problem on a widely adopted Hadoop distributed computing platform. Specifically, we first modify the distance metric and initialization strategy of K -Means and then employ a MapReduce paradigm to redesign the optimized K -Means algorithm for parallel clustering of large-scale taxi trajectories. Moreover, we propose a boundary identifying method to connect the borders of clustering results for each cluster. Finally, we divide traffic subarea of Beijing based on real-world trajectory data sets generated by 12,000 taxis in a period of one month using the proposed approach. Experimental evaluation results indicate that when compared with K -Means, Par2PK-Means, and ParCLARA, Par3PKM achieves higher efficiency, more accuracy, and better scalability and can effectively divide traffic subarea with big taxi trajectory data.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The K-means algorithm is one of the most popular clustering algorithms in current use as it is relatively fast yet simple to understand and deploy in practice. Nevertheless, its use entails certain restrictive assumptions about the data, the negative consequences of which are not always immediately apparent, as we demonstrate. While more flexible algorithms have been developed, their widespread use has been hindered by their computational and technical complexity. Motivated by these considerations, we present a flexible alternative to K-means that relaxes most of the assumptions, whilst remaining almost as fast and simple. This novel algorithm which we call MAP-DP (maximum a-posteriori Dirichlet process mixtures), is statistically rigorous as it is based on nonparametric Bayesian Dirichlet process mixture modeling. This approach allows us to overcome most of the limitations imposed by K-means. The number of clusters K is estimated from the data instead of being fixed a-priori as in K-means. In addition, while K-means is restricted to continuous data, the MAP-DP framework can be applied to many kinds of data, for example, binary, count or ordinal data. Also, it can efficiently separate outliers from the data. This additional flexibility does not incur a significant computational overhead compared to K-means with MAP-DP convergence typically achieved in the order of seconds for many practical problems. Finally, in contrast to K-means, since the algorithm is based on an underlying statistical model, the MAP-DP framework can deal with missing data and enables model testing such as cross validation in a principled way. We demonstrate the simplicity and effectiveness of this algorithm on the health informatics problem of clinical sub-typing in a cluster of diseases known as parkinsonism.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We introduce K-tree in an information retrieval context. It is an efficient approximation of the k-means clustering algorithm. Unlike k-means it forms a hierarchy of clusters. It has been extended to address issues with sparse representations. We compare performance and quality to CLUTO using document collections. The K-tree has a low time complexity that is suitable for large document collections. This tree structure allows for efficient disk based implementations where space requirements exceed that of main memory.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper describes the approach taken to the XML Mining track at INEX 2008 by a group at the Queensland University of Technology. We introduce the K-tree clustering algorithm in an Information Retrieval context by adapting it for document clustering. Many large scale problems exist in document clustering. K-tree scales well with large inputs due to its low complexity. It offers promising results both in terms of efficiency and quality. Document classification was completed using Support Vector Machines.