8 resultados para Algorithm clustering

em QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Clustering analysis of data from DNA microarray hybridization studies is an essential task for identifying biologically relevant groups of genes. Attribute cluster algorithm (ACA) has provided an attractive way to group and select meaningful genes. However, ACA needs much prior knowledge about the genes to set the number of clusters. In practical applications, if the number of clusters is misspecified, the performance of the ACA will deteriorate rapidly. In fact, it is a very demanding to do that because of our little knowledge. We propose the Cooperative Competition Cluster Algorithm (CCCA) in this paper. In the algorithm, we assume that both cooperation and competition exist simultaneously between clusters in the process of clustering. By using this principle of Cooperative Competition, the number of clusters can be found in the process of clustering. Experimental results on a synthetic and gene expression data are demonstrated. The results show that CCCA can choose the number of clusters automatically and get excellent performance with respect to other competing methods.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper deals with Takagi-Sugeno (TS) fuzzy model identification of nonlinear systems using fuzzy clustering. In particular, an extended fuzzy Gustafson-Kessel (EGK) clustering algorithm, using robust competitive agglomeration (RCA), is developed for automatically constructing a TS fuzzy model from system input-output data. The EGK algorithm can automatically determine the 'optimal' number of clusters from the training data set. It is shown that the EGK approach is relatively insensitive to initialization and is less susceptible to local minima, a benefit derived from its agglomerate property. This issue is often overlooked in the current literature on nonlinear identification using conventional fuzzy clustering. Furthermore, the robust statistical concepts underlying the EGK algorithm help to alleviate the difficulty of cluster identification in the construction of a TS fuzzy model from noisy training data. A new hybrid identification strategy is then formulated, which combines the EGK algorithm with a locally weighted, least-squares method for the estimation of local sub-model parameters. The efficacy of this new approach is demonstrated through function approximation examples and also by application to the identification of an automatic voltage regulation (AVR) loop for a simulated 3 kVA laboratory micro-machine system.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper we propose a graph stream clustering algorithm with a unied similarity measure on both structural and attribute properties of vertices, with each attribute being treated as a vertex. Unlike others, our approach does not require an input parameter for the number of clusters, instead, it dynamically creates new sketch-based clusters and periodically merges existing similar clusters. Experiments on two publicly available datasets reveal the advantages of our approach in detecting vertex clusters in the graph stream. We provide a detailed investigation into how parameters affect the algorithm performance. We also provide a quantitative evaluation and comparison with a well-known offline community detection algorithm which shows that our streaming algorithm can achieve comparable or better average cluster purity.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Most traditional data mining algorithms struggle to cope with the sheer scale of data efficiently. In this paper, we propose a general framework to accelerate existing clustering algorithms to cluster large-scale datasets which contain large numbers of attributes, items, and clusters. Our framework makes use of locality sensitive hashing (LSH) to significantly reduce the cluster search space. We also theoretically prove that our framework has a guaranteed error bound in terms of the clustering quality. This framework can be applied to a set of centroid-based clustering algorithms that assign an object to the most similar cluster, and we adopt the popular K-Modes categorical clustering algorithm to present how the framework can be applied. We validated our framework with five synthetic datasets and a real world Yahoo! Answers dataset. The experimental results demonstrate that our framework is able to speed up the existing clustering algorithm between factors of 2 and 6, while maintaining comparable cluster purity.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Social networks generally display a positively skewed degree distribution and higher values for clustering coefficient and degree assortativity than would be expected from the degree sequence. For some types of simulation studies, these properties need to be varied in the artificial networks over which simulations are to be conducted. Various algorithms to generate networks have been described in the literature but their ability to control all three of these network properties is limited. We introduce a spatially constructed algorithm that generates networks with constrained but arbitrary degree distribution, clustering coefficient and assortativity. Both a general approach and specific implementation are presented. The specific implementation is validated and used to generate networks with a constrained but broad range of property values. © Copyright JASSS.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This papers examines the use of trajectory distance measures and clustering techniques to define normal
and abnormal trajectories in the context of pedestrian tracking in public spaces. In order to detect abnormal
trajectories, what is meant by a normal trajectory in a given scene is firstly defined. Then every trajectory
that deviates from this normality is classified as abnormal. By combining Dynamic Time Warping and a
modified K-Means algorithms for arbitrary-length data series, we have developed an algorithm for trajectory
clustering and abnormality detection. The final system performs with an overall accuracy of 83% and 75%
when tested in two different standard datasets.