Biblioteca Digital

873 resultados para Clustering, Collaborative Filtering, K-Means Algorithm, Recommender System, Slope One Algorithm

New developmental stages for common marmosets (Callithrix jacchus) using mass and age variables obtained by K-means algorithm and self-organizing maps (SOM)

Relevância:

100.00% 100.00%

Publicador:

Resumo:

LE��O, Adriano de Castro; D��RIA NETO, Adri��o Duarte; SOUSA, Maria Bernardete Cordeiro de. New developmental stages for common marmosets (Callithrix jacchus) using mass and age variables obtained by K-means algorithm and self-organizing maps (SOM). Computers in Biology and Medicine, v. 39, p. 853-859, 2009

New developmental stages for common marmosets (Callithrix jacchus) using mass and age variables obtained by K-means algorithm and self-organizing maps (SOM)

Relevância:

100.00% 100.00%

Publicador:

Resumo:

LE��O, Adriano de Castro; D��RIA NETO, Adri��o Duarte; SOUSA, Maria Bernardete Cordeiro de. New developmental stages for common marmosets (Callithrix jacchus) using mass and age variables obtained by K-means algorithm and self-organizing maps (SOM). Computers in Biology and Medicine, v. 39, p. 853-859, 2009

New developmental stages for common marmosets (Callithrix jacchus) using mass and age variables obtained by K-means algorithm and self-organizing maps (SOM)

Relevância:

100.00% 100.00%

Publicador:

Resumo:

LE��O, Adriano de Castro; D��RIA NETO, Adri��o Duarte; SOUSA, Maria Bernardete Cordeiro de. New developmental stages for common marmosets (Callithrix jacchus) using mass and age variables obtained by K-means algorithm and self-organizing maps (SOM). Computers in Biology and Medicine, v. 39, p. 853-859, 2009

What to do when K-means clustering fails:a simple yet principled alternative algorithm

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The K-means algorithm is one of the most popular clustering algorithms in current use as it is relatively fast yet simple to understand and deploy in practice. Nevertheless, its use entails certain restrictive assumptions about the data, the negative consequences of which are not always immediately apparent, as we demonstrate. While more flexible algorithms have been developed, their widespread use has been hindered by their computational and technical complexity. Motivated by these considerations, we present a flexible alternative to K-means that relaxes most of the assumptions, whilst remaining almost as fast and simple. This novel algorithm which we call MAP-DP (maximum a-posteriori Dirichlet process mixtures), is statistically rigorous as it is based on nonparametric Bayesian Dirichlet process mixture modeling. This approach allows us to overcome most of the limitations imposed by K-means. The number of clusters K is estimated from the data instead of being fixed a-priori as in K-means. In addition, while K-means is restricted to continuous data, the MAP-DP framework can be applied to many kinds of data, for example, binary, count or ordinal data. Also, it can efficiently separate outliers from the data. This additional flexibility does not incur a significant computational overhead compared to K-means with MAP-DP convergence typically achieved in the order of seconds for many practical problems. Finally, in contrast to K-means, since the algorithm is based on an underlying statistical model, the MAP-DP framework can deal with missing data and enables model testing such as cross validation in a principled way. We demonstrate the simplicity and effectiveness of this algorithm on the health informatics problem of clinical sub-typing in a cluster of diseases known as parkinsonism.

Epidemic K-Means clustering

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. This work proposes a fully decentralised algorithm (Epidemic K-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against the state of the art distributed K-Means algorithms based on sampling methods. The experimental analysis confirms that the proposed algorithm is a practical and accurate distributed K-Means implementation for networked systems of very large and extreme scale.

Fault tolerant decentralised K-Means clustering for asynchronous large-scale networks

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks, such as massively parallel processors and clusters of workstations. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. The lack of scalable and fault tolerant global communication and synchronisation methods in large-scale systems has hindered the adoption of the K-Means algorithm for applications in large networked systems such as wireless sensor networks, peer-to-peer systems and mobile ad hoc networks. This work proposes a fully distributed K-Means algorithm (EpidemicK-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against the state of the art sampling methods and shows that the proposed method overcomes the limitations of the sampling-based approaches for skewed clusters distributions. The experimental analysis confirms that the proposed algorithm is very accurate and fault tolerant under unreliable network conditions (message loss and node failures) and is suitable for asynchronous networks of very large and extreme scale.

QK-Means: A clustering technique based on community detection and K-Means for deployment of cluster head nodes

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Wireless Sensor Networks (WSN) are a special kind of ad-hoc networks that is usually deployed in a monitoring field in order to detect some physical phenomenon. Due to the low dependability of individual nodes, small radio coverage and large areas to be monitored, the organization of nodes in small clusters is generally used. Moreover, a large number of WSN nodes is usually deployed in the monitoring area to increase WSN dependability. Therefore, the best cluster head positioning is a desirable characteristic in a WSN. In this paper, we propose a hybrid clustering algorithm based on community detection in complex networks and traditional K-means clustering technique: the QK-Means algorithm. Simulation results show that QK-Means detect communities and sub-communities thus lost message rate is decreased and WSN coverage is increased. �� 2012 IEEE.

Scalability of efficient parallel K-Means

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Clustering is defined as the grouping of similar items in a set, and is an important process within the field of data mining. As the amount of data for various applications continues to increase, in terms of its size and dimensionality, it is necessary to have efficient clustering methods. A popular clustering algorithm is K-Means, which adopts a greedy approach to produce a set of K-clusters with associated centres of mass, and uses a squared error distortion measure to determine convergence. Methods for improving the efficiency of K-Means have been largely explored in two main directions. The amount of computation can be significantly reduced by adopting a more efficient data structure, notably a multi-dimensional binary search tree (KD-Tree) to store either centroids or data points. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient K-Means techniques in parallel computational environments. In this work, we provide a parallel formulation for the KD-Tree based K-Means algorithm and address its load balancing issues.

Space partitioning for scalable K-means

Relevância:

100.00% 100.00%

Publicador:

Resumo:

K-Means is a popular clustering algorithm which adopts an iterative refinement procedure to determine data partitions and to compute their associated centres of mass, called centroids. The straightforward implementation of the algorithm is often referred to as `brute force' since it computes a proximity measure from each data point to each centroid at every iteration of the K-Means process. Efficient implementations of the K-Means algorithm have been predominantly based on multi-dimensional binary search trees (KD-Trees). A combination of an efficient data structure and geometrical constraints allow to reduce the number of distance computations required at each iteration. In this work we present a general space partitioning approach for improving the efficiency and the scalability of the K-Means algorithm. We propose to adopt approximate hierarchical clustering methods to generate binary space partitioning trees in contrast to KD-Trees. In the experimental analysis, we have tested the performance of the proposed Binary Space Partitioning K-Means (BSP-KM) when a divisive clustering algorithm is used. We have carried out extensive experimental tests to compare the proposed approach to the one based on KD-Trees (KD-KM) in a wide range of the parameters space. BSP-KM is more scalable than KDKM, while keeping the deterministic nature of the `brute force' algorithm. In particular, the proposed space partitioning approach has shown to overcome the well-known limitation of KD-Trees in high-dimensional spaces and can also be adopted to improve the efficiency of other algorithms in which KD-Trees have been used.

Dynamic load balancing in parallel KD-tree k-means

Relevância:

100.00% 100.00%

Publicador:

Resumo:

One among the most influential and popular data mining methods is the k-Means algorithm for cluster analysis. Techniques for improving the efficiency of k-Means have been largely explored in two main directions. The amount of computation can be significantly reduced by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree). These techniques allow to reduce the number of distance computations the algorithm performs at each iteration. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing issue. Three solutions have been developed and tested. Two approaches are based on a static partitioning of the data set and a third solution incorporates a dynamic load balancing policy.

Improving Collaborative Filtering Based Recommender Systems Using Pareto Dominance

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Los sistemas de recomendaci��n son un tipo de soluci��n al problema de sobrecarga de informaci��n que sufren los usuarios de los sitios web en los que se pueden votar ciertos art��culos. El sistema de recomendaci��n de filtrado colaborativo es considerado como el m��todo con m��s ��xito debido a que sus recomendaciones se hacen bas��ndose en los votos de usuarios similares a un usuario activo. Sin embargo, el m��todo de filtrado de colaboraci��n tradicional selecciona usuarios insuficientemente representativos como vecinos de cada usuario activo. Esto significa que las recomendaciones hechas a posteriori no son lo suficientemente precisas. El m��todo propuesto en esta tesis realiza un pre-filtrado del proceso, mediante el uso de dominancia de Pareto, que elimina los usuarios menos representativos del proceso de selecci��n k-vecino y mantiene los m��s prometedores. Los resultados de los experimentos realizados en MovieLens y Netflix muestran una mejora significativa en todas las medidas de calidad estudiadas en la aplicaci��n del m��todo propuesto. ABSTRACTRecommender systems are a type of solution to the information overload problem suffered by users of websites on which they can rate certain items. The Collaborative Filtering Recommender System is considered to be the most successful approach as it make its recommendations based on votes of users similar to an active user. Nevertheless, the traditional collaborative filtering method selects insufficiently representative users as neighbors of each active user. This means that the recommendations made a posteriori are not precise enough. The method proposed in this thesis performs a pre-filtering process, by using Pareto dominance, which eliminates the less representative users from the k-neighbor selection process and keeps the most promising ones. The results from the experiments performed on Movielens and Netflix show a significant improvement in all the quality measures studied on applying the proposed method.

Robust initialisation of Gaussian radial basis function networks using partitioned k-means clustering

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Radial basis function networks can be trained quickly using linear optimisation once centres and other associated parameters have been initialised. The authors propose a small adjustment to a well accepted initialisation algorithm which improves the network accuracy over a range of problems. The algorithm is described and results are presented.

An Approach to Collaborative Filtering by ARTMAP Neural Networks

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recommender systems are now widely used in e-commerce applications to assist customers to find relevant products from the many that are frequently available. Collaborative filtering (CF) is a key component of many of these systems, in which recommendations are made to users based on the opinions of similar users in a system. This paper presents a model-based approach to CF by using supervised ARTMAP neural networks (NN). This approach deploys formation of reference vectors, which makes a CF recommendation system able to classify user profile patterns into classes of similar profiles. Empirical results reported show that the proposed approach performs better than similar CF systems based on unsupervised ART2 NN or neighbourhood-based algorithm.

Collaborative filtering and deep learning based recommendation system for cold start items

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recommender system is a specific type of intelligent systems, which exploits historical user ratings on items and/or auxiliary information to make recommendations on items to the users. It plays a critical role in a wide range of online shopping, e-commercial services and social networking applications. Collaborative filtering (CF) is the most popular approaches used for recommender systems, but it suffers from complete cold start (CCS) problem where no rating record are available and incomplete cold start (ICS) problem where only a small number of rating records are available for some new items or users in the system. In this paper, we propose two recommendation models to solve the CCS and ICS problems for new items, which are based on a framework of tightly coupled CF approach and deep learning neural network. A specific deep neural network SADE is used to extract the content features of the items. The state of the art CF model, timeSVD++, which models and utilizes temporal dynamics of user preferences and item features, is modified to take the content features into prediction of ratings for cold start items. Extensive experiments on a large Netflix rating dataset of movies are performed, which show that our proposed recommendation models largely outperform the baseline models for rating prediction of cold start items. The two proposed recommendation models are also evaluated and compared on ICS items, and a flexible scheme of model retraining and switching is proposed to deal with the transition of items from cold start to non-cold start status. The experiment results on Netflix movie recommendation show the tight coupling of CF approach and deep learning neural network is feasible and very effective for cold start item recommendation. The design is general and can be applied to many other recommender systems for online shopping and social networking applications. The solution of cold start item problem can largely improve user experience and trust of recommender systems, and effectively promote cold start items.

Clustering algorithms for anti-money laundering using graph theory and social network analysis

Relevância:

100.00% 100.00%

Publicador:

Resumo:

HEMOLIA (a project under European community��s 7th framework programme) is a new generation Anti-Money Laundering (AML) intelligent multi-agent alert and investigation system which in addition to the traditional financial data makes extensive use of modern society��s huge telecom data source, thereby opening up a new dimension of capabilities to all Money Laundering fighters (FIUs, LEAs) and Financial Institutes (Banks, Insurance Companies, etc.). This Master-Thesis project is done at AIA, one of the partners for the HEMOLIA project in Barcelona. The objective of this thesis is to find the clusters in a network drawn by using the financial data. An extensive literature survey has been carried out and several standard algorithms related to networks have been studied and implemented. The clustering problem is a NP-hard problem and several algorithms like K-Means and Hierarchical clustering are being implemented for studying several problems relating to sociology, evolution, anthropology etc. However, these algorithms have certain drawbacks which make them very difficult to implement. The thesis suggests (a) a possible improvement to the K-Means algorithm, (b) a novel approach to the clustering problem using the Genetic Algorithms and (c) a new algorithm for finding the cluster of a node using the Genetic Algorithm.

«
1
2
3
4
5
6
7
8
...
58
59
»