150 results for Labeling hierarchical clustering


Relevance:

20.00%

Publisher:

Abstract:

In this paper we propose a novel sparse subspace clustering method that regularizes the sparse subspace representation by exploiting the structural sharing between tasks and data points via group sparse coding. We derive simple, provably convergent, and computationally efficient algorithms for solving the proposed group formulations. We demonstrate the advantage of the framework on three challenging benchmark datasets, ranging from medical record data to image and text clustering, and show that it consistently outperforms rival methods.

Relevance:

20.00%

Publisher:

Abstract:

Spinel LiNi0.5Mn1.5O4 hierarchical nanofibers with diameters of 200–500 nm and lengths of up to several tens of micrometers were synthesized from low-cost starting materials by electrospinning combined with annealing. Well-separated nanofiber precursors impede the growth and agglomeration of LiNi0.5Mn1.5O4 particles. The hierarchical nanofibers were constructed from attached LiNi0.5Mn1.5O4 nanooctahedrons with sizes ranging from 200 to 400 nm. These LiNi0.5Mn1.5O4 hierarchical nanofibers are shown to exhibit favorable electrochemical performance. At a 0.5C (coulombic) rate, they show an initial discharge capacity of 133 mAh g⁻¹ with a capacity retention of over 94% after 30 cycles. Even at 2, 5, 10, and 15C rates, they can still deliver discharge capacities of 115, 100, 90, and 80 mAh g⁻¹, respectively. Compared with self-aggregated nanooctahedrons synthesized using common sol–gel methods, the LiNi0.5Mn1.5O4 hierarchical nanofibers exhibit a much higher capacity. This is because self-aggregation is greatly reduced in the unique nanooctahedron-in-nanofiber structure, where the nanopolyhedrons are attached along the long nanofibers. This unique microstructured cathode results in large effective contact areas of the active materials and conductive additives, and fully realizes the advantage of nanomaterial-based cathodes.

Relevance:

20.00%

Publisher:

Abstract:

In recent years, significant effort has been devoted to predicting protein functions from protein interaction data generated by high-throughput techniques. However, predicting protein functions correctly and reliably remains a challenge. Many computational methods have recently been proposed for this task, and among them clustering-based methods are the most promising. The existing methods, however, mainly focus on modeling protein relationships and on prediction algorithms that statically predict functions from the clusters related to the unannotated proteins. In fact, clustering itself is a dynamic process, and function prediction should take this dynamic feature into consideration. Unfortunately, existing prediction methods ignore it. In this paper, we propose an innovative progressive clustering-based prediction method that traces the functions of relevant annotated proteins across all clusters generated through the progressive clustering of proteins. A set of prediction criteria is proposed to predict the functions of unannotated proteins from all relevant clusters and traced functions. The method was evaluated on real protein interaction datasets, and the results demonstrated its effectiveness compared with representative existing methods.

Relevance:

20.00%

Publisher:

Abstract:

This paper presents an integration of a novel document vector representation technique and a novel Growing Self-Organizing Process. In this new approach, each document is represented as a low-dimensional vector composed of the indices and weights derived from the keywords of the document.

An index-based similarity calculation method is employed on this low-dimensional feature space, and the growing self-organizing process is modified to comply with the new feature representation model.

The initial experiments show that this novel integration outperforms state-of-the-art Self-Organizing Map-based text clustering techniques in efficiency while preserving the same level of accuracy.
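As a rough illustration of the representation described in this abstract, the sketch below stores a document as a small mapping from keyword index to weight and compares two documents only over the indices they share. The abstract does not give the actual construction or similarity formula, so the top-k keyword selection, the cosine-style score, and the names `to_index_vector` and `index_similarity` are all assumptions made for illustration.

```python
import math


def to_index_vector(term_weights, vocabulary, top_k=3):
    """Keep the top_k heaviest keywords as {vocabulary index: weight}.

    Assumption: keywords are ranked by a precomputed weight (e.g. TF-IDF);
    the abstract does not say how the indices and weights are derived.
    """
    ranked = sorted(term_weights.items(), key=lambda kv: kv[1], reverse=True)
    return {vocabulary[t]: w for t, w in ranked[:top_k] if t in vocabulary}


def index_similarity(a, b):
    """Cosine-style similarity that only visits indices present in both vectors."""
    shared = a.keys() & b.keys()
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical vocabulary and keyword weights, for illustration only.
vocabulary = {"cluster": 0, "map": 1, "text": 2, "stock": 3}
doc_a = to_index_vector({"cluster": 0.9, "text": 0.5, "map": 0.1}, vocabulary, top_k=2)
doc_b = to_index_vector({"stock": 0.8, "text": 0.2}, vocabulary, top_k=2)
```

Because each vector holds only a handful of entries, the comparison cost depends on the number of kept keywords rather than on the vocabulary size, which is one plausible source of the efficiency gain the abstract reports.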

Relevance:

20.00%

Publisher:

Abstract:

Text clustering can be considered a four-step process consisting of feature extraction, text representation, document clustering and cluster interpretation. Most text clustering models treat text as an unordered collection of words. However, the semantics of text would be better captured if word sequences were taken into account.

In this paper we propose a sequence-based text clustering model in which four novel sequence-based components are introduced, one in each of the four steps of the text clustering process.

Experiments conducted on the Reuters dataset and the Sydney Morning Herald (SMH) news archives demonstrate the advantage of the proposed sequence-based model in capturing context and semantics, and in accuracy and speed, compared with clustering documents using single-word and n-gram-based models.

Relevance:

20.00%

Publisher:

Abstract:

Multimedia content often comes with weakly annotated data such as tags, links and interactions. This weakly annotated data is called side information. It is auxiliary information about the data and provides hints for exploring the link structure of the data. Most clustering algorithms use pure data only. A model that combines pure data and side information, such as images and tags or documents and keywords, can better capture the underlying structure of the data. We demonstrate how to incorporate different types of side information into a recently proposed Bayesian nonparametric model, the distance dependent Chinese restaurant process (DD-CRP). When side information takes the form of subsets of discrete labels, our algorithm embeds the affinity of this information into the decay function of the DD-CRP. This makes it possible to measure distance based on arbitrary side information instead of only the spatial layout or time stamp of observations. At the same time, for noisy and incomplete side information, we set the decay function so that the DD-CRP reduces to the traditional Chinese restaurant process, thus avoiding the side effects of noisy and incomplete side information. Experimental evaluations on two real-world datasets, NUS-WIDE and 20 Newsgroups, show that exploiting side information in the DD-CRP significantly improves clustering performance.
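The role of the decay function can be illustrated with the DD-CRP prior alone. In a DD-CRP, item i links to item j with probability proportional to f(d_ij) and to itself with probability proportional to a concentration parameter alpha. The sketch below is an assumption-laden toy, not the authors' algorithm: the Jaccard distance between tag sets, the exponential decay, and the function names are illustrative choices that the abstract does not specify.

```python
import math


def jaccard_distance(a, b):
    """Distance between two label subsets: 1 - |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


def exponential_decay(d, scale=0.5):
    """Assumed decay function: affinity shrinks as the distance grows."""
    return math.exp(-d / scale)


def link_prior(i, tags, alpha, decay=None):
    """Prior over which item j item i links to (j == i is a self-link).

    decay=None uses f = 1 for every earlier item and 0 for later ones,
    the setting under which the DD-CRP reduces to the sequential CRP.
    """
    weights = []
    for j in range(len(tags)):
        if j == i:
            weights.append(alpha)
        elif decay is None:
            weights.append(1.0 if j < i else 0.0)
        else:
            weights.append(decay(jaccard_distance(tags[i], tags[j])))
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical tag sets attached to three items.
tags = [{"cat", "pet"}, {"cat", "pet"}, {"car"}]
with_side_info = link_prior(0, tags, alpha=1.0, decay=exponential_decay)
plain_crp = link_prior(2, tags, alpha=1.0)
```

With the tag-based decay, item 0 is far more likely to link to item 1, which shares its tags, than to item 2. With `decay=None`, every earlier item is equally attractive and the tags have no influence, which mirrors the fallback the abstract describes for noisy or incomplete side information.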

Relevance:

20.00%

Publisher:

Abstract:

The k-means algorithm is a partitional clustering method. Over 60 years old, it has been successfully used for a variety of problems. The popularity of k-means is in large part a consequence of its simplicity and efficiency. In this paper we are inspired by these appealing properties of k-means in the development of a clustering algorithm which accepts the notion of "positively" and "negatively" labelled data. The goal is to discover the cluster structure of both positive and negative data in a manner which allows for the discrimination between the two sets. The usefulness of this idea is demonstrated practically on the problem of face recognition, where the task of learning the scope of a person's appearance should be done in a manner which allows this face to be differentiated from others.

Relevance:

20.00%

Publisher:

Abstract:

The goal of this article is to examine evidence of stock price clustering on the South Pacific Stock Exchange, located in Fiji, and to explore its determinants. We find that stock prices cluster at the decimals 0 and 5, with almost half of all prices settling on these two decimals. On investigating the determinants of price clustering on the South Pacific Stock Exchange, we find that price level and volume of trade have a statistically significant positive effect on price clustering. We also propose and test a ‘panic trading’ hypothesis, which states that political instability induces price clustering. We find evidence that political instability in Fiji induces price clustering behaviour.
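The clustering measure in this abstract can be made concrete with a small sketch that counts how often prices end in each digit. It assumes prices are recorded as integer cents and that "decimal" refers to the final digit; the price list is invented for illustration and is not data from the study.

```python
from collections import Counter


def last_digit_shares(prices_in_cents):
    """Share of prices ending in each digit 0-9 (prices given as integer cents)."""
    counts = Counter(p % 10 for p in prices_in_cents)
    n = len(prices_in_cents)
    return {digit: counts.get(digit, 0) / n for digit in range(10)}


def clustering_share(prices_in_cents, digits=(0, 5)):
    """Combined share of prices whose final digit is one of `digits`."""
    shares = last_digit_shares(prices_in_cents)
    return sum(shares[d] for d in digits)


# Hypothetical closing prices in cents, not taken from the paper.
prices = [100, 105, 110, 123, 250, 255, 301, 400, 415, 999]
share_0_5 = clustering_share(prices)
```

If final digits were uniformly distributed, the digits 0 and 5 together would account for about 20% of prices, so a combined share close to one half, as the abstract reports, indicates strong clustering.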

Relevance:

20.00%

Publisher:

Abstract:

A new approach is presented for calculating the parent orientation from sets of variant orientations produced by phase transformation. The parent austenite orientation is determined using the orientations of bainite variants that transformed from a single parent austenite grain. In this approach, the five known orientation relationships are used to back-transform each observed bainite variant to all of its potential face-centered-cubic (f.c.c.) parent orientations. A set of potential f.c.c. orientations has one representative from each bainite variant, and each set is assembled on the basis of minimum mutual misorientation. The set of back-transformed orientations with the minimum summation of mutual misorientation angles (SMMA) is selected as the most probable parent (austenite) orientation. The availability of multiple sets permits a confidence index to be calculated from the best and next-best fits to a parent orientation. The results show good agreement between the measured parent austenite orientation and the calculated parent orientation having minimum SMMA.
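The selection step can be sketched as a search over sets that take one candidate parent orientation from each variant and keep the set with the smallest SMMA. The toy below makes strong simplifying assumptions that are not in the abstract: orientations are single in-plane angles in degrees rather than full crystal orientations, crystal symmetry is ignored, every combination is enumerated by brute force, and the confidence index is left out because its formula is not given.

```python
from itertools import combinations, product


def misorientation(a, b):
    """Toy misorientation: smallest angle in degrees between two in-plane rotations."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def smma(orientations):
    """Summation of mutual misorientation angles over all pairs in a set."""
    return sum(misorientation(a, b) for a, b in combinations(orientations, 2))


def rank_parent_sets(candidates_per_variant):
    """Score every set holding one candidate per variant, lowest SMMA first.

    candidates_per_variant: one list of candidate parent orientations per
    observed variant (hypothetical back-transformed values).
    """
    scored = [(smma(choice), choice) for choice in product(*candidates_per_variant)]
    return sorted(scored, key=lambda item: item[0])


# Hypothetical candidate parent angles for three observed variants.
candidates = [
    [10.0, 100.0, 200.0],
    [12.0, 150.0, 300.0],
    [9.0, 250.0, 80.0],
]
ranking = rank_parent_sets(candidates)
best_score, best_set = ranking[0]
next_best_score = ranking[1][0]
```

Keeping the full ranking makes the best and next-best scores available, which is the information the abstract says the confidence index is computed from. A real implementation would replace `misorientation` with a symmetry-aware calculation on rotation matrices or quaternions.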