980 results for Labeling hierarchical clustering


Relevance:

20.00% 20.00%

Publisher:

Abstract:

Multimedia content often comes with weakly annotated data such as tags, links and interactions. This weakly annotated data is called side information. It is auxiliary information about the data and provides hints for exploring the link structure of the data. Most clustering algorithms use only the pure data. A model that combines pure data and side information, such as images and tags or documents and keywords, can perform better at understanding the underlying structure of the data. We demonstrate how to incorporate different types of side information into a recently proposed Bayesian nonparametric model, the distance dependent Chinese restaurant process (DD-CRP). When side information takes the form of subsets of discrete labels, our algorithm embeds the affinity of this information into the decay function of the DD-CRP. This makes it possible to measure distance based on arbitrary side information rather than only the spatial layout or time stamps of observations. For noisy and incomplete side information, we set the decay function so that the DD-CRP reduces to the traditional Chinese restaurant process, which avoids the adverse effects of such side information. Experimental evaluations on two real-world datasets, NUS WIDE and 20 Newsgroups, show that exploiting side information in the DD-CRP significantly improves clustering performance.
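
As a rough illustration of how label subsets could enter a DD-CRP decay function, the sketch below computes the prior probability that one item links to each other item. The Jaccard distance, the exponential decay, and the names `jaccard_distance`, `link_probabilities`, `alpha` and `scale` are assumptions chosen for illustration; the abstract does not state which decay function the authors use. Setting `use_side_info=False` switches to a constant decay over earlier items, the sequential setting in which the DD-CRP prior coincides with the traditional Chinese restaurant process.

```python
import math


def jaccard_distance(tags_a, tags_b):
    """Return 1 - |A & B| / |A | B|, or 1.0 when both tag sets are empty."""
    union = tags_a | tags_b
    if not union:
        return 1.0
    return 1.0 - len(tags_a & tags_b) / len(union)


def link_probabilities(i, tag_sets, alpha=1.0, scale=0.5, use_side_info=True):
    """Prior probability that item i links to each item j (j == i is a self-link).

    The exponential decay over Jaccard distance is an illustrative assumption.
    With use_side_info=False the decay is 1 for j < i and 0 for j > i.
    """
    weights = []
    for j, tags_j in enumerate(tag_sets):
        if j == i:
            weights.append(alpha)  # self-link weight
        elif use_side_info:
            d = jaccard_distance(tag_sets[i], tags_j)
            weights.append(math.exp(-d / scale))
        else:
            weights.append(1.0 if j < i else 0.0)
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical tag sets for three images.
tags = [{"beach", "sea"}, {"beach", "sand"}, {"city", "night"}]
probs = link_probabilities(0, tags)
crp_probs = link_probabilities(2, tags, use_side_info=False)
```

In this toy input, item 0 is more likely to link to item 1 (one shared tag) than to item 2 (no shared tags), while the fallback mode gives item 2 equal weight on every earlier item and on itself.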

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The k-means algorithm is a partitional clustering method. Over 60 years old, it has been successfully used for a variety of problems. The popularity of k-means is in large part a consequence of its simplicity and efficiency. In this paper we are inspired by these appealing properties of k-means in the development of a clustering algorithm which accepts the notion of "positively" and "negatively" labelled data. The goal is to discover the cluster structure of both positive and negative data in a manner which allows for the discrimination between the two sets. The usefulness of this idea is demonstrated practically on the problem of face recognition, where the task of learning the scope of a person's appearance should be done in a manner which allows this face to be differentiated from others.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The goal of this article is to examine evidence of stock price clustering on the South Pacific Stock Exchange, located in Fiji, and to explore its determinants. We find that stock prices cluster at the decimals 0 and 5, with almost half of prices settling on these two decimals. Investigating the determinants of price clustering on the South Pacific Stock Exchange, we find that price level and volume of trade have a statistically significant positive effect on price clustering. We also propose and test a ‘panic trading’ hypothesis, which states that political instability induces price clustering. We find evidence that political instability in Fiji induces price clustering behaviour.
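
A simple way to check for this kind of clustering is to tabulate the final decimal digit of each price and compare the combined share of 0 and 5 with the 20% expected if digits were uniformly distributed. The sketch below does this for a hypothetical list of prices quoted to two decimals; the function name `last_digit_shares`, the two-decimal assumption, and the sample prices are illustrative and not taken from the article.

```python
from collections import Counter


def last_digit_shares(prices, decimals=2):
    """Share of prices ending in each digit 0-9 at the given decimal place."""
    counts = Counter(round(p * 10**decimals) % 10 for p in prices)
    n = len(prices)
    return {digit: counts.get(digit, 0) / n for digit in range(10)}


# Hypothetical closing prices, not real exchange data.
prices = [1.50, 2.35, 0.80, 3.12, 1.05, 2.40, 0.97, 1.25]
shares = last_digit_shares(prices)
clustered_share = shares[0] + shares[5]  # compare against the uniform benchmark of 0.2
```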

Relevance:

20.00% 20.00%

Publisher:

Abstract:

A new approach is presented for calculating the parent orientation from sets of variant orientations produced by phase transformation. The parent austenite orientation is determined using the orientations of bainite variants that transformed from a single parent austenite grain. In this approach, the five known orientation relationships are used to back transform each observed bainite variant to all of its potential face-centered-cubic (f.c.c.) parent orientations. A set of potential f.c.c. orientations has one representative from each bainite variant, and each set is assembled on the basis of minimum mutual misorientation. The set of back-transformed orientations with the minimum summation of mutual misorientation angle (SMMA) is selected as the most probable parent (austenite) orientation. The availability of multiple sets permits a confidence index to be calculated from the best and next best fits to a parent orientation. The results show good agreement between the measured parent austenite orientation and the calculated parent orientation having minimum SMMA.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper addresses the problem of resource scheduling in a grid computing environment. One of the main goals of grid computing is to share system resources among geographically dispersed users, and schedule resource requests in an efficient manner. Grid computing resources are distributed, heterogeneous, dynamic, and autonomous, which makes resource scheduling a complex problem. This paper proposes a new approach to resource scheduling in grid computing environments, the hierarchical stochastic Petri net (HSPN). The HSPN optimizes grid resource sharing, by categorizing resource requests in three layers, where each layer has special functions for receiving subtasks from, and delivering data to, the layer above or below. We compare the HSPN performance with the Min-min and Max-min resource scheduling algorithms. Our results show that the HSPN performs better than Max-min, but slightly underperforms Min-min.
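
For context on the two baselines named above, the sketch below implements the standard Min-min and Max-min heuristics over a matrix of expected execution times. It does not reproduce the HSPN, which the abstract does not specify in detail. The function name `schedule`, the `pick_largest` flag, and the example matrix are assumptions made for illustration.

```python
def schedule(etc, pick_largest=False):
    """Min-min (default) or Max-min (pick_largest=True) list scheduling.

    etc[t][m] is the expected execution time of task t on machine m.
    Returns (assignment, makespan), where assignment maps task -> machine.
    """
    n_machines = len(etc[0])
    ready = [0.0] * n_machines  # time at which each machine becomes free
    unscheduled = set(range(len(etc)))
    assignment = {}
    while unscheduled:
        # For every remaining task, find its earliest completion time and machine.
        best = {}
        for t in unscheduled:
            m = min(range(n_machines), key=lambda k: ready[k] + etc[t][k])
            best[t] = (ready[m] + etc[t][m], m)
        # Min-min schedules the task with the smallest such time, Max-min the largest.
        chooser = max if pick_largest else min
        task = chooser(best, key=lambda t: (best[t][0], t))
        finish, machine = best[task]
        assignment[task] = machine
        ready[machine] = finish
        unscheduled.remove(task)
    return assignment, max(ready)


# Hypothetical execution times: 3 tasks on 2 machines.
etc = [[2, 4], [3, 1], [8, 6]]
min_min_assignment, min_min_makespan = schedule(etc)
max_min_assignment, max_min_makespan = schedule(etc, pick_largest=True)
```

Which heuristic yields the shorter makespan depends on the workload, so results on this toy matrix say nothing about the comparison reported in the paper.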

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In this paper, an approach for profiling email-born phishing activities is proposed. Profiling phishing activities is useful in determining the activity of an individual or a particular group of phishers. By generating profiles, phishing activities can be well understood and observed. Typically, work in the area of phishing is aimed at detecting phishing emails, whereas we concentrate on profiling them. We formulate the profiling problem as a clustering problem, using the various features in the phishing emails as feature vectors. We then generate profiles based on the clustering predictions, which are further used to generate complete profiles of these emails. The performance of the clustering algorithms at the earlier stage is crucial for the effectiveness of this model. We carried out an experimental evaluation to determine the performance of many classification algorithms by incorporating a clustering approach in our model. Our proposed profiling email-born phishing algorithm (ProEP) demonstrates promising results with the RatioSize rules for selecting the optimal number of clusters.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Cloud computing is experiencing phenomenal growth, and many vendors now offer cloud services. In cloud computing, cloud providers cooperate to offer their computing resources as a utility and software as a service to customers. The demand for and the price of a cloud service should be negotiated between providers and users based on the Service Level Agreement (SLA). To help cloud providers achieve an agreeable price for their services and to maximize the benefits of both cloud providers and clients, this paper proposes a cloud pricing system consisting of a hierarchical system, an M/M/c queuing model and a pricing model. Simulation results verify the efficiency of our proposed system.
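
The M/M/c component can be illustrated with the standard Erlang C formula, which gives the probability that an arriving request must wait when c identical servers handle Poisson arrivals with exponential service times. The sketch below is a textbook calculation only; the abstract does not say how the queuing model feeds into the pricing model, and the function and parameter names are assumptions.

```python
import math


def erlang_c_wait_probability(c, arrival_rate, service_rate):
    """Probability that an arrival has to queue in a stable M/M/c system."""
    offered_load = arrival_rate / service_rate
    utilisation = offered_load / c
    if utilisation >= 1:
        raise ValueError("unstable queue: arrival_rate must be below c * service_rate")
    tail = offered_load**c / math.factorial(c) / (1 - utilisation)
    head = sum(offered_load**k / math.factorial(k) for k in range(c))
    return tail / (head + tail)


def mean_queue_wait(c, arrival_rate, service_rate):
    """Mean time spent waiting in the queue, excluding service time."""
    p_wait = erlang_c_wait_probability(c, arrival_rate, service_rate)
    return p_wait / (c * service_rate - arrival_rate)


# Hypothetical rates: 2 servers, 1 request and 1 completion per server per time unit.
p_wait = erlang_c_wait_probability(2, 1.0, 1.0)
wq = mean_queue_wait(2, 1.0, 1.0)
```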

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper explores effective multi-label classification methods for multi-semantic image and text categorization. We perform an experimental study of clustering based multi-label classification (CBMLC) for the target problem. The experimental evaluation identifies the impact of different clustering algorithms and base classifiers on the predictive performance and efficiency of CBMLC. In the experimental setting, three widely used clustering algorithms and six popular multi-label classification algorithms are evaluated on multi-label image and text datasets. A multi-label classification evaluation metric, the micro F1-measure, is used to report the predictive performance of the classifiers. The results reveal that clustering based multi-label learning algorithms are more effective than their non-clustering counterparts.
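
The micro F1-measure pools true positives, false positives and false negatives over all examples and labels before computing a single F1 score. A minimal sketch follows, assuming each example's labels are given as a Python set; the function name `micro_f1` and the sample labels are illustrative.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over paired lists of true and predicted label sets."""
    tp = fp = fn = 0
    for true_labels, pred_labels in zip(y_true, y_pred):
        tp += len(true_labels & pred_labels)  # labels correctly predicted
        fp += len(pred_labels - true_labels)  # labels predicted but not true
        fn += len(true_labels - pred_labels)  # true labels that were missed
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical annotations for three images.
y_true = [{"sky", "sea"}, {"car"}, {"tree", "sky"}]
y_pred = [{"sky"}, {"car", "road"}, {"tree", "sky"}]
score = micro_f1(y_true, y_pred)
```

Here the pooled counts are 4 true positives, 1 false positive and 1 false negative, giving a micro F1 of 0.8.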

Relevance:

20.00% 20.00%

Publisher:

Abstract:

A hierarchical intrusion detection model is proposed to detect both anomaly and misuse attacks. To further speed up training and testing, a PCA-based feature extraction algorithm is used to reduce the dimensionality of the data. A PCA-based algorithm is also used to filter out normal data in the upper level. The experimental results show that PCA can reduce noise in the original data set and that the PCA-based algorithm can reach the desired performance.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Recent years have seen extensive work on statistics-based network traffic classification using machine learning (ML) techniques. In the particular scenario of learning from unlabeled traffic data, some classic unsupervised clustering algorithms (e.g. K-Means and EM) have been applied, but the reported results are unsatisfactory, with low accuracy. This paper presents a novel approach for the task, which performs clustering based on Random Forest (RF) proximities instead of Euclidean distances. The approach consists of two steps. In the first step, we derive a proximity measure for each pair of data points by performing an RF classification on the original data and a set of synthetic data. In the second step, we perform K-Medoids clustering to partition the data points into K groups based on the proximity matrix. Evaluations have been conducted on real-world Internet traffic traces, and the experimental results indicate that the proposed approach is more accurate than the previous methods.
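
The second step can be sketched as K-Medoids over a precomputed proximity matrix. The code below assumes the RF proximities are already available (the matrix shown is hypothetical), converts them to dissimilarities as 1 minus proximity, and runs a simple alternating assign-and-update variant of K-Medoids with a naive initialisation. The paper may use a different K-Medoids variant; the function name `k_medoids` and all values are illustrative.

```python
def k_medoids(dist, k, max_iter=100):
    """Alternating K-Medoids on a square dissimilarity matrix.

    Returns (medoids, labels), where labels[i] indexes into medoids.
    """
    n = len(dist)
    medoids = list(range(k))  # naive initialisation: the first k points
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: pick the member with the smallest total dissimilarity.
        new_medoids = []
        for m, members in clusters.items():
            if not members:
                new_medoids.append(m)  # keep the old medoid for an empty cluster
                continue
            new_medoids.append(
                min(members, key=lambda c: sum(dist[c][j] for j in members))
            )
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    labels = [
        min(range(len(medoids)), key=lambda idx: dist[i][medoids[idx]])
        for i in range(n)
    ]
    return medoids, labels


# Hypothetical proximity matrix: points 0-1 are similar, points 2-4 are similar.
proximity = [
    [1.00, 0.90, 0.10, 0.05, 0.10],
    [0.90, 1.00, 0.20, 0.15, 0.20],
    [0.10, 0.20, 1.00, 0.90, 0.75],
    [0.05, 0.15, 0.90, 1.00, 0.85],
    [0.10, 0.20, 0.75, 0.85, 1.00],
]
dissimilarity = [[1.0 - p for p in row] for row in proximity]
medoids, labels = k_medoids(dissimilarity, k=2)
```

The alternating scheme can stall in a poor local optimum when the initial medoids are badly placed, so in practice several initialisations or a swap-based method such as PAM would be worth considering.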