153 results for Incremental Clustering


Relevance:

20.00%

Abstract:

In this paper, we present a document clustering framework that incorporates instance-level knowledge in the form of pairwise constraints and attribute-level knowledge in the form of keyphrases. First, we initialize the weights using metric learning with pairwise constraints. We then learn the two kinds of knowledge simultaneously by combining the distance-based and constraint-based approaches. Finally, we evaluate and select the clustering result according to the degree of user satisfaction. The experimental results demonstrate the effectiveness and potential of the proposed method.

Relevance:

20.00%

Abstract:

Much has been written and researched about transformational change and the exogenous events that result in radical institutional transformation. This paper examines institutions as building blocks of social order comprising power and politics and shared understanding to bring about change. Thelen and Mahoney (2010) go beyond a general model of change that describes the collapse of one set of institutional norms and its replacement by another. Their proposed model of change treats both exogenous and endogenous factors as sources of institutional change. They go on to state that the view of transformational change as the result of abrupt, wholesale breakdown needs to be rethought to include incremental, endogenous shifts in thinking that can often result in fundamental transformations. This paper considers these issues and proposes the Australian Higher Education sector as a unique sample in which to investigate this type of change.

Relevance:

20.00%

Abstract:

Network traffic classification is an essential component of network management and security systems. To address the limitations of traditional port-based and payload-based methods, recent studies have focused on alternative approaches. One promising direction is to apply machine learning techniques that classify traffic flows based on packet-level and flow-level statistics. In particular, previous papers have shown that clustering can achieve high accuracy and discover unknown application classes. In this work, we present a novel semi-supervised learning method using constrained clustering algorithms. The motivation is that, in the network domain, a lot of background information is available in addition to the data instances themselves. For example, we might know that flows ƒ1 and ƒ2 use the same application protocol because they visit the same host address at the same port simultaneously. In this case, ƒ1 and ƒ2 should ideally be grouped into the same cluster. We therefore describe these correlations in the form of pairwise must-link constraints and incorporate them into the clustering process. We have applied three constrained variants of the K-Means algorithm, which perform hard or soft constraint satisfaction and metric learning from constraints. A number of real-world traffic traces have been used to show the availability of constraints and to test the proposed approach. The experimental results indicate that incorporating constraints in the course of clustering can significantly improve the overall accuracy and cluster purity.
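To make the must-link idea concrete, here is a minimal sketch of a K-Means variant with hard must-link satisfaction. It is an illustration under stated assumptions, not necessarily any of the three variants the authors evaluated: the function names (`must_link_groups`, `constrained_kmeans`), the fixed initial centroids, and the toy one-feature flows are all hypothetical. Must-link pairs are first merged into groups by transitive closure, and each group is then assigned to a centroid as a single unit.

```python
import math
from collections import defaultdict


def must_link_groups(n, must_link):
    """Merge must-link pairs into groups (transitive closure) via union-find."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in must_link:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    return list(groups.values())


def mean(points):
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def constrained_kmeans(points, must_link, init_centroids, iters=20):
    """K-Means in which every must-link group is assigned as one unit."""
    groups = must_link_groups(len(points), must_link)
    centroids = [list(c) for c in init_centroids]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: the centroid nearest to the group mean minimises
        # the summed squared distance of the whole group.
        for group in groups:
            g_mean = mean([points[i] for i in group])
            best = min(range(len(centroids)),
                       key=lambda k: math.dist(g_mean, centroids[k]))
            for i in group:
                labels[i] = best
        # Update step: recompute each non-empty cluster's centroid.
        for k in range(len(centroids)):
            members = [points[i] for i, lab in enumerate(labels) if lab == k]
            if members:
                centroids[k] = mean(members)
    return labels


# Hypothetical flows described by a single statistic each.
flows = [(0.0,), (1.0,), (6.0,), (10.0,), (11.0,)]
print(constrained_kmeans(flows, [], [(0.0,), (10.0,)]))
print(constrained_kmeans(flows, [(1, 2)], [(0.0,), (10.0,)]))
```

In this toy data the third flow sits closer to the second centroid, so plain K-Means places it there. Once it is must-linked to the second flow, the pair moves together into the first cluster.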

Relevance:

20.00%

Abstract:

In this paper, we discuss combining incremental learning and incremental recognition to classify patterns consisting of multiple objects, each represented by multiple spatio-temporal features. Importantly, the technique allows for ambiguity in the positions of the start and finish of the pattern. This involves a progressive classification that considers the data at each time instance in the query and thus provides a probable answer before all the query information becomes available. We present two methods that combine incremental learning and incremental recognition: a time instance method and an overall best match method.

Relevance:

20.00%

Abstract:

This paper examines the recovery of user context in indoor environments with existing wireless infrastructures to enable assistive systems. We present a novel approach to the extraction of user context, casting the problem of context recovery as an unsupervised clustering problem. A well-known density-based clustering technique, DBSCAN, is adapted to recover user context, which includes the user's motion state and the significant places the user visits, from WiFi observations consisting of access point ID and signal strength. Furthermore, user rhythms, or sequences of places the user visits periodically, are derived from these low-level contexts by employing a state-of-the-art probabilistic clustering technique, Latent Dirichlet Allocation (LDA), to enable a variety of application services. Experimental results with real data are presented to validate the proposed unsupervised learning approach and demonstrate its applicability.
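As a rough illustration of the density-based step, the sketch below implements textbook DBSCAN, not the authors' adapted version. It assumes each WiFi observation has already been turned into a fixed-length vector of signal strengths, one entry per access point. That encoding, the `eps` and `min_pts` values, and the sample readings are all hypothetical.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
    return labels


# Hypothetical signal strengths (dBm) from two access points.
observations = [
    (-40, -70), (-41, -71), (-39, -69), (-42, -70),  # near the first AP
    (-80, -45), (-79, -44), (-81, -46),              # near the second AP
    (-60, -60),                                      # isolated reading
]
print(dbscan(observations, eps=3.0, min_pts=3))
```

Dense groups of similar readings come out as candidate places, while the isolated reading is labelled as noise, which is one reason a density-based method suits data where the number of places is not known in advance.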

Relevance:

20.00%

Abstract:

We present an agent-oriented approach to the meeting scheduling problem and propose an incremental negotiation scheme that makes use of the hierarchical structure of an individual agent's working knowledge. First, we formalise the meeting scheduling problem in a multi-agent context, then elaborate on the design of a common architecture for all agents in the system. As a result, each agent becomes a modularised computing unit yet possesses high autonomy and a robust interface with other agents. The system preserves the meeting participants' privacy, since no agent has a dominant role and agents can communicate at an abstract level in their hierarchical structures.

Relevance:

20.00%

Abstract:

This article presents experimental results on a new application of a novel clustering technique recently introduced by the authors. Our aim is to facilitate the application of robust and stable consensus functions in information security, where it is often necessary to process large data sets and monitor outcomes in real time, as is required, for example, for intrusion detection. Here we concentrate on the particular case of profiling phishing websites. First, we apply several independent clustering algorithms to a randomized sample of data to obtain independent initial clusterings. The silhouette index is used to determine the number of clusters. Second, we use a consensus function to combine these independent clusterings into one consensus clustering. Feature ranking is used to select a subset of features for the consensus function. Third, we train fast supervised classification algorithms on the resulting consensus clustering to enable them to process the whole large data set as well as new data. The precision and recall of the classifiers at the final stage of this scheme are critical for the effectiveness of the whole procedure. We investigated various combinations of three consensus functions, Cluster-Based Graph Formulation (CBGF), Hybrid Bipartite Graph Formulation (HBGF), and Instance-Based Graph Formulation (IBGF), and a variety of supervised classification algorithms. The best precision and recall were obtained by combining the HBGF consensus function with the SMO classifier using the polynomial kernel.
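The silhouette index mentioned in the first step can be sketched as follows. This is the standard definition with Euclidean distance, shown only to illustrate how competing clusterings might be scored. The function name and the toy data are hypothetical, and the abstract does not say which distance the authors used.

```python
import math


def silhouette_index(points, labels):
    """Mean silhouette width over all points (singleton clusters score 0)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette index needs at least two clusters")

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # s(i) = 0 by convention for a singleton cluster
        # a: mean distance to the other members of p's own cluster
        # (p's zero distance to itself is in the sum but not the count).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_index(points, [0, 0, 1, 1])  # compact, well separated
bad = silhouette_index(points, [0, 1, 0, 1])   # clusters cut across the gap
print(good > bad)
```

To choose the number of clusters, one would run a clustering algorithm for several candidate values and keep the value whose clustering gives the highest index.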

Relevance:

20.00%

Abstract:

Recently, the Two-Dimensional Principal Component Analysis (2DPCA) model was proposed and shown to be an efficient approach for face recognition. In this paper, we investigate incremental 2DPCA and develop a new constructive method for incrementally adding observations to an existing eigen-space model. An explicit formula for incremental learning is derived. To illustrate the effectiveness of the proposed approach, we performed some typical experiments, which show that the face recognition process needs to keep only the eigen-space of previous images and can discard the raw images. Furthermore, the proposed incremental approach is faster than the batch method (2DPCA), and its recognition rate and reconstruction accuracy are as good as those obtained by the batch method.
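The abstract does not reproduce the authors' explicit update formula, so the sketch below shows a simpler, related idea instead: updating the 2DPCA image scatter matrix one image at a time with the standard running-mean identity, so that raw images need not be stored. The class and function names are hypothetical. Unlike the method described above, this version keeps the mean image and scatter matrix rather than the eigen-space itself; the projection axes would still be obtained afterwards from the leading eigenvectors of the scatter matrix.

```python
class IncrementalImageScatter:
    """Running mean image and image scatter matrix for 2DPCA-style models."""

    def __init__(self, height, width):
        self.h, self.w = height, width
        self.n = 0
        self.mean = [[0.0] * width for _ in range(height)]
        self.scatter = [[0.0] * width for _ in range(width)]  # width x width

    def add(self, image):
        # Deviation of the new image from the mean of the images seen so far.
        d = [[image[r][c] - self.mean[r][c] for c in range(self.w)]
             for r in range(self.h)]
        # Scatter update: S <- S + n / (n + 1) * d^T d
        weight = self.n / (self.n + 1)
        for i in range(self.w):
            for j in range(self.w):
                dtd = sum(d[r][i] * d[r][j] for r in range(self.h))
                self.scatter[i][j] += weight * dtd
        # Mean update: m <- m + d / (n + 1)
        self.n += 1
        for r in range(self.h):
            for c in range(self.w):
                self.mean[r][c] += d[r][c] / self.n


def batch_scatter(images):
    """Reference computation from all raw images, used only as a check."""
    n, h, w = len(images), len(images[0]), len(images[0][0])
    mean = [[sum(img[r][c] for img in images) / n for c in range(w)]
            for r in range(h)]
    scatter = [[0.0] * w for _ in range(w)]
    for img in images:
        for i in range(w):
            for j in range(w):
                scatter[i][j] += sum(
                    (img[r][i] - mean[r][i]) * (img[r][j] - mean[r][j])
                    for r in range(h)
                )
    return scatter


# Tiny hypothetical 2x2 "images".
images = [[[1, 2], [3, 4]], [[2, 0], [1, 5]], [[0, 1], [4, 2]]]
model = IncrementalImageScatter(2, 2)
for image in images:
    model.add(image)
print(model.scatter)
print(batch_scatter(images))
```

The two printed matrices should agree up to floating-point rounding, which is the property that lets an incremental scheme match the batch result without revisiting earlier images.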