66 results for k-Means algorithm

in Deakin Research Online - Australia


Relevance:

100.00%

Abstract:

To alleviate traffic congestion and reduce the complexity of traffic control and management, it is necessary to divide traffic into sub-areas in a way that is effective for traffic planning. Some researchers have applied the K-Means algorithm to divide traffic sub-areas based on taxi trajectories. However, traditional K-Means algorithms have difficulty processing large-scale Global Positioning System (GPS) trajectories of taxicabs because of restrictions on memory, I/O, and computing performance. This paper proposes a two-stage Parallel Traffic Sub-Areas Division (PTSD) method based on the Parallel K-Means (PKM) algorithm. In the first stage, we develop a process to cluster traffic sub-areas based on the PKM algorithm. In the second stage, we identify the boundaries of the traffic sub-areas on the basis of the clustering result. Using this method, we divide the traffic sub-areas of Beijing based on real-world GPS trajectories of taxicabs. The experiment and discussion show that the method is effective in dividing traffic sub-areas.

Relevance:

100.00%

Abstract:

The k-means algorithm is a partitional clustering method. Over 60 years old, it has been successfully used for a variety of problems. The popularity of k-means is in large part a consequence of its simplicity and efficiency. In this paper we are inspired by these appealing properties of k-means in the development of a clustering algorithm which accepts the notion of "positively" and "negatively" labelled data. The goal is to discover the cluster structure of both positive and negative data in a manner which allows for the discrimination between the two sets. The usefulness of this idea is demonstrated practically on the problem of face recognition, where the task of learning the scope of a person's appearance should be done in a manner which allows this face to be differentiated from others.
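
For orientation, the plain k-means procedure that this abstract builds on can be sketched as below. This is a minimal illustration of the standard assign-and-update loop only, not the labelled-data variant the paper proposes; the function names, the random initialisation, and the handling of empty clusters are assumptions made for the sketch.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Standard k-means on a list of tuples (assumes iterations >= 1)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct points drawn at random.
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_centroids.append(mean)
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # no centroid moved, so the assignment is stable
        centroids = new_centroids
    return centroids, labels
```

Each iteration costs roughly one distance computation per point per centroid, which is the simplicity and efficiency the abstract refers to.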

Relevance:

100.00%

Abstract:

This paper describes a methodology for identifying moving obstacles by obtaining a reliable, sparse optical flow from image sequences. Given a sequence of images, we can detect two types of on-road vehicles: vehicles traveling in the opposite direction and vehicles traveling in the same direction. For both types, distinct feature points can be detected by the Shi and Tomasi corner detector algorithm. The pyramidal Lucas-Kanade method for optical flow calculation is then used to match the sparse feature set of one frame to the consecutive frame. By applying k-means clustering to a four-component feature vector, consisting of the coordinates of the feature point and the two components of the optical flow, we can easily calculate the centroids of the clusters, and the objects can be easily tracked. Vehicles traveling in the opposite direction produce a diverging vector field, while vehicles traveling in the same direction produce a converging vector field.

Relevance:

100.00%

Abstract:

Privacy-preserving data mining aims to keep data safe, yet useful. But algorithms providing strong guarantees often end up with low utility. We propose a novel privacy-preserving framework that thwarts an adversary from inferring an unknown data point by ensuring that the estimation error is almost invariant to the inclusion or exclusion of the data point. By focusing directly on the estimation error of the data point, our framework is able to significantly lower the perturbation required. We use this framework to propose a new privacy-aware K-means clustering algorithm. Using both synthetic and real datasets, we demonstrate that the utility of this algorithm is almost equal to that of the unperturbed K-means and, at strict privacy levels, almost twice as good as that of the differential privacy counterpart.

Relevance:

100.00%

Abstract:

Traffic subarea division is vital for traffic system management and traffic network analysis in intelligent transportation systems (ITSs). Since existing methods may not be suitable for big traffic data processing, this paper presents a MapReduce-based Parallel Three-Phase K-Means (Par3PKM) algorithm for solving the traffic subarea division problem on the widely adopted Hadoop distributed computing platform. Specifically, we first modify the distance metric and initialization strategy of K-Means and then employ the MapReduce paradigm to redesign the optimized K-Means algorithm for parallel clustering of large-scale taxi trajectories. Moreover, we propose a boundary identification method to connect the borders of the clustering results for each cluster. Finally, we divide the traffic subareas of Beijing based on real-world trajectory data sets generated by 12,000 taxis over a period of one month using the proposed approach. Experimental evaluation results indicate that, compared with K-Means, Par2PK-Means, and ParCLARA, Par3PKM achieves higher efficiency, greater accuracy, and better scalability, and can effectively divide traffic subareas with big taxi trajectory data.
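
The general idea of expressing one K-Means iteration as a map phase and a reduce phase can be sketched as follows. This is only an illustration of that generic pattern in plain Python: it does not use Hadoop, it keeps the ordinary Euclidean distance, and it is not the Par3PKM algorithm itself. The function names and the list-of-chunks input are assumptions made for the sketch.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_phase(chunk, centroids):
    """For one data chunk, return {cluster index: (coordinate sums, count)}."""
    partial = {}
    for p in chunk:
        j = min(range(len(centroids)),
                key=lambda i: squared_distance(p, centroids[i]))
        sums, count = partial.get(j, ([0.0] * len(p), 0))
        partial[j] = ([s + x for s, x in zip(sums, p)], count + 1)
    return partial


def reduce_phase(partials, centroids):
    """Merge the per-chunk partial sums and return the updated centroids."""
    totals = {}
    for partial in partials:
        for j, (sums, count) in partial.items():
            acc_sums, acc_count = totals.get(j, ([0.0] * len(sums), 0))
            totals[j] = ([a + s for a, s in zip(acc_sums, sums)],
                         acc_count + count)
    updated = []
    for j in range(len(centroids)):
        if j in totals:
            sums, count = totals[j]
            updated.append(tuple(s / count for s in sums))
        else:
            # Assumption: a cluster that received no points is left unchanged.
            updated.append(centroids[j])
    return updated


def parallel_kmeans_step(chunks, centroids):
    """One iteration; each map_phase call could run on a separate worker."""
    return reduce_phase([map_phase(c, centroids) for c in chunks], centroids)
```

Only the per-cluster sums and counts leave each worker, so the data exchanged per iteration depends on the number of clusters rather than on the number of trajectory points.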

Relevance:

100.00%

Abstract:

Clustering Web documents efficiently yet accurately is of great interest to Web users and is a key component of the search accuracy of a Web search engine. To achieve this, this paper introduces a new approach for clustering Web documents, called the maximal frequent itemset (MFI) approach. Iterative clustering algorithms, such as K-means and expectation-maximization (EM), are sensitive to their initial conditions. The MFI approach first precisely locates the center points of high-density clusters. These center points are then used as initial points for the K-means algorithm. Our experimental results on three Web document sets show that the MFI approach outperforms the other methods we compared in most cases, particularly when the Web document sets contain a large number of categories.

Relevance:

100.00%

Abstract:

Network traffic classification is an essential component of network management and security systems. To address the limitations of traditional port-based and payload-based methods, recent studies have focused on alternative approaches. One promising direction is applying machine learning techniques to classify traffic flows based on packet-level and flow-level statistics. In particular, previous papers have illustrated that clustering can achieve high accuracy and discover unknown application classes. In this work, we present a novel semi-supervised learning method using constrained clustering algorithms. The motivation is that in the network domain a lot of background information is available in addition to the data instances themselves. For example, we might know that flows ƒ1 and ƒ2 are using the same application protocol because they are visiting the same host address at the same port simultaneously. In this case, ƒ1 and ƒ2 should ideally be grouped into the same cluster. Therefore, we describe these correlations in the form of pair-wise must-link constraints and incorporate them into the clustering process. We have applied three constrained variants of the K-Means algorithm, which perform hard or soft constraint satisfaction and metric learning from constraints. A number of real-world traffic traces have been used to show the availability of constraints and to test the proposed approach. The experimental results indicate that by incorporating constraints in the course of clustering, the overall accuracy and cluster purity can be significantly improved.

Relevance:

100.00%

Abstract:

Due to the limitations of the traditional port-based and payload-based traffic classification approaches, the past decade has seen extensive work on utilizing machine learning techniques to classify network traffic based on packet-level and flow-level features. In particular, previous studies have shown that the unsupervised clustering approach is both accurate and capable of discovering previously unknown application classes. In this paper, we explore the utility of side information in the process of traffic clustering. Specifically, we focus on the flow correlation information that can be efficiently extracted from packet headers and expressed as instance-level constraints, which indicate that particular sets of flows are using the same application and thus should be put into the same cluster. To incorporate the constraints, we propose a modified constrained K-Means algorithm. A variety of real-world traffic traces are used to show that the constraints are widely available. The experimental results indicate that the constrained approach not only improves the quality of the resulting clusters, but also speeds up the convergence of the clustering process.
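
One simple way to honour such same-cluster constraints during the assignment step is sketched below. The abstract does not give the details of the authors' modified algorithm, so this is an assumed, generic hard-constraint scheme: flows joined by must-link pairs are merged into sets with a union-find structure, and each set is assigned as a whole to the centroid with the lowest total squared distance. All names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def must_link_groups(n, must_link):
    """Group indices 0..n-1 into sets connected by must-link pairs."""
    parent = list(range(n))
    for a, b in must_link:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a
    groups = {}
    for i in range(n):
        groups.setdefault(find(parent, i), []).append(i)
    return list(groups.values())


def constrained_assign(points, centroids, must_link):
    """Assign every must-link set, as a whole, to its cheapest centroid."""
    labels = [None] * len(points)
    for group in must_link_groups(len(points), must_link):
        best = min(
            range(len(centroids)),
            key=lambda j: sum(squared_distance(points[i], centroids[j])
                              for i in group),
        )
        for i in group:
            labels[i] = best
    return labels
```

With points `[(0.0, 0.0), (4.0, 0.0), (10.0, 0.0)]` and centroids `[(0.0, 0.0), (10.0, 0.0)]`, the unconstrained assignment places the middle point in the first cluster, while the constraint `(1, 2)` moves it into the second cluster together with its linked flow.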

Relevance:

100.00%

Abstract:

The growing self-organizing map (GSOM) has been introduced as an improvement to the self-organizing map (SOM) algorithm in clustering and knowledge discovery. Unlike the traditional SOM, the GSOM has a dynamic structure which allows nodes to grow, reflecting the knowledge discovered from the input data as learning progresses. The spread factor (SF) parameter in the GSOM can be used to control the spread of the map, thus giving an analyst the flexibility to examine the clusters at different granularities. Although the GSOM has been applied in various areas and has been proven effective in knowledge discovery tasks, no comprehensive study has been done on the effect of the spread factor value on cluster formation and separation. Therefore, the aim of this paper is to investigate the effect of the spread factor value on cluster separation in the GSOM. We used a simple k-means algorithm as a method to identify clusters in the GSOM. Using the Davies–Bouldin index, clusters formed with different values of the spread factor are obtained and the resulting clusters are analyzed. In this work, we show that clusters can be more separated when the spread factor value is increased. Hierarchical clusters can then be constructed by mapping the GSOM clusters at different spread factor values.
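
The Davies–Bouldin index mentioned in this abstract can be computed as in the sketch below, which follows the usual textbook definition (mean distance to the centroid as within-cluster scatter, Euclidean distance between centroids). How the paper applies it to GSOM nodes is not described here, so the flat list of points and labels is an assumption. The code needs Python 3.8 or later for `math.dist`, at least two clusters, and distinct centroids.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a labelled point set; lower is better."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    # Centroid of each cluster (coordinate-wise mean).
    centroids = {
        lab: tuple(sum(c) / len(members) for c in zip(*members))
        for lab, members in clusters.items()
    }
    # Scatter: mean Euclidean distance of the members to their centroid.
    scatter = {
        lab: sum(math.dist(p, centroids[lab]) for p in members) / len(members)
        for lab, members in clusters.items()
    }

    total = 0.0
    for i in clusters:
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)
```

A lower value indicates compact clusters that lie far apart, so comparing the index across labelings is one way to compare their separation.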

Relevance:

100.00%

Abstract:

Courvisanos J., Jain A. and Mardaneh K. Economic resilience of regions under crises: a study of the Australian economy, Regional Studies. Identifying patterns of economic resilience in regions by industry categories is the focus of this paper. Patterns emerge from adaptive capacity in four distinct functional groups of local government regions in Australia, in respect of their resilience from shocks on specific industries. A model of regional adaptive cycles around four sequential phases – reorganization, exploitation, conservation and release – is adopted as the framework for recognizing such patterns. A data-mining method utilizes a k-means algorithm to evaluate the impact of two major shocks – a 13-year drought and the Global Financial Crisis – on four functional groups of regions, using census data from 2001, 2006 and 2011.

Relevance:

90.00%

Abstract:

We propose a new technique to perform unsupervised data classification (clustering) based on a density-induced metric and non-smooth optimization. Our goal is to automatically recognize multidimensional clusters of non-convex shape. We present a modification of the fuzzy c-means algorithm, which uses the data-induced metric, defined with the help of Delaunay triangulation. We detail the computation of distances in such a metric using graph algorithms. To find optimal positions of cluster prototypes we employ the discrete gradient method of non-smooth optimization. The new clustering method is capable of identifying non-convex, overlapping d-dimensional clusters.


Relevance:

90.00%

Abstract:

The thickness of the retinal nerve fiber layer (RNFL) has become a diagnostic measure for glaucoma assessment. To measure this thickness, accurate segmentation of the RNFL in optical coherence tomography (OCT) images is essential. Identifying a suitable segmentation algorithm will help improve the accuracy of RNFL thickness measurement. This paper investigates the performance of six algorithms in the segmentation of the RNFL in OCT images. The algorithms are normalised cuts, region growing, k-means clustering, active contour, and two level-set segmentation methods: the Piecewise Gaussian Method (PGM) and the Kernelized Method (KM). The performance of the six algorithms is determined through a set of experiments on OCT retinal images. The measured segmentation precision-recall results of the six algorithms are compared and discussed.

Relevance:

90.00%

Abstract:

Stock price forecasting has long received special attention from investors and financial institutions. As stock prices change over time and are increasingly uncertain in modern financial markets, forecasting them has become more important than ever before. A hybrid approach consisting of two components, a neural network and a fuzzy logic system, is proposed in this paper for stock price prediction. The first component of the hybrid, a feedforward neural network (FFNN), is used to select inputs that are highly relevant to the dependent variables. An interval type-2 fuzzy logic system (IT2 FLS) is employed as the second component of the hybrid forecasting method. The IT2 FLS's parameters are initialized using the k-means clustering method and adjusted by a genetic algorithm. Experimental results demonstrate the efficiency of the FFNN input selection approach, as it reduces the complexity and increases the accuracy of the forecasting models. In addition, the IT2 FLS outperforms the widely used type-1 FLS and FFNN models in stock price forecasting. The combination of the FFNN and the IT2 FLS produces higher forecasting accuracy than employing the IT2 FLS alone without the FFNN input selection.

Relevance:

90.00%

Abstract:

Statistics-based Internet traffic classification using machine learning techniques has attracted extensive research interest lately, because of the increasing ineffectiveness of traditional port-based and payload-based approaches. In particular, unsupervised learning, that is, traffic clustering, is very important in real-life applications, where labeled training data are difficult to obtain and new patterns keep emerging. Although previous studies have applied some classic clustering algorithms such as K-Means and EM for the task, the quality of resultant traffic clusters was far from satisfactory. In order to improve the accuracy of traffic clustering, we propose a constrained clustering scheme that makes decisions with consideration of some background information in addition to the observed traffic statistics. Specifically, we make use of equivalence set constraints indicating that particular sets of flows are using the same application layer protocols, which can be efficiently inferred from packet headers according to the background knowledge of TCP/IP networking. We model the observed data and constraints using Gaussian mixture density and adapt an approximate algorithm for the maximum likelihood estimation of model parameters. Moreover, we study the effects of unsupervised feature discretization on traffic clustering by using a fundamental binning method. A number of real-world Internet traffic traces have been used in our evaluation, and the results show that the proposed approach not only improves the quality of traffic clusters in terms of overall accuracy and per-class metrics, but also speeds up the convergence.