121 results for clustering


Relevance: 20.00%

Abstract:

Utility companies provide electricity to a large number of consumers. These companies need an accurate forecast of next-day electricity demand, because any forecast error results in either reliability issues or increased costs. With the widespread roll-out of smart meters, a large amount of high-resolution consumption data that was not available in the past is now accessible. This new data can be used to improve the load forecast and, as a result, to increase the reliability and decrease the expenses of electricity providers. In this paper, a number of methods for improving load forecasts using smart meter data are discussed. In these methods, consumers are first divided into a number of clusters. A neural network is then trained for each cluster, and the forecasts of these networks are added together to form the prediction for the aggregated load. It is demonstrated that clustering increases the forecast accuracy significantly. The criteria used for grouping consumers play an important role in this process. In this work, three different feature selection methods for clustering consumers are explained, and the effect of feature extraction methods on forecast error is investigated.
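The cluster-then-forecast pipeline described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the per-cluster neural network is swapped for a trivial mean forecaster, plain k-means stands in for whatever clustering the authors used, and the names `kmeans`, `forecast_cluster`, and `aggregated_forecast` are hypothetical.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def forecast_cluster(history):
    """Stand-in for the per-cluster neural network (an assumption):
    predict the next day's load as the mean of the past daily loads."""
    return sum(history) / len(history)


def aggregated_forecast(profiles, daily_loads, k):
    """profiles[i]: feature vector used to cluster consumer i.
    daily_loads[i][d]: load of consumer i on past day d."""
    labels = kmeans(profiles, k)
    n_days = len(daily_loads[0])
    total = 0.0
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        # Aggregate the cluster's load per day, then forecast that series.
        history = [sum(daily_loads[i][d] for i in members) for d in range(n_days)]
        total += forecast_cluster(history)
    return labels, total
```

With a linear stand-in such as the mean, the summed forecast does not depend on the grouping; any benefit from clustering would only appear once `forecast_cluster` is replaced by a nonlinear model trained per cluster.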

Relevance: 20.00%

Abstract:

Clustering is applied in wireless sensor networks to increase energy efficiency. Clustering methods in wireless sensor networks differ from those in traditional data mining systems. This paper proposes a novel clustering algorithm, named MSTME, based on the Minimal Spanning Tree (MST) and the Maximum Energy resource on sensors. The specific constraints of clustering in wireless sensor networks and several evaluation metrics are also given. Evaluated by these metrics, MSTME performs better than the established clustering methods Low Energy Adaptive Clustering Hierarchy (LEACH) and Base Station Controlled Dynamic Clustering Protocol (BCDCP). Simulation results show that MSTME increases energy efficiency and network lifetime compared with LEACH and BCDCP in two-hop and multi-hop networks, respectively. © World Scientific Publishing Company.

Relevance: 20.00%

Abstract:

A novel Cluster Head (CH) selection algorithm, named MSTME, based on both the Minimal Spanning Tree and the Maximum Energy resource on sensors, is provided to prolong the lifetime of wireless sensor networks. MSTME can satisfy three principles of optimal CHs: to have the most energy resource among the sensors in their local clusters, to group approximately the same number of nearby sensors into each cluster, and to be evenly distributed across the network in terms of location. Simulation shows that the network lifetime under MSTME exceeds that of its counterparts in two-hop and multi-hop wireless sensor networks.
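One way to picture how a spanning tree and residual energy could be combined is sketched below. The abstract does not give the MSTME procedure, so this is only an assumed illustration: it builds a Euclidean MST with Prim's algorithm, removes the k-1 longest edges to obtain k clusters, and picks the highest-energy sensor in each cluster as its head. The function names and the edge-cutting rule are hypothetical.

```python
import math


def prim_mst(points):
    """Return MST edges (dist, i, j) over the complete Euclidean graph."""
    n = len(points)
    in_tree = [False] * n
    best = [(math.inf, -1)] * n  # (cheapest distance to the tree, tree node)
    best[0] = (0.0, -1)
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i][0])
        in_tree[u] = True
        if best[u][1] >= 0:
            edges.append((best[u][0], best[u][1], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v][0]:
                    best[v] = (d, u)
    return edges


def choose_cluster_heads(points, energy, k):
    """Cut the k-1 longest MST edges, then pick the max-energy node
    of each resulting cluster as its cluster head (assumed rule)."""
    kept = sorted(prim_mst(points))[: len(points) - k]
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, i, j in kept:
        parent[find(i)] = find(j)
    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    # Map each chosen head to the members of its cluster.
    return {max(m, key=lambda i: energy[i]): m for m in clusters.values()}
```

This sketch addresses only the first principle (maximum energy within a cluster); balancing cluster sizes and spreading heads evenly would need additional logic that the abstract does not specify.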

Relevance: 20.00%

Abstract:

Cluster analysis has been identified as a core task in data mining. What constitutes a cluster, or a good clustering, may depend on the background of the researchers and on the application. This paper proposes two optimization criteria, abstract degree and fidelity, in the field of image abstraction. To satisfy the fidelity criterion, a novel clustering algorithm named Global Optimized Color-based DBSCAN Clustering (GOC-DBSCAN) is provided. A non-optimized version of GOC-DBSCAN based on local color information, called HSV-DBSCAN, is also given. Both are based on the HSV color space. The clusters of GOC-DBSCAN are analyzed to find the factors that affect both abstract degree and fidelity. Examples show that, in general, the greater the abstract degree, the lower the fidelity. They also show that GOC-DBSCAN outperforms HSV-DBSCAN when the two are evaluated by the two optimization criteria.

Relevance: 20.00%

Abstract:

In this paper, a novel approach is proposed to automatically generate both a watercolor painting and a pencil sketch drawing, or binary contour image, from a realistic photo by using DBSCAN color clustering based on the HSV color space. While the color clusters produced by the proposed methods help to create the watercolor painting, the noise pixels are useful for generating the pencil sketch drawing. Moreover, noise pixels are reassigned to color clusters by a novel algorithm to refine the contours in the watercolor painting. The main goal of this paper is to inspire the imagination of non-professional artists so that they can produce traditional-style paintings easily by adjusting only a few parameters. Another contribution of this paper is an easy method for producing the binary contour image, which is a by-product of mining image data with DBSCAN clustering. The binary image is therefore useful in resource-limited systems for reducing data while keeping enough image information. © 2007 IEEE.
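The split into color clusters and noise pixels can be illustrated with a small DBSCAN over HSV colors. This is a sketch under assumptions, not the authors' method: it clusters pixels by color only (ignoring position), treats hue as circular, and uses a brute-force neighbor search that is only practical for tiny inputs. `hsv_distance`, `dbscan`, and `to_hsv` are hypothetical names; `colorsys.rgb_to_hsv` is the standard-library conversion.

```python
import colorsys


def to_hsv(rgb_pixels):
    """Convert 8-bit RGB tuples to HSV tuples with components in [0, 1]."""
    return [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in rgb_pixels]


def hsv_distance(a, b):
    """Euclidean distance in HSV, with hue wrapped around the color circle."""
    dh = abs(a[0] - b[0])
    dh = min(dh, 1.0 - dh)
    return (dh ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2) ** 0.5


def dbscan(items, eps, min_pts, dist):
    """Return one label per item: 0, 1, ... for clusters, -1 for noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(items)

    def neighbors(i):
        return [j for j in range(len(items)) if dist(items[i], items[j]) <= eps]

    cluster = 0
    for i in range(len(items)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels
```

In this picture, pixels labeled with a cluster index would feed the watercolor rendering, while the mask `[lab == -1 for lab in labels]` marks the noise pixels that the abstract associates with the binary contour image.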

Relevance: 20.00%

Abstract:

Recognition of multiple moving objects is an important task for extracting the knowledge that users care about and sending it to the base station in wireless video-based sensor networks. However, video-based sensor nodes, which have constrained resources and continuously produce huge video streams, make it challenging to segment multiple moving objects from the video stream online. Traditional efficient clustering algorithms such as DBSCAN cannot run time-efficiently, and may even fail to run in the limited memory of sensor nodes, because the number of pixel points is too large. This paper provides a novel algorithm named Inter-Frame Change Directing Online clustering (IFCDO clustering) for segmenting multiple moving objects from a video stream on sensor nodes. IFCDO clustering only needs to group the pixels that differ between frames, so it reduces both space and time complexity while producing clusters as robust as those of DBSCAN. Experimental results show that IFCDO clustering outperforms DBSCAN in terms of both time and space efficiency. © 2008 IEEE.
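The idea of clustering only the pixels that change between frames can be sketched as below. The abstract does not describe IFCDO's internals, so this is an assumed simplification: frames are grayscale lists of lists, changed pixels are found by thresholded differencing, and they are grouped by 8-connectivity flood fill rather than by the authors' procedure. All names are hypothetical.

```python
def changed_pixels(prev, curr, threshold):
    """Coordinates whose gray level differs by more than `threshold`."""
    return {
        (r, c)
        for r, row in enumerate(curr)
        for c, value in enumerate(row)
        if abs(value - prev[r][c]) > threshold
    }


def group_changed(pixels):
    """Group changed pixels into 8-connected components (candidate objects)."""
    remaining = set(pixels)
    groups = []
    while remaining:
        start = remaining.pop()
        group, stack = {start}, [start]
        while stack:
            r, c = stack.pop()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nb = (r + dr, c + dc)
                    if nb in remaining:
                        remaining.remove(nb)
                        group.add(nb)
                        stack.append(nb)
        groups.append(group)
    return groups
```

Because only the changed pixels are stored and visited, memory and time scale with the amount of motion rather than with the frame size, which is the kind of saving the abstract attributes to working on inter-frame differences.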

Relevance: 20.00%

Abstract:

Recent development of characterisation techniques and computer simulation has extended our ability to access atomic scale information regarding materials microstructure evolution. New results from such techniques have significantly progressed our knowledge about solute behaviour during the earliest stages of decomposition of the solid solution. This chapter updates current understanding about solute clustering and discusses the effect of solute clustering and micro-alloying on precipitate microstructure evolution in aluminium alloys. In addition, a brief review is given on the effect of severe plastic deformation on precipitate evolution in Al alloys.

Relevance: 20.00%

Abstract:

This paper reports a robustness comparison of clustering-based multi-label classification methods versus their non-clustering counterparts for multi-concept associated image and video annotations. In the experimental setting of this paper, we adopted six popular multi-label classification algorithms, two different base classifiers for problem-transformation-based multi-label classification, and three different clustering algorithms for pre-clustering the training data. We conducted an experimental evaluation on two multi-label benchmark datasets: scene image data and mediamill video data. We also employed two multi-label classification evaluation metrics, namely micro F1-measure and Hamming loss, to present the predictive performance of the classifications. The results reveal that different base classifiers and clustering methods contribute differently to the performance of the multi-label classifications. Overall, the pre-clustering methods improve the effectiveness of multi-label classification in certain experimental settings. This provides vital information to users when deciding which multi-label classification method to choose for multiple-concept associated image and video annotations.
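For reference, the two evaluation metrics named in this abstract can be computed as in the sketch below, using their standard definitions over binary label-indicator matrices (one row per sample, one column per label). The function names are hypothetical and the code is illustrative rather than taken from the paper.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over all samples and labels."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def hamming_loss(y_true, y_pred):
    """Fraction of label slots that are predicted incorrectly."""
    wrong = total = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            wrong += t != p
            total += 1
    return wrong / total
```

Higher micro F1 is better, while lower Hamming loss is better, so the two metrics move in opposite directions when a classifier improves.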

Relevance: 20.00%

Abstract:

Analysis of crowd behaviour in public places is an indispensable tool for video surveillance. Automated detection of anomalous crowd behaviour is a critical problem as the human population increases. Anomalous events may include a person loitering in a place for an unusual amount of time, people running and causing panic, or the size of a group of people growing over time. In this work, two types of feature coding are proposed to detect anomalous events and objects: spatial features and spatio-temporal features. The spatial features comprise contrast, correlation, energy and homogeneity, which are derived from the Gray Level Co-occurrence Matrix (GLCM). The spatio-temporal features include the time spent by an object at different locations in the scene. Hyperspherical clustering is employed to detect the anomalies. The spatial features revealed the anomalous frames through the contrast and homogeneity measures. The loitering behaviour of people was detected as anomalous objects using the spatio-temporal coding.
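The four GLCM-derived spatial features can be computed as in the following sketch. It assumes commonly used definitions (contrast as the sum of (i-j)^2 p, energy as the sum of p^2, homogeneity as the sum of p / (1 + |i-j|), and the usual normalized correlation) and a single non-symmetric pixel offset; the paper may use different offsets or variants. The function names are hypothetical.

```python
import math


def glcm(image, levels, dr=0, dc=1):
    """Normalized co-occurrence matrix for offset (dr, dc); the image
    holds integer gray levels in range(levels)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Contrast, correlation, energy and homogeneity of a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    energy = sum(v * v for _, _, v in cells)
    homogeneity = sum(v / (1 + abs(i - j)) for i, j, v in cells)
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    var_i = sum((i - mu_i) ** 2 * v for i, _, v in cells)
    var_j = sum((j - mu_j) ** 2 * v for _, j, v in cells)
    cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    # Correlation is undefined for a constant image; report NaN in that case.
    denom = math.sqrt(var_i * var_j)
    correlation = cov / denom if denom > 0 else float("nan")
    return {"contrast": contrast, "correlation": correlation,
            "energy": energy, "homogeneity": homogeneity}
```

Computing these features per frame gives a four-dimensional vector that could then be passed to a clustering step such as the hyperspherical clustering the abstract mentions.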

Relevance: 20.00%

Abstract:

Multilevel clustering problems, in which content and contextual information are jointly clustered, are ubiquitous in modern datasets. Existing work on this problem is limited to small datasets because of its reliance on the Gibbs sampler. We address the problem of scaling up multilevel clustering in a Bayesian nonparametric setting, extending the MC2 model proposed in (Nguyen et al., 2014). We ground our approach in structured mean-field and stochastic variational inference (SVI) and develop a tree-structured SVI algorithm that exploits the interplay between content and context modeling. Our new algorithm avoids the need to pass repeatedly through the corpus, as the Gibbs sampler does. More crucially, our method is immediately amenable to parallelization, facilitating a scalable distributed implementation on the Apache Spark platform. We conduct extensive experiments in a variety of domains including text, images, and real-world user application activities. Direct comparison with the Gibbs sampler demonstrates that our method is an order of magnitude faster without loss of model quality. Our Spark-based implementation gains another order-of-magnitude speedup and can scale to large real-world datasets containing millions of documents and groups.

Relevance: 20.00%

Abstract:

OBJECTIVE: To characterise clusters of individuals based on adherence to dietary recommendations and to determine whether changes in Healthy Eating Index (HEI) scores in response to a personalised nutrition (PN) intervention varied between clusters.

DESIGN: Food4Me study participants were clustered according to whether their baseline dietary intakes met European dietary recommendations. Changes in HEI scores between baseline and month 6 were compared between clusters and stratified by whether individuals received generalised or PN advice.

SETTING: Pan-European, Internet-based, 6-month randomised controlled trial.

SUBJECTS: Adults aged 18-79 years (n 1480).

RESULTS: Individuals in cluster 1 (C1) met all recommended intakes except for red meat, those in cluster 2 (C2) met two recommendations, and those in cluster 3 (C3) and cluster 4 (C4) met one recommendation each. C1 had higher intakes of white fish, beans and lentils and low-fat dairy products and lower percentage energy intake from SFA (P<0·05). C2 consumed less chips and pizza and fried foods than C3 and C4 (P<0·05). C1 were lighter, had lower BMI and waist circumference than C3 and were more physically active than C4 (P<0·05). More individuals in C4 were smokers and wanted to lose weight than in C1 (P<0·05). Individuals who received PN advice in C4 reported greater improvements in HEI compared with C3 and C1 (P<0·05).

CONCLUSIONS: The cluster where the fewest recommendations were met (C4) reported greater improvements in HEI following a 6-month trial of PN whereas there was no difference between clusters for those randomised to the Control, non-personalised dietary intervention.

Relevance: 20.00%

Abstract:

Anomaly detection, as a kind of intrusion detection, is good at detecting unknown or new attacks, and it has attracted much attention in recent years. In this paper, a new hierarchical anomaly intrusion detection model is proposed that combines fuzzy c-means (FCM) based on a genetic algorithm with an SVM. During intrusion detection, the membership function and the fuzzy interval are applied, extending the process from the previous hard classification to soft classification. A fuzzy error-correction sub-interval is then introduced: when the detection result for a data instance falls within this range, the instance is re-detected in order to improve the effectiveness of intrusion detection. Experimental results show that the proposed model can effectively detect the vast majority of network attack types, which provides a feasible way to address the false alarm rate and detection rate problems of anomaly intrusion detection models.
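The soft classification step can be illustrated with the standard fuzzy c-means membership formula, u_i = 1 / sum_k (d_i / d_k)^(2/(m-1)), where d_i is the distance to center i and m > 1 is the fuzzifier. The sketch below is not the paper's model: the genetic-algorithm optimization and the SVM stage are omitted, and the `needs_recheck` band with its 0.4 to 0.6 bounds is a purely hypothetical stand-in for the fuzzy error-correction sub-interval.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Standard FCM membership of `point` in each cluster center."""
    dists = [math.dist(point, c) for c in centers]
    if any(d == 0.0 for d in dists):
        # The point coincides with one or more centers: share membership
        # equally among those centers and give the others zero.
        hits = [d == 0.0 for d in dists]
        return [1.0 / sum(hits) if h else 0.0 for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


def needs_recheck(memberships, low=0.4, high=0.6):
    """Hypothetical ambiguity band: flag an instance for re-detection when
    its strongest membership is not decisive."""
    return low <= max(memberships) <= high
```

For example, a point at distance 1 from one center and 3 from another gets memberships 0.9 and 0.1 with m = 2, whereas a point midway between the two gets 0.5 each and would be flagged for a second look.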

Relevance: 20.00%

Abstract:

This paper proposes a novel application of Visual Assessment of Tendency (VAT)-based hierarchical clustering algorithms (VAT, iVAT, and clusiVAT) to trajectory analysis. We introduce a new clustering-based anomaly detection framework, named iVAT+ and clusiVAT+, and use it for trajectory anomaly detection. This approach partitions the VAT-generated Minimum Spanning Tree using an efficient thresholding scheme. The trajectories are classified as normal or anomalous based on the number of paths in the clusters. On synthetic datasets with fixed and variable numbers of clusters and anomalies, we achieve 98% classification accuracy. Our two-stage clusiVAT method is applied to 26,039 trajectories of vehicles and pedestrians from a parking lot scene in the real-life MIT trajectories dataset. The first stage clusters the trajectories while ignoring directionality. The second stage divides the clusters obtained from the first stage by considering trajectory direction. We show that our novel two-stage clusiVAT approach can produce natural and informative trajectory clusters on this real-life dataset while finding representative anomalies.
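The combination of VAT ordering, MST thresholding, and size-based anomaly flagging can be sketched as follows. This is an assumed simplification, not the iVAT+/clusiVAT+ procedure: the threshold and the minimum cluster size are hypothetical parameters supplied by the caller, the input is a precomputed dissimilarity matrix, and the search is a naive cubic-time loop. VAT ordering itself is the Prim-like reordering that starts from an endpoint of the largest dissimilarity.

```python
def vat_order(D):
    """VAT reordering of a symmetric dissimilarity matrix D.
    Returns the object order and the MST edge weight used to add
    each object after the first."""
    n = len(D)
    start = max(range(n), key=lambda i: max(D[i]))
    order, weights = [start], []
    remaining = set(range(n)) - {start}
    while remaining:
        # Closest unvisited object to anything already visited (Prim step).
        d, j = min((D[i][j], j) for i in order for j in remaining)
        order.append(j)
        weights.append(d)
        remaining.remove(j)
    return order, weights


def split_and_flag(D, threshold, min_size):
    """Cut the VAT/MST chain at edges longer than `threshold`; clusters
    with fewer than `min_size` members are flagged as anomalous."""
    order, weights = vat_order(D)
    clusters = [[order[0]]]
    for j, d in zip(order[1:], weights):
        if d > threshold:
            clusters.append([j])
        else:
            clusters[-1].append(j)
    normal = [c for c in clusters if len(c) >= min_size]
    anomalous = [c for c in clusters if len(c) < min_size]
    return normal, anomalous
```

For trajectories, `D` would hold pairwise trajectory distances; which distance to use is left open here because the abstract does not say.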

Relevance: 20.00%

Abstract:

Clustering of big data has received much attention recently. In this paper, we present a new clusiVAT algorithm and compare it with four other popular data clustering algorithms. Three of the four comparison methods are based on the well-known, classical batch k-means model. Specifically, we use k-means, single pass k-means, online k-means, and clustering using representatives (CURE) for numerical comparisons. clusiVAT is based on sampling the data, imaging the reordered distance matrix to estimate the number of clusters in the data visually, clustering the samples using a relative of single linkage (SL), and then noniteratively extending the labels to the rest of the dataset using the nearest prototype rule. Previous work has established that clusiVAT produces true SL clusters in compact-separated data. We have performed experiments to show that k-means and its modified algorithms suffer from initialization issues that cause many failures. On the other hand, clusiVAT needs no initialization, and almost always finds partitions that accurately match ground truth labels in labeled data. CURE also finds SL-type partitions but is much slower than the other four algorithms. In our experiments, clusiVAT proves to be the fastest and most accurate of the five algorithms; e.g., it recovers 97% of the ground truth labels in the real-world KDD-99 cup data (4 292 637 samples in 41 dimensions) in 76 s.
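The final, noniterative step of this description, extending labels from the clustered sample to the rest of the data with the nearest prototype rule, can be sketched in a few lines. The sketch assumes Euclidean distance and uses the labeled samples themselves as prototypes; `extend_labels` is a hypothetical name, and the sampling and single-linkage stages are not shown.

```python
import math


def extend_labels(prototypes, prototype_labels, points):
    """Give each point the label of its nearest prototype (one pass)."""
    labels = []
    for p in points:
        nearest = min(range(len(prototypes)),
                      key=lambda s: math.dist(p, prototypes[s]))
        labels.append(prototype_labels[nearest])
    return labels
```

Because each remaining point is visited once and compared only with the prototypes, the cost of this step grows linearly with the dataset size for a fixed sample.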

Relevance: 20.00%

Abstract:

The Dirichlet process mixture (DPM) model, a typical Bayesian nonparametric model, can infer the number of clusters automatically and therefore performs well in data clustering. This paper investigates the influence of pairwise constraints in the DPM model. Pairwise constraints, which come in two types, must-link (ML) and cannot-link (CL), indicate the relationship between two data points. We propose two models that incorporate pairwise constraints: the constrained DPM (C-DPM) and the constrained DPM with selected constraints (SC-DPM). In C-DPM, the concept of a chunklet is introduced: ML constraints are compiled into chunklets, and CL constraints exist between chunklets. We derive Gibbs sampling for the C-DPM based on chunklets. We further propose a principled approach for selecting the most useful constraints, which are then incorporated into the SC-DPM. We evaluate the proposed models on three real datasets: the 20 Newsgroups dataset, the NUS-WIDE image dataset, and a Facebook comments dataset that we collected ourselves. Our SC-DPM achieves superior performance in data clustering. In addition, it can potentially be used for short-text clustering.
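Compiling must-link constraints into chunklets amounts to taking their transitive closure, which a union-find structure handles directly; cannot-link constraints are then lifted from point pairs to chunklet pairs. The sketch below shows only this preprocessing, under the assumption that points are indexed 0..n-1; the Gibbs sampler and the constraint selection of SC-DPM are not shown, and the function names are hypothetical.

```python
def build_chunklets(n, must_links):
    """Merge points joined (directly or transitively) by ML constraints.
    Returns the chunklets and a map from point index to chunklet index."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in must_links:
        parent[find(a)] = find(b)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    chunklets = sorted(groups.values())  # ordered by smallest member
    chunk_of = {i: c for c, members in enumerate(chunklets) for i in members}
    return chunklets, chunk_of


def lift_cannot_links(chunk_of, cannot_links):
    """Express CL constraints between chunklets instead of points."""
    lifted = set()
    for a, b in cannot_links:
        ca, cb = chunk_of[a], chunk_of[b]
        if ca == cb:
            raise ValueError(f"cannot-link ({a}, {b}) contradicts must-links")
        lifted.add((min(ca, cb), max(ca, cb)))
    return lifted
```

A point with no must-link constraint becomes a singleton chunklet, so every point belongs to exactly one chunklet and a sampler can assign whole chunklets to clusters.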