153 results for Incremental Clustering


Relevance: 20.00%

Abstract:

In this paper we address the problem of learning Gaussian Mixture Models (GMMs) incrementally. Unlike previous approaches, which universally assume that new data comes in blocks representable by GMMs that are then merged with the current model estimate, our method works when novel data points arrive one by one, while requiring little additional memory. We keep only two GMMs in memory and no historical data. The current fit is updated under the assumption that the number of components is fixed; this number is increased (or reduced) when enough evidence for a new component is seen. This is deduced from the change from the oldest fit of the same complexity, termed the Historical GMM, the concept of which is central to our method. The performance of the proposed method is demonstrated qualitatively and quantitatively on several synthetic data sets and video sequences of faces acquired in realistic imaging conditions.
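The idea of updating a model one point at a time without keeping historical data can be illustrated for a single one-dimensional Gaussian. The sketch below uses Welford's online mean and variance update; it is not the update rule of the paper, it does not handle mixtures or the Historical GMM, and the class name `RunningGaussian` is hypothetical.

```python
class RunningGaussian:
    """Running estimate of one 1-D Gaussian (Welford's online update).

    Illustrative only: a single component, not the paper's GMM method.
    """

    def __init__(self):
        self.n = 0        # number of points seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        """Absorb one new data point; the point itself is not stored."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Maximum-likelihood (population) variance of the points seen."""
        return self.m2 / self.n if self.n else 0.0


g = RunningGaussian()
for point in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    g.update(point)
```

Only three numbers are held in memory regardless of how many points have arrived, which is the property the abstract emphasises.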

Relevance: 20.00%

Abstract:

Online blind source separation (BSS) has been proposed to overcome the high computational cost that limits the practical application of traditional batch BSS algorithms. However, existing online BSS methods are mainly used to separate independent or uncorrelated sources. Recently, nonnegative matrix factorization (NMF) has shown great potential for separating correlated sources, where constraints are often imposed to overcome the non-uniqueness of the factorization. In this paper, an incremental NMF with a volume constraint is derived and used to solve online BSS. The volume constraint on the mixing matrix enhances the identifiability of the sources, while the incremental learning mode reduces the computational cost. The proposed method takes advantage of a natural-gradient-based multiplicative update rule, and it performs especially well in the recovery of dependent sources. Simulations of BSS for dual-energy X-ray images, online encrypted speech signals, and highly correlated face images show the validity of the proposed method.

Relevance: 20.00%

Abstract:

In this paper, an approach for profiling email-borne phishing activities is proposed. Profiling phishing activities is useful in determining the activity of an individual or a particular group of phishers. By generating profiles, phishing activities can be better understood and observed. Typically, work in the area of phishing is aimed at detecting phishing emails, whereas we concentrate on profiling them. We formulate the profiling problem as a clustering problem, using the various features in the phishing emails as feature vectors. We then generate profiles based on the clustering predictions, which are further used to generate complete profiles of these emails. The performance of the clustering algorithms at the earlier stage is crucial for the effectiveness of this model. We carried out an experimental evaluation to determine the performance of many classification algorithms when a clustering approach is incorporated in our model. Our proposed profiling email-borne phishing algorithm (ProEP) demonstrates promising results with the RatioSize rules for selecting the optimal number of clusters.

Relevance: 20.00%

Abstract:

This paper explores effective multi-label classification methods for multi-semantic image and text categorization. We perform an experimental study of clustering-based multi-label classification (CBMLC) for the target problem. The experimental evaluation identifies the impact of different clustering algorithms and base classifiers on the predictive performance and efficiency of CBMLC. In the experimental setting, three widely used clustering algorithms and six popular multi-label classification algorithms are evaluated on multi-label image and text datasets. A multi-label classification evaluation metric, the micro F1-measure, is used to present the predictive performance of the classifiers. The results reveal that clustering-based multi-label learning algorithms are more effective than their non-clustering counterparts.
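The micro F1-measure mentioned above pools true positives, false positives, and false negatives over all examples and labels before computing F1. A minimal sketch follows, assuming each example's labels are given as a Python set; the function name `micro_f1` and the toy labels are illustrative and do not come from the paper.

```python
def micro_f1(true_labels, predicted_labels):
    """Micro-averaged F1 over parallel lists of label sets (one set per example)."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        tp += len(truth & pred)   # labels correctly predicted
        fp += len(pred - truth)   # labels predicted but not true
        fn += len(truth - pred)   # true labels that were missed
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when nothing is true or predicted
    return 2 * tp / denom if denom else 0.0


# Toy multi-label data (hypothetical image tags)
y_true = [{"sky", "sea"}, {"city"}, {"sea", "beach"}]
y_pred = [{"sky"}, {"city", "sea"}, {"sea", "beach"}]
score = micro_f1(y_true, y_pred)  # TP = 4, FP = 1, FN = 1
```

Because the counts are pooled before averaging, frequent labels influence the micro F1 more than rare ones.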

Relevance: 20.00%

Abstract:

Recent years have seen extensive work on statistics-based network traffic classification using machine learning (ML) techniques. In the particular scenario of learning from unlabeled traffic data, some classic unsupervised clustering algorithms (e.g. K-Means and EM) have been applied, but the reported accuracy is unsatisfactorily low. This paper presents a novel approach to the task, which performs clustering based on Random Forest (RF) proximities instead of Euclidean distances. The approach consists of two steps. In the first step, we derive a proximity measure for each pair of data points by performing an RF classification on the original data and a set of synthetic data. In the second step, we perform K-Medoids clustering to partition the data points into K groups based on the proximity matrix. Evaluations have been conducted on real-world Internet traffic traces, and the experimental results indicate that the proposed approach is more accurate than previous methods.
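The second step, K-Medoids on a proximity matrix, can be sketched as follows. The sketch assumes the proximities are already computed and lie in [0, 1], converts them to dissimilarities as 1 - proximity, and uses a simple alternating assign-and-update scheme with the first K points as initial medoids. These choices, the function names, and the toy matrix are assumptions for illustration; the abstract does not specify the K-Medoids variant used, and the RF proximity step is not shown.

```python
def assign(dissim, medoids):
    """Label each point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dissim[i][medoids[c]])
            for i in range(len(dissim))]


def k_medoids(dissim, k, max_iter=100):
    """Alternating K-Medoids on a precomputed dissimilarity matrix."""
    medoids = list(range(k))  # assumption: deterministic init with the first k points
    for _ in range(max_iter):
        labels = assign(dissim, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster empties out
                new_medoids.append(medoids[c])
                continue
            # the new medoid minimises total dissimilarity to its cluster members
            new_medoids.append(
                min(members, key=lambda m: sum(dissim[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dissim, medoids)


# Toy symmetric proximity matrix with two obvious groups: {0, 1, 2} and {3, 4, 5}
prox = [
    [1.0, 0.9, 0.8, 0.1, 0.2, 0.1],
    [0.9, 1.0, 0.9, 0.2, 0.1, 0.1],
    [0.8, 0.9, 1.0, 0.1, 0.1, 0.2],
    [0.1, 0.2, 0.1, 1.0, 0.9, 0.8],
    [0.2, 0.1, 0.1, 0.9, 1.0, 0.9],
    [0.1, 0.1, 0.2, 0.8, 0.9, 1.0],
]
dissim = [[1.0 - p for p in row] for row in prox]
medoids, labels = k_medoids(dissim, k=2)
```

Working from a dissimilarity matrix rather than coordinates is what lets the clustering step consume any pairwise measure, including one derived from a Random Forest.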

Relevance: 20.00%

Abstract:

Nine trained contemporary dancers performed a modality-specific, heart-rate-monitored, choreographed fatiguing dance protocol with an assumption of fatigue at volitional exhaustion (RPE 16). Postural stability was assessed as the variability of ground reaction forces and the centre of pressure during the performance of a flat-foot arabesque. Psychological response was assessed using self-reported fatigue, psychological distress (PD), and psychological well-being (PWB) (Subjective Exercise Experience Scale). After reaching RPE 16 in 15.7 ± 2.6 min, heart rate decreased to the post-warm-up level within 64 ± 9 s. Variability of ground reaction forces and of the centre of pressure did not change. There were no significant changes in fatigue, psychological distress, or psychological well-being. Within fatigue, there was a significant increase in the item "tired" (p = 0.04). As supported by the heart rate data and RPE, the protocol achieved an appropriate level of physical demand. No changes in the stability indices were observed, which may be attributable to the rapid recovery in heart rate. The expression of only tiredness suggests the use of a disassociative attentional style by the dancers. The project represents pilot work toward the validation of a monitoring process that supports dancer health and awareness training.

Relevance: 20.00%

Abstract:

Bending in a V-die has been well covered in the literature, and the results have been used to indicate the outcome of bending in cold roll forming. However, recent work comparing springback between roll forming and single-step bending has found lower springback in the roll forming process. Roll forming is an incremental bending process, and in this study a V-section was formed both in a single operation and in multiple steps, and the springback was determined. The springback in V-die forming was significantly reduced by incremental forming. This suggests that the lower springback found in roll forming compared with single-step bending may be related to the incremental nature of the roll forming process.

Relevance: 20.00%

Abstract:

The internet age has fuelled an enormous explosion in the amount of information generated by humanity. Much of this information is transient in nature, created to be immediately consumed and built upon (or discarded). The field of data mining has surprisingly few algorithms geared towards unsupervised knowledge extraction from such dynamic data streams. This chapter describes a new neural network algorithm inspired by self-organising maps. The new algorithm is a hybrid of the growing self-organising map (GSOM) and the cellular probabilistic self-organising map (CPSOM). The result is an algorithm that generates a dynamically growing feature map for clustering dynamic data streams and tracking clusters as they evolve in the data stream.

Relevance: 20.00%

Abstract:

This study investigated the relationship between the Big 5, measured at factor and facet levels, and dimensions of both psychological and subjective well-being. Three hundred and thirty-seven participants completed the 30 Facet International Personality Item Pool Scale, Satisfaction with Life Scale, Positive and Negative Affectivity Schedule, and Ryff’s Scales of Psychological Well-Being. Cross-correlation decomposition presented a parsimonious picture of how well-being is related to personality factors. Incremental facet prediction was examined using double-adjusted r-squared confidence intervals and semi-partial correlations. Incremental prediction by facets over factors ranged from almost nothing to a third more variance explained, suggesting a more modest incremental prediction than previously presented in the literature. Examination of semi-partial correlations controlling for factors revealed a small number of important facet-well-being correlations. All data and R analysis scripts are made available in an online repository.

Relevance: 20.00%

Abstract:

Many researchers have argued that higher-order models of personality such as the Five Factor Model are insufficient, and that facet-level analysis is required to better understand criteria such as well-being, job performance, and personality disorders. However, common methods in the extant literature used to estimate the incremental prediction of facets over factors have several shortcomings. This paper delineates these shortcomings by evaluating alternative methods using statistical theory, simulation, and an empirical example. We recommend using differences between Olkin-Pratt adjusted r-squared for factor versus facet regression models to estimate the incremental prediction of facets, and we present a method for obtaining confidence intervals for such estimates using double-adjusted r-squared bootstrapping. We also provide an R package that implements the proposed methods.

Relevance: 20.00%

Abstract:

Understanding neural functions requires knowledge gained from analysing electrophysiological data. The process of assigning the spikes of a multichannel signal to clusters, called spike sorting, is one of the important problems in such analysis. Various automated spike sorting techniques exist, each with advantages and disadvantages in accuracy and computational cost. Developing spike sorting methods that are both highly accurate and computationally inexpensive therefore remains a challenge in biomedical engineering practice.