786 results for Incremental Clustering


Relevance:

20.00%

Abstract:

Non-parametric multivariate analyses of complex ecological datasets are widely used. Following appropriate pre-treatment of the data, inter-sample resemblances are calculated using appropriate measures. Ordination and clustering derived from these resemblances are used to visualise relationships among samples (or variables). Hierarchical agglomerative clustering with group-average (UPGMA) linkage is often the clustering method chosen. Using an example dataset of zooplankton densities from the Bristol Channel and Severn Estuary, UK, a range of existing and new clustering methods is applied and the results are compared. Although the examples focus on analysis of samples, the methods may also be applied to species analysis. Dendrograms derived by hierarchical clustering are compared using cophenetic correlations, which are also used to determine the optimum β in flexible beta clustering. A plot of cophenetic correlation against original dissimilarities reveals that a tree may be a poor representation of the full multivariate information. UNCTREE is an unconstrained binary divisive clustering algorithm in which values of the ANOSIM R statistic are used to determine (binary) splits in the data, to form a dendrogram. A form of flat clustering, k-R clustering, uses a combination of ANOSIM R and Similarity Profiles (SIMPROF) analyses to determine the optimum value of k, the number of groups into which samples should be clustered, and the sample membership of the groups. Robust outcomes from the application of such a range of differing techniques to the same resemblance matrix, as here, result in greater confidence in the validity of a clustering approach.
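As a rough illustration of the cophenetic correlation mentioned in this abstract, the sketch below builds a group-average (UPGMA) tree from a small dissimilarity matrix and correlates the tree heights with the original dissimilarities. The function names and the toy matrices are assumptions made for this example; this is not the authors' software, and it uses a naive recomputation of average distances that only suits small inputs.

```python
import math
from itertools import combinations


def upgma_cophenetic(D):
    """Return the cophenetic distance matrix of a group-average (UPGMA) tree.

    D is a symmetric dissimilarity matrix given as a list of lists.
    """
    n = len(D)
    clusters = [[i] for i in range(n)]
    coph = [[0.0] * n for _ in range(n)]

    def average(a, b):
        # Group-average linkage: mean of all between-cluster dissimilarities.
        return sum(D[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Merge the two clusters with the smallest average dissimilarity.
        a, b = min(combinations(clusters, 2), key=lambda ab: average(*ab))
        height = average(a, b)
        # Every pair first joined by this merge gets the merge height.
        for i in a:
            for j in b:
                coph[i][j] = coph[j][i] = height
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]
    return coph


def cophenetic_correlation(D, coph):
    """Pearson correlation between original and cophenetic dissimilarities."""
    pairs = list(combinations(range(len(D)), 2))
    x = [D[i][j] for i, j in pairs]
    y = [coph[i][j] for i, j in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    var_x = sum((p - mx) ** 2 for p in x)
    var_y = sum((q - my) ** 2 for q in y)
    return cov / math.sqrt(var_x * var_y)


# Toy matrix (hypothetical values) that a tree cannot reproduce exactly.
toy = [[0, 2, 6], [2, 0, 10], [6, 10, 0]]
print(cophenetic_correlation(toy, upgma_cophenetic(toy)))
```

A correlation close to 1 suggests the dendrogram preserves the original dissimilarities well; lower values indicate that the tree distorts them.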

Relevance:

20.00%

Abstract:

Clustering algorithms, pattern mining techniques and associated quality metrics have emerged as reliable methods for modeling learners’ performance, comprehension and interaction in given educational scenarios. The specific characteristics of the available data, such as missing values, extreme values or outliers, make it challenging to extract user models that are significant from an educational perspective. In this paper we introduce a pattern detection mechanism within our data analytics tool based on k-means clustering and on the SSE, silhouette, Dunn index and Xie-Beni index quality metrics. Experiments performed on a dataset obtained from our online e-learning platform show that the extracted interaction patterns were representative in classifying learners. Furthermore, the performed monitoring activities created a strong basis for generating automatic feedback to learners in terms of their course participation, while relying on their previous performance. In addition, our analysis introduces automatic triggers that highlight learners who will potentially fail the course, enabling tutors to take timely actions.
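To make the combination of k-means and the SSE quality metric concrete, here is a minimal sketch in plain Python. The feature vectors (two made-up learner activity measures) and the function names are assumptions for illustration only; the abstract does not describe the tool's actual implementation or features.

```python
import math


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]


def kmeans(points, initial_centroids, max_iterations=100):
    """Lloyd's k-means; returns (centroids, labels)."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                # New centroid is the coordinate-wise mean of its members.
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                # Keep the old centroid if a cluster loses all its points.
                updated.append(old)
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def sse(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))


# Hypothetical learner features: (logins per week, forum posts per week).
learners = [(0, 0), (0, 2), (10, 0), (10, 2)]
centroids, labels = kmeans(learners, initial_centroids=[(0, 0), (10, 0)])
print(labels, sse(learners, centroids, labels))
```

Lower SSE means tighter clusters, but SSE always decreases as k grows, which is one reason it is usually read alongside metrics such as the silhouette or Dunn index.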

Relevance:

20.00%

Abstract:

Community-driven Question Answering (CQA) systems crowdsource experiential information in the form of questions and answers and have accumulated valuable reusable knowledge. Clustering of QA datasets from CQA systems provides a means of organizing the content to ease tasks such as manual curation and tagging. In this paper, we present a clustering method that exploits the two-part question-answer structure in QA datasets to improve clustering quality. Our method, MixKMeans, composes question and answer space similarities in such a way that the space in which the match is higher is allowed to dominate. This construction is motivated by our observation that semantic similarity between question-answer data (QAs) could get localized in either space. We empirically evaluate our method on a variety of real-world labeled datasets. Our results indicate that our method significantly outperforms state-of-the-art clustering methods for the task of clustering question-answer archives.

Relevance:

20.00%

Abstract:

This paper examines the use of trajectory distance measures and clustering techniques to define normal
and abnormal trajectories in the context of pedestrian tracking in public spaces. In order to detect abnormal
trajectories, what is meant by a normal trajectory in a given scene is first defined. Then every trajectory
that deviates from this normality is classified as abnormal. By combining Dynamic Time Warping and a
modified K-Means algorithm for arbitrary-length data series, we have developed an algorithm for trajectory
clustering and abnormality detection. The final system performs with an overall accuracy of 83% and 75%
when tested on two different standard datasets.
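The role of Dynamic Time Warping here can be sketched as follows: DTW gives a distance between trajectories of different lengths, and a trajectory that is far from every "normal" prototype can be flagged as abnormal. The `is_abnormal` helper, the prototypes and the threshold are hypothetical; the abstract does not specify how the authors modify K-Means or set their decision rule.

```python
import math


def dtw_distance(a, b):
    """Dynamic Time Warping distance between two sequences of 2-D points."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats its point
                cost[i][j - 1],      # b advances, a repeats its point
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def is_abnormal(trajectory, prototypes, threshold):
    """Hypothetical rule: abnormal if far from every normal prototype."""
    return min(dtw_distance(trajectory, p) for p in prototypes) > threshold


# Toy trajectories (made-up coordinates) of different lengths.
straight = [(0, 0), (1, 0), (2, 0)]
straight_slow = [(0, 0), (1, 0), (1, 0), (2, 0)]
detour = [(0, 0), (1, 5), (2, 0)]

print(dtw_distance(straight, straight_slow))
print(is_abnormal(detour, [straight], threshold=1.0))
```

Because DTW lets one sequence "wait" while the other advances, the slower walker along the same path is still at distance zero from the straight prototype.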

Relevance:

20.00%

Abstract:

BACKGROUND:  We used four years of paediatric severe acute respiratory illness (SARI) sentinel surveillance in Blantyre, Malawi to identify factors associated with clinical severity and co-viral clustering.

METHODS:  From January 2011 to December 2014, 2363 children aged 3 months to 14 years presenting to hospital with SARI were enrolled. Nasopharyngeal aspirates were tested for influenza and other respiratory viruses. We assessed risk factors for clinical severity and conducted clustering analysis to identify viral clusters in children with co-viral detection.

RESULTS:  Hospital-attended influenza-positive SARI incidence was 2.0 cases per 10,000 children annually; it was highest in children aged under 1 year (6.3 cases per 10,000) and in HIV-infected children aged 5 to 9 years (6.0 cases per 10,000). 605 (26.8%) SARI cases had warning signs, which were positively associated with HIV infection (adjusted risk ratio [aRR]: 2.4, 95% CI: 1.4, 3.9), RSV infection (aRR: 1.9, 95% CI: 1.3, 3.0) and rainy season (aRR: 2.4, 95% CI: 1.6, 3.8). We identified six co-viral clusters; one cluster was associated with SARI with warning signs.

CONCLUSIONS:  Influenza vaccination may benefit young children and HIV infected children in this setting. Viral clustering may be associated with SARI severity; its assessment should be included in routine SARI surveillance.

Relevance:

20.00%

Abstract:

We consider the problem of resource selection in clustered Peer-to-Peer Information Retrieval (P2P IR) networks with cooperative peers. The clustered P2P IR framework presents a significant departure from general P2P IR architectures by employing clustering to ensure content coherence between resources at the resource selection layer, without disturbing document allocation. We propose that such a property could be leveraged in resource selection by adapting well-studied and popular inverted lists for centralized document retrieval. Accordingly, we propose the Inverted PeerCluster Index (IPI), an approach that adapts the inverted lists, in a straightforward manner, for resource selection in clustered P2P IR. IPI also encompasses a strikingly simple peer-specific scoring mechanism that exploits the said index for resource selection. Through an extensive empirical analysis on P2P IR testbeds, we establish that IPI competes well with the sophisticated state-of-the-art methods in virtually every parameter of interest for the resource selection task, in the context of clustered P2P IR.

Relevance:

20.00%

Abstract:

[ES] In a complex problem such as face recognition, where representation spaces with a high number of dimensions are used, it is very important to use all the available information. In particular, there will be cases in which, while the recognition system is in operation, a large amount of information accumulates in the form of facial images with their associated labels. This work proposes an algorithm for using this information, when it is available. The main characteristic of the algorithm is that it is incremental, so its efficiency does not degrade with the number of accumulated images...

Relevance:

20.00%

Abstract:

[EN] In face recognition, where high-dimensional representation spaces are generally used, it is very important to take advantage of all the available information. In particular, many labelled facial images will be accumulated while the recognition system is functioning, and for practical reasons some of them are often discarded. In this paper, we propose an algorithm for using this information. The algorithm has the fundamental characteristic of being incremental. In addition, the algorithm makes use of a combination of classification results for the images in the input sequence. Experiments with sequences obtained with a real person detection and tracking system allow us to analyze the performance of the algorithm, as well as its potential improvements.

Relevance:

20.00%

Abstract:

The Twitter system is the biggest social network in the world, and every day millions of tweets are posted and discussed, expressing various views and opinions. A wide variety of research has been conducted to study how these opinions can be clustered and analyzed so that tendencies can be uncovered. Because of the inherent weaknesses of tweets (very short texts and a very informal writing style), it is hard to analyze tweet data with good performance and accuracy. In this paper, we approach the problem from another angle, using a two-layer structure to analyze Twitter data: LDA with topic map modelling. The experimental results demonstrate that this approach represents progress in Twitter data analysis. However, more experiments with this method are needed to confirm that accurate analytic results can be maintained.

Relevance:

20.00%

Abstract:

This paper introduces a new stochastic clustering methodology devised for the analysis of categorized or sorted data. The methodology reveals consumers' common category knowledge as well as individual differences in using this knowledge for classifying brands in a designated product class. A small study involving the categorization of 28 brands of U.S. automobiles is presented where the results of the proposed methodology are compared with those obtained from KMEANS clustering. Finally, directions for future research are discussed.

Relevance:

20.00%

Abstract:

Reverse engineering is usually the stepping stone of a variety of attacks aiming at identifying sensitive information (keys, credentials, data, algorithms) or vulnerabilities and flaws for broader exploitation. Software applications are usually deployed as identical binary code installed on millions of computers, enabling an adversary to develop a generic reverse-engineering strategy that, if working on one code instance, could be applied to crack all the other instances. A solution to mitigate this problem is represented by Software Diversity, which aims at creating several structurally different (but functionally equivalent) binary code versions out of the same source code, so that even if a successful attack can be elaborated for one version, it should not work on a diversified version. In this paper, we address the problem of maximizing software diversity from a search-based optimization point of view. The program to protect is subject to a catalogue of transformations to generate many candidate versions. The problem of selecting the subset of most diversified versions to be deployed is formulated as an optimisation problem, that we tackle with different search heuristics. We show the applicability of this approach on some popular Android apps.
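The subset-selection step can be illustrated with a simple greedy heuristic that repeatedly adds the candidate version farthest from those already chosen. This is only one possible search heuristic and is not claimed to be among those the authors evaluate; the pairwise distance matrix between versions and the function name are assumptions for the example.

```python
def greedy_diverse_subset(distance, k):
    """Greedily pick k mutually distant items (requires 2 <= k <= n).

    distance is a symmetric matrix of pairwise dissimilarities between
    candidate versions (hypothetical values in the example below).
    """
    n = len(distance)
    # Seed with the pair of candidates that are farthest apart.
    first_pair = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: distance[ij[0]][ij[1]],
    )
    chosen = list(first_pair)
    while len(chosen) < k:
        # Add the candidate whose nearest chosen neighbour is farthest away.
        best = max(
            (c for c in range(n) if c not in chosen),
            key=lambda c: min(distance[c][s] for s in chosen),
        )
        chosen.append(best)
    return chosen


# Made-up structural distances between four candidate binaries.
distances = [
    [0, 1, 5, 9],
    [1, 0, 4, 8],
    [5, 4, 0, 3],
    [9, 8, 3, 0],
]
print(greedy_diverse_subset(distances, k=3))
```

The greedy choice is cheap but not guaranteed to be optimal, which is one reason search-based approaches compare several heuristics on problems of this kind.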

Relevance:

20.00%

Abstract:

This paper proposes a novel demand response model using a fuzzy subtractive cluster approach. The model development provides support to domestic consumer decisions on controllable loads management, considering consumers' consumption needs and the appropriate load shape or rescheduling in order to achieve possible economic benefits. The model based on fuzzy subtractive clustering method considers clusters of domestic consumption covering an adequate consumption range. Analysis of different scenarios is presented considering available electric power and electric energy prices. Simulation results are presented and conclusions of the proposed demand response model are discussed. (C) 2016 Elsevier Ltd. All rights reserved.

Relevance:

20.00%

Abstract:

This proposal shows that ACO systems can be applied to problems of requirements selection in incremental software development, with the idea of obtaining better results than those produced by expert judgment alone. The evaluation of the ACO systems should be done through a comparative analysis with greedy and simulated annealing algorithms, performing experiments with some problem instances