785 resultados para Task Clustering


Relevância:

20.00% 20.00%

Publicador:

Resumo:

In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables which are not directly involved to cluster the data. An approach is proposed in the model-based clustering context to select a number of clusters which both fits the data well and takes advantage of the potential illustrative ability of the external variables. This approach makes use of the integrated joint likelihood of the data and the partitions at hand, namely the model-based partition and the partitions associated to the external variables. It is noteworthy that each mixture model is fitted by the maximum likelihood methodology to the data, excluding the external variables which are used to select a relevant mixture model only. Numerical experiments illustrate the promising behaviour of the derived criterion. © 2014 Springer-Verlag Berlin Heidelberg.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In the present paper we focus on the performance of clustering algorithms using indices of paired agreement to measure the accordance between clusters and an a priori known structure. We specifically propose a method to correct all indices considered for agreement by chance - the adjusted indices are meant to provide a realistic measure of clustering performance. The proposed method enables the correction of virtually any index - overcoming previous limitations known in the literature - and provides very precise results. We use simulated datasets under diverse scenarios and discuss the pertinence of our proposal which is particularly relevant when poorly separated clusters are considered. Finally we compare the performance of EM and KMeans algorithms, within each of the simulated scenarios and generally conclude that EM generally yields best results.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Trabalho apresentado no âmbito do Mestrado em Engenharia Informática, como requisito parcial para obtenção do grau de Mestre em Engenharia Informática

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Animal Cognition, V.6, pp. 259–267

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A procura de padrões nos dados de modo a formar grupos é conhecida como aglomeração de dados ou clustering, sendo uma das tarefas mais realizadas em mineração de dados e reconhecimento de padrões. Nesta dissertação é abordado o conceito de entropia e são usados algoritmos com critérios entrópicos para fazer clustering em dados biomédicos. O uso da entropia para efetuar clustering é relativamente recente e surge numa tentativa da utilização da capacidade que a entropia possui de extrair da distribuição dos dados informação de ordem superior, para usá-la como o critério na formação de grupos (clusters) ou então para complementar/melhorar algoritmos existentes, numa busca de obtenção de melhores resultados. Alguns trabalhos envolvendo o uso de algoritmos baseados em critérios entrópicos demonstraram resultados positivos na análise de dados reais. Neste trabalho, exploraram-se alguns algoritmos baseados em critérios entrópicos e a sua aplicabilidade a dados biomédicos, numa tentativa de avaliar a adequação destes algoritmos a este tipo de dados. Os resultados dos algoritmos testados são comparados com os obtidos por outros algoritmos mais “convencionais" como o k-médias, os algoritmos de spectral clustering e um algoritmo baseado em densidade.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Dissertação apresentada como requisito parcial para obtenção do grau de Mestre em Ciência e Sistemas de Informação Geográfica

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Clustering ensemble methods produce a consensus partition of a set of data points by combining the results of a collection of base clustering algorithms. In the evidence accumulation clustering (EAC) paradigm, the clustering ensemble is transformed into a pairwise co-association matrix, thus avoiding the label correspondence problem, which is intrinsic to other clustering ensemble schemes. In this paper, we propose a consensus clustering approach based on the EAC paradigm, which is not limited to crisp partitions and fully exploits the nature of the co-association matrix. Our solution determines probabilistic assignments of data points to clusters by minimizing a Bregman divergence between the observed co-association frequencies and the corresponding co-occurrence probabilities expressed as functions of the unknown assignments. We additionally propose an optimization algorithm to find a solution under any double-convex Bregman divergence. Experiments on both synthetic and real benchmark data show the effectiveness of the proposed approach.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Consider the problem of assigning implicit-deadline sporadic tasks on a heterogeneous multiprocessor platform comprising two different types of processors—such a platform is referred to as two-type platform. We present two low degree polynomial time-complexity algorithms, SA and SA-P, each providing the following guarantee. For a given two-type platform and a task set, if there exists a task assignment such that tasks can be scheduled to meet deadlines by allowing them to migrate only between processors of the same type (intra-migrative), then (i) using SA, it is guaranteed to find such an assignment where the same restriction on task migration applies but given a platform in which processors are 1+α/2 times faster and (ii) SA-P succeeds in finding a task assignment where tasks are not allowed to migrate between processors (non-migrative) but given a platform in which processors are 1+α times faster. The parameter 0<α≤1 is a property of the task set; it is the maximum of all the task utilizations that are no greater than 1. We evaluate average-case performance of both the algorithms by generating task sets randomly and measuring how much faster processors the algorithms need (which is upper bounded by 1+α/2 for SA and 1+α for SA-P) in order to output a feasible task assignment (intra-migrative for SA and non-migrative for SA-P). In our evaluations, for the vast majority of task sets, these algorithms require significantly smaller processor speedup than indicated by their theoretical bounds. Finally, we consider a special case where no task utilization in the given task set can exceed one and for this case, we (re-)prove the performance guarantees of SA and SA-P. We show, for both of the algorithms, that changing the adversary from intra-migrative to a more powerful one, namely fully-migrative, in which tasks can migrate between processors of any type, does not deteriorate the performance guarantees. For this special case, we compare the average-case performance of SA-P and a state-of-the-art algorithm by generating task sets randomly. In our evaluations, SA-P outperforms the state-of-the-art by requiring much smaller processor speedup and by running orders of magnitude faster.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Consider the problem of assigning implicit-deadline sporadic tasks on a heterogeneous multiprocessor platform comprising a constant number (denoted by t) of distinct types of processors—such a platform is referred to as a t-type platform. We present two algorithms, LPGIM and LPGNM, each providing the following guarantee. For a given t-type platform and a task set, if there exists a task assignment such that tasks can be scheduled to meet their deadlines by allowing them to migrate only between processors of the same type (intra-migrative), then: (i) LPGIM succeeds in finding such an assignment where the same restriction on task migration applies (intra-migrative) but given a platform in which only one processor of each type is 1 + α × t-1/t times faster and (ii) LPGNM succeeds in finding a task assignment where tasks are not allowed to migrate between processors (non-migrative) but given a platform in which every processor is 1 + α times faster. The parameter α is a property of the task set; it is the maximum of all the task utilizations that are no greater than one. To the best of our knowledge, for t-type heterogeneous multiprocessors: (i) for the problem of intra-migrative task assignment, no previous algorithm exists with a proven bound and hence our algorithm, LPGIM, is the first of its kind and (ii) for the problem of non-migrative task assignment, our algorithm, LPGNM, has superior performance compared to state-of-the-art.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Hard real- time multiprocessor scheduling has seen, in recent years, the flourishing of semi-partitioned scheduling algorithms. This category of scheduling schemes combines elements of partitioned and global scheduling for the purposes of achieving efficient utilization of the system’s processing resources with strong schedulability guarantees and with low dispatching overheads. The sub-class of slot-based “task-splitting” scheduling algorithms, in particular, offers very good trade-offs between schedulability guarantees (in the form of high utilization bounds) and the number of preemptions/migrations involved. However, so far there did not exist unified scheduling theory for such algorithms; each one was formulated in its own accompanying analysis. This article changes this fragmented landscape by formulating a more unified schedulability theory covering the two state-of-the-art slot-based semi-partitioned algorithms, S-EKG and NPS-F (both fixed job-priority based). This new theory is based on exact schedulability tests, thus also overcoming many sources of pessimism in existing analysis. In turn, since schedulability testing guides the task assignment under the schemes in consideration, we also formulate an improved task assignment procedure. As the other main contribution of this article, and as a response to the fact that many unrealistic assumptions, present in the original theory, tend to undermine the theoretical potential of such scheduling schemes, we identified and modelled into the new analysis all overheads incurred by the algorithms in consideration. The outcome is a new overhead-aware schedulability analysis that permits increased efficiency and reliability. The merits of this new theory are evaluated by an extensive set of experiments.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The multiprocessor scheduling scheme NPS-F for sporadic tasks has a high utilisation bound and an overall number of preemptions bounded at design time. NPS-F binpacks tasks offline to as many servers as needed. At runtime, the scheduler ensures that each server is mapped to at most one of the m processors, at any instant. When scheduled, servers use EDF to select which of their tasks to run. Yet, unlike the overall number of preemptions, the migrations per se are not tightly bounded. Moreover, we cannot know a priori which task a server will be currently executing at the instant when it migrates. This uncertainty complicates the estimation of cache-related preemption and migration costs (CPMD), potentially resulting in their overestimation. Therefore, to simplify the CPMD estimation, we propose an amended bin-packing scheme for NPS-F allowing us (i) to identify at design time, which task migrates at which instant and (ii) bound a priori the number of migrating tasks, while preserving the utilisation bound of NPS-F.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Consider scheduling of real-time tasks on a multiprocessor where migration is forbidden. Specifically, consider the problem of determining a task-to-processor assignment for a given collection of implicit-deadline sporadic tasks upon a multiprocessor platform in which there are two distinct types of processors. For this problem, we propose a new algorithm, LPC (task assignment based on solving a Linear Program with Cutting planes). The algorithm offers the following guarantee: for a given task set and a platform, if there exists a feasible task-to-processor assignment, then LPC succeeds in finding such a feasible task-to-processor assignment as well but on a platform in which each processor is 1.5 × faster and has three additional processors. For systems with a large number of processors, LPC has a better approximation ratio than state-of-the-art algorithms. To the best of our knowledge, this is the first work that develops a provably good real-time task assignment algorithm using cutting planes.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables which are not directly involved to cluster the data. An approach is proposed in the model-based clustering context to select a number of clusters which both fits the data well and takes advantage of the potential illustrative ability of the external variables. This approach makes use of the integrated joint likelihood of the data and the partitions at hand, namely the model-based partition and the partitions associated to the external variables. It is noteworthy that each mixture model is fitted by the maximum likelihood methodology to the data, excluding the external variables which are used to select a relevant mixture model only. Numerical experiments illustrate the promising behaviour of the derived criterion.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The Evidence Accumulation Clustering (EAC) paradigm is a clustering ensemble method which derives a consensus partition from a collection of base clusterings obtained using different algorithms. It collects from the partitions in the ensemble a set of pairwise observations about the co-occurrence of objects in a same cluster and it uses these co-occurrence statistics to derive a similarity matrix, referred to as co-association matrix. The Probabilistic Evidence Accumulation for Clustering Ensembles (PEACE) algorithm is a principled approach for the extraction of a consensus clustering from the observations encoded in the co-association matrix based on a probabilistic model for the co-association matrix parameterized by the unknown assignments of objects to clusters. In this paper we extend the PEACE algorithm by deriving a consensus solution according to a MAP approach with Dirichlet priors defined for the unknown probabilistic cluster assignments. In particular, we study the positive regularization effect of Dirichlet priors on the final consensus solution with both synthetic and real benchmark data.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In the present paper we compare clustering solutions using indices of paired agreement. We propose a new method - IADJUST - to correct indices of paired agreement, excluding agreement by chance. This new method overcomes previous limitations known in the literature as it permits the correction of any index. We illustrate its use in external clustering validation, to measure the accordance between clusters and an a priori known structure. The adjusted indices are intended to provide a realistic measure of clustering performance that excludes agreement by chance with ground truth. We use simulated data sets, under a range of scenarios - considering diverse numbers of clusters, clusters overlaps and balances - to discuss the pertinence and the precision of our proposal. Precision is established based on comparisons with the analytical approach for correction specific indices that can be corrected in this way are used for this purpose. The pertinence of the proposed correction is discussed when making a detailed comparison between the performance of two classical clustering approaches, namely Expectation-Maximization (EM) and K-Means (KM) algorithms. Eight indices of paired agreement are studied and new corrected indices are obtained.