877 resultados para document clustering
Resumo:
As more and more digital resources are available, finding the appropriate document becomes harder. Thus, a new kind of tools, able to recommend the more appropriated resources according the user needs, becomes even more necessary. The current project implements an intelligent recommendation system for elearning platforms. The recommendations are based on one hand, the performance of the user during the training process and on the other hand, the requests made by the user in the form of search queries. All information necessary for decision-making process of recommendation will be represented in the user model. This model will be updated throughout the target user interaction with the platform.
Resumo:
This article is is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Attribution-NonCommercial (CC BY-NC) license lets others remix, tweak, and build upon work non-commercially, and although the new works must also acknowledge & be non-commercial.
Resumo:
TPM Vol. 21, No. 4, December 2014, 435-447 – Special Issue © 2014 Cises.
Resumo:
O aumento do número de recursos digitais disponíveis dificulta a tarefa de pesquisa dos recursos mais relevantes, no sentido de se obter o que é mais relevante. Assim sendo, um novo tipo de ferramentas, capaz de recomendar os recursos mais apropriados às necessidades do utilizador, torna-se cada vez mais necessário. O objetivo deste trabalho de I&D é o de implementar um módulo de recomendação inteligente para plataformas de e-learning. As recomendações baseiam-se, por um lado, no perfil do utilizador durante o processo de formação e, por outro lado, nos pedidos efetuados pelo utilizador, através de pesquisas [Tavares, Faria e Martins, 2012]. O e-learning 3.0 é um projeto QREN desenvolvido por um conjunto de organizações e tem com objetivo principal implementar uma plataforma de e-learning. Este trabalho encontra-se inserido no projeto e-learning 3.0 e consiste no desenvolvimento de um módulo de recomendação inteligente (MRI). O MRI utiliza diferentes técnicas de recomendação já aplicadas noutros sistemas de recomendação. Estas técnicas são utilizadas para criar um sistema de recomendação híbrido direcionado para a plataforma de e-learning. Para representar a informação relevante, sobre cada utilizador, foi construído um modelo de utilizador. Toda a informação necessária para efetuar a recomendação será representada no modelo do utilizador, sendo este modelo atualizado sempre que necessário. Os dados existentes no modelo de utilizador serão utilizados para personalizar as recomendações produzidas. As recomendações estão divididas em dois tipos, a formal e a não formal. Na recomendação formal o objetivo é fazer sugestões relacionadas a um curso específico. Na recomendação não-formal, o objetivo é fazer sugestões mais abrangentes onde as recomendações não estão associadas a nenhum curso. O sistema proposto é capaz de sugerir recursos de aprendizagem, com base no perfil do utilizador, através da combinação de técnicas de similaridade de palavras, um algoritmo de clustering e técnicas de filtragem [Tavares, Faria e Martins, 2012].
Resumo:
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Resumo:
Clustering analysis is a useful tool to detect and monitor disease patterns and, consequently, to contribute for an effective population disease management. Portugal has the highest incidence of tuberculosis in the European Union (in 2012, 21.6 cases per 100.000 inhabitants), although it has been decreasing consistently. Two critical PTB (Pulmonary Tuberculosis) areas, metropolitan Oporto and metropolitan Lisbon regions, were previously identified through spatial and space-time clustering for PTB incidence rate and risk factors. Identifying clusters of temporal trends can further elucidate policy makers about municipalities showing a faster or a slower TB control improvement.
Resumo:
Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicate the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover ground truth. An application to real data, referred to official statistics, shows its usefulness.
Resumo:
Research on cluster analysis for categorical data continues to develop, new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters is done simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length criterion (MML) to select the number of clusters (Wallace and Bolton, 1986). For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the (Figueiredo and Jain, 2002) approach. The novelty of the approach rests on the integration of the model estimation and selection of the number of clusters in a single algorithm, rather than selecting this number based on a set of pre-estimated candidate models. The performance of our approach is compared with the use of Bayesian Information Criterion (BIC) (Schwarz, 1978) and Integrated Completed Likelihood (ICL) (Biernacki et al., 2000) using synthetic data. The obtained results illustrate the capacity of the proposed algorithm to attain the true number of cluster while outperforming BIC and ICL since it is faster, which is especially relevant when dealing with large data sets.
Resumo:
In data clustering, the problem of selecting the subset of most relevant features from the data has been an active research topic. Feature selection for clustering is a challenging task due to the absence of class labels for guiding the search for relevant features. Most methods proposed for this goal are focused on numerical data. In this work, we propose an approach for clustering and selecting categorical features simultaneously. We assume that the data originate from a finite mixture of multinomial distributions and implement an integrated expectation-maximization (EM) algorithm that estimates all the parameters of the model and selects the subset of relevant features simultaneously. The results obtained on synthetic data illustrate the performance of the proposed approach. An application to real data, referred to official statistics, shows its usefulness.
Resumo:
This paper focus on a demand response model analysis in a smart grid context considering a contingency scenario. A fuzzy clustering technique is applied on the developed demand response model and an analysis is performed for the contingency scenario. Model considerations and architecture are described. The demand response developed model aims to support consumers decisions regarding their consumption needs and possible economic benefits.
Resumo:
Biosignals analysis has become widespread, upstaging their typical use in clinical settings. Electrocardiography (ECG) plays a central role in patient monitoring as a diagnosis tool in today's medicine and as an emerging biometric trait. In this paper we adopt a consensus clustering approach for the unsupervised analysis of an ECG-based biometric records. This type of analysis highlights natural groups within the population under investigation, which can be correlated with ground truth information in order to gain more insights about the data. Preliminary results are promising, for meaningful clusters are extracted from the population under analysis. © 2014 EURASIP.
Resumo:
This paper focus on a demand response model analysis in a smart grid context considering a contingency scenario. A fuzzy clustering technique is applied on the developed demand response model and an analysis is performed for the contingency scenario. Model considerations and architecture are described. The demand response developed model aims to support consumers decisions regarding their consumption needs and possible economic benefits.
Resumo:
In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables which are not directly involved to cluster the data. An approach is proposed in the model-based clustering context to select a number of clusters which both fits the data well and takes advantage of the potential illustrative ability of the external variables. This approach makes use of the integrated joint likelihood of the data and the partitions at hand, namely the model-based partition and the partitions associated to the external variables. It is noteworthy that each mixture model is fitted by the maximum likelihood methodology to the data, excluding the external variables which are used to select a relevant mixture model only. Numerical experiments illustrate the promising behaviour of the derived criterion. © 2014 Springer-Verlag Berlin Heidelberg.
Resumo:
In the present paper we focus on the performance of clustering algorithms using indices of paired agreement to measure the accordance between clusters and an a priori known structure. We specifically propose a method to correct all indices considered for agreement by chance - the adjusted indices are meant to provide a realistic measure of clustering performance. The proposed method enables the correction of virtually any index - overcoming previous limitations known in the literature - and provides very precise results. We use simulated datasets under diverse scenarios and discuss the pertinence of our proposal which is particularly relevant when poorly separated clusters are considered. Finally we compare the performance of EM and KMeans algorithms, within each of the simulated scenarios and generally conclude that EM generally yields best results.
Resumo:
Trabalho apresentado no âmbito do Mestrado em Engenharia Informática, como requisito parcial para obtenção do grau de Mestre em Engenharia Informática