12 results for collaborative filtering
at Universidad Politécnica de Madrid
Abstract:
As the use of recommender systems becomes more consolidated on the Net, there is a growing need for an evaluation framework for collaborative filtering measures and methods. Such a framework should test not only prediction and recommendation results but also aspects that until now were considered secondary, such as the novelty of the recommendations and the users' trust in them. This paper provides: (a) measures to evaluate the novelty of the users' recommendations and the users' trust in their neighborhoods; (b) equations that formalize and unify the collaborative filtering process and its evaluation; and (c) a framework, based on these elements, for evaluating the quality of the results of any collaborative filtering method applied to a given recommender system by means of four graphs: quality of the predictions, quality of the recommendations, novelty and trust.
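The framework combines several quality measures. As an illustration of the two most common ones, the sketch below computes mean absolute error (MAE) for predictions and precision/recall at N for recommendations. It is a generic sketch, not the paper's formalization. The function names, the relevance threshold `theta` and the toy data are hypothetical. The paper's novelty and trust measures are not reproduced, because the abstract does not define them.

```python
from typing import Dict, List, Tuple


def mean_absolute_error(predicted: Dict[str, float], actual: Dict[str, float]) -> float:
    """MAE over the items that have both a prediction and a real vote."""
    common = [item for item in predicted if item in actual]
    if not common:
        raise ValueError("no items with both a prediction and a real vote")
    return sum(abs(predicted[i] - actual[i]) for i in common) / len(common)


def precision_recall_at_n(recommended: List[str], actual: Dict[str, float],
                          theta: float) -> Tuple[float, float]:
    """Precision and recall of a top-N list.

    Hypothetical relevance rule: an item is relevant if the user's real vote is >= theta.
    """
    relevant = {item for item, vote in actual.items() if vote >= theta}
    hits = sum(1 for item in recommended if item in relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


actual = {"a": 5, "b": 2, "c": 4, "d": 1}
print(mean_absolute_error({"a": 4.5, "b": 3.0, "c": 4.0}, actual))
print(precision_recall_at_n(["a", "b"], actual, theta=4))
```

In the usage lines, the MAE averages errors of 0.5, 1.0 and 0.0. The top-2 list contains one of the two relevant items (`a` and `c`), so precision and recall are both 0.5.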
Abstract:
Recommender systems play an important role in reducing the negative impact of information overload on websites where users can vote for their preferences on items. The most common technique for implementing the recommendation mechanism is collaborative filtering. In collaborative filtering, it is essential to find the users most similar to the user who is to receive recommendations. The hypothesis of this paper is that the results obtained with traditional similarity measures can be improved by using contextual information, drawn from the entire body of users, to calculate the singularity of the votes cast on each item by each pair of users being compared. Accordingly, the greater the singularity between the votes cast by two given users, the greater its impact on their similarity. The results, tested on the MovieLens, Netflix and FilmAffinity databases, corroborate the excellent behaviour of the proposed singularity measure.
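The abstract contrasts the proposed singularity measure with traditional similarity measures but gives no formulas. For reference, here is a minimal sketch of one widely used traditional measure: Pearson correlation computed over the items both users have rated. The singularity measure itself is not reproduced. The function name and the toy data are hypothetical.

```python
import math
from typing import Dict


def pearson_similarity(ratings_u: Dict[str, float], ratings_v: Dict[str, float]) -> float:
    """Pearson correlation between two users over their co-rated items.

    Returns 0.0 when there are fewer than two co-rated items, or when either
    user's votes on those items have zero variance (correlation undefined).
    """
    common = [item for item in ratings_u if item in ratings_v]
    if len(common) < 2:
        return 0.0
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_v = sum(ratings_v[i] for i in common) / len(common)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    den_u = math.sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    den_v = math.sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0
    return num / (den_u * den_v)


u = {"a": 1, "b": 2, "c": 3}
print(pearson_similarity(u, {"a": 2, "b": 4, "c": 6}))  # perfectly correlated votes
print(pearson_similarity(u, {"a": 3, "b": 2, "c": 1}))  # perfectly opposed votes
```

The measure depends only on the two users' own votes. The paper's hypothesis is that the information such a measure ignores, namely how common or singular each vote is across the whole community, can improve it.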
Abstract:
The new user cold start issue is a serious problem in recommender systems. It can cause new users to abandon the system because of the low accuracy of the recommendations they receive early on, before they have cast enough votes to feed the recommender system's collaborative filtering core. For this reason, it is particularly important to design new similarity metrics that provide greater precision in the results offered to users who have cast few votes. This paper presents a new similarity measure refined through optimization based on neural learning, which exceeds the best results obtained with current metrics. The metric has been tested on the Netflix and MovieLens databases, obtaining substantial improvements in accuracy, precision and recall when applied to new user cold start situations. The paper includes the mathematical formalization describing how to obtain the main quality measures of a recommender system using leave-one-out cross-validation.
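The abstract mentions obtaining quality measures through leave-one-out cross-validation. A minimal sketch of the idea follows. Each known vote is hidden in turn and predicted from the remaining data, and the absolute errors are averaged into an MAE. The predictor used here is a deliberately simple, hypothetical baseline (the mean of the other users' votes on the item), not the neural-optimized metric of the paper. Any function with the same signature can be plugged in.

```python
from typing import Callable, Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # user -> {item: vote}


def item_mean_predictor(ratings: Ratings, user: str, item: str) -> Optional[float]:
    """Hypothetical baseline: mean vote of the other users on the item."""
    votes = [r[item] for u, r in ratings.items() if u != user and item in r]
    return sum(votes) / len(votes) if votes else None


def leave_one_out_mae(ratings: Ratings,
                      predictor: Callable[[Ratings, str, str], Optional[float]]) -> float:
    """Hide each vote in turn, predict it from the rest and average the absolute errors."""
    errors = []
    for user, user_ratings in ratings.items():
        for item, vote in user_ratings.items():
            # Copy the data set without this single vote so the predictor cannot see it.
            reduced = {u: dict(r) for u, r in ratings.items()}
            del reduced[user][item]
            prediction = predictor(reduced, user, item)
            if prediction is not None:  # votes that cannot be predicted are skipped
                errors.append(abs(prediction - vote))
    if not errors:
        raise ValueError("no vote could be predicted")
    return sum(errors) / len(errors)


ratings = {"u1": {"a": 4, "b": 2}, "u2": {"a": 2, "b": 4}, "u3": {"a": 3}}
print(leave_one_out_mae(ratings, item_mean_predictor))
```

Copying the data for every hidden vote keeps the sketch simple but is quadratic in practice. A real evaluation would typically hide votes in place or restrict the procedure to a sample of test users.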
Abstract:
Collaborative filtering recommender systems help alleviate the information overload that exists on the Internet as a result of the mass use of Web 2.0 applications. The choice of an adequate similarity measure is a determining factor in the quality of the recommender system's prediction and recommendation results, as well as in its performance. In this paper, we present a memory-based collaborative filtering similarity measure that provides extremely high-quality and balanced results. It also has a low processing time (high performance), similar to that required by traditional similarity metrics. The experiments have been carried out on the MovieLens and Netflix databases, using a representative set of information retrieval quality measures.
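In a memory-based recommender, the similarity measure is used twice: first to select the k nearest neighbors of the active user, and then to weight their votes in the prediction. The sketch below shows that pipeline with a pluggable similarity function. The default similarity, 1 / (1 + mean absolute difference on co-rated items), is a hypothetical placeholder and not the measure proposed in the paper. Restricting the neighbors to users who rated the target item is one common variant, assumed here.

```python
from typing import Callable, Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # user -> {item: vote}
Similarity = Callable[[Dict[str, float], Dict[str, float]], float]


def placeholder_similarity(ru: Dict[str, float], rv: Dict[str, float]) -> float:
    """Hypothetical similarity: 1 / (1 + mean absolute difference on co-rated items)."""
    common = [i for i in ru if i in rv]
    if not common:
        return 0.0
    mad = sum(abs(ru[i] - rv[i]) for i in common) / len(common)
    return 1.0 / (1.0 + mad)


def predict(ratings: Ratings, user: str, item: str, k: int,
            similarity: Similarity = placeholder_similarity) -> Optional[float]:
    """Similarity-weighted average of the votes of the k most similar users who rated the item."""
    candidates = [(similarity(ratings[user], r), v) for v, r in ratings.items()
                  if v != user and item in r]
    neighbors = sorted(candidates, reverse=True)[:k]  # highest similarity first
    weight = sum(s for s, _ in neighbors)
    if weight <= 0:
        return None  # no usable neighbor: the vote cannot be predicted
    return sum(s * ratings[v][item] for s, v in neighbors) / weight


ratings = {"u": {"a": 4, "b": 2},
           "v": {"a": 4, "b": 2, "c": 5},
           "w": {"a": 1, "b": 5, "c": 1}}
print(predict(ratings, "u", "c", k=2))
```

Because the similarity function is called for every candidate, its cost dominates the running time. This is why the abstract stresses a processing time comparable to that of traditional metrics.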
Abstract:
Recommender systems are one solution to the information overload problem suffered by users of websites on which they can rate certain items. The collaborative filtering recommender system is considered the most successful approach, because it makes its recommendations based on the votes of users similar to an active user. However, the traditional collaborative filtering method selects insufficiently representative users as neighbors of each active user. As a result, the recommendations made a posteriori are not precise enough. The method proposed in this thesis performs a pre-filtering step using Pareto dominance, which removes the less representative users from the k-neighbor selection process and keeps the most promising ones. The results of the experiments performed on MovieLens and Netflix show a significant improvement in all the quality measures studied when the proposed method is applied.
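A minimal sketch of Pareto-dominance pre-filtering before k-neighbor selection follows. The dominance criterion is an assumption for illustration, since the abstract does not state it. Candidate v dominates candidate w, with respect to the active user, if v's vote is never farther from the active user's vote than w's on any item rated by all three users, and is strictly closer on at least one. Only non-dominated candidates are kept, and the k most similar of these would then be chosen as neighbors. Function names and data are hypothetical.

```python
from typing import Dict, List

Ratings = Dict[str, Dict[str, float]]  # user -> {item: vote}


def dominates(active: Dict[str, float], v: Dict[str, float], w: Dict[str, float]) -> bool:
    """Assumed criterion: on the items rated by all three users, v's votes are never
    farther from the active user's votes than w's, and are strictly closer at least once."""
    common = [i for i in active if i in v and i in w]
    if not common:
        return False
    strictly_better = False
    for i in common:
        dv = abs(active[i] - v[i])
        dw = abs(active[i] - w[i])
        if dv > dw:
            return False
        if dv < dw:
            strictly_better = True
    return strictly_better


def pareto_prefilter(ratings: Ratings, active_user: str) -> List[str]:
    """Keep only the candidate neighbors that no other candidate dominates."""
    active = ratings[active_user]
    candidates = [u for u in ratings if u != active_user]
    return [w for w in candidates
            if not any(dominates(active, ratings[v], ratings[w])
                       for v in candidates if v != w)]


ratings = {"u": {"a": 5, "b": 1},
           "v": {"a": 5, "b": 2},
           "w": {"a": 3, "b": 4},
           "x": {"a": 1, "b": 1}}
print(pareto_prefilter(ratings, "u"))
```

Here `v` is closer to `u` than `w` on both items, so `w` is discarded. `v` and `x` each win on a different item, so neither dominates the other and both survive to the k-neighbor selection step.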
Abstract:
In this paper we provide a method for visualizing the similarity relationships between the items of collaborative filtering recommender systems, as well as the relative importance of each of them. The objective is to offer visual representations of the recommender system's set of items and of their relationships. These graphs show where the most representative information can be found and which items are rated in a more similar way by the recommender system's community of users. The resulting visual representations take the shape of phylogenetic trees, displaying the numerical similarity and the reliability between each pair of items considered to be similar. As a case study, we provide the results obtained using the public MovieLens 1M database, which contains 3900 movies.
Abstract:
This doctoral thesis focuses on the modeling of multimedia systems to create personalized recommendation services based on the analysis of users' audiovisual consumption. The research focuses on characterizing both users' audiovisual consumption and the content itself, specifically images and video. This dual characterization converges into a hybrid recommendation algorithm adapted to different application scenarios, each with its own specificities and constraints. Hybrid recommendation systems use both content and user information as input data. The knowledge obtained from analyzing these data is the initial input to the algorithms that generate personalized recommendations. Regarding user information, this doctoral thesis focuses on the analysis of audiovisual consumption to infer implicitly acquired preferences. The inference process is based on a new probabilistic model proposed in the thesis. On the one hand, the model takes into account qualitative and quantitative consumption factors. On the other, it considers external factors that condition the uncertainty of the inference, such as the zapping factor or the company factor. As for content information, this research focuses on modeling aesthetic and morphological descriptors that influence the user and are therefore useful for recommendation. Extracting these descriptors automatically from the audiovisual piece without excessive computational cost has also been a priority, in order to ensure applicability to different real scenarios. Finally, a new content-based recommendation algorithm has been created from the previously acquired information, i.e. user preferences and content descriptors. This algorithm has been hybridized with a reference collaborative filtering algorithm from the current state of the art. The efficiency of this hybrid recommender is then compared with that of each individual recommendation technique, and the suitability of the different state-of-the-art hybridization techniques has also been studied.
The content-based recommendation algorithm focuses on the influence of aesthetic characteristics on users. Because the users of such systems are heterogeneous, the criteria and attributes that shape each individual's preferences differ, and effective recommendations must take this into account. The proposed algorithm therefore adapts to different perceptions, producing a dynamic representation of preferences that yields personalized recommendations for each user of the system. The hypotheses of this doctoral thesis have been validated through tests with real users or with databases of user preferences available to the scientific community. The structure of the thesis follows the different research and validation methodologies of the techniques involved. Each of the three central chapters includes its own study of the state of the art, and the algorithms and models developed are validated through independent tests. In some cases these tests are incremental and confirm the validation of previously presented techniques.
Abstract:
In this paper we introduce the idea of using a reliability measure associated with the predictions made by recommender systems based on collaborative filtering. This reliability measure is based on the usual notion that the more reliable a prediction is, the less likely it is to be wrong. We define a general reliability measure suitable for any arbitrary recommender system. We also show a method for obtaining specific reliability measures tailored to the needs of different recommender systems.
Abstract:
One of the advantages of social networks is the possibility of socializing and personalizing the content created or shared by users. In mobile social networks, where devices have limited screen size and computing power, Multimedia Recommender Systems help to present the most relevant content to users according to their tastes, relationships and profile. Previous recommender systems cannot cope with the uncertainty of automated tagging and depend on the knowledge domain. In addition, a recommender instantiated in this domain must cope with problems arising from the inherent nature of collaborative filtering (cold start, the banana problem, the large number of users to handle, etc.). The solution presented in this paper addresses these problems by proposing a hybrid image recommender system that combines collaborative filtering (social techniques) with content-based techniques, leaving users free to assign their own weight to each of these processes. It takes into account the aesthetic and formal characteristics of the images to overcome the problems of current techniques. In this way it improves on the performance of existing systems, yielding a mobile social network recommender with a high degree of adaptation to any kind of user.
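The abstract describes a hybrid recommender in which each user chooses how much weight to give collaborative and content-based techniques. A minimal sketch of one standard way to do this, weighted hybridization, follows. The final score is a convex combination of a collaborative score and a content-based score. The input score dictionaries, the parameter name `alpha` and the rule that a missing score counts as 0 are assumptions. The paper's aesthetic descriptors and its actual combination rule are not reproduced.

```python
from typing import Dict, List


def hybrid_ranking(collaborative: Dict[str, float], content_based: Dict[str, float],
                   alpha: float, n: int) -> List[str]:
    """Rank items by alpha * collaborative + (1 - alpha) * content-based score.

    alpha in [0, 1] is the user's personal weight for the collaborative part.
    An item missing from one source scores 0 for that source, so an image with no
    votes yet can still be ranked on its content-based score alone.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    items = set(collaborative) | set(content_based)
    score = {i: alpha * collaborative.get(i, 0.0) + (1 - alpha) * content_based.get(i, 0.0)
             for i in items}
    # Highest score first; ties broken alphabetically for a deterministic order.
    return sorted(items, key=lambda i: (-score[i], i))[:n]


cf = {"img1": 0.9, "img2": 0.2}
cb = {"img2": 0.8, "img3": 0.6}
print(hybrid_ranking(cf, cb, alpha=0.5, n=2))
```

With `alpha=1.0` the ranking is purely collaborative, and with `alpha=0.0` it is purely content-based. Intermediate values let each user move between the two, as the abstract describes.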
Abstract:
The importance of recommender systems has grown exponentially with the rise of social networks. In this PhD thesis I will provide a broad overview of the state of the art of recommender systems. They were initially based on demographic, content-based and collaborative filtering. Currently, these systems incorporate some social information into the recommendation process. In the future, they will use implicit, local and personal information from the Internet of Things. As we will see here, recommender systems based on collaborative filtering can be modified to make recommendations to groups of users. Previous works have made this modification at different stages of the collaborative filtering algorithm: establishing the neighborhood, predicting the votes and choosing the recommended items. In this PhD thesis I will provide a new method that carries out the unification process (from many users to one group) in the first stage of the collaborative filtering algorithm: similarity metric computation. I will provide a full formalization of the proposed method. I will explain how to obtain the k nearest neighbors of the group of users and show how to get recommendations using those neighbors. I will also include a running example of a recommender system with 8 users and 10 items, detailing every step of the proposed method. The main highlights of the proposed method are: (a) it is faster (more efficient) than the alternatives provided by other authors, and (b) it is at least as precise and accurate as the other solutions studied. To test this hypothesis I will conduct several experiments measuring the accuracy, precision and performance of the method, and compare the results with those of other group recommendation methods. The experiments will be carried out using the MovieLens and Netflix datasets.
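The abstract places the unification of many users into one group at the similarity computation stage, but does not give the formula. The sketch below assumes one simple unification rule purely for illustration: the similarity between the group and a candidate neighbor is the average of each member's similarity to that candidate. The group's k nearest neighbors are then selected, and items unseen by every member are ranked by the neighbors' mean vote. The user-user similarity, the function names and the data are all hypothetical.

```python
from typing import Callable, Dict, List

Ratings = Dict[str, Dict[str, float]]  # user -> {item: vote}


def simple_similarity(ru: Dict[str, float], rv: Dict[str, float]) -> float:
    """Hypothetical user-user similarity: 1 / (1 + mean absolute difference on co-rated items)."""
    common = [i for i in ru if i in rv]
    if not common:
        return 0.0
    return 1.0 / (1.0 + sum(abs(ru[i] - rv[i]) for i in common) / len(common))


def group_similarity(ratings: Ratings, group: List[str], candidate: str,
                     sim: Callable = simple_similarity) -> float:
    """Assumed unification rule: average the similarity of every group member to the candidate."""
    return sum(sim(ratings[m], ratings[candidate]) for m in group) / len(group)


def recommend_to_group(ratings: Ratings, group: List[str], k: int, n: int) -> List[str]:
    """Pick the group's k nearest neighbors, then rank unseen items by the neighbors' mean vote."""
    candidates = [v for v in ratings if v not in group]
    neighbors = sorted(candidates, key=lambda v: -group_similarity(ratings, group, v))[:k]
    seen = {i for m in group for i in ratings[m]}  # items already rated by any member
    votes: Dict[str, List[float]] = {}
    for v in neighbors:
        for item, vote in ratings[v].items():
            if item not in seen:
                votes.setdefault(item, []).append(vote)
    mean_vote = {i: sum(vs) / len(vs) for i, vs in votes.items()}
    return sorted(mean_vote, key=lambda i: (-mean_vote[i], i))[:n]


ratings = {"g1": {"a": 5, "b": 1},
           "g2": {"a": 4, "b": 2},
           "n1": {"a": 5, "b": 1, "c": 5, "d": 2},
           "n2": {"a": 1, "b": 5, "c": 1, "d": 5}}
print(recommend_to_group(ratings, ["g1", "g2"], k=1, n=2))
```

Because the group is reduced to a single similarity score per candidate, the rest of the pipeline (neighbor selection, prediction, recommendation) runs once for the group rather than once per member. This is the efficiency argument the abstract makes, although this sketch does not reproduce the thesis's measurements.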
Abstract:
Idea Management Systems are web applications that implement the notion of open innovation through crowdsourcing. Typically, organizations use these kinds of systems to connect with large communities in order to gather ideas for improving products or services. Originating from simple suggestion boxes, Idea Management Systems have advanced beyond collecting ideas. They now aspire to be knowledge management solutions capable of selecting the best ideas through both collaborative and expert assessment methods. In practice, however, contemporary systems still face a number of problems, usually related to information overload and to identifying submissions of questionable quality with a reasonable allocation of time and effort. This thesis focuses on the problem of idea assessment and contributes a number of solutions for filtering, comparing and evaluating ideas submitted to an Idea Management System. With respect to Idea Management System interoperability, the thesis proposes a theoretical model of the Idea Life Cycle and formalizes it as the Gi2MO ontology. This ontology makes it possible to go beyond the boundaries of a single system and compare and assess innovation in an organization-wide or market-wide context. Furthermore, based on the ontology, the thesis builds a number of solutions for improving idea assessment through community opinion analysis (MARL), annotation of idea characteristics (Gi2MO Types) and the study of idea relationships (Gi2MO Links). The main achievements of the thesis are:
(1) the application of theoretical innovation models to the practice of Idea Management, which successfully recognizes the differences between communities;
(2) opinion metrics and their recognition as a new tool for idea assessment; and
(3) the discovery of new types of relationships between ideas and of their impact on idea clustering.
Finally, the thesis has led to the establishment of the Gi2MO Project, which serves as an incubator for Idea Management solutions and for mature open-source software alternatives to the widely available commercial suites.
From an academic point of view, the project provides resources for undertaking experiments in the Idea Management Systems area and has become a forum that brings together a number of academic and industrial partners.
Abstract:
In ubiquitous data stream mining applications, different devices often aim to learn concepts that are similar to some extent. In applications such as spam filtering or news recommendation, the concept underlying the data stream (e.g., what counts as interesting mail or news) is likely to change over time. Therefore, the resulting model must be continuously adapted to such changes. This paper presents a novel Collaborative Data Stream Mining (Coll-Stream) approach that exploits the similarities in the knowledge available from other devices to improve local classification accuracy. Coll-Stream integrates the community knowledge using an ensemble method in which classifiers are selected and weighted according to their local accuracy in different partitions of the feature space. We evaluate the classification accuracy of Coll-Stream under varying conditions of concept drift, noise, partition granularity, and similarity between the other devices' concepts and the local underlying concept. The experimental results show that the model produced by Coll-Stream achieves stability and accuracy in a variety of situations, on both synthetic and real-world datasets.
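A minimal sketch of the ensemble idea described above follows. The feature space is split into partitions, and each classifier's accuracy is tracked per partition on the local stream. A prediction is a vote in which each selected classifier's weight is its local accuracy in the partition of the incoming example, and classifiers below a threshold are not selected. Several details are assumptions, not the Coll-Stream implementation: the partitioning (a fixed-width grid on the first feature), the threshold, the neutral 0.5 prior and the fading factor used to forget old examples under concept drift. All names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Sequence

Classifier = Callable[[Sequence[float]], str]  # features -> predicted label


class LocalAccuracyEnsemble:
    """Sketch: weight each classifier by its (faded) accuracy in the partition of the example."""

    def __init__(self, classifiers: Dict[str, Classifier], cell_width: float,
                 fading: float = 0.99, min_accuracy: float = 0.5):
        self.classifiers = classifiers
        self.cell_width = cell_width      # assumed partitioning: fixed grid on feature 0
        self.fading = fading              # < 1 gradually forgets old examples (concept drift)
        self.min_accuracy = min_accuracy  # classifiers below this are not selected
        # (partition, classifier name) -> [faded correct count, faded seen count]
        self.stats = defaultdict(lambda: [0.0, 0.0])

    def _partition(self, x: Sequence[float]) -> int:
        return int(x[0] // self.cell_width)

    def _accuracy(self, cell: int, name: str) -> float:
        correct, seen = self.stats[(cell, name)]
        return correct / seen if seen > 0 else 0.5  # neutral prior for partitions not seen yet

    def predict(self, x: Sequence[float]) -> str:
        cell = self._partition(x)
        votes: Dict[str, float] = defaultdict(float)
        for name, clf in self.classifiers.items():
            acc = self._accuracy(cell, name)
            if acc >= self.min_accuracy:
                votes[clf(x)] += acc  # weighted vote by local accuracy
        if not votes:  # no classifier selected: fall back to the locally best one
            best = max(self.classifiers, key=lambda n: self._accuracy(cell, n))
            return self.classifiers[best](x)
        return max(votes, key=votes.get)

    def update(self, x: Sequence[float], label: str) -> None:
        """Once the true label is known, update every classifier's accuracy in x's partition."""
        cell = self._partition(x)
        for name, clf in self.classifiers.items():
            s = self.stats[(cell, name)]
            s[0] = self.fading * s[0] + (1.0 if clf(x) == label else 0.0)
            s[1] = self.fading * s[1] + 1.0


ens = LocalAccuracyEnsemble(
    {"a": lambda x: "spam" if x[0] < 1.0 else "ham", "b": lambda x: "spam"},
    cell_width=1.0)
for _ in range(3):
    ens.update([0.5], "spam")
    ens.update([1.5], "ham")
print(ens.predict([0.5]), ens.predict([1.5]))
```

In the usage lines, classifier `b` (always "spam") is accurate in the first partition but wrong in the second. It is therefore deselected only where it performs poorly. This per-partition selection is what lets knowledge from a device with a partly different concept still be useful locally.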