19 results for twitter, conversation retrieval

at Universidad Politécnica de Madrid


Relevance: 80.00%

Abstract:

In our daily routines we constantly interact through electronic devices and telecommunication services such as the telephone, e-mail, bank transactions and online social networks. Without realizing it, we leave massive traces of our activity in service providers' databases. These new data sources are large enough to reveal patterns of human behavior at large scales. As a result, there has been an unprecedented explosion of data-driven, computational studies of social systems.
In this thesis we develop computational and mathematical methods to analyze social systems by combining human activity data with the theory of complex networks. Our goal is to characterize and understand the systems that emerge from social interactions in new technological spaces, such as the online social network Twitter and mobile phone networks. We analyze these systems by constructing complex networks and time series, and by studying their structure, functioning and evolution over time. We also investigate the nature of the observed patterns through the mechanisms that govern interactions among individuals, and we measure the impact of critical events on the system's behavior. To this end, we propose models that explain the global structures and the emergent dynamics of information flow in the system.
For the Twitter studies, we base our analysis on specific conversations, such as political protests, major events and electoral processes. From the messages in each conversation, we identify the participating users and build networks of interactions among them. Specifically, we build one network that represents who receives whose messages and another that represents who propagates whose messages. In general, we find that these structures have complex properties, such as explosive growth and scale-free degree distributions. Based on the topology of these networks, we identify three types of users who determine the information flow according to their activity and influence.
To measure users' influence on the conversations, we introduce a new measure called user efficiency. It is defined as the number of retransmissions obtained per message posted, and it measures the effect of individual effort on the collective reaction. We observe that the distribution of this property is ubiquitous across several Twitter conversations, regardless of their size or context. We therefore suggest that the relationship between individual efforts and collective reactions on Twitter is universal. To explain the factors that shape the efficiency distribution, we develop a computational model that simulates the propagation of messages on the Twitter social network, based on the independent cascade mechanism. The model lets us measure how the efficiency distribution is affected by both the topology of the underlying social network and the way users post messages. The results indicate that the emergence of a select group of highly efficient users depends on the heterogeneity of the underlying network rather than on individual behavior.
We also develop techniques to infer the degree of political polarization in social networks. We propose a methodology to estimate opinions in social networks and to measure the degree of polarization of the resulting opinions. We design a model to study the effect that the opinions of a small group of influential users, called the elite, have on the opinions of the majority of users. The model yields an opinion distribution, on which we measure the degree of polarization. We apply the methodology to message diffusion networks from a Twitter conversation in a politically polarized society. The results agree very well with offline and contextual data. With this study we show that the methodology can detect different degrees of polarization depending on the structure of the network.
Finally, we study human behavior from mobile phone data. On the one hand, we characterize the impact of natural disasters, such as floods, on collective behavior. We find that communication patterns change abruptly in the areas affected by the catastrophe, which shows that the impact on the region could be measured almost in real time and without deploying efforts on the ground. On the other hand, we study human activity and mobility patterns to characterize the interactions between regions of a developing country. We find that the networks of calls and human trajectories have community structures associated with regions and urban centers.
In summary, we show that complex social processes can be understood by analyzing human activity data with the theory of complex networks. Throughout the thesis, we demonstrate that social phenomena such as influence, political polarization and reactions to critical events are reflected in the structural and dynamical patterns of networks built from data on conversations in online social networks and mobile phone usage.
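The abstract above defines user efficiency as retransmissions obtained per message posted and mentions a diffusion model based on independent cascades. The following is only a minimal sketch of that idea, not the thesis's actual model: the function names, the `followers` adjacency dictionary, the single retweet probability `p`, and the toy graph are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def simulate_cascades(followers, posts_per_user, p, rng):
    """Independent-cascade sketch (hypothetical helper).

    followers[u] lists the users who receive u's messages. For every post,
    each newly active user gets one chance per follower to trigger a
    retweet, with the same probability p (an assumed simplification).
    Returns the number of retweets credited to each original author.
    """
    retweets = defaultdict(int)
    for author, n_posts in posts_per_user.items():
        for _ in range(n_posts):
            active = {author}      # users who have posted or retweeted this message
            frontier = [author]    # users activated in the previous step
            while frontier:
                next_frontier = []
                for user in frontier:
                    for follower in followers.get(user, ()):
                        if follower not in active and rng.random() < p:
                            active.add(follower)
                            retweets[author] += 1
                            next_frontier.append(follower)
                frontier = next_frontier
    return retweets


def user_efficiency(retweets, posts_per_user):
    """Retweets obtained per message posted, for users with at least one post."""
    return {
        user: retweets.get(user, 0) / n_posts
        for user, n_posts in posts_per_user.items()
        if n_posts > 0
    }


# Toy follower graph (made-up data): b and c follow a, d follows b and c.
followers = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
posts = {"a": 2, "d": 1}
rng = random.Random(42)
efficiency = user_efficiency(simulate_cascades(followers, posts, 0.5, rng), posts)
```

In this toy setup, a user with no followers, such as `d`, always has efficiency 0 however much they post, which loosely illustrates the abstract's point that efficiency depends on the underlying network rather than on individual activity alone.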

Relevance: 20.00%

Abstract:

The main goal of the bilingual and monolingual participation of the MIRACLE team in CLEF 2004 was to test the effect of combination approaches on information retrieval. The starting point was a set of basic components: stemming, transformation, filtering, generation of n-grams, weighting and relevance feedback. Some of these basic components were used in different combinations and order of application for document indexing and for query processing. A second order combination was also tested, mainly by averaging or selective combination of the documents retrieved by different approaches for a particular query.
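The second-order combination by averaging mentioned above can be illustrated with a small score-fusion sketch. The abstract does not say how scores were normalized or how missing documents were treated, so the min-max normalization, the zero score for documents absent from a run, and all names below are assumptions.

```python
def min_max_normalize(run):
    """Rescale one run's scores to [0, 1] so that runs are comparable."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}


def average_runs(runs):
    """Fuse several runs (doc id -> score) by averaging normalized scores.

    A document missing from a run contributes 0.0 for that run (assumed).
    Returns (doc id, fused score) pairs, best first, ties broken by doc id.
    """
    normalized = [min_max_normalize(run) for run in runs if run]
    docs = set().union(*normalized) if normalized else set()
    fused = {
        doc: sum(run.get(doc, 0.0) for run in normalized) / len(normalized)
        for doc in docs
    }
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Two made-up runs for the same query, e.g. from a stemming-based and an
# n-gram-based index.
run_a = {"d1": 10.0, "d2": 5.0, "d3": 0.0}
run_b = {"d2": 2.0, "d3": 1.0}
ranking = average_runs([run_a, run_b])
```

Here `d2` ranks first because both runs score it fairly high, whereas `d1` is retrieved by only one of them.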

Relevance: 20.00%

Abstract:

Specialized search engines such as PubMed, MedScape or Cochrane have dramatically increased the visibility of biomedical scientific results. These web-based tools allow physicians to access scientific papers instantly. However, this decisive improvement has not had a proportional impact on clinical practice due to the lack of advanced search methods. Even queries highly specific to a concrete pathology frequently retrieve too much information, with publications related to patients treated by the physician falling beyond the scope of the results examined. In this work we present a new method to improve scientific article search using patient information. Two pathologies have been used within the project to retrieve literature relevant to patient data and to integrate it with other sources. Promising results suggest the suitability of the approach, which highlights publications dealing with patient features and facilitates literature search for physicians.

Relevance: 20.00%

Abstract:

ImageCLEF is a pilot experiment run at CLEF 2003 for cross language image retrieval using textual captions related to image contents. In this paper, we describe the participation of the MIRACLE research team (Multilingual Information RetrievAl at CLEF), detailing the different experiments and discussing their preliminary results.

Relevance: 20.00%

Abstract:

This paper describes the first set of experiments defined by the MIRACLE (Multilingual Information RetrievAl for the CLEf campaign) research group for some of the cross language tasks defined by CLEF. These experiments combine different basic techniques, linguistic-oriented and statistic-oriented, to be applied to the indexing and retrieval processes.

Relevance: 20.00%

Abstract:

This paper describes the participation of DAEDALUS in the ImageCLEF 2011 Medical Retrieval task. We have focused on multimodal (or mixed) experiments that combine textual and visual retrieval. The main objective of our research has been to evaluate how the medical retrieval process is affected by an extended corpus annotated with the image type, associated with both the image itself and its textual description. For this purpose, an image classifier has been developed to tag each document with its class (1st level of the hierarchy: Radiology, Microscopy, Photograph, Graphic, Other) and subclass (2nd level: AN, CT, MR, etc.). For the text-based experiments, several runs using different semantic expansion techniques have been performed. For the visual retrieval, different runs are defined by the corpus used in the retrieval process and the strategy for obtaining the class and/or subclass. The best results are achieved in runs that make use of the image subclass based on the classification of the sample images. Although different multimodal strategies have been submitted, none of them has proven able to provide results at least comparable to those achieved by textual retrieval alone. We believe that we have been unable to find a metric for the assessment of the relevance of the results provided by the visual and textual processes.

Relevance: 20.00%

Abstract:

Social networks are highly relevant today: they not only take up much of people's daily time but also serve millions of companies for advertising, among other purposes. The social network phenomenon has thus acquired a business dimension. The release of the APIs of some social networks has enabled the development of applications of all kinds and with different goals, such as this project.
This project arose from Ericsson's interest in studying the Google+ API and in suggestions for adding value for telecommunications companies. It also complements the reference material available at Ericsson and two other projects on information retrieval from social networks, adding a number of options for the user in the application. To this end, we analyze and build an example of what can be obtained from social networks, mainly Twitter and Google+.
The project first carries out a theoretical study of the origins of social networks, their development and their current state, analyzing the main existing social networks and giving an overview of all of them. It also reviews the state of the art of a number of websites devoted to exploiting this information available on the Internet. Then, among all the social networks with available APIs, Google+ was chosen because it is a new social network still to be explored and improved, and Twitter was chosen for the range of options and data that can be obtained from it. The APIs of both were studied and, with the information obtained, a prototype application was built that gathers a number of useful functions based on the data of these social networks.
Finally, a simple interface was built through which users can access their account data as if they were on Twitter or Google+. In addition, with Twitter data users can run an advanced search with alerts, perform sentiment analysis, see their top retweets among their followers, and track and compare what is being said about two given topics. The project aims to provide a general picture of everything related to social networks, the applications available for working with them, the information offered by the Twitter and Google+ APIs, and an idea of what can be obtained from them.

Relevance: 20.00%

Abstract:

Twitter manual. Second of six modules of the course "Redes Sociales aplicadas al ámbito universitario" (Social Networks Applied to the University Setting), which explains how to use Twitter.

Relevance: 20.00%

Abstract:

Applications of Twitter to university teaching. Second of six modules of the course "Redes Sociales aplicadas al ámbito universitario" (Social Networks Applied to the University Setting), which explains, with examples, how Twitter can be used in university teaching.

Relevance: 20.00%

Abstract:

In the last several years, micro-blogging Online Social Networks (OSNs), such as Twitter, have taken the world by storm, now boasting over 100 million subscribers. As an unparalleled stage for an enormous audience, they offer fast and reliable centralized diffusion of pithy tweets to great multitudes of information-hungry and always-connected followers. At the same time, this information gathering and dissemination paradigm prompts some important privacy concerns about relationships between tweeters, followers and interests of the latter. In this paper, we assess privacy in today's Twitter-like OSNs and describe an architecture and a trial implementation of a privacy-preserving service called Hummingbird. It is essentially a variant of Twitter that protects tweet contents, hashtags and follower interests from the (potentially) prying eyes of the centralized server. We argue that, although inherently limited by Twitter's mission of scalable information-sharing, this degree of privacy is valuable. We demonstrate, via a working prototype, that Hummingbird's additional costs are tolerably low. We also sketch out some viable enhancements that might offer better privacy in the long term.

Relevance: 20.00%

Abstract:

Twitter lists organise Twitter users into multiple, often overlapping, sets. We believe that these lists capture some form of emergent semantics, which may be useful to characterise. In this paper we describe an approach for such characterisation, which consists of deriving semantic relations between lists and users by analyzing the co-occurrence of keywords in list names. We use the vector space model and Latent Dirichlet Allocation to obtain similar keywords according to co-occurrence patterns. These results are then compared to similarity measures relying on WordNet and to existing Linked Data sets. Results show that keyword co-occurrence based on the members of the lists produces more synonyms and correlates better with WordNet similarity measures.
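As a rough illustration of the vector space side of this approach, the sketch below represents each list-name keyword by the members of the lists whose names contain it and compares keywords by cosine similarity. The tokenization, the member-count weighting and the sample lists are assumptions; the paper's actual feature construction and its Latent Dirichlet Allocation step are not reproduced here.

```python
import math
from collections import Counter, defaultdict


def keyword_vectors(lists):
    """Map each list-name keyword to a Counter of member -> co-occurrence count.

    `lists` is an iterable of (list_name, members) pairs (hypothetical input).
    """
    vectors = defaultdict(Counter)
    for name, members in lists:
        for keyword in set(name.lower().split()):
            for member in members:
                vectors[keyword][member] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Made-up lists: two about technology sharing members, one unrelated.
sample_lists = [
    ("tech news", ["u1", "u2"]),
    ("technology", ["u1", "u2"]),
    ("cooking", ["u3"]),
]
vectors = keyword_vectors(sample_lists)
similar = cosine(vectors["tech"], vectors["technology"])
unrelated = cosine(vectors["tech"], vectors["cooking"])
```

Keywords whose lists share the same members, such as "tech" and "technology" in this toy data, get a similarity close to 1, while keywords with disjoint memberships get 0.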

Relevance: 20.00%

Abstract:

The emergence of cloud datacenters enhances the capability of online data storage. Since massive amounts of data are stored in datacenters, it is necessary to effectively locate and access the data of interest in such a distributed system. However, traditional search techniques only allow users to search images by exact-match keywords through a centralized index. These techniques cannot satisfy the requirements of content-based image retrieval (CBIR). In this paper, we propose a scalable image retrieval framework which can efficiently support content similarity search and semantic search in a distributed environment. Its key idea is to integrate image feature vectors into distributed hash tables (DHTs) by exploiting the property of locality sensitive hashing (LSH). Thus, images with similar content are most likely gathered into the same node without knowledge of any global information. For searching semantically close images, relevance feedback is adopted in our system to overcome the gap between low-level features and high-level features. We show that our approach yields a high recall rate with good load balance and requires only a small number of hops.
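The key idea of placing feature vectors by locality sensitive hashing can be sketched as follows. The abstract does not name the LSH family or how hash keys are mapped onto DHT nodes, so the random-hyperplane (sign) hash, the modulo mapping and every name below are assumptions rather than the paper's design.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes shared by all nodes (fixed seed, assumed)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_key(vector, hyperplanes):
    """Random-hyperplane LSH: one bit per hyperplane, set by the side the
    vector falls on. Vectors separated by a small angle tend to share bits."""
    key = 0
    for plane in hyperplanes:
        dot = sum(a * b for a, b in zip(vector, plane))
        key = (key << 1) | (1 if dot >= 0 else 0)
    return key


def node_for(vector, hyperplanes, n_nodes):
    """Map the LSH key to a node id; a real DHT would use its own key space."""
    return lsh_key(vector, hyperplanes) % n_nodes


# Made-up 4-dimensional image feature vector and a 16-node overlay.
planes = make_hyperplanes(dim=4, n_bits=8, seed=1)
feature = [0.3, -1.2, 0.8, 2.0]
node = node_for(feature, planes, n_nodes=16)
```

Because the key depends only on the vector and the shared hyperplanes, any node can compute where a query image belongs without global information, which is the property the abstract relies on.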

Relevance: 20.00%

Abstract:

In this work we study Twitter data to understand influence dynamics in social networks. We define user efficiency on Twitter, as the ratio between the emergent spreading process and the activity employed by the user. We characterize this property by means of a quantitative analysis of the structural and dynamical patterns emergent from human interactions, and show it to be universal across several Twitter conversations.

Relevance: 20.00%

Abstract:

This paper describes our participation in the SemEval-2014 sentiment analysis task, in both contextual and message polarity classification. Our idea was to compare two different techniques for sentiment analysis: first, a machine learning classifier built specifically for the task using the provided training corpus; second, a lexicon-based approach using natural language processing techniques, developed for a generic sentiment analysis task with no adaptation to the provided training corpus. Results, though far from the best runs, show that the generic model is more robust, as it achieves a more balanced evaluation for message polarity across the different test sets.

Relevance: 20.00%

Abstract:

Sentiment analysis has recently gained popularity in the financial domain thanks to its capability to predict the stock market based on the wisdom of the crowds. Nevertheless, current sentiment indicators are still silos that cannot be combined to get better insight into the mood of different communities. In this article we propose a Linked Data approach for modelling sentiment and emotions about financial entities. We aim at integrating sentiment information from different communities or providers and at complementing existing initiatives such as FIBO. The approach has been validated in the semantic annotation of tweets about several stocks in the Spanish stock market, including their sentiment information.