866 results for twitter, conversation retrieval
Abstract:
Ranking documents according to the Probability Ranking Principle has been theoretically shown to guarantee optimal retrieval effectiveness in tasks such as ad hoc document retrieval. This ranking strategy assumes independence among document relevance assessments. This assumption, however, often does not hold, for example in scenarios where redundancy in retrieved documents is of major concern, as is the case in the sub-topic retrieval task. In this chapter, we propose a new ranking strategy for sub-topic retrieval that builds upon the interdependent document relevance and topic-oriented models. With respect to the topic-oriented model, we investigate both static and dynamic clustering techniques, aiming to group topically similar documents. Evidence from clusters is then combined with information about document dependencies to form a new document ranking. We compare and contrast the proposed method against state-of-the-art approaches, such as Maximal Marginal Relevance, Portfolio Theory for Information Retrieval, and standard cluster-based diversification strategies. The empirical investigation is performed on the ImageCLEF 2009 Photo Retrieval collection, where images are assessed with respect to sub-topics of a more general query topic. The experimental results show that our approaches outperform the state-of-the-art strategies with respect to a number of diversity measures.
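For readers unfamiliar with Maximal Marginal Relevance, one of the baselines named in the abstract above, the following is a minimal sketch of the standard greedy formulation, not the chapter's proposed method. The function name `mmr_rank`, the trade-off parameter `lam`, and the dictionary-based inputs are illustrative assumptions.

```python
from typing import Dict, FrozenSet, List


def mmr_rank(relevance: Dict[str, float],
             similarity: Dict[FrozenSet[str], float],
             lam: float = 0.5,
             k: int = 10) -> List[str]:
    """Greedy MMR re-ranking (illustrative sketch).

    relevance:  document id -> relevance score for the query (assumed given)
    similarity: unordered pair of document ids -> similarity (missing pairs = 0)
    lam:        1.0 ranks by relevance only; lower values penalise redundancy
    """
    selected: List[str] = []
    candidates = set(relevance)

    def score(doc: str) -> float:
        # Redundancy is the highest similarity to any already-ranked document.
        max_sim = max(
            (similarity.get(frozenset((doc, s)), 0.0) for s in selected),
            default=0.0,
        )
        return lam * relevance[doc] - (1.0 - lam) * max_sim

    while candidates and len(selected) < k:
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    rel = {"a": 0.9, "b": 0.85, "c": 0.5}
    sim = {
        frozenset(("a", "b")): 0.95,  # "b" is a near-duplicate of "a"
        frozenset(("a", "c")): 0.1,
        frozenset(("b", "c")): 0.1,
    }
    print(mmr_rank(rel, sim, lam=0.5))
    print(mmr_rank(rel, sim, lam=1.0))
```

With `lam=1.0` the ranking reduces to ordering by relevance alone, which corresponds to the independence assumption the abstract attributes to the Probability Ranking Principle.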
Abstract:
The assumptions underlying the Probability Ranking Principle (PRP) have led to a number of alternative approaches that cater or compensate for the PRP’s limitations. All alternatives deviate from the PRP by incorporating dependencies. This results in a re-ranking that promotes or demotes documents depending upon their relationship with the documents that have been already ranked. In this paper, we compare and contrast the behaviour of state-of-the-art ranking strategies and principles. To do so, we tease out analytical relationships between the ranking approaches and we investigate the document kinematics to visualise the effects of the different approaches on document ranking.
Abstract:
Quantum-inspired models have recently attracted increasing attention in Information Retrieval. An intriguing characteristic of the mathematical framework of quantum theory is the presence of complex numbers. However, it is unclear what such numbers could or would actually represent or mean in Information Retrieval. The goal of this paper is to discuss the role of complex numbers within the context of Information Retrieval. First, we introduce how complex numbers are used in quantum probability theory. Then, we examine van Rijsbergen’s proposal of evoking complex-valued representations of information objects. We empirically show that such a representation is unlikely to be effective in practice (confuting its usefulness in Information Retrieval). We then explore alternative proposals which may be more successful at realising the power of complex numbers.
Creation of a new evaluation benchmark for information retrieval targeting patient information needs
Abstract:
Searching for health advice on the web is becoming increasingly common. Because of the great importance of this activity for patients and clinicians and the effect that incorrect information may have on health outcomes, it is critical to present relevant and valuable information to a searcher. Previous evaluation campaigns on health information retrieval (IR) have provided benchmarks that have been widely used to improve health IR and record these improvements. However, in general these benchmarks have targeted the specialised information needs of physicians and other healthcare workers. In this paper, we describe the development of a new collection for evaluation of effectiveness in IR seeking to satisfy the health information needs of patients. Our methodology features a novel way to create statements of patients’ information needs using realistic short queries associated with patient discharge summaries, which provide details of patient disorders. We adopt a scenario where the patient then creates a query to seek information relating to these disorders. Thus, discharge summaries provide us with a means to create contextually driven search statements, since they may include details on the stage of the disease, family history, etc. The collection will be used for the first time as part of the ShARe/CLEF 2013 eHealth Evaluation Lab, which focuses on natural language processing and IR for clinical care.
Abstract:
Complex numbers are a fundamental aspect of the mathematical formalism of quantum physics. Quantum-like models developed outside physics often overlooked the role of complex numbers. Specifically, previous models in Information Retrieval (IR) ignored complex numbers. We argue that to advance the use of quantum models of IR, one has to lift the constraint of real-valued representations of the information space, and package more information within the representation by means of complex numbers. As a first attempt, we propose a complex-valued representation for IR, which explicitly uses complex valued Hilbert spaces, and thus where terms, documents and queries are represented as complex-valued vectors. The proposal consists of integrating distributional semantics evidence within the real component of a term vector; whereas, ontological information is encoded in the imaginary component. Our proposal has the merit of lifting the role of complex numbers from a computational byproduct of the model to the very mathematical texture that unifies different levels of semantic information. An empirical instantiation of our proposal is tested in the TREC Medical Record task of retrieving cohorts for clinical studies.
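As a rough illustration of the representation described in the abstract above, the sketch below stores distributional evidence in the real part and ontological evidence in the imaginary part of each vector component. Everything beyond that split is an assumption made for illustration: the helper names, the toy inputs, and in particular the use of a normalised Hermitian inner product as the matching score, which the abstract does not specify.

```python
import math
from typing import List, Sequence


def complex_term_vector(distributional: Sequence[float],
                        ontological: Sequence[float]) -> List[complex]:
    """Combine two real-valued feature vectors into one complex-valued vector.

    Real part: distributional-semantics evidence.
    Imaginary part: ontological evidence.
    """
    if len(distributional) != len(ontological):
        raise ValueError("both feature vectors must have the same length")
    return [complex(d, o) for d, o in zip(distributional, ontological)]


def hermitian_inner(u: Sequence[complex], v: Sequence[complex]) -> complex:
    """Inner product on a complex vector space (conjugate-linear in u)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def similarity(u: Sequence[complex], v: Sequence[complex]) -> float:
    """Magnitude of the normalised inner product; an assumed scoring choice."""
    norm_u = math.sqrt(hermitian_inner(u, u).real)
    norm_v = math.sqrt(hermitian_inner(v, v).real)
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return abs(hermitian_inner(u, v)) / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical two-dimensional feature values for a query and a document.
    query = complex_term_vector([1.0, 0.0], [0.5, 2.0])
    doc = complex_term_vector([0.8, 0.1], [0.4, 1.5])
    print(similarity(query, doc))
```

Taking the magnitude of the inner product discards its phase; whether the phase should carry meaning is exactly the kind of design question the abstract raises, so this choice should be read as a placeholder.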
Abstract:
This paper presents the results of task 3 of the ShARe/CLEF eHealth Evaluation Lab 2013. This evaluation lab focuses on improving access to medical information on the web. The task objective was to investigate the effect of using additional information such as the discharge summaries and external resources such as medical ontologies on the IR effectiveness. The participants were allowed to submit up to seven runs, one mandatory run using no additional information or external resources, and three each using or not using discharge summaries.
Abstract:
Twitter is the focus of much research attention, both in traditional academic circles and in commercial market and media research, as analytics give increasing insight into the performance of the platform in areas as diverse as political communication, crisis management, television audiencing and other industries. While methods for tracking Twitter keywords and hashtags have developed apace and are well documented, the make-up of the Twitter user base and its evolution over time have been less understood to date. Recent research efforts have taken advantage of functionality provided by Twitter's Application Programming Interface to develop methodologies to extract information that allows us to understand the growth of Twitter, its geographic spread and the processes by which particular Twitter users have attracted followers. From politicians to sporting teams, and from YouTube personalities to reality television stars, this technique enables us to gain an understanding of what prompts users to follow others on Twitter. This article outlines how we came upon this approach, describes the method we adopted to produce accession graphs and discusses their use in Twitter research. It also addresses the wider ethical implications of social network analytics, particularly in the context of a detailed study of the Twitter user base.
Abstract:
In this paper, we provide an account-centric analysis of the tweeting activity of, and public response to, Pope Benedict XVI via the @pontifex Twitter account(s). We focus our investigation on the particular phase around Pope Benedict XVI’s resignation to generate insights into the use of Twitter in response to a celebrity crisis event. Through a combined qualitative and quantitative methodological approach we generate an overview of the follower-base and tweeting activity of the @pontifex account. We identify a very one-directional communication pattern (many @mentions by followers yet zero @replies from the papal account itself), which prompts us to enquire further into what the public resonance of the @pontifex account is. We also examine reactions to the resurrection of the papal Twitter account by Pope Benedict XVI’s successor. In this way, we provide a comprehensive analysis of the public response to the immediate events around the crisis event of Pope Benedict XVI’s resignation and its aftermath via the network of users involved in the @pontifex account.
Abstract:
The affective communication patterns of conversations on Twitter can provide insights into the culture of online communities. In this paper we apply a combined quantitative and qualitative approach to investigate the structural make-up and emotional content of tweeting activity around the hashtag #auspol (for Australian politics) in order to highlight the polarity and conservativism that characterise this highly active community of politically engaged individuals. We document the centralised structure of this particular community, which is based around a deeply committed core of contributors. Through in-depth content analysis of the tweets of participants to the online debate we explore the communicative tone, patterns of engagement and thematic drivers that shape the affective character of the community and their effect on its cohesiveness. In this way we provide a comprehensive account of the complex techno-social, linguistic and cultural factors involved in conversations that are shaped in the Twittersphere.
Abstract:
This paper shows how soccer clubs from Germany’s first division have started to use Twitter. Analysis is based on tweets from and to club accounts as well as on follower numbers, and specific clubs are selected for case studies. This approach reveals that Twitter mirrors the conflicts between professional sports and traditional fandom.
Abstract:
Social media have become crucial tools for political activists and protest movements, providing another channel for promoting messages and garnering support. Twitter, in particular, has been identified as a noteworthy medium for protests in countries including Iran and Egypt to receive global attention. The Occupy movement, originating with protests in, and the physical occupation of, Wall Street, and inspiring similar demonstrations in other U.S. cities and around the world, has been intrinsically linked with social media through location-specific hashtags: #ows for Occupy Wall Street, #occupysf for San Francisco, and so on. While the individual protests have a specific geographical focus (highlighted by the physical occupation of parks, buildings, and other urban areas), Twitter provides a means for these different movements to be linked and promoted through tweets containing multiple hashtags. It also serves as a channel for tactical communications during actions and as a space in which movement debates take place. This paper examines Twitter's use within the Occupy Oakland movement. We use a mixture of ethnographic research through interviews with activists and participant observation of the movements' activities, and a dataset of public tweets containing the #oo hashtag from early 2012. This research methodology allows us to develop a more accurate and nuanced understanding of how movement activists use Twitter by cross-checking trends in the online data with observations and activists' own reported use of Twitter. We also study the connections between a geographically focused movement such as Occupy Oakland and related, but physically distant, protests taking place concurrently in other cities. This study forms part of a wider research project, Mapping Movements, exploring the politics of place, investigating how social movements are composed and sustained, and the uses of online communication within these movements.
Abstract:
The analysis of content and metadata has long been the subject of most Twitter studies; however, such research only tells part of the story of the development of Twitter as a platform. In this work, we introduce a methodology to determine the growth patterns of individual users of the platform, a technique we refer to as follower accession, and through a number of case studies consider the factors which lead to follower growth, and the identification of non-authentic followers. Finally, we consider what such an approach tells us about the history of the platform itself, and the way in which changes to the new user signup process have impacted upon users.
Abstract:
Early works on Private Information Retrieval (PIR) focused on minimizing the necessary communication overhead. They seemed to achieve this goal but at the expense of query response time. To mitigate this weakness, protocols with secure coprocessors were introduced. They achieve optimal communication complexity and better online processing complexity. Unfortunately, all secure coprocessor-based PIR protocols require heavy periodical preprocessing. In this paper, we propose a new protocol, which is free from the periodical preprocessing while offering the optimal communication complexity and almost optimal online processing complexity. The proposed protocol is proven to be secure.
Abstract:
In the field of information retrieval (IR), researchers and practitioners are often faced with a demand for valid approaches to evaluate the performance of retrieval systems. The Cranfield experiment paradigm has been dominant for the in-vitro evaluation of IR systems. As an alternative to this paradigm, laboratory-based user studies have been widely used to evaluate interactive information retrieval (IIR) systems, and at the same time investigate users’ information searching behaviours. Major drawbacks of laboratory-based user studies for evaluating IIR systems include the high monetary and temporal costs involved in setting up and running those experiments, the lack of heterogeneity amongst the user population and the limited scale of the experiments, which usually involve a relatively restricted set of users. In this paper, we propose an alternative experimental methodology to laboratory-based user studies. Our novel experimental methodology uses a crowdsourcing platform as a means of engaging study participants. Through crowdsourcing, our experimental methodology can capture user interactions and searching behaviours at a lower cost, with more data, and within a shorter period than traditional laboratory-based user studies, and therefore can be used to assess the performance of IIR systems. In this article, we show the characteristic differences of our approach with respect to traditional IIR experimental and evaluation procedures. We also perform a use case study comparing crowdsourcing-based evaluation with laboratory-based evaluation of IIR systems, which can serve as a tutorial for setting up crowdsourcing-based IIR evaluations.
Abstract:
We consider the following problem: members in a dynamic group retrieve their encrypted data from an untrusted server based on keywords and without any loss of data confidentiality and members’ privacy. In this paper, we investigate common secure indices for conjunctive keyword-based retrieval over encrypted data, and construct an efficient scheme from Wang et al.’s dynamic accumulator, Nyberg’s combinatorial accumulator and Kiayias et al.’s public-key encryption system. The proposed scheme is trapdoorless and keyword-field free. The security is proved under the random oracle, decisional composite residuosity and extended strong RSA assumptions.