556 resultados para TWITTER
Resumo:
We introduce ReDites, a system for realtime event detection, tracking, monitoring and visualisation. It is designed to assist Information Analysts in understanding and exploring complex events as they unfold in the world. Events are automatically detected from the Twitter stream. Then those that are categorised as being security-relevant are tracked, geolocated, summarised and visualised for the end-user. Furthermore, the system tracks changes in emotions over events, signalling possible flashpoints or abatement. We demonstrate the capabilities of ReDites using an extended use case from the September 2013 Westgate shooting incident. Through an evaluation of system latencies, we also show that enriched events are made available for users to explore within seconds of that event occurring.
Resumo:
As microblog services such as Twitter become a fast and convenient communication approach, identification of trendy topics in microblog services has great academic and business value. However detecting trendy topics is very challenging due to huge number of users and short-text posts in microblog diffusion networks. In this paper we introduce a trendy topics detection system under computation and communication resource constraints. In stark contrast to retrieving and processing the whole microblog contents, we develop an idea of selecting a small set of microblog users and processing their posts to achieve an overall acceptable trendy topic coverage, without exceeding resource budget for detection. We formulate the selection operation of these subset users as mixed-integer optimization problems, and develop heuristic algorithms to compute their approximate solutions. The proposed system is evaluated with real-time test data retrieved from Sina Weibo, the dominant microblog service provider in China. It's shown that by monitoring 500 out of 1.6 million microblog users and tracking their microposts (about 15,000 daily) with our system, nearly 65% trendy topics can be detected, while on average 5 hours earlier before they appear in Sina Weibo official trends.
Resumo:
Learning user interests from online social networks helps to better understand user behaviors and provides useful guidance to design user-centric applications. Apart from analyzing users' online content, it is also important to consider users' social connections in the social Web. Graph regularization methods have been widely used in various text mining tasks, which can leverage the graph structure information extracted from data. Previously, graph regularization methods operate under the cluster assumption that nearby nodes are more similar and nodes on the same structure (typically referred to as a cluster or a manifold) are likely to be similar. We argue that learning user interests from complex, sparse, and dynamic social networks should be based on the link structure assumption under which node similarities are evaluated based on the local link structures instead of explicit links between two nodes. We propose a regularization framework based on the relation bipartite graph, which can be constructed from any type of relations. Using Twitter as our case study, we evaluate our proposed framework from social networks built from retweet relations. Both quantitative and qualitative experiments show that our proposed method outperforms a few competitive baselines in learning user interests over a set of predefined topics. It also gives superior results compared to the baselines on retweet prediction and topical authority identification. © 2014 ACM.
Resumo:
In this paper, we explore the idea of social role theory (SRT) and propose a novel regularized topic model which incorporates SRT into the generative process of social media content. We assume that a user can play multiple social roles, and each social role serves to fulfil different duties and is associated with a role-driven distribution over latent topics. In particular, we focus on social roles corresponding to the most common social activities on social networks. Our model is instantiated on microblogs, i.e., Twitter and community question-answering (cQA), i.e., Yahoo! Answers, where social roles on Twitter include "originators" and "propagators", and roles on cQA are "askers" and "answerers". Both explicit and implicit interactions between users are taken into account and modeled as regularization factors. To evaluate the performance of our proposed method, we have conducted extensive experiments on two Twitter datasets and two cQA datasets. Furthermore, we also consider multi-role modeling for scientific papers where an author's research expertise area is considered as a social role. A novel application of detecting users' research interests through topical keyword labeling based on the results of our multi-role model has been presented. The evaluation results have shown the feasibility and effectiveness of our model.
Resumo:
GraphChi is the first reported disk-based graph engine that can handle billion-scale graphs on a single PC efficiently. GraphChi is able to execute several advanced data mining, graph mining and machine learning algorithms on very large graphs. With the novel technique of parallel sliding windows (PSW) to load subgraph from disk to memory for vertices and edges updating, it can achieve data processing performance close to and even better than those of mainstream distributed graph engines. GraphChi mentioned that its memory is not effectively utilized with large dataset, which leads to suboptimal computation performances. In this paper we are motivated by the concepts of 'pin ' from TurboGraph and 'ghost' from GraphLab to propose a new memory utilization mode for GraphChi, which is called Part-in-memory mode, to improve the GraphChi algorithm performance. The main idea is to pin a fixed part of data inside the memory during the whole computing process. Part-in-memory mode is successfully implemented with only about 40 additional lines of code to the original GraphChi engine. Extensive experiments are performed with large real datasets (including Twitter graph with 1.4 billion edges). The preliminary results show that Part-in-memory mode memory management approach effectively reduces the GraphChi running time by up to 60% in PageRank algorithm. Interestingly it is found that a larger portion of data pinned in memory does not always lead to better performance in the case that the whole dataset cannot be fitted in memory. There exists an optimal portion of data which should be kept in the memory to achieve the best computational performance.
Resumo:
Short text messages a.k.a Microposts (e.g. Tweets) have proven to be an effective channel for revealing information about trends and events, ranging from those related to Disaster (e.g. hurricane Sandy) to those related to Violence (e.g. Egyptian revolution). Being informed about such events as they occur could be extremely important to authorities and emergency professionals by allowing such parties to immediately respond. In this work we study the problem of topic classification (TC) of Microposts, which aims to automatically classify short messages based on the subject(s) discussed in them. The accurate TC of Microposts however is a challenging task since the limited number of tokens in a post often implies a lack of sufficient contextual information. In order to provide contextual information to Microposts, we present and evaluate several graph structures surrounding concepts present in linked knowledge sources (KSs). Traditional TC techniques enrich the content of Microposts with features extracted only from the Microposts content. In contrast our approach relies on the generation of different weighted semantic meta-graphs extracted from linked KSs. We introduce a new semantic graph, called category meta-graph. This novel meta-graph provides a more fine grained categorisation of concepts providing a set of novel semantic features. Our findings show that such category meta-graph features effectively improve the performance of a topic classifier of Microposts. Furthermore our goal is also to understand which semantic feature contributes to the performance of a topic classifier. For this reason we propose an approach for automatic estimation of accuracy loss of a topic classifier on new, unseen Microposts. We introduce and evaluate novel topic similarity measures, which capture the similarity between the KS documents and Microposts at a conceptual level, considering the enriched representation of these documents. Extensive evaluation in the context of Emergency Response (ER) and Violence Detection (VD) revealed that our approach outperforms previous approaches using single KS without linked data and Twitter data only up to 31.4% in terms of F1 measure. Our main findings indicate that the new category graph contains useful information for TC and achieves comparable results to previously used semantic graphs. Furthermore our results also indicate that the accuracy of a topic classifier can be accurately predicted using the enhanced text representation, outperforming previous approaches considering content-based similarity measures. © 2014 Elsevier B.V. All rights reserved.
Resumo:
There has been a huge growth of social network in the recent years. This trend does not only allow us to get connected and share the information in an efficient way, but also reveals some potential beneficial in dealing with several social issues, such as earthquake detection, social spam detection, flu pandemic tracking, media monitoring, etc. In this paper, we propose a new way of utilizing social network. By implementing what is called a Virtual Celebrator Machine (VCM), we are able to let everyone who has connection with this machine in term of social networking be able to share their cultural experience and points of view about certain social events locally or globally. In that way, we provide a way to reinforce the relationship and connection between people virtually, which, we believe, would help to flourish cultural heritage preservation.
Resumo:
People manage a spectrum of identities in cyber domains. Profiling individuals and assigning them to distinct groups or classes have potential applications in targeted services, online fraud detection, extensive social sorting, and cyber-security. This paper presents the Uncertainty of Identity Toolset, a framework for the identification and profiling of users from their social media accounts and e-mail addresses. More specifically, in this paper we discuss the design and implementation of two tools of the framework. The Twitter Geographic Profiler tool builds a map of the ethno-cultural communities of a person's friends on Twitter social media service. The E-mail Address Profiler tool identifies the probable identities of individuals from their e-mail addresses and maps their geographical distribution across the UK. To this end, this paper presents a framework for profiling the digital traces of individuals.
Resumo:
Spamming has been a widespread problem for social networks. In recent years there is an increasing interest in the analysis of anti-spamming for microblogs, such as Twitter. In this paper we present a systematic research on the analysis of spamming in Sina Weibo platform, which is currently a dominant microblogging service provider in China. Our research objectives are to understand the specific spamming behaviors in Sina Weibo and find approaches to identify and block spammers in Sina Weibo based on spamming behavior classifiers. To start with the analysis of spamming behaviors we devise several effective methods to collect a large set of spammer samples, including uses of proactive honeypots and crawlers, keywords based searching and buying spammer samples directly from online merchants. We processed the database associated with these spammer samples and interestingly we found three representative spamming behaviors: Aggressive advertising, repeated duplicate reposting and aggressive following. We extract various features and compare the behaviors of spammers and legitimate users with regard to these features. It is found that spamming behaviors and normal behaviors have distinct characteristics. Based on these findings we design an automatic online spammer identification system. Through tests with real data it is demonstrated that the system can effectively detect the spamming behaviors and identify spammers in Sina Weibo.
Resumo:
Uncertainty text detection is important to many social-media-based applications since more and more users utilize social media platforms (e.g., Twitter, Facebook, etc.) as information source to produce or derive interpretations based on them. However, existing uncertainty cues are ineffective in social media context because of its specific characteristics. In this paper, we propose a variant of annotation scheme for uncertainty identification and construct the first uncertainty corpus based on tweets. We then conduct experiments on the generated tweets corpus to study the effectiveness of different types of features for uncertainty text identification. © 2013 Association for Computational Linguistics.
Resumo:
Digital Business Discourse offers a distinctively language- and discourse-centered approach to digitally mediated business and professional communication, providing a timely and comprehensive assessment of the current digital communication practices of today's organisations and workplaces. It is the first dedicated publication to address how computer-mediated communication technologies affect institutional discourse practices, bringing together scholarship from a range of disciplinary backgrounds, including organisational and management studies, rhetorical and communication studies, communication training and discourse analysis. Covering a wide spectrum of communication technologies, such as email, instant messaging, message boards, Twitter, corporate blogs and consumer reviews, the chapters gather research drawing on empirical data from real professional contexts. In this way, the book contributes to both academic scholarship and business communication training, enabling researchers, trainers and practitioners to deepen their understanding of the impact of new communication technologies on professional and corporate communication practices.
Resumo:
Digital Business Discourse offers a distinctively language- and discourse-centered approach to digitally mediated business and professional communication, providing a timely and comprehensive assessment of the current digital communication practices of today's organisations and workplaces. It is the first dedicated publication to address how computer-mediated communication technologies affect institutional discourse practices, bringing together scholarship from a range of disciplinary backgrounds, including organisational and management studies, rhetorical and communication studies, communication training and discourse analysis. Covering a wide spectrum of communication technologies, such as email, instant messaging, message boards, Twitter, corporate blogs and consumer reviews, the chapters gather research drawing on empirical data from real professional contexts. In this way, the book contributes to both academic scholarship and business communication training, enabling researchers, trainers and practitioners to deepen their understanding of the impact of new communication technologies on professional and corporate communication practices.
Resumo:
In recent years, the boundaries between e-commerce and social networking have become increasingly blurred. Many e-commerce websites support the mechanism of social login where users can sign on the websites using their social network identities such as their Facebook or Twitter accounts. Users can also post their newly purchased products on microblogs with links to the e-commerce product web pages. In this paper, we propose a novel solution for cross-site cold-start product recommendation, which aims to recommend products from e-commerce websites to users at social networking sites in 'cold-start' situations, a problem which has rarely been explored before. A major challenge is how to leverage knowledge extracted from social networking sites for cross-site cold-start product recommendation. We propose to use the linked users across social networking sites and e-commerce websites (users who have social networking accounts and have made purchases on e-commerce websites) as a bridge to map users' social networking features to another feature representation for product recommendation. In specific, we propose learning both users' and products' feature representations (called user embeddings and product embeddings, respectively) from data collected from e-commerce websites using recurrent neural networks and then apply a modified gradient boosting trees method to transform users' social networking features into user embeddings. We then develop a feature-based matrix factorization approach which can leverage the learnt user embeddings for cold-start product recommendation. Experimental results on a large dataset constructed from the largest Chinese microblogging service Sina Weibo and the largest Chinese B2C e-commerce website JingDong have shown the effectiveness of our proposed framework.
Resumo:
New media platforms have changed the media landscape forever, as they have altered our perceptions of the limits of communication, and reception of information. Platforms such as Facebook, Twitter and WhatsApp enable individuals to circumvent the traditional mass media, converging audience and producer to create millions of ‘citizen journalists’. This new breed of journalist uses these platforms as a way of, not only receiving news, but of instantaneously, and often spontaneously, expressing opinions and venting and sharing emotions, thoughts and feelings. They are liberated from cultural and physical restraints, such as time, space and location, and they are not constrained by factors that impact upon the traditional media, such as editorial control, owner or political bias or the pressures of generating commercial revenue. A consequence of the way in which these platforms have become ingrained within our social culture is that habits, conventions and social norms, that were once informal and transitory manifestations of social life, are now infused within their use. What were casual and ephemeral actions and/or acts of expression, such as conversing with friends or colleagues or swapping/displaying pictures, or exchanging thoughts that were once kept private, or maybe shared with a select few, have now become formalised and potentially permanent, on view for the world to see. Incidentally, ‘traditional’ journalists and media outlets are also utilising new media, as it allows them to react, and disseminate news, instantaneously, within a hyper-competitive marketplace. However, in a world where we are saturated, not only by citizen journalists, but by traditional media outlets, offering access to news and opinion twenty-four hours a day, via multiple new media platforms, there is increased pressure to ‘break’ news fast and first. This paper will argue that new media, and the culture and environment it has created, for citizen journalists, traditional journalists and the media generally, has altered our perceptions of the limits and boundaries of freedom of expression dramatically, and that the corollary to this seismic shift is the impact on the notion of privacy and private life. Consequently, this paper will examine what a reasonable expectation of privacy may now mean, in a new media world.
Resumo:
The focus of this paper is brand destruction, however in a slightly different sense than the traditional marketing literature depicts it. The concept of brand destruction basically tends to be discussed either (1) as an accidental, counter-productive event in a campaign which leads to the ruining of the brand, or (2) an intentional act by competitors in the market, which results the same breakdown mentioned above. As this paper shows, there are other ways to consider as well, when speaking about brand destruction. An often overlooked type of brand destruction is a rather new phenomenon: destroying the brand by customers or business partners. The adequate scene for this case is the internet itself, especially different social media platforms, e. g. Facebook, Twitter, Tumblr, Instagram, etc. Also popular weblogs can play an important role in brand destruction made by customers or business partners (general cases related to social media are depicted in Lipsman – Mud – Rich – Bruich, 2012). This paper presents a couple of cases in the online field and focuses basically on online communicative activities, in which a brand’s negative properties come to discussion. Both Hungarian and foreign examples are easy to find and they all demonstrate the growing power of consumers. This observation led marketing experts to start talking about the ‘smooth seizure of power by consumers’. Whilst the critic of this concept is considered to be relevant, this paper describes the elements and methods of the ‘seizure’ – from an online social point of view. The key of handling brand destruction cases efficiently lies in the role of social media users. They are not only consumers, but the opportunity for producing online contents is in their hands as well – this fact results in the idea of ‘prosumers’. Thus customers on social media platforms must be handled as a ‘critical mass’: as civic warriors with strong weapons in their armoury. No companies are allowed to feel safe, as the slightest error may well be punished by the crowd.