781 results for big data storage
Abstract:
In this paper, we explore the use of Twitter as a political tool in the 2013 Australian Federal Election. We employ a ‘big data’ approach that combines qualitative and quantitative methods of analysis. By tracking the accounts of politicians and parties, and the tweeting activity to and around these accounts, as well as conversations on particular hashtagged topics, we gain a comprehensive insight into the ways in which Twitter is employed in the campaigning strategies of different parties. We compare and contrast the use of Twitter by political actors with its adoption by citizens as a tool for political conversation and participation. Our study provides an important longitudinal counterpoint, and opportunity for comparison, to the use of Twitter in previous Australian federal and state elections. Furthermore, we offer innovative methodologies for data gathering and evaluation that can contribute to the comparative study of the political uses of Twitter across diverse national media and political systems.
Abstract:
The placement of the mappers and reducers on the machines directly affects the performance and cost of the MapReduce computation in cloud computing. From the computational point of view, the mappers/reducers placement problem is a generalization of the classical bin packing problem, which is NP-complete. Thus, in this paper we propose a new heuristic algorithm for the mappers/reducers placement problem in cloud computing and evaluate it against several other heuristics on solution quality and computation time by solving a set of test problems with various characteristics. The computational results show that our heuristic algorithm is much more efficient than the other heuristics. We also verify the effectiveness of our heuristic algorithm by comparing the mapper/reducer placement it generates for a benchmark problem with a conventional mapper/reducer placement. The comparison results show that the computation using our mapper/reducer placement is much cheaper while still satisfying the computation deadline.
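To make the bin packing connection concrete, the following minimal sketch applies the classical first-fit decreasing heuristic to a simplified placement problem. It is not the heuristic proposed in the paper: the single resource dimension, the identical machine capacity, and the function name `first_fit_decreasing` are all assumptions made for illustration.

```python
from typing import List


def first_fit_decreasing(demands: List[int], capacity: int) -> List[List[int]]:
    """Place task demands on machines using first-fit decreasing.

    Each inner list holds the demands assigned to one machine.
    Assumes one resource dimension and identical machines (illustrative only).
    """
    if any(d > capacity for d in demands):
        raise ValueError("a task demand exceeds the machine capacity")

    machines: List[List[int]] = []  # tasks placed on each machine
    free: List[int] = []            # remaining capacity of each machine

    # Largest tasks first, each into the first machine that still has room.
    for demand in sorted(demands, reverse=True):
        for i, room in enumerate(free):
            if demand <= room:
                machines[i].append(demand)
                free[i] -= demand
                break
        else:
            # No existing machine fits: rent a new one.
            machines.append([demand])
            free.append(capacity - demand)
    return machines


placement = first_fit_decreasing([4, 8, 1, 4, 2, 1], capacity=10)
print(len(placement), placement)
```

The real problem is harder than this sketch suggests, since the abstract states that placement must also respect cost and a computation deadline.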
Abstract:
We identify relation completion (RC) as one recurring problem that is central to the success of novel big data applications such as Entity Reconstruction and Data Enrichment. Given a semantic relation, RC attempts to link entity pairs between two entity lists under the relation. To accomplish the RC goals, we propose to formulate search queries for each query entity α based on some auxiliary information, so as to detect its target entity β from the set of retrieved documents. For instance, a pattern-based method (PaRE) uses extracted patterns as the auxiliary information in formulating search queries. However, high-quality patterns may decrease the probability of finding suitable target entities. As an alternative, we propose the CoRE method, which uses context terms learned from the text surrounding the expression of a relation as the auxiliary information in formulating queries. The experimental results based on several real-world web data collections demonstrate that CoRE reaches a much higher accuracy than PaRE for the purpose of RC.
The Arab Spring and its social media audiences : English and Arabic Twitter users and their networks
Abstract:
2011 ‘Arab Spring’ are likely to overstate the impact of Facebook and Twitter on these uprisings, it is nonetheless true that protests and unrest in countries from Tunisia to Syria generated a substantial amount of social media activity. On Twitter alone, several millions of tweets containing the hashtags #libya or #egypt were generated during 2011, both by directly affected citizens of these countries, and by onlookers from further afield. What remains unclear, though, is the extent to which there was any direct interaction between these two groups (especially considering potential language barriers between them). Building on hashtag datasets gathered between January and November 2011, this paper compares patterns of Twitter usage during the popular revolution in Egypt and the civil war in Libya. Using custom-made tools for processing ‘big data’, we examine the volume of tweets sent by English-, Arabic-, and mixed-language Twitter users over time, and examine the networks of interaction (variously through @replying, retweeting, or both) between these groups as they developed and shifted over the course of these uprisings. Examining @reply and retweet traffic, we identify general patterns of information flow between the English- and Arabic-speaking sides of the Twittersphere, and highlight the roles played by users bridging both language spheres.
Abstract:
This tutorial primarily focuses on the technical challenges surrounding the design and implementation of Accountable-eHealth (AeH) systems. The potential benefits of shared eHealth records systems are promising for the future of improved healthcare; however, their uptake is hindered by concerns over the privacy and security of patient information. In the current eHealth environment, the requirements of healthcare consumers (i.e. patients) compete with those of healthcare professionals (HCPs). While consumers want control over their information, healthcare professionals want access to as much information as required in order to make well-informed decisions. This conflict is evident in the review of Australia's PCEHR system. Accountable-eHealth systems aim to balance these concerns by implementing Information Accountability (IA) mechanisms. AeH systems create an eHealth environment where health information is available to the right person at the right time without rigid barriers, whilst empowering consumers with information control and transparency, thus enabling the creation of shared eHealth records that can be useful to both patients and HCPs. In this half-day tutorial, we will discuss and describe the technical challenges surrounding the implementation of AeH systems and the solutions we have devised. A prototype AeH system will be used to demonstrate the functionality of AeH systems, and illustrate some of the proposed solutions. The topics that will be covered include: designing for usability in AeH systems, the privacy and security of audit mechanisms, providing for diversity of users, the scalability of AeH systems, and finally the challenges of enabling research and Big Data Analytics on shared eHealth Records while ensuring accountability and privacy are maintained.
Abstract:
Citizen science projects have demonstrated the advantages of people with limited relevant prior knowledge participating in research. However, there is a difference between engaging the general public in a scientific project and entering an established expert community to conduct research. This paper describes our ongoing acoustic biodiversity monitoring collaborations with the bird watching community. We report on findings gathered over six years from participation in bird walks, observing conservation efforts, and records of personal activities of experienced birders. We offer an empirical study into extending existing protocols through in-context collaborative design involving scientists and domain experts.
Abstract:
A new online method is presented for estimation of the angular random walk and rate random walk coefficients of inertial measurement unit gyros and accelerometers. In the online method, a state-space model is proposed, and recursive parameter estimators are proposed for quantities previously measured from offline data techniques such as the Allan variance method. The Allan variance method has large offline computational effort and data storage requirements. The technique proposed here requires no data storage and a computational effort of approximately 100 calculations per data sample.
Abstract:
A new online method is presented for estimation of the angular random walk and rate random walk coefficients of IMU (inertial measurement unit) gyros and accelerometers. The online method proposes a state-space model and parameter estimators for quantities previously measured from off-line data techniques such as the Allan variance graph. Allan variance graphs demand large off-line computational effort and data storage. The technique proposed here requires no data storage and a computational effort of O(100) calculations per data sample.
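For context, the offline baseline that the abstract contrasts against can be sketched as follows. This is the standard non-overlapping Allan variance for a single cluster size, not the online estimator proposed in the paper; the function name `allan_variance` and the toy input are assumptions for illustration. Note that it needs the whole recorded signal in memory, which is the storage cost the abstract refers to.

```python
from typing import List, Sequence


def allan_variance(rate: Sequence[float], m: int) -> float:
    """Non-overlapping Allan variance of a rate signal for cluster size m.

    The averaging time is tau = m * (sample period). An Allan variance
    graph repeats this calculation over many values of m.
    """
    if m < 1:
        raise ValueError("cluster size m must be at least 1")
    n_clusters = len(rate) // m
    if n_clusters < 2:
        raise ValueError("need at least two full clusters")

    # Average the signal over consecutive, non-overlapping clusters.
    means: List[float] = [
        sum(rate[k * m:(k + 1) * m]) / m for k in range(n_clusters)
    ]
    # Half the mean squared difference between adjacent cluster averages.
    sq_diffs = [(means[k + 1] - means[k]) ** 2 for k in range(n_clusters - 1)]
    return 0.5 * sum(sq_diffs) / len(sq_diffs)


samples = [1.0, -1.0] * 4  # toy alternating signal, illustrative only
print(allan_variance(samples, m=1), allan_variance(samples, m=2))
```

On a log-log plot of Allan deviation against averaging time, angular random walk and rate random walk are conventionally read from the regions with slopes of -1/2 and +1/2 respectively.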
Abstract:
Clustering is an important technique in organising and categorising web scale documents. The main challenges faced in clustering the billions of documents available on the web are the processing power required and the sheer size of the datasets available. More importantly, it is nigh impossible to generate the labels for a general web document collection containing billions of documents and a vast taxonomy of topics. However, document clusters are most commonly evaluated by comparison to a ground truth set of labels for documents. This paper presents a clustering and labeling solution where Wikipedia is clustered and hundreds of millions of web documents in ClueWeb12 are mapped on to those clusters. This solution is based on the assumption that Wikipedia contains such a wide range of diverse topics that it represents a small scale web. We found that it was possible to perform the web scale document clustering and labeling process on one desktop computer in under a couple of days for the Wikipedia clustering solution containing about 1000 clusters. It takes longer to execute a solution with finer granularity clusters such as 10,000 or 50,000. These results were evaluated using a set of external data.
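The mapping step described above can be pictured with a small sketch. The abstract does not say how documents are mapped to clusters, so the nearest-centroid rule, the cosine similarity measure, the sparse term-weight representation, and all names below are assumptions for illustration rather than the authors' method.

```python
import math
from typing import Dict

Vector = Dict[str, float]  # sparse term -> weight mapping


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_to_cluster(doc: Vector, centroids: Dict[str, Vector]) -> str:
    """Return the label of the centroid most similar to the document."""
    return max(centroids, key=lambda label: cosine(doc, centroids[label]))


# Hypothetical centroids standing in for clusters learned from Wikipedia.
centroids = {
    "sports": {"football": 1.0, "goal": 1.0},
    "music": {"guitar": 1.0, "song": 1.0},
}
web_doc = {"goal": 2.0, "match": 1.0}
print(assign_to_cluster(web_doc, centroids))
```

Under a scheme like this, the cost of mapping grows with the number of clusters, which is consistent with the abstract's remark that finer-grained solutions take longer to execute.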
Abstract:
Building on hashtag datasets gathered since January 2011, this paper will compare patterns of Twitter usage during the popular revolution in Egypt and the civil war in Libya. Using custom-made tools for processing ‘big data’ (boyd & Crawford, 2011), we will examine the volume of tweets sent by English-, Arabic-, and mixed-language Twitter users over time, and examine the networks of interaction (variously through @replying, retweeting, or both) between these groups as they developed and shifted over the course of these uprisings. Examining @reply and retweet traffic, we will identify general patterns of information flow between the English- and Arabic-speaking sides of the Twittersphere, and highlight the roles played by key boundary riders connecting both language spheres. Further, we will examine the URLs shared in these hashtags by Twitter participants, to identify the most prominent overall information sources, examine differences in the information diet experienced by English- and Arabic-language users, and investigate whether there are any online sources whose URLs are transcending language boundaries more frequently than others.
Abstract:
Many organizations realize that increasing amounts of data (“Big Data”) need to be dealt with intelligently in order to compete with other organizations in terms of efficiency, speed and services. The goal is not to collect as much data as possible, but to turn event data into valuable insights that can be used to improve business processes. However, data-oriented analysis approaches fail to relate event data to process models. At the same time, large organizations are generating piles of process models that are disconnected from the real processes and information systems. In this chapter we propose to manage large collections of process models and event data in an integrated manner. Observed and modeled behavior need to be continuously compared and aligned. This results in a “liquid” business process model collection, i.e. a collection of process models that is in sync with the actual organizational behavior. The collection should self-adapt to evolving organizational behavior and incorporate relevant execution data (e.g. process performance and resource utilization) extracted from the logs, thereby allowing insightful reports to be produced from factual organizational data.
Abstract:
The Technology Acceptance Model (TAM) is a prominent framework that addresses the challenge of organisations to understand and promote the factors that lead to acceptance of new technologies. Nevertheless, our understanding of one of the model's key variables – social influence – remains limited. Drawing upon earlier studies that address the role of referent individuals in technology acceptance, this paper introduces the notion of ‘coalition’ as a social group that can affect the opinion of other members within an organisation. Our empirical study centres on an organisation that has recently decided to introduce Big Data into its formal operations. Through a unique empirical approach that analyses sentiments expressed by individuals about this technology on the organisation's online forum, we demonstrate the emergence of a central referent, and in turn the dynamics of a coalition that builds around this referent as the attitudes of individuals converge upon the Big Data issue. Our paper contributes to existing TAM frameworks by elaborating the social influence variable and providing a dynamic lens to the technology acceptance process. We concurrently offer a methodological tool for organisations to understand social dynamics that form around a newly introduced technology and accelerate its acceptance by employees.
Abstract:
Acoustic recordings play an increasingly important role in monitoring terrestrial environments. However, due to rapid advances in technology, ecologists are accumulating more audio than they can listen to. Our approach to this big-data challenge is to visualize the content of long-duration audio recordings by calculating acoustic indices. These are statistics which describe the temporal-spectral distribution of acoustic energy and reflect content of ecological interest. We combine spectral indices to produce false-color spectrogram images. These not only reveal acoustic content but also facilitate navigation. An additional analytic challenge is to find appropriate descriptors to summarize the content of 24-hour recordings, so that it becomes possible to monitor long-term changes in the acoustic environment at a single location and to compare the acoustic environments of different locations. We describe a 24-hour ‘acoustic-fingerprint’ which shows some preliminary promise.
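The idea of combining spectral indices into a false-color image can be sketched as below. The abstract does not name the indices or the scaling used, so the min-max normalisation, the one-index-per-color-channel mapping, and the names `normalise` and `false_color` are assumptions for illustration only.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # rows = frequency bins, columns = time steps
Pixel = Tuple[int, int, int]


def normalise(index: Matrix) -> Matrix:
    """Rescale one spectral index to the range 0..1 (min-max scaling)."""
    flat = [v for row in index for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant index carries no contrast; map it to zero.
        return [[0.0 for _ in row] for row in index]
    return [[(v - lo) / span for v in row] for row in index]


def false_color(red: Matrix, green: Matrix, blue: Matrix) -> List[List[Pixel]]:
    """Assign three same-shaped spectral indices to the R, G and B channels."""
    r, g, b = normalise(red), normalise(green), normalise(blue)
    return [
        [
            (round(255 * r[i][j]), round(255 * g[i][j]), round(255 * b[i][j]))
            for j in range(len(r[i]))
        ]
        for i in range(len(r))
    ]


# Three toy 2x2 index matrices (hypothetical values).
image = false_color([[0, 1], [2, 4]], [[1, 1], [1, 1]], [[4, 2], [1, 0]])
print(image[0][0], image[1][1])
```

Each pixel then summarizes one frequency bin over one time step, so a 24-hour recording reduces to a single image that can be scanned visually.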