420 results for Big data, Spark, Hadoop
Abstract:
As technological capabilities for capturing, aggregating, and processing large quantities of data continue to improve, the question becomes how to effectively utilise these resources. Whenever automatic methods fail, it is necessary to rely on human background knowledge, intuition, and deliberation. This creates demand for data exploration interfaces that support the analytical process, allowing users to absorb and derive knowledge from data. Such interfaces have historically been designed for experts. However, existing research has shown promise in involving a broader range of users who act as citizen scientists, which places high demands on usability. Visualisation is one of the most effective analytical tools for humans to process abstract information. Our research focuses on the development of interfaces to support collaborative, community-led inquiry into data, which we refer to as Participatory Data Analytics. The development of data exploration interfaces to support independent investigations by local communities around topics of their interest presents a unique set of challenges, which we discuss in this paper. We present our preliminary work towards suitable high-level abstractions and interaction concepts to allow users to construct and tailor visualisations to their own needs.
Abstract:
This research studied distributed computing of all-to-all comparison problems with big data sets. The thesis formalised the problem and developed a high-performance and scalable computing framework with a programming model, data distribution strategies and task scheduling policies to solve the problem. The study considered storage usage, data locality and load balancing for performance improvement in solving the problem. The research outcomes can be applied in bioinformatics, biometrics, data mining and other domains in which all-to-all comparisons are a typical computing pattern.
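To make the computing pattern concrete, a minimal sketch of an all-to-all comparison follows; the items, the comparison function, and the use of Python's multiprocessing pool are illustrative assumptions, not the framework developed in the thesis.

```python
# Illustrative sketch of the all-to-all comparison pattern: every
# unordered pair of items is compared exactly once.
from itertools import combinations
from multiprocessing import Pool

def compare(pair):
    """Placeholder pairwise comparison, e.g. a sequence alignment score."""
    a, b = pair
    return (a, b, abs(len(a) - len(b)))  # stand-in similarity measure

if __name__ == "__main__":
    items = ["ACGT", "ACGGT", "TTGA", "ACG"]  # e.g. genome fragments
    with Pool() as pool:
        # n items yield n*(n-1)/2 comparison tasks
        results = pool.map(compare, combinations(items, 2))
    for a, b, score in results:
        print(a, b, score)
```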
Abstract:
Big Data and Learning Analytics’ promise to revolutionise educational institutions, endeavours, and actions through more and better data is now compelling. Multiple, and continually updating, data sets produce a new sense of ‘personalised learning’. A crucial attribute of the datafication, and subsequent profiling, of learner behaviour and engagement is the continual modification of the learning environment to induce greater levels of investment on the part of each learner. The assumption is that more and better data, gathered faster and fed into ever-updating algorithms, provide more complete tools to understand, and therefore improve, learning experiences through adaptive personalisation. The argument in this paper is that Learning Personalisation names a new logistics of investment as the common ‘sense’ of the school, in which disciplinary education is ‘both disappearing and giving way to frightful continual training, to continual monitoring’.
Abstract:
Solving large-scale all-to-all comparison problems using distributed computing is increasingly significant for various applications. Previous efforts to implement distributed all-to-all comparison frameworks have treated the two phases of data distribution and comparison task scheduling separately. This leads to high storage demands as well as poor data locality for the comparison tasks, thus creating a need to redistribute the data at runtime. Furthermore, most previous methods have been developed for homogeneous computing environments, so their overall performance is degraded even further when they are used in heterogeneous distributed systems. To tackle these challenges, this paper presents a data-aware task scheduling approach for solving all-to-all comparison problems in heterogeneous distributed systems. The approach formulates the requirements for data distribution and comparison task scheduling simultaneously as a constrained optimization problem. Then, metaheuristic data pre-scheduling and dynamic task scheduling strategies are developed along with an algorithmic implementation to solve the problem. The approach provides perfect data locality for all comparison tasks, avoiding rearrangement of data at runtime. It achieves load balancing among heterogeneous computing nodes, thus reducing the overall computation time. It also reduces data storage requirements across the network. The effectiveness of the approach is demonstrated through experimental studies.
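A toy sketch of the data-aware idea follows; the greedy rule, node speeds, and tie-breaking are invented for illustration, whereas the paper itself formulates data distribution and task scheduling together as a constrained optimization problem.

```python
# Toy greedy sketch of data-aware placement for all-to-all comparison
# tasks: prefer nodes that already hold both inputs (data locality),
# break ties by load normalised by node speed. Illustrative only.
from itertools import combinations

items = list(range(6))                      # data item ids 0..5
nodes = {"n1": 1.0, "n2": 2.0, "n3": 1.0}   # node -> relative speed
data = {n: set() for n in nodes}            # node -> items stored locally
load = {n: 0.0 for n in nodes}              # node -> assigned work

for i, j in combinations(items, 2):
    def cost(n):
        # extra copies needed first, then speed-normalised load
        missing = len({i, j} - data[n])
        return (missing, load[n] / nodes[n])
    best = min(nodes, key=cost)
    data[best] |= {i, j}                    # pre-distribute inputs
    load[best] += 1 / nodes[best]           # faster nodes absorb more tasks

print({n: sorted(d) for n, d in data.items()})
print(load)
```

A greedy pass like this shows why co-locating both inputs of a comparison task removes the need to move data at runtime; the paper's metaheuristic pre-scheduling pursues the same goal under explicit constraints.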
Abstract:
In an increasingly business technology (BT) dependent world, the impact of the extraordinary changes brought about by the nexus of mobile and cloud technologies, social media and big data is increasingly being felt in the board room. As leaders of enterprises of every type and size, board directors can no longer afford to ignore, delegate or avoid BT-related decisions. Competitive, financial and reputational risk is increased if boards fail to recognize their role in governing technology as an asset and in removing barriers to improving enterprise business technology governance (EBTG). Directors’ awareness of the need for EBTG is increasing. However, industry research shows that board-level willingness to rectify the gap between awareness and action is very low or non-existent. This literature-review-based research identifies barriers to EBTG effectiveness and provides a practical starting point for board analysis. We offer four outcomes that boards might focus on to ensure the organizations they govern are not left behind by those led by the upcoming new breed of technology-savvy leaders. Most extant research looks backward for examples, examining data from before 2010, when a tipping point in the personal and business use of multimedia and mobile-internet devices significantly deepened the impacts of the identified nexus technology forces and began rapidly changing the way many businesses engage with their customers, employees and stakeholders. We situate our work amidst these nexus forces, discuss the board’s role in EBTG in this context, and modernize current definitions of enterprise technology governance. The primary limitation faced is the lack of scholarly research relating to EBTG in the rapidly changing digital economy. Although we have used recent (2011-2013) industry surveys, the volume of these surveys and the congruence across them are significant in terms of levels of increased awareness and calls for increased board attention and competency in EBTG and strategic information use. Where possible we have used scholarly research to illustrate or discuss industry findings.
Abstract:
Now as in earlier periods of acute change in the media environment, new disciplinary articulations are producing new methods for media and communication research. At the same time, established media and communication studies methods are being recombined, reconfigured, and remediated alongside their objects of study. This special issue of JOBEM seeks to explore the conceptual, political, and practical aspects of emerging methods for digital media research. It does so at the conjuncture of a number of important contemporary trends: the rise of a “third wave” of the Digital Humanities and the “computational turn” (Berry, 2011) associated with natively digital objects and the methods for studying them; the apparently ubiquitous Big Data paradigm, with its various manifestations across academia, business, and government, which brings with it a rapidly increasing interest in social media communication and online “behavior” from the “hard” sciences; along with the multisited, embodied, and emplaced nature of everyday digital media practice.
Abstract:
Although popular media narratives about the role of social media in driving the events of the 2011 “Arab Spring” are likely to overstate the impact of Facebook and Twitter on these uprisings, it is nonetheless true that protests and unrest in countries from Tunisia to Syria generated a substantial amount of social media activity. On Twitter alone, several million tweets containing the hashtags #libya or #egypt were generated during 2011, both by directly affected citizens of these countries and by onlookers from further afield. What remains unclear, though, is the extent to which there was any direct interaction between these two groups (especially considering the potential language barriers between them). Building on hashtag data sets gathered between January and November 2011, this article compares patterns of Twitter usage during the popular revolution in Egypt and the civil war in Libya. Using custom-made tools for processing “big data,” we examine the volume of tweets sent by English-, Arabic-, and mixed-language Twitter users over time and examine the networks of interaction (variously through @replying, retweeting, or both) between these groups as they developed and shifted over the course of these uprisings. Examining @reply and retweet traffic, we identify general patterns of information flow between the English- and Arabic-speaking sides of the Twittersphere and highlight the roles played by users bridging both language spheres.
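A minimal sketch of the kind of cross-language interaction counting described follows; the tweet fields, language labels, and data rows are simplified assumptions, not the authors' custom tooling.

```python
# Sketch: count @reply interactions between language groups in a
# hashtag data set. Field names and labels are simplified assumptions.
from collections import Counter

tweets = [  # stand-ins for #egypt / #libya archive rows
    {"user_lang": "ar", "reply_to_lang": "en"},
    {"user_lang": "en", "reply_to_lang": "ar"},
    {"user_lang": "ar", "reply_to_lang": "ar"},
    {"user_lang": "mixed", "reply_to_lang": "en"},
]

flows = Counter(
    (t["user_lang"], t["reply_to_lang"])
    for t in tweets
    if t.get("reply_to_lang")           # keep only @reply tweets
)
for (src, dst), n in flows.most_common():
    print(f"{src} -> {dst}: {n}")       # e.g. information flow ar -> en
```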
Abstract:
Acoustic recordings of the environment are an important aid to ecologists monitoring biodiversity and environmental health. However, rapid advances in recording technology, storage and computing make it possible to accumulate thousands of hours of recordings, of which ecologists can listen to only a small fraction. The big-data challenge is to visualize the content of long-duration audio recordings on multiple scales, from hours and days to months and years. The visualization should facilitate navigation and yield ecologically meaningful information. Our approach is to extract, at one-minute resolution, acoustic indices which reflect content of ecological interest. An acoustic index is a statistic that summarizes some aspect of the distribution of acoustic energy in a recording. We combine indices to produce false-colour images that reveal acoustic content and facilitate navigation through recordings that are months or even years in duration.
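A compact sketch of the pipeline described follows: summarise audio at one-minute resolution with three acoustic indices and map the indices to the R, G, and B channels of a false-colour image. The index definitions here are simplified stand-ins, not the indices used in the study.

```python
# Sketch: per-minute acoustic indices as colour channels of a
# false-colour image. Index definitions are simplified stand-ins.
import numpy as np

def minute_indices(minute):
    env = np.abs(minute)                                    # amplitude envelope
    aci = np.abs(np.diff(env)).sum() / (env.sum() + 1e-9)   # crude ACI-like index
    energy = float(np.sqrt((minute ** 2).mean()))           # RMS energy
    activity = float((env > env.mean()).mean())             # fraction above mean
    return aci, energy, activity

sr = 22050                                       # sample rate (Hz)
audio = np.random.randn(sr * 60 * 5)             # stand-in for 5 minutes of audio
minutes = audio.reshape(-1, sr * 60)             # one row per minute
rows = np.array([minute_indices(m) for m in minutes])

# normalise each index to [0, 1] so the indices can serve as R, G, B
rows = (rows - rows.min(axis=0)) / (np.ptp(rows, axis=0) + 1e-9)
image = rows[np.newaxis, :, :]                   # shape (1, n_minutes, 3): RGB strip
print(image.shape)
```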
Abstract:
MapReduce is a computation model for processing large data sets in parallel on large clusters of machines, in a reliable, fault-tolerant manner. A MapReduce computation is broken down into a number of map tasks and reduce tasks, which are performed by so-called mappers and reducers, respectively. The placement of the mappers and reducers on the machines directly affects the performance and cost of the MapReduce computation in cloud computing. From the computational point of view, the mappers/reducers placement problem is a generalization of the classical bin packing problem, which is NP-complete. Thus, in this paper we propose a new heuristic algorithm for the mappers/reducers placement problem in cloud computing and evaluate it by comparing it with several other heuristics on solution quality and computation time by solving a set of test problems with various characteristics. The computational results show that our heuristic algorithm is much more efficient than the other heuristics and can obtain a better solution in a reasonable time. Furthermore, we verify the effectiveness of our heuristic algorithm by comparing the mapper/reducer placement for a benchmark problem generated by our heuristic algorithm with a conventional mapper/reducer placement which puts a fixed number of mappers/reducers on each machine. The comparison results show that the computation using our mapper/reducer placement is much cheaper than the computation using the conventional placement while still satisfying the computation deadline.
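The paper's own heuristic is not reproduced here; as a reference point for the bin-packing view of placement, a first-fit-decreasing sketch follows, with hypothetical task slot demands and machine capacity.

```python
# First-fit decreasing: a classic bin-packing heuristic, shown only as a
# reference point for the mapper/reducer placement view. Sizes are
# hypothetical slot demands; capacity is a machine's slot count.
def first_fit_decreasing(sizes, capacity):
    bins = []  # each bin: [remaining_capacity, [items]]
    for size in sorted(sizes, reverse=True):
        for b in bins:
            if b[0] >= size:     # first machine with room wins
                b[0] -= size
                b[1].append(size)
                break
        else:
            bins.append([capacity - size, [size]])  # open a new machine
    return [b[1] for b in bins]

# e.g. place tasks needing 4, 3, 3, 2, 2, 2 slots on 8-slot machines
print(first_fit_decreasing([4, 3, 3, 2, 2, 2], 8))
```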
Abstract:
Social Media Analytics is a new research field in which interdisciplinary methods are combined, extended, and adapted in order to analyse social media data. Besides answering research questions, a further goal is to provide architectural designs for the development of new information systems and applications based on social media. This article presents the most important aspects of the field of Social Media Analytics and points to the need for an interdisciplinary research agenda, in whose creation and pursuit information systems research (Wirtschaftsinformatik) has an important role to play.
Abstract:
Social Media Analytics is an emerging interdisciplinary research field that aims at combining, extending, and adapting methods for the analysis of social media data. On the one hand, it can support IS and other research disciplines in answering their research questions; on the other hand, it helps to provide architectural designs as well as solution frameworks for new social-media-based applications and information systems. The authors suggest that IS should contribute to this field and help to develop and pursue an interdisciplinary research agenda.
Abstract:
Twitter is the focus of much research attention, both in traditional academic circles and in commercial market and media research, as analytics give increasing insight into the performance of the platform in areas as diverse as political communication, crisis management, and television audiencing, as well as in other industries. While methods for tracking Twitter keywords and hashtags have developed apace and are well documented, the make-up of the Twitter user base and its evolution over time have been less well understood to date. Recent research efforts have taken advantage of functionality provided by Twitter's Application Programming Interface to develop methodologies for extracting information that allows us to understand the growth of Twitter, its geographic spread and the processes by which particular Twitter users have attracted followers. From politicians to sporting teams, and from YouTube personalities to reality television stars, this technique enables us to gain an understanding of what prompts users to follow others on Twitter. This article outlines how we arrived at this approach, describes the method we adopted to produce accession graphs and discusses their use in Twitter research. It also addresses the wider ethical implications of social network analytics, particularly in the context of a detailed study of the Twitter user base.
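A minimal sketch of how an accession graph can be read once follower lists are retrieved follows; it assumes, as this approach does, that follower IDs are returned newest-first, and the IDs and text plot are purely illustrative.

```python
# Sketch: a follower "accession graph". Assumes follower IDs arrive in
# reverse order of when they followed (newest first); since numeric
# user IDs are also roughly time-ordered, plotting follower ID against
# accession rank reveals growth phases. IDs here are hypothetical.
follower_ids = [501, 498, 430, 420, 220, 210, 90, 15]  # newest first

ordered = list(reversed(follower_ids))        # oldest follower first
for rank, uid in enumerate(ordered, start=1):
    bar = "#" * (uid // 50)                   # crude text plot of ID vs rank
    print(f"{rank:3d} {uid:5d} {bar}")
```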
Abstract:
In this paper, we explore the use of Twitter as a political tool in the 2013 Australian Federal Election. We employ a ‘big data’ approach that combines qualitative and quantitative methods of analysis. By tracking the accounts of politicians and parties, and the tweeting activity to and around these accounts, as well as conversations on particular hashtagged topics, we gain a comprehensive insight into the ways in which Twitter is employed in the campaigning strategies of different parties. We compare and contrast the use of Twitter by political actors with its adoption by citizens as a tool for political conversation and participation. Our study provides an important longitudinal counterpoint, and opportunity for comparison, to the use of Twitter in previous Australian federal and state elections. Furthermore, we offer innovative methodologies for data gathering and evaluation that can contribute to the comparative study of the political uses of Twitter across diverse national media and political systems.
Abstract:
The placement of the mappers and reducers on the machines directly affects the performance and cost of the MapReduce computation in cloud computing. From the computational point of view, the mappers/reducers placement problem is a generalization of the classical bin packing problem, which is NP-complete. Thus, in this paper we propose a new heuristic algorithm for the mappers/reducers placement problem in cloud computing and evaluate it by comparing it with several other heuristics on solution quality and computation time by solving a set of test problems with various characteristics. The computational results show that our heuristic algorithm is much more efficient than the other heuristics. Also, we verify the effectiveness of our heuristic algorithm by comparing the mapper/reducer placement for a benchmark problem generated by our heuristic algorithm with a conventional mapper/reducer placement. The comparison results show that the computation using our mapper/reducer placement is much cheaper while still satisfying the computation deadline.
Abstract:
We identify relation completion (RC) as one recurring problem that is central to the success of novel big data applications such as Entity Reconstruction and Data Enrichment. Given a semantic relation, RC attempts to link entity pairs between two entity lists under the relation. To accomplish the RC goals, we propose to formulate search queries for each query entity α based on some auxiliary information, so as to detect its target entity β from the set of retrieved documents. For instance, a pattern-based method (PaRE) uses extracted patterns as the auxiliary information in formulating search queries. However, high-quality patterns may decrease the probability of finding suitable target entities. As an alternative, we propose the CoRE method, which uses context terms learned from the text surrounding the expression of a relation as the auxiliary information in formulating queries. The experimental results based on several real-world web data collections demonstrate that CoRE achieves much higher accuracy than PaRE for the purpose of RC.
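A toy contrast between the two query-formulation styles follows; the relation, entities, pattern, and context terms are invented examples, and neither PaRE nor CoRE is reproduced here.

```python
# Toy contrast between pattern-based (PaRE-style) and context-term
# (CoRE-style) query formulation for relation completion. The relation,
# entities, pattern, and context terms are invented examples.
def pare_query(alpha, pattern):
    # a rigid extraction pattern: high precision but brittle
    return f'"{pattern.format(alpha=alpha)}"'

def core_query(alpha, context_terms):
    # loose context terms learned around the relation's expression
    return " ".join([alpha] + context_terms)

alpha = "Canberra"
print(pare_query(alpha, "{alpha} is the capital of"))   # exact-phrase query
print(core_query(alpha, ["capital", "city", "country"]))  # bag-of-terms query
```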