124 resultados para big data storage
Resumo:
This poster presents key features of how QUT’s integrated research data storage and management services work with researchers through their own individual or team research life cycle. By understanding the characteristics of research data, and the long-term need to store this data, QUT has provided resources and tools that support QUT’s goal of being a research intensive institute. Key to successful delivery and operation has been the focus upon researchers’ individual needs and the collaboration between providers, in particular, Information Technology Services, High Performance Computing and Research Support, and QUT Library. QUT’s Research Data Storage service provides all QUT researchers (staff and Higher Degree Research students (HDRs)) with a secure data repository throughout the research data lifecycle. Three distinct storage areas provide for raw research data to be acquired, project data to be worked on, and published data to be archived. Since the service was launched in late 2014, it has provided research project teams from all QUT faculties with acquisition, working or archival data space. Feedback indicates that the storage suits the unique needs of researchers and their data. As part of the workflow to establish storage space for researchers, Research Support Specialists and Research Data Librarians consult with researchers and HDRs to identify data storage requirements for projects and individual researchers, and to select and implement the most suitable data storage services and facilities. While research can be a journey into the unknown[1], a plan can help navigate through the uncertainty. Intertwined in the storage provision is QUT’s Research Data Management Planning tool. Launched in March 2015, it has already attracted 273 QUT staff and 352 HDR student registrations, and over 620 plans have been created (2/10/2015). Developed in collaboration with Office of Research Ethics and Integrity (OREI), uptake of the plan has exceeded expectations.
Resumo:
Making Sense of Mass Education provides an engaging and accessible analysis of traditional issues associated with mass education. The book challenges preconceptions about social class, gender and ethnicity discrimination; highlights the interplay between technology, media, popular culture and schooling; and inspects the relevance of ethics and philosophy in the modern classroom. This new edition has been comprehensively updated to provide current information regarding literature, statistics and legal policies, and significantly expands on the previous edition's structure of derailing traditional myths about education as a point of discussion. It also features two new chapters on Big Data and Globalisation and what they mean for the Australian classroom. Written for students, practising teachers and academics alike, Making Sense of Mass Education summarises the current educational landscape in Australia and looks at fundamental issues in society as they relate to education.
Resumo:
Objective Vast amounts of injury narratives are collected daily and are available electronically in real time and have great potential for use in injury surveillance and evaluation. Machine learning algorithms have been developed to assist in identifying cases and classifying mechanisms leading to injury in a much timelier manner than is possible when relying on manual coding of narratives. The aim of this paper is to describe the background, growth, value, challenges and future directions of machine learning as applied to injury surveillance. Methods This paper reviews key aspects of machine learning using injury narratives, providing a case study to demonstrate an application to an established human-machine learning approach. Results The range of applications and utility of narrative text has increased greatly with advancements in computing techniques over time. Practical and feasible methods exist for semi-automatic classification of injury narratives which are accurate, efficient and meaningful. The human-machine learning approach described in the case study achieved high sensitivity and positive predictive value and reduced the need for human coding to less than one-third of cases in one large occupational injury database. Conclusion The last 20 years have seen a dramatic change in the potential for technological advancements in injury surveillance. Machine learning of ‘big injury narrative data’ opens up many possibilities for expanded sources of data which can provide more comprehensive, ongoing and timely surveillance to inform future injury prevention policy and practice.
Resumo:
This paper presents a cautious argument for re-thinking both the nature and the centrality of the one-to-one teacher/student relationship in contemporary pedagogy. A case is made that learning in and for our times requires us to broaden our understanding of pedagogical relations beyond the singularity of the teacher/student binary and to promote the connected teacher as better placed to lead learning for these times. The argument proceeds in three parts: first, a characterization of our times as defined increasingly by the digital knowledge explosion of Big Data; second, a re-thinking of the nature of pedagogical relationships in the context of Big Data; and third, an account of the ways in which leaders can support their teachers to become more effective in leading learning by being more closely connected to their professional colleagues.
Using Big Data to manage safety-related risk in the upstream oil and gas industry: A research agenda
Resumo:
Despite considerable effort and a broad range of new approaches to safety management over the years, the upstream oil & gas industry has been frustrated by the sector’s stubbornly high rate of injuries and fatalities. This short communication points out, however, that the industry may be in a position to make considerable progress by applying “Big Data” analytical tools to the large volumes of safety-related data that have been collected by these organizations. Toward making this case, we examine existing safety-related information management practices in the upstream oil & gas industry, and specifically note that data in this sector often tends to be highly customized, difficult to analyze using conventional quantitative tools, and frequently ignored. We then contend that the application of new Big Data kinds of analytical techniques could potentially reveal patterns and trends that have been hidden or unknown thus far, and argue that these tools could help the upstream oil & gas sector to improve its injury and fatality statistics. Finally, we offer a research agenda toward accelerating the rate at which Big Data and new analytical capabilities could play a material role in helping the industry to improve its health and safety performance.
Resumo:
Environmental monitoring is becoming critical as human activity and climate change place greater pressures on biodiversity, leading to an increasing need for data to make informed decisions. Acoustic sensors can help collect data across large areas for extended periods making them attractive in environmental monitoring. However, managing and analysing large volumes of environmental acoustic data is a great challenge and is consequently hindering the effective utilization of the big dataset collected. This paper presents an overview of our current techniques for collecting, storing and analysing large volumes of acoustic data efficiently, accurately, and cost-effectively.
Resumo:
Twitter ist eine besonders nützliche Quelle für Social-Media-Daten: mit dem Twitter-API (dem Application Programming Interface, das einen strukturierten Zugang zu Kommunikationsdaten in standardisierten Formaten bietet) ist es Forschern möglich, mit ein wenig Mühe und ausreichenden technische Ressourcen sehr große Archive öffentlich verbreiteter Tweets zu bestimmten Themen, Interessenbereichen, oder Veranstaltungen aufzubauen. Grundsätzlich liefert das API sehr langen Listen von Hunderten, Tausenden oder Millionen von Tweets und den Metadaten zu diesen Tweets; diese Daten können dann auf verschiedentlichste Weise extrahiert, kombiniert, und visualisiert werden, um die Dynamik der Social-Media-Kommunikation zu verstehen. Diese Forschung ist häufig um althergebrachte Fragestellungen herum aufgebaut, wird aber in der Regel in einem bislang unbekannt großen Maßstab durchgeführt. Die Projekte von Medien- und Kommunikationswissenschaftlern wie Papacharissi und de Fatima Oliveira (2012), Wood und Baughman (2012) oder Lotan et al. (2011) – um nur eine Handvoll der letzten Beispiele zu nennen – sind grundlegend auf Twitterdatensätze aufgebaut, die jetzt routinemäßig Millionen von Tweets und zugehörigen Metadaten umfassen, erfaßt nach einer Vielzahl von Kriterien. Was allen diesen Fällen gemein ist, ist jedoch die Notwendigkeit, neue methodische Wege in der Verarbeitung und Analyse derart großer Datensätze zur medienvermittelten sozialen Interaktion zu gehen.
Resumo:
Monitoring the environment with acoustic sensors is an effective method for understanding changes in ecosystems. Through extensive monitoring, large-scale, ecologically relevant, datasets can be produced that can inform environmental policy. The collection of acoustic sensor data is a solved problem; the current challenge is the management and analysis of raw audio data to produce useful datasets for ecologists. This paper presents the applied research we use to analyze big acoustic datasets. Its core contribution is the presentation of practical large-scale acoustic data analysis methodologies. We describe details of the data workflows we use to provide both citizen scientists and researchers practical access to large volumes of ecoacoustic data. Finally, we propose a work in progress large-scale architecture for analysis driven by a hybrid cloud-and-local production-grade website.
Resumo:
In public transport, seamless coordinated transfer strengthens the quality of service and attracts ridership. The problem of transfer coordination is sophisticated due to (1) the stochasticity of travel time variability, (2) unavailability of passenger transfer plan. However, the proliferation of Big Data technologies provides a tremendous opportunity to solve these problems. This dissertation enhances passenger transfer quality by offline and online transfer coordination. While offline transfer coordination exploits the knowledge of travel time variability to coordinate transfers, online transfer coordination provides simultaneous vehicle arrivals at stops to facilitate transfers by employing the knowledge of passenger behaviours.
Resumo:
Distributed systems are widely used for solving large-scale and data-intensive computing problems, including all-to-all comparison (ATAC) problems. However, when used for ATAC problems, existing computational frameworks such as Hadoop focus on load balancing for allocating comparison tasks, without careful consideration of data distribution and storage usage. While Hadoop-based solutions provide users with simplicity of implementation, their inherent MapReduce computing pattern does not match the ATAC pattern. This leads to load imbalances and poor data locality when Hadoop's data distribution strategy is used for ATAC problems. Here we present a data distribution strategy which considers data locality, load balancing and storage savings for ATAC computing problems in homogeneous distributed systems. A simulated annealing algorithm is developed for data distribution and task scheduling. Experimental results show a significant performance improvement for our approach over Hadoop-based solutions.
Resumo:
This research studied distributed computing of all-to-all comparison problems with big data sets. The thesis formalised the problem, and developed a high-performance and scalable computing framework with a programming model, data distribution strategies and task scheduling policies to solve the problem. The study considered storage usage, data locality and load balancing for performance improvement in solving the problem. The research outcomes can be applied in bioinformatics, biometrics and data mining and other domains in which all-to-all comparisons are a typical computing pattern.
Resumo:
Solving large-scale all-to-all comparison problems using distributed computing is increasingly significant for various applications. Previous efforts to implement distributed all-to-all comparison frameworks have treated the two phases of data distribution and comparison task scheduling separately. This leads to high storage demands as well as poor data locality for the comparison tasks, thus creating a need to redistribute the data at runtime. Furthermore, most previous methods have been developed for homogeneous computing environments, so their overall performance is degraded even further when they are used in heterogeneous distributed systems. To tackle these challenges, this paper presents a data-aware task scheduling approach for solving all-to-all comparison problems in heterogeneous distributed systems. The approach formulates the requirements for data distribution and comparison task scheduling simultaneously as a constrained optimization problem. Then, metaheuristic data pre-scheduling and dynamic task scheduling strategies are developed along with an algorithmic implementation to solve the problem. The approach provides perfect data locality for all comparison tasks, avoiding rearrangement of data at runtime. It achieves load balancing among heterogeneous computing nodes, thus enhancing the overall computation time. It also reduces data storage requirements across the network. The effectiveness of the approach is demonstrated through experimental studies.
Resumo:
Our paper approaches Twitter through the lens of “platform politics” (Gillespie, 2010), focusing in particular on controversies around user data access, ownership, and control. We characterise different actors in the Twitter data ecosystem: private and institutional end users of Twitter, commercial data resellers such as Gnip and DataSift, data scientists, and finally Twitter, Inc. itself; and describe their conflicting interests. We furthermore study Twitter’s Terms of Service and application programming interface (API) as material instantiations of regulatory instruments used by the platform provider and argue for a more promotion of data rights and literacy to strengthen the position of end users.