12 resultados para Data Analytics
em QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast
Resumo:
We present a mathematically rigorous Quality-of-Service (QoS) metric which relates the achievable quality of service metric (QoS) for a real-time analytics service to the server energy cost of offering the service. Using a new iso-QoS evaluation methodology, we scale server resources to meet QoS targets and directly rank the servers in terms of their energy-efficiency and by extension cost of ownership. Our metric and method are platform-independent and enable fair comparison of datacenter compute servers with significant architectural diversity, including micro-servers. We deploy our metric and methodology to compare three servers running financial option pricing workloads on real-life market data. We find that server ranking is sensitive to data inputs and desired QoS level and that although scale-out micro-servers can be up to two times more energy-efficient than conventional heavyweight servers for the same target QoS, they are still six times less energy efficient than high-performance computational accelerators.
Resumo:
NanoStreams is a consortium project funded by the European Commission under its FP7 programme and is a major effort to address the challenges of processing vast amounts of data in real-time, with a markedly lower carbon footprint than the state of the art. The project addresses both the energy challenge and the high-performance required by emerging applications in real-time streaming data analytics. NanoStreams achieves this goal by designing and building disruptive micro-server solutions incorporating real-silicon prototype micro-servers based on System-on-Chip and reconfigurable hardware technologies.
Resumo:
We make a case for studying the impact of intra-node parallelism on the performance of data analytics. We identify four performance optimizations that are enabled by an increasing number of processing cores on a chip. We discuss the performance impact of these opimizations on two analytics operators and we identify how these optimizations affect each another.
Resumo:
As data analytics are growing in importance they are also quickly becoming one of the dominant application domains that require parallel processing. This paper investigates the applicability of OpenMP, the dominant shared-memory parallel programming model in high-performance computing, to the domain of data analytics. We contrast the performance and programmability of key data analytics benchmarks against Phoenix++, a state-of-the-art shared memory map/reduce programming system. Our study shows that OpenMP outperforms the Phoenix++ system by a large margin for several benchmarks. In other cases, however, the programming model is lacking support for this application domain.
Resumo:
The continued use of traditional lecturing across Higher Education as the main teaching and learning approach in many disciplines must be challenged. An increasing number of studies suggest that this approach, compared to more active learning methods, is the least effective. In counterargument, the use of traditional lectures are often justified as necessary given a large student population. By analysing the implementation of a web based broadcasting approach which replaced the traditional lecture within a programming-based module, and thereby removed the student population rationale, it was hoped that the student learning experience would become more active and ultimately enhance learning on the module. The implemented model replaces the traditional approach of students attending an on-campus lecture theatre with a web-based live broadcast approach that focuses on students being active learners rather than passive recipients. Students ‘attend’ by viewing a live broadcast of the lecturer, presented as a talking head, and the lecturer’s desktop, via a web browser. Video and audio communication is primarily from tutor to students, with text-based comments used to provide communication from students to tutor. This approach promotes active learning by allowing student to perform activities on their own computer rather than the passive viewing and listening common encountered in large lecture classes. By analysing this approach over two years (n = 234 students) results indicate that 89.6% of students rated the approach as offering a highly positive learning experience. Comparing student performance across three academic years also indicates a positive change. A small data analytic analysis was conducted into student participation levels and suggests that the student cohort's willingness to engage with the broadcast lectures material is high.
Resumo:
Inherently error-resilient applications in areas such as signal processing, machine learning and data analytics provide opportunities for relaxing reliability requirements, and thereby reducing the overhead incurred by conventional error correction schemes. In this paper, we exploit the tolerable imprecision of such applications by designing an energy-efficient fault-mitigation scheme for unreliable data memories to meet target yield. The proposed approach uses a bit-shuffling mechanism to isolate faults into bit locations with lower significance. This skews the bit-error distribution towards the low order bits, substantially limiting the output error magnitude. By controlling the granularity of the shuffling, the proposed technique enables trading-off quality for power, area, and timing overhead. Compared to error-correction codes, this can reduce the overhead by as much as 83% in read power, 77% in read access time, and 89% in area, when applied to various data mining applications in 28nm process technology.
Resumo:
Graph analytics is an important and computationally demanding class of data analytics. It is essential to balance scalability, ease-of-use and high performance in large scale graph analytics. As such, it is necessary to hide the complexity of parallelism, data distribution and memory locality behind an abstract interface. The aim of this work is to build a scalable graph analytics framework that does not demand significant parallel programming experience based on NUMA-awareness.
The realization of such a system faces two key problems:
(i)~how to develop a scale-free parallel programming framework that scales efficiently across NUMA domains; (ii)~how to efficiently apply graph partitioning in order to create separate and largely independent work items that can be distributed among threads.
Resumo:
The foundational concept of Network Enabled Capability relies on effective, timely information sharing. This information is used in analysis, trade and scenario studies, and ultimately decision-making. In this paper, the concept of visual analytics is explored as an enabler to facilitate rapid, defensible, and superior decision-making. By coupling analytical reasoning with the exceptional human capability to rapidly internalize and understand visual data, visual analytics allows individual and collaborative decision-making to occur in the face of vast and disparate data, time pressures, and uncertainty. An example visual analytics framework is presented in the form of a decision-making environment centered on the Lockheed C-5A and C-5M aircraft. This environment allows rapid trade studies to be conducted on design, logistics, and capability within the aircraft?s operational roles. Through this example, the use of a visual analytics decision-making environment within a military environment is demonstrated.
Resumo:
Recent technological advances have increased the quantity of movement data being recorded. While valuable knowledge can be gained by analysing such data, its sheer volume creates challenges. Geovisual analytics, which helps the human cognition process by using tools to reason about data, offers powerful techniques to resolve these challenges. This paper introduces such a geovisual analytics environment for exploring movement trajectories, which provides visualisation interfaces, based on the classic space-time cube. Additionally, a new approach, using the mathematical description of motion within a space-time cube, is used to determine the similarity of trajectories and forms the basis for clustering them. These techniques were used to analyse pedestrian movement. The results reveal interesting and useful spatiotemporal patterns and clusters of pedestrians exhibiting similar behaviour.
Resumo:
NanoStreams explores the design, implementation,and system software stack of micro-servers aimed at processingdata in-situ and in real time. These micro-servers can serve theemerging Edge computing ecosystem, namely the provisioningof advanced computational, storage, and networking capabilitynear data sources to achieve both low latency event processingand high throughput analytical processing, before consideringoff-loading some of this processing to high-capacity datacentres.NanoStreams explores a scale-out micro-server architecture thatcan achieve equivalent QoS to that of conventional rack-mountedservers for high-capacity datacentres, but with dramaticallyreduced form factors and power consumption. To this end,NanoStreams introduces novel solutions in programmable & con-figurable hardware accelerators, as well as the system softwarestack used to access, share, and program those accelerators.Our NanoStreams micro-server prototype has demonstrated 5.5×higher energy-efficiency than a standard Xeon Server. Simulationsof the microserver’s memory system extended to leveragehybrid DDR/NVM main memory indicated 5× higher energyefficiencythan a conventional DDR-based system.