53 resultados para scalability


Relevância:

10.00% 10.00%

Publicador:

Resumo:

Task-parallel languages are increasingly popular. Many of them provide expressive mechanisms for intertask synchronization. For example, OpenMP 4.0 will integrate data-driven execution semantics derived from the StarSs research language. Compared to the more restrictive data-parallel and fork-join concurrency models, the advanced features being introduced into task-parallelmodels in turn enable improved scalability through load balancing, memory latency hiding, mitigation of the pressure on memory bandwidth, and, as a side effect, reduced power consumption. In this article, we develop a systematic approach to compile loop nests into concurrent, dynamically constructed graphs of dependent tasks. We propose a simple and effective heuristic that selects the most profitable parallelization idiom for every dependence type and communication pattern. This heuristic enables the extraction of interband parallelism (cross-barrier parallelism) in a number of numerical computations that range from linear algebra to structured grids and image processing. The proposed static analysis and code generation alleviates the burden of a full-blown dependence resolver to track the readiness of tasks at runtime. We evaluate our approach and algorithms in the PPCG compiler, targeting OpenStream, a representative dataflow task-parallel language with explicit intertask dependences and a lightweight runtime. Experimental results demonstrate the effectiveness of the approach.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper presents a low energy memory decoder architecture for ultra-low-voltage systems containing multiple voltage domains. Due to limitations in scalability of memory supply voltages, these systems typically contain a core operating at subthreshold voltages and memories operating at a higher voltage. This difference in voltage provides a timing slack on the memory path as the core supply is scaled. The paper analyzes the feasibility and trade-offs in utilizing this timing slack to operate a greater section of memory decoder circuitry at the lower supply. A 256x16-bit SRAM interface has been designed in UMC 65nm low-leakage process to evaluate the above technique with the core and memory operating at 280 mV and 500 mV respectively. The technique provides a reduction of up to 20% in energy/cycle of the row decoder without any penalty in area and system-delay.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

QR decomposition (QRD) is a widely used Numerical Linear Algebra (NLA) kernel with applications ranging from SONAR beamforming to wireless MIMO receivers. In this paper, we propose a novel Givens Rotation (GR) based QRD (GR QRD) where we reduce the computational complexity of GR and exploit higher degree of parallelism. This low complexity Column-wise GR (CGR) can annihilate multiple elements of a column of a matrix simultaneously. The algorithm is first realized on a Two-Dimensional (2 D) systolic array and then implemented on REDEFINE which is a Coarse Grained run-time Reconfigurable Architecture (CGRA). We benchmark the proposed implementation against state-of-the-art implementations to report better throughput, convergence and scalability.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We describe our novel LED communication infrastructure and demonstrate its scalability across platforms. Our system achieves 50 kilo bits per second on very simple SoCs and scales to megabits bits per second rates on dual processor based mobile phone platforms.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Cache analysis plays a very important role in obtaining precise Worst Case Execution Time (WCET) estimates of programs for real-time systems. While Abstract Interpretation based approaches are almost universally used for cache analysis, they fail to take advantage of its unique requirement: it is not necessary to find the guaranteed cache behavior that holds across all executions of a program. We only need the cache behavior along one particular program path, which is the path with the maximum execution time. In this work, we introduce the concept of cache miss paths, which allows us to use the worst-case path information to improve the precision of AI-based cache analysis. We use Abstract Interpretation to determine the cache miss paths, and then integrate them in the IPET formulation. An added advantage is that this further allows us to use infeasible path information for cache analysis. Experimentally, our approach gives more precise WCETs as compared to AI-based cache analysis, and we also provide techniques to trade-off analysis time with precision to provide scalability.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The nodes with dynamicity, and management without administrator are key features of mobile ad hoc networks (1VIANETs). Increasing resource requirements of nodes running different applications, scarcity of resources, and node mobility in MANETs are the important issues to be considered in allocation of resources. Moreover, management of limited resources for optimal allocation is a crucial task. In our proposed work we discuss a design of resource allocation protocol and its performance evaluation. The proposed protocol uses both static and mobile agents. The protocol does the distribution and parallelization of message propagation (mobile agent with information) in an efficient way to achieve scalability and speed up message delivery to the nodes in the sectors of the zones of a MANET. The protocol functionality has been simulated using Java Agent Development Environment (JADE) Framework for agent generation, migration and communication. A mobile agent migrates from central resource rich node with message and navigate autonomously in the zone of network until the boundary node. With the performance evaluation, it has been concluded that the proposed protocol consumes much less time to allocate the required resources to the nodes under requirement, utilize less network resources and increase the network scalability. (C) 2015 Elsevier B.V. All rights reserved.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Clock synchronization is highly desirable in distributed systems, including many applications in the Internet of Things and Humans. It improves the efficiency, modularity, and scalability of the system, and optimizes use of event triggers. For IoTH, BLE - a subset of the recent Bluetooth v4.0 stack - provides a low-power and loosely coupled mechanism for sensor data collection with ubiquitous units (e.g., smartphones and tablets) carried by humans. This fundamental design paradigm of BLE is enabled by a range of broadcast advertising modes. While its operational benefits are numerous, the lack of a common time reference in the broadcast mode of BLE has been a fundamental limitation. This article presents and describes CheepSync, a time synchronization service for BLE advertisers, especially tailored for applications requiring high time precision on resource constrained BLE platforms. Designed on top of the existing Bluetooth v4.0 standard, the CheepSync framework utilizes low-level time-stamping and comprehensive error compensation mechanisms for overcoming uncertainties in message transmission, clock drift, and other system-specific constraints. CheepSync was implemented on custom designed nRF24Cheep beacon platforms (as broadcasters) and commercial off-the-shelf Android ported smartphones (as passive listeners). We demonstrate the efficacy of CheepSync by numerous empirical evaluations in a variety of experimental setups, and show that its average (single-hop) time synchronization accuracy is in the 10 mu s range.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The problem of continuous curvature path planning for passages is considered. This problem arises when an autonomous vehicle traverses between prescribed boundaries such as corridors, tunnels, channels, etc. Passage boundaries with curvature and heading discontinuities pose challenges for generating smooth paths passing through them. Continuous curvature half-S shaped paths derived from the Four Parameter Logistic Curve family are proposed as a prospective path planning solution. Analytic conditions are derived for generating continuous curvature paths confined within the passage boundaries. Zero end curvature highlights the scalability of the proposed solution and its compatibility with other path planners in terms of larger path planning domains. Various scenarios with curvature and heading discontinuities are considered presenting viability of the proposed solution.