222 results for multicore


Abstract:

This article aims to delineate groundwater sources in the Holocene deposits of the Gulf of Mannar coast in southern India. For this purpose, 2-D electrical resistivity tomography (ERT), hydrochemical, and granulometric studies were carried out and integrated to identify hydrogeological structures and potable groundwater resources at shallow depths, which generally occur in coastal tracts. 2-D ERT, using a multicore cable with a Wenner array, was used to map the subsurface geological formations in two dimensions. Low resistivities of 1-5 Ω m, indicating saline water associated with calcite, appeared at a depth of about 5 m below ground level (bgl). Seawater intrusion was observed in the calcite environs at a depth of 8 m bgl, with a maximum resistivity of about 5 Ω m, whereas the calcareous sandstone layer shows about 15-64 Ω m at a depth of 6 m bgl. Variations in TDS and in HCO₃⁻, Cl⁻, Na⁺, K⁺, Ca²⁺, and Mg²⁺ concentrations were observed in response to saline water and seawater intrusion in the groundwater system. Granulometric analysis shows that the study area was under the sea between 5400 and 3000 years ago. Ice melting and unusual hail (ice-stone rain) events between 5000 and 4000 years ago caused the sea to inundate the area and deposited a late Holocene marine transgression formation over a stretch of about 17 km, up to the Puthukottai quartzite region.
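For readers unfamiliar with the Wenner array: four equally spaced electrodes (spacing a) are laid out in a line, current I is injected through the outer pair, and the potential difference ΔV is measured across the inner pair, giving an apparent resistivity ρa = 2πa·ΔV/I. The sketch below computes that quantity and attaches a label using only the resistivity ranges quoted in the abstract. The example reading and the `label` helper are hypothetical; in practice ERT inverts many such readings into a resistivity section, and the authors' interpretation also draws on hydrochemistry and granulometry.

```python
import math


def wenner_apparent_resistivity(a_m: float, delta_v: float, current_a: float) -> float:
    """Apparent resistivity (ohm m) for a Wenner array with electrode spacing a_m."""
    if current_a == 0:
        raise ValueError("injected current must be non-zero")
    return 2 * math.pi * a_m * delta_v / current_a


def label(rho_ohm_m: float) -> str:
    """Hypothetical labels based only on the ranges quoted in the abstract."""
    if rho_ohm_m <= 5:
        return "saline water / seawater intrusion"
    if 15 <= rho_ohm_m <= 64:
        return "calcareous sandstone"
    return "unclassified"


# Hypothetical reading: a = 2 m, dV = 0.1 V, I = 0.5 A.
rho = wenner_apparent_resistivity(2.0, 0.1, 0.5)
print(f"{rho:.2f} ohm m -> {label(rho)}")
```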

Abstract:

This paper presents the design and implementation of PolyMage, a domain-specific language and compiler for image processing pipelines. An image processing pipeline can be viewed as a graph of interconnected stages that process images successively. Each stage typically performs one of point-wise, stencil, reduction, or data-dependent operations on image pixels. Individual stages in a pipeline typically exhibit abundant data parallelism that can be exploited with relative ease. However, the stages also require high memory bandwidth, which prevents effective utilization of the parallelism available on modern architectures. For applications that demand high performance, the traditional options are to use optimized libraries such as OpenCV or to optimize manually. Using libraries precludes optimization across library routines, while manual optimization that accounts for both parallelism and locality is very tedious. The focus of our system, PolyMage, is on automatically generating high-performance implementations of image processing pipelines expressed in a high-level declarative language. Our optimization approach relies primarily on the transformation and code generation capabilities of the polyhedral compiler framework. To the best of our knowledge, this is the first model-driven compiler for image processing pipelines that performs complex fusion, tiling, and storage optimization automatically. Experimental results on a modern multicore system show that the performance achieved by our automatic approach is up to 1.81x better than that achieved through manual tuning in Halide, a state-of-the-art language and compiler for image processing pipelines. For a camera raw image processing pipeline, our performance is comparable to that of a hand-tuned implementation.
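To make "fusion" and "tiling" concrete, the sketch below takes a hypothetical two-stage 1-D pipeline (a point-wise scaling stage followed by a 3-point averaging stencil) and computes it two ways: stage by stage with a full intermediate buffer, and fused tile by tile, where each tile recomputes the one-pixel halo of intermediate values it needs (overlapped tiling). It is plain Python written for illustration, not PolyMage's DSL or the code its compiler generates; the stage functions and tile size are assumptions.

```python
from typing import List


def pointwise(x: float) -> float:
    # Stage 1 (point-wise): hypothetical brightness scaling.
    return 2.0 * x


def stencil(left: float, mid: float, right: float) -> float:
    # Stage 2 (stencil): 3-point average.
    return (left + mid + right) / 3.0


def unfused(img: List[float]) -> List[float]:
    # Materialize the whole intermediate image, then apply the stencil
    # to interior pixels 1..n-2 (no boundary handling).
    tmp = [pointwise(v) for v in img]
    return [stencil(tmp[i - 1], tmp[i], tmp[i + 1]) for i in range(1, len(img) - 1)]


def fused_tiled(img: List[float], tile: int = 4) -> List[float]:
    n = len(img)
    out: List[float] = []
    for start in range(1, n - 1, tile):
        end = min(start + tile, n - 1)  # exclusive end of this output tile
        # Compute only the intermediate values this tile needs, including a
        # one-pixel halo on each side; halos are recomputed by neighbouring tiles.
        local = [pointwise(img[j]) for j in range(start - 1, end + 1)]
        out.extend(stencil(local[k - 1], local[k], local[k + 1])
                   for k in range(1, end - start + 1))
    return out


img = [float(i) for i in range(10)]
print(unfused(img) == fused_tiled(img, tile=3))
```

The fused version never holds the full intermediate image, which is the locality benefit; the price is the redundant halo computation at each tile boundary, a trade-off a compiler has to weigh when choosing tile sizes.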

Abstract:

We demonstrate a sub-nanosecond electro-optical switch with low crosstalk in a silicon-on-insulator (SOI) dual-coupled micro-ring embedded with p-i-n diodes. A crosstalk of -23 dB is obtained in the 20-μm-radius micro-ring with a well-designed asymmetric dual-coupling structure. By optimizing the doping profiles and the fabrication processes, a sub-nanosecond switch-on/off time of < 400 ps is realized under an electrically pre-emphasized driving signal. This compact, fast-response micro-ring switch, which can be fabricated with complementary metal-oxide-semiconductor (CMOS) compatible technologies, has enormous potential for optical interconnects in multicore networks-on-chip.
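As a quick sanity check on the crosstalk figure: a level in decibels converts to a linear power ratio via ratio = 10^(dB/10), so -23 dB means the unwanted signal carries roughly 0.5% of the power of the wanted one. The helper name below is illustrative.

```python
def db_to_power_ratio(db: float) -> float:
    """Convert a power level in dB to a linear power ratio."""
    return 10 ** (db / 10)


crosstalk = db_to_power_ratio(-23.0)
print(f"{crosstalk:.4f} ({crosstalk * 100:.2f}% of the power)")
```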

Abstract:

Multicore computational accelerators such as GPUs are now commodity components for high-performance computing at scale. While such accelerators have been studied in some detail as stand-alone computational engines, their integration into large-scale distributed systems raises new challenges and trade-offs. In this paper, we explore resource management alternatives for building asymmetric accelerator-based distributed systems. We present these alternatives in the context of a capabilities-aware framework for data-intensive computing, which uses an implementation of the MapReduce programming model for accelerator-based clusters that is enhanced compared to the state of the art. The framework can transparently utilize heterogeneous accelerators to deliver high performance with low programming effort. Our work is the first to compare heterogeneous types of accelerators, GPUs and Cell processors, in the same environment, and the first to explore the trade-offs between compute-efficient and control-efficient accelerators in data-intensive systems. Our investigation shows that the framework scales well with the number of different compute nodes. Furthermore, it runs simultaneously on two different types of accelerators, successfully adapts to the resource capabilities, and performs 26.9% better on average than a static execution approach.
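The advantage over static execution comes from matching work to each device's capability. The toy model below, which is not the paper's framework, splits a fixed number of map tasks across workers either evenly (static) or in proportion to each worker's throughput (capability-aware), and compares the resulting makespan. Worker names and throughput figures are hypothetical, and the real framework adapts at run time rather than from fixed, known throughputs.

```python
from typing import Dict


def makespan(assignment: Dict[str, int], throughput: Dict[str, float]) -> float:
    # Time until the slowest worker finishes its share (tasks / tasks-per-second).
    return max(assignment[w] / throughput[w] for w in assignment)


def static_split(tasks: int, workers: Dict[str, float]) -> Dict[str, int]:
    # Equal share per worker, ignoring capability; leftovers go to the first workers.
    names = list(workers)
    base, extra = divmod(tasks, len(names))
    return {w: base + (1 if i < extra else 0) for i, w in enumerate(names)}


def capability_split(tasks: int, workers: Dict[str, float]) -> Dict[str, int]:
    # Share proportional to throughput; each leftover task goes to the worker
    # whose finishing time would be smallest after taking it.
    total = sum(workers.values())
    share = {w: int(tasks * t / total) for w, t in workers.items()}
    for _ in range(tasks - sum(share.values())):
        best = min(workers, key=lambda w: (share[w] + 1) / workers[w])
        share[best] += 1
    return share


# Hypothetical throughputs in tasks per second.
workers = {"gpu_node": 40.0, "cell_node": 20.0, "cpu_node": 10.0}
for name, split in (("static", static_split), ("capability-aware", capability_split)):
    plan = split(700, workers)
    print(name, plan, round(makespan(plan, workers), 2))
```

With these made-up numbers the static plan is bounded by the slowest node, while the proportional plan lets all three nodes finish at about the same time.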

Abstract:

Computing has recently reached an inflection point with the introduction of multicore processors. On-chip thread-level parallelism is doubling approximately every other year. Concurrency lends itself naturally to allowing a program to trade performance for power savings by regulating the number of active cores; however, in several domains, users are unwilling to sacrifice performance to save power. We present a prediction model for identifying energy-efficient operating points of concurrency in well-tuned multithreaded scientific applications, and a runtime system that uses live program analysis to optimize applications dynamically. We describe a dynamic, phase-aware performance prediction model that combines multivariate regression techniques with runtime analysis of data collected from hardware event counters to locate optimal operating points of concurrency. Using this model, we develop a prediction-driven, phase-aware runtime optimization scheme that throttles concurrency so that power consumption is reduced and performance is set at the knee of the scalability curve of each program phase. Using prediction reduces the overhead of searching the optimization space while achieving near-optimal performance and power savings. A thorough evaluation of our approach shows a 10.8 percent reduction in power consumption together with a 17.9 percent improvement in performance, resulting in energy savings of 26.7 percent.
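The final throttling decision can be illustrated independently of the regression model. Given predicted execution times per thread count for one program phase (for example, produced by such a model), the sketch below picks the smallest concurrency whose predicted time is within a tolerance of the best one, a simple stand-in for "the knee of the scalability curve". The 5% tolerance, the function name, and the timing figures are assumptions, not values from the paper.

```python
from typing import Dict


def knee_concurrency(pred_time: Dict[int, float], tolerance: float = 0.05) -> int:
    """Smallest thread count whose predicted time is within `tolerance` of the best."""
    if not pred_time:
        raise ValueError("empty prediction table")
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    best = min(pred_time.values())
    # The fastest configuration always qualifies, so this loop always returns.
    for threads in sorted(pred_time):
        if pred_time[threads] <= best * (1 + tolerance):
            return threads
    raise AssertionError("unreachable")


# Hypothetical predicted times (s) for one program phase at 1..8 threads.
predicted = {1: 8.0, 2: 4.2, 3: 3.0, 4: 2.4, 5: 2.3, 6: 2.28, 7: 2.27, 8: 2.3}
print(knee_concurrency(predicted))  # 5
```

Running the phase at 5 rather than 8 threads leaves cores idle at almost no cost in predicted time, which is where the power savings come from.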

Abstract:

We propose a data-flow-based run-time system as an efficient tool for supporting the execution of parallel code on heterogeneous architectures hosting both multicore CPUs and GPUs. We discuss how the proposed run-time system may be the target both of structured parallel applications developed using algorithmic skeletons/parallel design patterns and of more "domain-specific" programming models. Experimental results demonstrating the feasibility of the approach are presented. © 2012 World Scientific Publishing Company.
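In a data-flow run time, a task fires as soon as all of its inputs are available rather than in program order. The sketch below shows that firing rule on a small task graph using Python's `concurrent.futures` thread pool. It says nothing about how the paper's run time maps tasks onto CPUs or GPUs; the graph format, the `run_dataflow` function, and the example tasks are hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical graph format: task name -> (function, names of input tasks).
Graph = Dict[str, Tuple[Callable[..., Any], List[str]]]


def run_dataflow(graph: Graph, workers: int = 4) -> Dict[str, Any]:
    results: Dict[str, Any] = {}
    waiting = dict(graph)                # tasks not yet fired
    running: Dict[Future, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while waiting or running:
            # Data-flow firing rule: a task is ready once all its inputs exist.
            ready = [name for name, (_, deps) in waiting.items()
                     if all(d in results for d in deps)]
            for name in ready:
                fn, deps = waiting.pop(name)
                running[pool.submit(fn, *[results[d] for d in deps])] = name
            if not running:
                raise ValueError(f"unsatisfiable inputs for: {sorted(waiting)}")
            # Block until at least one task finishes, then record its token.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for fut in done:
                results[running.pop(fut)] = fut.result()
    return results


graph: Graph = {
    "load_a": (lambda: 3, []),
    "load_b": (lambda: 4, []),
    "square_a": (lambda a: a * a, ["load_a"]),
    "square_b": (lambda b: b * b, ["load_b"]),
    "total": (lambda p, q: p + q, ["square_a", "square_b"]),
}
print(run_dataflow(graph)["total"])  # 25
```

Independent branches (`square_a`, `square_b`) may run concurrently, and `total` fires only once both of its input tokens exist.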

Abstract:

Sphere Decoding (SD) is a highly effective detection technique for Multiple-Input Multiple-Output (MIMO) wireless communications receivers, offering quasi-optimal accuracy with relatively low computational complexity compared with the ideal ML detector. Despite this, the computational demands of even low-complexity SD variants, such as Fixed Complexity SD (FSD), remain such that implementation on modern software-defined network equipment is a highly challenging process, and indeed real-time solutions for MIMO systems such as 4×4 16-QAM 802.11n are unreported. This paper overcomes this barrier. By exploiting large-scale networks of fine-grained software-programmable processors on Field Programmable Gate Array (FPGA), a series of unique SD implementations are presented, culminating in the only single-chip, real-time quasi-optimal SD for 4×4 16-QAM 802.11n MIMO. Furthermore, the paper demonstrates that the high-performance software-defined architectures that enable these implementations have a cost comparable to that of dedicated circuit architectures.
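The principle of Fixed Complexity SD can be sketched compactly. After a QR decomposition H = QR, the receiver works on ỹ = Qᵀy = Rx + noise and detects symbols from the last layer upward. FSD fully expands the top layer over every constellation point and then follows a single nearest-symbol (successive interference cancellation) branch on each remaining layer, so the number of candidate paths, and hence the work, is fixed regardless of noise. The sketch assumes a real-valued, square, full-rank channel, a 4-PAM alphabet, one fully expanded layer, and no channel-dependent layer ordering; these are simplifications, not the 4×4 16-QAM configuration implemented in the paper (a complex 16-QAM system is commonly rewritten as a real-valued system of twice the dimension with 4-PAM per real dimension).

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
PAM4 = (-3.0, -1.0, 1.0, 3.0)  # per-dimension alphabet (one axis of 16-QAM)


def qr(h: Matrix) -> Tuple[Matrix, Matrix]:
    """Modified Gram-Schmidt QR of a square, full-rank H.

    Returns (q_cols, r): q_cols[k] is the k-th column of Q, and H = Q R.
    """
    n = len(h)
    q_cols: Matrix = []
    r: Matrix = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [h[i][j] for i in range(n)]  # j-th column of H
        for i, qi in enumerate(q_cols):
            r[i][j] = sum(a * b for a, b in zip(qi, v))
            v = [a - r[i][j] * b for a, b in zip(v, qi)]
        r[j][j] = sum(a * a for a in v) ** 0.5
        q_cols.append([a / r[j][j] for a in v])
    return q_cols, r


def nearest_symbol(z: float) -> float:
    # Hard decision (slicing) used on every single-expansion layer.
    return min(PAM4, key=lambda s: abs(z - s))


def fsd_detect(h: Matrix, y: Sequence[float]) -> List[float]:
    n = len(h)
    q_cols, r = qr(h)
    y_t = [sum(a * b for a, b in zip(q, y)) for q in q_cols]  # y~ = Q^T y
    best: List[float] = []
    best_metric = float("inf")
    for top in PAM4:                      # full expansion of the top layer
        x = [0.0] * n
        x[n - 1] = top
        for k in range(n - 2, -1, -1):    # single expansion (SIC) below it
            interference = sum(r[k][j] * x[j] for j in range(k + 1, n))
            x[k] = nearest_symbol((y_t[k] - interference) / r[k][k])
        # Euclidean metric ||y~ - R x||^2 of this candidate path.
        metric = sum((y_t[i] - sum(r[i][j] * x[j] for j in range(i, n))) ** 2
                     for i in range(n))
        if metric < best_metric:
            best, best_metric = x, metric
    return best


# Hypothetical well-conditioned 4x4 real channel and a small noise vector.
H = [[2.0, 0.3, 0.1, 0.0],
     [0.2, 1.8, 0.4, 0.1],
     [0.1, 0.3, 2.1, 0.2],
     [0.0, 0.1, 0.3, 1.9]]
x_true = [1.0, -3.0, 3.0, -1.0]
y = [sum(H[i][j] * x_true[j] for j in range(4)) + e
     for i, e in enumerate([0.05, -0.04, 0.03, -0.05])]
print(fsd_detect(H, y))
```

With one fully expanded layer the detector always evaluates exactly `len(PAM4)` paths, which is the property that gives FSD its fixed, pipeline-friendly throughput.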

Abstract:

To enable reliable data transfer in next-generation Multiple-Input Multiple-Output (MIMO) communication systems, terminals must be able to react to fluctuating channel conditions through flexible modulation schemes and antenna configurations. This creates a challenging real-time implementation problem: providing the high performance required by cutting-edge MIMO standards, such as 802.11n, together with the flexibility to support this behavioural variability. FPGA softcore processors offer a solution to this problem, and in this paper we show how heterogeneous SISD/SIMD/MIMD architectures can enable programmable multicore architectures on FPGA with performance and cost similar to those of traditional dedicated circuit-based architectures. Applying this approach to a 4×4 16-QAM Fixed-Complexity Sphere Decoder (FSD) detector, we present the first soft-processor-based solution for real-time 802.11n MIMO.
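The flexibility requirement can be illustrated with a detector's innermost decision step, which must work for whichever modulation is currently in use. The sketch below builds the per-axis levels of square M-QAM (4-, 16-, 64-QAM) using unnormalized odd-integer coordinates and slices a received complex sample to the nearest constellation point by rounding each axis independently. The function names and the unnormalized scaling are assumptions for illustration, not the paper's soft-processor code.

```python
import math
from typing import List


def qam_levels(m: int) -> List[int]:
    """Per-axis levels of square M-QAM, e.g. 16-QAM -> [-3, -1, 1, 3]."""
    side = math.isqrt(m)
    if side * side != m or side < 2 or side & (side - 1):
        raise ValueError("M must be a square power of two such as 4, 16 or 64")
    return list(range(-(side - 1), side, 2))


def slice_axis(v: float, levels: List[int]) -> int:
    # Nearest odd integer, clipped to the outermost constellation level.
    nearest = 2 * math.floor(v / 2) + 1
    return max(levels[0], min(levels[-1], nearest))


def slice_qam(z: complex, m: int) -> complex:
    # Square QAM separates into independent decisions on I and Q.
    levels = qam_levels(m)
    return complex(slice_axis(z.real, levels), slice_axis(z.imag, levels))


sample = complex(2.4, -0.7)
for m in (4, 16, 64):
    print(m, slice_qam(sample, m))
```

Because the modulation order is a run-time argument rather than a fixed circuit parameter, the same routine serves every configuration, which is the kind of variability the abstract argues programmable processors handle more naturally than dedicated datapaths.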