49 resultados para Distributed computer systems


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Massively parallel networks of highly efficient, high performance Single Instruction Multiple Data (SIMD) processors have been shown to enable FPGA-based implementation of real-time signal processing applications with performance and
cost comparable to dedicated hardware architectures. This is achieved by exploiting simple datapath units with deep processing pipelines. However, these architectures are highly susceptible to pipeline bubbles resulting from data and control hazards; the only way to mitigate against these is manual interleaving of
application tasks on each datapath, since no suitable automated interleaving approach exists. In this paper we describe a new automated integrated mapping/scheduling approach to map algorithm tasks to processors and a new low-complexity list scheduling technique to generate the interleaved schedules. When applied to a spatial Fixed-Complexity Sphere Decoding (FSD) detector
for next-generation Multiple-Input Multiple-Output (MIMO) systems, the resulting schedules achieve real-time performance for IEEE 802.11n systems on a network of 16-way SIMD processors on FPGA, enable better performance/complexity balance than current approaches and produce results comparable to handcrafted implementations.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Realising high performance image and signal processing
applications on modern FPGA presents a challenging implementation problem due to the large data frames streaming through these systems. Specifically, to meet the high bandwidth and data storage demands of these applications, complex hierarchical memory architectures must be manually specified
at the Register Transfer Level (RTL). Automated approaches which convert high-level operation descriptions, for instance in the form of C programs, to an FPGA architecture, are unable to automatically realise such architectures. This paper
presents a solution to this problem. It presents a compiler to automatically derive such memory architectures from a C program. By transforming the input C program to a unique dataflow modelling dialect, known as Valved Dataflow (VDF), a mapping and synthesis approach developed for this dialect can
be exploited to automatically create high performance image and video processing architectures. Memory intensive C kernels for Motion Estimation (CIF Frames at 30 fps), Matrix Multiplication (128x128 @ 500 iter/sec) and Sobel Edge Detection (720p @ 30 fps), which are unrealisable by current state-of-the-art C-based synthesis tools, are automatically derived from a C description of the algorithm.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

NanoStreams explores the design, implementation,and system software stack of micro-servers aimed at processingdata in-situ and in real time. These micro-servers can serve theemerging Edge computing ecosystem, namely the provisioningof advanced computational, storage, and networking capabilitynear data sources to achieve both low latency event processingand high throughput analytical processing, before consideringoff-loading some of this processing to high-capacity datacentres.NanoStreams explores a scale-out micro-server architecture thatcan achieve equivalent QoS to that of conventional rack-mountedservers for high-capacity datacentres, but with dramaticallyreduced form factors and power consumption. To this end,NanoStreams introduces novel solutions in programmable & con-figurable hardware accelerators, as well as the system softwarestack used to access, share, and program those accelerators.Our NanoStreams micro-server prototype has demonstrated 5.5×higher energy-efficiency than a standard Xeon Server. Simulationsof the microserver’s memory system extended to leveragehybrid DDR/NVM main memory indicated 5× higher energyefficiencythan a conventional DDR-based system. 

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Power capping is a fundamental method for reducing the energy consumption of a wide range of modern computing environments, ranging from mobile embedded systems to datacentres. Unfortunately, maximising performance and system efficiency under static power caps remains challenging, while maximising performance under dynamic power caps has been largely unexplored. We present an adaptive power capping method that reduces the power consumption and maximizes the performance of heterogeneous SoCs for mobile and server platforms. Our technique combines power capping with coordinated DVFS, data partitioning and core allocations on a heterogeneous SoC with ARM processors and FPGA resources. We design our framework as a run-time system based on OpenMP and OpenCL to utilise the heterogeneous resources. We evaluate it through five data-parallel benchmarks on the Xilinx SoC which allows fully voltage and frequency control. Our experiments show a significant performance boost of 30% under dynamic power caps with concurrent execution on ARM and FPGA, compared to a naive separate approach.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Monitoring multiple myeloma patients for relapse requires sensitive methods to measure minimal residual disease and to establish a more precise prognosis. The present study aimed to standardize a real-time quantitative polymerase chain reaction (PCR) test for the IgH gene with a JH consensus self-quenched fluorescence reverse primer and a VDJH or DJH allele-specific sense primer (self-quenched PCR). This method was compared with allele-specific real-time quantitative PCR test for the IgH gene using a TaqMan probe and a JH consensus primer (TaqMan PCR). We studied nine multiple myeloma patients from the Spanish group treated with the MM2000 therapeutic protocol. Self-quenched PCR demonstrated sensitivity of >or=10(-4) or 16 genomes in most cases, efficiency was 1.71 to 2.14, and intra-assay and interassay reproducibilities were 1.18 and 0.75%, respectively. Sensitivity, efficiency, and residual disease detection were similar with both PCR methods. TaqMan PCR failed in one case because of a mutation in the JH primer binding site, and self-quenched PCR worked well in this case. In conclusion, self-quenched PCR is a sensitive and reproducible method for quantifying residual disease in multiple myeloma patients; it yields similar results to TaqMan PCR and may be more effective than the latter when somatic mutations are present in the JH intronic primer binding site.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

We discuss how common problems arising with multi/many core distributed architectures can he effectively handled through co-design of parallel/distributed programming abstractions and of autonomic management of non-functional concerns. In particular, we demonstrate how restricted patterns (or skeletons) may be efficiently managed by rule-based autonomic managers. We discuss the basic principles underlying pattern+manager co-design, current implementations inspired by this approach and some result achieved with proof-or-concept, prototype.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

FastFlow is a structured parallel programming framework targeting shared memory multi-core architectures. In this paper we introduce a FastFlow extension aimed at supporting also a network of multi-core workstations. The extension supports the execution of FastFlow programs by coordinating-in a structured way-the fine grain parallel activities running on a single workstation. We discuss the design and the implementation of this extension presenting preliminary experimental results validating it on state-of-the-art networked multi-core nodes. © 2013 Springer-Verlag.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

The reverse engineering of a skeleton based programming environment and redesign to distribute management activities of the system and thereby remove a potential single point of failure is considered. The Ore notation is used to facilitate abstraction of the design and analysis of its properties. It is argued that Ore is particularly suited to this role as this type of management is essentially an orchestration activity. The Ore specification of the original version of the system is modified via a series of semi-formally justified derivation steps to obtain a specification of the decentralized management version which is then used as a basis for its implementation. Analysis of the two specifications allows qualitative prediction of the expected performance of the derived version with respect to the original, and this prediction is borne out in practice.