246 results for parallel architectures


Relevance:

20.00%

Publisher:

Abstract:

In order to carry out high-precision machining of large, thin-walled aerospace structural components with complex surfaces, this paper proposes a novel parallel kinematic machine (PKM) and formulates a semi-analytical theoretical stiffness model that accounts for gravitational effects and is verified by stiffness experiments. From the viewpoint of topology, the novel PKM consists of two substructures, a redundant and an overconstrained parallel mechanism, connected by two interlinked revolute joints. The theoretical stiffness model is established from the virtual work principle and the deformation superposition principle, after mapping the stiffness models of the substructures from joint space to operational space through Jacobian matrices and accounting for the deformation contributions of the interlinked revolute joints to both substructures. Meanwhile, the component gravities are treated as external payloads exerted on the end reference point of the PKM by means of the static equivalence principle. This approach is validated by comparing theoretical with experimental stiffness values in the same configurations, which also indicates that an equivalent gravity load can describe the actual distributed gravities with acceptable accuracy. Finally, on the basis of the verified stiffness model, the stiffness distributions of the novel PKM are illustrated and the contributions of component gravities to its stiffness are discussed.
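The joint-space-to-operational-space stiffness mapping described above can be sketched numerically. This is a minimal illustration with a made-up 2-DOF Jacobian and joint stiffnesses, not the paper's model:

```python
import numpy as np

# Hypothetical joint-space stiffness matrix for a 2-DOF substructure (illustrative units).
K_joint = np.diag([5.0e4, 8.0e4])

# Hypothetical Jacobian relating joint rates to end-effector rates at one configuration.
J = np.array([[0.8, 0.1],
              [0.2, 0.9]])

# Map stiffness from joint space to operational space: K_op = J^{-T} K_joint J^{-1}.
J_inv = np.linalg.inv(J)
K_op = J_inv.T @ K_joint @ J_inv

# Treat component gravity as an equivalent external wrench at the end reference
# point and recover the resulting deflection: delta = K_op^{-1} f.
f_gravity = np.array([0.0, -150.0])       # N, equivalent gravity load (made up)
delta = np.linalg.solve(K_op, f_gravity)  # end-point deflection under gravity
print(K_op)
print(delta)
```

The same mapping would be applied per substructure before superposing deformations, per the abstract's procedure.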

Relevance:

20.00%

Publisher:

Abstract:

The design cycle for complex special-purpose computing systems is extremely costly and time-consuming. It involves a multiparametric design-space exploration for optimization, followed by design verification. Designers of special-purpose VLSI implementations often need to explore parameters, such as the optimal bitwidth and data representation, through time-consuming Monte Carlo simulations. A prominent example of this simulation-based exploration process is the design of decoders for error-correcting systems, such as the Low-Density Parity-Check (LDPC) codes adopted by modern communication standards, which involves thousands of Monte Carlo runs for each design point. Currently, high-performance computing offers a wide set of acceleration options, ranging from multicore CPUs to Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs). Exploiting diverse target architectures typically means developing multiple code versions, often using distinct programming paradigms. In this context, we evaluate the concept of retargeting a single OpenCL program to multiple platforms, thereby significantly reducing design time. A single OpenCL-based parallel kernel is used without modifications or code tuning on multicore CPUs, GPUs, and FPGAs. We use SOpenCL (Silicon to OpenCL), a tool that automatically converts OpenCL kernels to RTL, to introduce FPGAs as a potential platform for efficiently executing simulations coded in OpenCL. We use LDPC decoding simulations as a case study. Experimental results were obtained by testing a variety of regular and irregular LDPC codes, ranging from short/medium (e.g., 8,000-bit) to long (e.g., 64,800-bit) DVB-S2 codes. We observe that, depending on the design parameters to be simulated and on the dimension and phase of the design, either the GPU or the FPGA may be the better fit, providing different acceleration factors over conventional multicore CPUs.
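The per-design-point Monte Carlo exploration described above can be sketched as follows. This toy run evaluates one design parameter, the LLR quantization bitwidth, on an uncoded BPSK/AWGN link; a real study would run a full LDPC decoder per design point, and all numbers here are illustrative:

```python
import numpy as np

def monte_carlo_ber(num_bits, snr_db, llr_bits, seed):
    """One Monte Carlo design point: BPSK over AWGN with channel LLRs
    quantized to a signed fixed-point format of `llr_bits` bits."""
    rng = np.random.default_rng(seed)
    snr = 10.0 ** (snr_db / 10.0)
    sigma = np.sqrt(1.0 / (2.0 * snr))
    bits = rng.integers(0, 2, num_bits)
    tx = 1.0 - 2.0 * bits                      # BPSK mapping: 0 -> +1, 1 -> -1
    rx = tx + sigma * rng.normal(size=num_bits)
    llr = 2.0 * rx / sigma**2                  # exact channel LLRs
    step = 16.0 / (1 << llr_bits)              # quantizer step for range [-8, 8)
    llr_q = np.clip(np.round(llr / step) * step, -8.0, 8.0 - step)
    return float(np.mean((llr_q < 0) != (bits == 1)))

# Sweep one design parameter (LLR bitwidth) with a common noise seed.
results = {b: monte_carlo_ber(200_000, 4.0, b, seed=0) for b in (3, 5)}
print(results)
```

Each design point is independent, which is what makes this exploration embarrassingly parallel and a natural fit for the CPU/GPU/FPGA retargeting the paper evaluates.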

Relevance:

20.00%

Publisher:

Abstract:

Power, and consequently energy, has recently attained first-class system resource status, on par with conventional metrics such as CPU time. To reduce energy consumption, many hardware- and OS-level solutions have been investigated. However, application-level information, which can provide the system with valuable insights unattainable otherwise, has been considered in only a handful of cases. We introduce OpenMPE, an extension to OpenMP designed for power management. OpenMP is the de facto standard for programming parallel shared-memory systems, but does not yet provide any support for power control. Our extension exposes (i) per-region multi-objective optimization hints and (ii) application-level adaptation parameters, in order to create energy-saving opportunities for the whole system stack. We have implemented OpenMPE support in a compiler and runtime system, and empirically evaluated its performance on two architectures, mobile and desktop. Our results demonstrate the effectiveness of OpenMPE, with geometric-mean energy savings of 15% across 9 use cases while maintaining full quality of service.
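The kind of per-region decision such an extension enables can be sketched as follows. This is a minimal illustration with hypothetical configuration names and profiling numbers, not the OpenMPE API: the runtime picks the lowest-energy configuration that still meets a region's quality-of-service bound.

```python
# Hypothetical profiling data per parallel region: (config, runtime_ms, energy_mJ).
measurements = [
    ("4-threads-high-freq", 10.0, 120.0),
    ("4-threads-low-freq",  16.0,  90.0),
    ("2-threads-low-freq",  24.0,  70.0),
]

def choose_config(measurements, qos_deadline_ms):
    """Least-energy configuration among those meeting the QoS deadline."""
    feasible = [m for m in measurements if m[1] <= qos_deadline_ms]
    if not feasible:                    # nothing meets QoS: fall back to fastest
        return min(measurements, key=lambda m: m[1])[0]
    return min(feasible, key=lambda m: m[2])[0]

print(choose_config(measurements, qos_deadline_ms=20.0))
```

A relaxed deadline lets the runtime trade speed for energy; a tight one forces the fast, power-hungry configuration.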

Relevance:

20.00%

Publisher:

Abstract:

DRAM technology faces density and power challenges in increasing capacity because of the limitations of physical cell design. To overcome these limitations, system designers are exploring alternative solutions that combine DRAM with emerging NVRAM technologies. Previous work on heterogeneous memories focuses mainly on two system designs: PCache, a hierarchical, inclusive memory system, and HRank, a flat, non-inclusive memory system. We demonstrate that neither of these designs can universally achieve high performance and energy efficiency across a suite of HPC workloads. In this work, we investigate the impact of a number of multilevel memory designs on the performance, power, and energy consumption of applications. To achieve this goal, and to overcome the limited number of available tools for studying heterogeneous memories, we created HMsim, an infrastructure that enables n-level heterogeneous memory studies by leveraging existing memory simulators. We then propose HpMC, a new memory controller design that combines the best aspects of existing management policies to improve performance and energy. Our energy-aware memory management system dynamically switches between PCache and HRank based on the temporal locality of applications. Our results show that HpMC reduces energy consumption by 13% to 45% compared to PCache and HRank, while providing the same bandwidth and higher capacity than a conventional DRAM system.
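The policy-switching idea can be sketched as follows. This is a simplified stand-in, assuming temporal locality is estimated from an LRU stack hit rate over recent accesses; the threshold and traces are made up:

```python
from collections import OrderedDict

def lru_hit_rate(trace, capacity):
    """Fraction of accesses that hit an LRU-managed set of `capacity` pages."""
    lru, hits = OrderedDict(), 0
    for page in trace:
        if page in lru:
            hits += 1
            lru.move_to_end(page)           # refresh recency on a hit
        else:
            if len(lru) >= capacity:
                lru.popitem(last=False)     # evict least-recently-used page
            lru[page] = True
    return hits / len(trace)

def choose_policy(trace, capacity, threshold=0.5):
    """High temporal locality favors the cache-like policy; low favors flat."""
    return "PCache" if lru_hit_rate(trace, capacity) >= threshold else "HRank"

hot_trace = [0, 1, 2, 0, 1, 2, 0, 1, 2]     # heavy reuse of a few pages
scan_trace = list(range(9))                 # streaming access, no reuse
print(choose_policy(hot_trace, capacity=4), choose_policy(scan_trace, capacity=4))
```

A real controller would compute such an estimate over epochs in hardware and amortize the cost of switching, but the decision structure is the same.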

Relevance:

20.00%

Publisher:

Abstract:

Multiple-table lookup architectures in Software Defined Networking (SDN) open the door for exciting new network applications. The development of the OpenFlow protocol supported the SDN paradigm; however, the first version of the protocol specified a single-table lookup model, with the associated constraints on flow entry numbers and search capabilities. With the introduction of multiple-table lookup in OpenFlow v1.1, flexible and efficient search to support SDN application innovation became possible. However, implementing multiple-table lookup in hardware while meeting high performance requirements is non-trivial. One possible approach involves the use of multi-dimensional lookup algorithms, where high lookup performance can be achieved by using embedded memory for flow entry storage. A detailed study of OpenFlow flow filters for multi-dimensional lookup is presented in this paper. Based on a proposed multiple-table lookup architecture, the memory consumption and update performance of parallel single-field searches are evaluated. The results demonstrate an efficient multi-table lookup implementation with minimal memory usage.
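The parallel single-field search evaluated above can be sketched in software: each field has its own lookup table mapping a value to a bitmap of candidate rules, and ANDing the per-field bitmaps yields the matching entries. The field names and rules here are illustrative:

```python
# Illustrative flow table: each rule matches on two header fields.
rules = [
    {"ip_dst": "10.0.0.1", "tcp_dport": 80},
    {"ip_dst": "10.0.0.1", "tcp_dport": 443},
    {"ip_dst": "10.0.0.2", "tcp_dport": 80},
]

def build_index(rules, field):
    """Per-field index: field value -> bitmap of rule indices."""
    index = {}
    for i, r in enumerate(rules):
        index[r[field]] = index.get(r[field], 0) | (1 << i)
    return index

indexes = {f: build_index(rules, f) for f in ("ip_dst", "tcp_dport")}

def lookup(packet):
    """AND the bitmaps from each (conceptually parallel) single-field search."""
    bitmap = ~0
    for field, index in indexes.items():
        bitmap &= index.get(packet[field], 0)
    return [i for i in range(len(rules)) if bitmap >> i & 1]

print(lookup({"ip_dst": "10.0.0.1", "tcp_dport": 443}))
```

In hardware, each per-field search would run in its own embedded memory bank in the same cycle, which is what makes the combined lookup fast.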

Relevance:

20.00%

Publisher:

Abstract:

Large integer multiplication is a major performance bottleneck in fully homomorphic encryption (FHE) schemes over the integers. In this paper, two optimised multiplier architectures for large integer multiplication are proposed. The first is a low-latency hardware architecture for an integer-FFT multiplier. Second, low Hamming weight (LHW) parameters are exploited in a novel hardware architecture for large integer multiplication in integer-based FHE schemes. The proposed architectures are implemented, verified and compared on the Xilinx Virtex-7 FPGA platform. Finally, the proposed implementations are employed to evaluate the large multiplication in the encryption step of FHE over the integers. The analysis shows a speed improvement factor of up to 26.2 for the low-latency design compared to the corresponding original integer-based FHE software implementation. When the proposed LHW architecture is combined with the low-latency integer-FFT accelerator to evaluate a single FHE encryption operation, the performance results show that a speed improvement by a factor of approximately 130 is possible.
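The integer-FFT multiplication idea can be sketched in software: split each operand into small digits, convolve the digit vectors via FFT, and propagate carries. A hardware design would use a number-theoretic transform in fixed point; this floating-point version is only a functional illustration:

```python
import numpy as np

def fft_multiply(a, b, base_bits=8):
    """Multiply two non-negative integers via FFT-based polynomial multiplication."""
    if a == 0 or b == 0:
        return 0

    def digits(x):
        # Coefficients of x as a polynomial in base 2**base_bits.
        out = []
        while x:
            out.append(x & ((1 << base_bits) - 1))
            x >>= base_bits
        return out

    da, db = digits(a), digits(b)
    n = 1
    while n < len(da) + len(db):           # FFT length: next power of two
        n <<= 1
    # Pointwise product in the frequency domain = convolution of digit vectors.
    conv = np.rint(np.fft.irfft(np.fft.rfft(da, n) * np.fft.rfft(db, n), n))
    # Carry propagation turns the convolution back into an integer.
    result, shift = 0, 0
    for c in conv:
        result += int(c) << shift
        shift += base_bits
    return result

x = (1 << 1024) - 12345
y = (1 << 1000) + 67890
print(fft_multiply(x, y) == x * y)
```

With 8-bit digits the convolution coefficients stay far below the double-precision rounding limit, so `np.rint` recovers exact integers; an NTT avoids this precision concern entirely.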

Relevance:

20.00%

Publisher:

Abstract:

As a recently developed parallel kinematic machine (PKM), the Exechon has attracted intensive attention from both academia and industry due to its potential for high performance. Nevertheless, the dynamic behaviors of the Exechon PKM have not been thoroughly investigated because of its structural and kinematic complexity. To identify its dynamic characteristics, this paper proposes an elastodynamic model based on the substructure synthesis technique. The Exechon PKM is divided into a moving platform subsystem, a fixed base subsystem and three limb subsystems according to its structural features. Differential equations of motion for the limb subsystem are derived through finite element (FE) formulations by modeling the complex limb structure as a spatial beam with corresponding geometric cross-sections, while the revolute, universal, and spherical joints are simplified into virtual lumped springs with equivalent stiffnesses and masses at their geometric centers. Differential equations of motion for the moving platform are derived with Newton's second law after treating the platform as a rigid body, justified by its comparatively high rigidity. After introducing the deformation compatibility conditions between the platform and the limbs, the governing differential equations of motion for the Exechon PKM are obtained. The solution of the characteristic equations yields the natural frequencies and corresponding mode shapes of the PKM at any typical configuration. To predict the dynamic behaviors quickly, an algorithm is proposed to numerically compute the distributions of natural frequencies throughout the workspace. Simulation results reveal that the lower natural frequencies are strongly position-dependent and distributed axisymmetrically owing to the structural symmetry of the limbs. Finally, a parametric analysis is carried out to identify the effects of structural, dimensional, and stiffness parameters on the system's dynamic characteristics, with the purpose of providing useful information for optimal design and performance improvement of the Exechon PKM. The elastodynamic modeling methodology and dynamic analysis procedure can be readily extended to other overconstrained PKMs with minor modifications.
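The final step of such an elastodynamic model, extracting natural frequencies from the assembled stiffness and mass matrices, can be sketched as follows (toy 3-DOF matrices, not the Exechon model):

```python
import numpy as np

# Toy assembled matrices for a 3-DOF system (illustrative values only).
K = np.array([[ 2.0e6, -1.0e6,  0.0   ],
              [-1.0e6,  2.0e6, -1.0e6 ],
              [ 0.0,   -1.0e6,  2.0e6 ]])   # stiffness, N/m
M = np.diag([10.0, 12.0, 10.0])             # lumped mass, kg

# Natural frequencies solve the generalized eigenproblem K x = w^2 M x.
# With diagonal M, reduce to a standard symmetric problem via M^{-1/2} K M^{-1/2}.
m_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(M)))
A = m_inv_sqrt @ K @ m_inv_sqrt
w2 = np.linalg.eigvalsh(A)                  # squared angular frequencies, ascending
freqs_hz = np.sqrt(w2) / (2.0 * np.pi)      # natural frequencies in Hz
print(freqs_hz)
```

Sweeping the configuration (which changes K through the Jacobians) and repeating this solve is how the frequency distributions over the workspace would be mapped.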

Relevance:

20.00%

Publisher:

Abstract:

In this paper, we investigate the impact of faulty memory bit-cells on the performance of LDPC and Turbo channel decoders, based on realistic memory failure models. Our study examines the inherent resilience of such codes to memory faults affecting the decoding process. We develop two mitigation mechanisms that reduce the impact of memory faults rather than correcting every single error. We show that protecting only a few bit-cells is sufficient to deal with high defect rates. In addition, we show how the use of repair iterations specifically helps mitigate the impact of faults that occur inside the decoder itself.
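The selective-protection idea can be illustrated on a toy decoder (a length-7 repetition code, not the paper's LDPC/Turbo decoders): hardening only the sign bit of each stored soft value, one bit-cell per word, removes the decoding errors that random memory faults would otherwise cause in this toy setting. All rates and word formats here are made up.

```python
import random

def decode_with_faults(defect_rate, protect_sign, trials, rng):
    """Toy repetition decoder whose soft values sit in faulty 6-bit memory."""
    errors = 0
    for _ in range(trials):
        # All-zero codeword received as seven small positive soft values
        # (6-bit two's complement; the sign is bit 5).
        words = [rng.randint(2, 4) for _ in range(7)]
        total = 0
        for w in words:
            stored = w & 0x3F
            for bit in range(6):
                if bit == 5 and protect_sign:
                    continue               # hardened cell: fault cannot strike
                if rng.random() < defect_rate:
                    stored ^= 1 << bit     # memory bit-cell fault
            total += stored - 64 if stored & 0x20 else stored
        if total < 0:                      # decode by the sign of the soft sum
            errors += 1
    return errors / trials

rng = random.Random(1)
unprotected = decode_with_faults(0.05, False, 2000, rng)
protected = decode_with_faults(0.05, True, 2000, rng)
print(unprotected, protected)
```

Flips in the magnitude bits only perturb confidence, while a sign-bit flip negates the value outright, which is why so little protection buys so much.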

Relevance:

20.00%

Publisher:

Abstract:

The increasing design complexity associated with modern Field Programmable Gate Arrays (FPGAs) has prompted the emergence of 'soft' programmable processors, which attempt to replace at least part of the custom circuit design problem with a problem of programming parallel processors. Despite substantial advances in this technology, its performance and resource efficiency for computationally complex operations remains in doubt. In this paper we present the first recorded implementation of a softcore Fast Fourier Transform (FFT) on Xilinx Virtex FPGA technology. By employing a streaming processing architecture, we show how it is possible to achieve architectures offering 1.1 GSamples/s throughput and up to 19 times speed-up over the Xilinx Radix-2 FFT dedicated circuit at comparable cost.
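The algorithm family implemented by such a softcore, the radix-2 decimation-in-time FFT, can be sketched as follows; a streaming hardware design would pipeline these butterfly stages rather than loop over them:

```python
import numpy as np

def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT (power-of-two length)."""
    x = np.asarray(x, dtype=complex)
    n = x.size
    assert n and n & (n - 1) == 0, "length must be a power of two"
    # Bit-reversal permutation puts inputs into butterfly order.
    idx = np.zeros(n, dtype=int)
    for i in range(n):
        rev, k = 0, i
        for _ in range(n.bit_length() - 1):
            rev = (rev << 1) | (k & 1)
            k >>= 1
        idx[i] = rev
    x = x[idx]
    # log2(n) stages of butterflies with twiddle factors.
    size = 2
    while size <= n:
        half = size // 2
        tw = np.exp(-2j * np.pi * np.arange(half) / size)
        for start in range(0, n, size):
            a = x[start:start + half].copy()          # copy: avoid aliasing
            b = x[start + half:start + size] * tw
            x[start:start + half] = a + b
            x[start + half:start + size] = a - b
        size *= 2
    return x

sig = np.random.default_rng(2).normal(size=64)
print(np.allclose(fft_radix2(sig), np.fft.fft(sig)))
```

In the streaming softcore, each of the log2(n) stages maps to a hardware unit, so one sample can enter and leave the pipeline per clock cycle.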

Relevance:

20.00%

Publisher:

Abstract:

Routine molecular diagnostic modalities are unable to confidently detect low-frequency mutations (<5-15%) that may indicate response to targeted therapies. We confirm the presence of a low-frequency NRAS mutation in a rectal cancer patient using massively parallel sequencing, where previous Sanger sequencing proved negative and Q-PCR testing inconclusive. There is increasing evidence that such low-frequency mutations may confer resistance to anti-EGFR therapy. In view of the negative/inconclusive Sanger sequencing and Q-PCR results for NRAS mutations in this KRAS wild-type rectal case, the diagnostic biopsy and four distinct subpopulations of cells in the resection specimen after conventional chemo/radiotherapy were sequenced on the Ion Torrent PGM. DNA was derived from FFPE rectal cancer tissue, and amplicons were produced using the Cancer Hotspot Panel V2 and sequenced using semiconductor technology. NRAS mutations were observed at varying frequencies in the patient biopsy (12.2%) and in all four subpopulations of cells in the resection, with an average frequency of 7.3% (lowest 2.6%). The NGS results also provided the mutational status of 49 other genes that may have prognostic or predictive value, including KRAS and PIK3CA. NGS technology has been proposed for diagnostics because of its capability to generate results for large panels of clinically meaningful genes in a cost-effective manner. This case illustrates another potential advantage of the technology: its use for detecting low-frequency mutations that may influence therapeutic decisions in cancer treatment.
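A back-of-envelope binomial model shows why deep massively parallel sequencing can call a variant at the 7.3% average frequency reported above while shallower readouts struggle. The read depths and the caller's supporting-read threshold below are hypothetical, and sequencing error is ignored:

```python
from math import comb

def prob_at_least(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p): chance of seeing >= k variant reads
    at depth n when a fraction p of sequenced molecules carry the variant."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

f = 0.073          # average NRAS variant fraction reported above
min_reads = 10     # hypothetical caller threshold of supporting reads
for depth in (50, 500):
    print(depth, round(prob_at_least(depth, f, min_reads), 4))
```

At 50x the expected ~3.7 supporting reads rarely clear the threshold, while at 500x the expected ~37 reads make detection nearly certain, which is the quantitative intuition behind the case report.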

Relevance:

20.00%

Publisher:

Abstract:

In the reinsurance market, the risks that natural catastrophes pose to portfolios of properties must be quantified, so that they can be priced and insurance offered. The analysis of such risks at a portfolio level requires a simulation of up to 800,000 trials with an average of 1,000 catastrophic events per trial. This is sufficient to capture risk for a global multi-peril reinsurance portfolio covering a range of perils including earthquake, hurricane, tornado, hail, severe thunderstorm, wind storm, storm surge, riverine flooding, and wildfire. Such simulations are both computation- and data-intensive, making the application of high-performance computing techniques desirable.

In this paper, we explore the design and implementation of portfolio risk analysis on both multi-core and many-core computing platforms. Given a portfolio of property catastrophe insurance treaties, key risk measures, such as probable maximum loss, are computed by taking both primary and secondary uncertainties into account. Primary uncertainty is associated with whether or not an event occurs in a simulated year, while secondary uncertainty captures the uncertainty in the level of loss, due to the use of simplified physical models and limitations in the available data. A combination of fast lookup structures, multi-threading and careful hand-tuning of numerical operations is required to achieve good performance. Experimental results are reported for multi-core processors and for systems using NVIDIA graphics processing units and Intel Xeon Phi many-core accelerators.
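The core simulation described above can be sketched as follows, scaled well down from the 800,000-trial, 1000-event workload for illustration; the event-count and per-event loss distributions are made-up stand-ins for real catastrophe model output:

```python
import numpy as np

rng = np.random.default_rng(7)
num_trials = 10_000   # simulated years (scaled down for illustration)

# Primary uncertainty: how many events occur in each simulated year.
events_per_trial = rng.poisson(lam=100, size=num_trials)

# Secondary uncertainty: loss per event drawn from a heavy-tailed distribution;
# the annual loss is the sum over that year's events.
annual_loss = np.array([
    rng.lognormal(mean=10.0, sigma=2.0, size=n).sum() for n in events_per_trial
])

def pml(losses, return_period_years):
    """Probable maximum loss: the annual loss exceeded on average once every
    `return_period_years` years, i.e. a high quantile of the trial losses."""
    return float(np.quantile(losses, 1.0 - 1.0 / return_period_years))

for rp in (100, 250):
    print(rp, pml(annual_loss, rp))
```

The trials are independent, which is why the full-scale workload parallelizes so naturally across GPU and many-core accelerators.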