63 results for design space exploration

in QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast


Relevance: 100.00%

Abstract:

Efficiently exploring exponential-size architectural design spaces with many interacting parameters remains an open problem: the sheer number of experiments required renders detailed simulation intractable. We attack this via an automated approach that builds accurate predictive models. We simulate sampled points, using the results to teach our models the function describing the relationships among design parameters. The models can be queried and are very fast, enabling efficient design tradeoff discovery. We validate our approach via two uniprocessor sensitivity studies, predicting IPC with only 1–2% error. In an experimental study using the approach, training on 1% of a 250K-point CMP design space allows our models to predict performance with only 4–5% error. Our predictive modeling combines well with techniques that reduce the time taken by each simulation experiment, achieving net time savings of three to four orders of magnitude.
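
The sampling-and-surrogate idea described above can be sketched in a few lines; the regressor choice (scikit-learn's GradientBoostingRegressor), the four parameters and the stand-in IPC function below are illustrative assumptions, not the paper's actual models or simulator.

```python
# Sketch: train a surrogate on a small sample of simulated design points,
# then query it cheaply for the rest of the space. Illustrative only.
import numpy as np
from itertools import product
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Hypothetical 4-parameter design space (all values assumed for illustration).
space = np.array(list(product([16, 32, 64, 128],   # L2 size (KB)
                              [1, 2, 4, 8],        # issue width
                              [1, 2, 4],           # memory channels
                              [64, 128, 256])))    # ROB entries

def simulate_ipc(cfg):
    """Stand-in for a detailed simulation run (normally hours per point)."""
    l2, width, chans, rob = cfg
    return 0.4 * np.log2(l2) + 0.3 * width + 0.2 * chans + 0.001 * rob \
           + rng.normal(0, 0.05)

# Simulate only a small random sample of the space, then fit the model.
sample_idx = rng.choice(len(space), size=int(0.05 * len(space)), replace=False)
X_train = space[sample_idx]
y_train = np.array([simulate_ipc(c) for c in X_train])
model = GradientBoostingRegressor().fit(X_train, y_train)

# The cheap surrogate can now be queried for every remaining configuration.
pred_ipc = model.predict(space)
print("predicted-best configuration:", space[np.argmax(pred_ipc)])
```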

Relevance: 100.00%

Abstract:

The design cycle for complex special-purpose computing systems is extremely costly and time-consuming. It involves a multiparametric design space exploration for optimization, followed by design verification. Designers of special-purpose VLSI implementations often need to explore parameters, such as the optimal bitwidth and data representation, through time-consuming Monte Carlo simulations. A prominent example of this simulation-based exploration process is the design of decoders for error-correcting systems, such as the Low-Density Parity-Check (LDPC) codes adopted by modern communication standards, which involves thousands of Monte Carlo runs for each design point. Currently, high-performance computing offers a wide set of acceleration options that range from multicore CPUs to Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs). The exploitation of diverse target architectures is typically associated with developing multiple code versions, often using distinct programming paradigms. In this context, we evaluate the concept of retargeting a single OpenCL program to multiple platforms, thereby significantly reducing design time. A single OpenCL-based parallel kernel is used without modifications or code tuning on multicore CPUs, GPUs, and FPGAs. We use SOpenCL (Silicon to OpenCL), a tool that automatically converts OpenCL kernels to RTL, in order to introduce FPGAs as a potential platform for efficiently executing simulations coded in OpenCL. We use LDPC decoding simulations as a case study. Experimental results were obtained by testing a variety of regular and irregular LDPC codes that range from short/medium-length (e.g., 8,000-bit) to long (e.g., 64,800-bit) DVB-S2 codes. We observe that, depending on the design parameters to be simulated and on the dimension and phase of the design, the GPU or the FPGA may suit different purposes more conveniently, thus providing different acceleration factors over conventional multicore CPUs.
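
The retargeting concept (one unmodified kernel, many platforms) can be illustrated with a minimal pyopencl sketch; the trivial XOR-parity kernel is a placeholder for the paper's LDPC decoder, and an installed OpenCL runtime for each target device is assumed.

```python
# Sketch: run one unmodified OpenCL kernel on every available OpenCL device
# (CPU, GPU, or FPGA runtime). The XOR-parity kernel is a toy placeholder.
import numpy as np
import pyopencl as cl

KERNEL_SRC = """
__kernel void xor_parity(__global const uchar *bits,
                         __global uchar *out,
                         const unsigned int stride)
{
    int gid = get_global_id(0);
    uchar acc = 0;
    for (unsigned int i = 0; i < stride; ++i)
        acc ^= bits[gid * stride + i];
    out[gid] = acc;
}
"""

bits = np.random.randint(0, 2, size=(1024, 8)).astype(np.uint8)
expected = np.bitwise_xor.reduce(bits, axis=1)

for platform in cl.get_platforms():
    for device in platform.get_devices():
        ctx = cl.Context([device])
        queue = cl.CommandQueue(ctx)
        prog = cl.Program(ctx, KERNEL_SRC).build()

        mf = cl.mem_flags
        d_bits = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=bits)
        d_out = cl.Buffer(ctx, mf.WRITE_ONLY, expected.nbytes)

        prog.xor_parity(queue, (bits.shape[0],), None,
                        d_bits, d_out, np.uint32(bits.shape[1]))
        out = np.empty_like(expected)
        cl.enqueue_copy(queue, out, d_out)
        print(device.name, "ok" if np.array_equal(out, expected) else "mismatch")
```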

Relevance: 100.00%

Abstract:

This paper summarizes numerous research activities in high-performance networks and network security processing, and explores technology-related performance constraints, such as the critical performance limitations that semiconductor technologies impose on circuit architectures.

Relevance: 100.00%

Abstract:

Hardware designers and engineers typically need to explore a multi-parametric design space in order to find the best configuration for their designs, using simulations that can take weeks to months to complete. For example, designers of special-purpose chips need to explore parameters such as the optimal bitwidth and data representation. This is the case for the development of complex algorithms such as the Low-Density Parity-Check (LDPC) decoders used in modern communication systems. Currently, high-performance computing offers a wide set of acceleration options that range from multicore CPUs to graphics processing units (GPUs) and FPGAs. Depending on the simulation requirements, the ideal architecture to use can vary. In this paper, we propose a new design flow based on OpenCL, a unified multiplatform programming model, which accelerates LDPC decoding simulations, thereby significantly reducing architectural exploration and design time. OpenCL-based parallel kernels are used without modifications or code tuning on multicore CPUs, GPUs, and FPGAs. We use SOpenCL (Silicon to OpenCL), a tool that automatically converts OpenCL kernels to RTL, for mapping the simulations onto FPGAs. To the best of our knowledge, this is the first time that a single, unmodified OpenCL code is used to target these three different platforms. We show that, depending on the design parameters to be explored in the simulation and on the dimension and phase of the design, the GPU or the FPGA may suit different purposes more conveniently, providing different acceleration factors. For example, although simulations can typically execute more than 3x faster on FPGAs than on GPUs, the overhead of circuit synthesis often outweighs the benefits of FPGA-accelerated execution.
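
The closing observation, that FPGA execution is faster per run but pays a one-off synthesis cost, amounts to a simple break-even calculation; the sketch below uses placeholder timings, not measurements from the paper.

```python
# Sketch: estimate how many simulation runs are needed before FPGA
# acceleration pays back its synthesis overhead. All figures are
# illustrative assumptions, not results from the paper.
def breakeven_runs(gpu_time_per_run_s, fpga_speedup, synthesis_overhead_s):
    """Number of runs after which total FPGA time drops below total GPU time."""
    fpga_time_per_run_s = gpu_time_per_run_s / fpga_speedup
    saved_per_run_s = gpu_time_per_run_s - fpga_time_per_run_s
    return synthesis_overhead_s / saved_per_run_s

# Example: 600 s per GPU run, FPGA 3x faster, 4 h of synthesis -> ~36 runs.
n = breakeven_runs(gpu_time_per_run_s=600, fpga_speedup=3.0,
                   synthesis_overhead_s=4 * 3600)
print(f"FPGA pays off after about {n:.0f} runs")
```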

Relevance: 100.00%

Abstract:

A novel cost-effective and low-latency wormhole router for packet-switched NoC designs, tailored for FPGA, is presented. It has been designed to be scalable at the system level so as to fully exploit the characteristics and constraints of FPGA-based systems, rather than custom ASIC technology. A key feature is that it achieves a low packet propagation latency of only two cycles per hop, including both router pipeline delay and link traversal delay - a significant enhancement over existing FPGA designs - whilst being very competitive in terms of performance and hardware complexity. It can also be configured in various network topologies, including 1-D, 2-D, and 3-D. Detailed design-space exploration has been carried out for a range of scaling parameters, with the results of various design trade-offs being presented and discussed. By taking advantage of abundant built-in reconfigurable logic and routing resources, we have been able to create a new scalable on-chip FPGA-based router that exhibits high dimensionality and connectivity. The proposed architecture can be easily migrated across many FPGA families to provide flexible, robust and cost-effective NoC solutions suitable for the implementation of high-performance FPGA computing systems. © 2011 IEEE.
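
Given the quoted figure of two cycles per hop, a zero-load latency estimate for a packet crossing the network is straightforward; the mesh topology, XY routing, packet length and clock frequency in this sketch are assumptions for illustration only.

```python
# Sketch: zero-load latency estimate for a wormhole-routed 2-D mesh, using
# the 2-cycles-per-hop figure quoted in the abstract. Dimension-ordered (XY)
# routing and no contention are assumed.
def zero_load_latency_cycles(src, dst, cycles_per_hop=2,
                             packet_flits=8, flit_cycles=1):
    hops = abs(dst[0] - src[0]) + abs(dst[1] - src[1])   # Manhattan distance
    head_latency = hops * cycles_per_hop                 # head flit reaches dst
    serialization = (packet_flits - 1) * flit_cycles     # body/tail flits drain
    return head_latency + serialization

# Example: corner-to-corner packet in a 4x4 mesh at an assumed 150 MHz clock.
cycles = zero_load_latency_cycles((0, 0), (3, 3))
print(f"{cycles} cycles ~ {cycles / 150e6 * 1e9:.0f} ns")
```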

Relevance: 100.00%

Abstract:

In this paper, we present a unique cross-layer design framework that allows systematic exploration of the energy-delay-quality trade-offs at the algorithm, architecture and circuit levels of design abstraction for each block of a system. In addition, taking into consideration the interactions between different sub-blocks of a system, it identifies the design solutions that can ensure the least energy at the "right amount of quality" for each sub-block/system under user quality/delay constraints. This is achieved by deriving sensitivity-based design criteria, the balancing of which forms the quantitative relations that can be used early in the system design process to evaluate the energy efficiency of various design options. The proposed framework, when applied to the exploration of the energy-quality design space of the main blocks of a digital camera and a wireless receiver, achieves 58% and 33% energy savings under 41% and 20% error increases, respectively. © 2010 ACM.
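
A toy version of the block-level energy-quality trade-off can be written as a small constrained search; the per-block energy/error tables and the additive-error assumption below are fabricated stand-ins for the sensitivity-based criteria the framework actually derives.

```python
# Sketch: pick the lowest-energy operating point per block subject to a
# system-level error budget. Energy/error values are synthetic stand-ins.
from itertools import product

# (energy [nJ], added error [%]) per candidate setting, for two sub-blocks.
block_a = [(10.0, 0.0), (7.0, 2.0), (5.0, 5.0)]
block_b = [(20.0, 0.0), (14.0, 3.0), (9.0, 8.0)]

error_budget = 8.0   # user quality constraint (errors assumed additive)

best = None
for (ea, qa), (eb, qb) in product(block_a, block_b):
    if qa + qb <= error_budget and (best is None or ea + eb < best[0]):
        best = (ea + eb, qa + qb, (ea, qa), (eb, qb))

energy, error, a_pt, b_pt = best
print(f"min energy {energy} nJ at total error {error}% "
      f"(block A {a_pt}, block B {b_pt})")
```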

Relevance: 90.00%

Abstract:

A methodology which allows a non-specialist to rapidly design silicon wavelet transform cores has been developed. This methodology is based on a generic architecture utilizing time-interleaved coefficients for the wavelet transform filters. The architecture is scalable and has been parameterized in terms of wavelet family, wavelet type, data word length and coefficient word length. The control circuit is designed in such a way that the cores can also be cascaded without any interface glue logic for any desired level of decomposition. This parameterization allows the use of any orthonormal wavelet family, thereby extending the design space for improved transformation from algorithm to silicon. Case studies for stand-alone and cascaded silicon cores, for single- and multi-stage analysis respectively, are reported. The typical design time to produce the silicon layout of a wavelet-based system has been reduced by an order of magnitude. The cores are comparable in area and performance to hand-crafted designs. The designs have been captured in VHDL so they are portable across a range of foundries and are also applicable to FPGA and PLD implementations.
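
The coefficient word-length parameterization can be mimicked in software by quantizing the filter taps of one analysis stage; the Daubechies-2 coefficients are standard, while the word lengths, test signal and downsampling convention are arbitrary illustrative choices.

```python
# Sketch: one analysis stage of an orthonormal wavelet filter bank with the
# coefficient word length as a parameter, mirroring the kind of bitwidth
# exploration the methodology supports in hardware.
import numpy as np

DB2_LOWPASS = np.array([0.48296291, 0.83651630, 0.22414387, -0.12940952])

def quantize(coeffs, frac_bits):
    """Round coefficients to a fixed-point grid with `frac_bits` fractional bits."""
    scale = 2 ** frac_bits
    return np.round(coeffs * scale) / scale

def analysis_stage(signal, lowpass):
    highpass = lowpass[::-1] * np.array([1, -1, 1, -1])   # QMF relation
    approx = np.convolve(signal, lowpass)[1::2]            # filter + decimate
    detail = np.convolve(signal, highpass)[1::2]
    return approx, detail

x = np.sin(np.linspace(0, 8 * np.pi, 256))
a_ref, _ = analysis_stage(x, DB2_LOWPASS)
for bits in (6, 10, 14):
    a, _ = analysis_stage(x, quantize(DB2_LOWPASS, bits))
    print(f"{bits}-bit coefficients: max approximation error "
          f"{np.max(np.abs(a - a_ref)):.2e}")
```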

Relevance: 90.00%

Abstract:

Architects use cycle-by-cycle simulation to evaluate design choices and understand tradeoffs and interactions among design parameters. Efficiently exploring exponential-size design spaces with many interacting parameters remains an open problem: the sheer number of experiments renders detailed simulation intractable. We attack this problem via an automated approach that builds accurate, confident predictive design-space models. We simulate sampled points, using the results to teach our models the function describing relationships among design parameters. The models produce highly accurate performance estimates for other points in the space, can be queried to predict performance impacts of architectural changes, and are very fast compared to simulation, enabling efficient discovery of tradeoffs among parameters in different regions. We validate our approach via sensitivity studies on memory hierarchy and CPU design spaces: our models generally predict IPC with only 1-2% error and reduce required simulation by two orders of magnitude. We also show the efficacy of our technique for exploring chip multiprocessor (CMP) design spaces: when trained on a 1% sample drawn from a CMP design space with 250K points and up to 55x performance swings among different system configurations, our models predict performance with only 4-5% error on average. Our approach combines with techniques to reduce time per simulation, achieving net time savings of three to four orders of magnitude. Copyright © 2006 ACM.
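
The central trade-off, prediction error versus the fraction of the space that must be simulated, can be sketched with a synthetic design space; the performance function and the random-forest stand-in below are assumptions, not the authors' models.

```python
# Sketch: how prediction error varies with the fraction of the design space
# that is actually simulated. The synthetic "performance" function and the
# regressor choice are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
space = rng.integers(1, 9, size=(5000, 6)).astype(float)   # 6 assumed parameters
weights = np.array([0.5, 0.3, 0.2, 0.4, 0.1, 0.25])
perf = space @ weights + 0.1 * space[:, 0] * space[:, 1]   # interacting terms

for frac in (0.01, 0.02, 0.05):
    idx = rng.choice(len(space), size=int(frac * len(space)), replace=False)
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(space[idx], perf[idx])
    rel_err = np.abs(model.predict(space) - perf) / perf
    print(f"train on {frac:.0%}: median relative error {np.median(rel_err):.1%}")
```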

Relevance: 90.00%

Abstract:

Low-velocity impact damage can drastically reduce the residual mechanical properties of a composite structure even when the impact damage is barely visible. The ability to computationally predict the extent of damage and the compression-after-impact (CAI) strength of a composite structure can potentially lead to the exploration of a larger design space without incurring significant development time and cost penalties. A three-dimensional damage model, to predict both low-velocity impact damage and the CAI strength of composite laminates, has been developed and implemented as a user material subroutine in the commercial finite element package ABAQUS/Explicit. The virtual tests were executed in two steps, one to capture the impact damage and the other to predict the CAI strength. The observed intra-laminar damage features, delamination damage area, and residual strength are discussed. It is shown that the predicted results for impact damage and CAI strength correlated well with experimental testing.

Relevance: 90.00%

Abstract:

Low-velocity impact damage can drastically reduce the residual strength of a composite structure even when the damage is barely visible. The ability to computationally predict the extent of damage and the compression-after-impact (CAI) strength of a composite structure can potentially lead to the exploration of a larger design space without incurring significant time and cost penalties. A high-fidelity three-dimensional composite damage model, to predict both low-velocity impact damage and the CAI strength of composite laminates, has been developed and implemented as a user material subroutine in the commercial finite element package ABAQUS/Explicit. The intralaminar damage model component accounts for physically-based tensile and compressive failure mechanisms of the fibres and matrix when subjected to a three-dimensional stress state. Cohesive behaviour was employed to model the interlaminar failure between plies, with a bi-linear traction–separation law capturing damage onset and subsequent damage evolution. The virtual tests, set up in ABAQUS/Explicit, were executed in three steps: one to capture the impact damage, the second to stabilize the specimen by imposing the new boundary conditions required for compression testing, and the third to predict the CAI strength. The observed intralaminar damage features, delamination damage area, and residual strength are discussed. It is shown that the predicted results for impact damage and CAI strength correlated well with experimental testing, without the need for model calibration which is often required with other damage models.
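
The bi-linear traction-separation law mentioned for the interlaminar behaviour has a compact closed form; the stiffness, strength and fracture-energy values in this sketch are arbitrary illustrative inputs, not the paper's material data.

```python
# Sketch: bi-linear cohesive traction-separation law with damage onset at the
# interface strength and linear softening until the fracture energy is
# dissipated. Material values are illustrative, not from the paper.
import numpy as np

def bilinear_traction(delta, K=1.0e5, t0=60.0, Gc=0.5):
    """Traction [MPa] for separation delta [mm].

    K  -- initial (penalty) stiffness [MPa/mm]
    t0 -- interface strength, traction at damage onset [MPa]
    Gc -- critical energy release rate [N/mm]
    """
    delta0 = t0 / K               # separation at damage onset
    delta_f = 2.0 * Gc / t0       # separation at complete failure
    if delta <= delta0:
        return K * delta                                  # undamaged branch
    if delta >= delta_f:
        return 0.0                                        # fully debonded
    d = delta_f * (delta - delta0) / (delta * (delta_f - delta0))  # damage var
    return (1.0 - d) * K * delta                          # linear softening

for s in np.linspace(0.0, 0.02, 9):
    print(f"delta = {s:.4f} mm -> t = {bilinear_traction(s):7.2f} MPa")
```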

Relevance: 80.00%

Abstract:

Dapivirine mucoadhesive gels and freeze-dried tablets were prepared using a 3 x 3 x 2 factorial design. An artificial neural network (ANN) with multi-layer perceptron architecture was used to investigate the effect of hydroxypropyl-methylcellulose (HPMC):polyvinylpyrrolidone (PVP) ratio (X1), mucoadhesive concentration (X2) and delivery system (gel or freeze-dried mucoadhesive tablet, X3) on the response variables: cumulative release of dapivirine at 24 h (Q24), mucoadhesive force (Fmax) and zero-rate viscosity. Optimisation was performed by minimising the error between the experimental and ANN-predicted values of the responses. The method was validated using check-point analysis by preparing six formulations of gels and their corresponding freeze-dried tablets randomly selected from within the design space of the contour plots. Experimental and predicted values of the response variables were not significantly different (p > 0.05, two-sided paired t-test). For gels, Q24 values were higher than those of their corresponding freeze-dried tablets. Fmax values for freeze-dried tablets were significantly different (2-4 times greater, p < 0.05, two-sided paired t-test) compared to equivalent gels. Freeze-dried tablets having lower values for the X1 component and higher values for the X2 component offered the best compromise between effective dapivirine release, mucoadhesion and viscosity, such that increased vaginal residence time was likely to be achieved. © 2009 Elsevier B.V. All rights reserved.
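
The surrogate step (fit an ANN to the factorial-design results, then query it at check points) can be sketched with scikit-learn's MLPRegressor standing in for whatever ANN software was actually used; the factor levels and response values below are fabricated placeholders.

```python
# Sketch: fit a small MLP to responses over a 3 x 3 x 2 factorial design and
# predict a response at an untested check point. All numbers are fabricated.
import numpy as np
from itertools import product
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Factors: X1 = HPMC:PVP ratio, X2 = mucoadhesive concentration (%),
# X3 = delivery system (0 = gel, 1 = freeze-dried tablet). Levels assumed.
X = np.array(list(product([0.25, 1.0, 4.0], [1.0, 2.0, 3.0], [0, 1])), dtype=float)
rng = np.random.default_rng(0)
q24 = 80 - 8 * np.log2(X[:, 0] + 1) - 5 * X[:, 1] - 10 * X[:, 2] \
      + rng.normal(0, 1.5, len(X))     # synthetic cumulative release at 24 h (%)

scaler = StandardScaler().fit(X)
ann = MLPRegressor(hidden_layer_sizes=(8,), solver="lbfgs",
                   max_iter=5000, random_state=0)
ann.fit(scaler.transform(X), q24)

check_point = np.array([[1.0, 2.5, 1.0]])   # untested factor combination
print("predicted Q24:", ann.predict(scaler.transform(check_point))[0])
```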

Relevance: 80.00%

Abstract:

Oscillating wave surge converters (OWSCs) are a class of wave power technology that exploits the enhanced horizontal fluid particle movement of waves in the nearshore coastal zone with water depths of 10–20 m. OWSCs predominantly oscillate horizontally in surge, as opposed to the majority of wave devices, which oscillate vertically in heave and are usually deployed in deeper water. The characteristics of the nearshore wave resource are described along with the hydrodynamics of OWSCs. The variables in the OWSC design space are discussed, together with a presentation of some of their effects on capture width, frequency bandwidth response and power take-off characteristics. There are notable differences between the different OWSCs under development worldwide, and these are highlighted. The final section of the paper describes Aquamarine Power's 315 kW Oyster 1 prototype, which was deployed at the European Marine Energy Centre in August 2009. Its place in the OWSC design space is described along with the practical experience gained. This has led to the design of Oyster 2, which was deployed in August 2011. It is concluded that nearshore OWSCs are serious contenders in the mix of wave power technologies. The nearshore wave climate has a narrower directional spread than the offshore climate, the largest waves are filtered out, and the exploitable resource is typically only 10–20% less in 10 m depth compared with 50 m depth. Regarding the devices, a key conclusion is that OWSCs such as Oyster primarily respond in the working frequency range to the horizontal fluid acceleration; Oyster is not a drag device responding to horizontal fluid velocity. The hydrodynamics of Oyster is dominated by inertia, with added inertia being a very significant contributor. It is unlikely that individual flap modules will exceed 1 MW in installed capacity owing to wave resource, hydrodynamic and economic constraints. Generating stations will be made up of line arrays of flaps with communal secondary power conversion every 5–10 units.