82 results for Graphics hardware


Relevance:

20.00%

Publisher:

Abstract:

Radio-frequency (RF) impairments in the transceiver hardware of communication systems (e.g., phase noise (PN), high power amplifier (HPA) nonlinearities, or in-phase/quadrature-phase (I/Q) imbalance) can severely degrade the performance of traditional multiple-input multiple-output (MIMO) systems. Although calibration algorithms can partially compensate for these impairments, the residual distortion still has a substantial impact; nevertheless, most prior work has not analyzed this type of distortion. In this paper, we investigate the impact of residual transceiver hardware impairments on MIMO system performance. In particular, we consider a transceiver impairment model that has been experimentally validated, and we derive analytical ergodic capacity expressions, both exact and asymptotic in the high signal-to-noise ratio (SNR) regime. We demonstrate that the capacity saturates in the high-SNR regime, creating a finite capacity ceiling. We also present a linear approximation of the ergodic capacity in the low-SNR regime and show that the impairments have only a second-order impact on the capacity there. Furthermore, we analyze the effect of transceiver impairments on large-scale MIMO systems; interestingly, we prove that if the number of antennas is increased at one side only, the capacity behaves similarly to the finite-dimensional case. By contrast, if the number of antennas increases on both sides with a fixed ratio, the capacity ceiling vanishes; the impairments then cause only a bounded capacity offset compared to the case of ideal transceiver hardware.
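
As an illustration of the mechanism behind the capacity ceiling, a commonly used formulation of residual impairments (stated here as an assumption; the experimentally validated model used in the paper may differ in its details) treats the aggregate distortion as additive Gaussian noise whose power scales with the transmit power:

\[
  \mathbf{y} = \mathbf{H}\,(\mathbf{x} + \boldsymbol{\eta}) + \mathbf{n},
  \qquad
  \boldsymbol{\eta} \sim \mathcal{CN}\!\big(\mathbf{0},\ \kappa^{2}\,\mathrm{diag}(q_1,\dots,q_{N_t})\big),
\]

where \(q_i\) is the power of transmit antenna \(i\) and \(\kappa\) is the error vector magnitude of the residual impairments. For a single spatial stream with matched filtering this yields a signal-to-noise-and-distortion ratio

\[
  \mathrm{SNDR} = \frac{q\,\lVert\mathbf{h}\rVert^{2}}{\kappa^{2}\,q\,\lVert\mathbf{h}\rVert^{2} + \sigma^{2}}
  \;\longrightarrow\; \frac{1}{\kappa^{2}} \quad (q \to \infty),
\]

so the capacity saturates at \(\log_2\!\big(1 + 1/\kappa^{2}\big)\), which is the finite ceiling referred to above.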

Relevance:

20.00%

Publisher:

Abstract:

Massive multiple-input multiple-output (MIMO) systems are cellular networks whose base stations (BSs) are equipped with unconventionally many antennas. Such large antenna arrays offer huge spatial degrees of freedom for transmission optimization; in particular, large signal gains, resilience to imperfect channel knowledge, and low inter-user interference are all achievable without extensive inter-cell coordination. The key to cost-efficient deployment of large arrays is the use of hardware-constrained BSs with low-cost antenna elements, as opposed to today's expensive and power-hungry BSs. Low-cost transceivers are prone to hardware imperfections, but it has been conjectured that the excess degrees of freedom of massive MIMO bring robustness to such imperfections. We herein prove this claim for an uplink channel with multiplicative phase drift, additive distortion noise, and noise amplification. Specifically, we derive a closed-form scaling law that shows how fast these imperfections can increase with the number of antennas.
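
To make the three impairment types concrete, a representative per-antenna uplink signal model (an illustrative assumption; the paper's notation and exact definitions may differ) is

\[
  y_m(t) = e^{\,j\phi_m(t)}\, h_m\, x(t) + \eta_m(t) + \nu_m(t), \qquad m = 1,\dots,M,
\]

where \(\phi_m(t)\) is a random-walk phase-drift process modelling oscillator phase noise, \(\eta_m(t)\) is additive distortion noise whose variance is proportional to the received signal power, and \(\nu_m(t) \sim \mathcal{CN}(0,\,\xi\sigma^{2})\) with \(\xi \ge 1\) captures the noise amplification of a low-quality receiver chain.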

Relevance:

20.00%

Publisher:

Abstract:

Hardware designers and engineers typically need to explore a multi-parametric design space in order to find the best configuration for their designs, using simulations that can take weeks to months to complete. For example, designers of special-purpose chips need to explore parameters such as the optimal bitwidth and data representation. This is the case for the development of complex algorithms such as the Low-Density Parity-Check (LDPC) decoders used in modern communication systems. Currently, high-performance computing offers a wide set of acceleration options that range from multicore CPUs to graphics processing units (GPUs) and FPGAs. Depending on the simulation requirements, the ideal architecture can vary. In this paper we propose a new design flow based on OpenCL, a unified multiplatform programming model, which accelerates LDPC decoding simulations and thereby significantly reduces architectural exploration and design time. OpenCL-based parallel kernels are used without modifications or code tuning on multicore CPUs, GPUs, and FPGAs. We use SOpenCL (Silicon to OpenCL), a tool that automatically converts OpenCL kernels to RTL, for mapping the simulations onto FPGAs. To the best of our knowledge, this is the first time that a single, unmodified OpenCL code has been used to target these three different platforms. We show that, depending on the design parameters to be explored in the simulation and on the dimension and phase of the design, the GPU or the FPGA may be the more convenient choice, providing different acceleration factors. For example, although simulations typically execute more than 3x faster on FPGAs than on GPUs, the overhead of circuit synthesis often outweighs the benefits of FPGA-accelerated execution.
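
For context, the inner loop that dominates such LDPC decoding simulations is the check-node update of belief propagation; a plain-C sketch of the widely used min-sum variant is given below (an illustrative reference only, not the kernel from the paper; the OpenCL kernels parallelise this kind of loop over check nodes and Monte Carlo samples).

#include <math.h>

/* Min-sum check-node update for one check node of degree dc.
 * in[k]  : LLR arriving from the k-th connected variable node
 * out[k] : LLR returned to that variable node, i.e. the sign product
 *          and minimum magnitude over all *other* incoming edges.   */
static void check_node_min_sum(const float *in, float *out, int dc)
{
    /* Track the two smallest magnitudes and the overall sign product,
     * so every "all but k" result is available in O(1).             */
    float min1 = INFINITY, min2 = INFINITY;
    int   min_idx = -1, sign_prod = 1;

    for (int k = 0; k < dc; k++) {
        float mag = fabsf(in[k]);
        if (in[k] < 0.0f)
            sign_prod = -sign_prod;
        if (mag < min1) {
            min2 = min1;
            min1 = mag;
            min_idx = k;
        } else if (mag < min2) {
            min2 = mag;
        }
    }

    for (int k = 0; k < dc; k++) {
        float mag  = (k == min_idx) ? min2 : min1;              /* exclude edge k */
        int   sign = (in[k] < 0.0f) ? -sign_prod : sign_prod;   /* exclude edge k */
        out[k] = (float)sign * mag;
    }
}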

Relevance:

20.00%

Publisher:

Abstract:

The overall aim of the work presented in this paper has been to develop Montgomery modular multiplication architectures suitable for implementation on modern reconfigurable hardware. Accordingly, novel high-radix systolic-array Montgomery multiplier designs are presented; we believe that their inherent regular structure and absence of global interconnect make them well suited to implementation on modern FPGAs. Unlike previous approaches, each processing element (PE) comprises both an adder and a multiplier. The inclusion of a multiplier in the PE avoids the need to pre-compute or store any multiples of the operands. It also allows very high-radix implementations to be realised, further reducing the number of clock cycles per modular multiplication while maintaining a competitive critical delay. For demonstrative purposes, 512-bit and 1024-bit FPGA implementations using radices of 2^8 and 2^16 are presented. The resulting throughput rates are the fastest reported to date.
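
To illustrate the arithmetic being mapped to hardware, the fragment below shows one word-level Montgomery reduction step in C (a software reference sketch, not the authors' RTL). In a high-radix systolic array, each processing element performs essentially this multiply-accumulate-and-shift on one digit of the long operands per cycle, which is why a PE needs both a multiplier and an adder.

#include <stdint.h>

/* Word-level Montgomery multiplication with radix R = 2^32:
 * returns a*b*R^(-1) mod n for an odd 32-bit modulus n, where
 * n0inv = -n^(-1) mod 2^32 is precomputed once per modulus.    */
static uint32_t montmul32(uint32_t a, uint32_t b, uint32_t n, uint32_t n0inv)
{
    uint64_t t     = (uint64_t)a * b;        /* full product                     */
    uint32_t m     = (uint32_t)t * n0inv;    /* quotient digit: t*(-n^-1) mod R  */
    uint64_t mn    = (uint64_t)m * n;
    uint64_t lo    = t + mn;                 /* low 64 bits; low 32 are zero     */
    uint64_t carry = (lo < t);               /* carry out of the 64-bit addition */
    uint64_t u     = (lo >> 32) + (carry << 32);
    return (u >= n) ? (uint32_t)(u - n) : (uint32_t)u;
}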

Relevance:

20.00%

Publisher:

Abstract:

This paper presents a hardware solution for network flow processing at full line rate. An advanced memory architecture using DDR3 SDRAMs is proposed to overcome flow-matching limitations in packet throughput, in the number of supported flows, and in the number of packet header fields (or tuples) used for flow identification. The described architecture has been prototyped to accommodate 8 million flows and tested on an FPGA platform, achieving a minimum of 70 million lookups per second. This is sufficient to process Internet traffic flows at 40 Gigabit Ethernet line rate.
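
As an illustration of the flow-identification step (the abstract does not detail the exact matching scheme or memory layout, so the following is a generic software sketch under that caveat), a flow is typically identified by an exact match on a tuple of header fields, e.g. the classic 5-tuple, hashed into a table of flow records:

#include <stdint.h>
#include <string.h>

/* Classic 5-tuple used for exact-match flow identification. */
struct flow_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
    uint8_t  pad[3];                      /* fixed 16-byte key */
};

struct flow_entry {
    struct flow_key key;
    int             valid;
    uint64_t        packets;              /* per-flow state    */
};

#define TABLE_SIZE (1u << 23)             /* ~8M buckets, matching the 8 million flows above */
static struct flow_entry table[TABLE_SIZE];

/* FNV-1a hash over the key bytes (an illustrative choice only). */
static uint32_t hash_key(const struct flow_key *k)
{
    const uint8_t *p = (const uint8_t *)k;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < sizeof *k; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h & (TABLE_SIZE - 1);
}

/* Look up (or insert) the flow record for a packet's key.  A hardware
 * design would instead issue a burst read of one DDR3-resident bucket
 * and compare the stored keys in parallel.                             */
static struct flow_entry *flow_lookup(const struct flow_key *k)
{
    uint32_t idx = hash_key(k);
    while (table[idx].valid && memcmp(&table[idx].key, k, sizeof *k) != 0)
        idx = (idx + 1) & (TABLE_SIZE - 1);  /* linear probing */
    if (!table[idx].valid) {
        table[idx].key   = *k;
        table[idx].valid = 1;
    }
    table[idx].packets++;
    return &table[idx];
}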

Relevance:

20.00%

Publisher:

Abstract:

In this paper, we investigate the end-to-end performance of dual-hop proactive decode-and-forward relaying networks with Nth-best relay selection in the presence of two practical deleterious effects: i) hardware impairments and ii) co-channel interference. In particular, we derive new exact and asymptotic closed-form expressions for the outage probability and average channel capacity of the Nth-best partial and opportunistic relay selection schemes over Rayleigh fading channels, and we provide insightful discussions. It is shown that, when the system cannot select the best relay for cooperation, the partial relay selection scheme outperforms the opportunistic method under the same co-channel interference (CCI). In addition, without CCI but under the effect of hardware impairments, both selection strategies are shown to have the same asymptotic channel capacity. Monte Carlo simulations are presented to corroborate our analysis.
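
For intuition on why impairments dominate at high SNR, a commonly used aggregate-impairment model for dual-hop decode-and-forward relaying (an assumption here; the paper's model, which also includes co-channel interference, is richer) gives a per-hop signal-to-noise-and-distortion ratio of the form

\[
  \mathrm{SNDR}_i = \frac{\gamma_i}{\kappa_i^{2}\,\gamma_i + 1}, \qquad i \in \{1,2\},
\]

where \(\gamma_i\) is the nominal SNR and \(\kappa_i\) the impairment level of hop \(i\). The end-to-end performance is limited by \(\min_i \mathrm{SNDR}_i\), which saturates at \(1/\kappa_i^{2}\) as \(\gamma_i \to \infty\) irrespective of which relay is selected; this is consistent with the two selection strategies sharing the same asymptotic channel capacity in the absence of CCI.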

Relevance:

20.00%

Publisher:

Abstract:

The design cycle for complex special-purpose computing systems is extremely costly and time-consuming. It involves a multi-parametric design space exploration for optimization, followed by design verification. Designers of special-purpose VLSI implementations often need to explore parameters, such as the optimal bitwidth and data representation, through time-consuming Monte Carlo simulations. A prominent example of this simulation-based exploration process is the design of decoders for error-correcting systems, such as the Low-Density Parity-Check (LDPC) codes adopted by modern communication standards, which involves thousands of Monte Carlo runs for each design point. Currently, high-performance computing offers a wide set of acceleration options that range from multicore CPUs to Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs). Exploiting such diverse target architectures typically means developing multiple code versions, often using distinct programming paradigms. In this context, we evaluate the concept of retargeting a single OpenCL program to multiple platforms, thereby significantly reducing design time. A single OpenCL-based parallel kernel is used without modifications or code tuning on multicore CPUs, GPUs, and FPGAs. We use SOpenCL (Silicon to OpenCL), a tool that automatically converts OpenCL kernels to RTL, in order to introduce FPGAs as a potential platform for efficiently executing simulations coded in OpenCL. We use LDPC decoding simulations as a case study. Experimental results were obtained by testing a variety of regular and irregular LDPC codes, ranging from short/medium-length (e.g., 8,000-bit) to long (e.g., 64,800-bit) DVB-S2 codes. We observe that, depending on the design parameters to be simulated and on the dimension and phase of the design, the GPU or the FPGA may be the more convenient choice, providing different acceleration factors over conventional multicore CPUs.
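
The key property exploited here is that the OpenCL host API lets the very same kernel source be built for different device types; the C fragment below sketches this (illustrative only, with error handling omitted; a real host would iterate over all platforms to find one exposing the requested device type, and in the FPGA case SOpenCL instead converts the kernel to RTL offline).

#include <CL/cl.h>

/* Build one unmodified kernel source string for the requested device
 * type (e.g. CL_DEVICE_TYPE_CPU or CL_DEVICE_TYPE_GPU); retargeting is
 * a matter of device selection, not of rewriting the kernel.          */
static cl_program build_for(cl_device_type type, const char *src)
{
    cl_platform_id platform;
    cl_device_id   device;
    cl_int         err;

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, type, 1, &device, NULL);

    cl_context context = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    cl_program program = clCreateProgramWithSource(context, 1, &src, NULL, &err);
    clBuildProgram(program, 1, &device, "", NULL, NULL);
    return program;
}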

Relevance:

20.00%

Publisher:

Abstract:

Distributed massive multiple-input multiple-output (MIMO) combines the array gain of coherent MIMO processing with the proximity gains of distributed antenna setups. In this paper, we analyze how transceiver hardware impairments affect the downlink with maximum ratio transmission. We derive closed-form spectral efficiency expressions and study their asymptotic behavior as the number of antennas increases. We prove a scaling law on the hardware quality, which reveals that massive MIMO is resilient to additive distortions, whereas multiplicative phase noise is a limiting factor. It is also shown that it is better to have a separate oscillator at each antenna than a common oscillator per BS.
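
For reference, maximum ratio transmission simply matches the precoding vector to the channel estimate \(\hat{\mathbf{h}}_k\) of user \(k\) (up to the conjugation convention used),

\[
  \mathbf{w}_k = \frac{\hat{\mathbf{h}}_k}{\lVert\hat{\mathbf{h}}_k\rVert},
\]

so its array gain grows with the number of antennas, while hardware distortion and phase noise enter through the channel estimate and the received-signal model analysed in the paper.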

Relevance:

20.00%

Publisher:

Abstract:


This paper describes how urban agriculture differs from conventional agriculture not only in the way it engages with the technologies of growing, but also in the choice of crop and the way these crops are brought to market. The authors propose a new model for understanding these new relationships, analogous to a systems view of information technology: Hardware-Software-Interface.
The first component of the system is hardware, the technological component of the agricultural system. Technology is often thought of as equipment, but its linguistic root is the Greek 'techne', meaning 'know-how'. Urban agriculture has to engage new technologies, ones that deal with a scale of operation and a context different from those of rural agriculture. Often the scale is very small and the soils are polluted. The technology may therefore be technical, such as aquaponic systems, or soil-based, such as allotments, window boxes, or permaculture. The choice of method does not necessarily determine the crop produced or its efficiency; this is linked to the biotic component that is added to the hardware, which is seen as the 'software'.
The software of the system comprises its ecological parts. These produce the crop, which may or may not be determined by the technology used. For example, a hydroponic system could produce a range of crops, or even fish or edible flowers. Software choice can be driven by ideological preferences, such as permaculture, where companion planting is used to reduce disease and pests, or by economic factors, such as the local market at a particular time of year. The monetary value of the 'software' is determined by the market. Small, locally produced crops are obviously unlikely to compete against intensively produced global products; however, their value might be measured locally in different ways, and they might be sold on a different market. This leads to the final part of the analogy: the interface.
The interface is the link between the system and the consumer. In traditional agriculture, there is only a tenuous link between the producer of asparagus in Peru and the consumer in Europe; in fact, very little of the money spent by the consumer ever reaches the grower. Most of it is spent on refrigeration, transport, and profits for agents and supermarket chains. Local or hyper-local agriculture needs to bypass these systems and connect more directly to the consumer: this is the interface. In hyper-localised systems, effectiveness is often more important than efficiency, and direct links between producer and consumer create new economies.

Relevance:

20.00%

Publisher:

Abstract:

The research presented here is the product of a practice-based process that primarily generates knowledge through collaboration and exchange in performance situations. This collaboration and exchange with various musicians over a period of five years constitutes a body of practice that is reflected upon here. The paper focuses on non-instructional graphic scores and presents insights based on performances of works by the author. We address how composition processes are revealed in graphic scores by looking at the conditions of decision making at the point of preparing a performance. We argue that three key elements are at play in the interpretation of these types of graphic scores: performance practice, mapping, and musical form. By reflecting particularly on the work Cipher Series (Rebelo, 2010), we offer insights into strategies for approaching the performance of graphic scores that go beyond symbolic codification.

Relevance:

20.00%

Publisher:

Abstract:

There is demand for an easily programmable, high-performance image processing platform based on FPGAs. In previous work, a novel high-performance processor, IPPro, was developed, and a Histogram of Oriented Gradients (HOG) algorithm study was undertaken on a Xilinx Zynq platform. Here, we identify and explore a number of mapping strategies to improve processing efficiency for soft cores, along with a number of options for creating a division coprocessor. This is demonstrated on a revised high-definition HOG implementation on a Zynq platform, resulting in a performance of 328 fps, which represents a 146% speed improvement over the original realization and a tenfold reduction in energy.
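
For orientation, the per-pixel core of the HOG stage mapped onto such soft cores is gradient computation followed by orientation binning into per-cell histograms; a compact C sketch of this standard formulation is given below (illustrative of the algorithm only, not the IPPro firmware). The subsequent block-normalisation step (not shown) is division-heavy, which is presumably where a division coprocessor pays off.

#include <math.h>

#define CELL   8    /* 8x8-pixel cells (standard HOG setting)          */
#define NBINS  9    /* 9 unsigned orientation bins over 0..180 degrees */

/* Accumulate HOG cell histograms for a grayscale image whose width and
 * height are assumed to be multiples of CELL.
 * img  : width*height pixels, row-major
 * hist : (width/CELL)*(height/CELL)*NBINS bins, zero-initialised       */
static void hog_cells(const unsigned char *img, int width, int height,
                      float *hist)
{
    int cells_x = width / CELL;

    for (int y = 1; y < height - 1; y++) {
        for (int x = 1; x < width - 1; x++) {
            /* Centred [-1, 0, 1] gradients. */
            float gx = (float)img[y * width + (x + 1)]
                     - (float)img[y * width + (x - 1)];
            float gy = (float)img[(y + 1) * width + x]
                     - (float)img[(y - 1) * width + x];

            float mag   = sqrtf(gx * gx + gy * gy);
            float angle = atan2f(gy, gx) * (180.0f / 3.14159265f);
            if (angle < 0.0f)
                angle += 180.0f;                 /* unsigned orientation */

            int bin  = (int)(angle / (180.0f / NBINS)) % NBINS;
            int cell = (y / CELL) * cells_x + (x / CELL);
            hist[cell * NBINS + bin] += mag;     /* magnitude-weighted vote */
        }
    }
}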

Relevance:

20.00%

Publisher:

Abstract:

How can GPU acceleration be obtained as a service in a cluster? This question has become increasingly significant, since installing GPUs on all nodes of a cluster is inefficient. The research reported in this paper addresses it by employing rCUDA (remote CUDA), a framework that facilitates Acceleration-as-a-Service (AaaS) by letting the nodes of a cluster request acceleration from a set of remote GPUs on demand. The rCUDA framework exploits virtualisation and ensures that multiple nodes can share the same GPU. In this paper we test the feasibility of the rCUDA framework on a real-world application from the financial risk industry that can benefit from AaaS in a production setting. The results confirm the feasibility of rCUDA and highlight that it achieves performance similar to CUDA, provides consistent results, and, more importantly, allows a single application to benefit from all the GPUs available in the cluster without losing efficiency.
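
The practical appeal of rCUDA is that the application keeps issuing ordinary CUDA runtime calls; the host-code sketch below is plain CUDA runtime C (illustrative only, with no rCUDA-specific API involved), and under rCUDA the same calls are transparently forwarded to GPUs hosted on other nodes, as configured through the framework.

#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Under rCUDA, the devices reported here may live on remote nodes
     * of the cluster; the application code itself does not change.    */
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    printf("visible CUDA devices: %d\n", ndev);

    /* An ordinary allocate / copy / free sequence, local or remote.   */
    const size_t n = 1 << 20;
    float *h = (float *)malloc(n * sizeof(float));
    float *d = NULL;
    for (size_t i = 0; i < n; i++)
        h[i] = (float)i;

    cudaMalloc((void **)&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
    /* ... kernel launches for the risk computation would go here ...  */
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d);
    free(h);
    return 0;
}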