80 resultados para hardware implementation
Resumo:
Numerical Linear Algebra (NLA) kernels are at the heart of all computational problems. These kernels require hardware acceleration for increased throughput. NLA Solvers for dense and sparse matrices differ in the way the matrices are stored and operated upon although they exhibit similar computational properties. While ASIC solutions for NLA Solvers can deliver high performance, they are not scalable, and hence are not commercially viable. In this paper, we show how NLA kernels can be accelerated on REDEFINE, a scalable runtime reconfigurable hardware platform. Compared to a software implementation, Direct Solver (Modified Faddeev's algorithm) on REDEFINE shows a 29X improvement on an average and Iterative Solver (Conjugate Gradient algorithm) shows a 15-20% improvement. We further show that solution on REDEFINE is scalable over larger problem sizes without any notable degradation in performance.
Resumo:
Diffuse optical tomography (DOT) using near-infrared (NIR) light is a promising tool for noninvasive imaging of deep tissue. This technique is capable of quantitative reconstructions of absorption coefficient inhomogeneities of tissue. The motivation for reconstructing the optical property variation is that it, and, in particular, the absorption coefficient variation, can be used to diagnose different metabolic and disease states of tissue. In DOT, like any other medical imaging modality, the aim is to produce a reconstruction with good spatial resolution and accuracy from noisy measurements. We study the performance of a phase array system for detection of optical inhomogeneities in tissue. The light transport through a tissue is diffusive in nature and can be modeled using diffusion equation if the optical parameters of the inhomogeneity are close to the optical properties of the background. The amplitude cancellation method that uses dual out-of-phase sources (phase array) can detect and locate small objects in turbid medium. The inverse problem is solved using model based iterative image reconstruction. Diffusion equation is solved using finite element method for providing the forward model for photon transport. The solution of the forward problem is used for computing the Jacobian and the simultaneous equation is solved using conjugate gradient search. The simulation studies have been carried out and the results show that a phase array system can resolve inhomogeneities with sizes of 5 mm when the absorption coefficient of the inhomogeneity is twice that of the background tissue. To validate this result, a prototype model for performing a dual-source system has been developed. Experiments are carried out by inserting an inhomogeneity of high optical absorption coefficient in an otherwise homogeneous phantom while keeping the scattering coefficient same. The high frequency (100 MHz) modulated dual out-of-phase laser source light is propagated through the phantom. The interference of these sources creates an amplitude null and a phase shift of 180° along a plane between the two sources with a homogeneous object. A solid resin phantom with inhomogeneities simulating the tumor is used in our experiment. The amplitude and phase changes are found to be disturbed by the presence of the inhomogeneity in the object. The experimental data (amplitude and the phase measured at the detector) are used for reconstruction. The results show that the method is able to detect multiple inhomogeneities with sizes of 4 mm. The localization error for a 5 mm inhomogeneity is found to be approximately 1 mm.
Resumo:
The Ulam’s problem is a two person game in which one of the player tries to search, in minimum queries, a number thought by the other player. Classically the problem scales polynomially with the size of the number. The quantum version of the Ulam’s problem has a query complexity that is independent of the dimension of the search space. The experimental implementation of the quantum Ulam’s problem in a Nuclear Magnetic Resonance Information Processor with 3 quantum bits is reported here.
Resumo:
Expanding energy access to the rural population of India presents a critical challenge for its government. The presence of 364 million people without access to electricity and 726 million who rely on biomass for cooking indicate both the failure of past policies and programs, and the need for a radical redesign of the current system. We propose an integrated implementation framework with recommendations for adopting business principles with innovative institutional, regulatory, financing and delivery mechanisms. The framework entails establishment of rural energy access authorities and energy access funds, both at the national and regional levels, to be empowered with enabling regulatory policies, capital resources and the support of multi-stakeholder partnership. These institutions are expected to design, lead, manage and monitor the rural energy interventions. At the other end, trained entrepreneurs would be expected to establish bioenergy-based micro-enterprises that will produce and distribute energy carriers to rural households at an affordable cost. The ESCOs will function as intermediaries between these enterprises and the international carbon market both in aggregating carbon credits and in trading them under CDM. If implemented, such a program could address the challenges of rural energy empowerment by creating access to modern energy carriers and climate change mitigation. (C) 2011 Elsevier Ltd. All rights reserved.
Resumo:
This paper describes the different types of space vector based bus clamped PWM algorithms for three level inverters. A novel bus clamp PWM algorithm for low modulation indices region is also presented. The principles and switching sequences of all the types of bus clamped algorithms for high switching frequency are presented. Synchronized version of the PWM sequences for high power applications where switching frequency is low is also presented. The implementation details on DSP based digital controller and experimental results are presented. The THD of the output waveforms is studied for the entire operating region and is compared with the conventional space vector PWM technique. The bus clamped techniques can be used to reduce the switching losses or to improve the output voltage quality or both.. Different issues dominate depending on the type of application and power rating of the inverters. The results presented in this paper can be used for judicious use of the PWM techniques, which result in improved system efficiency and performance.
Resumo:
In todays era of energy crisis and global warming, hydrogen has been projected as a sustainable alternative to depleting CO2-emitting fossil fuels. However, its deployment as an energy source is impeded by many issues, one of the most important being storage. Chemical hydrogen storage materials, in particular B?N compounds such as ammonia borane, with a potential storage capacity of 19.6 wt?% H2 and 0.145 kg?H?2?L-1, have been intensively studied from the standpoint of addressing the storage issues. Ammonia borane undergoes dehydrogenation through hydrolysis at room temperature in the presence of a catalyst, but its practical implementation is hindered by several problems affecting all of the chemical compounds in the reaction scheme, including ammonia borane, water, borate byproducts, and hydrogen. In this Minireview, we exhaustively survey the state of the art, discuss the fundamental problems, and, where applicable, propose solutions with the prospect of technological applications.
Resumo:
In large flexible software systems, bloat occurs in many forms, causing excess resource utilization and resource bottlenecks. This results in lost throughput and wasted joules. However, mitigating bloat is not easy; efforts are best applied where savings would be substantial. To aid this we develop an analytical model establishing the relation between bottleneck in resources, bloat, performance and power. Analyses with the model places into perspective results from the first experimental study of the power-performance implications of bloat. In the experiments we find that while bloat reduction can provide as much as 40% energy savings, the degree of impact depends on hardware and software characteristics. We confirm predictions from our model with selected results from our experimental study. Our findings show that a software-only view is inadequate when assessing the effects of bloat. The impact of bloat on physical resource usage and power should be understood for a full systems perspective to properly deploy bloat reduction solutions and reap their power-performance benefits.
Resumo:
Video decoders used in emerging applications need to be flexible to handle a large variety of video formats and deliver scalable performance to handle wide variations in workloads. In this paper we propose a unified software and hardware architecture for video decoding to achieve scalable performance with flexibility. The light weight processor tiles and the reconfigurable hardware tiles in our architecture enable software and hardware implementations to co-exist, while a programmable interconnect enables dynamic interconnection of the tiles. Our process network oriented compilation flow achieves realization agnostic application partitioning and enables seamless migration across uniprocessor, multi-processor, semi hardware and full hardware implementations of a video decoder. An application quality of service aware scheduler monitors and controls the operation of the entire system. We prove the concept through a prototype of the architecture on an off-the-shelf FPGA. The FPGA prototype shows a scaling in performance from QCIF to 1080p resolutions in four discrete steps. We also demonstrate that the reconfiguration time is short enough to allow migration from one configuration to the other without any frame loss.
Resumo:
Space vector based PWM strategies for three-level inverters have a broader choice of switching sequences to generate the required reference vector than triangle comparison based PWM techniques. However, space vector based PWM involves numerous steps which are computationally intensive. A simplified algorithm is proposed here, which is shown to reduce the computation time significantly. The developed algorithm is used to implement synchronous and asynchronous conventional space vector PWM, synchronized modified space vector PWM and an asynchronous advanced bus-clamping PWM technique on a low-cost dsPIC digital controller. Experimental results are presented for a comparative evaluation of the performance of different PWM methods.
Resumo:
The linearization of the Drucker-Prager yield criterion associated with an axisymmetric problem has been achieved by simulating a sphere with the truncated icosahedron with 32 faces and 60 vertices. On this basis, a numerical formulation has been proposed for solving an axisymmetric stability problem with the usage of the lower-bound limit analysis, finite elements, and linear optimization. To compare the results, the linearization of the Mohr-Coulomb yield criterion, by replacing the three cones with interior polyhedron, as proposed earlier by Pastor and Turgeman for an axisymmetric problem, has also been implemented. The two formulations have been applied for determining the collapse loads for a circular footing resting on a cohesive-friction material with nonzero unit weight. The computational results are found to be quite convincing. (C) 2013 American Society of Civil Engineers.
Resumo:
We propose energy harvesting technologies and cooperative relaying techniques to power the devices and improve reliability. We propose schemes to (a) maximize the packet reception ratio (PRR) by cooperation and (b) minimize the average packet delay (APD) by cooperation amongst nodes. Our key result and insight from the testbed implementation is about total data transmitted by each relay. A greedy policy that relays more data under a good harvesting condition turns out to be a sub optimal policy. This is because, energy replenishment is a slow process. The optimal scheme offers a low APD and also improves PRR.
Resumo:
Frequency hopping communications, used in the military present significant opportunities for spectrum reuse via the cognitive radio technology. We propose a MAC which incorporates hop instant identification, and supports network discovery and formation, QOS Scheduling and secondary communications. The spectrum sensing algorithm is optimized to deal with the problem of spectral leakage. The algorithms are implemented in a SDR platform based test bed and measurement results are presented.
Resumo:
Each new generation of GPUs vastly increases the resources available to GPGPU programs. GPU programming models (like CUDA) were designed to scale to use these resources. However, we find that CUDA programs actually do not scale to utilize all available resources, with over 30% of resources going unused on average for programs of the Parboil2 suite that we used in our work. Current GPUs therefore allow concurrent execution of kernels to improve utilization. In this work, we study concurrent execution of GPU kernels using multiprogram workloads on current NVIDIA Fermi GPUs. On two-program workloads from the Parboil2 benchmark suite we find concurrent execution is often no better than serialized execution. We identify that the lack of control over resource allocation to kernels is a major serialization bottleneck. We propose transformations that convert CUDA kernels into elastic kernels which permit fine-grained control over their resource usage. We then propose several elastic-kernel aware concurrency policies that offer significantly better performance and concurrency compared to the current CUDA policy. We evaluate our proposals on real hardware using multiprogrammed workloads constructed from benchmarks in the Parboil 2 suite. On average, our proposals increase system throughput (STP) by 1.21x and improve the average normalized turnaround time (ANTT) by 3.73x for two-program workloads when compared to the current CUDA concurrency implementation.
Resumo:
In a typical enterprise WLAN, a station has a choice of multiple access points to associate with. The default association policy is based on metrics such as Re-ceived Signal Strength(RSS), and “link quality” to choose a particular access point among many. Such an approach can lead to unequal load sharing and diminished system performance. We consider the RAT (Rate And Throughput) policy [1] which leads to better system performance. The RAT policy has been implemented on home-grown centralized WLAN controller, ADWISER [2] and we demonstrate that the RAT policy indeed provides a better system performance.
Resumo:
We present a nonequilibrium strong-coupling approach to inhomogeneous systems of ultracold atoms in optical lattices. We demonstrate its application to the Mott-insulating phase of a two-dimensional Fermi-Hubbard model in the presence of a trap potential. Since the theory is formulated self-consistently, the numerical implementation relies on a massively parallel evaluation of the self-energy and the Green's function at each lattice site, employing thousands of CPUs. While the computation of the self-energy is straightforward to parallelize, the evaluation of the Green's function requires the inversion of a large sparse 10(d) x 10(d) matrix, with d > 6. As a crucial ingredient, our solution heavily relies on the smallness of the hopping as compared to the interaction strength and yields a widely scalable realization of a rapidly converging iterative algorithm which evaluates all elements of the Green's function. Results are validated by comparing with the homogeneous case via the local-density approximation. These calculations also show that the local-density approximation is valid in nonequilibrium setups without mass transport.