140 results for hardware deskribapen lengoaiak


Relevance:

10.00% 10.00%

Publisher:

Abstract:

In this work, a Fortran code is first developed for three-dimensional linear elastostatics using constant boundary elements; the code is based on a MATLAB code developed earlier by the author. Next, the code is parallelized using BLACS, MPI, and ScaLAPACK. The parallelized code is then used to demonstrate the usefulness of the Boundary Element Method (BEM) for the real-time computational simulation of biological organs, focusing on the speed and accuracy offered by BEM. A computer cluster is used in this part of the work. The commercial software package ANSYS is used to obtain the 'exact' solution against which the BEM solution is compared; analytical solutions, wherever available, are also used to establish the accuracy of BEM. A pig liver is the biological organ considered. Next, instead of the computer cluster, a Graphics Processing Unit (GPU) is used as the parallel hardware. Results indicate that BEM is an interesting choice for the simulation of biological organs. Although the use of BEM for the simulation of biological organs is not new, the results presented in this study are not found elsewhere in the literature. Also, a serial MATLAB code and both serial and parallel versions of a Fortran code, which can solve three-dimensional (3D) linear elastostatic problems using constant boundary elements, are provided as supplementary files that can be freely downloaded.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The twin demands of energy efficiency and higher performance on DRAM are highly emphasized in multicore architectures. A variety of schemes have been proposed to address either the latency or the energy consumption of DRAMs. These schemes typically require non-trivial hardware changes and end up improving latency at the cost of energy or vice versa. One specific DRAM performance problem in multicores is that interleaved accesses from different cores can degrade row-buffer locality. In this paper, based on the temporal and spatial locality characteristics of memory accesses, we propose a reorganization of the existing single large row-buffer in a DRAM bank into multiple sub-row buffers (MSRB). This reorganization not only improves row hit rates, and hence the average memory latency, but also brings down the energy consumed by the DRAM. The first major contribution of this work is proposing such a reorganization without requiring any significant changes to the existing widely accepted DRAM specifications. Our proposed reorganization improves weighted speedup by 35.8%, 14.5% and 21.6% in quad-, eight- and sixteen-core workloads, along with 42%, 28% and 31% reductions in DRAM energy. The proposed MSRB organization enables opportunities for the management of multiple row-buffers at the memory controller level. Because the memory controller is aware of the behaviour of individual cores, we can implement coordinated buffer allocation schemes for different cores that take program behaviour into account. We demonstrate two such schemes, namely Fairness Oriented Allocation and Performance Oriented Allocation, which show the flexibility that memory controllers can now exploit in our MSRB organization to improve overall performance and/or fairness. Further, the MSRB organization enables additional opportunities for DRAM intra-bank parallelism and selective early precharging of the LRU row-buffer to further improve memory access latencies. These two optimizations together provide an additional 5.9% performance improvement.
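The row-buffer locality problem described in this abstract can be illustrated with a toy model. The sketch below is not the authors' simulator: the trace format `(row, segment)`, the function names, and the plain LRU replacement policy are all assumptions made for illustration. It only shows why interleaved accesses from two cores thrash a single row buffer while several smaller sub-row buffers can keep both working sets open.

```python
from collections import OrderedDict


def single_row_buffer_hits(trace):
    """Count hits when the bank has one buffer holding an entire row."""
    open_row = None
    hits = 0
    for row, _segment in trace:
        if row == open_row:
            hits += 1
        else:
            open_row = row  # row conflict: the old row is replaced
    return hits


def sub_row_buffer_hits(trace, num_buffers):
    """Count hits with `num_buffers` sub-row buffers (assumed LRU policy)."""
    buffers = OrderedDict()  # keys are (row, segment), oldest first
    hits = 0
    for access in trace:
        if access in buffers:
            hits += 1
            buffers.move_to_end(access)  # mark as most recently used
        else:
            if len(buffers) >= num_buffers:
                buffers.popitem(last=False)  # evict least recently used
            buffers[access] = True
    return hits


# Hypothetical trace: two cores alternate between two different rows.
trace = [(1, 0), (2, 0)] * 8
single_hits = single_row_buffer_hits(trace)
msrb_hits = sub_row_buffer_hits(trace, num_buffers=4)
```

In this contrived trace the single buffer never hits, because each access closes the row the other core needs, whereas the sub-row model misses only on the first touch of each segment. Real workloads, sub-row sizes, and allocation policies would change the numbers considerably.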

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In the underlay mode of cognitive radio, secondary users can transmit when the primary is transmitting, but under tight interference constraints, which limit the secondary system performance. Antenna selection (AS)-based multiple antenna techniques, which require less hardware and yet exploit spatial diversity, help improve the secondary system performance. In this paper, we develop the optimal transmit AS rule that minimizes the symbol error probability (SEP) of an average interference-constrained secondary system that operates in the underlay mode. We show that the optimal rule is a non-linear function of the power gains of the channels from secondary transmit antenna to primary receiver and secondary transmit antenna to secondary receive antenna. The optimal rule is different from the several ad hoc rules that have been proposed in the literature. We also propose a closed-form, tractable variant of the optimal rule and analyze its SEP. Several results are presented to compare the performance of the closed-form rule with the ad hoc rules, and interesting inter-relationships among them are brought out.
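For orientation, the kind of simple selection rules that such an optimal rule is compared against can be sketched as follows. These three functions are illustrative assumptions, not the optimal rule or the closed-form variant from the paper, and the abstract does not say which ad hoc rules were benchmarked. Here `h[k]` is taken to be the power gain from secondary transmit antenna `k` to the secondary receiver and `g[k]` the power gain from antenna `k` to the primary receiver.

```python
def select_max_signal(h, g):
    """Pick the antenna with the strongest link to the secondary receiver."""
    return max(range(len(h)), key=lambda k: h[k])


def select_min_interference(h, g):
    """Pick the antenna with the weakest link to the primary receiver."""
    return min(range(len(g)), key=lambda k: g[k])


def select_max_ratio(h, g):
    """Pick the antenna with the largest signal-to-interference gain ratio."""
    return max(range(len(h)), key=lambda k: h[k] / g[k])


# Hypothetical channel power gains for three transmit antennas.
h = [0.9, 0.2, 0.7]  # secondary transmitter -> secondary receiver
g = [0.8, 0.1, 0.2]  # secondary transmitter -> primary receiver

choices = (
    select_max_signal(h, g),
    select_min_interference(h, g),
    select_max_ratio(h, g),
)
```

With these example gains the three rules each pick a different antenna, which illustrates why the choice of selection criterion matters once both channels have to be taken into account.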

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Transmit antenna selection (AS) is a popular, low hardware complexity technique that improves the performance of an underlay cognitive radio system, in which a secondary transmitter can transmit when the primary is on but under tight constraints on the interference it causes to the primary. The underlay interference constraint fundamentally changes the criterion used to select the antenna because the channel gains to the secondary and primary receivers must be both taken into account. We develop a novel and optimal joint AS and transmit power adaptation policy that minimizes a Chernoff upper bound on the symbol error probability (SEP) at the secondary receiver subject to an average transmit power constraint and an average primary interference constraint. Explicit expressions for the optimal antenna and power are provided in terms of the channel gains to the primary and secondary receivers. The SEP of the optimal policy is at least an order of magnitude lower than that achieved by several ad hoc selection rules proposed in the literature and even the optimal antenna selection rule for the case where the transmit power is either zero or a fixed value.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Single receive antenna selection (AS) allows single-input single-output (SISO) systems to retain the diversity benefits of multiple antennas with minimum hardware costs. We propose a single receive AS method for time-varying channels, in which practical limitations imposed by next-generation wireless standards such as training, packetization and antenna switching time are taken into account. The proposed method utilizes low-complexity subspace projection techniques spanned by discrete prolate spheroidal (DPS) sequences. It only uses Doppler bandwidth knowledge, and does not need detailed correlation knowledge. Results show that the proposed AS method outperforms ideal conventional SISO systems with perfect CSI but no AS at the receiver and AS using the conventional Fourier estimation/prediction method. A closed-form expression for the symbol error probability (SEP) of phase-shift keying (MPSK) with symbol-by-symbol receive AS is derived.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Moore's Law has driven the semiconductor revolution enabling over four decades of scaling in frequency, size, complexity, and power. However, the limits of physics are preventing further scaling of speed, forcing a paradigm shift towards multicore computing and parallelization. In effect, the system is taking over the role that the single CPU was playing: high-speed signals running through chips but also packages and boards connect ever more complex systems. High-speed signals making their way through the entire system cause new challenges in the design of computing hardware. Inductance, phase shifts and velocity of light effects, material resonances, and wave behavior become not only prevalent but need to be calculated accurately and rapidly to enable short design cycle times. In essence, to continue scaling with Moore's Law requires the incorporation of Maxwell's equations in the design process. Incorporating Maxwell's equations into the design flow is only possible through the combined power that new algorithms, parallelization and high-speed computing provide. At the same time, incorporation of Maxwell-based models into circuit and system-level simulation presents a massive accuracy, passivity, and scalability challenge. In this tutorial, we navigate through the often confusing terminology and concepts behind field solvers, show how advances in field solvers enable integration into EDA flows, present novel methods for model generation and passivity assurance in large systems, and demonstrate the power of cloud computing in enabling the next generation of scalable Maxwell solvers and the next generation of Moore's Law scaling of systems. We intend to show the truly symbiotic growing relationship between Maxwell and Moore!

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This paper presents a Powerline Communication (PLC) method for single-phase interfaced inverters in domestic microgrids. The PLC method is based on the injection of a repeating sequence of a specific harmonic, which is then modulated on the fundamental component of the grid current supplied by the inverters to the microgrid. The power flow and information exchange are simultaneously accomplished by the grid-interacting inverters based on current programmed vector control, hence there is no need for dedicated hardware. Simulation results are shown for inter-inverter communication under different operating conditions to demonstrate the viability of the method. These simulations have been experimentally validated, and the corresponding results are also presented in the paper.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In this paper we propose a fully parallel 64K point radix-4^4 FFT processor. The radix-4^4 parallel unrolled architecture uses a novel radix-4 butterfly unit which takes all four inputs in parallel and can selectively produce one out of the four outputs. The radix-4^4 block can take all 256 inputs in parallel and can use the select control signals to generate one out of the 256 outputs. The resultant 64K point FFT processor shows significant reduction in intermediate memory but with increased hardware complexity. Compared to the state-of-the-art implementation [5], our architecture shows reduced latency with comparable throughput and area. The 64K point FFT architecture was synthesized using a 130 nm CMOS technology, which resulted in a throughput of 1.4 GSPS and latency of 47.7 µs with a maximum clock frequency of 350 MHz. When compared to [5], the latency is reduced by 303 µs with 50.8% reduction in area.
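The idea of a radix-4 butterfly that takes four inputs in parallel and emits one selected output can be sketched in software. This is a behavioural illustration only, assuming the butterfly computes a plain 4-point DFT without inter-stage twiddle factors; the function name and the `select` argument are hypothetical and do not come from the paper's hardware design.

```python
# 4-point DFT twiddle factors W4^m = exp(-2*pi*j*m/4) for m = 0..3.
W4 = (1, -1j, -1, 1j)


def radix4_butterfly(x0, x1, x2, x3, select):
    """Return output number `select` (0..3) of a 4-point DFT of the inputs."""
    inputs = (x0, x1, x2, x3)
    return sum(x * W4[(n * select) % 4] for n, x in enumerate(inputs))


# All four outputs for one example input vector, produced one at a time.
outputs = [radix4_butterfly(1, 2, 3, 4, k) for k in range(4)]
```

Because the twiddle factors of a 4-point DFT are only ±1 and ±j, each output needs additions and sign or real/imaginary swaps but no general multiplier, which is one common reason radix-4 structures are attractive in hardware.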

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The performance of an underlay cognitive radio (CR) system, which can transmit when the primary is on, is curtailed by tight constraints on the interference it can cause to the primary receiver. Transmit antenna selection (AS) improves the performance of underlay CR by exploiting spatial diversity but with less hardware. However, the selected antenna and its transmit power now both depend on the channel gains to the secondary and primary receivers. We develop a novel Chernoff-bound-based optimal AS and power adaptation (CBBOASPA) policy that minimizes an upper bound on the symbol error probability (SEP) at the secondary receiver, subject to constraints on the average transmit power and the average interference to the primary. The optimal antenna and its power are presented in an insightful closed form in terms of the channel gains. We then analyze the SEP of CBBOASPA. Extensive benchmarking shows that the SEP of CBBOASPA for both MPSK and MQAM is one to two orders of magnitude lower than that of several ad hoc AS policies and even optimal AS with on-off power control.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This paper discusses a novel high-speed approach for human action recognition in the H.264/AVC compressed domain. The proposed algorithm utilizes cues from quantization parameters and motion vectors extracted from the compressed video sequence for feature extraction and further classification using Support Vector Machines (SVM). The ultimate goal of our work is to present a much faster algorithm than its pixel-domain counterparts, with comparable accuracy, utilizing only the sparse information from compressed video. Partial decoding avoids the complexity of full decoding and minimizes computational load and memory usage, which can result in reduced hardware utilization and fast recognition results. The proposed approach can handle illumination changes, scale, and appearance variations, and is robust in outdoor as well as indoor testing scenarios. We have tested our method on two benchmark action datasets and achieved more than 85% accuracy. The proposed algorithm classifies actions at a speed (>2000 fps) approximately 100 times higher than that of existing state-of-the-art pixel-domain algorithms.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In this paper we present a framework for realizing arbitrary instruction set extensions (IE) that are identified post-silicon. The proposed framework has two components, viz., an IE synthesis methodology and the architecture of a reconfigurable data-path for the realization of such IEs. The IE synthesis methodology ensures maximal utilization of resources on the reconfigurable data-path. In this context we present the techniques used to realize IEs for applications that demand high throughput or that must process data streams. The reconfigurable hardware, called HyperCell, comprises a reconfigurable execution fabric. The fabric is a collection of interconnected compute units. A typical use case of HyperCell is where it acts as a co-processor with a host and accelerates execution of IEs that are defined post-silicon. We demonstrate the effectiveness of our approach by evaluating the performance of some well-known integer kernels that are realized as IEs on HyperCell. Our methodology for realizing IEs through HyperCells permits overlapping of potentially all memory transactions with computations. We show significant improvement in performance for streaming applications over general-purpose-processor-based solutions by fully pipelining the data-path. (C) 2014 Elsevier B.V. All rights reserved.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Among the intelligent safety technologies for road vehicles, active suspensions controlled by embedded computing elements for preventing rollover have received a lot of attention. The existing models for synthesizing and allocating forces in such suspensions are conservatively based on constraints that are valid only as long as no wheel lifts off the ground. However, the fault tolerance of rollover-preventive systems can be enhanced if the smart/active suspensions can intervene in the more severe situation in which the wheels have just lifted off the ground. The difficulty in computing control in this situation is that the vehicle dynamics then passes into a regime that yields a model involving disjunctive constraints on the dynamics. Simulation of dynamics with disjunctive constraints in this context becomes necessary to estimate, synthesize, and allocate the intended hardware-realizable forces in an active suspension. In this paper, we give an algorithm for this problem by solving it as a disjunctive dynamic optimization problem. Based on this, we synthesize and allocate the roll-stabilizing time-dependent active suspension forces in terms of sensor output data. We show that the forces obtained from disjunctive dynamics are comparable with existing force allocations and, hence, are possibly realizable in the existing hardware framework toward enhancing safety and fault tolerance.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

A simple ball-drop impact tester is developed for studying the dynamic response of hierarchical, complex, small-sized systems and materials. The developed algorithm and set-up have provisions for applying a programmable potential difference along the height of a test specimen during an impact loading; this enables us to conduct experiments on various materials and smart structures whose mechanical behavior is sensitive to electric field. The software-hardware system allows not only acquisition of dynamic force-time data at a very fast sampling rate (up to 2 × 10^6 samples/s), but also application of a pre-set potential difference (up to ±10 V) across a test specimen for a duration determined by feedback from the force-time data. We illustrate the functioning of the set-up by studying the effect of electric field on the energy absorption capability of carbon nanotube foams of 5 × 5 × 1.2 mm^3 size under impact conditions. (C) 2014 AIP Publishing LLC.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The spin noise phenomenon was predicted as early as 1946. However, experimental investigations of spin noise became possible only recently, with major technological improvements in NMR hardware. These experiments have several potential novel applications and also demand refinements of the existing theoretical framework to explain the phenomenon. The elegance of noise spectroscopy in gathering information about the properties of a system lies in the fact that it does not require external perturbation, so the system remains in thermal equilibrium. Spin noise consists of intrinsic magnetic fluctuations, and both longitudinal and transverse components have been detected independently in many systems. Detection of fluctuating longitudinal magnetization leads to the field of Magnetic Resonance Force Microscopy (MRFM), which can efficiently probe very few spins, even down to the level of a single spin, using ultrasensitive cantilevers. The transverse component of spin noise, which can simultaneously monitor different resonances over a given frequency range and thus enables one to distinguish between different chemical environments, has also received considerable attention and found many novel applications. These experiments demand a detailed understanding of the underlying spin noise phenomenon in order to perform perturbation-free magnetic resonance and widen the highly promising application area. Detailed investigations of noise magnetization have been performed recently using force microscopy on an equilibrium ensemble of paramagnetic alkali atoms. It was observed that random fluctuations generate spontaneous spin coherences whose precession and relaxation properties are similar to those generated by the macroscopic magnetization of a polarized ensemble. Several other intrinsic properties, such as g-factors, isotope-abundance ratios, hyperfine splittings, and spin coherence lifetimes, have also been obtained without having to excite the sample.
In contrast to MRFM approaches, detection of transverse spin noise also offers novel applications and is attracting considerable attention. It has the unique advantage that different resonances over a given frequency range can be monitored, enabling one to distinguish between different chemical environments. Since these noise signatures scale inversely with sample size, these approaches open the possibility of non-perturbative magnetic resonance of small systems down to the nanoscale. In this review, these different approaches are highlighted, with the main emphasis on transverse spin noise investigations.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Grid simulators are used to test the control performance of grid-connected inverters under a wide range of grid disturbance conditions. In the present work, a three-phase back-to-back connected inverter sharing a common dc bus has been programmed as a grid simulator. Three-phase balanced disturbance voltages applied to three-phase balanced loads have been considered in the present work. The developed grid simulator can generate three-phase balanced voltage sags, voltage swells, frequency deviations and phase jumps. The grid simulator uses a novel disturbance generation algorithm. The algorithm allows the user to reference the disturbance to any of the three phases at any desired phase angle. Further, the exit of the disturbance condition can be referenced to the desired phase angle of any phase by adjusting the duration of the disturbance. The grid simulator hardware has been tested with different loads: a linear purely resistive load, a non-linear diode-bridge load and a grid-connected inverter load.
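What it means to reference a balanced sag to a chosen phase and phase angle can be sketched numerically. The abstract does not describe the authors' disturbance generation algorithm, so the function below is only an assumed, idealized reference-waveform generator; its name, parameters, and the convention that phase `k` lags phase 0 by `k` × 120° are all hypothetical.

```python
import math


def three_phase_sag(t, v_peak, freq, sag_depth, ref_phase, start_angle_deg, duration):
    """Return (va, vb, vc) at time t for a balanced sag.

    The sag begins when phase `ref_phase` (0, 1 or 2) first reaches
    `start_angle_deg` and lasts `duration` seconds. During the sag all
    three amplitudes are scaled by (1 - sag_depth).
    """
    omega = 2.0 * math.pi * freq
    shift = 2.0 * math.pi / 3.0  # 120 degrees between phases
    # Time at which the reference phase angle equals the requested angle.
    t_start = (math.radians(start_angle_deg) + ref_phase * shift) / omega
    in_sag = t_start <= t < t_start + duration
    amplitude = v_peak * (1.0 - sag_depth) if in_sag else v_peak
    return tuple(amplitude * math.sin(omega * t - k * shift) for k in range(3))


# Hypothetical case: 50 Hz, 100 V peak, 50% sag starting at 90 degrees of
# phase 0 and lasting exactly one fundamental cycle (0.02 s).
before = three_phase_sag(0.004, 100.0, 50.0, 0.5, 0, 90.0, 0.02)
during = three_phase_sag(0.006, 100.0, 50.0, 0.5, 0, 90.0, 0.02)
after = three_phase_sag(0.030, 100.0, 50.0, 0.5, 0, 90.0, 0.02)
```

In this sketch the exit angle follows from the chosen duration: a duration of one full cycle ends the sag at the same phase angle at which it started, and other durations shift the exit angle accordingly. Voltage swells could be modelled the same way with a negative `sag_depth`.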