37 resultados para implementations


Relevância:

10.00% 10.00%

Publicador:

Resumo:

H.264 is a video codec standard which delivers high resolution video even at low bit rates. To provide high throughput at low bit rates hardware implementations are essential. In this paper, we propose hardware implementations for speed and area optimized DCT and quantizer modules. To target above criteria we propose two architectures. First architecture is speed optimized which gives a high throughput and can meet requirements of 4096x2304 frame at 30 frames/sec. Second architecture is area optimized and occupies 2009 LUTs in Altera’s stratix-II and can meet the requirements of 1080HD at 30 frames/sec.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Software transactional memory (STM) has been proposed as a promising programming paradigm for shared memory multi-threaded programs as an alternative to conventional lock based synchronization primitives. Typical STM implementations employ a conflict detection scheme, which works with uniform access granularity, tracking shared data accesses either at word/cache line or at object level. It is well known that a single fixed access tracking granularity cannot meet the conflicting goals of reducing false conflicts without impacting concurrency adversely. A fine grained granularity while improving concurrency can have an adverse impact on performance due to lock aliasing, lock validation overheads, and additional cache pressure. On the other hand, a coarse grained granularity can impact performance due to reduced concurrency. Thus, in general, a fixed or uniform granularity access tracking (UGAT) scheme is application-unaware and rarely matches the access patterns of individual application or parts of an application, leading to sub-optimal performance for different parts of the application(s). In order to mitigate the disadvantages associated with UGAT scheme, we propose a Variable Granularity Access Tracking (VGAT) scheme in this paper. We propose a compiler based approach wherein the compiler uses inter-procedural whole program static analysis to select the access tracking granularity for different shared data structures of the application based on the application's data access pattern. We describe our prototype VGAT scheme, using TL2 as our STM implementation. Our experimental results reveal that VGAT-STM scheme can improve the application performance of STAMP benchmarks from 1.87% to up to 21.2%.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Real-Time services are traditionally supported on circuit switched network. However, there is a need to port these services on packet switched network. Architecture for audio conferencing application over the Internet in the light of ITU-T H.323 recommendations is considered. In a conference, considering packets only from a set of selected clients can reduce speech quality degradation because mixing packets from all clients can lead to lack of speech clarity. A distributed algorithm and architecture for selecting clients for mixing is suggested here based on a new quantifier of the voice activity called “Loudness Number” (LN). The proposed system distributes the computation load and reduces the load on client terminals. The highlights of this architecture are scalability, bandwidth saving and speech quality enhancement. Client selection for playing out tries to mimic a physical conference where the most vocal participants attract more attention. The contributions of the paper are expected to aid H.323 recommendations implementations for Multipoint Processors (MP). A working prototype based on the proposed architecture is already functional.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper presents the design and implementation of a learning controller for the Automatic Generation Control (AGC) in power systems based on a reinforcement learning (RL) framework. In contrast to the recent RL scheme for AGC proposed by us, the present method permits handling of power system variables such as Area Control Error (ACE) and deviations from scheduled frequency and tie-line flows as continuous variables. (In the earlier scheme, these variables have to be quantized into finitely many levels). The optimal control law is arrived at in the RL framework by making use of Q-learning strategy. Since the state variables are continuous, we propose the use of Radial Basis Function (RBF) neural networks to compute the Q-values for a given input state. Since, in this application we cannot provide training data appropriate for the standard supervised learning framework, a reinforcement learning algorithm is employed to train the RBF network. We also employ a novel exploration strategy, based on a Learning Automata algorithm,for generating training samples during Q-learning. The proposed scheme, in addition to being simple to implement, inherits all the attractive features of an RL scheme such as model independent design, flexibility in control objective specification, robustness etc. Two implementations of the proposed approach are presented. Through simulation studies the attractiveness of this approach is demonstrated.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We introduce ToleRace, a runtime system that allows programs to detect and even tolerate asymmetric data races. Asymmetric races are race conditions where one thread correctly acquires and releases a lock for a shared variable while another thread improperly accesses the same variable. ToleRace provides approximate isolation in the critical sections of lock-based parallel programs by creating a local copy of each shared variable when entering a critical section, operating on the local copies, and propagating the appropriate copies upon leaving the critical section. We start by characterizing all possible interleavings that can cause races and precisely describe the effect of ToleRace in each case. Then, we study the theoretical aspects of an oracle that knows exactly what type of interleaving has occurred. Finally, we present software implementations of ToleRace and evaluate them on multithreaded applications from the SPLASH2 and PARSEC suites.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Fast content addressable data access mechanisms have compelling applications in today's systems. Many of these exploit the powerful wildcard matching capabilities provided by ternary content addressable memories. For example, TCAM based implementations of important algorithms in data mining been developed in recent years; these achieve an an order of magnitude speedup over prevalent techniques. However, large hardware TCAMs are still prohibitively expensive in terms of power consumption and cost per bit. This has been a barrier to extending their exploitation beyond niche and special purpose systems. We propose an approach to overcome this barrier by extending the traditional virtual memory hierarchy to scale up the user visible capacity of TCAMs while mitigating the power consumption overhead. By exploiting the notion of content locality (as opposed to spatial locality), we devise a novel combination of software and hardware techniques to provide an abstraction of a large virtual ternary content addressable space. In the long run, such abstractions enable applications to disassociate considerations of spatial locality and contiguity from the way data is referenced. If successful, ideas for making content addressability a first class abstraction in computing systems can open up a radical shift in the way applications are optimized for memory locality, just as storage class memories are soon expected to shift away from the way in which applications are typically optimized for disk access locality.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Video decoders used in emerging applications need to be flexible to handle a large variety of video formats and deliver scalable performance to handle wide variations in workloads. In this paper we propose a unified software and hardware architecture for video decoding to achieve scalable performance with flexibility. The light weight processor tiles and the reconfigurable hardware tiles in our architecture enable software and hardware implementations to co-exist, while a programmable interconnect enables dynamic interconnection of the tiles. Our process network oriented compilation flow achieves realization agnostic application partitioning and enables seamless migration across uniprocessor, multi-processor, semi hardware and full hardware implementations of a video decoder. An application quality of service aware scheduler monitors and controls the operation of the entire system. We prove the concept through a prototype of the architecture on an off-the-shelf FPGA. The FPGA prototype shows a scaling in performance from QCIF to 1080p resolutions in four discrete steps. We also demonstrate that the reconfiguration time is short enough to allow migration from one configuration to the other without any frame loss.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Ab initio GW calculations are a standard method for computing the spectroscopic properties of many materials. The most computationally expensive part in conventional implementations of the method is the generation and summation over the large number of empty orbitals required to converge the electron self-energy. We propose a scheme to reduce the summation over empty states by the use of a modified static remainder approximation, which is simple to implement and yields accurate self-energies for both bulk and molecular systems requiring a small fraction of the typical number of empty orbitals.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The key requirements for enabling real-time remote healthcare service on a mobile platform, in the present day heterogeneous wireless access network environment, are uninterrupted and continuous access to the online patient vital medical data, monitor the physical condition of the patient through video streaming, and so on. For an application, this continuity has to be sufficiently transparent both from a performance perspective as well as a Quality of Experience (QoE) perspective. While mobility protocols (MIPv6, HIP, SCTP, DSMIP, PMIP, and SIP) strive to provide both and do so, limited or non-availability (deployment) of these protocols on provider networks and server side infrastructure has impeded adoption of mobility on end user platforms. Add to this, the cumbersome OS configuration procedures required to enable mobility protocol support on end user devices and the user's enthusiasm to add this support is lost. Considering the lack of proper mobility implementations that meet the remote healthcare requirements above, we propose SeaMo+ that comprises a light-weight application layer framework, termed as the Virtual Real-time Multimedia Service (VRMS) for mobile devices to provide an uninterrupted real-time multimedia information access to the mobile user. VRMS is easy to configure, platform independent, and does not require additional network infrastructure unlike other existing schemes. We illustrate the working of SeaMo+ in two realistic remote patient monitoring application scenarios.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

SARAS is a correlation spectrometer purpose designed for precision measurements of the cosmic radio background and faint features in the sky spectrum at long wavelengths that arise from redshifted 21-cm from gas in the reionization epoch. SARAS operates in the octave band 87.5-175 MHz. We present herein the system design arguing for a complex correlation spectrometer concept. The SARAS design concept provides a differential measurement between the antenna temperature and that of an internal reference termination, with measurements in switched system states allowing for cancellation of additive contaminants from a large part of the signal flow path including the digital spectrometer. A switched noise injection scheme provides absolute spectral calibration. Additionally, we argue for an electrically small frequency-independent antenna over an absorber ground. Various critical design features that aid in avoidance of systematics and in providing calibration products for the parametrization of other unavoidable systematics are described and the rationale discussed. The signal flow and processing is analyzed and the response to noise temperatures of the antenna, reference termination and amplifiers is computed. Multi-path propagation arising from internal reflections are considered in the analysis, which includes a harmonic series of internal reflections. We opine that the SARAS design concept is advantageous for precision measurement of the absolute cosmic radio background spectrum; therefore, the design features and analysis methods presented here are expected to serve as a basis for implementations tailored to measurements of a multiplicity of features in the background sky at long wavelengths, which may arise from events in the dark ages and subsequent reionization era.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The Girsanov linearization method (GLM), proposed earlier in Saha, N., and Roy, D., 2007, ``The Girsanov Linearisation Method for Stochastically Driven Nonlinear Oscillators,'' J. Appl. Mech., 74, pp. 885-897, is reformulated to arrive at a nearly exact, semianalytical, weak and explicit scheme for nonlinear mechanical oscillators under additive stochastic excitations. At the heart of the reformulated linearization is a temporally localized rejection sampling strategy that, combined with a resampling scheme, enables selecting from and appropriately modifying an ensemble of locally linearized trajectories while weakly applying the Girsanov correction (the Radon-Nikodym derivative) for the linearization errors. The semianalyticity is due to an explicit linearization of the nonlinear drift terms and it plays a crucial role in keeping the Radon-Nikodym derivative ``nearly bounded'' above by the inverse of the linearization time step (which means that only a subset of linearized trajectories with low, yet finite, probability exceeds this bound). Drift linearization is conveniently accomplished via the first few (lower order) terms in the associated stochastic (Ito) Taylor expansion to exclude (multiple) stochastic integrals from the numerical treatment. Similarly, the Radon-Nikodym derivative, which is a strictly positive, exponential (super-) martingale, is converted to a canonical form and evaluated over each time step without directly computing the stochastic integrals appearing in its argument. Through their numeric implementations for a few low-dimensional nonlinear oscillators, the proposed variants of the scheme, presently referred to as the Girsanov corrected linearization method (GCLM), are shown to exhibit remarkably higher numerical accuracy over a much larger range of the time step size than is possible with the local drift-linearization schemes on their own.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The ability of the continuous wavelet transform (CWT) to provide good time and frequency localization has made it a popular tool in time-frequency analysis of signals. Wavelets exhibit constant-Q property, which is also possessed by the basilar membrane filters in the peripheral auditory system. The basilar membrane filters or auditory filters are often modeled by a Gammatone function, which provides a good approximation to experimentally determined responses. The filterbank derived from these filters is referred to as a Gammatone filterbank. In general, wavelet analysis can be likened to a filterbank analysis and hence the interesting link between standard wavelet analysis and Gammatone filterbank. However, the Gammatone function does not exactly qualify as a wavelet because its time average is not zero. We show how bona fide wavelets can be constructed out of Gammatone functions. We analyze properties such as admissibility, time-bandwidth product, vanishing moments, which are particularly relevant in the context of wavelets. We also show how the proposed auditory wavelets are produced as the impulse response of a linear, shift-invariant system governed by a linear differential equation with constant coefficients. We propose analog circuit implementations of the proposed CWT. We also show how the Gammatone-derived wavelets can be used for singularity detection and time-frequency analysis of transient signals. (C) 2013 Elsevier B.V. All rights reserved.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper presents a Radix-4(3) based FFT architecture suitable for OFDM based WLAN applications. The radix-4(3) parallel unrolled architecture presented here, uses a radix-4 butterfly unit which takes all four inputs in parallel and can selectively produce one out of the four outputs. A 64 point FFT processor based on the proposed architecture has been implemented in UMC 130nm 1P8M CMOS process with a maximum clock frequency of 100 MHz and area of 0.83mm(2). The proposed processor provides a throughput of four times the clock rate and can finish one 64 point FFT computation in 16 clock cycles. For IEEE 802.11a/g WLAN, the processor needs to be operated at a clock rate of 5 MHz with a power consumption of 2.27 mW which is 27% less than the previously reported low power implementations.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A new representation of spatio-temporal random processes is proposed in this work. In practical applications, such processes are used to model velocity fields, temperature distributions, response of vibrating systems, to name a few. Finding an efficient representation for any random process leads to encapsulation of information which makes it more convenient for a practical implementations, for instance, in a computational mechanics problem. For a single-parameter process such as spatial or temporal process, the eigenvalue decomposition of the covariance matrix leads to the well-known Karhunen-Loeve (KL) decomposition. However, for multiparameter processes such as a spatio-temporal process, the covariance function itself can be defined in multiple ways. Here the process is assumed to be measured at a finite set of spatial locations and a finite number of time instants. Then the spatial covariance matrix at different time instants are considered to define the covariance of the process. This set of square, symmetric, positive semi-definite matrices is then represented as a third-order tensor. A suitable decomposition of this tensor can identify the dominant components of the process, and these components are then used to define a closed-form representation of the process. The procedure is analogous to the KL decomposition for a single-parameter process, however, the decompositions and interpretations vary significantly. The tensor decompositions are successfully applied on (i) a heat conduction problem, (ii) a vibration problem, and (iii) a covariance function taken from the literature that was fitted to model a measured wind velocity data. It is observed that the proposed representation provides an efficient approximation to some processes. Furthermore, a comparison with KL decomposition showed that the proposed method is computationally cheaper than the KL, both in terms of computer memory and execution time.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Designing and implementing thread-safe multithreaded libraries can be a daunting task as developers of these libraries need to ensure that their implementations are free from concurrency bugs, including deadlocks. The usual practice involves employing software testing and/or dynamic analysis to detect. deadlocks. Their effectiveness is dependent on well-designed multithreaded test cases. Unsurprisingly, developing multithreaded tests is significantly harder than developing sequential tests for obvious reasons. In this paper, we address the problem of automatically synthesizing multithreaded tests that can induce deadlocks. The key insight to our approach is that a subset of the properties observed when a deadlock manifests in a concurrent execution can also be observed in a single threaded execution. We design a novel, automatic, scalable and directed approach that identifies these properties and synthesizes a deadlock revealing multithreaded test. The input to our approach is the library implementation under consideration and the output is a set of deadlock revealing multithreaded tests. We have implemented our approach as part of a tool, named OMEN1. OMEN is able to synthesize multithreaded tests on many multithreaded Java libraries. Applying a dynamic deadlock detector on the execution of the synthesized tests results in the detection of a number of deadlocks, including 35 real deadlocks in classes documented as thread-safe. Moreover, our experimental results show that dynamic analysis on multithreaded tests that are either synthesized randomly or developed by third-party programmers are ineffective in detecting the deadlocks.