97 resultados para streaming SIMD extensions

em Indian Institute of Science - Bangalore - Índia


Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper we present a framework for realizing arbitrary instruction set extensions (IE) that are identified post-silicon. The proposed framework has two components viz., an IE synthesis methodology and the architecture of a reconfigurable data-path for realization of the such IEs. The IE synthesis methodology ensures maximal utilization of resources on the reconfigurable data-path. In this context we present the techniques used to realize IEs for applications that demand high throughput or those that must process data streams. The reconfigurable hardware called HyperCell comprises a reconfigurable execution fabric. The fabric is a collection of interconnected compute units. A typical use case of HyperCell is where it acts as a co-processor with a host and accelerates execution of IEs that are defined post-silicon. We demonstrate the effectiveness of our approach by evaluating the performance of some well-known integer kernels that are realized as IEs on HyperCell. Our methodology for realizing IEs through HyperCells permits overlapping of potentially all memory transactions with computations. We show significant improvement in performance for streaming applications over general purpose processor based solutions, by fully pipelining the data-path. (C) 2014 Elsevier B.V. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Recently, efficient scheduling algorithms based on Lagrangian relaxation have been proposed for scheduling parallel machine systems and job shops. In this article, we develop real-world extensions to these scheduling methods. In the first part of the paper, we consider the problem of scheduling single operation jobs on parallel identical machines and extend the methodology to handle multiple classes of jobs, taking into account setup times and setup costs, The proposed methodology uses Lagrangian relaxation and simulated annealing in a hybrid framework, In the second part of the paper, we consider a Lagrangian relaxation based method for scheduling job shops and extend it to obtain a scheduling methodology for a real-world flexible manufacturing system with centralized material handling.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The instability of coupled longitudinal and transverse electromagnetic modes associated with long wavelengths is studied in bounded streaming plasmas. The main conclusions are as follows: (i) For long waves for which O (k 2)=0, in the absence of relative streaming motion of electrons and ions and aωp/c<0.66, the whole spectrum of harmonic waves is excited due to finite temperature and boundary effects consisting of two subseries. One of these subseries can be identified with Tonks-Dattner resonance oscillations for the electrons, and arises primarily due to the electrons with frequencies greater than the electrostatic plasma frequency corresponding to the electron density in the midplane in the undisturbed state. The other series arises primarily due to ion motion. When aωp/c>0.66, in addition to the above spectrum of harmonic waves, the system admits an infinite number of growing and decaying waves. The instability associated with these modes is found to arise due to the interaction of the waves inside the plasma with the external electromagnetic field. (ii) For modes with comparatively shorter wavelengths for which O (k3)=0, the coupling due to finite temperature sets in, and it is found that the two series of harmonic waves obtained in (i) deriving energy from the transverse modes also become unstable. Thus, for these wavelengths the system admits three sets of growing and decaying modes, first two for all values of aωp/c and the third for (aωp/c) > 0.66. (iii) The presence of streaming velocities introduces various other coupling mechanisms, and we find that even for the wavelengths for which O (k2)=0, we get three sets of growing and decaying waves. The numerical values for the growth rates show that the streaming velocities enhance the growth rates of instability significantly.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We provide a survey of some of our recent results ([9], [13], [4], [6], [7]) on the analytical performance modeling of IEEE 802.11 wireless local area networks (WLANs). We first present extensions of the decoupling approach of Bianchi ([1]) to the saturation analysis of IEEE 802.11e networks with multiple traffic classes. We have found that even when analysing WLANs with unsaturated nodes the following state dependent service model works well: when a certain set of nodes is nonempty, their channel attempt behaviour is obtained from the corresponding fixed point analysis of the saturated system. We will present our experiences in using this approximation to model multimedia traffic over an IEEE 802.11e network using the enhanced DCF channel access (EDCA) mechanism. We have found that we can model TCP controlled file transfers, VoIP packet telephony, and streaming video in the IEEE802.11e setting by this simple approximation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The max-coloring problem is to compute a legal coloring of the vertices of a graph G = (V, E) with a non-negative weight function w on V such that Sigma(k)(i=1) max(v epsilon Ci) w(v(i)) is minimized, where C-1, ... , C-k are the various color classes. Max-coloring general graphs is as hard as the classical vertex coloring problem, a special case where vertices have unit weight. In fact, in some cases it can even be harder: for example, no polynomial time algorithm is known for max-coloring trees. In this paper we consider the problem of max-coloring paths and its generalization, max-coloring abroad class of trees and show it can be solved in time O(vertical bar V vertical bar+time for sorting the vertex weights). When vertex weights belong to R, we show a matching lower bound of Omega(vertical bar V vertical bar log vertical bar V vertical bar) in the algebraic computation tree model.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents an SIMD machine which has been tuned to execute low-level vision algorithms employing the relaxation labeling paradigm. Novel features of the design include: 1. (1) a communication scheme capable of window accessing under a single instruction. 2. (2) flexible I/O instructions to load overlapped data segments; and 3. (3) data-conditional instructions which can be nested to an arbitrary degree. A time analysis of the stereo correspondence problem, as implemented on a simulated version of the machine using the probabilistic relaxation technique, shows a speed up of almost N2 for an N × N array of PEs.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Massively parallel SIMD computing is applied to obtain an order of magnitude improvement in the executional speed of an important algorithm in VLSI design automation. The physical design of a VLSI circuit involves logic module placement as a subtask. The paper is concerned with accelerating the well known Min-cut placement technique for logic cell placement. The inherent parallelism of the Min-cut algorithm is identified, and it is shown that a parallel machine based on the efficient execution of the placement procedure.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper we explore an implementation of a high-throughput, streaming application on REDEFINE-v2, which is an enhancement of REDEFINE. REDEFINE is a polymorphic ASIC combining the flexibility of a programmable solution with the execution speed of an ASIC. In REDEFINE Compute Elements are arranged in an 8x8 grid connected via a Network on Chip (NoC) called RECONNECT, to realize the various macrofunctional blocks of an equivalent ASIC. For a 1024-FFT we carry out an application-architecture design space exploration by examining the various characterizations of Compute Elements in terms of the size of the instruction store. We further study the impact by using application specific, vectorized FUs. By setting up different partitions of the FFT algorithm for persistent execution on REDEFINE-v2, we derive the benefits of setting up pipelined execution for higher performance. The impact of the REDEFINE-v2 micro-architecture for any arbitrary N-point FFT (N > 4096) FFT is also analyzed. We report the various algorithm-architecture tradeoffs in terms of area and execution speed with that of an ASIC implementation. In addition we compare the performance gain with respect to a GPP.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Video streaming applications have hitherto been supported by single server systems. A major drawback of such a solution is that it increases the server load. The server restricts the number of clients that can be simultaneously supported due to limitation in bandwidth. The constraints of a single server system can be overcome in video streaming if we exploit the endless resources available in a distributed and networked system. We explore a P2P system for streaming video applications. In this paper we build a P2P streaming video (SVP2P) service in which multiple peers co-operate to serve video segments for new requests, thereby reducing server load and bandwidth used. Our simulation shows the playback latency using SVP2P is roughly 1/4th of the latency incurred when the server directly streams the video. Bandwidth consumed for control messages (overhead) is as low as 1.5% of the total data transfered. The most important observation is that the capacity of the SVP2P grows dynamically.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We study the effect of acoustic streaming on nanoparticle motion and morphological evolution inside an acoustically levitated droplet using an analytical approach coupled with experiments. Nanoparticle migration due to internal recirculation forms a density stratification, the location of which depends on initial particle concentration. The time scale of density stratification is similar to that of perikinetic-driven agglomeration of particle flocculation. The density stratification ultimately leads to force imbalance leading to a unique bowl-shaped structure. Our analysis shows the mechanism of bowl formation and how it is affected by particle size, concentration, internal recirculation and fluid viscosity.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Streaming applications demand hard bandwidth and throughput guarantees in a multiprocessor environment amidst resource competing processes. We present a Label Switching based Network-on-Chip (LS-NoC) motivated by throughput guarantees offered by bandwidth reservation. Label switching is a packet relaying technique in which individual packets carry route information in the form of labels. A centralized LS-NoC Management framework engineers traffic into Quality of Service (QoS) guaranteed routes. LS-NoC caters to the requirements of streaming applications where communication channels are fixed over the lifetime of the application. The proposed NoC framework inherently supports heterogeneous and ad hoc system-on-chips. The LS-NoC can be used in conjunction with conventional best effort NoC as a QoS guaranteed communication network or as a replacement to the conventional NoC. A multicast, broadcast capable label switched router for the LS-NoC has been designed. A 5 port, 256 bit data bus, 4 bit label router occupies 0.431 mm(2) in 130 nm and delivers peak bandwidth of 80 Gbits/s per link at 312.5 MHz. Bandwidth and latency guarantees of LS-NoC have been demonstrated on traffic from example streaming applications and on constant and variable bit rate traffic patterns. LS-NoC was found to have a competitive AreaxPower/Throughput figure of merit with state-of-the-art NoCs providing QoS. Circuit switching with link sharing abilities and support for asynchronous operation make LS-NoC a desirable choice for QoS servicing in chip multiprocessors. (C) 2013 Elsevier B.V. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Given a smooth, projective variety Y over an algebraically closed field of characteristic zero, and a smooth, ample hyperplane section X subset of Y, we study the question of when a bundle E on X, extends to a bundle epsilon on a Zariski open set U subset of Y containing X. The main ingredients used are explicit descriptions of various obstruction classes in the deformation theory of bundles, together with Grothendieck-Lefschetz theory. As a consequence, we prove a Noether-Lefschetz theorem for higher rank bundles, which recovers and unifies the Noether-Lefschetz theorems of Joshi and Ravindra-Srinivas.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The analytic signal (AS) was proposed by Gabor as a complex signal corresponding to a given real signal. The AS has a one-sided spectrum and gives rise to meaningful spectral averages. The Hilbert transform (HT) is a key component in Gabor's AS construction. We generalize the construction methodology by employing the fractional Hilbert transform (FrHT), without going through the standard fractional Fourier transform (FrFT) route. We discuss some properties of the fractional Hilbert operator and show how decomposition of the operator in terms of the identity and the standard Hilbert operators enables the construction of a family of analytic signals. We show that these analytic signals also satisfy Bedrosian-type properties and that their time-frequency localization properties are unaltered. We also propose a generalized-phase AS (GPAS) using a generalized-phase Hilbert transform (GPHT). We show that the GPHT shares many properties of the FrHT, in particular, selective highlighting of singularities, and a connection with Lie groups. We also investigate the duality between analyticity and causality concepts to arrive at a representation of causal signals in terms of the FrHT and GPHT. On the application front, we develop a secure multi-key single-sideband (SSB) modulation scheme and analyze its performance in noise and sensitivity to security key perturbations. (C) 2013 Elsevier B.V. All rights reserved.