97 results for streaming SIMD extensions


Relevance:

10.00% 10.00%

Publisher:

Abstract:

Oscillatory flow in a tube of slowly varying cross section is investigated in the presence of a uniform magnetic field in the axial direction. A perturbation solution including steady streaming is presented. The pressure and shear stress on the wall for various parameters governing the flow are discussed. Physics of Fluids is copyrighted by The American Institute of Physics.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The StreamIt programming model has been proposed to exploit parallelism in streaming applications on general-purpose multicore architectures. StreamIt graphs describe task, data and pipeline parallelism, which can be exploited on accelerators such as Graphics Processing Units (GPUs) or the CellBE, which support abundant parallelism in hardware. In this paper, we describe a novel method to orchestrate the execution of a StreamIt program on a multicore platform equipped with an accelerator. The proposed approach identifies, using profiling, the relative benefits of executing a task on the superscalar CPU cores and on the accelerator. We formulate the problem of partitioning the work between the CPU cores and the GPU, taking into account the latencies for data transfers and the required buffer layout transformations associated with the partitioning, as an integrated Integer Linear Program (ILP), which can then be solved by an ILP solver. We also propose an efficient heuristic algorithm for the work partitioning between the CPU and the GPU, which provides solutions that are within 9.05% of the optimal solution on average across the benchmark suite. The partitioned tasks are then software pipelined to execute on the multiple CPU cores and the Streaming Multiprocessors (SMs) of the GPU. The software pipelining algorithm orchestrates the execution between the CPU cores and the GPU by emitting the code for the CPU and the GPU, and the code for the required data transfers. Our experiments on a platform with 8 CPU cores and a GeForce 8800 GTS 512 GPU show a geometric mean speedup of 6.94X, with a maximum of 51.96X, over a single-threaded CPU execution across the StreamIt benchmarks. This is an 18.9% improvement over a partitioning strategy that maps only the filters that cannot be executed on the GPU (the filters with state that is persistent across firings) onto the CPU.
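
The abstract above reports a geometric mean speedup across benchmarks. As a minimal sketch of how such a summary figure is usually computed, the Python below takes the geometric mean of per-benchmark speedups. The function name `geometric_mean` and the sample values are hypothetical and chosen only for illustration; they are not data from the paper.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-benchmark speedups (illustrative only, not from the paper).
speedups = [2.0, 8.0, 4.0]
gm = geometric_mean(speedups)
print(f"geometric mean speedup: {gm:.2f}X")
```

The geometric mean is the usual choice for ratios such as speedups because a single very large value (like a maximum far above the rest) pulls it up much less than it would pull up an arithmetic mean.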

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The heat capacity of a substance is related to the structure and constitution of the material and its measurement is a standard technique of physical investigation. In this review, the classical methods are first analyzed briefly and their recent extensions are summarized. The merits and demerits of these methods are pointed out. The newer techniques such as the a.c. method, the relaxation method, the pulse methods, the laser flash calorimetry and other methods developed to extend the heat capacity measurements to newer classes of materials and to extreme conditions of sample geometry, pressure and temperature are comprehensively reviewed. Examples of recent work and details of the experimental systems are provided for each method. The introduction of automation in control systems for the monitoring of the experiments and for data processing is also discussed. Two hundred and eight references and 18 figures are used to illustrate the various techniques.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Plywood manufacture includes two fundamental stages. The first is to peel or separate logs into veneer sheets of different thicknesses. The second is to assemble veneer sheets into finished plywood products. At the first stage a decision must be made as to the number of different veneer thicknesses to be peeled and what these thicknesses should be. At the second stage, choices must be made as to how these veneers will be assembled into final products to meet certain constraints while minimizing wood loss. These decisions present a fundamental management dilemma. Costs of peeling, drying, storage, handling, etc. can be reduced by decreasing the number of veneer thicknesses peeled. However, a reduced set of thickness options may make it infeasible to produce the variety of products demanded by the market or increase wood loss by requiring less efficient selection of thicknesses for assembly. In this paper the joint problem of veneer choice and plywood construction is formulated as a nonlinear integer programming problem. A relatively simple optimal solution procedure is developed that exploits special problem structure. This procedure is examined on data from a British Columbia plywood mill. Restricted to the existing set of veneer thicknesses and plywood designs used by that mill, the procedure generated a solution that reduced wood loss by 79 percent, thereby increasing net revenue by 6.86 percent. Additional experiments were performed that examined the consequences of changing the number of veneer thicknesses used. Extensions are discussed that permit the consideration of more than one wood species.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Database schemes can be viewed as hypergraphs, with individual relation schemes corresponding to the edges of a hypergraph. Under this setting, a new class of "acyclic" database schemes was recently introduced and was shown to have a claim to a number of desirable properties. However, unlike the case of ordinary undirected graphs, there are several inequivalent notions of acyclicity of hypergraphs. Of special interest among these are the alpha-, beta-, and gamma-degrees of acyclicity, each characterizing an equivalence class of desirable properties for database schemes represented as hypergraphs. In this paper, two complementary approaches to designing beta-acyclic database schemes are presented. For the first part, a new notion called "independent cycle" is introduced. Based on this, a criterion for beta-acyclicity is developed and is shown to be equivalent to the existing definitions of beta-acyclicity. From this and the concept of the dual of a hypergraph, an efficient algorithm for testing beta-acyclicity is developed. For the second part, a procedure is developed for top-down generation of beta-acyclic schemes and its correctness is established. Finally, extensions and applications of these ideas are described.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Data flow computers are high-speed machines in which an instruction is executed as soon as all its operands are available. This paper describes the EXtended MANchester (EXMAN) data flow computer, which incorporates three major extensions to the basic Manchester machine. As extensions we provide a multiple matching units scheme, an efficient implementation of the array data structure, and a facility to concurrently execute reentrant routines. A simulator for the EXMAN computer has been coded in the discrete event simulation language SIMULA 67 on the DEC 1090 system. Performance analysis studies have been conducted on the simulated EXMAN computer to study the effectiveness of the proposed extensions. The performance experiments have been carried out using three sample problems: matrix multiplication, Bresenham's line drawing algorithm, and the polygon scan-conversion algorithm.
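
The firing rule stated in the abstract above (an instruction executes as soon as all its operands are available) can be illustrated with a toy interpreter. This is only a sketch under simplifying assumptions: the `Node` class, the `run` function and the token format are hypothetical, and the code does not model the EXMAN matching units, array structures or reentrant routines.

```python
import operator
from collections import deque


class Node:
    """One dataflow instruction: fires when all operand ports hold a value."""

    def __init__(self, op, arity, targets):
        self.op = op            # function applied to the operands
        self.arity = arity      # number of operand ports
        self.targets = targets  # list of (node_name, port) receiving the result
        self.operands = {}      # port -> value received so far


def run(nodes, initial_tokens):
    """Propagate (node_name, port, value) tokens until none remain."""
    queue = deque(initial_tokens)
    results = {}
    while queue:
        name, port, value = queue.popleft()
        node = nodes[name]
        node.operands[port] = value
        if len(node.operands) == node.arity:  # firing rule: all operands present
            args = [node.operands[p] for p in range(node.arity)]
            node.operands = {}
            out = node.op(*args)
            if not node.targets:
                results[name] = out  # a node without targets is an output
            for target_name, target_port in node.targets:
                queue.append((target_name, target_port, out))
    return results


# Graph for (a + b) * (c - d); the input values are arbitrary.
graph = {
    "add": Node(operator.add, 2, [("mul", 0)]),
    "sub": Node(operator.sub, 2, [("mul", 1)]),
    "mul": Node(operator.mul, 2, []),
}
tokens = [("add", 0, 2), ("add", 1, 3), ("sub", 0, 7), ("sub", 1, 4)]
results = run(graph, tokens)
print(results)
```

Execution order here is driven purely by token arrival: `mul` fires only after both `add` and `sub` have produced their results, regardless of the order in which the input tokens are queued.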

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In this paper the kinematics of a weak shock front governed by a hyperbolic system of conservation laws is studied. This is used to develop a method for solving problems involving the propagation of nonlinear unimodal waves. It consists of first solving the nonlinear wave problem by moving along the bicharacteristics of the system and then fitting the shock into this solution field, so that it satisfies the necessary jump conditions. The kinematics of the shock leads in a natural way to the definition of "shock-rays", which play the same role as the "rays" in a continuous flow. A special case of a circular cylinder introduced suddenly in a constant streaming flow is studied in detail. The shock fitted in the upstream region propagates with a velocity which is the mean of the velocities of the linear and the nonlinear wave fronts. In the downstream region the solution is given by an expansion wave.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Using a new embedding technique, a short-time exact analytical solution of a two-dimensional axisymmetric problem of solidification of a superheated melt in a long cylindrical mold is presented in this paper. The prescribed flux can be space and time dependent. The method of solution is simple and applicable to a variety of problems; it consists of assuming suitable fictitious initial temperatures for some suitable fictitious extensions of the actual regions. The numerical results indicate that even a small solidified thickness can affect the initial temperature of the melt appreciably.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

A linear excitation of electromagnetic modes at frequencies (n + ı89 in a plasma through which two electron beams are contra-streaming along the magnetic field is investigated. This may be a source of the observed {cote emissions at auroral latitudes.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

We extend the modeling heuristic of (Harsha et al. 2006. In IEEE IWQoS 06, pp 178 - 187) to evaluate the performance of an IEEE 802.11e infrastructure network carrying packet telephone calls, streaming video sessions and TCP controlled file downloads, using Enhanced Distributed Channel Access (EDCA). We identify the time boundaries of activities on the channel (called channel slot boundaries) and derive a Markov Renewal Process (MRP) of the contending nodes on these epochs. This is achieved by using the attempt probabilities of the contending nodes obtained from the saturation fixed point analysis of (Ramaiyan et al. 2005. In Proceedings ACM Sigmetrics, '05. Journal version accepted for publication in IEEE TON). Regenerative analysis on this MRP yields the desired steady state performance measures. We then use the MRP model to develop an effective bandwidth approach for obtaining a bound on the size of the buffer required at the video queue of the AP, such that the streaming video packet loss probability is kept to less than 1%. The results obtained match well with simulations using the network simulator ns-2. We find that, with the default IEEE 802.11e EDCA parameters for access categories AC 1, AC 2 and AC 3, the voice call capacity decreases if even one streaming video session and one TCP file download are initiated by some wireless station. Subsequently, reducing the number of voice calls increases the video downlink stream throughput by 0.38 Mbps and the file download capacity by 0.14 Mbps for every voice call removed (for the 11 Mbps PHY). We find that a buffer size of 75 KB is sufficient to ensure that the video packet loss probability at the QAP is within 1%.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

A new framework is proposed in this work to solve multidimensional population balance equations (PBEs) using the method of discretization. A continuous PBE is considered as a statement of evolution of one evolving property of particles and conservation of their n internal attributes. Discretization must therefore preserve n + 1 properties of particles. A continuously distributed population is represented on discrete fixed pivots as in the fixed pivot technique of Kumar and Ramkrishna [1996a. On the solution of population balance equation by discretization-I. A fixed pivot technique. Chemical Engineering Science 51(8), 1311-1332] for 1-d PBEs, but instead of the earlier extensions of this technique proposed in the literature, which preserve 2^n properties of non-pivot particles, the new framework requires n + 1 properties to be preserved. This opens up the use of triangular and tetrahedral elements to solve 2-d and 3-d PBEs, instead of the rectangles and cuboids that are suggested in the literature. Capabilities of computational fluid dynamics and other packages available for generating complex meshes can also be harnessed. The numerical results obtained show the effectiveness of the new framework. They also bring out the hitherto unknown role of the directionality of the grid in controlling the accuracy of the numerical solution of multidimensional PBEs. The results show that the quality of the numerical solution can be improved significantly just by altering the directionality of the grid, which does not require any increase in the number of points, any refinement of the grid, or even redistribution of pivots in space. The directionality of a grid can be altered simply by regrouping the pivots.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

By using the algebraic locus of the coupler curve of a PRRP planar linkage, in this paper, a kinematic theory is developed for planar, radially foldable closed-loop linkages. This theory helps derive the previously invented building blocks, which consist of only two inter-connected angulated elements, for planar foldable structures. Furthermore, a special case of a circumferentially actuatable foldable linkage (which is different from the previously known cases) is derived from the theory. A quantitative description of some known and some new properties of planar foldable linkages, including the extent of foldability, shape-preservation of the interior polygons, multi-segmented assemblies and heterogeneous circumferential arrangements, is also presented. The design equations derived here make the conception of even complex planar radially foldable linkages systematic and straightforward. Representative examples are presented to illustrate the usage of the design equations and the construction of prototypes. The current limitations and some possible extensions of the theory are also noted. (c) 2007, Elsevier Ltd. All rights reserved.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The StreamIt programming model has been proposed to exploit parallelism in streaming applications on general-purpose multi-core architectures. This model allows programmers to specify the structure of a program as a set of filters that act upon data, and a set of communication channels between them. StreamIt graphs describe task, data and pipeline parallelism, which can be exploited on modern Graphics Processing Units (GPUs), as they support abundant parallelism in hardware. In this paper, we describe the challenges in mapping StreamIt to GPUs and propose an efficient technique to software pipeline the execution of stream programs on GPUs. We formulate this problem, both scheduling and assignment of filters to processors, as an efficient Integer Linear Program (ILP), which is then solved using ILP solvers. We also describe a novel buffer layout technique for GPUs which facilitates exploiting the high memory bandwidth available in GPUs. The proposed scheduling utilizes both the scalar units in the GPU, to exploit data parallelism, and the multiprocessors, to exploit task and pipeline parallelism. Further, it takes into consideration the synchronization and bandwidth limitations of GPUs, and yields speedups between 1.87X and 36.83X over a single-threaded CPU.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Micromachined antennas are receiving great interest as carrier frequencies move higher into the frequency spectrum, due to their superior performance and amenability to integration with active devices. However, their design is cumbersome owing to the complexity of the structure. To overcome this, an iterative procedure is suggested in this paper to facilitate fast design of micromachined patch antennas based on a simulation study. A microstrip line on a micromachined silicon substrate is simulated in a full wave simulator by solving for the ports only. From the obtained propagation constant, the effective dielectric constant for the micromachined substrate is estimated. The process is repeated for a number of values of the width of the microstrip, and the variation of the effective dielectric constant with the microstrip width is plotted. Then an iterative method, in combination with the extrapolated permittivity, which includes the effect of cavity extensions in all directions, is used to obtain the width and the corresponding effective dielectric constant. This method has been verified to be quite accurate by comparison with full wave simulations, and hence it can serve as a good starting point for designers of micromachined antennas.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

We develop extensions of the Simulated Annealing with Multiplicative Weights (SAMW) algorithm, which was proposed as a solution method for Finite-Horizon Markov Decision Processes (FH-MDPs). The extensions are in three directions: a) use of the dynamic programming principle in the policy update step of SAMW, b) a two-timescale actor-critic algorithm that uses simulated transitions alone, and c) extension of the algorithm to the infinite-horizon discounted-reward scenario. In particular, a) reduces the storage required from exponential to linear in the number of actions per stage-state pair. On the faster timescale, a 'critic' recursion performs policy evaluation, while on the slower timescale an 'actor' recursion performs policy improvement using SAMW. We give a proof outlining convergence w.p. 1 and show experimental results on two settings: semiconductor fabrication and flow control in communication networks.
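
For readers unfamiliar with multiplicative-weights updates, the following is a generic sketch over a small, explicitly enumerated set of candidate policies. It is not the SAMW algorithm or any of the extensions described in the abstract above: the function name `multiplicative_weights_step`, the fixed value estimates and the constant `beta` are all assumptions made for illustration.

```python
def multiplicative_weights_step(weights, values, beta):
    """Reweight a distribution over policies by beta ** value, then normalize.

    weights: current probabilities over candidate policies
    values:  value estimate for each policy (higher is better)
    beta:    constant greater than 1 controlling how sharply weight shifts
    """
    scaled = [w * beta ** v for w, v in zip(weights, values)]
    total = sum(scaled)
    return [s / total for s in scaled]


# Three hypothetical policies with made-up value estimates.
weights = [1 / 3, 1 / 3, 1 / 3]
values = [1.0, 2.0, 0.5]
for _ in range(50):
    weights = multiplicative_weights_step(weights, values, beta=1.5)

best = max(range(len(weights)), key=weights.__getitem__)
print(best, weights)
```

Keeping one weight per complete policy is what makes storage grow exponentially with the problem size, which is the cost that extension a) in the abstract is said to reduce.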