42 resultados para SIMD


Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents an SIMD machine which has been tuned to execute low-level vision algorithms employing the relaxation labeling paradigm. Novel features of the design include: 1. (1) a communication scheme capable of window accessing under a single instruction. 2. (2) flexible I/O instructions to load overlapped data segments; and 3. (3) data-conditional instructions which can be nested to an arbitrary degree. A time analysis of the stereo correspondence problem, as implemented on a simulated version of the machine using the probabilistic relaxation technique, shows a speed up of almost N2 for an N × N array of PEs.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Massively parallel SIMD computing is applied to obtain an order of magnitude improvement in the executional speed of an important algorithm in VLSI design automation. The physical design of a VLSI circuit involves logic module placement as a subtask. The paper is concerned with accelerating the well known Min-cut placement technique for logic cell placement. The inherent parallelism of the Min-cut algorithm is identified, and it is shown that a parallel machine based on the efficient execution of the placement procedure.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A programmable vision chip for real-time vision applications is presented. The chip architecture is a combination of a SIMD processing element array and row-parallel processors, which can perform pixel-parallel and row-parallel operations at high speed. It implements the mathematical morphology method to carry out low-level and mid-level image processing and sends out image features for high-level image processing without I/O bottleneck. The chip can perform many algorithms through software control. The simulated maximum frequency of the vision chip is 300 MHz with 16 x 16 pixels resolution. It achieves the rate of 1000 frames per second in real-time vision. A prototype chip with a 16 x 16 PE array is fabricated by the 0.18 mu m standard CMOS process. It has a pixel size of 30 mu m x 40 mu m and 8.72 mW power consumption with a 1.8 V power supply. Experiments including the mathematical morphology method and target tracking application demonstrated that the chip is fully functional and can be applied in real-time vision applications.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The increased diversity of Internet application requirements has spurred recent interests in flexible congestion control mechanisms. Window-based congestion control schemes use increase rules to probe available bandwidth, and decrease rules to back off when congestion is detected. The parameterization of these control rules is done so as to ensure that the resulting protocol is TCP-friendly in terms of the relationship between throughput and packet loss rate. In this paper, we propose a novel window-based congestion control algorithm called SIMD (Square-Increase/Multiplicative-Decrease). Contrary to previous memory-less controls, SIMD utilizes history information in its control rules. It uses multiplicative decrease but the increase in window size is in proportion to the square of the time elapsed since the detection of the last loss event. Thus, SIMD can efficiently probe available bandwidth. Nevertheless, SIMD is TCP-friendly as well as TCP-compatible under RED, and it has much better convergence behavior than TCP-friendly AIMD and binomial algorithms proposed recently.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Schraudolph proposed an excellent exponential approximation providing increased performance particularly suited to the logistic squashing function used within many neural networking applications. This note applies Intel's streaming SIMD Extensions 2 (SSE2), where SIMD is single instruction multiple data, of the Pentum IV class processor to Schraudolph's technique, further increasing the performance of the logistic squashing function. It was found that the calculation of the new 32-bit SSE2 logistic squashing function described here was up to 38 times faster than the conventional exponential function and up to 16 times faster than a Schraudolph-style 32-bit method on an Intel Pentum D 3.6 GHz CPU.

Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The most promising way to maintain reliable data transfer across the rapidly fluctuating channels used by next generation multiple-input multiple-output communications schemes is to exploit run-time variable modulation and antenna configurations. This demands that the baseband signal processing architectures employed in the communications terminals must provide low cost and high performance with runtime reconfigurability. We present a softcore-processor based solution to this issue, and show for the first time, that such programmable architectures can enable real-time data operation for cutting-edge standards
such as 802.11n; furthermore, by exploiting deep processing pipelines and interleaved task execution, the cost and performance of these architectures is shown to be on a par with traditional dedicated circuit based solutions. We believe this to be the first such programmable architecture to achieve this, and the combination of implementation efficiency and programmability makes this implementation style the most promising approach for hosting such dynamic architectures.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

To enable reliable data transfer in next generation Multiple-Input Multiple-Output (MIMO) communication systems, terminals must be able to react to fluctuating channel conditions by having flexible modulation schemes and antenna configurations. This creates a challenging real-time implementation problem: to provide the high performance required of cutting edge MIMO standards, such as 802.11n, with the flexibility for this behavioural variability. FPGA softcore processors offer a solution to this problem, and in this paper we show how heterogeneous SISD/SIMD/MIMD architectures can enable programmable multicore architectures on FPGA with similar performance and cost as traditional dedicated circuit-based architectures. When applied to a 4×4 16-QAM Fixed-Complexity Sphere Decoder (FSD) detector we present the first soft-processor based solution for real-time 802.11n MIMO.

Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Due to the growth of design size and complexity, design verification is an important aspect of the Logic Circuit development process. The purpose of verification is to validate that the design meets the system requirements and specification. This is done by either functional or formal verification. The most popular approach to functional verification is the use of simulation based techniques. Using models to replicate the behaviour of an actual system is called simulation. In this thesis, a software/data structure architecture without explicit locks is proposed to accelerate logic gate circuit simulation. We call thus system ZSIM. The ZSIM software architecture simulator targets low cost SIMD multi-core machines. Its performance is evaluated on the Intel Xeon Phi and 2 other machines (Intel Xeon and AMD Opteron). The aim of these experiments is to: • Verify that the data structure used allows SIMD acceleration, particularly on machines with gather instructions ( section 5.3.1). • Verify that, on sufficiently large circuits, substantial gains could be made from multicore parallelism ( section 5.3.2 ). • Show that a simulator using this approach out-performs an existing commercial simulator on a standard workstation ( section 5.3.3 ). • Show that the performance on a cheap Xeon Phi card is competitive with results reported elsewhere on much more expensive super-computers ( section 5.3.5 ). To evaluate the ZSIM, two types of test circuits were used: 1. Circuits from the IWLS benchmark suit [1] which allow direct comparison with other published studies of parallel simulators.2. Circuits generated by a parametrised circuit synthesizer. The synthesizer used an algorithm that has been shown to generate circuits that are statistically representative of real logic circuits. The synthesizer allowed testing of a range of very large circuits, larger than the ones for which it was possible to obtain open source files. The experimental results show that with SIMD acceleration and multicore, ZSIM gained a peak parallelisation factor of 300 on Intel Xeon Phi and 11 on Intel Xeon. With only SIMD enabled, ZSIM achieved a maximum parallelistion gain of 10 on Intel Xeon Phi and 4 on Intel Xeon. Furthermore, it was shown that this software architecture simulator running on a SIMD machine is much faster than, and can handle much bigger circuits than a widely used commercial simulator (Xilinx) running on a workstation. The performance achieved by ZSIM was also compared with similar pre-existing work on logic simulation targeting GPUs and supercomputers. It was shown that ZSIM simulator running on a Xeon Phi machine gives comparable simulation performance to the IBM Blue Gene supercomputer at very much lower cost. The experimental results have shown that the Xeon Phi is competitive with simulation on GPUs and allows the handling of much larger circuits than have been reported for GPU simulation. When targeting Xeon Phi architecture, the automatic cache management of the Xeon Phi, handles and manages the on-chip local store without any explicit mention of the local store being made in the architecture of the simulator itself. However, targeting GPUs, explicit cache management in program increases the complexity of the software architecture. Furthermore, one of the strongest points of the ZSIM simulator is its portability. Note that the same code was tested on both AMD and Xeon Phi machines. The same architecture that efficiently performs on Xeon Phi, was ported into a 64 core NUMA AMD Opteron. To conclude, the two main achievements are restated as following: The primary achievement of this work was proving that the ZSIM architecture was faster than previously published logic simulators on low cost platforms. The secondary achievement was the development of a synthetic testing suite that went beyond the scale range that was previously publicly available, based on prior work that showed the synthesis technique is valid.

Relevância:

10.00% 10.00%

Publicador: