42 resultados para SIMD


Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper presents a novel architecture of vision chip for fast traffic lane detection (FTLD). The architecture consists of a 32*32 SIMD processing element (PE) array processor and a dual-core RISC processor. The PE array processor performs low-level pixel-parallel image processing at high speed and outputs image features for high-level image processing without I/O bottleneck. The dual-core processor carries out high-level image processing. A parallel fast lane detection algorithm for this architecture is developed. The FPGA system with a CMOS image sensor is used to implement the architecture. Experiment results show that the system can perform the fast traffic lane detection at 50fps rate. It is much faster than previous works and has good robustness that can operate in various intensity of light. The novel architecture of vision chip is able to meet the demand of real-time lane departure warning system.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper investigates the power of genetic algorithms at solving the MAX-CLIQUE problem. We measure the performance of a standard genetic algorithm on an elementary set of problem instances consisting of embedded cliques in random graphs. We indicate the need for improvement, and introduce a new genetic algorithm, the multi-phase annealed GA, which exhibits superior performance on the same problem set. As we scale up the problem size and test on \hard" benchmark instances, we notice a degraded performance in the algorithm caused by premature convergence to local minima. To alleviate this problem, a sequence of modi cations are implemented ranging from changes in input representation to systematic local search. The most recent version, called union GA, incorporates the features of union cross-over, greedy replacement, and diversity enhancement. It shows a marked speed-up in the number of iterations required to find a given solution, as well as some improvement in the clique size found. We discuss issues related to the SIMD implementation of the genetic algorithms on a Thinking Machines CM-5, which was necessitated by the intrinsically high time complexity (O(n3)) of the serial algorithm for computing one iteration. Our preliminary conclusions are: (1) a genetic algorithm needs to be heavily customized to work "well" for the clique problem; (2) a GA is computationally very expensive, and its use is only recommended if it is known to find larger cliques than other algorithms; (3) although our customization e ort is bringing forth continued improvements, there is no clear evidence, at this time, that a GA will have better success in circumventing local minima.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The increased diversity of Internet application requirements has spurred recent interests in transport protocols with flexible transmission controls. In window-based congestion control schemes, increase rules determine how to probe available bandwidth, whereas decrease rules determine how to back off when losses due to congestion are detected. The parameterization of these control rules is done so as to ensure that the resulting protocol is TCP-friendly in terms of the relationship between throughput and loss rate. In this paper, we define a new spectrum of window-based congestion control algorithms that are TCP-friendly as well as TCP-compatible under RED. Contrary to previous memory-less controls, our algorithms utilize history information in their control rules. Our proposed algorithms have two salient features: (1) They enable a wider region of TCP-friendliness, and thus more flexibility in trading off among smoothness, aggressiveness, and responsiveness; and (2) they ensure a faster convergence to fairness under a wide range of system conditions. We demonstrate analytically and through extensive ns simulations the steady-state and transient behaviors of several instances of this new spectrum of algorithms. In particular, SIMD is one instance in which the congestion window is increased super-linearly with time since the detection of the last loss. Compared to recently proposed TCP-friendly AIMD and binomial algorithms, we demonstrate the superiority of SIMD in: (1) adapting to sudden increases in available bandwidth, while maintaining competitive smoothness and responsiveness; and (2) rapidly converging to fairness and efficiency.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Massively parallel networks of highly efficient, high performance Single Instruction Multiple Data (SIMD) processors have been shown to enable FPGA-based implementation of real-time signal processing applications with performance and
cost comparable to dedicated hardware architectures. This is achieved by exploiting simple datapath units with deep processing pipelines. However, these architectures are highly susceptible to pipeline bubbles resulting from data and control hazards; the only way to mitigate against these is manual interleaving of
application tasks on each datapath, since no suitable automated interleaving approach exists. In this paper we describe a new automated integrated mapping/scheduling approach to map algorithm tasks to processors and a new low-complexity list scheduling technique to generate the interleaved schedules. When applied to a spatial Fixed-Complexity Sphere Decoding (FSD) detector
for next-generation Multiple-Input Multiple-Output (MIMO) systems, the resulting schedules achieve real-time performance for IEEE 802.11n systems on a network of 16-way SIMD processors on FPGA, enable better performance/complexity balance than current approaches and produce results comparable to handcrafted implementations.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Aim: To evaluate the influence of socio-economic factors on visual acuity (VA) at presentation in exudative age- related macular degeneration (AMD). Methods: The medical records of all consecutive patients with newly diagnosed exudative AMD examined at the Ophthalmology Departments of Grampian University Hospitals-NHS Trust, Aberdeen, and Gartnavel General Hospital, Glasgow, between July 2004 and June 2005, were reviewed. Demographics, duration of symptoms, VA in study and fellow eye, exudative AMD characteristics, status of fellow eye and patient home address, used to determine the Scottish Index of Multiple Deprivation (SIMD) score, were recorded. The effect of these parameters on VA at presentation was investigated using general linear modelling. Results: Two-hundred and forty patients (median age 79 years) were included in this study; 44 (18.3%) belonged to the lowest 20% SIMD score (most deprived). Age and location and type of the choroidal neovascular- isation were statistically significantly associated with VA at presentation (p = 0.003, p

Relevância:

10.00% 10.00%

Publicador:

Resumo:

OBJECTIVE: To evaluate the influence of socio-economic factors on severity of glaucoma at presentation

METHODS: All newly diagnosed glaucoma patients at the University Hospitals-NHS, Aberdeen and South Glasgow University Hospitals-NHS, in 2006, were included. Glaucoma was severe at presentation if there was a repeatable visual-field loss with a mean deviation index greater than 12 dB in the Humphreys visual fields test or an absolute paracentral scotoma within the central 5 degrees of the visual fields. Home address was used to determine the Scottish Index of Multiple Deprivation (SIMD) rank. The SIMD rank, demographics and severity of glaucoma at presentation were investigated using general linear modelling.

RESULTS: There were 48 patients with severe glaucoma and 74 patients with non-severe glaucoma. In four, the severity could not be determined. Severity of glaucoma at presentation was significantly associated with SIMD rank, being most severe in patients from areas with the lowest ranks (p = 0.026). Age was a significant factor (p = 0.024), with severe glaucoma being more common in elderly patients.

CONCLUSIONS: Age and socio-economic deprivation were associated with severity of glaucoma at presentation, with patients from areas of higher socio-economic deprivation presenting with more advanced glaucoma.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The paper presents IPPro which is a high performance, scalable soft-core processor targeted for image processing applications. It has been based on the Xilinx DSP48E1 architecture using the ZYNQ Field Programmable Gate Array and is a scalar 16-bit RISC processor that operates at 526MHz, giving 526MIPS of performance. Each IPPro core uses 1 DSP48, 1 Block RAM and 330 Kintex-7 slice-registers, thus making the processor as compact as possible whilst maintaining flexibility and programmability. A key aspect of the approach is in reducing the application design time and implementation effort by using multiple IPPro processors in a SIMD mode. For different applications, this allows us to exploit different levels of parallelism and mapping for the specified processing architecture with the supported instruction set. In this context, a Traffic Sign Recognition (TSR) algorithm has been prototyped on a Zedboard with the colour and morphology operations accelerated using multiple IPPros. Simulation and experimental results demonstrate that the processing platform is able to achieve a speedup of 15 to 33 times for colour filtering and morphology operations respectively, with a reduced design effort and time.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

WHIRLBOB, also known as STRIBOBr2, is an AEAD (Authenticated Encryption with Associated Data) algorithm derived from STRIBOBr1 and the Whirlpool hash algorithm. WHIRLBOB/STRIBOBr2 is a second round candidate in the CAESAR competition. As with STRIBOBr1, the reduced-size Sponge design has a strong provable security link with a standardized hash algorithm. The new design utilizes only the LPS or ρ component of Whirlpool in flexibly domain-separated BLNK Sponge mode. The number of rounds is increased from 10 to 12 as a countermeasure against Rebound Distinguishing attacks. The 8 ×8 - bit S-Box used by Whirlpool and WHIRLBOB is constructed from 4 ×4 - bit “MiniBoxes”. We report on fast constant-time Intel SSSE3 and ARM NEON SIMD WHIRLBOB implementations that keep full miniboxes in registers and access them via SIMD shuffles. This is an efficient countermeasure against AES-style cache timing side-channel attacks. Another main advantage of WHIRLBOB over STRIBOBr1 (and most other AEADs) is its greatly reduced implementation footprint on lightweight platforms. On many lower-end microcontrollers the total software footprint of π+BLNK = WHIRLBOB AEAD is less than half a kilobyte. We also report an FPGA implementation that requires 4,946 logic units for a single round of WHIRLBOB, which compares favorably to 7,972 required for Keccak / Keyak on the same target platform. The relatively small S-Box gate count also enables efficient 64-bit bitsliced straight-line implementations. We finally present some discussion and analysis on the relationships between WHIRLBOB, Whirlpool, the Russian GOST Streebog hash, and the recent draft Russian Encryption Standard Kuznyechik.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this paper, we present techniques for inverting sparse, symmetric and positive definite matrices on parallel and distributed computers. We propose two algorithms, one for SIMD implementation and the other for MIMD implementation. These algorithms are modified versions of Gaussian elimination and they take into account the sparseness of the matrix. Our algorithms perform better than the general parallel Gaussian elimination algorithm. In order to demonstrate the usefulness of our technique, we implemented the snake problem using our sparse matrix algorithm. Our studies reveal that the proposed sparse matrix inversion algorithm significantly reduces the time taken for obtaining the solution of the snake problem. In this paper, we present the results of our experimental work.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The efficient emulation of a many-core architecture is a challenging task, each core could be emulated through a dedicated thread and such threads would be interleaved on an either single-core or a multi-core processor. The high number of context switches will results in an unacceptable performance. To support this kind of application, the GPU computational power is exploited in order to schedule the emulation threads on the GPU cores. This presents a non trivial divergence issue, since GPU computational power is offered through SIMD processing elements, that are forced to synchronously execute the same instruction on different memory portions. Thus, a new emulation technique is introduced in order to overcome this limitation: instead of providing a routine for each ISA opcode, the emulator mimics the behavior of the Micro Architecture level, here instructions are date that a unique routine takes as input. Our new technique has been implemented and compared with the classic emulation approach, in order to investigate the chance of a hybrid solution.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This thesis develops high performance real-time signal processing modules for direction of arrival (DOA) estimation for localization systems. It proposes highly parallel algorithms for performing subspace decomposition and polynomial rooting, which are otherwise traditionally implemented using sequential algorithms. The proposed algorithms address the emerging need for real-time localization for a wide range of applications. As the antenna array size increases, the complexity of signal processing algorithms increases, making it increasingly difficult to satisfy the real-time constraints. This thesis addresses real-time implementation by proposing parallel algorithms, that maintain considerable improvement over traditional algorithms, especially for systems with larger number of antenna array elements. Singular value decomposition (SVD) and polynomial rooting are two computationally complex steps and act as the bottleneck to achieving real-time performance. The proposed algorithms are suitable for implementation on field programmable gated arrays (FPGAs), single instruction multiple data (SIMD) hardware or application specific integrated chips (ASICs), which offer large number of processing elements that can be exploited for parallel processing. The designs proposed in this thesis are modular, easily expandable and easy to implement. Firstly, this thesis proposes a fast converging SVD algorithm. The proposed method reduces the number of iterations it takes to converge to correct singular values, thus achieving closer to real-time performance. A general algorithm and a modular system design are provided making it easy for designers to replicate and extend the design to larger matrix sizes. Moreover, the method is highly parallel, which can be exploited in various hardware platforms mentioned earlier. A fixed point implementation of proposed SVD algorithm is presented. The FPGA design is pipelined to the maximum extent to increase the maximum achievable frequency of operation. The system was developed with the objective of achieving high throughput. Various modern cores available in FPGAs were used to maximize the performance and details of these modules are presented in detail. Finally, a parallel polynomial rooting technique based on Newton’s method applicable exclusively to root-MUSIC polynomials is proposed. Unique characteristics of root-MUSIC polynomial’s complex dynamics were exploited to derive this polynomial rooting method. The technique exhibits parallelism and converges to the desired root within fixed number of iterations, making this suitable for polynomial rooting of large degree polynomials. We believe this is the first time that complex dynamics of root-MUSIC polynomial were analyzed to propose an algorithm. In all, the thesis addresses two major bottlenecks in a direction of arrival estimation system, by providing simple, high throughput, parallel algorithms.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith–Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign was found to be as good as Smith–Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/