843 resultados para parallel processing systems
Resumo:
The paper presents IPPro which is a high performance, scalable soft-core processor targeted for image processing applications. It has been based on the Xilinx DSP48E1 architecture using the ZYNQ Field Programmable Gate Array and is a scalar 16-bit RISC processor that operates at 526MHz, giving 526MIPS of performance. Each IPPro core uses 1 DSP48, 1 Block RAM and 330 Kintex-7 slice-registers, thus making the processor as compact as possible whilst maintaining flexibility and programmability. A key aspect of the approach is in reducing the application design time and implementation effort by using multiple IPPro processors in a SIMD mode. For different applications, this allows us to exploit different levels of parallelism and mapping for the specified processing architecture with the supported instruction set. In this context, a Traffic Sign Recognition (TSR) algorithm has been prototyped on a Zedboard with the colour and morphology operations accelerated using multiple IPPros. Simulation and experimental results demonstrate that the processing platform is able to achieve a speedup of 15 to 33 times for colour filtering and morphology operations respectively, with a reduced design effort and time.
Resumo:
In this paper, we propose a system level design approach considering voltage over-scaling (VOS) that achieves error resiliency using unequal error protection of different computation elements, while incurring minor quality degradation. Depending on user specifications and severity of process variations/channel noise, the degree of VOS in each block of the system is adaptively tuned to ensure minimum system power while providing "just-the-right" amount of quality and robustness. This is achieved, by taking into consideration block level interactions and ensuring that under any change of operating conditions, only the "less-crucial" computations, that contribute less to block/system output quality, are affected. The proposed approach applies unequal error protection to various blocks of a system-logic and memory-and spans multiple layers of design hierarchy-algorithm, architecture and circuit. The design methodology when applied to a multimedia subsystem shows large power benefits ( up to 69% improvement in power consumption) at reasonable image quality while tolerating errors introduced due to VOS, process variations, and channel noise.
Resumo:
The Field Programmable Gate Array (FPGA) implementation of the commonly used Histogram of Oriented Gradients (HOG) algorithm is explored. The HOG algorithm is employed to extract features for object detection. A key focus has been to explore the use of a new FPGA-based processor which has been targeted at image processing. The paper gives details of the mapping and scheduling factors that influence the performance and the stages that were undertaken to allow the algorithm to be deployed on FPGA hardware, whilst taking into account the specific IPPro architecture features. We show that multi-core IPPro performance can exceed that of against state-of-the-art FPGA designs by up to 3.2 times with reduced design and implementation effort and increased flexibility all on a low cost, Zynq programmable system.
Resumo:
While virtualisation can provide many benefits to a networks infrastructure, securing the virtualised environment is a big challenge. The security of a fully virtualised solution is dependent on the security of each of its underlying components, such as the hypervisor, guest operating systems and storage.
This paper presents a single security service running on the hypervisor that could potentially work to provide security service to all virtual machines running on the system. This paper presents a hypervisor hosted framework which performs specialised security tasks for all underlying virtual machines to protect against any malicious attacks by passively analysing the network traffic of VMs. This framework has been implemented using Xen Server and has been evaluated by detecting a Zeus Server setup and infected clients, distributed over a number of virtual machines. This framework is capable of detecting and identifying all infected VMs with no false positive or false negative detection.
Resumo:
Fully Homomorphic Encryption (FHE) is a recently developed cryptographic technique which allows computations on encrypted data. There are many interesting applications for this encryption method, especially within cloud computing. However, the computational complexity is such that it is not yet practical for real-time applications. This work proposes optimised hardware architectures of the encryption step of an integer-based FHE scheme with the aim of improving its practicality. A low-area design and a high-speed parallel design are proposed and implemented on a Xilinx Virtex-7 FPGA, targeting the available DSP slices, which offer high-speed multiplication and accumulation. Both use the Comba multiplication scheduling method to manage the large multiplications required with uneven sized multiplicands and to minimise the number of read and write operations to RAM. Results show that speed up factors of 3.6 and 10.4 can be achieved for the encryption step with medium-sized security parameters for the low-area and parallel designs respectively, compared to the benchmark software implementation on an Intel Core2 Duo E8400 platform running at 3 GHz.
Resumo:
Energy in today's short-range wireless communication is mostly spent on the analog- and digital hardware rather than on radiated power. Hence,purely information-theoretic considerations fail to achieve the lowest energy per information bit and the optimization process must carefully consider the overall transceiver. In this paper, we propose to perform cross-layer optimization, based on an energy-aware rate adaptation scheme combined with a physical layer that is able to properly adjust its processing effort to the data rate and the channel conditions to minimize the energy consumption per information bit. This energy proportional behavior is enabled by extending the classical system modes with additional configuration parameters at the various layers. Fine grained models of the power consumption of the hardware are developed to provide awareness of the physical layer capabilities to the medium access control layer. The joint application of the proposed energy-aware rate adaptation and modifications to the physical layer of an IEEE802.11n system, improves energy-efficiency (averaged over many noise and channel realizations) in all considered scenarios by up to 44%.
Resumo:
The increasing scale of Multiple-Input Multiple- Output (MIMO) topologies employed in forthcoming wireless communications standards presents a substantial implementation challenge to designers of embedded baseband signal processing architectures for MIMO transceivers. Specifically the increased scale of such systems has a substantial impact on the perfor- mance/cost balance of detection algorithms for these systems. Whilst in small-scale systems Sphere Decoding (SD) algorithms offer the best quasi-ML performance/cost balance, in larger systems heuristic detectors, such Tabu-Search (TS) detectors are superior. This paper addresses a dearth of research in architectures for TS-based MIMO detection, presenting the first known realisations of TS detectors for 4 × 4 and 10 × 10 MIMO systems. To the best of the authors’ knowledge, these are the largest single-chip detectors on record.
Resumo:
The introduction of parallel processing architectures allowed the real time impelemtation of more sophisticated control algorithms with tighter specifications in terms of sampling time. However, to take advantage of the processing power of these architectures the control engeneer, due to the lack of appropriate tools, must spend a considerable amount of time in the parallelizaton of the control algorithm.
Resumo:
Discrete optimization problems are very difficult to solve, even if the dimantion is small. For most of them the problem of finding an ε-approximate solution is already NP-hard.
Resumo:
The performance demands of modern control and signal processing systems is increasing beyond the capacity of conventional sequential processors, requiring parallel processing solutions to satisfy the real-time requirements.
Resumo:
The Adaptive Generalized Predictive Control (GPC) algorithm can be speeded up using parallel processing. Since the GPC algorithm needs to be fed with knowledge of the plant transfer function, the parallelization of a standard Recursive Least Squares (RLS) estimator and a GPC predictor is discussed here.
Resumo:
Turbo codes experience a significant decoding delay because of the iterative nature of the decoding algorithms, the high number of metric computations and the complexity added by the (de)interleaver. The extrinsic information is exchanged sequentially between two Soft-Input Soft-Output (SISO) decoders. Instead of this sequential process, a received frame can be divided into smaller windows to be processed in parallel. In this paper, a novel parallel processing methodology is proposed based on the previous parallel decoding techniques. A novel Contention-Free (CF) interleaver is proposed as part of the decoding architecture which allows using extrinsic Log-Likelihood Ratios (LLRs) immediately as a-priori LLRs to start the second half of the iterative turbo decoding. The simulation case studies performed in this paper show that our parallel decoding method can provide %80 time saving compared to the standard decoding and %30 time saving compared to the previous parallel decoding methods at the expense of 0.3 dB Bit Error Rate (BER) performance degradation.
Resumo:
Relatório do Trabalho Final de Mestrado para obtenção do grau de Mestre em Engenharia de Electrónica e Telecomunicações
Resumo:
Database query languages on relations (for example SQL) make it possible to join two relations. This operation is very common in desktop/server database systems but unfortunately query processing systems in networked embedded computer systems currently do not support this operation; specifically, the query processing systems TAG, TinyDB, Cougar do not support this. We show how a prioritized medium access control (MAC) protocol can be used to efficiently execute the database operation join for networked embedded computer systems where all computer nodes are in a single broadcast domain.
Resumo:
La douleur est une expérience perceptive comportant de nombreuses dimensions. Ces dimensions de douleur sont inter-reliées et recrutent des réseaux neuronaux qui traitent les informations correspondantes. L’élucidation de l'architecture fonctionnelle qui supporte les différents aspects perceptifs de l'expérience est donc une étape fondamentale pour notre compréhension du rôle fonctionnel des différentes régions de la matrice cérébrale de la douleur dans les circuits corticaux qui sous tendent l'expérience subjective de la douleur. Parmi les diverses régions du cerveau impliquées dans le traitement de l'information nociceptive, le cortex somatosensoriel primaire et secondaire (S1 et S2) sont les principales régions généralement associées au traitement de l'aspect sensori-discriminatif de la douleur. Toutefois, l'organisation fonctionnelle dans ces régions somato-sensorielles n’est pas complètement claire et relativement peu d'études ont examiné directement l'intégration de l'information entre les régions somatiques sensorielles. Ainsi, plusieurs questions demeurent concernant la relation hiérarchique entre S1 et S2, ainsi que le rôle fonctionnel des connexions inter-hémisphériques des régions somatiques sensorielles homologues. De même, le traitement en série ou en parallèle au sein du système somatosensoriel constitue un autre élément de questionnement qui nécessite un examen plus approfondi. Le but de la présente étude était de tester un certain nombre d'hypothèses sur la causalité dans les interactions fonctionnelle entre S1 et S2, alors que les sujets recevaient des chocs électriques douloureux. Nous avons mis en place une méthode de modélisation de la connectivité, qui utilise une description de causalité de la dynamique du système, afin d'étudier les interactions entre les sites d'activation définie par un ensemble de données provenant d'une étude d'imagerie fonctionnelle. Notre paradigme est constitué de 3 session expérimentales en utilisant des chocs électriques à trois différents niveaux d’intensité, soit modérément douloureux (niveau 3), soit légèrement douloureux (niveau 2), soit complètement non douloureux (niveau 1). Par conséquent, notre paradigme nous a permis d'étudier comment l'intensité du stimulus est codé dans notre réseau d'intérêt, et comment la connectivité des différentes régions est modulée dans les conditions de stimulation différentes. Nos résultats sont en faveur du mode sériel de traitement de l’information somatosensorielle nociceptive avec un apport prédominant de la voie thalamocorticale vers S1 controlatérale au site de stimulation. Nos résultats impliquent que l'information se propage de S1 controlatéral à travers notre réseau d'intérêt composé des cortex S1 bilatéraux et S2. Notre analyse indique que la connexion S1→S2 est renforcée par la douleur, ce qui suggère que S2 est plus élevé dans la hiérarchie du traitement de la douleur que S1, conformément aux conclusions précédentes neurophysiologiques et de magnétoencéphalographie. Enfin, notre analyse fournit des preuves de l'entrée de l'information somatosensorielle dans l'hémisphère controlatéral au côté de stimulation, avec des connexions inter-hémisphériques responsable du transfert de l'information à l'hémisphère ipsilatéral.