893 results for Single Graphics Processing Units
Abstract:
The growing number of applications and processing units in modern Multiprocessor Systems-on-Chip (MPSoCs) comes along with reduced time to market. Different IP cores can come from different vendors, and their trust levels also differ, but they typically use a Network-on-Chip (NoC) as their communication infrastructure. An MPSoC can have multiple Trusted Execution Environments (TEEs). Apart from research on performance, power, and area in the field of MPSoCs, robust and secure system design is also gaining importance in the research community. To build a secure system, the designer must know beforehand all the kinds of attack that are possible on the respective system (MPSoC). In this paper we survey the possible attack scenarios on present-day MPSoCs and investigate a new attack scenario, i.e., a router attack targeted at the NoC architecture. We show the validity of this attack by analyzing different present-day NoC architectures and show that they are all vulnerable to this type of attack. By launching a router attack, an attacker can control the whole chip very easily, which makes it a very serious issue. Both routing table-based and routing logic-based routers are vulnerable to such attacks. In this paper, we address attacks on routing tables. We propose different monitoring-based countermeasures against routing table-based router attacks in an MPSoC having multiple TEEs. Synthesis results show that the proposed countermeasures, viz. Runtime-monitor, Restart-monitor, Intermediate manager, and Auditor, occupy areas that are 26.6%, 22%, 0.2%, and 12.2% of a routing table-based router area, respectively. Apart from these, we propose an Ejection address checker and a Local monitoring module inside a router that cause 3.4% and 10.6% increases in router area, respectively. Simulation results are also given, which show the effectiveness of the proposed monitoring-based countermeasures.
Abstract:
Rapid reconstruction of multidimensional images is crucial for enabling real-time 3D fluorescence imaging. This becomes a key factor for imaging rapidly occurring events in the cellular environment. To facilitate real-time imaging, we have developed a graphics processing unit (GPU) based real-time maximum a-posteriori (MAP) image reconstruction system. The parallel processing capability of a GPU device, which consists of a large number of tiny processing cores, and the adaptability of the image reconstruction algorithm to parallel processing (which employs multiple independent computing modules called threads) result in high temporal resolution. Moreover, the proposed quadratic potential based MAP algorithm effectively deconvolves the images and suppresses the noise. The multi-node multi-threaded GPU and the Compute Unified Device Architecture (CUDA) efficiently execute the iterative image reconstruction algorithm, which is approximately 200-fold faster (for large datasets) than existing CPU-based systems. (C) 2015 Author(s). All article content, except where otherwise noted, is licensed under a Creative Commons Attribution 3.0 Unported License.
Abstract:
Coarse Grained Reconfigurable Architectures (CGRAs) are emerging as embedded application processing units in computing platforms for Exascale computing. Such CGRAs are distributed-memory multi-core compute elements on a chip that communicate over a Network-on-Chip (NoC). Numerical Linear Algebra (NLA) kernels are key to several high performance computing applications. In this paper we propose a systematic methodology to obtain the specification of Compute Elements (CEs) for such CGRAs. We analyze block Matrix Multiplication and block LU Decomposition algorithms in the context of a CGRA, and obtain theoretical bounds on communication requirements and memory sizes for a CE. Support for high performance custom computations common to NLA kernels is provided through custom function units (CFUs) in the CEs. We present results to justify the merits of such CFUs.
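The block Matrix Multiplication analyzed in the abstract above can be illustrated with a minimal sketch. It assumes plain Python lists and a hypothetical function name, `block_matmul`; the comment mapping one tile product to one CE is an illustrative assumption, not the paper's actual partitioning.

```python
def block_matmul(a, b, block):
    """Multiply a (n x k) by b (k x m), working on block x block tiles."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):          # tile row of C
        for j0 in range(0, m, block):      # tile column of C
            for p0 in range(0, k, block):  # tile along the shared dimension
                # One tile product. Assumption: on a CGRA this is the unit of
                # work a single CE would perform on tiles held in local memory.
                for i in range(i0, min(i0 + block, n)):
                    for p in range(p0, min(p0 + block, k)):
                        a_ip = a[i][p]
                        for j in range(j0, min(j0 + block, m)):
                            c[i][j] += a_ip * b[p][j]
    return c
```

Under this scheme each tile product touches three tiles of at most block x block elements, which is the kind of quantity a bound on per-CE memory size would be expressed in.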
Abstract:
Does language-specific orthography help language detection and lexical access in naturalistic bilingual contexts? This study investigates how L2 orthotactic properties influence bilingual language detection in bilingual societies and the extent to which they modulate lexical access and single word processing. Language specificity of naturalistically learnt L2 words was manipulated by including bigram combinations that could be either L2 language-specific or common to the two languages known by bilinguals. A group of balanced bilinguals and a group of highly proficient but unbalanced bilinguals who grew up in a bilingual society were tested, together with a group of monolinguals (for control purposes). All the participants completed a speeded language detection task and a progressive demasking task. Results showed that the use of information about orthotactic rules across languages depends on the task demands at hand and on participants' proficiency in the second language. The influence of language orthotactic rules during language detection, lexical access and word identification is discussed according to the most prominent models of bilingual word recognition.
Abstract:
The novel Si stripixel detector, developed at BNL (Brookhaven National Laboratory), has been applied in the development of a prototype Si strip detector system for the PHENIX Upgrade at RHIC. The Si stripixel detector can generate X-Y two-dimensional (2D) position sensitivity with single-sided processing and readout. Test stripixel detectors with pitches of 85 and 560 μm have been subjected to the electron beam test in a SEM set-up, and to the laser beam test in a lab test fixture with an X-Y-Z table for laser scanning. Test results have shown that the X and Y strips are well isolated from each other, and 2D position sensitivity has been well demonstrated in the novel stripixel detectors. (c) 2005 Elsevier B.V. All rights reserved.
Abstract:
In fluid simulation, the fluids and their surroundings may simultaneously undergo large changes in properties such as shape and temperature, and different surroundings give rise to different interactions, which change the shape and motion of the fluids in different ways. In addition, interactions among fluid mixtures of different kinds generate more complex behavior. To investigate interaction behavior in physically based simulation of fluids, it is important to build physically correct models that represent the varying interactions between fluids and their environments, as well as interactions among the mixtures. In this paper, we briefly review these interactions, focus on those most interesting to us, and model them with various physical solutions. In particular, more detail is given on the simulation of miscible and immiscible binary mixtures. For some of the methods, it is advantageous to use the graphics processing unit (GPU) to achieve real-time computation for middle-scale simulation.
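Abstract:
Moving-target tracking is an important research direction in the field of mobile robots operating in unknown environments. This paper proposes a design method for moving-target tracking by a mobile robot based on active vision and ultrasonic information. An active vision system was built from a SONY EV-D31 color camera, a self-developed camera control module, and an image acquisition and processing unit. The mobile robot adopts a behavior-based distributed control architecture: it uses active vision to lock onto the moving target and an ultrasonic system to sense the external environment, and it can reliably track a moving target in unknown, dynamic, unstructured, and complex environments. Experiments show that the robot is highly robust and that the moving-target tracking system operates reliably.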
Abstract:
This article describes advances in statistical computation for large-scale data analysis in structured Bayesian mixture models via graphics processing unit (GPU) programming. The developments are partly motivated by computational challenges arising in fitting models of increasing heterogeneity to increasingly large datasets. An example context concerns common biological studies using high-throughput technologies generating many, very large datasets and requiring increasingly high-dimensional mixture models with large numbers of mixture components. We outline important strategies and processes for GPU computation in Bayesian simulation and optimization approaches, give examples of the benefits of GPU implementations in terms of processing speed and scale-up in ability to analyze large datasets, and provide a detailed, tutorial-style exposition that will benefit readers interested in developing GPU-based approaches in other statistical models. Novel, GPU-oriented approaches to modifying existing algorithms and software design can lead to vast speed-up and, critically, enable statistical analyses that presently will not be performed due to compute time limitations in traditional computational environments. Supplemental materials are provided with all source code, example data, and details that will enable readers to implement and explore the GPU approach in this mixture modeling context. © 2010 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America.
Abstract:
In this paper, we propose a multi-camera application capable of processing high resolution images and extracting features based on color patterns on graphics processing units (GPUs). The goal is to work in real time in the uncontrolled environment of a sports event such as a football match. Since football players exhibit diverse and complex color patterns, a Gaussian Mixture Model (GMM) is applied as the segmentation paradigm in order to analyze live sports images and video. Optimization techniques have also been applied to the C++ implementation using profiling tools focused on high performance. Time-consuming tasks were implemented on NVIDIA's CUDA platform, and later restructured and enhanced, speeding up the whole process significantly. Our resulting code is around 4 to 11 times faster on a low-cost GPU than a highly optimized C++ version on a central processing unit (CPU) over the same data. Real-time operation has been achieved, processing up to 64 frames per second. An important conclusion derived from our study is that the application scales with the number of cores on the GPU. © 2011 Springer-Verlag.
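As a rough illustration of GMM-based color segmentation as described in the abstract above, the sketch below assigns a pixel to the most likely mixture component. It assumes diagonal covariances and already-fitted parameters; the function names, labels, and parameter values are hypothetical, and the paper's own implementation (C++ and CUDA) is not reproduced here.

```python
import math

def log_gaussian(pixel, mean, var):
    """Log density of a diagonal-covariance Gaussian at an (R, G, B) pixel."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(pixel, mean, var)
    )

def classify_pixel(pixel, components):
    """Return the label of the component with the highest weighted likelihood.

    components: list of (label, weight, mean, var) tuples. These are assumed
    to come from a prior fitting step that is not shown here.
    """
    best = max(
        components,
        key=lambda c: math.log(c[1]) + log_gaussian(pixel, c[2], c[3]),
    )
    return best[0]

# Hypothetical, hand-picked parameters for illustration only.
COMPONENTS = [
    ("grass", 0.6, (40.0, 160.0, 40.0), (400.0, 400.0, 400.0)),
    ("shirt", 0.4, (200.0, 30.0, 30.0), (400.0, 400.0, 400.0)),
]
```

Because each pixel is classified independently of its neighbors, this step maps naturally onto one GPU thread per pixel, which is consistent with the scalability the abstract reports.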
Abstract:
Differential equations are often directly solvable by analytical means only in their one-dimensional version. Partial differential equations are generally not solvable by analytical means in two and three dimensions, with the exception of a few special cases. In all other cases, numerical approximation methods need to be utilized. One of the most popular methods is the finite element method. The main areas of focus here are the Poisson heat equation and the plate bending equation. The purpose of this paper is to provide a quick walkthrough of the various approaches that the authors followed in pursuit of creating optimal solvers, accelerated with the use of graphics processing units, and to compare them in terms of accuracy and time efficiency with existing or self-made non-accelerated solvers.
Abstract:
In the reinsurance market, the risks natural catastrophes pose to portfolios of properties must be quantified, so that they can be priced, and insurance offered. The analysis of such risks at a portfolio level requires a simulation of up to 800 000 trials with an average of 1000 catastrophic events per trial. This is sufficient to capture risk for a global multi-peril reinsurance portfolio covering a range of perils including earthquake, hurricane, tornado, hail, severe thunderstorm, wind storm, storm surge and riverine flooding, and wildfire. Such simulations are both computation and data intensive, making the application of high-performance computing techniques desirable.
In this paper, we explore the design and implementation of portfolio risk analysis on both multi-core and many-core computing platforms. Given a portfolio of property catastrophe insurance treaties, key risk measures, such as probable maximum loss, are computed by taking both primary and secondary uncertainties into account. Primary uncertainty is associated with whether or not an event occurs in a simulated year, while secondary uncertainty captures the uncertainty in the level of loss due to the use of simplified physical models and limitations in the available data. A combination of fast lookup structures, multi-threading and careful hand tuning of numerical operations is required to achieve good performance. Experimental results are reported for multi-core processors and systems using NVIDIA graphics processing unit and Intel Phi many-core accelerators.
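The two kinds of uncertainty and the probable maximum loss described above can be sketched as follows. This is a minimal illustration under stated assumptions: each event is a hypothetical `(occurrence_prob, mean_loss, loss_sd)` tuple, secondary uncertainty is drawn from a normal distribution clipped at zero, and the PML uses a simple empirical rank convention. None of these choices is taken from the paper.

```python
import random

def simulate_trial_loss(events, rng):
    """Total loss for one simulated year (trial).

    events: list of (occurrence_prob, mean_loss, loss_sd) tuples (hypothetical).
    Primary uncertainty: whether each event occurs in this trial.
    Secondary uncertainty: the loss level if it does occur (assumed normal,
    clipped at zero).
    """
    total = 0.0
    for occurrence_prob, mean_loss, loss_sd in events:
        if rng.random() < occurrence_prob:
            total += max(0.0, rng.gauss(mean_loss, loss_sd))
    return total

def probable_maximum_loss(trial_losses, return_period):
    """Loss met or exceeded in roughly 1 out of `return_period` trials."""
    if not trial_losses or return_period < 1:
        raise ValueError("need at least one trial and return_period >= 1")
    ranked = sorted(trial_losses, reverse=True)
    rank = max(1, len(ranked) // return_period)
    return ranked[rank - 1]
```

A run would call `simulate_trial_loss(events, random.Random(seed))` once per trial and pass the resulting list to `probable_maximum_loss`. The trials are independent of one another, which is what makes the multi-threaded and many-core implementations mentioned above a natural fit.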
Abstract:
Structural optimization is a long-standing topic in engineering. However, with the growth of the finite element method in recent decades, it has given rise to a growing number of applications. Topology optimization, specifically, is associated with the stage of defining the effective domain within an overall structural optimization process. Based on this type of optimization, it is possible to obtain the optimal distribution of material for various applications and loading conditions. Composite materials and some cellular materials, in particular, are among the most prominent materials of our time, in terms of both their applications and research and development. However, their potentially complex structure and heterogeneous nature bring great complexity, both in predicting their constitutive properties and in obtaining optimal distributions of constituents. Homogenization procedures can provide some answers in both cases. In particular, asymptotic expansion homogenization can be used to determine effective, global thermomechanical properties from representative volumes, in a flexible manner that is independent of the distribution of constituents. Moreover, it integrates localization processes and provides detailed information about local sensitivities in multiscale optimization methodologies. Combining these areas can lead to multiscale topology optimization methodologies, in which not only optimal structures but also the ideal distributions of constituent materials are obtained. The problems associated with these approaches tend, however, to demand considerable computational resources, often creating serious limitations on the feasibility of solving them. In this regard, parallel and distributed computing techniques present themselves as a potential solution.
By dividing the problems among different memory and processing units, it is possible to tackle problems that would otherwise be prohibitive. The main focus of this work is on the importance of developing computational procedures for the applications mentioned. In addition, these lead to several alternative approaches in the simultaneous search for structures and materials to meet thermomechanical applications. In view of the above, all of this is integrated into a computational platform for multiobjective multiscale optimization in thermoelasticity, developed and implemented throughout this work. Additionally, the work is complemented by the assembly and configuration of a Beowulf-type cluster, as well as by the development of the code with a view to parallel and distributed computing.
Abstract:
Home automation is an area of great interest with ample room for exploration, which aims to achieve automatic and autonomous management of household resources, providing greater comfort to users. In addition, there is an increasing effort to include economic and environmental benefits in this concept, in order to ensure a sustainable future. Water heating (by electrical means) is one of the factors that contributes most to the total energy consumption of a home. In this context arises the topic “low-complexity intelligent algorithms”, originating from a partnership between the Departamento de Eletrónica, Telecomunicações e Informática (DETI) of the Universidade de Aveiro and Bosch Termotecnologia SA, which aims to develop so-called “intelligent” algorithms, that is, algorithms with some capacity for learning and autonomous operation. The algorithms must be adapted to 8-bit processing units to equip small household appliances, more specifically electric water heating tanks. Part of the challenge is therefore related to the computational constraints of 8-bit microcontrollers. In the specific case of this work, it was determined that water temperature sensors in the tank would be the only source of external information for the algorithms, together with user-predefined parameters that set the maximum and minimum water temperature thresholds. On this basis, the algorithms developed rely on the hot water consumption profile, observed over each week, to try to predict future water draws and, consequently, to act accordingly by bringing forward or postponing the heating of the water in the tank. The goal is to achieve an advantageous balance between energy savings and user comfort (hot water), without any need for direct intervention by the end user.
The envisaged solution also includes the development of a simulator that makes it possible to observe, evaluate, and compare the performance of the algorithms developed.
Abstract:
Final Master's project submitted to obtain the degree of Master in Electronics and Telecommunications Engineering
Abstract:
Consumer-electronics systems are becoming increasingly complex as the number of integrated applications is growing. Some of these applications have real-time requirements, while other non-real-time applications only require good average performance. For cost-efficient design, contemporary platforms feature an increasing number of cores that share resources, such as memories and interconnects. However, resource sharing causes contention that must be resolved by a resource arbiter, such as Time-Division Multiplexing. A key challenge is to configure this arbiter to satisfy the bandwidth and latency requirements of the real-time applications, while maximizing the slack capacity to improve performance of their non-real-time counterparts. As this configuration problem is NP-hard, a sophisticated automated configuration method is required to avoid negatively impacting design time. The main contributions of this article are: 1) An optimal approach that takes an existing integer linear programming (ILP) model addressing the problem and wraps it in a branch-and-price framework to improve scalability. 2) A faster heuristic algorithm that typically provides near-optimal solutions. 3) An experimental evaluation that quantitatively compares the branch-and-price approach to the previously formulated ILP model and the proposed heuristic. 4) A case study of an HD video and graphics processing system that demonstrates the practical applicability of the approach.
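To make the arbiter configuration problem above concrete, the sketch below evaluates a given Time-Division Multiplexing slot table for one client. It is only an evaluation helper under simplifying assumptions (a hypothetical `tdm_metrics` function, bandwidth as the fraction of slots owned, and the longest cyclic gap between owned slots as a proxy for worst-case latency). It is not the article's ILP model, branch-and-price method, or heuristic.

```python
def tdm_metrics(slot_table, client):
    """Return (bandwidth_fraction, worst_case_gap_in_slots) for a client.

    slot_table: one entry per slot in the repeating TDM frame, holding the
    owning client's id, or None for an unallocated (slack) slot.
    """
    slots = [i for i, owner in enumerate(slot_table) if owner == client]
    if not slots:
        raise ValueError("client has no slots in the table")
    frame = len(slot_table)
    bandwidth = len(slots) / frame
    # Cyclic distance from each owned slot to the next owned slot. A client
    # with a single slot waits a full frame between services.
    gaps = [
        (slots[(k + 1) % len(slots)] - s) % frame or frame
        for k, s in enumerate(slots)
    ]
    return bandwidth, max(gaps)

# Hypothetical 8-slot frame: two real-time clients and three slack slots.
TABLE = ["A", "B", "A", "B", None, None, "A", None]
```

Here client "A" owns 3 of 8 slots with a longest gap of 4 slots, while "B" owns 2 of 8 with a longest gap of 6. A configuration method would search for a table in which every real-time client meets its bandwidth and latency requirements while leaving as many `None` slots as possible for non-real-time clients.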