909 resultados para Parallel Control Algorithm
Resumo:
We propose a collision-free medium access control (MAC) protocol, which implements static-priority scheduling and works in the presence of hidden nodes. The MAC protocol allows multiple masters and is fully distributed; it is an adaptation to a wireless channel of the dominance protocol used in the CAN bus. But unlike that protocol, our protocol does not require a node having the ability to sense the channel while transmitting to the channel. Our protocol is collision-free even in the presence of hidden nodes and it achieves this without synchronized clocks or out-of-band busy tones. In addition, the protocol is designed to ensure that many non-interfering nodes can transmit in parallel and it functions for both broadcast and unicast transmissions.
Resumo:
Trabalho apresentado no âmbito do Mestrado em Engenharia Informática, como requisito parcial para obtenção do grau de Mestre em Engenharia Informática
Resumo:
Dissertação apresentada na Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa para obtenção do grau de Mestre em Engenharia Informática
Resumo:
Recent integrated circuit technologies have opened the possibility to design parallel architectures with hundreds of cores on a single chip. The design space of these parallel architectures is huge with many architectural options. Exploring the design space gets even more difficult if, beyond performance and area, we also consider extra metrics like performance and area efficiency, where the designer tries to design the architecture with the best performance per chip area and the best sustainable performance. In this paper we present an algorithm-oriented approach to design a many-core architecture. Instead of doing the design space exploration of the many core architecture based on the experimental execution results of a particular benchmark of algorithms, our approach is to make a formal analysis of the algorithms considering the main architectural aspects and to determine how each particular architectural aspect is related to the performance of the architecture when running an algorithm or set of algorithms. The architectural aspects considered include the number of cores, the local memory available in each core, the communication bandwidth between the many-core architecture and the external memory and the memory hierarchy. To exemplify the approach we did a theoretical analysis of a dense matrix multiplication algorithm and determined an equation that relates the number of execution cycles with the architectural parameters. Based on this equation a many-core architecture has been designed. The results obtained indicate that a 100 mm(2) integrated circuit design of the proposed architecture, using a 65 nm technology, is able to achieve 464 GFLOPs (double precision floating-point) for a memory bandwidth of 16 GB/s. This corresponds to a performance efficiency of 71 %. Considering a 45 nm technology, a 100 mm(2) chip attains 833 GFLOPs which corresponds to 84 % of peak performance These figures are better than those obtained by previous many-core architectures, except for the area efficiency which is limited by the lower memory bandwidth considered. The results achieved are also better than those of previous state-of-the-art many-cores architectures designed specifically to achieve high performance for matrix multiplication.
Resumo:
This paper presents a new parallel implementation of a previously hyperspectral coded aperture (HYCA) algorithm for compressive sensing on graphics processing units (GPUs). HYCA method combines the ideas of spectral unmixing and compressive sensing exploiting the high spatial correlation that can be observed in the data and the generally low number of endmembers needed in order to explain the data. The proposed implementation exploits the GPU architecture at low level, thus taking full advantage of the computational power of GPUs using shared memory and coalesced accesses to memory. The proposed algorithm is evaluated not only in terms of reconstruction error but also in terms of computational performance using two different GPU architectures by NVIDIA: GeForce GTX 590 and GeForce GTX TITAN. Experimental results using real data reveals signficant speedups up with regards to serial implementation.
Resumo:
The application of compressive sensing (CS) to hyperspectral images is an active area of research over the past few years, both in terms of the hardware and the signal processing algorithms. However, CS algorithms can be computationally very expensive due to the extremely large volumes of data collected by imaging spectrometers, a fact that compromises their use in applications under real-time constraints. This paper proposes four efficient implementations of hyperspectral coded aperture (HYCA) for CS, two of them termed P-HYCA and P-HYCA-FAST and two additional implementations for its constrained version (CHYCA), termed P-CHYCA and P-CHYCA-FAST on commodity graphics processing units (GPUs). HYCA algorithm exploits the high correlation existing among the spectral bands of the hyperspectral data sets and the generally low number of endmembers needed to explain the data, which largely reduces the number of measurements necessary to correctly reconstruct the original data. The proposed P-HYCA and P-CHYCA implementations have been developed using the compute unified device architecture (CUDA) and the cuFFT library. Moreover, this library has been replaced by a fast iterative method in the P-HYCA-FAST and P-CHYCA-FAST implementations that leads to very significant speedup factors in order to achieve real-time requirements. The proposed algorithms are evaluated not only in terms of reconstruction error for different compressions ratios but also in terms of computational performance using two different GPU architectures by NVIDIA: 1) GeForce GTX 590; and 2) GeForce GTX TITAN. Experiments are conducted using both simulated and real data revealing considerable acceleration factors and obtaining good results in the task of compressing remotely sensed hyperspectral data sets.
Resumo:
Dissertation submitted to Faculdade de Ciências e Tecnologia - Universidade Nova de Lisboa in fulfilment of the requirements for the degree of Doctor of Philosophy (Biochemistry - Biotechnology)
Resumo:
Dissertação para obtenção do Grau de Mestre em Engenharia Informática
Resumo:
The aim of this contribution is to extend the techniques of composite materials design to non-linear material behaviour and apply it for design of new materials for passive vibration control. As a first step a computational tool allowing determination of macroscopic optimized one-dimensional isolator behaviour was developed. Voigt, Maxwell, standard and more complex material models can be implemented. Objective function considers minimization of the initial reaction and/or displacement peak as well as minimization of the steady-state amplitude of reaction and/or displacement. The complex stiffness approach is used to formulate the governing equations in an efficient way. Material stiffness parameters are assumed as non-linear functions of the displacement. The numerical solution is performed in the complex space. The steady-state solution in the complex space is obtained by an iterative process based on the shooting method which imposes the conditions of periodicity with respect to the known value of the period. Extension of the shooting method to the complex space is presented and verified. Non-linear behaviour of material parameters is then optimized by generic probabilistic meta-algorithm, simulated annealing. Dependence of the global optimum on several combinations of leading parameters of the simulated annealing procedure, like neighbourhood definition and annealing schedule, is also studied and analyzed. Procedure is programmed in MATLAB environment.
Resumo:
The 30th ACM/SIGAPP Symposium On Applied Computing (SAC 2015). 13 to 17, Apr, 2015, Embedded Systems. Salamanca, Spain.
Resumo:
In this paper we present the operational matrices of the left Caputo fractional derivative, right Caputo fractional derivative and Riemann–Liouville fractional integral for shifted Legendre polynomials. We develop an accurate numerical algorithm to solve the two-sided space–time fractional advection–dispersion equation (FADE) based on a spectral shifted Legendre tau (SLT) method in combination with the derived shifted Legendre operational matrices. The fractional derivatives are described in the Caputo sense. We propose a spectral SLT method, both in temporal and spatial discretizations for the two-sided space–time FADE. This technique reduces the two-sided space–time FADE to a system of algebraic equations that simplifies the problem. Numerical results carried out to confirm the spectral accuracy and efficiency of the proposed algorithm. By selecting relatively few Legendre polynomial degrees, we are able to get very accurate approximations, demonstrating the utility of the new approach over other numerical methods.
Resumo:
Redundant manipulators have some advantages when compared with classical arms because they allow the trajectory optimization, both on the free space and on the presence of abstacles, and the resolution of singularities. For this type of manipulators, several kinetic algorithms adopt generalized inverse matrices. In this line of thought, the generalized inverse control scheme is tested through several experiments that reveal the difficulties that often arise. Motivated by theseproblems this paper presents a new method that ptimizes the manipulability through a least squre polynomialapproximation to determine the joints positions. Moreover, the article studies influence on the dynamics, when controlling redundant and hyper-redundant manipulators. The experiment confirm the superior performance of the proposed algorithm for redundant and hyper-redundant manipulators, revealing several fundamental properties of the chaotic phenomena, and gives a deeper insight towards the future development of superior trajectory control algorithms.
Resumo:
Dissertação para obtenção do Grau de Mestre em Engenharia Biomédica
Resumo:
Dissertação para obtenção do Grau de Mestre em Engenharia Biomédica
Resumo:
Face à estagnação da tecnologia uniprocessador registada na passada década, aos principais fabricantes de microprocessadores encontraram na tecnologia multi-core a resposta `as crescentes necessidades de processamento do mercado. Durante anos, os desenvolvedores de software viram as suas aplicações acompanhar os ganhos de performance conferidos por cada nova geração de processadores sequenciais, mas `a medida que a capacidade de processamento escala em função do número de processadores, a computação sequencial tem de ser decomposta em várias partes concorrentes que possam executar em paralelo, para que possam utilizar as unidades de processamento adicionais e completar mais rapidamente. A programação paralela implica um paradigma completamente distinto da programação sequencial. Ao contrário dos computadores sequenciais tipificados no modelo de Von Neumann, a heterogeneidade de arquiteturas paralelas requer modelos de programação paralela que abstraiam os programadores dos detalhes da arquitectura e simplifiquem o desenvolvimento de aplicações concorrentes. Os modelos de programação paralela mais populares incitam os programadores a identificar instruções concorrentes na sua lógica de programação, e a especificá-las sob a forma de tarefas que possam ser atribuídas a processadores distintos para executarem em simultâneo. Estas tarefas são tipicamente lançadas durante a execução, e atribuídas aos processadores pelo motor de execução subjacente. Como os requisitos de processamento costumam ser variáveis, e não são conhecidos a priori, o mapeamento de tarefas para processadores tem de ser determinado dinamicamente, em resposta a alterações imprevisíveis dos requisitos de execução. `A medida que o volume da computação cresce, torna-se cada vez menos viável garantir as suas restrições temporais em plataformas uniprocessador. Enquanto os sistemas de tempo real se começam a adaptar ao paradigma de computação paralela, há uma crescente aposta em integrar execuções de tempo real com aplicações interativas no mesmo hardware, num mundo em que a tecnologia se torna cada vez mais pequena, leve, ubíqua, e portável. Esta integração requer soluções de escalonamento que simultaneamente garantam os requisitos temporais das tarefas de tempo real e mantenham um nível aceitável de QoS para as restantes execuções. Para tal, torna-se imperativo que as aplicações de tempo real paralelizem, de forma a minimizar os seus tempos de resposta e maximizar a utilização dos recursos de processamento. Isto introduz uma nova dimensão ao problema do escalonamento, que tem de responder de forma correcta a novos requisitos de execução imprevisíveis e rapidamente conjeturar o mapeamento de tarefas que melhor beneficie os critérios de performance do sistema. A técnica de escalonamento baseado em servidores permite reservar uma fração da capacidade de processamento para a execução de tarefas de tempo real, e assegurar que os efeitos de latência na sua execução não afectam as reservas estipuladas para outras execuções. No caso de tarefas escalonadas pelo tempo de execução máximo, ou tarefas com tempos de execução variáveis, torna-se provável que a largura de banda estipulada não seja consumida por completo. Para melhorar a utilização do sistema, os algoritmos de partilha de largura de banda (capacity-sharing) doam a capacidade não utilizada para a execução de outras tarefas, mantendo as garantias de isolamento entre servidores. Com eficiência comprovada em termos de espaço, tempo, e comunicação, o mecanismo de work-stealing tem vindo a ganhar popularidade como metodologia para o escalonamento de tarefas com paralelismo dinâmico e irregular. O algoritmo p-CSWS combina escalonamento baseado em servidores com capacity-sharing e work-stealing para cobrir as necessidades de escalonamento dos sistemas abertos de tempo real. Enquanto o escalonamento em servidores permite partilhar os recursos de processamento sem interferências a nível dos atrasos, uma nova política de work-stealing que opera sobre o mecanismo de capacity-sharing aplica uma exploração de paralelismo que melhora os tempos de resposta das aplicações e melhora a utilização do sistema. Esta tese propõe uma implementação do algoritmo p-CSWS para o Linux. Em concordância com a estrutura modular do escalonador do Linux, ´e definida uma nova classe de escalonamento que visa avaliar a aplicabilidade da heurística p-CSWS em circunstâncias reais. Ultrapassados os obstáculos intrínsecos `a programação da kernel do Linux, os extensos testes experimentais provam que o p-CSWS ´e mais do que um conceito teórico atrativo, e que a exploração heurística de paralelismo proposta pelo algoritmo beneficia os tempos de resposta das aplicações de tempo real, bem como a performance e eficiência da plataforma multiprocessador.