71 resultados para Software-reconfigurable array processing architectures


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Floating-point computing with more than one TFLOP of peak performance is already a reality in recent Field-Programmable Gate Arrays (FPGA). General-Purpose Graphics Processing Units (GPGPU) and recent many-core CPUs have also taken advantage of the recent technological innovations in integrated circuit (IC) design and had also dramatically improved their peak performances. In this paper, we compare the trends of these computing architectures for high-performance computing and survey these platforms in the execution of algorithms belonging to different scientific application domains. Trends in peak performance, power consumption and sustained performances, for particular applications, show that FPGAs are increasing the gap to GPUs and many-core CPUs moving them away from high-performance computing with intensive floating-point calculations. FPGAs become competitive for custom floating-point or fixed-point representations, for smaller input sizes of certain algorithms, for combinational logic problems and parallel map-reduce problems. © 2014 Technical University of Munich (TUM).

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A unified architecture for fast and efficient computation of the set of two-dimensional (2-D) transforms adopted by the most recent state-of-the-art digital video standards is presented in this paper. Contrasting to other designs with similar functionality, the presented architecture is supported on a scalable, modular and completely configurable processing structure. This flexible structure not only allows to easily reconfigure the architecture to support different transform kernels, but it also permits its resizing to efficiently support transforms of different orders (e. g. order-4, order-8, order-16 and order-32). Consequently, not only is it highly suitable to realize high-performance multi-standard transform cores, but it also offers highly efficient implementations of specialized processing structures addressing only a reduced subset of transforms that are used by a specific video standard. The experimental results that were obtained by prototyping several configurations of this processing structure in a Xilinx Virtex-7 FPGA show the superior performance and hardware efficiency levels provided by the proposed unified architecture for the implementation of transform cores for the Advanced Video Coding (AVC), Audio Video coding Standard (AVS), VC-1 and High Efficiency Video Coding (HEVC) standards. In addition, such results also demonstrate the ability of this processing structure to realize multi-standard transform cores supporting all the standards mentioned above and that are capable of processing the 8k Ultra High Definition Television (UHDTV) video format (7,680 x 4,320 at 30 fps) in real time.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Mestrado em Intervenção Sócio-Organizacional na Saúde - Área de especialização: Políticas de Administração e Gestão de Serviços de Saúde

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A non-coherent vector delay/frequency-locked loop architecture for GNSS receivers is proposed. Two dynamics models are considered: PV (position and velocity) and PVA (position, velocity, and acceleration). In contrast with other vector architectures, the proposed approach does not require the estimation of signals amplitudes. Only coarse estimates of the carrier-to-noise ratios are necessary.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper proposes a possible implementation of a compact printed monopole antenna, useful to operate in UMTS and WLAN bands. In order to accomplish that, a miniaturization technique based on the application of chip inductors is used in conjunction with frequency reconfiguration capability. The chip inductors change the impedance response of the monopole, allowing to reduce the resonant frequency. In order to be able to operate the antenna in these two different frequencies, an antenna reconfiguration technique based on PIN diodes is applied. This procedure allows the change of the active form of the antenna leading to a shift in the resonant frequency. The prototype measurements show good agreement with the simulation results.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Data analytic applications are characterized by large data sets that are subject to a series of processing phases. Some of these phases are executed sequentially but others can be executed concurrently or in parallel on clusters, grids or clouds. The MapReduce programming model has been applied to process large data sets in cluster and cloud environments. For developing an application using MapReduce there is a need to install/configure/access specific frameworks such as Apache Hadoop or Elastic MapReduce in Amazon Cloud. It would be desirable to provide more flexibility in adjusting such configurations according to the application characteristics. Furthermore the composition of the multiple phases of a data analytic application requires the specification of all the phases and their orchestration. The original MapReduce model and environment lacks flexible support for such configuration and composition. Recognizing that scientific workflows have been successfully applied to modeling complex applications, this paper describes our experiments on implementing MapReduce as subworkflows in the AWARD framework (Autonomic Workflow Activities Reconfigurable and Dynamic). A text mining data analytic application is modeled as a complex workflow with multiple phases, where individual workflow nodes support MapReduce computations. As in typical MapReduce environments, the end user only needs to define the application algorithms for input data processing and for the map and reduce functions. In the paper we present experimental results when using the AWARD framework to execute MapReduce workflows deployed over multiple Amazon EC2 (Elastic Compute Cloud) instances.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Workflows have been successfully applied to express the decomposition of complex scientific applications. However the existing tools still lack adequate support to important aspects namely, decoupling the enactment engine from tasks specification, decentralizing the control of workflow activities allowing their tasks to run in distributed infrastructures, and supporting dynamic workflow reconfigurations. We present the AWARD (Autonomic Workflow Activities Reconfigurable and Dynamic) model of computation, based on Process Networks, where the workflow activities (AWA) are autonomic processes with independent control that can run in parallel on distributed infrastructures. Each AWA executes a task developed as a Java class with a generic interface allowing end-users to code their applications without low-level details. The data-driven coordination of AWA interactions is based on a shared tuple space that also enables dynamic workflow reconfiguration. For evaluation we describe experimental results of AWARD workflow executions in several application scenarios, mapped to the Amazon (Elastic Computing EC2) Cloud.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Dissertação para obtenção do grau de Mestre em Engenharia Eletrotécnica Ramo de Automação e eletrónica Industrial

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Trabalho Final de Mestrado para obtenção do grau de Mestre em Engenharia de Electrónica e telecomunicações

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Trabalho Final de Mestrado para obtenção do grau de Mestrado em Engenharia Electrónica e Telecomunicações

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Trabalho Final de Mestrado para obtenção do grau de Mestre em Engenharia Mecânica

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Mestrado em Ensino da Música.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Introdução – A cintigrafia de perfusão do miocárdio (CPM) desempenha um importante papel no diagnóstico, avaliação e seguimento de pacientes com doença arterial coronária, sendo o seu processamento realizado maioritariamente de forma semiautomática. Uma vez que o desempenho dos técnicos de medicina nuclear (TMN) pode ser afetado por fatores individuais e ambientais, diferentes profissionais que processem os mesmos dados poderão obter diferentes estimativas dos parâmetros quantitativos (PQ). Objetivo – Avaliar a influência da experiência profissional e da função visual no processamento semiautomático da CPM. Analisar a variabilidade intra e interoperador na determinação dos PQ funcionais e de perfusão. Metodologia – Selecionou-se uma amostra de 20 TMN divididos em dois grupos, de acordo com a sua experiência no software Quantitative Gated SPECTTM: Grupo A (GA) – TMN ≥600h de experiência e Grupo B (GB) – TMN sem experiência. Submeteram-se os TMN a uma avaliação ortóptica e ao processamento de 21 CPM, cinco vezes, não consecutivas. Considerou-se uma visão alterada quando pelo menos um parâmetro da função visual se encontrava anormal. Para avaliar a repetibilidade e a reprodutibilidade recorreu-se à determinação dos coeficientes de variação, %. Na comparação dos PQ entre operadores, e para a análise do desempenho entre o GA e GB, aplicou-se o Teste de Friedman e de Wilcoxon, respetivamente, considerando o processamento das mesmas CPM. Para a comparação de TMN com visão normal e alterada na determinação dos PQ utilizou-se o Teste Mann-Whitney e para avaliar a influência da visão para cada PQ recorreu-se ao coeficiente de associação ETA. Diferenças estatisticamente significativas foram assumidas ao nível de significância de 5%. Resultados e Discussão – Verificou-se uma reduzida variabilidade intra (<6,59%) e inter (<5,07%) operador. O GB demonstrou ser o mais discrepante na determinação dos PQ, sendo a parede septal (PS) o único PQ que apresentou diferenças estatisticamente significativas (zw=-2,051, p=0,040), em detrimento do GA. No que se refere à influência da função visual foram detetadas diferenças estatisticamente significativas apenas na fração de ejeção do ventrículo esquerdo (FEVE) (U=11,5, p=0,012) entre TMN com visão normal e alterada, contribuindo a visão em 33,99% para a sua variação. Denotaram-se mais diferenças nos PQ obtidos em TMN que apresentam uma maior incidência de sintomatologia ocular e uma visão binocular diminuída. A FEVE demonstrou ser o parâmetro mais consistente entre operadores (1,86%). Conclusão – A CPM apresenta-se como uma técnica repetível e reprodutível, independente do operador. Verificou-se influência da experiência profissional e da função visual no processamento semiautomático da CPM, nos PQ PS e FEVE, respetivamente.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Recent integrated circuit technologies have opened the possibility to design parallel architectures with hundreds of cores on a single chip. The design space of these parallel architectures is huge with many architectural options. Exploring the design space gets even more difficult if, beyond performance and area, we also consider extra metrics like performance and area efficiency, where the designer tries to design the architecture with the best performance per chip area and the best sustainable performance. In this paper we present an algorithm-oriented approach to design a many-core architecture. Instead of doing the design space exploration of the many core architecture based on the experimental execution results of a particular benchmark of algorithms, our approach is to make a formal analysis of the algorithms considering the main architectural aspects and to determine how each particular architectural aspect is related to the performance of the architecture when running an algorithm or set of algorithms. The architectural aspects considered include the number of cores, the local memory available in each core, the communication bandwidth between the many-core architecture and the external memory and the memory hierarchy. To exemplify the approach we did a theoretical analysis of a dense matrix multiplication algorithm and determined an equation that relates the number of execution cycles with the architectural parameters. Based on this equation a many-core architecture has been designed. The results obtained indicate that a 100 mm(2) integrated circuit design of the proposed architecture, using a 65 nm technology, is able to achieve 464 GFLOPs (double precision floating-point) for a memory bandwidth of 16 GB/s. This corresponds to a performance efficiency of 71 %. Considering a 45 nm technology, a 100 mm(2) chip attains 833 GFLOPs which corresponds to 84 % of peak performance These figures are better than those obtained by previous many-core architectures, except for the area efficiency which is limited by the lower memory bandwidth considered. The results achieved are also better than those of previous state-of-the-art many-cores architectures designed specifically to achieve high performance for matrix multiplication.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

An adaptive antenna array combines the signal of each element, using some constraints to produce the radiation pattern of the antenna, while maximizing the performance of the system. Direction of arrival (DOA) algorithms are applied to determine the directions of impinging signals, whereas beamforming techniques are employed to determine the appropriate weights for the array elements, to create the desired pattern. In this paper, a detailed analysis of both categories of algorithms is made, when a planar antenna array is used. Several simulation results show that it is possible to point an antenna array in a desired direction based on the DOA estimation and on the beamforming algorithms. A comparison of the performance in terms of runtime and accuracy of the used algorithms is made. These characteristics are dependent on the SNR of the incoming signal.