947 results for Algorithmic skeleton
Abstract:
Dynamic Voltage and Frequency Scaling (DVFS) exhibits fundamental limitations as a method to reduce energy consumption in computing systems. In the HPC domain, where performance is of highest priority and codes are heavily optimized to minimize idle time, DVFS has limited opportunity to achieve substantial energy savings. This paper explores whether operating processors Near the transistor Threshold Voltage (NTV) is a better alternative to DVFS for breaking the power wall in HPC. NTV presents challenges, since it compromises both performance and reliability to reduce power consumption. We present a first-of-its-kind study of a significance-driven execution paradigm that selectively uses NTV and algorithmic error tolerance to reduce energy consumption in performance-constrained HPC environments. Using an iterative algorithm as a use case, we present an adaptive execution scheme that switches between near-threshold execution on many cores and above-threshold execution on one core, as the computational significance of iterations in the algorithm evolves over time. Using this scheme on state-of-the-art hardware, we demonstrate energy savings ranging from 35% to 67%, while compromising neither correctness nor performance.
Abstract:
We introduce and address the problem of concurrent autonomic management of different non-functional concerns in parallel applications built as a hierarchical composition of behavioural skeletons. We first define the problems arising when multiple concerns are dealt with by independent managers, then we propose a methodology supporting coordinated management, and finally we discuss how autonomic management of multiple concerns may be implemented in a typical use case. Being based on the behavioural skeleton concept proposed in the CoreGRID GCM, it is anticipated that the methodology will be readily integrated into the current reference implementation of GCM based on Java ProActive and running on top of major grid middleware systems.
Abstract:
In this paper, we have developed a low-complexity algorithm for epileptic seizure detection with a high degree of accuracy. The algorithm has been designed to be feasibly implementable as a battery-powered, low-power implantable epileptic seizure detection system or epilepsy prosthesis. This is achieved by utilizing design optimization techniques at different levels of abstraction. In particular, user-specific critical parameters are identified at the algorithmic level and are explicitly used along with multiplier-less implementations at the architecture level. The system has been tested on neural data obtained from in-vivo animal recordings and has been implemented in 90 nm bulk-Si technology. The results show up to 90% savings in power compared to the prevalent wavelet-based seizure detection technique, while achieving a 97% average detection rate. Copyright 2010 ACM.
Abstract:
In this paper we present a design methodology for algorithm/architecture co-design of a voltage-scalable, process-variation-aware motion estimator based on significance-driven computation. The fundamental premise of our approach is that not all computations are equally significant in shaping the output response of video systems. We use a statistical technique to identify these significant and not-so-significant computations at the algorithmic level, and subsequently change the underlying architecture such that the significant computations are computed in an error-free manner under voltage over-scaling. Furthermore, our design includes an adaptive quality compensation (AQC) block which "tunes" the algorithm and architecture depending on the magnitude of voltage over-scaling and the severity of process variations. Simulation results show average power savings of approximately 33% for the proposed architecture when compared to a conventional implementation in 90 nm CMOS technology. The maximum output quality loss in terms of Peak Signal to Noise Ratio (PSNR) was approximately 1 dB, without incurring any throughput penalty.
Abstract:
In this paper, a low-complexity system for spectral analysis of heart rate variability (HRV) is presented. The main idea of the proposed approach is to implement the Fast-Lomb periodogram, a ubiquitous tool in spectral analysis, using a wavelet-based Fast Fourier transform. Interestingly, we show that the proposed approach enables the classification of processed data into more and less significant, based on their contribution to output quality. Based on this classification, a percentage of the less-significant data is pruned, leading to a significant reduction of algorithmic complexity with minimal quality degradation. Indeed, our results indicate that the proposed system can achieve up to a 45% reduction in the number of computations with only 4.9% average error in the output quality compared to a conventional FFT-based HRV system.
Abstract:
This paper presents a new programming methodology for introducing and tuning parallelism in Erlang programs, using source-level code refactoring from sequential source programs to parallel programs written using our skeleton library, Skel. High-level cost models allow us to predict with reasonable accuracy the parallel performance of the refactored program, enabling programmers to make informed decisions about which refactorings to apply. Using our approach, we demonstrate easily obtainable, significant and scalable speedups of up to 21 on a 24-core machine over the sequential code.
Abstract:
We propose a methodology for optimizing the execution of data parallel (sub-)tasks on CPU and GPU cores of the same heterogeneous architecture. The methodology is based on two main components: i) an analytical performance model for scheduling tasks among CPU and GPU cores, such that the global execution time of the overall data parallel pattern is optimized; and ii) an autonomic module which uses the analytical performance model to implement the data parallel computations in a completely autonomic way, requiring no programmer intervention to optimize the computation across CPU and GPU cores. The analytical performance model uses a small set of simple parameters to devise a partitioning, between CPU and GPU cores, of the tasks derived from structured data parallel patterns/algorithmic skeletons. The model takes into account both hardware-related and application-dependent parameters. It computes the percentage of tasks to be executed on CPU and GPU cores such that both kinds of cores are exploited and performance figures are optimized. The autonomic module, implemented in FastFlow, executes a generic map (reduce) data parallel pattern, scheduling part of the tasks to the GPU and part to CPU cores so as to achieve optimal execution time. Experimental results on state-of-the-art CPU/GPU architectures are shown that assess both performance model properties and autonomic module effectiveness. © 2013 IEEE.
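As a rough illustration of the partitioning idea in the abstract above, the following Python sketch computes a CPU/GPU task split that balances the estimated completion times of the two sides. It is not the paper's model: the function name `split_tasks` and its parameters (number of CPU cores, per-task time on one CPU core, per-task time on the GPU) are assumptions made for illustration, and data-transfer overheads are ignored.

```python
def split_tasks(n_tasks, cpu_cores, t_cpu, t_gpu):
    """Return (cpu_tasks, gpu_tasks) so both sides finish at about the same time.

    Hypothetical parameters (not taken from the paper):
      cpu_cores -- number of CPU cores available
      t_cpu     -- time for one task on one CPU core
      t_gpu     -- time for one task on the GPU as a whole
    """
    cpu_rate = cpu_cores / t_cpu   # tasks per time unit across all CPU cores
    gpu_rate = 1.0 / t_gpu         # tasks per time unit on the GPU
    gpu_share = gpu_rate / (cpu_rate + gpu_rate)
    gpu_tasks = round(n_tasks * gpu_share)
    return n_tasks - gpu_tasks, gpu_tasks


# Example: 4 CPU cores at 4 time units per task match a GPU at 1 unit per task,
# so the work is split evenly.
even_split = split_tasks(1000, cpu_cores=4, t_cpu=4.0, t_gpu=1.0)
```

The split is proportional to throughput, which is the point at which neither device sits idle waiting for the other under this simplified model.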
Abstract:
Metastasis is the predominant cause of death from cancer, yet we have few biomarkers to predict patients at increased risk of metastasis and are unable to effectively treat disseminated disease. Analysis of 448 primary breast tumors determined that expression of the hyaluronan receptor CD44 was associated with high-grade (p = 0.046), ER-negative (p = 0.001) and PR-negative tumors (p = 0.029), and correlated with increased distant recurrence and reduced disease-free survival in patients with lymph-node-positive or large tumors. To determine its functional role in distant metastasis, CD44 was knocked down in MDA-MB-231 cells using two independent shRNA sequences. Loss of CD44 attenuated tumor cell adhesion to endothelial cells and reduced cell invasion but did not affect proliferation in vitro. To verify the importance of CD44 to post-intravasation events, tumor formation was assessed by quantitative in vivo imaging and post-mortem tissue analysis following an intra-cardiac injection of transfected cells. CD44 knock-down increased survival and decreased overall tumor burden at multiple sites, including the skeleton, in vivo. We conclude that elevated CD44 expression on tumor cells within the systemic circulation increases the efficiency of post-intravasation events and distant metastasis in vivo, consistent with its association with increased distant recurrence and reduced disease-free survival in patients.
Abstract:
We introduce a new parallel pattern derived from a specific application domain and show how it turns out to have application beyond its domain of origin. The pool evolution pattern models the parallel evolution of a population subject to mutations and evolving in such a way that a given fitness function is optimized. The pattern has been demonstrated to be suitable for capturing and modeling the parallel patterns underpinning various evolutionary algorithms, as well as other parallel patterns typical of symbolic computation. In this paper we introduce the pattern, we discuss its implementation on modern multi/many core architectures and finally present experimental results obtained with FastFlow and Erlang implementations to assess its feasibility and scalability.
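The pool evolution pattern described above can be sketched as a higher-order function. The Python sketch below is a minimal illustration under stated assumptions, not the FastFlow or Erlang implementation: the parameter names (`select`, `evolve`, `keep`, `done`) and the use of a thread pool for the evolution step are choices made here for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def pool_evolution(population, select, evolve, keep, done, workers=4):
    """Evolve a population until `done(pool)` holds.

    Hypothetical parameters (names chosen for this sketch):
      select(pool)          -> individuals chosen for evolution
      evolve(individual)    -> a new individual; applied in parallel
      keep(pool, offspring) -> the next pool
      done(pool)            -> True when the evolution should stop
    """
    pool = list(population)
    with ThreadPoolExecutor(max_workers=workers) as executor:
        while not done(pool):
            candidates = select(pool)
            offspring = list(executor.map(evolve, candidates))
            pool = keep(pool, offspring)
    return pool


# Toy usage: move integers toward a target value of 10.
TARGET = 10

def distance(x):
    return (x - TARGET) ** 2          # lower is fitter

def step_toward_target(x):
    return x + 1 if x < TARGET else x - 1 if x > TARGET else x

def keep_fittest(pool, offspring):
    return sorted(pool + offspring, key=distance)[:len(pool)]

result = pool_evolution(
    [0, 3, 20],
    select=lambda pool: pool,                  # evolve every individual
    evolve=step_toward_target,
    keep=keep_fittest,
    done=lambda pool: TARGET in pool,
)
```

Only the `evolve` step runs in parallel here; selection and filtering stay sequential, which keeps the sketch easy to reason about.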
Abstract:
A three-dimensional (3D) graphene-Co₃O₄ electrode was prepared by a two-step method in which graphene was initially deposited on a Ni foam, with Co₃O₄ then grown on the resulting graphene structure. Cross-linked Co₃O₄ nanosheets with an open pore structure were fully and vertically distributed throughout the graphene skeleton. The free-standing and binder-free monolithic electrode was used directly as a cathode in a Li-O₂ battery. This composite structure exhibited enhanced performance, with a specific capacity of 2453 mA h g⁻¹ at 0.1 mA cm⁻² and 62 stable cycles with 583 mA h g⁻¹ (1000 mA h g⁻¹ of carbon). The excellent electrochemical performance is associated with the unique architecture and superior catalytic activity of the 3D electrode.
Abstract:
Visual salience is an intriguing phenomenon observed in biological neural systems. Numerous attempts have been made to model visual salience mathematically using various feature contrasts, either locally or globally. However, these algorithmic models tend to ignore the problem's biological solutions, in which visual salience appears to arise during the propagation of visual stimuli along the visual cortex. In this paper, inspired by the conjecture that salience arises from deep propagation along the visual cortex, we present a Deep Salience model, in which a multi-layer model based on successive Markov random fields (sMRF) analyzes the input image successively through its deep belief propagation. As a result, the foreground object can be automatically separated from the background in a fully unsupervised way. Experimental evaluation on the benchmark dataset showed that our Deep Salience model consistently outperforms eleven state-of-the-art salience models, yielding higher rates in the precision-recall tests and attaining the best F-measure and mean-square error in the experiments.
Abstract:
Boolean games are a framework for reasoning about the rational behavior of agents whose goals are formalized using propositional formulas. Compared to normal-form games, a well-studied and related game framework, Boolean games allow for an intuitive and more compact representation of the agents' goals. So far, Boolean games have been mainly studied in the literature from the Knowledge Representation perspective, and less attention has been paid to the algorithmic issues underlying the computation of solution concepts. Although some suggestions for solving specific classes of Boolean games have been made in the literature, there is currently no work available on their practical performance. In this paper, we propose the first technique to solve general Boolean games that does not require an exponential translation to normal-form games. Our method is based on disjunctive answer set programming and computes solutions (equilibria) of arbitrary Boolean games. It can be applied to a wide variety of solution concepts, and can naturally deal with extensions of Boolean games such as constraints and costs. We present detailed experimental results in which we compare the proposed method against a number of existing methods for solving specific classes of Boolean games, as well as adaptations of methods that were initially designed for normal-form games. We found that the heuristic methods that do not require all payoff matrix entries performed well for smaller Boolean games, while our ASP-based technique is faster when the problem instances have a higher number of agents or action variables.
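To make the notion of an equilibrium in a Boolean game concrete, the Python sketch below enumerates pure Nash equilibria by brute force. This is deliberately not the answer-set-programming method of the abstract above; it is a naive, exponential baseline. It assumes the standard setting in which each agent controls a disjoint set of Boolean variables and is satisfied exactly when its goal formula holds. The function names and the encoding of goals as Python callables are illustrative assumptions.

```python
from itertools import product


def can_improve(agent, outcome, control, goals):
    """True if `agent` is unsatisfied but can satisfy its goal by
    changing only the variables it controls."""
    if goals[agent](outcome):
        return False
    own = control[agent]
    for values in product([False, True], repeat=len(own)):
        deviation = {**outcome, **dict(zip(own, values))}
        if goals[agent](deviation):
            return True
    return False


def pure_nash_equilibria(control, goals):
    """Enumerate all assignments and keep those where no agent can improve.

    control -- dict: agent -> list of variable names it controls
    goals   -- dict: agent -> callable(assignment dict) -> bool
    """
    variables = [v for owned in control.values() for v in owned]
    equilibria = []
    for values in product([False, True], repeat=len(variables)):
        outcome = dict(zip(variables, values))
        if not any(can_improve(a, outcome, control, goals) for a in control):
            equilibria.append(outcome)
    return equilibria


control = {"a1": ["p"], "a2": ["q"]}

# Both agents want p and q to hold together.
coordination = pure_nash_equilibria(
    control,
    {"a1": lambda s: s["p"] and s["q"], "a2": lambda s: s["p"] and s["q"]},
)

# One agent wants p and q to agree, the other wants them to differ.
pennies = pure_nash_equilibria(
    control,
    {"a1": lambda s: s["p"] == s["q"], "a2": lambda s: s["p"] != s["q"]},
)
```

In the coordination game both the all-false and the all-true assignments are equilibria, since no single agent can reach its goal alone from all-false; the second game has no pure equilibrium.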
Abstract:
Field programmable gate array (FPGA) technology is a powerful platform for implementing computationally complex digital signal processing (DSP) systems. Applications that are multi-modal, however, are designed for worst-case conditions. In this paper, genetic sequencing techniques are applied to give a more sophisticated decomposition of the algorithmic variations, thus allowing a unified hardware architecture which gives a 10-25% area saving and a 15% power saving for a digital radar receiver.
Abstract:
The development of computing systems is a complex, multi-stage process that requires an in-depth analysis of the problem, taking into account the applicable constraints and requirements. This task involves exploring alternative techniques and computational algorithms to optimize the system and satisfy the stated requirements. In this context, one of the most important stages is the analysis and implementation of computational algorithms. Enormous technological advances in FPGAs (Field-Programmable Gate Arrays) have made it possible to develop extremely complex engineering systems. However, the number of transistors available per chip is growing faster than our ability to develop systems that take advantage of that growth. This well-known limitation was already apparent with ASICs (Application-Specific Integrated Circuits) before it appeared with FPGAs, and it has been increasing continuously. Developing systems based on high-capacity FPGAs involves a wide variety of tools, including methods for the efficient implementation of computational algorithms. This thesis aims to contribute to this area by taking advantage of reuse, a higher level of abstraction, and clearer, more automated algorithmic specifications. More specifically, it presents a study carried out to obtain criteria for the hardware implementation of recursive versus iterative algorithms. After presenting some of the most significant strategies for implementing recursion in hardware, it describes in detail a set of algorithms for solving combinatorial search problems (considered as application examples). Recursive and iterative versions of these algorithms were implemented and tested on FPGAs. Based on the results obtained, a careful comparative analysis is carried out.
New tools and research techniques developed in the scope of this thesis are also discussed and demonstrated.
Abstract:
We consider the minimum-time optimal control problem for single-input control-affine systems in a finite-dimensional space with fixed initial and final conditions, where the scalar control takes values in a closed interval. When the shooting method is applied to this problem, several obstacles may arise because the shooting function is not differentiable when the control is bang-bang. In the bang-bang case, conjugate times are theoretically well defined for this type of control system, but the available direct computational algorithms are difficult to apply. In the smooth case, on the other hand, the theoretical and practical concept of conjugate times is well known, and effective computational tools are available. We propose a regularization procedure for which the solutions of the corresponding minimum-time problem depend on a sufficiently small positive real parameter and are defined by functions that are smooth with respect to the time variable, which facilitates the application of the simple shooting method. We prove, under suitable hypotheses, the strong convergence of the solutions of the regularized problem to the solution of the initial problem as the real parameter tends to zero. The determination of conjugate times of the locally optimal trajectories of the regularized problem falls within the known smooth theory. We prove, under appropriate hypotheses, the convergence of the first conjugate time of the regularized problem to the first conjugate time of the initial bang-bang problem as the real parameter tends to zero. Consequently, we obtain an efficient algorithm for computing conjugate times in the bang-bang case.