57 resultados para Parallel algorithms
em Instituto Politécnico do Porto, Portugal
Resumo:
Face à estagnação da tecnologia uniprocessador registada na passada década, aos principais fabricantes de microprocessadores encontraram na tecnologia multi-core a resposta `as crescentes necessidades de processamento do mercado. Durante anos, os desenvolvedores de software viram as suas aplicações acompanhar os ganhos de performance conferidos por cada nova geração de processadores sequenciais, mas `a medida que a capacidade de processamento escala em função do número de processadores, a computação sequencial tem de ser decomposta em várias partes concorrentes que possam executar em paralelo, para que possam utilizar as unidades de processamento adicionais e completar mais rapidamente. A programação paralela implica um paradigma completamente distinto da programação sequencial. Ao contrário dos computadores sequenciais tipificados no modelo de Von Neumann, a heterogeneidade de arquiteturas paralelas requer modelos de programação paralela que abstraiam os programadores dos detalhes da arquitectura e simplifiquem o desenvolvimento de aplicações concorrentes. Os modelos de programação paralela mais populares incitam os programadores a identificar instruções concorrentes na sua lógica de programação, e a especificá-las sob a forma de tarefas que possam ser atribuídas a processadores distintos para executarem em simultâneo. Estas tarefas são tipicamente lançadas durante a execução, e atribuídas aos processadores pelo motor de execução subjacente. Como os requisitos de processamento costumam ser variáveis, e não são conhecidos a priori, o mapeamento de tarefas para processadores tem de ser determinado dinamicamente, em resposta a alterações imprevisíveis dos requisitos de execução. `A medida que o volume da computação cresce, torna-se cada vez menos viável garantir as suas restrições temporais em plataformas uniprocessador. Enquanto os sistemas de tempo real se começam a adaptar ao paradigma de computação paralela, há uma crescente aposta em integrar execuções de tempo real com aplicações interativas no mesmo hardware, num mundo em que a tecnologia se torna cada vez mais pequena, leve, ubíqua, e portável. Esta integração requer soluções de escalonamento que simultaneamente garantam os requisitos temporais das tarefas de tempo real e mantenham um nível aceitável de QoS para as restantes execuções. Para tal, torna-se imperativo que as aplicações de tempo real paralelizem, de forma a minimizar os seus tempos de resposta e maximizar a utilização dos recursos de processamento. Isto introduz uma nova dimensão ao problema do escalonamento, que tem de responder de forma correcta a novos requisitos de execução imprevisíveis e rapidamente conjeturar o mapeamento de tarefas que melhor beneficie os critérios de performance do sistema. A técnica de escalonamento baseado em servidores permite reservar uma fração da capacidade de processamento para a execução de tarefas de tempo real, e assegurar que os efeitos de latência na sua execução não afectam as reservas estipuladas para outras execuções. No caso de tarefas escalonadas pelo tempo de execução máximo, ou tarefas com tempos de execução variáveis, torna-se provável que a largura de banda estipulada não seja consumida por completo. Para melhorar a utilização do sistema, os algoritmos de partilha de largura de banda (capacity-sharing) doam a capacidade não utilizada para a execução de outras tarefas, mantendo as garantias de isolamento entre servidores. Com eficiência comprovada em termos de espaço, tempo, e comunicação, o mecanismo de work-stealing tem vindo a ganhar popularidade como metodologia para o escalonamento de tarefas com paralelismo dinâmico e irregular. O algoritmo p-CSWS combina escalonamento baseado em servidores com capacity-sharing e work-stealing para cobrir as necessidades de escalonamento dos sistemas abertos de tempo real. Enquanto o escalonamento em servidores permite partilhar os recursos de processamento sem interferências a nível dos atrasos, uma nova política de work-stealing que opera sobre o mecanismo de capacity-sharing aplica uma exploração de paralelismo que melhora os tempos de resposta das aplicações e melhora a utilização do sistema. Esta tese propõe uma implementação do algoritmo p-CSWS para o Linux. Em concordância com a estrutura modular do escalonador do Linux, ´e definida uma nova classe de escalonamento que visa avaliar a aplicabilidade da heurística p-CSWS em circunstâncias reais. Ultrapassados os obstáculos intrínsecos `a programação da kernel do Linux, os extensos testes experimentais provam que o p-CSWS ´e mais do que um conceito teórico atrativo, e que a exploração heurística de paralelismo proposta pelo algoritmo beneficia os tempos de resposta das aplicações de tempo real, bem como a performance e eficiência da plataforma multiprocessador.
Resumo:
Multicore platforms have transformed parallelism into a main concern. Parallel programming models are being put forward to provide a better approach for application programmers to expose the opportunities for parallelism by pointing out potentially parallel regions within tasks, leaving the actual and dynamic scheduling of these regions onto processors to be performed at runtime, exploiting the maximum amount of parallelism. It is in this context that this paper proposes a scheduling approach that combines the constant-bandwidth server abstraction with a priority-aware work-stealing load balancing scheme which, while ensuring isolation among tasks, enables parallel tasks to be executed on more than one processor at a given time instant.
Resumo:
The recent technological advancements and market trends are causing an interesting phenomenon towards the convergence of High-Performance Computing (HPC) and Embedded Computing (EC) domains. On one side, new kinds of HPC applications are being required by markets needing huge amounts of information to be processed within a bounded amount of time. On the other side, EC systems are increasingly concerned with providing higher performance in real-time, challenging the performance capabilities of current architectures. The advent of next-generation many-core embedded platforms has the chance of intercepting this converging need for predictable high-performance, allowing HPC and EC applications to be executed on efficient and powerful heterogeneous architectures integrating general-purpose processors with many-core computing fabrics. To this end, it is of paramount importance to develop new techniques for exploiting the massively parallel computation capabilities of such platforms in a predictable way. P-SOCRATES will tackle this important challenge by merging leading research groups from the HPC and EC communities. The time-criticality and parallelisation challenges common to both areas will be addressed by proposing an integrated framework for executing workload-intensive applications with real-time requirements on top of next-generation commercial-off-the-shelf (COTS) platforms based on many-core accelerated architectures. The project will investigate new HPC techniques that fulfil real-time requirements. The main sources of indeterminism will be identified, proposing efficient mapping and scheduling algorithms, along with the associated timing and schedulability analysis, to guarantee the real-time and performance requirements of the applications.
Resumo:
In the last years there has been a huge growth and consolidation of the Data Mining field. Some efforts are being done that seek the establishment of standards in the area. Included on these efforts there can be enumerated SEMMA and CRISP-DM. Both grow as industrial standards and define a set of sequential steps that pretends to guide the implementation of data mining applications. The question of the existence of substantial differences between them and the traditional KDD process arose. In this paper, is pretended to establish a parallel between these and the KDD process as well as an understanding of the similarities between them.
Resumo:
In the last years there has been a huge growth and consolidation of the Data Mining field. Some efforts are being done that seek the establishment of standards in the area. Included on these efforts there can be enumerated SEMMA and CRISP-DM. Both grow as industrial standards and define a set of sequential steps that pretends to guide the implementation of data mining applications. The question of the existence of substantial differences between them and the traditional KDD process arose. In this paper, is pretended to establish a parallel between these and the KDD process as well as an understanding of the similarities between them.
Resumo:
Introduction: Image resizing is a normal feature incorporated into the Nuclear Medicine digital imaging. Upsampling is done by manufacturers to adequately fit more the acquired images on the display screen and it is applied when there is a need to increase - or decrease - the total number of pixels. This paper pretends to compare the “hqnx” and the “nxSaI” magnification algorithms with two interpolation algorithms – “nearest neighbor” and “bicubic interpolation” – in the image upsampling operations. Material and Methods: Three distinct Nuclear Medicine images were enlarged 2 and 4 times with the different digital image resizing algorithms (nearest neighbor, bicubic interpolation nxSaI and hqnx). To evaluate the pixel’s changes between the different output images, 3D whole image plot profiles and surface plots were used as an addition to the visual approach in the 4x upsampled images. Results: In the 2x enlarged images the visual differences were not so noteworthy. Although, it was clearly noticed that bicubic interpolation presented the best results. In the 4x enlarged images the differences were significant, with the bicubic interpolated images presenting the best results. Hqnx resized images presented better quality than 4xSaI and nearest neighbor interpolated images, however, its intense “halo effect” affects greatly the definition and boundaries of the image contents. Conclusion: The hqnx and the nxSaI algorithms were designed for images with clear edges and so its use in Nuclear Medicine images is obviously inadequate. Bicubic interpolation seems, from the algorithms studied, the most suitable and its each day wider applications seem to show it, being assumed as a multi-image type efficient algorithm.
Resumo:
Introduction: A major focus of data mining process - especially machine learning researches - is to automatically learn to recognize complex patterns and help to take the adequate decisions strictly based on the acquired data. Since imaging techniques like MPI – Myocardial Perfusion Imaging on Nuclear Cardiology, can implicate a huge part of the daily workflow and generate gigabytes of data, there could be advantages on Computerized Analysis of data over Human Analysis: shorter time, homogeneity and consistency, automatic recording of analysis results, relatively inexpensive, etc.Objectives: The aim of this study relates with the evaluation of the efficacy of this methodology on the evaluation of MPI Stress studies and the process of decision taking concerning the continuation – or not – of the evaluation of each patient. It has been pursued has an objective to automatically classify a patient test in one of three groups: “Positive”, “Negative” and “Indeterminate”. “Positive” would directly follow to the Rest test part of the exam, the “Negative” would be directly exempted from continuation and only the “Indeterminate” group would deserve the clinician analysis, so allowing economy of clinician’s effort, increasing workflow fluidity at the technologist’s level and probably sparing time to patients. Methods: WEKA v3.6.2 open source software was used to make a comparative analysis of three WEKA algorithms (“OneR”, “J48” and “Naïve Bayes”) - on a retrospective study using the comparison with correspondent clinical results as reference, signed by nuclear cardiologist experts - on “SPECT Heart Dataset”, available on University of California – Irvine, at the Machine Learning Repository. For evaluation purposes, criteria as “Precision”, “Incorrectly Classified Instances” and “Receiver Operating Characteristics (ROC) Areas” were considered. Results: The interpretation of the data suggests that the Naïve Bayes algorithm has the best performance among the three previously selected algorithms. Conclusions: It is believed - and apparently supported by the findings - that machine learning algorithms could significantly assist, at an intermediary level, on the analysis of scintigraphic data obtained on MPI, namely after Stress acquisition, so eventually increasing efficiency of the entire system and potentially easing both roles of Technologists and Nuclear Cardiologists. In the actual continuation of this study, it is planned to use more patient information and significantly increase the population under study, in order to allow improving system accuracy.
Resumo:
The paper formulates a genetic algorithm that evolves two types of objects in a plane. The fitness function promotes a relationship between the objects that is optimal when some kind of interface between them occurs. Furthermore, the algorithm adopts an hexagonal tessellation of the two-dimensional space for promoting an efficient method of the neighbour modelling. The genetic algorithm produces special patterns with resemblances to those revealed in percolation phenomena or in the symbiosis found in lichens. Besides the analysis of the spacial layout, a modelling of the time evolution is performed by adopting a distance measure and the modelling in the Fourier domain in the perspective of fractional calculus. The results reveal a consistent, and easy to interpret, set of model parameters for distinct operating conditions.
Resumo:
To avoid additional hardware deployment, indoor localization systems have to be designed in such a way that they rely on existing infrastructure only. Besides the processing of measurements between nodes, localization procedure can include the information of all available environment information. In order to enhance the performance of Wi-Fi based localization systems, the innovative solution presented in this paper considers also the negative information. An indoor tracking method inspired by Kalman filtering is also proposed.
Resumo:
This paper proposes a global multiprocessor scheduling algorithm for the Linux kernel that combines the global EDF scheduler with a priority-aware work-stealing load balancing scheme, enabling parallel real-time tasks to be executed on more than one processor at a given time instant. We state that some priority inversion may actually be acceptable, provided it helps reduce contention, communication, synchronisation and coordination between parallel threads, while still guaranteeing the expected system’s predictability. Experimental results demonstrate the low scheduling overhead of the proposed approach comparatively to an existing real-time deadline-oriented scheduling class for the Linux kernel.
Resumo:
Dynamic parallel scheduling using work-stealing has gained popularity in academia and industry for its good performance, ease of implementation and theoretical bounds on space and time. Cores treat their own double-ended queues (deques) as a stack, pushing and popping threads from the bottom, but treat the deque of another randomly selected busy core as a queue, stealing threads only from the top, whenever they are idle. However, this standard approach cannot be directly applied to real-time systems, where the importance of parallelising tasks is increasing due to the limitations of multiprocessor scheduling theory regarding parallelism. Using one deque per core is obviously a source of priority inversion since high priority tasks may eventually be enqueued after lower priority tasks, possibly leading to deadline misses as in this case the lower priority tasks are the candidates when a stealing operation occurs. Our proposal is to replace the single non-priority deque of work-stealing with ordered per-processor priority deques of ready threads. The scheduling algorithm starts with a single deque per-core, but unlike traditional work-stealing, the total number of deques in the system may now exceed the number of processors. Instead of stealing randomly, cores steal from the highest priority deque.
Resumo:
Real-time embedded applications require to process large amounts of data within small time windows. Parallelize and distribute workloads adaptively is suitable solution for computational demanding applications. The purpose of the Parallel Real-Time Framework for distributed adaptive embedded systems is to guarantee local and distributed processing of real-time applications. This work identifies some promising research directions for parallel/distributed real-time embedded applications.
Resumo:
Consider the problem of assigning real-time tasks on a heterogeneous multiprocessor platform comprising two different types of processors — such a platform is referred to as two-type platform. We present two linearithmic timecomplexity algorithms, SA and SA-P, each providing the follow- ing guarantee. For a given two-type platform and a given task set, if there exists a feasible task-to-processor-type assignment such that tasks can be scheduled to meet deadlines by allowing them to migrate only between processors of the same type, then (i) using SA, it is guaranteed to find such a feasible task-to- processor-type assignment where the same restriction on task migration applies but given a platform in which processors are 1+α/2 times faster and (ii) SA-P succeeds in finding 2 a feasible task-to-processor assignment where tasks are not allowed to migrate between processors but given a platform in which processors are 1+α/times faster, where 0<α≤1. The parameter α is a property of the task set — it is the maximum utilization of any task which is less than or equal to 1.