936 resultados para Thread safe parallel run-time
Resumo:
The recent technological advancements and market trends are causing an interesting phenomenon towards the convergence of High-Performance Computing (HPC) and Embedded Computing (EC) domains. On one side, new kinds of HPC applications are being required by markets needing huge amounts of information to be processed within a bounded amount of time. On the other side, EC systems are increasingly concerned with providing higher performance in real-time, challenging the performance capabilities of current architectures. The advent of next-generation many-core embedded platforms has the chance of intercepting this converging need for predictable high-performance, allowing HPC and EC applications to be executed on efficient and powerful heterogeneous architectures integrating general-purpose processors with many-core computing fabrics. To this end, it is of paramount importance to develop new techniques for exploiting the massively parallel computation capabilities of such platforms in a predictable way. P-SOCRATES will tackle this important challenge by merging leading research groups from the HPC and EC communities. The time-criticality and parallelisation challenges common to both areas will be addressed by proposing an integrated framework for executing workload-intensive applications with real-time requirements on top of next-generation commercial-off-the-shelf (COTS) platforms based on many-core accelerated architectures. The project will investigate new HPC techniques that fulfil real-time requirements. The main sources of indeterminism will be identified, proposing efficient mapping and scheduling algorithms, along with the associated timing and schedulability analysis, to guarantee the real-time and performance requirements of the applications.
Resumo:
Face à estagnação da tecnologia uniprocessador registada na passada década, aos principais fabricantes de microprocessadores encontraram na tecnologia multi-core a resposta `as crescentes necessidades de processamento do mercado. Durante anos, os desenvolvedores de software viram as suas aplicações acompanhar os ganhos de performance conferidos por cada nova geração de processadores sequenciais, mas `a medida que a capacidade de processamento escala em função do número de processadores, a computação sequencial tem de ser decomposta em várias partes concorrentes que possam executar em paralelo, para que possam utilizar as unidades de processamento adicionais e completar mais rapidamente. A programação paralela implica um paradigma completamente distinto da programação sequencial. Ao contrário dos computadores sequenciais tipificados no modelo de Von Neumann, a heterogeneidade de arquiteturas paralelas requer modelos de programação paralela que abstraiam os programadores dos detalhes da arquitectura e simplifiquem o desenvolvimento de aplicações concorrentes. Os modelos de programação paralela mais populares incitam os programadores a identificar instruções concorrentes na sua lógica de programação, e a especificá-las sob a forma de tarefas que possam ser atribuídas a processadores distintos para executarem em simultâneo. Estas tarefas são tipicamente lançadas durante a execução, e atribuídas aos processadores pelo motor de execução subjacente. Como os requisitos de processamento costumam ser variáveis, e não são conhecidos a priori, o mapeamento de tarefas para processadores tem de ser determinado dinamicamente, em resposta a alterações imprevisíveis dos requisitos de execução. `A medida que o volume da computação cresce, torna-se cada vez menos viável garantir as suas restrições temporais em plataformas uniprocessador. Enquanto os sistemas de tempo real se começam a adaptar ao paradigma de computação paralela, há uma crescente aposta em integrar execuções de tempo real com aplicações interativas no mesmo hardware, num mundo em que a tecnologia se torna cada vez mais pequena, leve, ubíqua, e portável. Esta integração requer soluções de escalonamento que simultaneamente garantam os requisitos temporais das tarefas de tempo real e mantenham um nível aceitável de QoS para as restantes execuções. Para tal, torna-se imperativo que as aplicações de tempo real paralelizem, de forma a minimizar os seus tempos de resposta e maximizar a utilização dos recursos de processamento. Isto introduz uma nova dimensão ao problema do escalonamento, que tem de responder de forma correcta a novos requisitos de execução imprevisíveis e rapidamente conjeturar o mapeamento de tarefas que melhor beneficie os critérios de performance do sistema. A técnica de escalonamento baseado em servidores permite reservar uma fração da capacidade de processamento para a execução de tarefas de tempo real, e assegurar que os efeitos de latência na sua execução não afectam as reservas estipuladas para outras execuções. No caso de tarefas escalonadas pelo tempo de execução máximo, ou tarefas com tempos de execução variáveis, torna-se provável que a largura de banda estipulada não seja consumida por completo. Para melhorar a utilização do sistema, os algoritmos de partilha de largura de banda (capacity-sharing) doam a capacidade não utilizada para a execução de outras tarefas, mantendo as garantias de isolamento entre servidores. Com eficiência comprovada em termos de espaço, tempo, e comunicação, o mecanismo de work-stealing tem vindo a ganhar popularidade como metodologia para o escalonamento de tarefas com paralelismo dinâmico e irregular. O algoritmo p-CSWS combina escalonamento baseado em servidores com capacity-sharing e work-stealing para cobrir as necessidades de escalonamento dos sistemas abertos de tempo real. Enquanto o escalonamento em servidores permite partilhar os recursos de processamento sem interferências a nível dos atrasos, uma nova política de work-stealing que opera sobre o mecanismo de capacity-sharing aplica uma exploração de paralelismo que melhora os tempos de resposta das aplicações e melhora a utilização do sistema. Esta tese propõe uma implementação do algoritmo p-CSWS para o Linux. Em concordância com a estrutura modular do escalonador do Linux, ´e definida uma nova classe de escalonamento que visa avaliar a aplicabilidade da heurística p-CSWS em circunstâncias reais. Ultrapassados os obstáculos intrínsecos `a programação da kernel do Linux, os extensos testes experimentais provam que o p-CSWS ´e mais do que um conceito teórico atrativo, e que a exploração heurística de paralelismo proposta pelo algoritmo beneficia os tempos de resposta das aplicações de tempo real, bem como a performance e eficiência da plataforma multiprocessador.
Resumo:
Glazing is a technique used to retard fish deterioration during storage. This work focuses on the study of distinct variables (fish temperature, coating temperature, dipping time) that affect the thickness of edible coatings (water glazing and 1.5% chitosan) applied on frozen fish. Samples of frozen Atlantic salmon (Salmo salar) at -15, -20, and -25 °C were either glazed with water at 0.5, 1.5 or 2.5 °C or coated with 1.5% chitosan solution at 2.5, 5 or 8 °C, by dipping during 10 to 60 s. For both water and chitosan coatings, lowering the salmon and coating solution temperatures resulted in an increase of coating thickness. At the same conditions, higher thickness values were obtained when using chitosan (max. thickness of 1.41±0.05 mm) compared to water (max. thickness of 0.84±0.03 mm). Freezing temperature and crystallization heat were found to be lower for 1.5% chitosan solution than for water, thus favoring phase change. Salmon temperature profiles allowed determining, for different dipping conditions, whether the salmon temperature was within food safety standards to prevent the growth of pathogenic microorganisms. The concept of safe dipping time is proposed to define how long a frozen product can be dipped into a solution without the temperature raising to a point where it can constitute a hazard.
Resumo:
Variations in different types of genomes have been found to be responsible for a large degree of physical diversity such as appearance and susceptibility to disease. Identification of genomic variations is difficult and can be facilitated through computational analysis of DNA sequences. Newly available technologies are able to sequence billions of DNA base pairs relatively quickly. These sequences can be used to identify variations within their specific genome but must be mapped to a reference sequence first. In order to align these sequences to a reference sequence, we require mapping algorithms that make use of approximate string matching and string indexing methods. To date, few mapping algorithms have been tailored to handle the massive amounts of output generated by newly available sequencing technologies. In otrder to handle this large amount of data, we modified the popular mapping software BWA to run in parallel using OpenMPI. Parallel BWA matches the efficiency of multithreaded BWA functions while providing efficient parallelism for BWA functions that do not currently support multithreading. Parallel BWA shows significant wall time speedup in comparison to multithreaded BWA on high-performance computing clusters, and will thus facilitate the analysis of genome sequencing data.
Resumo:
We propose methods for testing hypotheses of non-causality at various horizons, as defined in Dufour and Renault (1998, Econometrica). We study in detail the case of VAR models and we propose linear methods based on running vector autoregressions at different horizons. While the hypotheses considered are nonlinear, the proposed methods only require linear regression techniques as well as standard Gaussian asymptotic distributional theory. Bootstrap procedures are also considered. For the case of integrated processes, we propose extended regression methods that avoid nonstandard asymptotics. The methods are applied to a VAR model of the U.S. economy.
Resumo:
A key capability of data-race detectors is to determine whether one thread executes logically in parallel with another or whether the threads must operate in series. This paper provides two algorithms, one serial and one parallel, to maintain series-parallel (SP) relationships "on the fly" for fork-join multithreaded programs. The serial SP-order algorithm runs in O(1) amortized time per operation. In contrast, the previously best algorithm requires a time per operation that is proportional to Tarjan’s functional inverse of Ackermann’s function. SP-order employs an order-maintenance data structure that allows us to implement a more efficient "English-Hebrew" labeling scheme than was used in earlier race detectors, which immediately yields an improved determinacy-race detector. In particular, any fork-join program running in T₁ time on a single processor can be checked on the fly for determinacy races in O(T₁) time. Corresponding improved bounds can also be obtained for more sophisticated data-race detectors, for example, those that use locks. By combining SP-order with Feng and Leiserson’s serial SP-bags algorithm, we obtain a parallel SP-maintenance algorithm, called SP-hybrid. Suppose that a fork-join program has n threads, T₁ work, and a critical-path length of T[subscript â]. When executed on P processors, we prove that SP-hybrid runs in O((T₁/P + PT[subscript â]) lg n) expected time. To understand this bound, consider that the original program obtains linear speed-up over a 1-processor execution when P = O(T₁/T[subscript â]). In contrast, SP-hybrid obtains linear speed-up when P = O(√T₁/T[subscript â]), but the work is increased by a factor of O(lg n).
Resumo:
The real-time parallel computation of histograms using an array of pipelined cells is proposed and prototyped in this paper with application to consumer imaging products. The array operates in two modes: histogram computation and histogram reading. The proposed parallel computation method does not use any memory blocks. The resulting histogram bins can be stored into an external memory block in a pipelined fashion for subsequent reading or streaming of the results. The array of cells can be tuned to accommodate the required data path width in a VLSI image processing engine as present in many imaging consumer devices. Synthesis of the architectures presented in this paper in FPGA are shown to compute the real-time histogram of images streamed at over 36 megapixels at 30 frames/s by processing in parallel 1, 2 or 4 pixels per clock cycle.
Resumo:
We have optimised the atmospheric radiation algorithm of the FAMOUS climate model on several hardware platforms. The optimisation involved translating the Fortran code to C and restructuring the algorithm around the computation of a single air column. Instead of the existing MPI-based domain decomposition, we used a task queue and a thread pool to schedule the computation of individual columns on the available processors. Finally, four air columns are packed together in a single data structure and computed simultaneously using Single Instruction Multiple Data operations. The modified algorithm runs more than 50 times faster on the CELL’s Synergistic Processing Elements than on its main PowerPC processing element. On Intel-compatible processors, the new radiation code runs 4 times faster. On the tested graphics processor, using OpenCL, we find a speed-up of more than 2.5 times as compared to the original code on the main CPU. Because the radiation code takes more than 60% of the total CPU time, FAMOUS executes more than twice as fast. Our version of the algorithm returns bit-wise identical results, which demonstrates the robustness of our approach. We estimate that this project required around two and a half man-years of work.
Resumo:
It is well known that cointegration between the level of two variables (e.g. prices and dividends) is a necessary condition to assess the empirical validity of a present-value model (PVM) linking them. The work on cointegration,namelyon long-run co-movements, has been so prevalent that it is often over-looked that another necessary condition for the PVM to hold is that the forecast error entailed by the model is orthogonal to the past. This amounts to investigate whether short-run co-movememts steming from common cyclical feature restrictions are also present in such a system. In this paper we test for the presence of such co-movement on long- and short-term interest rates and on price and dividend for the U.S. economy. We focuss on the potential improvement in forecasting accuracies when imposing those two types of restrictions coming from economic theory.
Resumo:
This paper has two original contributions. First, we show that the present value model (PVM hereafter), which has a wide application in macroeconomics and fi nance, entails common cyclical feature restrictions in the dynamics of the vector error-correction representation (Vahid and Engle, 1993); something that has been already investigated in that VECM context by Johansen and Swensen (1999, 2011) but has not been discussed before with this new emphasis. We also provide the present value reduced rank constraints to be tested within the log-linear model. Our second contribution relates to forecasting time series that are subject to those long and short-run reduced rank restrictions. The reason why appropriate common cyclical feature restrictions might improve forecasting is because it finds natural exclusion restrictions preventing the estimation of useless parameters, which would otherwise contribute to the increase of forecast variance with no expected reduction in bias. We applied the techniques discussed in this paper to data known to be subject to present value restrictions, i.e. the online series maintained and up-dated by Shiller. We focus on three different data sets. The fi rst includes the levels of interest rates with long and short maturities, the second includes the level of real price and dividend for the S&P composite index, and the third includes the logarithmic transformation of prices and dividends. Our exhaustive investigation of several different multivariate models reveals that better forecasts can be achieved when restrictions are applied to them. Moreover, imposing short-run restrictions produce forecast winners 70% of the time for target variables of PVMs and 63.33% of the time when all variables in the system are considered.
Resumo:
Using a sequence of nested multivariate models that are VAR-based, we discuss different layers of restrictions imposed by present-value models (PVM hereafter) on the VAR in levels for series that are subject to present-value restrictions. Our focus is novel - we are interested in the short-run restrictions entailed by PVMs (Vahid and Engle, 1993, 1997) and their implications for forecasting. Using a well-known database, kept by Robert Shiller, we implement a forecasting competition that imposes different layers of PVM restrictions. Our exhaustive investigation of several different multivariate models reveals that better forecasts can be achieved when restrictions are applied to the unrestricted VAR. Moreover, imposing short-run restrictions produces forecast winners 70% of the time for the target variables of PVMs and 63.33% of the time when all variables in the system are considered.
Resumo:
In order to improve the diagnosis of enzootic pneumonia (EP) in pigs two real-time polymerase chain reaction (rtPCR) assays for the detection of Mycoplasma hyopneumoniae in bronchial swabs from lung necropsies were established and validated in parallel. As a gold standard, the current "mosaic diagnosis" was taken, including epidemiological tracing, clinical signs, macro- and histopathological lesions of the lungs and immunofluorescence. One rtPCR is targeting a repeated DNA element of the M. hyopneumoniae genome (REP assay), the other a putative ABC transporter gene (ABC assay). Both assays were shown to be specific for M. hyopneumoniae and did not cross react with other bacteria and mollicutes from pig. With material from pigs of defined EP-negative farms the two assays showed to be 100% specific. When testing lungs from pig farms with EP, the REP assay detected 50% and the ABC assay 90% of the farms as positive. Both tests together detected all positive farms. Within a positive herd the two assays tested similarly with on average over 90% of the lung samples analysed from a single farm showing positive scores. A series of samples with suspicion of EP and samples from pigs with diseases other than respiratory taken from current routine diagnostic was assayed. None of the assays showed false positive results. The sensitivities in this sample group were 50% for the REP and 70% for the ABC assays and for both assays together 85%. The two assays run in parallel are therefore a valuable tool for the improvement of the current diagnosis of EP.
Resumo:
We investigate parallel algorithms for the solution of the Navier–Stokes equations in space-time. For periodic solutions, the discretized problem can be written as a large non-linear system of equations. This system of equations is solved by a Newton iteration. The Newton correction is computed using a preconditioned GMRES solver. The parallel performance of the algorithm is illustrated.