213 results for Concurrency
Abstract:
Recent trends in computing systems, such as multi-core processors and cloud computing, expose tens to thousands of processors to the software. Software developers must respond by introducing parallelism in their software. To obtain the highest performance, it is necessary not only to identify parallelism, but also to reason about synchronization between threads and the communication of data from one thread to another. This entry gives an overview of some of the most common abstractions used in parallel programming, namely explicit vs. implicit expression of parallelism, and shared vs. distributed memory. Several parallel programming models are reviewed and categorized by means of these abstractions. The pros and cons of parallel programming models are discussed from the perspective of performance and programmability.
Abstract:
This paper presents a scalable, statistical ‘black-box’ model for predicting the performance of parallel programs on multi-core non-uniform memory access (NUMA) systems. We derive a model with low overhead by reducing data collection and model training time. The model can accurately predict the behaviour of parallel applications in response to changes in their concurrency, thread layout on NUMA nodes, and core voltage and frequency. We present a framework that applies the model to achieve significant energy and energy-delay-square (ED²) savings (9% and 25%, respectively) along with a performance improvement (10% mean) on an actual 16-core NUMA system running realistic application workloads. Our prediction model proves substantially more accurate than previous efforts.
Abstract:
Approximate execution is a viable technique for energy-constrained environments, provided that applications have the mechanisms to produce outputs of the highest possible quality within the given energy budget.
We introduce a framework for energy-constrained execution with controlled and graceful quality loss. A simple programming model allows users to express the relative importance of computations for the quality of the end result, as well as minimum quality requirements. The significance-aware runtime system uses an application-specific analytical energy model to identify the degree of concurrency and approximation that maximizes quality while meeting user-specified energy constraints. Evaluation on a dual-socket 8-core server shows that the proposed framework predicts the optimal configuration with high accuracy, enabling energy-constrained executions that result in significantly higher quality compared to loop perforation, a compiler approximation technique.
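For readers unfamiliar with loop perforation, the baseline mentioned above, the following is a minimal sketch of the general idea: skip a fraction of loop iterations and accept a less precise result in exchange for less work. The function name, the `stride` parameter, and the rescaling step are illustrative assumptions; this is not the framework described in the abstract, nor the compiler transformation it was evaluated against.

```python
def sum_of_squares(values, stride=1):
    """Sum of squares with optional loop perforation (illustrative only).

    stride=1 runs every iteration (exact result); stride=k runs only every
    k-th iteration and rescales the partial sum to estimate the full one.
    """
    if stride < 1:
        raise ValueError("stride must be at least 1")
    total = 0.0
    visited = 0
    for i in range(0, len(values), stride):  # skipped iterations save work
        total += values[i] * values[i]
        visited += 1
    if visited == 0:
        return 0.0
    # Rescale to compensate for the iterations that were skipped.
    return total * len(values) / visited


samples = [float(i % 7) for i in range(10_000)]
exact = sum_of_squares(samples)
approx = sum_of_squares(samples, stride=4)
relative_error = abs(approx - exact) / exact
print(f"exact={exact:.1f} approx={approx:.1f} relative error={relative_error:.4f}")
```

Perforation treats every iteration as equally expendable, which is the contrast with a significance-aware approach in which the programmer marks some computations as more important than others.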
Abstract:
We present a rigorous methodology and new metrics for fair comparison of server and microserver platforms. Deploying our methodology and metrics, we compare a microserver with ARM cores against two servers with x86 cores running the same real-time financial analytics workload. We define workload-specific but platform-independent performance metrics for platform comparison, targeting both datacenter operators and end users. Our methodology establishes that a server based on the Xeon Phi co-processor delivers the highest performance and energy efficiency. However, by scaling out energy-efficient microservers, we achieve competitive or better energy efficiency than a power-equivalent server with two Sandy Bridge sockets, despite the microserver's slower cores. Using a new iso-QoS metric, we find that the ARM microserver scales enough to meet market throughput demand, that is, a 100% QoS in terms of timely option pricing, with as little as 55% of the energy consumed by the Sandy Bridge server.
Abstract:
As the complexity of computing systems grows, reliability and energy are two crucial challenges that call for holistic solutions. In this paper, we investigate the interplay among concurrency, power dissipation, energy consumption and voltage-frequency scaling for a key numerical kernel for the solution of sparse linear systems. Concretely, we leverage a task-parallel implementation of the Conjugate Gradient method, equipped with a state-of-the-art preconditioner embedded in the ILUPACK software, and target a low-power multi-core processor from ARM. In addition, we perform a theoretical analysis of the impact of a technique like Near Threshold Voltage Computing (NTVC) from the points of view of increased hardware concurrency and error rate.
Abstract:
In the reinsurance market, the risks natural catastrophes pose to portfolios of properties must be quantified, so that they can be priced, and insurance offered. The analysis of such risks at a portfolio level requires a simulation of up to 800 000 trials with an average of 1000 catastrophic events per trial. This is sufficient to capture risk for a global multi-peril reinsurance portfolio covering a range of perils including earthquake, hurricane, tornado, hail, severe thunderstorm, wind storm, storm surge and riverine flooding, and wildfire. Such simulations are both computation and data intensive, making the application of high-performance computing techniques desirable.
In this paper, we explore the design and implementation of portfolio risk analysis on both multi-core and many-core computing platforms. Given a portfolio of property catastrophe insurance treaties, key risk measures, such as probable maximum loss, are computed by taking both primary and secondary uncertainties into account. Primary uncertainty is associated with whether or not an event occurs in a simulated year, while secondary uncertainty captures the uncertainty in the level of loss due to the use of simplified physical models and limitations in the available data. A combination of fast lookup structures, multi-threading and careful hand tuning of numerical operations is required to achieve good performance. Experimental results are reported for multi-core processors and systems using NVIDIA graphics processing unit and Intel Phi many-core accelerators.
Abstract:
Doctoral thesis, Informatics (Computer Science), Universidade de Lisboa, Faculdade de Ciências, 2015
Abstract:
Thesis (Master's)--University of Washington, 2015
Abstract:
The prospects of an increasingly global market, driven by intensifying competition, rising competitiveness and the emergence of economies of scale, have challenged countries to build ever stronger and more distinctive national identities, seeking a competitive advantage that enables not only positive performance in trade relations but also greater appreciation of their products in international markets. This work analyses how Portugal has been positioning itself on the international stage in the wine sector. The sector has demonstrated strong strategic potential and, through the creation of the Wines of Portugal brand, Portuguese wines now have mechanisms for expressing their identity in foreign markets, which strengthens their strategic positioning. The notions of “country brand” and “institutional communication” serve as the starting point for analysing the communication plan proposed by ViniPortugal for the Wines of Portugal brand, with the aim of understanding the contributions of public relations to building and expressing the identity of Portuguese wines in an international context.
Abstract:
Current computer systems have evolved from featuring only a single processing unit and limited RAM, in the order of kilobytes or a few megabytes, to including several multicore processors, offering in the order of several tens of concurrent execution contexts, and main memory in the order of several tens to hundreds of gigabytes. This makes it possible to keep all the data of many applications in main memory, which has led to the development of in-memory databases. Compared to disk-backed databases, in-memory databases (IMDBs) are expected to provide better performance by incurring less I/O overhead. In this dissertation, we present a scalability study of two general-purpose IMDBs on multicore systems. The results show that current general-purpose IMDBs do not scale on multicores, due to contention among threads running concurrent transactions. In this work, we explore different directions to overcome the scalability issues of IMDBs on multicores, while enforcing strong isolation semantics. First, we present a solution that requires no modification to either the database systems or the applications, called MacroDB. MacroDB replicates the database among several engines, using a master-slave replication scheme, where update transactions execute on the master, while read-only transactions execute on slaves. This reduces contention, allowing MacroDB to offer scalable performance under read-only workloads, while update-intensive workloads suffer a performance loss when compared to the standalone engine. Second, we delve into the database engine and identify the concurrency control mechanism used by the storage sub-component as a scalability bottleneck. We then propose a new locking scheme that allows the removal of such mechanisms from the storage sub-component. This modification offers a performance improvement under all workloads when compared to the standalone engine, while scalability is limited to read-only workloads.
Next, we address the scalability limitations for update-intensive workloads and propose reducing the locking granularity from the table level to the attribute level. This further improves performance for intensive and moderate update workloads, at a slight cost for read-only workloads. Scalability is limited to read-intensive and read-only workloads. Finally, we investigate the impact applications have on the performance of database systems, by studying how operation order inside transactions influences database performance. We then propose a Read before Write (RbW) interaction pattern, under which transactions perform all read operations before executing write operations. The RbW pattern allowed TPC-C to achieve scalable performance on our modified engine for all workloads. Additionally, the RbW pattern allowed our modified engine to achieve scalable performance on multicores, almost up to the total number of cores, while enforcing strong isolation.
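The Read before Write ordering described above can be illustrated with a small check on a transaction's operation sequence. This is only a sketch: the `(kind, key)` tuple encoding and the `follows_rbw` helper are hypothetical names introduced here, and nothing below reflects how the dissertation's engine or its TPC-C port is actually implemented.

```python
def follows_rbw(ops):
    """Return True if no read occurs after the first write.

    `ops` is a list of (kind, key) tuples, where kind is "read" or "write".
    This encoding is a hypothetical one chosen for illustration.
    """
    seen_write = False
    for kind, _key in ops:
        if kind == "write":
            seen_write = True
        elif kind == "read" and seen_write:
            return False
    return True


# A transfer between two accounts, written two ways.
interleaved = [("read", "a"), ("write", "a"), ("read", "b"), ("write", "b")]
rbw_ordered = [("read", "a"), ("read", "b"), ("write", "a"), ("write", "b")]

print(follows_rbw(interleaved))   # False: "b" is read after "a" was written
print(follows_rbw(rbw_ordered))   # True: both reads precede both writes
```

Reordering is only legitimate when no read depends on a value written earlier in the same transaction, which holds for the transfer above.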
Abstract:
Background: Research indicates a steady increase in marijuana use and that it is concurrent with tobacco use. There is speculation that this concurrency reaches beyond use, such that policies aimed at reducing one may result in the reduction of the other. Purpose: To investigate the association between tobacco control policies and marijuana use among young adult undergraduates. Methods: A stratified sample of Ontario universities resulted in a sample of 4,966 participants. Results: A moderately strong campus tobacco control policy was found to be significantly associated with decreased marijuana use compared to a weak tobacco control policy (OR=0.52, 95% CI: 0.36-0.76). Conclusions: The findings show tobacco control strategies are related to decreased odds of marijuana use among Ontario undergraduates. These findings are important to both policy makers and researchers interested in health strategies pertaining to marijuana and tobacco use and/or how health policies aimed at reducing one risk behaviour can affect another.
Abstract:
With the growing complexity of systems on chip, new challenges keep emerging in their design with respect to formal verification and high-level synthesis. Several efforts around SystemC, regarded as the standard for system-level design, are under way to address these challenges. However, because of SystemC's complex concurrency model, meeting them remains difficult. We therefore believe it is essential to start from a better foundation by using a more effective concurrency model. Accordingly, in this thesis we study a design methodology that offers a better abstraction for modelling parallel components, based on the concept of a transaction. We show how the simple reasoning afforded by transactions makes it easier to apply formal verification, incremental refinement and high-level synthesis. To evaluate the effectiveness of this methodology, we set the goal of optimizing the simulation speed of a transactional model by taking advantage of a multicore machine. We present the parallel modelling and simulation environment that we developed. We study different scheduling strategies with respect to parallelism and synchronization overhead. An experiment on a model of the 802.11a Wi-Fi transmitter achieved a speedup of about 1.8 using two threads. With 8 threads, although the workload of the individual transactions was not large, we achieved a speedup of about 4.6, which is a very promising result.
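To put the two reported speedups on a common scale, one can compute parallel efficiency, conventionally defined as speedup divided by thread count. The definition is a standard convention and an assumption here; the abstract itself reports only the speedups.

```latex
E(p) = \frac{S(p)}{p}, \qquad
E(2) = \frac{1.8}{2} = 0.90, \qquad
E(8) = \frac{4.6}{8} = 0.575
```

Under that definition, the simulator runs at roughly 90% efficiency on two threads and about 58% on eight.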
Abstract:
From a public health risk analysis perspective, exposure estimation is of paramount importance. Among existing approaches to exposure estimation, the use of tools such as dietary questionnaires, toxicokinetic modelling or dose reconstruction, as a complement to biomonitoring, makes it possible to refine estimates and thus better characterize health risks. These tools and approaches were developed and applied to two substances of interest, methylmercury and selenium, because of the well-known toxic effects of methylmercury, the interaction between methylmercury and selenium that potentially reduces these toxic effects, and the existence of common sources through fish consumption. The general objective of this thesis was therefore to produce missing kinetic and comparative data for the validation and interpretation of approaches and tools for assessing exposure to methylmercury and selenium. To this end, the influence of the choice of method for assessing methylmercury exposure was determined by comparing the daily intakes and health risks estimated by different approaches (direct exposure assessment by biomonitoring combined with toxicokinetic modelling, or indirect assessment by dietary questionnaire). Substantial differences between these two approaches were observed: daily methylmercury intakes estimated by questionnaire are on average six times higher than those estimated using biomonitoring and modelling.
These two methods lead to divergent assessments of health risk: with the indirect approach, estimated daily doses of methylmercury exceed Health Canada guidelines for 21 of the 23 volunteers, whereas with the direct approach only 2 of the 23 volunteers are likely to exceed them. These differences could be due, among other things, to recall and social desirability biases when completing the questionnaires. In addition, the study of the distribution of selenium in different biological matrices following non-dietary exposure (shampoo with a high selenium content) aimed, on the one hand, to study the kinetics of selenium from this exposure source and, on the other hand, to assess the contribution of this source to the total body burden. Monitoring of biological concentrations (blood, urine, hair and nails) over an 18-month period in volunteers exposed to a non-dietary source of selenium helped to clarify the mechanisms by which selenium is transferred from the absorption site to the blood (concurrence of regulated and unregulated pathways). This showed that, unlike for methylmercury, using hair as a biomarker can lead to substantial overestimation of the actual selenium body burden when confounding factors such as the use of selenium-containing shampoo are not controlled. Finally, an exhaustive analysis of selenium biomonitoring data from 75 studies published in the literature provided a better understanding of the overall kinetics of selenium in the human body. In particular, it enabled the development of a tool that relates daily intakes to biological concentrations of selenium in the different matrices by means of mathematical algorithms.
Consequently, using these kinetic data expressed as a system of logarithmic equations and their graphical representation, it is possible to estimate an individual's daily intake from various biological samples, and thus to facilitate the comparison of selenium biomonitoring studies that use different biomarkers. Taken together, these research results show that the method chosen to assess exposure has a substantial impact on the associated risk estimates. Moreover, the research showed that non-dietary selenium does not contribute significantly to the total body burden, but is a confounding factor in estimating the actual selenium body burden. Finally, the determination of equations and coefficients relating selenium concentrations across different biological matrices, using a large kinetic database, helps to better interpret biomonitoring results.
Abstract:
A foundational model of concurrency is developed in this thesis. We examine issues in the design of parallel systems and show why the actor model is suitable for exploiting large-scale parallelism. Concurrency in actors is constrained only by the availability of hardware resources and by the logical dependence inherent in the computation. Unlike dataflow and functional programming, however, actors are dynamically reconfigurable and can model shared resources with changing local state. Concurrency is spawned in actors using asynchronous message-passing, pipelining, and the dynamic creation of actors. This thesis deals with some central issues in distributed computing. Specifically, problems of divergence and deadlock are addressed. For example, actors permit dynamic deadlock detection and removal. The problem of divergence is contained because independent transactions can execute concurrently and potentially infinite processes are nevertheless available for interaction.
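The combination of asynchronous message-passing and changing local state mentioned above can be sketched with a thread and a mailbox queue. This is a minimal illustration under stated assumptions: the `CounterActor` class, its message names, and the one-thread-per-actor design are hypothetical choices made here, not the formal model developed in the thesis.

```python
import queue
import threading


class CounterActor:
    """A toy actor: private state, a mailbox, one message handled at a time."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self.count = 0  # local state, mutated only by the actor's own thread
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, message):
        """Asynchronous send: enqueue the message and return immediately."""
        self._mailbox.put(message)

    def _run(self):
        # Process messages strictly in arrival order until told to stop.
        while True:
            message = self._mailbox.get()
            if message == "stop":
                break
            if message == "increment":
                self.count += 1

    def join(self):
        """Wait for the actor to finish processing and shut down."""
        self._thread.join()


actor = CounterActor()
for _ in range(5):
    actor.send("increment")  # the sender never blocks on the receiver
actor.send("stop")
actor.join()
print(actor.count)
```

Because only the actor's own thread touches `count`, no lock is needed around the state; senders interact with it solely through the mailbox.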
Abstract:
Computational models are arising in which programs are constructed by specifying large networks of very simple computational devices. Although such models can potentially make use of a massive amount of concurrency, their usefulness as a programming model for the design of complex systems will ultimately be decided by the ease with which such networks can be programmed (constructed). This thesis outlines a language for specifying computational networks. The language (AFL-1) consists of a set of primitives and a mechanism to group these elements into higher-level structures. An implementation of this language runs on the Thinking Machines Corporation Connection Machine. Two significant examples were programmed in the language: an expert system (CIS) and a planning system (AFPLAN). These systems are explained and analyzed in terms of how they compare with similar systems written in conventional languages.