459 resultados para Parallelism
Resumo:
The number of comparative studies in the field of political communication increased considerably after Daniel Hallin and Paolo Mancini's publication of comparing Media systems. In this book, four dimensions are used to distinguish between the media environments in western countries around the year 2000: press market development, parallelism between parties and media outlets, state intervention in the realm of media, and levels of journalist professionalization. The authors conclude that in western Europe and North America three types of media systems coexisted: a polarized pluralist model (in southern Europe), a democratic corporatist model (in scandinavia and some western European countries), and a liberal model (Canada, USA, Ireland, and the UK). Within this framework, both Portugal and Spain are described as polarized pluralist media systems, given their weak press markets and low patterns of journalistic professionalization, as well as strong state intervention in the realm of media and parallelism between media outlets and political parties.
Resumo:
The time to process each of the W/B processing blocks of a median calculation method on a set of N W-bit integers is improved here by a factor of three compared with literature. The parallelism uncovered in blocks containing B-bit slices is exploited by independent accumulative parallel counters so that the median is calculated faster than any known previous method for any N, W values. The improvements to the method are discussed in the context of calculating the median for a moving set of N integers, for which a pipelined architecture is developed. An extra benefit of a smaller area for the architecture is also reported.
Resumo:
Dissertação apresentada para obtenção do grau de Mestre em Educação Matemática na Educação Pré-Escolar e nos 1.º e 2.º Ciclos do Ensino Básico
Resumo:
Trabalho Final de Mestrado para obtenção do grau de Mestre em Engenharia de Electrónica e Telecomunicações
Resumo:
High-level parallel languages offer a simple way for application programmers to specify parallelism in a form that easily scales with problem size, leaving the scheduling of the tasks onto processors to be performed at runtime. Therefore, if the underlying system cannot efficiently execute those applications on the available cores, the benefits will be lost. In this paper, we consider how to schedule highly heterogenous parallel applications that require real-time performance guarantees on multicore processors. The paper proposes a novel scheduling approach that combines the global Earliest Deadline First (EDF) scheduler with a priority-aware work-stealing load balancing scheme, which enables parallel realtime tasks to be executed on more than one processor at a given time instant. Experimental results demonstrate the better scalability and lower scheduling overhead of the proposed approach comparatively to an existing real-time deadline-oriented scheduling class for the Linux kernel.
Resumo:
Over the last three decades, computer architects have been able to achieve an increase in performance for single processors by, e.g., increasing clock speed, introducing cache memories and using instruction level parallelism. However, because of power consumption and heat dissipation constraints, this trend is going to cease. In recent times, hardware engineers have instead moved to new chip architectures with multiple processor cores on a single chip. With multi-core processors, applications can complete more total work than with one core alone. To take advantage of multi-core processors, parallel programming models are proposed as promising solutions for more effectively using multi-core processors. This paper discusses some of the existent models and frameworks for parallel programming, leading to outline a draft parallel programming model for Ada.
Resumo:
Composition is a practice of key importance in software engineering. When real-time applications are composed, it is necessary that their timing properties (such as meeting the deadlines) are guaranteed. The composition is performed by establishing an interface between the application and the physical platform. Such an interface typically contains information about the amount of computing capacity needed by the application. For multiprocessor platforms, the interface should also present information about the degree of parallelism. Several interface proposals have recently been put forward in various research works. However, those interfaces are either too complex to be handled or too pessimistic. In this paper we propose the generalized multiprocessor periodic resource model (GMPR) that is strictly superior to the MPR model without requiring a too detailed description. We then derive a method to compute the interface from the application specification. This method has been implemented in Matlab routines that are publicly available.
Resumo:
As plataformas com múltiplos núcleos tornaram a programação paralela/concorrente num tópico de interesse geral. Diversos modelos de programação têm vindo a ser propostos, facilitando aos programadores a identificação de regiões de código potencialmente paralelizáveis, deixando ao sistema operativo a tarefa de as escalonar dinamicamente em tempo de execução, explorando o maior grau possível de paralelismo. O Java não foge a esta tendência, disponibilizando ao programador um número crescente de bibliotecas de mecanismos de sincronização e paralelização de código. Neste contexto, esta tese apresenta e discute um conjunto de resultados obtidos através de testes intensivos à eficiência de algoritmos de ordenação implementados com recurso aos mecanismos de concorrência da API do Java 8 (Threads, Threadpools, ExecutorService, CountdownLach, ExecutorCompletionService e ForkJoinPools) em sistemas com um número de núcleos variável. Para cada um dos mecanismos, são apresentadas conclusões sobre o seu funcionamento e discutidos os cenários em que o seu uso pode ser rentabilizado de modo a serem obtidos melhores tempos de execução.
Resumo:
ABSTRACT - The Portuguese National Health Service (SNS), a universal, centralized and public owned health care system, exhibits an extraordinary record of equalization in the access to health care and health gains in the late thirty years. However, the most recent history of the Portuguese health reform is pervaded by the influence of decentralization and privatization. Decentralization has been present in the system design since the 1976 Constitution, at least in theory. Private ownership of health care suppliers and out-ofpocket expenditures, on the financing side, both have a long tradition of relevance in the NHS mix of services. The initial aim of this study was to demonstrate expected parallelism between health reforms and public administration reforms, where a common pattern of joint decentralization and privatization was observed in many countries. Observers would be tempted to consider these two movements as common signs of new public management (NPM) developments. They have common objectives, are established around the core concepts of gains in effectiveness, efficiency, equity and quality of public services, through improved accountability. However, in practice, in Portugal, each movement was developed in a totally separated way. Besides those rooted in the NPM theory, there are few visible signs of association between decentralization and privatization. Decentralization, in the Portuguese SNS, was never intended to be followed by a privatization movement; it was seen merely as a public administration tool. Private management of health services, as stated in the most recent SNS legislation, was never intended to have decentralization as a condition or as a consequence. Paradoxically, in the Portuguese context, it has led invariably to centralized control. While presented as separate instruments for a common purpose, the association between decentralization and privatization still lacks a convincing demonstration. Many common health care management stereotypes remain to be checked out if we want to look for eventual associations between these two organizational tools.
Resumo:
In this paper we investigate various algorithms for performing Fast Fourier Transformation (FFT)/Inverse Fast Fourier Transformation (IFFT), and proper techniques for maximizing the FFT/IFFT execution speed, such as pipelining or parallel processing, and use of memory structures with pre-computed values (look up tables -LUT) or other dedicated hardware components (usually multipliers). Furthermore, we discuss the optimal hardware architectures that best apply to various FFT/IFFT algorithms, along with their abilities to exploit parallel processing with minimal data dependences of the FFT/IFFT calculations. An interesting approach that is also considered in this paper is the application of the integrated processing-in-memory Intelligent RAM (IRAM) chip to high speed FFT/IFFT computing. The results of the assessment study emphasize that the execution speed of the FFT/IFFT algorithms is tightly connected to the capabilities of the FFT/IFFT hardware to support the provided parallelism of the given algorithm. Therefore, we suggest that the basic Discrete Fourier Transform (DFT)/Inverse Discrete Fourier Transform (IDFT) can also provide high performances, by utilizing a specialized FFT/IFFT hardware architecture that can exploit the provided parallelism of the DFT/IDF operations. The proposed improvements include simplified multiplications over symbols given in polar coordinate system, using sinе and cosine look up tables, and an approach for performing parallel addition of N input symbols.
Resumo:
In this paper we investigate various algorithms for performing Fast Fourier Transformation (FFT)/Inverse Fast Fourier Transformation (IFFT), and proper techniquesfor maximizing the FFT/IFFT execution speed, such as pipelining or parallel processing, and use of memory structures with pre-computed values (look up tables -LUT) or other dedicated hardware components (usually multipliers). Furthermore, we discuss the optimal hardware architectures that best apply to various FFT/IFFT algorithms, along with their abilities to exploit parallel processing with minimal data dependences of the FFT/IFFT calculations. An interesting approach that is also considered in this paper is the application of the integrated processing-in-memory Intelligent RAM (IRAM) chip to high speed FFT/IFFT computing. The results of the assessment study emphasize that the execution speed of the FFT/IFFT algorithms is tightly connected to the capabilities of the FFT/IFFT hardware to support the provided parallelism of the given algorithm. Therefore, we suggest that the basic Discrete Fourier Transform (DFT)/Inverse Discrete Fourier Transform (IDFT) can also provide high performances, by utilizing a specialized FFT/IFFT hardware architecture that can exploit the provided parallelism of the DFT/IDF operations. The proposed improvements include simplified multiplications over symbols given in polar coordinate system, using sinе and cosine look up tables,and an approach for performing parallel addition of N input symbols.
Resumo:
Variations in different types of genomes have been found to be responsible for a large degree of physical diversity such as appearance and susceptibility to disease. Identification of genomic variations is difficult and can be facilitated through computational analysis of DNA sequences. Newly available technologies are able to sequence billions of DNA base pairs relatively quickly. These sequences can be used to identify variations within their specific genome but must be mapped to a reference sequence first. In order to align these sequences to a reference sequence, we require mapping algorithms that make use of approximate string matching and string indexing methods. To date, few mapping algorithms have been tailored to handle the massive amounts of output generated by newly available sequencing technologies. In otrder to handle this large amount of data, we modified the popular mapping software BWA to run in parallel using OpenMPI. Parallel BWA matches the efficiency of multithreaded BWA functions while providing efficient parallelism for BWA functions that do not currently support multithreading. Parallel BWA shows significant wall time speedup in comparison to multithreaded BWA on high-performance computing clusters, and will thus facilitate the analysis of genome sequencing data.
Resumo:
L’augmentation du nombre d’usagers de l’Internet a entraîné une croissance exponentielle dans les tables de routage. Cette taille prévoit l’atteinte d’un million de préfixes dans les prochaines années. De même, les routeurs au cœur de l’Internet peuvent facilement atteindre plusieurs centaines de connexions BGP simultanées avec des routeurs voisins. Dans une architecture classique des routeurs, le protocole BGP s’exécute comme une entité unique au sein du routeur. Cette architecture comporte deux inconvénients majeurs : l’extensibilité (scalabilité) et la fiabilité. D’un côté, la scalabilité de BGP est mesurable en termes de nombre de connexions et aussi par la taille maximale de la table de routage que l’interface de contrôle puisse supporter. De l’autre côté, la fiabilité est un sujet critique dans les routeurs au cœur de l’Internet. Si l’instance BGP s’arrête, toutes les connexions seront perdues et le nouvel état de la table de routage sera propagé tout au long de l’Internet dans un délai de convergence non trivial. Malgré la haute fiabilité des routeurs au cœur de l’Internet, leur résilience aux pannes est augmentée considérablement et celle-ci est implantée dans la majorité des cas via une redondance passive qui peut limiter la scalabilité du routeur. Dans cette thèse, on traite les deux inconvénients en proposant une nouvelle approche distribuée de BGP pour augmenter sa scalabilité ainsi que sa fiabilité sans changer la sémantique du protocole. L’architecture distribuée de BGP proposée dans la première contribution est faite pour satisfaire les deux contraintes : scalabilité et fiabilité. Ceci est accompli en exploitant adéquatement le parallélisme et la distribution des modules de BGP sur plusieurs cartes de contrôle. Dans cette contribution, les fonctionnalités de BGP sont divisées selon le paradigme « maître-esclave » et le RIB (Routing Information Base) est dupliqué sur plusieurs cartes de contrôle. Dans la deuxième contribution, on traite la tolérance aux pannes dans l’architecture élaborée dans la première contribution en proposant un mécanisme qui augmente la fiabilité. De plus, nous prouvons analytiquement dans cette contribution qu’en adoptant une telle architecture distribuée, la disponibilité de BGP sera augmentée considérablement versus une architecture monolithique. Dans la troisième contribution, on propose une méthode de partitionnement de la table de routage que nous avons appelé DRTP pour diviser la table de BGP sur plusieurs cartes de contrôle. Cette contribution vise à augmenter la scalabilité de la table de routage et la parallélisation de l’algorithme de recherche (Best Match Prefix) en partitionnant la table de routage sur plusieurs nœuds physiquement distribués.
Resumo:
Avec la complexité croissante des systèmes sur puce, de nouveaux défis ne cessent d’émerger dans la conception de ces systèmes en matière de vérification formelle et de synthèse de haut niveau. Plusieurs travaux autour de SystemC, considéré comme la norme pour la conception au niveau système, sont en cours afin de relever ces nouveaux défis. Cependant, à cause du modèle de concurrence complexe de SystemC, relever ces défis reste toujours une tâche difficile. Ainsi, nous pensons qu’il est primordial de partir sur de meilleures bases en utilisant un modèle de concurrence plus efficace. Par conséquent, dans cette thèse, nous étudions une méthodologie de conception qui offre une meilleure abstraction pour modéliser des composants parallèles en se basant sur le concept de transaction. Nous montrons comment, grâce au raisonnement simple que procure le concept de transaction, il devient plus facile d’appliquer la vérification formelle, le raffinement incrémental et la synthèse de haut niveau. Dans le but d’évaluer l’efficacité de cette méthodologie, nous avons fixé l’objectif d’optimiser la vitesse de simulation d’un modèle transactionnel en profitant d’une machine multicoeur. Nous présentons ainsi l’environnement de modélisation et de simulation parallèle que nous avons développé. Nous étudions différentes stratégies d’ordonnancement en matière de parallélisme et de surcoût de synchronisation. Une expérimentation faite sur un modèle du transmetteur Wi-Fi 802.11a a permis d’atteindre une accélération d’environ 1.8 en utilisant deux threads. Avec 8 threads, bien que la charge de travail des différentes transactions n’était pas importante, nous avons pu atteindre une accélération d’environ 4.6, ce qui est un résultat très prometteur.
Resumo:
Quoique très difficile à résoudre, le problème de satisfiabilité Booléenne (SAT) est fréquemment utilisé lors de la modélisation d’applications industrielles. À cet effet, les deux dernières décennies ont vu une progression fulgurante des outils conçus pour trouver des solutions à ce problème NP-complet. Deux grandes avenues générales ont été explorées afin de produire ces outils, notamment l’approche logicielle et matérielle. Afin de raffiner et améliorer ces solveurs, de nombreuses techniques et heuristiques ont été proposées par la communauté de recherche. Le but final de ces outils a été de résoudre des problèmes de taille industrielle, ce qui a été plus ou moins accompli par les solveurs de nature logicielle. Initialement, le but de l’utilisation du matériel reconfigurable a été de produire des solveurs pouvant trouver des solutions plus rapidement que leurs homologues logiciels. Cependant, le niveau de sophistication de ces derniers a augmenté de telle manière qu’ils restent le meilleur choix pour résoudre SAT. Toutefois, les solveurs modernes logiciels n’arrivent toujours pas a trouver des solutions de manière efficace à certaines instances SAT. Le but principal de ce mémoire est d’explorer la résolution du problème SAT dans le contexte du matériel reconfigurable en vue de caractériser les ingrédients nécessaires d’un solveur SAT efficace qui puise sa puissance de calcul dans le parallélisme conféré par une plateforme FPGA. Le prototype parallèle implémenté dans ce travail est capable de se mesurer, en termes de vitesse d’exécution à d’autres solveurs (matériels et logiciels), et ce sans utiliser aucune heuristique. Nous montrons donc que notre approche matérielle présente une option prometteuse vers la résolution d’instances industrielles larges qui sont difficilement abordées par une approche logicielle.