988 resultados para multi-core


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Dissertação para obtenção do Grau de Mestre em Engenharia Informática

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Dissertação para obtenção do Grau de Mestre em Engenharia Informática

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Estudi elaborat a partir d’una estada a la Universität Karlsruhe entre gener i maig del 2007. Les biblioteques d’estructures de dades defineixen interfícies i implementen algorismes i estructures de dades fonamentals. Un exemple n’és la Satandard Template Library (STL ), que forma part del llenguatge de programació C++. En el marc d’una tesi, s’està treballant per obtenir implementacions més eficients i/o versàtils d’alguns components de la STL. Per a fer-ho s’utilitzen tècniques de la enginyeria d’algorismes. En particular, s’integra el coneixement de la comunitat algorítmica i es té en consideració la tecnologia existent. L’acció durant l’estada s’ha emmarcat en el desenvolupament la Multi Core STL (MCSTL ). La MCSTL és una implementació paral•lela de la STL per a màquines multi-core. Les màquines multi-core són actualment l’únic tipus de màquina disponible al mercat. Per tant, tot i que el paral•lelisme obtingut no sigui òptim, és preferible a tenir els processadors esperant, ja que , la tendència és que el nombre de processadors per computador augmenti.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Metabolic problems lead to numerous failures during clinical trials, and much effort is now devoted to developing in silico models predicting metabolic stability and metabolites. Such models are well known for cytochromes P450 and some transferases, whereas less has been done to predict the activity of human hydrolases. The present study was undertaken to develop a computational approach able to predict the hydrolysis of novel esters by human carboxylesterase hCES2. The study involved first a homology modeling of the hCES2 protein based on the model of hCES1 since the two proteins share a high degree of homology (congruent with 73%). A set of 40 known substrates of hCES2 was taken from the literature; the ligands were docked in both their neutral and ionized forms using GriDock, a parallel tool based on the AutoDock4.0 engine which can perform efficient and easy virtual screening analyses of large molecular databases exploiting multi-core architectures. Useful statistical models (e.g., r (2) = 0.91 for substrates in their unprotonated state) were calculated by correlating experimental pK(m) values with distance between the carbon atom of the substrate's ester group and the hydroxy function of Ser228. Additional parameters in the equations accounted for hydrophobic and electrostatic interactions between substrates and contributing residues. The negatively charged residues in the hCES2 cavity explained the preference of the enzyme for neutral substrates and, more generally, suggested that ligands which interact too strongly by ionic bonds (e.g., ACE inhibitors) cannot be good CES2 substrates because they are trapped in the cavity in unproductive modes and behave as inhibitors. The effects of protonation on substrate recognition and the contrasting behavior of substrates and products were finally investigated by MD simulations of some CES2 complexes.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

How many times a given process p preempts, either voluntarily or involuntarily, is an important threat to computer's processes throughput. Whenever running cpu-bound processes on a multi-core system without an actual system grid engine as commonly found on Grid Clusters, their performance and stability are directly related to their accurate implementation and the system reliability which is, to an extend, an important caveat most of the times so difficult to detect. Context Switching is time-consuming. Thus, if we could develop a tool capable of detecting it and gather data from every single performed Context Switch, we would beable to study this data and present some results that should pin-point at whatever their main cause could be.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Critical real-time ebedded (CRTE) Systems require safe and tight worst-case execution time (WCET) estimations to provide required safety levels and keep costs low. However, CRTE Systems require increasing performance to satisfy performance needs of existing and new features. Such performance can be only achieved by means of more agressive hardware architectures, which are much harder to analyze from a WCET perspective. The main features considered include cache memòries and multi-core processors.Thus, althoug such features provide higher performance, corrent WCET analysis methods are unable to provide tight WCET estimations. In fact, WCET estimations become worse than for simple rand less powerful hardware. The main reason is the fact that hardware behavior is deterministic but unknown and, therefore, the worst-case behavior must be assumed most of the time, leading to large WCET estimations. The purpose of this project is developing new hardware designs together with WCET analysis tools able to provide tight and safe WCET estimations. In order to do so, those pieces of hardware whose behavior is not easily analyzable due to lack of accurate information during WCET analysis will be enhanced to produce a probabilistically analyzable behavior. Thus, even if the worst-case behavior cannot be removed, its probabilty can be bounded, and hence, a safe and tight WCET can be provided for a particular safety level in line with the safety levels of the remaining components of the system. During the first year the project we have developed molt of the evaluation infraestructure as well as the techniques hardware techniques to analyze cache memories. During the second year those techniques have been evaluated, and new purely-softwar techniques have been developed.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

MOTIVATION: Analysis of millions of pyro-sequences is currently playing a crucial role in the advance of environmental microbiology. Taxonomy-independent, i.e. unsupervised, clustering of these sequences is essential for the definition of Operational Taxonomic Units. For this application, reproducibility and robustness should be the most sought after qualities, but have thus far largely been overlooked. RESULTS: More than 1 million hyper-variable internal transcribed spacer 1 (ITS1) sequences of fungal origin have been analyzed. The ITS1 sequences were first properly extracted from 454 reads using generalized profiles. Then, otupipe, cd-hit-454, ESPRIT-Tree and DBC454, a new algorithm presented here, were used to analyze the sequences. A numerical assay was developed to measure the reproducibility and robustness of these algorithms. DBC454 was the most robust, closely followed by ESPRIT-Tree. DBC454 features density-based hierarchical clustering, which complements the other methods by providing insights into the structure of the data. AVAILABILITY: An executable is freely available for non-commercial users at ftp://ftp.vital-it.ch/tools/dbc454. It is designed to run under MPI on a cluster of 64-bit Linux machines running Red Hat 4.x, or on a multi-core OSX system. CONTACT: dbc454@vital-it.ch or nicolas.guex@isb-sib.ch.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Avec la complexité croissante des systèmes sur puce, de nouveaux défis ne cessent d’émerger dans la conception de ces systèmes en matière de vérification formelle et de synthèse de haut niveau. Plusieurs travaux autour de SystemC, considéré comme la norme pour la conception au niveau système, sont en cours afin de relever ces nouveaux défis. Cependant, à cause du modèle de concurrence complexe de SystemC, relever ces défis reste toujours une tâche difficile. Ainsi, nous pensons qu’il est primordial de partir sur de meilleures bases en utilisant un modèle de concurrence plus efficace. Par conséquent, dans cette thèse, nous étudions une méthodologie de conception qui offre une meilleure abstraction pour modéliser des composants parallèles en se basant sur le concept de transaction. Nous montrons comment, grâce au raisonnement simple que procure le concept de transaction, il devient plus facile d’appliquer la vérification formelle, le raffinement incrémental et la synthèse de haut niveau. Dans le but d’évaluer l’efficacité de cette méthodologie, nous avons fixé l’objectif d’optimiser la vitesse de simulation d’un modèle transactionnel en profitant d’une machine multicoeur. Nous présentons ainsi l’environnement de modélisation et de simulation parallèle que nous avons développé. Nous étudions différentes stratégies d’ordonnancement en matière de parallélisme et de surcoût de synchronisation. Une expérimentation faite sur un modèle du transmetteur Wi-Fi 802.11a a permis d’atteindre une accélération d’environ 1.8 en utilisant deux threads. Avec 8 threads, bien que la charge de travail des différentes transactions n’était pas importante, nous avons pu atteindre une accélération d’environ 4.6, ce qui est un résultat très prometteur.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Réalisé en majeure partie sous la tutelle de feu le Professeur Paul Arminjon. Après sa disparition, le Docteur Aziz Madrane a pris la relève de la direction de mes travaux.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

clRNG et clProbdist sont deux interfaces de programmation (APIs) que nous avons développées pour la génération de nombres aléatoires uniformes et non uniformes sur des dispositifs de calculs parallèles en utilisant l’environnement OpenCL. La première interface permet de créer au niveau d’un ordinateur central (hôte) des objets de type stream considérés comme des générateurs virtuels parallèles qui peuvent être utilisés aussi bien sur l’hôte que sur les dispositifs parallèles (unités de traitement graphique, CPU multinoyaux, etc.) pour la génération de séquences de nombres aléatoires. La seconde interface permet aussi de générer au niveau de ces unités des variables aléatoires selon différentes lois de probabilité continues et discrètes. Dans ce mémoire, nous allons rappeler des notions de base sur les générateurs de nombres aléatoires, décrire les systèmes hétérogènes ainsi que les techniques de génération parallèle de nombres aléatoires. Nous présenterons aussi les différents modèles composant l’architecture de l’environnement OpenCL et détaillerons les structures des APIs développées. Nous distinguons pour clRNG les fonctions qui permettent la création des streams, les fonctions qui génèrent les variables aléatoires uniformes ainsi que celles qui manipulent les états des streams. clProbDist contient les fonctions de génération de variables aléatoires non uniformes selon la technique d’inversion ainsi que les fonctions qui permettent de retourner différentes statistiques des lois de distribution implémentées. Nous évaluerons ces interfaces de programmation avec deux simulations qui implémentent un exemple simplifié d’un modèle d’inventaire et un exemple d’une option financière. Enfin, nous fournirons les résultats d’expérimentation sur les performances des générateurs implémentés.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The process of developing software that takes advantage of multiple processors is commonly referred to as parallel programming. For various reasons, this process is much harder than the sequential case. For decades, parallel programming has been a problem for a small niche only: engineers working on parallelizing mostly numerical applications in High Performance Computing. This has changed with the advent of multi-core processors in mainstream computer architectures. Parallel programming in our days becomes a problem for a much larger group of developers. The main objective of this thesis was to find ways to make parallel programming easier for them. Different aims were identified in order to reach the objective: research the state of the art of parallel programming today, improve the education of software developers about the topic, and provide programmers with powerful abstractions to make their work easier. To reach these aims, several key steps were taken. To start with, a survey was conducted among parallel programmers to find out about the state of the art. More than 250 people participated, yielding results about the parallel programming systems and languages in use, as well as about common problems with these systems. Furthermore, a study was conducted in university classes on parallel programming. It resulted in a list of frequently made mistakes that were analyzed and used to create a programmers' checklist to avoid them in the future. For programmers' education, an online resource was setup to collect experiences and knowledge in the field of parallel programming - called the Parawiki. Another key step in this direction was the creation of the Thinking Parallel weblog, where more than 50.000 readers to date have read essays on the topic. For the third aim (powerful abstractions), it was decided to concentrate on one parallel programming system: OpenMP. Its ease of use and high level of abstraction were the most important reasons for this decision. Two different research directions were pursued. The first one resulted in a parallel library called AthenaMP. It contains so-called generic components, derived from design patterns for parallel programming. These include functionality to enhance the locks provided by OpenMP, to perform operations on large amounts of data (data-parallel programming), and to enable the implementation of irregular algorithms using task pools. AthenaMP itself serves a triple role: the components are well-documented and can be used directly in programs, it enables developers to study the source code and learn from it, and it is possible for compiler writers to use it as a testing ground for their OpenMP compilers. The second research direction was targeted at changing the OpenMP specification to make the system more powerful. The main contributions here were a proposal to enable thread-cancellation and a proposal to avoid busy waiting. Both were implemented in a research compiler, shown to be useful in example applications, and proposed to the OpenMP Language Committee.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The design space of emerging heterogenous multi-core architectures with re-configurability element makes it feasible to design mixed fine-grained and coarse-grained parallel architectures. This paper presents a hierarchical composite array design which extends the curret design space of regular array design by combining a sequence of transformations. This technique is applied to derive a new design of a pipelined parallel regular array with different dataflow between phases of computation.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Can autonomic computing concepts be applied to traditional multi-core systems found in high performance computing environments? In this paper, we propose a novel synergy between parallel computing and swarm robotics to offer a new computing paradigm, `Swarm-Array Computing' that can harness and apply autonomic computing for parallel computing systems. One approach among three proposed approaches in swarm-array computing based on landscapes of intelligent cores, in which the cores of a parallel computing system are abstracted to swarm agents, is investigated. A task gets executed and transferred seamlessly between cores in the proposed approach thereby achieving self-ware properties that characterize autonomic computing. FPGAs are considered as an experimental platform taking into account its application in space robotics. The feasibility of the proposed approach is validated on the SeSAm multi-agent simulator.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Recently major processor manufacturers have announced a dramatic shift in their paradigm to increase computing power over the coming years. Instead of focusing on faster clock speeds and more powerful single core CPUs, the trend clearly goes towards multi core systems. This will also result in a paradigm shift for the development of algorithms for computationally expensive tasks, such as data mining applications. Obviously, work on parallel algorithms is not new per se but concentrated efforts in the many application domains are still missing. Multi-core systems, but also clusters of workstations and even large-scale distributed computing infrastructures provide new opportunities and pose new challenges for the design of parallel and distributed algorithms. Since data mining and machine learning systems rely on high performance computing systems, research on the corresponding algorithms must be on the forefront of parallel algorithm research in order to keep pushing data mining and machine learning applications to be more powerful and, especially for the former, interactive. To bring together researchers and practitioners working in this exciting field, a workshop on parallel data mining was organized as part of PKDD/ECML 2006 (Berlin, Germany). The six contributions selected for the program describe various aspects of data mining and machine learning approaches featuring low to high degrees of parallelism: The first contribution focuses the classic problem of distributed association rule mining and focuses on communication efficiency to improve the state of the art. After this a parallelization technique for speeding up decision tree construction by means of thread-level parallelism for shared memory systems is presented. The next paper discusses the design of a parallel approach for dis- tributed memory systems of the frequent subgraphs mining problem. This approach is based on a hierarchical communication topology to solve issues related to multi-domain computational envi- ronments. The forth paper describes the combined use and the customization of software packages to facilitate a top down parallelism in the tuning of Support Vector Machines (SVM) and the next contribution presents an interesting idea concerning parallel training of Conditional Random Fields (CRFs) and motivates their use in labeling sequential data. The last contribution finally focuses on very efficient feature selection. It describes a parallel algorithm for feature selection from random subsets. Selecting the papers included in this volume would not have been possible without the help of an international Program Committee that has provided detailed reviews for each paper. We would like to also thank Matthew Otey who helped with publicity for the workshop.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We describe infinitely scalable pipeline machines with perfect parallelism, in the sense that every instruction of an inline program is executed, on successive data, on every clock tick. Programs with shared data effectively execute in less than a clock tick. We show that pipeline machines are faster than single or multi-core, von Neumann machines for sufficiently many program runs of a sufficiently time consuming program. Our pipeline machines exploit the totality of transreal arithmetic and the known waiting time of statically compiled programs to deliver the interesting property that they need no hardware or software exception handling.