984 resultados para Parallel application


Relevância:

60.00% 60.00%

Publicador:

Resumo:

The problem of assigning cells to switches in a cellular mobile network is an NP-hard optimization problem. So, real size mobile networks could not be solved by using exact methods. The alternative is the use of the heuristic methods, because they allow us to find a good quality solution in a quite satisfactory computational time. This paper proposes a Beam Search method to solve the problem of assignment cell in cellular mobile networks. Some modifications in this algorithm are also presented, which allows its parallel application. Computational results obtained from several tests confirm the effectiveness of this approach to provide good solutions for medium- and large-sized cellular mobile network.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This work presents a study about the use of standards and directions on parallel programming in distributed systems, using the MPI standard and PETSc toolkit, performing an analysis of their performances over certain mathematic operations involving matrices. The concepts are used to develop applications to solve problems involving Principal Components Analysis (PCA), which are executed in a Beowulf cluster. The results are compared to the ones of an analogous application with sequencial execution, and then it is analized if there was any performance boost on the parallel application

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Breakthrough advances in microprocessor technology and efficient power management have altered the course of development of processors with the emergence of multi-core processor technology, in order to bring higher level of processing. The utilization of many-core technology has boosted computing power provided by cluster of workstations or SMPs, providing large computational power at an affordable cost using solely commodity components. Different implementations of message-passing libraries and system softwares (including Operating Systems) are installed in such cluster and multi-cluster computing systems. In order to guarantee correct execution of message-passing parallel applications in a computing environment other than that originally the parallel application was developed, review of the application code is needed. In this paper, a hybrid communication interfacing strategy is proposed, to execute a parallel application in a group of computing nodes belonging to different clusters or multi-clusters (computing systems may be running different operating systems and MPI implementations), interconnected with public or private IP addresses, and responding interchangeably to user execution requests. Experimental results demonstrate the feasibility of this proposed strategy and its effectiveness, through the execution of benchmarking parallel applications.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This thesis deals with heterogeneous architectures in standard workstations. Heterogeneous architectures represent an appealing alternative to traditional supercomputers because they are based on commodity components fabricated in large quantities. Hence their price-performance ratio is unparalleled in the world of high performance computing (HPC). In particular, different aspects related to the performance and consumption of heterogeneous architectures have been explored. The thesis initially focuses on an efficient implementation of a parallel application, where the execution time is dominated by an high number of floating point instructions. Then the thesis touches the central problem of efficient management of power peaks in heterogeneous computing systems. Finally it discusses a memory-bounded problem, where the execution time is dominated by the memory latency. Specifically, the following main contributions have been carried out: A novel framework for the design and analysis of solar field for Central Receiver Systems (CRS) has been developed. The implementation based on desktop workstation equipped with multiple Graphics Processing Units (GPUs) is motivated by the need to have an accurate and fast simulation environment for studying mirror imperfection and non-planar geometries. Secondly, a power-aware scheduling algorithm on heterogeneous CPU-GPU architectures, based on an efficient distribution of the computing workload to the resources, has been realized. The scheduler manages the resources of several computing nodes with a view to reducing the peak power. The two main contributions of this work follow: the approach reduces the supply cost due to high peak power whilst having negligible impact on the parallelism of computational nodes. from another point of view the developed model allows designer to increase the number of cores without increasing the capacity of the power supply unit. Finally, an implementation for efficient graph exploration on reconfigurable architectures is presented. The purpose is to accelerate graph exploration, reducing the number of random memory accesses.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In questa tesi sono stati simulati i modelli circuitali relativi alle tecniche di estrazione SECE, Q-SECE e SSHI. Sono state graficate e analizzate le caratteristiche di trasferimento di potenza. Tramite simulazioni LTspice, è stata calcolata l'energia estratta con tecnica SECE e Q-SECE ed è stato ricavato un miglioramento delle prestazioni di energia di +30% con la tecnica Q-SECE. Un'analisi simile è stata fatta per il calcolo dell'energia in uscita anche per il modello SSHI-parallel.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Debris flows are highly complex natural phenomena. The assessment of these processes requires the parallel application of various approaches and methods. However, many uncertainties remain even if proven procedures are available. Uncertainties are highest for magnitude-frequency relations and for the behaviour of the process.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

La tradición ha reunido a Plotino y a Nicolás de Cusa en una misma corriente de pensamiento, al primero en tanto punto de partida y al segundo, acaso, como culminación. Más allá de las etiquetas historiográficas, intentaré establecer cuáles son aquellos aspectos del pensamiento de Plotino que persisten en el de Nicolás de Cusa a punto tal que permiten legítimamente entender en qué sentido ambos filósofos pueden ser denominados ?neoplatónicos?. Ciertamente, Plotino y Nicolás de Cusa difieren en núcleos doctrinales que pueden considerarse lo suficientemente relevantes como para negar la posibilidad de pensarlos miembros de una misma corriente y, a las diferencias entre sistemas, se agrega el hecho de que nada indica que Nicolás haya tenido contacto con la obra plotiniana. Sin embargo, ambos han reflexionado acerca de problemas similares y han brindado, cada uno a su modo y en su propia porción de tiempo, propuestas, en algunos casos, paralelas

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Esta tesis establece los fundamentos teóricos y diseña una colección abierta de clases C++ denominada VBF (Vector Boolean Functions) para analizar funciones booleanas vectoriales (funciones que asocian un vector booleano a otro vector booleano) desde una perspectiva criptográfica. Esta nueva implementación emplea la librería NTL de Victor Shoup, incorporando nuevos módulos que complementan a las funciones de NTL, adecuándolas para el análisis criptográfico. La clase fundamental que representa una función booleana vectorial se puede inicializar de manera muy flexible mediante diferentes estructuras de datas tales como la Tabla de verdad, la Representación de traza y la Forma algebraica normal entre otras. De esta manera VBF permite evaluar los criterios criptográficos más relevantes de los algoritmos de cifra en bloque y de stream, así como funciones hash: por ejemplo, proporciona la no-linealidad, la distancia lineal, el grado algebraico, las estructuras lineales, la distribución de frecuencias de los valores absolutos del espectro Walsh o del espectro de autocorrelación, entre otros criterios. Adicionalmente, VBF puede llevar a cabo operaciones entre funciones booleanas vectoriales tales como la comprobación de igualdad, la composición, la inversión, la suma, la suma directa, el bricklayering (aplicación paralela de funciones booleanas vectoriales como la empleada en el algoritmo de cifra Rijndael), y la adición de funciones coordenada. La tesis también muestra el empleo de la librería VBF en dos aplicaciones prácticas. Por un lado, se han analizado las características más relevantes de los sistemas de cifra en bloque. Por otro lado, combinando VBF con algoritmos de optimización, se han diseñado funciones booleanas cuyas propiedades criptográficas son las mejores conocidas hasta la fecha. ABSTRACT This thesis develops the theoretical foundations and designs an open collection of C++ classes, called VBF, designed for analyzing vector Boolean functions (functions that map a Boolean vector to another Boolean vector) from a cryptographic perspective. This new implementation uses the NTL library from Victor Shoup, adding new modules which complement the existing ones making VBF better suited for cryptography. The fundamental class representing a vector Boolean function can be initialized in a flexible way via several alternative types of data structures such as Truth Table, Trace Representation, Algebraic Normal Form (ANF) among others. This way, VBF allows the evaluation of the most relevant cryptographic criteria for block and stream ciphers as well as for hash functions: for instance, it provides the nonlinearity, the linearity distance, the algebraic degree, the linear structures, the frequency distribution of the absolute values of the Walsh Spectrum or the Autocorrelation Spectrum, among others. In addition, VBF can perform operations such as equality testing, composition, inversion, sum, direct sum, bricklayering (parallel application of vector Boolean functions as employed in Rijndael cipher), and adding coordinate functions of two vector Boolean functions. This thesis also illustrates the use of VBF in two practical applications. On the one hand, the most relevant properties of the existing block ciphers have been analysed. On the other hand, by combining VBF with optimization algorithms, new Boolean functions have been designed which have the best known cryptographic properties up-to-date.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A computação paralela permite uma série de vantagens para a execução de aplicações de grande porte, sendo que o uso efetivo dos recursos computacionais paralelos é um aspecto relevante da computação de alto desempenho. Este trabalho apresenta uma metodologia que provê a execução, de forma automatizada, de aplicações paralelas baseadas no modelo BSP com tarefas heterogêneas. É considerado no modelo adotado, que o tempo de computação de cada tarefa secundária não possui uma alta variância entre uma iteração e outra. A metodologia é denominada de ASE e é composta por três etapas: Aquisição (Acquisition), Escalonamento (Scheduling) e Execução (Execution). Na etapa de Aquisição, os tempos de processamento das tarefas são obtidos; na etapa de Escalonamento a metodologia busca encontrar a distribuição de tarefas que maximize a velocidade de execução da aplicação paralela, mas minimizando o uso de recursos, por meio de um algoritmo desenvolvido neste trabalho; e por fim a etapa de Execução executa a aplicação paralela com a distribuição definida na etapa anterior. Ferramentas que são aplicadas na metodologia foram implementadas. Um conjunto de testes aplicando a metodologia foi realizado e os resultados apresentados mostram que os objetivos da proposta foram alcançados.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Transition P Systems are a parallel and distributed computational model based on the notion of the cellular membrane structure. Each membrane determines a region that encloses a multiset of objects and evolution rules. Transition P Systems evolve through transitions between two consecutive configurations that are determined by the membrane structure and multisets present inside membranes. Moreover, transitions between two consecutive configurations are provided by an exhaustive non-deterministic and parallel application of evolution rules. But, to establish the rules to be applied, it is required the previous calculation of useful, applicable and active rules. Hence, computation of useful evolution rules is critical for the whole evolution process efficiency, because it is performed in parallel inside each membrane in every evolution step. This work defines usefulness states through an exhaustive analysis of the P system for every membrane and for every possible configuration of the membrane structure during the computation. Moreover, this analysis can be done in a static way; therefore membranes only have to check their usefulness states to obtain their set of useful rules during execution.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Transition P Systems are a parallel and distributed computational model based on the notion of the cellular membrane structure. Each membrane determines a region that encloses a multiset of objects and evolution rules. Transition P Systems evolve through transitions between two consecutive configurations that are determined by the membrane structure and multisets present inside membranes. Moreover, transitions between two consecutive configurations are provided by an exhaustive non-deterministic and parallel application of active evolution rules subset inside each membrane of the P system. But, to establish the active evolution rules subset, it is required the previous calculation of useful and applicable rules. Hence, computation of applicable evolution rules subset is critical for the whole evolution process efficiency, because it is performed in parallel inside each membrane in every evolution step. The work presented here shows advantages of incorporating decision trees in the evolution rules applicability algorithm. In order to it, necessary formalizations will be presented to consider this as a classification problem, the method to obtain the necessary decision tree automatically generated and the new algorithm for applicability based on it.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper we advocate the Loop-of-stencil-reduce pattern as a way to simplify the parallel programming of heterogeneous platforms (multicore+GPUs). Loop-of-Stencil-reduce is general enough to subsume map, reduce, map-reduce, stencil, stencil-reduce, and, crucially, their usage in a loop. It transparently targets (by using OpenCL) combinations of CPU cores and GPUs, and it makes it possible to simplify the deployment of a single stencil computation kernel on different GPUs. The paper discusses the implementation of Loop-of-stencil-reduce within the FastFlow parallel framework, considering a simple iterative data-parallel application as running example (Game of Life) and a highly effective parallel filter for visual data restoration to assess performance. Thanks to the high-level design of the Loop-of-stencil-reduce, it was possible to run the filter seamlessly on a multicore machine, on multi-GPUs, and on both.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This thesis presents a novel design paradigm, called Virtual Runtime Application Partitions (VRAP), to judiciously utilize the on-chip resources. As the dark silicon era approaches, where the power considerations will allow only a fraction chip to be powered on, judicious resource management will become a key consideration in future designs. Most of the works on resource management treat only the physical components (i.e. computation, communication, and memory blocks) as resources and manipulate the component to application mapping to optimize various parameters (e.g. energy efficiency). To further enhance the optimization potential, in addition to the physical resources we propose to manipulate abstract resources (i.e. voltage/frequency operating point, the fault-tolerance strength, the degree of parallelism, and the configuration architecture). The proposed framework (i.e. VRAP) encapsulates methods, algorithms, and hardware blocks to provide each application with the abstract resources tailored to its needs. To test the efficacy of this concept, we have developed three distinct self adaptive environments: (i) Private Operating Environment (POE), (ii) Private Reliability Environment (PRE), and (iii) Private Configuration Environment (PCE) that collectively ensure that each application meets its deadlines using minimal platform resources. In this work several novel architectural enhancements, algorithms and policies are presented to realize the virtual runtime application partitions efficiently. Considering the future design trends, we have chosen Coarse Grained Reconfigurable Architectures (CGRAs) and Network on Chips (NoCs) to test the feasibility of our approach. Specifically, we have chosen Dynamically Reconfigurable Resource Array (DRRA) and McNoC as the representative CGRA and NoC platforms. The proposed techniques are compared and evaluated using a variety of quantitative experiments. Synthesis and simulation results demonstrate VRAP significantly enhances the energy and power efficiency compared to state of the art.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

As consumers demand more functionality) from their electronic devices and manufacturers supply the demand then electrical power and clock requirements tend to increase, however reassessing system architecture can fortunately lead to suitable counter reductions. To maintain low clock rates and therefore reduce electrical power, this paper presents a parallel convolutional coder for the transmit side in many wireless consumer devices. The coder accepts a parallel data input and directly computes punctured convolutional codes without the need for a separate puncturing operation while the coded bits are available at the output of the coder in a parallel fashion. Also as the computation is in parallel then the coder can be clocked at 7 times slower than the conventional shift-register based convolutional coder (using DVB 7/8 rate). The presented coder is directly relevant to the design of modern low-power consumer devices

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In vielen Bereichen der industriellen Fertigung, wie zum Beispiel in der Automobilindustrie, wer- den digitale Versuchsmodelle (sog. digital mock-ups) eingesetzt, um die Entwicklung komplexer Maschinen m ̈oglichst gut durch Computersysteme unterstu ̈tzen zu k ̈onnen. Hierbei spielen Be- wegungsplanungsalgorithmen eine wichtige Rolle, um zu gew ̈ahrleisten, dass diese digitalen Pro- totypen auch kollisionsfrei zusammengesetzt werden k ̈onnen. In den letzten Jahrzehnten haben sich hier sampling-basierte Verfahren besonders bew ̈ahrt. Diese erzeugen eine große Anzahl von zuf ̈alligen Lagen fu ̈r das ein-/auszubauende Objekt und verwenden einen Kollisionserken- nungsmechanismus, um die einzelnen Lagen auf Gu ̈ltigkeit zu u ̈berpru ̈fen. Daher spielt die Kollisionserkennung eine wesentliche Rolle beim Design effizienter Bewegungsplanungsalgorith- men. Eine Schwierigkeit fu ̈r diese Klasse von Planern stellen sogenannte “narrow passages” dar, schmale Passagen also, die immer dort auftreten, wo die Bewegungsfreiheit der zu planenden Objekte stark eingeschr ̈ankt ist. An solchen Stellen kann es schwierig sein, eine ausreichende Anzahl von kollisionsfreien Samples zu finden. Es ist dann m ̈oglicherweise n ̈otig, ausgeklu ̈geltere Techniken einzusetzen, um eine gute Performance der Algorithmen zu erreichen.rnDie vorliegende Arbeit gliedert sich in zwei Teile: Im ersten Teil untersuchen wir parallele Kollisionserkennungsalgorithmen. Da wir auf eine Anwendung bei sampling-basierten Bewe- gungsplanern abzielen, w ̈ahlen wir hier eine Problemstellung, bei der wir stets die selben zwei Objekte, aber in einer großen Anzahl von unterschiedlichen Lagen auf Kollision testen. Wir im- plementieren und vergleichen verschiedene Verfahren, die auf Hu ̈llk ̈operhierarchien (BVHs) und hierarchische Grids als Beschleunigungsstrukturen zuru ̈ckgreifen. Alle beschriebenen Verfahren wurden auf mehreren CPU-Kernen parallelisiert. Daru ̈ber hinaus vergleichen wir verschiedene CUDA Kernels zur Durchfu ̈hrung BVH-basierter Kollisionstests auf der GPU. Neben einer un- terschiedlichen Verteilung der Arbeit auf die parallelen GPU Threads untersuchen wir hier die Auswirkung verschiedener Speicherzugriffsmuster auf die Performance der resultierenden Algo- rithmen. Weiter stellen wir eine Reihe von approximativen Kollisionstests vor, die auf den beschriebenen Verfahren basieren. Wenn eine geringere Genauigkeit der Tests tolerierbar ist, kann so eine weitere Verbesserung der Performance erzielt werden.rnIm zweiten Teil der Arbeit beschreiben wir einen von uns entworfenen parallelen, sampling- basierten Bewegungsplaner zur Behandlung hochkomplexer Probleme mit mehreren “narrow passages”. Das Verfahren arbeitet in zwei Phasen. Die grundlegende Idee ist hierbei, in der er- sten Planungsphase konzeptionell kleinere Fehler zuzulassen, um die Planungseffizienz zu erh ̈ohen und den resultierenden Pfad dann in einer zweiten Phase zu reparieren. Der hierzu in Phase I eingesetzte Planer basiert auf sogenannten Expansive Space Trees. Zus ̈atzlich haben wir den Planer mit einer Freidru ̈ckoperation ausgestattet, die es erlaubt, kleinere Kollisionen aufzul ̈osen und so die Effizienz in Bereichen mit eingeschr ̈ankter Bewegungsfreiheit zu erh ̈ohen. Optional erlaubt unsere Implementierung den Einsatz von approximativen Kollisionstests. Dies setzt die Genauigkeit der ersten Planungsphase weiter herab, fu ̈hrt aber auch zu einer weiteren Perfor- mancesteigerung. Die aus Phase I resultierenden Bewegungspfade sind dann unter Umst ̈anden nicht komplett kollisionsfrei. Um diese Pfade zu reparieren, haben wir einen neuartigen Pla- nungsalgorithmus entworfen, der lokal beschr ̈ankt auf eine kleine Umgebung um den bestehenden Pfad einen neuen, kollisionsfreien Bewegungspfad plant.rnWir haben den beschriebenen Algorithmus mit einer Klasse von neuen, schwierigen Metall- Puzzlen getestet, die zum Teil mehrere “narrow passages” aufweisen. Unseres Wissens nach ist eine Sammlung vergleichbar komplexer Benchmarks nicht ̈offentlich zug ̈anglich und wir fan- den auch keine Beschreibung von vergleichbar komplexen Benchmarks in der Motion-Planning Literatur.