855 resultados para Parallel Computation
Resumo:
Genetic Programming (GP) is a widely used methodology for solving various computational problems. GP's problem solving ability is usually hindered by its long execution times. In this thesis, GP is applied toward real-time computer vision. In particular, object classification and tracking using a parallel GP system is discussed. First, a study of suitable GP languages for object classification is presented. Two main GP approaches for visual pattern classification, namely the block-classifiers and the pixel-classifiers, were studied. Results showed that the pixel-classifiers generally performed better. Using these results, a suitable language was selected for the real-time implementation. Synthetic video data was used in the experiments. The goal of the experiments was to evolve a unique classifier for each texture pattern that existed in the video. The experiments revealed that the system was capable of correctly tracking the textures in the video. The performance of the system was on-par with real-time requirements.
Resumo:
The KCube interconnection network was first introduced in 2010 in order to exploit the good characteristics of two well-known interconnection networks, the hypercube and the Kautz graph. KCube links up multiple processors in a communication network with high density for a fixed degree. Since the KCube network is newly proposed, much study is required to demonstrate its potential properties and algorithms that can be designed to solve parallel computation problems. In this thesis we introduce a new methodology to construct the KCube graph. Also, with regard to this new approach, we will prove its Hamiltonicity in the general KC(m; k). Moreover, we will find its connectivity followed by an optimal broadcasting scheme in which a source node containing a message is to communicate it with all other processors. In addition to KCube networks, we have studied a version of the routing problem in the traditional hypercube, investigating this problem: whether there exists a shortest path in a Qn between two nodes 0n and 1n, when the network is experiencing failed components. We first conditionally discuss this problem when there is a constraint on the number of faulty nodes, and subsequently introduce an algorithm to tackle the problem without restrictions on the number of nodes.
Resumo:
Le travail de modélisation a été réalisé à travers EGSnrc, un logiciel développé par le Conseil National de Recherche Canada.
Resumo:
La multiplication dans le corps de Galois à 2^m éléments (i.e. GF(2^m)) est une opérations très importante pour les applications de la théorie des correcteurs et de la cryptographie. Dans ce mémoire, nous nous intéressons aux réalisations parallèles de multiplicateurs dans GF(2^m) lorsque ce dernier est généré par des trinômes irréductibles. Notre point de départ est le multiplicateur de Montgomery qui calcule A(x)B(x)x^(-u) efficacement, étant donné A(x), B(x) in GF(2^m) pour u choisi judicieusement. Nous étudions ensuite l'algorithme diviser pour régner PCHS qui permet de partitionner les multiplicandes d'un produit dans GF(2^m) lorsque m est impair. Nous l'appliquons pour la partitionnement de A(x) et de B(x) dans la multiplication de Montgomery A(x)B(x)x^(-u) pour GF(2^m) même si m est pair. Basé sur cette nouvelle approche, nous construisons un multiplicateur dans GF(2^m) généré par des trinôme irréductibles. Une nouvelle astuce de réutilisation des résultats intermédiaires nous permet d'éliminer plusieurs portes XOR redondantes. Les complexités de temps (i.e. le délais) et d'espace (i.e. le nombre de portes logiques) du nouveau multiplicateur sont ensuite analysées: 1. Le nouveau multiplicateur demande environ 25% moins de portes logiques que les multiplicateurs de Montgomery et de Mastrovito lorsque GF(2^m) est généré par des trinômes irréductible et m est suffisamment grand. Le nombre de portes du nouveau multiplicateur est presque identique à celui du multiplicateur de Karatsuba proposé par Elia. 2. Le délai de calcul du nouveau multiplicateur excède celui des meilleurs multiplicateurs d'au plus deux évaluations de portes XOR. 3. Nous determinons le délai et le nombre de portes logiques du nouveau multiplicateur sur les deux corps de Galois recommandés par le National Institute of Standards and Technology (NIST). Nous montrons que notre multiplicateurs contient 15% moins de portes logiques que les multiplicateurs de Montgomery et de Mastrovito au coût d'un délai d'au plus une porte XOR supplémentaire. De plus, notre multiplicateur a un délai d'une porte XOR moindre que celui du multiplicateur d'Elia au coût d'une augmentation de moins de 1% du nombre total de portes logiques.
Resumo:
”compositions” is a new R-package for the analysis of compositional and positive data. It contains four classes corresponding to the four different types of compositional and positive geometry (including the Aitchison geometry). It provides means for computation, plotting and high-level multivariate statistical analysis in all four geometries. These geometries are treated in an fully analogous way, based on the principle of working in coordinates, and the object-oriented programming paradigm of R. In this way, called functions automatically select the most appropriate type of analysis as a function of the geometry. The graphical capabilities include ternary diagrams and tetrahedrons, various compositional plots (boxplots, barplots, piecharts) and extensive graphical tools for principal components. Afterwards, ortion and proportion lines, straight lines and ellipses in all geometries can be added to plots. The package is accompanied by a hands-on-introduction, documentation for every function, demos of the graphical capabilities and plenty of usage examples. It allows direct and parallel computation in all four vector spaces and provides the beginner with a copy-and-paste style of data analysis, while letting advanced users keep the functionality and customizability they demand of R, as well as all necessary tools to add own analysis routines. A complete example is included in the appendix
Resumo:
In computer graphics, global illumination algorithms take into account not only the light that comes directly from the sources, but also the light interreflections. This kind of algorithms produce very realistic images, but at a high computational cost, especially when dealing with complex environments. Parallel computation has been successfully applied to such algorithms in order to make it possible to compute highly-realistic images in a reasonable time. We introduce here a speculation-based parallel solution for a global illumination algorithm in the context of radiosity, in which we have taken advantage of the hierarchical nature of such an algorithm
Resumo:
Neste trabalho, o método FDTD em coordenadas gerais (LN-FDTD) foi implementado para a análise de estruturas de aterramento com geometrias coincidentes ou não com o sistema de coordenadas cartesiano. O método soluciona as equações de Maxwell no domínio do tempo, permitindo a obtenção de dados a respeito da resposta transitória e de regime estacionário de estruturas diversas de aterramento. Uma nova formulação para a técnica de truncagem UPML em coordenadas gerais, para meios condutivos, foi desenvolvida e implementada para viabilizar a análise dos problemas (LN-UPML). Uma nova metodologia baseada em duas redes neurais artificiais é apresentada para a deteccão de defeitos em malhas de terra. O software FDTD em coordenadas gerais foi testado e validado para vários casos. Uma interface gráfica para usuários, chamada LANE SAGS, foi desenvolvida para simplificar o uso e automatizar o processamento dos dados.
Resumo:
Desenvolvemos a modelagem numérica de dados sintéticos Marine Controlled Source Electromagnetic (MCSEM) usada na exploração de hidrocarbonetos para simples modelos tridimensionais usando computação paralela. Os modelos são constituidos de duas camadas estrati cadas: o mar e o sedimentos encaixantes de um delgado reservatório tridimensional, sobrepostas pelo semi-espaço correspondente ao ar. Neste Trabalho apresentamos uma abordagem tridimensional da técnica dos elementos nitos aplicada ao método MCSEM, usando a formulação da decomposição primária e secundária dos potenciais acoplados magnético e elétrico. Num pós-processamento, os campos eletromagnéticos são calculados a partir dos potenciais espalhados via diferenciação numérica. Exploramos o paralelismo dos dados MCSEM 3D em um levantamento multitransmissor, em que para cada posição do transmissor temos o mesmo processo de cálculos com dados diferentes. Para isso, usamos a biblioteca Message Passing Interface (MPI) e o modelo servidor cliente, onde o processador administrador envia os dados de entradas para os processadores clientes computar a modelagem. Os dados de entrada são formados pelos parâmetros da malha de elementos nitos, dos transmissores e do modelo geoelétrico do reservatório. Esse possui geometria prismática que representa lentes de reservatórios de hidrocarbonetos em águas profundas. Observamos que quando a largura e o comprimento horizontais desses reservatório têm a mesma ordem de grandeza, as resposta in-line são muito semelhantes e conseqüentemente o efeito tridimensional não é detectado. Por sua vez, quando a diferença nos tamanhos da largura e do comprimento do reservatório é signi cativa o efeito 3D é facilmente detectado em medidas in-line na maior dimensão horizontal do reservatório. Para medidas na menor dimensão esse efeito não é detectável, pois, nesse caso o modelo 3D se aproxima de um modelo bidimensional. O paralelismo dos dados é de rápida implementação e processamento. O tempo de execução para a modelagem multitransmissor em ambiente paralelo é equivalente ao tempo de processamento da modelagem para um único transmissor em uma máquina seqüêncial, com o acréscimo do tempo de latência na transmissão de dados entre os nós do cluster, o que justi ca o uso desta metodologia na modelagem e interpretação de dados MCSEM. Devido a reduzida memória (2 Gbytes) em cada processador do cluster do departamento de geofísica da UFPA, apenas modelos muito simples foram executados.
Resumo:
The modern GPUs are well suited for intensive computational tasks and massive parallel computation. Sparse matrix multiplication and linear triangular solver are the most important and heavily used kernels in scientific computation, and several challenges in developing a high performance kernel with the two modules is investigated. The main interest it to solve linear systems derived from the elliptic equations with triangular elements. The resulting linear system has a symmetric positive definite matrix. The sparse matrix is stored in the compressed sparse row (CSR) format. It is proposed a CUDA algorithm to execute the matrix vector multiplication using directly the CSR format. A dependence tree algorithm is used to determine which variables the linear triangular solver can determine in parallel. To increase the number of the parallel threads, a coloring graph algorithm is implemented to reorder the mesh numbering in a pre-processing phase. The proposed method is compared with parallel and serial available libraries. The results show that the proposed method improves the computation cost of the matrix vector multiplication. The pre-processing associated with the triangular solver needs to be executed just once in the proposed method. The conjugate gradient method was implemented and showed similar convergence rate for all the compared methods. The proposed method showed significant smaller execution time.
Resumo:
Ein System in einem metastabilen Zustand muss eine bestimmte Barriere in derrnfreien Energie überwinden um einen Tropfen der stabilen Phase zu formen.rnHerkömmliche Untersuchungen nehmen hierbei kugelförmige Tropfen an. Inrnanisotropen Systemen (wie z.B. Kristallen) ist diese Annahme aber nicht ange-rnbracht. Bei tiefen Temperaturen wirkt sich die Anisotropie des Systems starkrnauf die freie Energie ihrer Oberfläche aus. Diese Wirkung wird oberhalb derrnAufrauungstemperatur T R schwächer. Das Ising-Modell ist ein einfaches Mo-rndell, welches eine solche Anisotropie aufweist. Wir führen großangelegte Sim-rnulationen durch, um die Effekte, die mit einer endlichen Simulationsbox ein-rnhergehen, sowie statistische Ungenauigkeiten möglichst klein zu halten. DasrnAusmaß der Simulationen die benötigt werden um sinnvolle Ergebnisse zu pro-rnduzieren, erfordert die Entwicklung eines skalierbaren Simulationsprogrammsrnfür das Ising-Modell, welcher auf verschiedenen parallelen Architekturen (z.B.rnGrafikkarten) verwendet werden kann. Plattformunabhängigkeit wird durch ab-rnstrakte Schnittstellen erreicht, welche plattformspezifische Implementierungs-rndetails verstecken. Wir benutzen eine Systemgeometrie die es erlaubt eine Ober-rnfläche mit einem variablen Winkel zur Kristallebene zu untersuchen. Die Ober-rnfläche ist in Kontakt mit einer harten Wand, wobei der Kontaktwinkel Θ durchrnein Oberflächenfeld eingestellt werden kann. Wir leiten eine Differenzialglei-rnchung ab, welche das Verhalten der freien Energie der Oberfläche in einemrnanisotropen System beschreibt. Kombiniert mit thermodynamischer Integrationrnkann die Gleichung benutzt werden, um die anisotrope Oberflächenspannungrnüber einen großen Winkelbereich zu integrieren. Vergleiche mit früheren Mes-rnsungen in anderen Geometrien und anderen Methoden zeigen hohe Überein-rnstimung und Genauigkeit, welche vor allem durch die im Vergleich zu früherenrnMessungen wesentlich größeren Simulationsdomänen erreicht wird. Die Temper-rnaturabhängigkeit der Oberflächensteifheit κ wird oberhalb von T R durch diernKrümmung der freien Energie der Oberfläche für kleine Winkel gemessen. DiesernMessung lässt sich mit Simulationsergebnissen in der Literatur vergleichen undrnhat bessere Übereinstimmung mit theoretischen Voraussagen über das Skalen-rnverhalten von κ. Darüber hinaus entwickeln wir ein Tieftemperatur-Modell fürrndas Verhalten um Θ = 90 Grad weit unterhalb von T R. Der Winkel bleibt bis zu einemrnkritischen Feld H C quasi null; oberhalb des kritischen Feldes steigt der Winkelrnrapide an. H C wird mit der freien Energie einer Stufe in Verbindung gebracht,rnwas es ermöglicht, das kritische Verhalten dieser Größe zu analysieren. Die harternWand muss in die Analyse einbezogen werden. Durch den Vergleich freier En-rnergien bei geschickt gewählten Systemgrößen ist es möglich, den Beitrag derrnKontaktlinie zur freien Energie in Abhängigkeit von Θ zu messen. Diese Anal-rnyse wird bei verschiedenen Temperaturen durchgeführt. Im letzten Kapitel wirdrneine 2D Fluiddynamik Simulation für Grafikkarten parallelisiert, welche u. a.rnbenutzt werden kann um die Dynamik der Atmosphäre zu simulieren. Wir im-rnplementieren einen parallelen Evolution Galerkin Operator und erreichen
Resumo:
The Networks of Evolutionary Processors (NEPs) are computing mechanisms directly inspired from the behavior of cell populations more specifically the point mutations in DNA strands. These mechanisms are been used for solving NP-complete problems by means of a parallel computation postulation. This paper describes an implementation of the basic model of NEP using Web technologies and includes the possibility of designing some of the most common variants of it by means the use of the web page design which eases the configuration of a given problem. It is a system intended to be used in a multicore processor in order to benefit from the multi thread use.
Resumo:
LLas nuevas tecnologías orientadas a la nube, el internet de las cosas o las tendencias "as a service" se basan en el almacenamiento y procesamiento de datos en servidores remotos. Para garantizar la seguridad en la comunicación de dichos datos al servidor remoto, y en el manejo de los mismos en dicho servidor, se hace uso de diferentes esquemas criptográficos. Tradicionalmente, dichos sistemas criptográficos se centran en encriptar los datos mientras no sea necesario procesarlos (es decir, durante la comunicación y almacenamiento de los mismos). Sin embargo, una vez es necesario procesar dichos datos encriptados (en el servidor remoto), es necesario desencriptarlos, momento en el cual un intruso en dicho servidor podría a acceder a datos sensibles de usuarios del mismo. Es más, este enfoque tradicional necesita que el servidor sea capaz de desencriptar dichos datos, teniendo que confiar en la integridad de dicho servidor de no comprometer los datos. Como posible solución a estos problemas, surgen los esquemas de encriptación homomórficos completos. Un esquema homomórfico completo no requiere desencriptar los datos para operar con ellos, sino que es capaz de realizar las operaciones sobre los datos encriptados, manteniendo un homomorfismo entre el mensaje cifrado y el mensaje plano. De esta manera, cualquier intruso en el sistema no podría robar más que textos cifrados, siendo imposible un robo de los datos sensibles sin un robo de las claves de cifrado. Sin embargo, los esquemas de encriptación homomórfica son, actualmente, drás-ticamente lentos comparados con otros esquemas de encriptación clásicos. Una op¬eración en el anillo del texto plano puede conllevar numerosas operaciones en el anillo del texto encriptado. Por esta razón, están surgiendo distintos planteamientos sobre como acelerar estos esquemas para un uso práctico. Una de las propuestas para acelerar los esquemas homomórficos consiste en el uso de High-Performance Computing (HPC) usando FPGAs (Field Programmable Gate Arrays). Una FPGA es un dispositivo semiconductor que contiene bloques de lógica cuya interconexión y funcionalidad puede ser reprogramada. Al compilar para FPGAs, se genera un circuito hardware específico para el algorithmo proporcionado, en lugar de hacer uso de instrucciones en una máquina universal, lo que supone una gran ventaja con respecto a CPUs. Las FPGAs tienen, por tanto, claras difrencias con respecto a CPUs: -Arquitectura en pipeline: permite la obtención de outputs sucesivos en tiempo constante -Posibilidad de tener multiples pipes para computación concurrente/paralela. Así, en este proyecto: -Se realizan diferentes implementaciones de esquemas homomórficos en sistemas basados en FPGAs. -Se analizan y estudian las ventajas y desventajas de los esquemas criptográficos en sistemas basados en FPGAs, comparando con proyectos relacionados. -Se comparan las implementaciones con trabajos relacionados New cloud-based technologies, the internet of things or "as a service" trends are based in data storage and processing in a remote server. In order to guarantee a secure communication and handling of data, cryptographic schemes are used. Tradi¬tionally, these cryptographic schemes focus on guaranteeing the security of data while storing and transferring it, not while operating with it. Therefore, once the server has to operate with that encrypted data, it first decrypts it, exposing unencrypted data to intruders in the server. Moreover, the whole traditional scheme is based on the assumption the server is reliable, giving it enough credentials to decipher data to process it. As a possible solution for this issues, fully homomorphic encryption(FHE) schemes is introduced. A fully homomorphic scheme does not require data decryption to operate, but rather operates over the cyphertext ring, keeping an homomorphism between the cyphertext ring and the plaintext ring. As a result, an outsider could only obtain encrypted data, making it impossible to retrieve the actual sensitive data without its associated cypher keys. However, using homomorphic encryption(HE) schemes impacts performance dras-tically, slowing it down. One operation in the plaintext space can lead to several operations in the cyphertext space. Because of this, different approaches address the problem of speeding up these schemes in order to become practical. One of these approaches consists in the use of High-Performance Computing (HPC) using FPGAs (Field Programmable Gate Array). An FPGA is an integrated circuit designed to be configured by a customer or a designer after manufacturing - hence "field-programmable". Compiling into FPGA means generating a circuit (hardware) specific for that algorithm, instead of having an universal machine and generating a set of machine instructions. FPGAs have, thus, clear differences compared to CPUs: - Pipeline architecture, which allows obtaining successive outputs in constant time. -Possibility of having multiple pipes for concurrent/parallel computation. Thereby, In this project: -We present different implementations of FHE schemes in FPGA-based systems. -We analyse and study advantages and drawbacks of the implemented FHE schemes, compared to related work.
Resumo:
We present a video-based system which interactively captures the geometry of a 3D object in the form of a point cloud, then recognizes and registers known objects in this point cloud in a matter of seconds (fig. 1). In order to achieve interactive speed, we exploit both efficient inference algorithms and parallel computation, often on a GPU. The system can be broken down into two distinct phases: geometry capture, and object inference. We now discuss these in further detail. © 2011 IEEE.
Resumo:
This paper looks at potential distribution network stability problems under the Smart Grid scenario. This is to consider distributed energy resources (DERs) e.g. renewable power generations and intelligent loads with power-electronic controlled converters. The background of this topic is introduced and potential problems are defined from conventional power system stability and power electronic system stability theories. Challenges are identified with possible solutions from steady-state limits, small-signal, and large-signal stability indexes and criteria. Parallel computation techniques might be included for simulation or simplification approaches are required for a largescale distribution network analysis.
Resumo:
As massive data sets become increasingly available, people are facing the problem of how to effectively process and understand these data. Traditional sequential computing models are giving way to parallel and distributed computing models, such as MapReduce, both due to the large size of the data sets and their high dimensionality. This dissertation, as in the same direction of other researches that are based on MapReduce, tries to develop effective techniques and applications using MapReduce that can help people solve large-scale problems. Three different problems are tackled in the dissertation. The first one deals with processing terabytes of raster data in a spatial data management system. Aerial imagery files are broken into tiles to enable data parallel computation. The second and third problems deal with dimension reduction techniques that can be used to handle data sets of high dimensionality. Three variants of the nonnegative matrix factorization technique are scaled up to factorize matrices of dimensions in the order of millions in MapReduce based on different matrix multiplication implementations. Two algorithms, which compute CANDECOMP/PARAFAC and Tucker tensor decompositions respectively, are parallelized in MapReduce based on carefully partitioning the data and arranging the computation to maximize data locality and parallelism.