988 results for parallel implementation
Abstract:
Segmentation of medical imagery is a challenging problem due to the complexity of the images, as well as to the absence of models of the anatomy that fully capture the possible deformations in each structure. Brain tissue is a particularly complex structure, and its segmentation is an important step for studies in temporal change detection of morphology, as well as for 3D visualization in surgical planning. In this paper, we present a method for segmentation of brain tissue from magnetic resonance images that is a combination of three existing techniques from the Computer Vision literature: EM segmentation, binary morphology, and active contour models. Each of these techniques has been customized for the problem of brain tissue segmentation in a way that the resultant method is more robust than its components. Finally, we present the results of a parallel implementation of this method on IBM's supercomputer Power Visualization System for a database of 20 brain scans each with 256x256x124 voxels and validate those against segmentations generated by neuroanatomy experts.
Abstract:
The work developed in this thesis examines in depth, and contributes innovative solutions to, the correspondence problem in underwater images. In these environments, what really complicates processing tasks is the lack of well-defined contours caused by blurred images, which is due mainly to deficient illumination or to the non-uniformity of artificial lighting systems. The objectives achieved in this thesis fall into two main directions. To improve the motion estimation algorithm, a new method was proposed that introduces texture parameters to reject false correspondences between pairs of images. A series of tests on real underwater images was carried out to select the most suitable strategies. To achieve real-time results, an innovative VLSI architecture is proposed for implementing some of the computationally expensive parts of the motion estimation algorithm.
Abstract:
This paper presents a parallel Two-Pass Hexagonal (TPA) algorithm for motion estimation, composed of the Linear Hashtable Motion Estimation Algorithm (LHMEA) and Hexagonal Search (HEXBS). In the TPA, Motion Vectors (MVs) are generated by the first-pass LHMEA and used as predictors for the second-pass HEXBS motion estimation, which searches only a small number of Macroblocks (MBs). We introduce hashtables into video processing and complete a parallel implementation. We propose and evaluate parallel implementations of the LHMEA of the TPA on clusters of workstations for real-time video compression. The paper discusses how parallel video coding on load-balanced multiprocessor systems can help, especially for motion estimation, and how load balancing improves performance. The performance of the algorithm is evaluated on standard video sequences, and the results are compared with current algorithms.
Abstract:
This paper presents a parallel Linear Hashtable Motion Estimation Algorithm (LHMEA). Most parallel video compression algorithms work at the Group of Pictures (GOP) level. Based on the LHMEA we proposed earlier [1][2], we developed a parallel motion estimation algorithm that works within a frame. We divide each reference frame into equally sized regions, which are processed in parallel to increase the encoding speed significantly. The theoretical and measured speedups of the parallel LHMEA are compared and discussed as a function of the number of PCs in the cluster. Motion Vectors (MVs) are generated by the first-pass LHMEA and used as predictors for the second-pass Hexagonal Search (HEXBS) motion estimation, which searches only a small number of Macroblocks (MBs). We evaluated a distributed parallel implementation of the LHMEA of the TPA for real-time video compression.
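The region-based decomposition described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: it uses plain exhaustive block matching with a sum of absolute differences instead of LHMEA or HEXBS, it splits the frame into bands of macroblock rows, and it uses a thread pool where the paper uses a cluster of PCs. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(block_rows, workers):
    """Split block-row indices 0..block_rows into near-equal (start, stop) bands."""
    base, extra = divmod(block_rows, workers)
    bands, start = [], 0
    for w in range(workers):
        stop = start + base + (1 if w < extra else 0)
        bands.append((start, stop))
        start = stop
    return bands


def sad(cur, ref, y, x, dy, dx, block):
    """Sum of absolute differences between a current block and a displaced reference block."""
    return sum(abs(cur[y + i][x + j] - ref[y + dy + i][x + dx + j])
               for i in range(block) for j in range(block))


def search_band(cur, ref, band, block, radius):
    """Exhaustive search (a stand-in for LHMEA/HEXBS) for every block in one band."""
    height, width = len(cur), len(cur[0])
    vectors = {}
    for by in range(band[0], band[1]):
        for bx in range(width // block):
            y, x = by * block, bx * block
            best = None
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    inside = (0 <= y + dy and y + dy + block <= height
                              and 0 <= x + dx and x + dx + block <= width)
                    if not inside:
                        continue
                    cost = sad(cur, ref, y, x, dy, dx, block)
                    if best is None or cost < best[0]:
                        best = (cost, dy, dx)
            vectors[(by, bx)] = (best[1], best[2])
    return vectors


def estimate_motion(cur, ref, block=4, radius=2, workers=2):
    """Process each band independently and merge the per-band motion vectors."""
    bands = split_rows(len(cur) // block, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda b: search_band(cur, ref, b, block, radius), bands))
    merged = {}
    for part in parts:
        merged.update(part)
    return merged
```

Because each band reads the frames but writes only its own vectors, the bands need no synchronization. In CPython a thread pool mainly illustrates the structure; a process pool or separate machines would be needed for an actual speedup.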
Abstract:
Metaheuristic techniques are known for solving optimization problems classified as NP-complete and are successful in obtaining good-quality solutions. They use non-deterministic approaches to generate solutions that are close to optimal, without any guarantee of finding the global optimum. Motivated by the difficulty of solving these problems, this work proposes the development of parallel hybrid methods that combine reinforcement learning with the GRASP and Genetic Algorithm metaheuristics. With these techniques, we aim to improve the efficiency of obtaining good solutions. Instead of using the Q-learning reinforcement learning algorithm merely to generate the initial solutions for the metaheuristics, we use it in a cooperative and competitive approach with the Genetic Algorithm and GRASP, in a parallel implementation. The implementations in this study showed satisfactory results under both strategies, that is, cooperation and competition between the methods and cooperation and competition between groups. In some instances the global optimum was found; in others, the implementations came close to it. A performance analysis of the proposed approach was carried out, and it shows good performance on the requirements that demonstrate the efficiency and speedup (the gain in speed from parallel processing) of the implementations.
Abstract:
This paper analyzes the performance of a parallel implementation of Coupled Simulated Annealing (CSA) for the unconstrained optimization of continuous-variable problems. Parallel processing is an efficient form of information processing that emphasizes the exploitation of simultaneous events in the execution of software. It arises primarily from high computational performance demands and from the difficulty of increasing the speed of a single processing core. Although multicore processors are easily found nowadays, several algorithms are not yet suitable for running on parallel architectures. CSA is characterized by a group of Simulated Annealing (SA) optimizers working together to refine the solution. Each SA optimizer runs on a single thread, and the threads are executed by different processors. In the analysis of parallel performance and scalability, the following metrics were investigated: the execution time; the speedup of the algorithm as the number of processors increases; and the efficiency with which processing elements are used as the size of the problem increases. The quality of the final solution was also verified. For the study, this paper proposes a parallel version of CSA and its equivalent serial version. Both algorithms were analysed on 14 benchmark functions. For each of these functions, CSA is evaluated using 2 to 24 optimizers. The results are presented and discussed in light of these metrics. The conclusions of the paper characterize CSA as a good parallel algorithm, both in the quality of its solutions and in its parallel scalability and efficiency.
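The speedup and efficiency metrics named in this abstract follow the usual definitions: speedup is serial time divided by parallel time, and efficiency is speedup divided by the number of processors. A minimal sketch is given below; the function names and the timings in the usage note are hypothetical and are not taken from the paper.

```python
def speedup(t_serial, t_parallel):
    """Ratio of serial execution time to parallel execution time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, processors):
    """Speedup per processor; 1.0 corresponds to ideal linear scaling."""
    return speedup(t_serial, t_parallel) / processors


def scaling_table(t_serial, timings):
    """Build (processors, speedup, efficiency) rows.

    `timings` maps a processor count to the measured parallel time.
    """
    return [(p, speedup(t_serial, t), efficiency(t_serial, t, p))
            for p, t in sorted(timings.items())]
```

For example, with an assumed serial time of 12.0 s and assumed parallel times of 6.0 s on 2 processors and 4.0 s on 4 processors, `scaling_table(12.0, {2: 6.0, 4: 4.0})` gives speedups of 2.0 and 3.0 and efficiencies of 1.0 and 0.75.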
Abstract:
This work presents a scalable and efficient parallel implementation of the standard Simplex algorithm on a multicore architecture for solving large-scale linear programming problems. We present a general scheme explaining how each step of the standard Simplex algorithm was parallelized, and we point out some important aspects of the parallel implementation. Performance analyses were conducted by comparing against the sequential time of the Simplex tableau and of the Simplex implementation in IBM CPLEX®. The experiments were executed on a shared-memory machine with 24 cores. The scalability analysis was performed with problems of different dimensions and found evidence that our parallel standard Simplex algorithm has better parallel efficiency for problems with more variables than constraints. In comparison with CPLEX®, the proposed parallel algorithm achieved an efficiency up to 16 times better.
Structural study of cyclin-dependent kinases by comparative molecular modeling methods
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
This work presents novel computational simulations for calculating the voltages induced on low-voltage lines by lightning strikes on cellular telephony radio base stations (ERBs). Representative structures were built with a fairly advanced degree of complexity, similar to that found in the field, so as to obtain results very close to reality. To this end, software was developed in which Maxwell's equations are solved numerically using the Finite-Difference Time-Domain (FDTD) method, combined with truncation of the analysis domain by the UPML technique and representation of electrical conductors by the thin-wire formulation for conductive media, yielding full-wave solutions to the problem.
Abstract:
The aim of my thesis is to parallelize the Weighted Histogram Analysis Method (WHAM), a popular algorithm used to calculate the free energy of a molecular system in Molecular Dynamics simulations. WHAM works as a post-processing step in cooperation with another algorithm called Umbrella Sampling. Umbrella Sampling adds a bias to the potential energy of the system in order to force the system to sample a specific region of configurational space. N independent simulations are performed in order to sample the whole region of interest. Subsequently, the WHAM algorithm is used to estimate the original system energy starting from the N atomic trajectories. The parallelization of WHAM has been performed with CUDA, a language that allows code to run on the GPUs of NVIDIA graphics cards, which have a parallel architecture. The parallel implementation may significantly speed up WHAM execution compared to previous serial CPU implementations, whose running time becomes critical for very high numbers of interactions. The algorithm has been written in C++ and executed on UNIX systems equipped with NVIDIA graphics cards. The results were satisfactory, with performance increasing when the model was executed on graphics cards with higher compute capability. Nonetheless, the GPUs used to test the algorithm are quite old and not designed for scientific calculations. A further performance increase is likely if the algorithm is executed on clusters of GPUs with a high level of computational efficiency. The thesis is organized as follows: I first describe the mathematical formulation of Umbrella Sampling and the WHAM algorithm, with their applications in the study of ionic channels and in Molecular Docking (Chapter 1); then I present the CUDA architectures used to implement the model (Chapter 2); finally, I present the results obtained on model systems (Chapter 3).
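For orientation, the standard WHAM self-consistency loop for a one-dimensional histogram can be sketched in serial Python as below. This is not the thesis code: the function name, argument layout, and convergence settings are assumptions. The per-bin loop is the independent work that a GPU version could plausibly map to threads.

```python
import math


def wham(counts, bias, kT=1.0, tol=1e-10, max_iter=10000):
    """Iterate the WHAM equations for a 1-D histogram.

    counts[i][b]: samples of window i falling in bin b.
    bias[i][b]:   bias potential of window i at the center of bin b.
    Returns (normalized unbiased probabilities, window free-energy shifts).
    """
    n_win, n_bins = len(counts), len(counts[0])
    totals = [sum(row) for row in counts]
    boltz = [[math.exp(-bias[j][b] / kT) for b in range(n_bins)]
             for j in range(n_win)]
    f = [0.0] * n_win  # free-energy shift of each window, initial guess
    for _ in range(max_iter):
        # Unbiased probability per bin; each bin is independent of the others.
        prob = []
        for b in range(n_bins):
            num = sum(counts[i][b] for i in range(n_win))
            den = sum(totals[j] * math.exp(f[j] / kT) * boltz[j][b]
                      for j in range(n_win))
            prob.append(num / den)
        # Update the shifts from the current probability estimate.
        new_f = [-kT * math.log(sum(prob[b] * boltz[j][b] for b in range(n_bins)))
                 for j in range(n_win)]
        new_f = [v - new_f[0] for v in new_f]  # fix the arbitrary constant
        change = max(abs(new - old) for new, old in zip(new_f, f))
        f = new_f
        if change < tol:
            break
    norm = sum(prob)
    return [p / norm for p in prob], f
```

The free-energy profile then follows, up to an additive constant, as F(x) = -kT ln P(x) for bins with non-zero probability.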
Abstract:
Coupled-cluster theory in its single-reference formulation represents one of the most successful approaches in quantum chemistry for the description of atoms and molecules. To extend the applicability of single-reference coupled-cluster theory to systems with degenerate or near-degenerate electronic configurations, multireference coupled-cluster methods have been suggested. One of the most promising formulations of multireference coupled-cluster theory is the state-specific variant suggested by Mukherjee and co-workers (Mk-MRCC). Unlike other multireference coupled-cluster approaches, Mk-MRCC is a size-extensive theory, and results obtained so far indicate that it has the potential to develop into a standard tool for high-accuracy quantum-chemical treatments. This work deals with developments to overcome the limitations in the applicability of the Mk-MRCC method. To this end, an efficient Mk-MRCC algorithm has been implemented in the CFOUR program package to perform energy calculations within the singles and doubles (Mk-MRCCSD) and singles, doubles, and triples (Mk-MRCCSDT) approximations. This implementation exploits the special structure of the Mk-MRCC working equations, which allows existing efficient single-reference coupled-cluster codes to be adapted. The algorithm has the correct computational scaling of d*N^6 for Mk-MRCCSD and d*N^8 for Mk-MRCCSDT, where N denotes the system size and d the number of reference determinants. For the determination of molecular properties such as the equilibrium geometry, the theory of analytic first derivatives of the energy for the Mk-MRCC method has been developed using a Lagrange formalism. The Mk-MRCC gradients within the CCSD and CCSDT approximations have been implemented, and their applicability has been demonstrated for various compounds such as 2,6-pyridyne, the 2,6-pyridyne cation, m-benzyne, ozone and cyclobutadiene.
The development of analytic gradients for Mk-MRCC offers the possibility of routinely locating minima and transition states on the potential energy surface. It can be considered a key step towards the routine investigation of multireference systems and the calculation of their properties. As the full inclusion of triple excitations in Mk-MRCC energy calculations is computationally demanding, a parallel implementation is presented in order to circumvent limitations due to the required execution time. The proposed scheme is based on the adaptation of a highly efficient serial Mk-MRCCSDT code by parallelizing the time-determining steps. A first application to 2,6-pyridyne is presented to demonstrate the efficiency of the current implementation.
Abstract:
This paper shows how the efficiency of MBS simulations can be improved in two different ways, by considering both an explicit and an implicit semi-recursive formulation. The explicit method is based on a double velocity transformation that involves solving a redundant but compatible system of equations. The high computational cost of this operation has been drastically reduced by exploiting the sparsity pattern of the system; to this end, the method introduces MA48, a high-performance mathematical library provided by the Harwell Subroutine Library. In the second method, depending on the case, between 70 and 85% of the computation time is devoted to evaluating the derivatives of the forces with respect to the relative position and velocity vectors. Because the evaluation of these derivatives can be decomposed into concurrent tasks, the main goal of this paper is a successful and straightforward parallel implementation. It has led to a substantial improvement, with a speedup of 3.2 on a quad-core processor, obtained by distributing the workload among the cores and keeping all of them busy.
Abstract:
Most implementations of parallel logic programming rely on complex low-level machinery which is arguably difficult to implement and modify. We explore an alternative approach aimed at taming that complexity by raising core parts of the implementation to the source language level for the particular case of and-parallelism. Therefore, we handle a significant portion of the parallel implementation mechanism at the Prolog level with the help of a comparatively small number of concurrency-related primitives which take care of lower-level tasks such as locking, thread management, stack set management, etc. The approach does not eliminate altogether modifications to the abstract machine, but it does greatly simplify them and it also facilitates experimenting with different alternatives. We show how this approach allows implementing both restricted and unrestricted (i.e., non fork-join) parallelism. Preliminary experiments show that the amount of performance sacrificed is reasonable, although granularity control is required in some cases. Also, we observe that the availability of unrestricted parallelism contributes to better observed speedups.
Abstract:
This report presents an overview of the current work performed by us in the context of the efficient parallel implementation of traditional logic programming systems. The work is based on the &-Prolog System, a system for the automatic parallelization and execution of logic programming languages within the Independent And-parallelism model, and the global analysis and parallelization tools which have been developed for this system. In order to make the report self-contained, we first describe the "classical" tools of the &-Prolog system. We then explain in detail the work performed in improving and generalizing the global analysis and parallelization tools. Also, we describe the objectives which will drive our future work in this area.
Abstract:
Una amarra electrodinámica (electrodynamic tether) opera sobre principios electromagnéticos intercambiando momento con la magnetosfera planetaria e interactuando con su ionosfera. Es un subsistema pasivo fiable para desorbitar etapas de cohetes agotadas y satélites al final de su misión, mitigando el crecimiento de la basura espacial. Una amarra sin aislamiento captura electrones del plasma ambiente a lo largo de su segmento polarizado positivamente, el cual puede alcanzar varios kilómetros de longitud, mientras que emite electrones de vuelta al plasma mediante un contactor de plasma activo de baja impedancia en su extremo catódico, tal como un cátodo hueco (hollow cathode). En ausencia de un contactor catódico activo, la corriente que circula por una amarra desnuda en órbita es nula en ambos extremos de la amarra y se dice que ésta está flotando eléctricamente. Para emisión termoiónica despreciable y captura de corriente en condiciones limitadas por movimiento orbital (orbital-motion-limited, OML), el cociente entre las longitudes de los segmentos anódico y catódico es muy pequeño debido a la disparidad de masas entre iones y electrones. Tal modo de operación resulta en una corriente media y fuerza de Lorentz bajas en la amarra, la cual es poco eficiente como dispositivo para desorbitar. El electride C12A7 : e−, que podría presentar una función de trabajo (work function) tan baja como W = 0.6 eV y un comportamiento estable a temperaturas relativamente altas, ha sido propuesto como recubrimiento para amarras desnudas. La emisión termoiónica a lo largo de un segmento así recubierto y bajo el calentamiento de la operación espacial, puede ser más eficiente que la captura iónica. En el modo más simple de fuerza de frenado, podría eliminar la necesidad de un contactor catódico activo y su correspondientes requisitos de alimentación de gas y subsistema de potencia, lo que resultaría en un sistema real de amarra “sin combustible”. 
Con este recubrimiento de bajo W, cada segmento elemental del segmento catódico de una amarra desnuda de kilómetros de longitud emitiría corriente como si fuese parte de una sonda cilíndrica, caliente y uniformemente polarizada al potencial local de la amarra. La operación es similar a la de una sonda de Langmuir 2D tanto en los segmentos catódico como anódico. Sin embargo, en presencia de emisión, los electrones emitidos resultan en carga espacial (space charge) negativa, la cual reduce el campo eléctrico que los acelera hacia fuera, o incluso puede desacelerarlos y hacerlos volver a la sonda. Se forma una doble vainas (double sheath) estable con electrones emitidos desde la sonda e iones provenientes del plasma ambiente. La densidad de corriente termoiónica, variando a lo largo del segmento catódico, podría seguir dos leyes distintas bajo diferentes condiciones: (i) la ley de corriente limitada por la carga espacial (space-charge-limited, SCL) o (ii) la ley de Richardson-Dushman (RDS). Se presenta un estudio preliminar sobre la corriente SCL frente a una sonda emisora usando la teoría de vainas (sheath) formada por la captura iónica en condiciones OML, y la corriente electrónica SCL entre los electrodos cilíndricos según Langmuir. El modelo, que incluye efectos óhmicos y el efecto de transición de emisión SCL a emisión RDS, proporciona los perfiles de corriente y potencial a lo largo de la longitud completa de la amarra. El análisis muestra que en el modo más simple de fuerza de frenado, bajo condiciones orbitales y de amarras típicas, la emisión termoiónica proporciona un contacto catódico eficiente y resulta en una sección catódica pequeña. En el análisis anterior, tanto la transición de emisión SCL a RD como la propia ley de emisión SCL consiste en un modelo muy simplificado. 
Por ello, a continuación se ha estudiado con detalle la solución de vaina estacionaria de una sonda con emisión termoiónica polarizada negativamente respecto a un plasma isotrópico, no colisional y sin campo magnético. La existencia de posibles partículas atrapadas ha sido ignorada y el estudio incluye tanto un estudio semi-analítico mediante técnica asintóticas como soluciones numéricas completas del problema. Bajo las tres condiciones (i) alto potencial, (ii) R = Rmax para la validez de la captura iónica OML, y (iii) potencial monotónico, se desarrolla un análisis asintótico auto-consistente para la estructura de plasma compleja que contiene las tres especies de cargas (electrones e iones del plasma, electrones emitidos), y cuatro regiones espaciales distintas, utilizando teorías de movimiento orbital y modelos cinéticos de las especies. Aunque los electrones emitidos presentan carga espacial despreciable muy lejos de la sonda, su efecto no se puede despreciar en el análisis global de la estructura de la vaina y de dos capas finas entre la vaina y la región cuasi-neutra. El análisis proporciona las condiciones paramétricas para que la corriente sea SCL. También muestra que la emisión termoiónica aumenta el radio máximo de la sonda para operar dentro del régimen OML y que la emisión de electrones es mucho más eficiente que la captura iónica para el segmento catódico de la amarra. En el código numérico, los movimientos orbitales de las tres especies son modelados para potenciales tanto monotónico como no-monotónico, y sonda de radio R arbitrario (dentro o más allá del régimen de OML para la captura iónica). Aprovechando la existencia de dos invariante, el sistema de ecuaciones Poisson-Vlasov se escribe como una ecuación integro-diferencial, la cual se discretiza mediante un método de diferencias finitas. El sistema de ecuaciones algebraicas no lineal resultante se ha resuelto de con un método Newton-Raphson paralelizado. 
Los resultados, comparados satisfactoriamente con el análisis analítico, proporcionan la emisión de corriente y la estructura del plasma y del potencial electrostático.
ABSTRACT
An electrodynamic tether operates on electromagnetic principles and exchanges momentum through the planetary magnetosphere by continuously interacting with the ionosphere. It is a reliable passive subsystem for deorbiting spent rocket stages and satellites at the end of their mission, mitigating the growth of orbital debris. A tether left bare of insulation collects electrons along its own uninsulated, positively biased segment, which can be kilometers long, while a low-impedance active device at the cathodic end, such as a hollow cathode, emits the full electron current. In the absence of an active cathodic device, the current flowing along an orbiting bare tether vanishes at both ends, and the tether is said to be electrically floating. For negligible thermionic emission and orbital-motion-limited (OML) collection throughout the entire tether (electron and ion collection at the anodic and cathodic segments, respectively), the anodic-to-cathodic length ratio is very small because ions are much heavier than electrons, which results in low average current and Lorentz drag. The electride C12A7 : e−, which might present a work function as low as W = 0.6 eV and moderately high temperature stability, has been proposed as a coating for floating bare tethers. Thermionic emission along a cathodic segment coated in this way, under heating in space operation, can be more efficient than ion collection and, in the simplest drag mode, may eliminate the need for an active cathodic device and its corresponding gas-feed requirements and power subsystem, which would result in a truly “propellant-less” tether system.
With this low-W coating, each elemental segment on the cathodic segment of a kilometers-long floating bare tether would emit current as if it were part of a hot cylindrical probe uniformly polarized at the local tether bias, under 2D probe conditions that are also applied to the anodic-segment analysis. In the presence of emission, emitted electrons result in negative space charge, which decreases the electric field that accelerates them outwards, or even reverses it, decelerating electrons near the emitting probe. A double sheath would be established, with electrons being emitted from the probe and ions coming from the ambient plasma. The thermionic current density, varying along the cathodic segment, might follow two distinct laws under different conditions: i) space-charge-limited (SCL) emission or ii) full Richardson-Dushman (RDS) emission. A preliminary study of the SCL current in front of an emissive probe is presented using the orbital-motion-limited (OML) ion-collection sheath and Langmuir’s SCL electron current between cylindrical electrodes. A detailed calculation of current and bias profiles along the entire tether length is carried out, with ohmic effects considered and the transition from SCL to full RDS emission included. The analysis shows that in the simplest drag mode, under typical orbital and tether conditions, thermionic emission provides efficient cathodic contact and leads to a short cathodic section. In the previous analysis, both the transition between SCL and RDS emission and the current law for the SCL condition used a very simple model. Next, considering an isotropic, unmagnetized, collisionless plasma and a stationary sheath, the probe-plasma contact is studied in detail for a negatively biased probe with thermionic emission. Possible trapped particles are ignored, and the study includes both semi-analytical solutions using asymptotic analysis and complete numerical solutions.
Under conditions of i) high bias, ii) R = Rmax for ion OML collection validity, and iii) monotonic potential, a self-consistent asymptotic analysis is carried out for the complex plasma structure involving all three charge species (plasma electrons and ions, and emitted electrons) and four distinct spatial regions, using orbital motion theories and kinetic modeling of the species. Although emitted electrons present negligible space charge far away from the probe, their effect cannot be neglected in the global analysis of the sheath structure and of the two thin layers between the sheath and the quasineutral region. The parametric conditions for the current to be space-charge-limited are obtained. It is found that thermionic emission increases the range of probe radii for OML validity and is much more effective than ion collection for the cathodic contact of tethers. In the numerical code, the orbital motions of all three species are modeled for both monotonic and non-monotonic potentials, and for any probe radius R (within or beyond the OML regime for ion collection). Taking advantage of two constants of motion (energy and angular momentum), the Poisson-Vlasov system is written as an integro-differential equation, which is discretized using a finite difference method. The resulting non-linear algebraic equations are solved using a parallel implementation of the Newton-Raphson method. The numerical results, which show good agreement with the analytical results, provide the thermionic current, the sheath structure, and the electrostatic potential.
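The abstract does not say how the Newton-Raphson solver was parallelized. One common option, shown here purely as an assumption, is to build a finite-difference Jacobian whose columns are evaluated concurrently, since each column needs only one independent residual evaluation. The sketch below uses a small dense system and hypothetical function names; it is not the thesis code.

```python
from concurrent.futures import ThreadPoolExecutor


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (inputs are copied)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def jacobian_column(residual, x, fx, j, h):
    """Forward-difference approximation of column j of the Jacobian."""
    shifted = list(x)
    shifted[j] += h
    fs = residual(shifted)
    return [(fs[i] - fx[i]) / h for i in range(len(fx))]


def newton_raphson(residual, x0, tol=1e-10, max_iter=50, h=1e-7, workers=4):
    """Newton-Raphson iteration with Jacobian columns computed concurrently (assumed scheme)."""
    x = list(x0)
    n = len(x)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_iter):
            fx = residual(x)
            if max(abs(v) for v in fx) < tol:
                return x
            cols = list(pool.map(
                lambda j: jacobian_column(residual, x, fx, j, h), range(n)))
            jac = [[cols[j][i] for j in range(n)] for i in range(n)]
            step = solve_linear(jac, [-v for v in fx])
            x = [xi + si for xi, si in zip(x, step)]
    raise RuntimeError("Newton-Raphson did not converge")
```

As a usage example, `newton_raphson(lambda v: [v[0]**2 + v[1]**2 - 4.0, v[0] - v[1]], [1.0, 1.0])` converges to a point where both coordinates are close to the square root of 2.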