957 results for High-performance computing


Abstract:

Particle Swarm Optimization (PSO) is an optimization technique that has been applied to a wide range of problems across different fields of knowledge. Most implementations, however, are sequential. The optimization process requires a large number of objective-function evaluations, especially in complex problems involving many particles and dimensions. As a consequence, the algorithm can become inefficient in terms of performance, response time, and even the quality of the final result. To overcome these difficulties, high-performance computing can be used to parallelize the algorithm according to the characteristics of the target architecture, aiming to increase performance, minimize response time, and improve the quality of the final result. In this dissertation, the PSO algorithm is parallelized using three strategies that address different granularities of the problem, as well as by dividing the optimization work among several cooperative sub-swarms. One of the parallel algorithms developed, called PPSO, is implemented directly in hardware on an FPGA. All the proposed strategies, PPSO (Parallel PSO), PDPSO (Parallel Dimension PSO), and CPPSO (Cooperative Parallel PSO), are implemented targeting parallel architectures based on multiprocessors, multicomputers, and GPUs. The tests performed show that, for problems with larger numbers of particles and dimensions, the GPU obtained the best results with the finer-grained strategies (PDPSO and CPPSO), whereas the multicomputer implementation obtained the best results with the coarser-grained strategy (PPSO).
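
The dissertation's FPGA, multiprocessor, multicomputer, and GPU implementations are not reproduced here; the fragment below is only a minimal sketch of the coarse-grained idea behind PPSO, in which the expensive objective-function evaluations are farmed out to parallel workers. The sphere objective and all hyperparameters are illustrative assumptions, not taken from the dissertation.

```python
# Coarse-grained parallel PSO sketch: the objective is evaluated for all
# particles in parallel, which is the costly step the abstract identifies.
import numpy as np
from multiprocessing import Pool

def sphere(x):
    return float(np.dot(x, x))  # toy objective; real problems are far costlier

def pso(dim=32, n_particles=64, iters=100, workers=4, seed=0):
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-5, 5, (n_particles, dim))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.full(n_particles, np.inf)
    gbest, gbest_val = None, np.inf
    with Pool(workers) as pool:
        for _ in range(iters):
            vals = np.array(pool.map(sphere, pos))   # parallel evaluations
            improved = vals < pbest_val
            pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
            if pbest_val.min() < gbest_val:
                gbest_val = pbest_val.min()
                gbest = pbest[pbest_val.argmin()].copy()
            r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
            vel = 0.72 * vel + 1.49 * r1 * (pbest - pos) + 1.49 * r2 * (gbest - pos)
            pos += vel
    return gbest, gbest_val

if __name__ == "__main__":
    best, val = pso()
    print(val)
```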

Abstract:

BACKGROUND: With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General-purpose computing on graphics processing units (GPGPU) extracts computing power from the hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy-efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software package based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence. FINDINGS: Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computationally intensive alignment component of BWA to the GPU to take advantage of its massive parallelism. As a result, BarraCUDA offers an order-of-magnitude boost in alignment throughput compared to a CPU core while delivering the same level of alignment fidelity. The software is also capable of using multiple CUDA devices in parallel to further accelerate alignment throughput. CONCLUSIONS: BarraCUDA is designed to take advantage of the parallelism of GPUs to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing so, we can, at least in part, streamline the current bioinformatics pipeline so that the wider scientific community can benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net.
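
BarraCUDA itself is CUDA code built on BWA's BWT-based alignment, none of which is shown here. As a hedged illustration of the data-parallel pattern the abstract exploits (every read can be aligned independently, so batches of reads map cleanly onto many parallel workers), the sketch below distributes a deliberately naive aligner over worker processes. The reference string and the brute-force Hamming-distance scoring are toy assumptions.

```python
# Toy data-parallel "aligner": one independent task per read, farmed out to
# parallel workers (stream processors on a GPU, processes here).
from multiprocessing import Pool

REFERENCE = "ACGTACGTTAGCACGTACGATCGA" * 100  # stand-in reference sequence

def align_read(read):
    """Return (best_position, mismatches) by brute-force scanning."""
    best_pos, best_mm = -1, len(read) + 1
    for i in range(len(REFERENCE) - len(read) + 1):
        mm = sum(a != b for a, b in zip(read, REFERENCE[i:i + len(read)]))
        if mm < best_mm:
            best_pos, best_mm = i, mm
    return best_pos, best_mm

if __name__ == "__main__":
    reads = ["ACGTACGATCGA", "TAGCACGTACGA", "ACGTTAGCACGT"]
    with Pool(3) as pool:                      # one worker per read here
        hits = pool.map(align_read, reads)     # embarrassingly parallel
    print(hits)
```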

Abstract:

We investigated the structural, elastic, and electronic properties of the cubic perovskite-type BaHfO3 using a first-principles method based on the plane-wave basis set. Analysis of the band structure shows that perovskite-type BaHfO3 is a wide-gap indirect semiconductor. The band gap is predicted to be 3.94 eV within the screened-exchange local density approximation (sX-LDA). The calculated equilibrium lattice constant of this compound is in good agreement with the available experimental and theoretical data reported in the literature. The independent elastic constants (C11, C12, and C44), bulk modulus B and its pressure derivative B', compressibility β, shear modulus G, Young's modulus Y, Poisson's ratio ν, and Lamé constants (μ, λ) are obtained and analyzed in comparison with the available theoretical and experimental data for both single-crystalline and polycrystalline BaHfO3. The bonding-charge density calculation makes it clear that covalent bonds exist between the Hf and O atoms and ionic bonds exist between the Ba atoms and the HfO3 ionic groups in BaHfO3.
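
The polycrystalline moduli listed in the abstract follow from the three independent cubic elastic constants through the standard Voigt-Reuss-Hill relations; the sketch below shows that derivation. The numerical C11, C12, C44 inputs are placeholders, not the values reported in the paper.

```python
# Voigt-Reuss-Hill relations for deriving polycrystalline moduli from the
# three independent cubic elastic constants.
def cubic_moduli(c11, c12, c44):
    b = (c11 + 2 * c12) / 3                        # bulk modulus (exact, cubic)
    g_v = (c11 - c12 + 3 * c44) / 5                # Voigt shear bound
    g_r = 5 * (c11 - c12) * c44 / (4 * c44 + 3 * (c11 - c12))  # Reuss bound
    g = (g_v + g_r) / 2                            # Hill average shear modulus
    y = 9 * b * g / (3 * b + g)                    # Young's modulus
    nu = (3 * b - 2 * g) / (2 * (3 * b + g))       # Poisson's ratio
    lam, mu = b - 2 * g / 3, g                     # Lamé constants
    return dict(B=b, G=g, Y=y, nu=nu, beta=1 / b, mu=mu, lam=lam)

print(cubic_moduli(c11=300.0, c12=80.0, c44=90.0))  # GPa, illustrative values
```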

Abstract:

Copper dimethylamino-2-propoxide [Cu(dmap)2] is used as a precursor for low-temperature atomic layer deposition (ALD) of copper thin films. Chemisorption of the precursor is the necessary first step of ALD, but it is not known in this case whether there is selectivity for adsorption sites, defects, or islands on the substrate. We therefore study the adsorption of the Cu(dmap)2 molecule on different sites of flat and rough Cu surfaces using the PBE, PBE-D3, optB88-vdW, and vdW-DF2 methods. We find that the relative order of adsorption energies for Cu(dmap)2 on Cu surfaces is Eads(PBE-D3) > Eads(optB88-vdW) > Eads(vdW-DF2) > Eads(PBE). The PBE and vdW-DF2 methods predict one chemisorption structure, and optB88-vdW predicts three, among the four possible adsorption configurations, whereas PBE-D3 predicts a chemisorbed structure at all the adsorption sites on Cu(111). All the methods, with and without van der Waals corrections, yield a chemisorbed molecule on the Cu(332) step and the Cu(643) kink because of the reduced steric hindrance on these vicinal surfaces. Strong distortion of the molecule and significant elongation of the Cu–N bonds are predicted in the chemisorbed structures, indicating that the ligand–Cu bonds break during ALD of Cu from Cu(dmap)2. The molecule loses its initial square-planar structure and gains linear O–Cu–O bonding as these atoms attach to the surface. As a result, the ligands become unstable and the precursor becomes more reactive toward the co-reagent. Charge redistribution occurs mainly between the adsorbate O–Cu–O bond and the surface. Bader charge analysis shows that electrons are donated from the surface to the molecule in the chemisorbed structures, so that the Cu center of the molecule is partially reduced.
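
The ordering of Eads values quoted above rests on the usual adsorption-energy definition, sketched below; a more negative value indicates stronger binding. The three total energies are hypothetical stand-ins for DFT outputs, not numbers from the study.

```python
# Standard adsorption-energy bookkeeping used when comparing functionals.
def adsorption_energy(e_slab_plus_mol, e_slab, e_mol):
    """E_ads = E(slab + molecule) - E(slab) - E(molecule)."""
    return e_slab_plus_mol - e_slab - e_mol

# Placeholder total energies in eV (illustrative, not from the paper):
e_ads = adsorption_energy(-512.34, -498.76, -11.95)
print(f"E_ads = {e_ads:.2f} eV")  # negative => exothermic (chemisorbed) state
```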

Abstract:

Multilevel algorithms are a successful class of optimization techniques that address the mesh partitioning problem for mapping meshes onto parallel computers. They usually combine a graph contraction algorithm with a local optimization method that refines the partition at each graph level. To date, these algorithms have been used almost exclusively to minimize the cut-edge weight in the graph, with the aim of minimizing the parallel communication overhead. However, it has been shown that for certain classes of problems the convergence of the underlying solution algorithm is strongly influenced by the shape, or aspect ratio, of the subdomains. In this paper, therefore, the authors modify the multilevel algorithms to optimize a cost function based on the aspect ratio. Several variants of the algorithms are tested and shown to provide excellent results.
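
The paper's exact cost function is not given in the abstract; as a hedged sketch of the general idea, the fragment below scores each subdomain by a common 2D aspect-ratio measure (perimeter squared over 4π times area, equal to 1 for a circle) and sums the scores over the partition, so that compact subdomains are cheaper than elongated ones.

```python
# Shape-based partition cost: penalize subdomains that deviate from the
# ideal compact shape, instead of counting cut edges.
import math

def aspect_ratio(area, perimeter):
    return perimeter ** 2 / (4 * math.pi * area)  # 1.0 for a perfect circle

def partition_cost(subdomains):
    """subdomains: list of (area, perimeter) pairs, one per processor."""
    return sum(aspect_ratio(a, p) for a, p in subdomains)

# Two square-ish subdomains versus a square plus a long thin strip:
print(partition_cost([(1.0, 4.0), (1.0, 4.0)]))   # lower cost (compact)
print(partition_cost([(1.0, 4.0), (1.0, 20.2)]))  # higher cost (elongated)
```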

Abstract:

Realizing scalable performance on high-performance computing systems is not straightforward for single-phenomenon codes (such as computational fluid dynamics [CFD]). This task is magnified considerably when the target software involves the interactions of a range of phenomena that have distinctive solution procedures involving different discretization methods. The problems of addressing the key issues of retaining data integrity and the ordering of the calculation procedures are significant. A strategy for parallelizing this multiphysics family of codes is described for software exploiting finite-volume discretization methods on unstructured meshes using iterative solution procedures. A mesh-partitioning-based SPMD approach is used. However, since different variables use distinct discretization schemes, distinct partitions are required; techniques for addressing this issue are described using the mesh-partitioning tool JOSTLE. In this contribution, the strategy is tested for a variety of test cases under a wide range of conditions (e.g., problem size, number of processors, asynchronous/synchronous communications) using a variety of strategies for mapping the mesh partition onto the processor topology.
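
As a minimal sketch of the SPMD pattern described above (assuming mpi4py; the 1D chain of partitions and the Jacobi-style update are illustrative stand-ins for a real JOSTLE partition and solver), each process owns one piece of the mesh and exchanges halo values with its neighbours before every iteration.

```python
# SPMD halo exchange over a 1D chain of mesh partitions.
# Run with e.g.: mpiexec -n 4 python halo.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

local = [float(rank)] * 4              # this rank's share of the mesh values
left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for _ in range(10):                    # iterative solution procedure
    # exchange halo (boundary) values with neighbouring partitions
    from_left = comm.sendrecv(local[-1], dest=right, source=left)
    from_right = comm.sendrecv(local[0], dest=left, source=right)
    lo = from_left if from_left is not None else local[0]
    hi = from_right if from_right is not None else local[-1]
    # toy Jacobi-style update over the padded local array
    padded = [lo] + local + [hi]
    local = [(padded[i - 1] + padded[i + 1]) / 2
             for i in range(1, len(padded) - 1)]

print(rank, local)
```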

Abstract:

The parallelization of existing/industrial electromagnetic software using the bulk synchronous parallel (BSP) computation model is presented. The software employs the finite element method with a preconditioned conjugate gradient-type solution for the resulting linear systems of equations. A geometric mesh-partitioning approach is applied within the BSP framework for the assembly and solution phases of the finite element computation. This is combined with a nongeometric, data-driven parallel quadrature procedure for the evaluation of right-hand-side terms in applications involving coil fields. A similar parallel decomposition is applied to the parallel calculation of electron beam trajectories required for the design of tube devices. The BSP parallelization approach adopted is fully portable, conceptually simple, and cost-effective, and it can be applied to a wide range of finite element applications not necessarily related to electromagnetics.
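
A BSP program proceeds in supersteps: local computation, a bulk exchange of all outstanding messages, then a barrier synchronisation. The sketch below (assuming mpi4py; the per-rank "work" is a dummy) shows one way that structure looks in code; it illustrates the model, not the paper's software.

```python
# One BSP superstep = compute, exchange everything, synchronise.
# Run with e.g.: mpiexec -n 4 python bsp.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

for superstep in range(3):
    # 1. local computation on this processor's share of the problem
    partial = [rank * 10 + superstep + p for p in range(size)]
    # 2. bulk, one-shot communication: element p of the list goes to rank p
    received = comm.alltoall(partial)
    # 3. barrier ends the superstep
    comm.Barrier()
    if rank == 0:
        print("superstep", superstep, "received", received)
```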

Abstract:

The problem of deriving parallel mesh partitioning algorithms for mapping unstructured meshes onto parallel computers is discussed in this chapter. In itself this raises a paradox: we seek a high-quality partition of the mesh, but to compute it in parallel we already require a partition of the mesh. We overcome this difficulty by deriving an optimisation strategy that can find a high-quality partition even when the quality of the initial partition is very poor, and then using a crude distribution scheme for the initial partition. The basis of this strategy is a multilevel approach combined with local refinement algorithms. Three such refinement algorithms are outlined, and some example results are presented which show that they can produce very high global-quality partitions very rapidly. The results are also compared with those of a similar serial multilevel partitioner and shown to be almost identical in quality. Finally, we consider the impact of the initial partition on the results and demonstrate that the final partition quality is, modulo a certain amount of noise, independent of the initial partition.
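
At the heart of the multilevel approach is graph coarsening. A common choice, sketched below, is heavy-edge matching, which repeatedly collapses the pair of vertices joined by the heaviest unmatched edge, so that a crude partition of the much smaller graph can be projected back and refined level by level. The code is a generic illustration, not the chapter's implementation.

```python
# Heavy-edge matching: one coarsening pass roughly halves the graph.
def heavy_edge_matching(edges, n_vertices):
    """edges: list of (u, v, weight). Returns vertex -> coarse-vertex map."""
    matched = [False] * n_vertices
    coarse_of, next_id = {}, 0
    for u, v, _ in sorted(edges, key=lambda e: -e[2]):  # heaviest edges first
        if not matched[u] and not matched[v]:
            matched[u] = matched[v] = True
            coarse_of[u] = coarse_of[v] = next_id       # collapse the pair
            next_id += 1
    for u in range(n_vertices):                         # leftover vertices
        if not matched[u]:
            coarse_of[u] = next_id
            next_id += 1
    return coarse_of

edges = [(0, 1, 5), (1, 2, 1), (2, 3, 4), (0, 3, 2)]
print(heavy_edge_matching(edges, 4))  # {0: 0, 1: 0, 2: 1, 3: 1}
```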