14 resultados para OPENMP PROGRAMMING
em Chinese Academy of Sciences Institutional Repositories Grid Portal
Resumo:
Numerical modeling of groundwater is very important for understanding groundwater flow and solving hydrogeological problem. Today, groundwater studies require massive model cells and high calculation accuracy, which are beyond single-CPU computer’s capabilities. With the development of high performance parallel computing technologies, application of parallel computing method on numerical modeling of groundwater flow becomes necessary and important. Using parallel computing can improve the ability to resolve various hydro-geological and environmental problems. In this study, parallel computing method on two main types of modern parallel computer architecture, shared memory parallel systems and distributed shared memory parallel systems, are discussed. OpenMP and MPI (PETSc) are both used to parallelize the most widely used groundwater simulator, MODFLOW. Two parallel solvers, P-PCG and P-MODFLOW, were developed for MODFLOW. The parallelized MODFLOW was used to simulate regional groundwater flow in Beishan, Gansu Province, which is a potential high-level radioactive waste geological disposal area in China. 1. The OpenMP programming paradigm was used to parallelize the PCG (preconditioned conjugate-gradient method) solver, which is one of the main solver for MODFLOW. The parallel PCG solver, P-PCG, is verified using an 8-processor computer. Both the impact of compilers and different model domain sizes were considered in the numerical experiments. The largest test model has 1000 columns, 1000 rows and 1000 layers. Based on the timing results, execution times using the P-PCG solver are typically about 1.40 to 5.31 times faster than those using the serial one. In addition, the simulation results are the exact same as the original PCG solver, because the majority of serial codes were not changed. It is worth noting that this parallelizing approach reduces cost in terms of software maintenance because only a single source PCG solver code needs to be maintained in the MODFLOW source tree. 2. P-MODFLOW, a domain decomposition–based model implemented in a parallel computing environment is developed, which allows efficient simulation of a regional-scale groundwater flow. The basic approach partitions a large model domain into any number of sub-domains. Parallel processors are used to solve the model equations within each sub-domain. The use of domain decomposition method to achieve the MODFLOW program distributed shared memory parallel computing system will process the application of MODFLOW be extended to the fleet of the most popular systems, so that a large-scale simulation could take full advantage of hundreds or even thousands parallel processors. P-MODFLOW has a good parallel performance, with the maximum speedup of 18.32 (14 processors). Super linear speedups have been achieved in the parallel tests, indicating the efficiency and scalability of the code. Parallel program design, load balancing and full use of the PETSc were considered to achieve a highly efficient parallel program. 3. The characterization of regional ground water flow system is very important for high-level radioactive waste geological disposal. The Beishan area, located in northwestern Gansu Province, China, is selected as a potential site for disposal repository. The area includes about 80000 km2 and has complicated hydrogeological conditions, which greatly increase the computational effort of regional ground water flow models. In order to reduce computing time, parallel computing scheme was applied to regional ground water flow modeling. Models with over 10 million cells were used to simulate how the faults and different recharge conditions impact regional ground water flow pattern. The results of this study provide regional ground water flow information for the site characterization of the potential high-level radioactive waste disposal.
Resumo:
Intel和AMD双核乃至4核处理器的推出,使得并行计算已经普及到PC机。为了充分利用多核,需要对原有程序进行多线程改造,使其充分利用多核处理带来的性能提升。该文利用共享存储编程的工业标准OpenMP对有限元方法涉及的单元计算子程序进行了并行化实现。在机群的一个双CPU的SMP节点上的测试表明,共享并行化使得该单元子程序的性能提高了一倍。
Resumo:
This paper describes a two-step packing algorithm for LUT clusters of which the LUT input multipliers are depopulated. In the first step, a greedy algorithm is used to search for BLE locations and cluster inputs. If the greedy algorithm fails, the second step with network flow programming algorithm is employed. Numerical results illustrate that our two-step packing algorithm obtains better packing density than one-step greedy packing algorithm.
Resumo:
In this paper we present a methodology and its implementation for the design and verification of programming circuit used in a family of application-specific FPGAs that share a common architecture. Each member of the family is different either in the types of functional blocks contained or in the number of blocks of each type. The parametrized design methodology is presented here to achieve this goal. Even though our focus is on the programming circuitry that provides the interface between the FPGA core circuit and the external programming hardware, the parametrized design method can be generalized to the design of entire chip for all members in the FPGA family. The method presented here covers the generation of the design RTL files and the support files for synthesis, place-and-route layout and simulations. The proposed method is proven to work smoothly within the complete chip design methodology. We will describe the implementation of this method to the design of the programming circuit in details including the design flow from the behavioral-level design to the final layout as well as the verification. Different package options and different programming modes are included in the description of the design. The circuit design implementation is based on SMIC 0.13-micron CMOS technology.
Resumo:
Intel和AMD双核乃至4核处理器的推出,使得并行计算已经普及到PC机。为了充分利用多核,需要对原有程序进行多线程改造,使其充分利用多核处理带来的性能提升。该文利用共享存储编程的工业标准OpenMP对有限元方法涉及的单元计算子程序进行了并行化实现。在机群的一个双CPU的SMP节点上的测试表明,共享并行化使得该单元子程序的性能提高了一倍。
Resumo:
Gzip无损压缩算法.尽管gzip算法能够取得很好的压缩比,但它在分析和压缩编码的过程需要进行大量的计算.为了缩短压缩时间,提出了一种基于共享存储的并行压缩策略,采用OpenMP标准和"生产者/消费者"模型实现了gzip的并行压缩版本.在Beowulf集群中的一个SMP节点(双CPU)和曙光天阔服务器(4路双核)上的测试表明,并行化的gzip程序取得了极大的性能提升,尤其是大文件的压缩.
Resumo:
OpenMP是一种支持Fortran,C/C++的共享存储并行编程标准。它基于fork-join的并行执行模型,将程序划分为并行区和串行区。近几年来,OpenMP在SMP(Symmetric Multi-Processing)和多核体系结构的并行编程中得到了广泛的应用。随着多核处理器的发展,实际的应用程序如何充分利用多个处理器核来提高运算效率也成为研究的热点。 在科学计算中,循环结构是最核心的并行对象之一。考虑到负载平衡、调度开销、同步开销等多方面因素,OpenMP标准制定了Static调度、Dynamic调度、Guided调度和Runtime调度等不同策略。针对Guided调度策略不适合递减型循环结构的缺点,本文提出了一种改进的new_guided调度策略,并在OMPi编译器上加以实现。New_guided调度策略的主要思想是对前半部分的循环采用Static调度,后半部分的循环采用Guided调度。此外,本文针对不同的循环结构,在多核处理器上对不同的调度策略进行了评测。测试结果表明,在一般情况下,OpenMP默认的Static策略的调度性能最差;对于规则的循环结构和递增的循环结构,Dynamic调度策略、Guided调度策略和new_guided策略的性能差别不大;对于递减型的循环结构,Dynamic调度策略和new_guided策略的性能相当,要优于Guided调度策略;对于求解Mandelbrot集合这类计算量集中在中间的随机循环结构,Dynamic调度策略优于其它策略,new_guided策略的性能介于Dynamic调度和Guided调度之间。 随着多核处理器的问世和发展,多线程程序设计也已经成为一个不可回避的问题。稀疏矩阵向量乘(SpMV, Sparse Matrix-Vector Multiplication)是一个十分重要且经常被大量调用的科学计算内核。SpMV的存储访问一般都极不规则,导致现有的SpMV算法效率都比较低。目前,多核处理器芯片上的内核数量正在逐步增加。这使得在多核处理器上对SpMV进行并行化加速变得非常重要。本文介绍了稀疏矩阵的两种常用的存储格式CSR和BCSR,并采用OpenMP实现了SpMV的多核并行化。此外,本文还讨论了寄存器分块算法、压缩列索引等优化技术,以及不同调度策略对多线程并行后的SpMV的影响。在曙光天阔服务器S4800A1上的测试表明,大部分矩阵都取得了可扩展、甚至是超线性的加速比,但是对于部分规模较大的矩阵,加速效果并不明显。在我们的测试中,与基于CSR实现的多线程SpMV相比,采用寄存器分块算法优化后的SpMV运算速度平均提高了28.09%。在基于CSR实现的多线程SpMV中,采用列索引优化技术后的程序比优化前的速度平均提高了13.05%。此外,本文实现了一种基于非零元个数的调度策略。在该策略中,每个线程处理几乎相同数量的非零元。我们将它和OpenMP标准提供的三种调度策略进行了测试和分析。测试结果表明:与OpenMP提供的调度策略相比,基于非零元个数的调度策略能取得更好的负载平衡;Dynamic调度和Guided调度在多线程SpMV中的性能基本相当,均优于Static调度策略。
Resumo:
在科学计算中,循环结构是最重要的并行对象之一.考虑到负载平衡、调度开销等多方面因素,OpenMP标准提供静态调度、动态调度、指导调度和运行时调度等不同策略.针对指导调度策略不适合递减型循环结构的问题,提出一种改进的new_guided指导调度策略,并在OMPi编译器上加以实现.New_guided调度策略的主要思想是对前半部分的循环采用静态调度,后半部分的循环采用指导调度.针对不同循环结构,在多核处理器上对不同调度策略进行评测.结果表明,在一般情况下,OpenMP默认的静态策略的调度性能最差;对于规则的循环结构和递增的循环结构,动态调度、指导调度和new_guided策略的性能差别不大;对于递减型的循环结构,动态调度和new_guided策略的性能相当,要优于指导调度策略;对于某些极不规则的随机循环结构,动态调度明显优于其他策略,new_guided策略的性能介于动态调度和指导调度之间.
Resumo:
The formal specification language LFC was designed to support formal specification acquisition. However, it is yet suited to be used as a meta-language for specifying programming language processing. This paper introduces LFC as a meta-language, and compares it with ASF+SDF, an algebraic specification formalism that can also be used to programming languages.
Resumo:
ICSE
Resumo:
本文提出一个不用 Kuhn- Tucker条件而直接搜索严格凸二次规划最优目标点的鲁棒方法 .在搜索过程中 ,目标点沿约束多面体边界上的一条折线移动 .这种移动目标点的思想可以被认为是线性规划单纯形法的自然推广 ,在单纯形法中 ,目标点从一个顶点移到另一个顶点。