91 results for scalable parallel programming
in Chinese Academy of Sciences Institutional Repositories Grid Portal
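Abstract:
The gzip lossless compression algorithm. Although gzip achieves a good compression ratio, its analysis and encoding stages require a large amount of computation. To shorten compression time, a shared-memory parallel compression strategy is proposed, and a parallel version of gzip is implemented using the OpenMP standard and the producer/consumer model. Tests on one SMP node (dual CPU) of a Beowulf cluster and on a Dawning Tiankuo server (4-way dual-core) show that the parallelized gzip program achieves a substantial performance improvement, especially when compressing large files.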
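The producer/consumer strategy described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it swaps in Python threads and a queue for OpenMP, and the names `parallel_gzip` and `CHUNK_SIZE` and the 1 MiB block size are assumptions made for the example. It relies on the fact that a concatenation of gzip members is itself a valid gzip stream.

```python
import gzip
import queue
import threading

CHUNK_SIZE = 1 << 20  # hypothetical block size (1 MiB), not taken from the paper


def parallel_gzip(data, workers=4, chunk_size=CHUNK_SIZE):
    """Compress `data` as a sequence of independently compressed gzip members."""
    tasks = queue.Queue(maxsize=2 * workers)  # bounded buffer between producer and consumers
    results = {}                              # block index -> compressed bytes
    lock = threading.Lock()

    def consumer():
        while True:
            item = tasks.get()
            if item is None:                  # sentinel: no more blocks
                break
            index, block = item
            compressed = gzip.compress(block)
            with lock:
                results[index] = compressed

    threads = [threading.Thread(target=consumer) for _ in range(workers)]
    for t in threads:
        t.start()

    # Producer: split the input into blocks and enqueue them with their index.
    count = 0
    for offset in range(0, len(data), chunk_size):
        tasks.put((count, data[offset:offset + chunk_size]))
        count += 1
    for _ in threads:
        tasks.put(None)                       # one sentinel per consumer
    for t in threads:
        t.join()

    # Reassemble the members in their original order.
    return b"".join(results[i] for i in range(count))
```

Compressing blocks independently usually costs a little compression ratio, because each block starts with an empty dictionary; the sketch accepts that trade-off in exchange for independent work items.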
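Abstract:
With the introduction of dual-core and even quad-core processors from Intel and AMD, parallel computing has reached the PC. To take full advantage of multiple cores, existing programs must be converted to multithreaded form so that they benefit from the performance gains multicore processing offers. This paper uses OpenMP, the industry standard for shared-memory programming, to parallelize the element computation subroutine used in the finite element method. Tests on a dual-CPU SMP node of a cluster show that shared-memory parallelization doubles the performance of this element subroutine.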
Abstract:
Negabinary is a member of the family of positional number systems. A complete set of negabinary arithmetic operations is presented, including the basic addition/subtraction logic, the two-step carry-free addition/subtraction algorithm based on negabinary signed-digit (NSD) representation, parallel multiplication, and the fast conversion from NSD to normal negabinary in carry-look-ahead mode. All the arithmetic operations can be performed with binary logic. By programming the binary reference bits, addition and subtraction can be realized in parallel with the same binary logic functions. This offers a technique for performing space-variant arithmetic-logic functions with space-invariant instructions. Multiplication can be performed in a tree structure and is simpler than its modified signed-digit (MSD) counterpart. The parallelism of the algorithms is well suited to optical implementation. Correspondingly, a general-purpose optical logic system using an electron trapping device is suggested. Various complex logic functions can be performed by programming the illumination of the data arrays, without additional temporal latency for intermediate results. The system can be compact. These properties make the proposed negabinary arithmetic-logic system a strong candidate for future applications in digital optical computing as smart pixel arrays develop. (C) 1999 Society of Photo-Optical Instrumentation Engineers. [S0091-3286(99)00803-X].
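For readers unfamiliar with negabinary (base -2), the sketch below shows conversion and a plain digit-serial addition with a carry in {-1, 0, 1}. It illustrates the number system only; it is not the carry-free NSD algorithm of the paper, and the function names are assumptions made for this example.

```python
def to_negabinary(n):
    """Return the base -2 digit string of the integer n."""
    if n == 0:
        return "0"
    digits = []
    while n != 0:
        n, r = divmod(n, -2)
        if r < 0:                 # force the digit into {0, 1}
            n, r = n + 1, r + 2
        digits.append(str(r))
    return "".join(reversed(digits))


def from_negabinary(s):
    """Evaluate a base -2 digit string as an integer."""
    return sum(int(d) * (-2) ** i for i, d in enumerate(reversed(s)))


def add_negabinary(x, y):
    """Add two negabinary strings digit by digit, propagating a signed carry."""
    a = [int(c) for c in reversed(x)]
    b = [int(c) for c in reversed(y)]
    result, carry, i = [], 0, 0
    while i < max(len(a), len(b)) or carry != 0:
        total = carry
        if i < len(a):
            total += a[i]
        if i < len(b):
            total += b[i]
        digit = total % 2                  # digit in {0, 1}
        carry = -((total - digit) // 2)    # carry changes sign because the base is negative
        result.append(digit)
        i += 1
    while len(result) > 1 and result[-1] == 0:
        result.pop()                       # drop leading zeros
    return "".join(str(d) for d in reversed(result))
```

Because the base is negative, both positive and negative integers are represented without a sign bit, which is why a single set of logic functions can serve addition and subtraction.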
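Abstract:
This work addresses the limitation that a parallel-plate flow chamber allows cells cultured on the wall to be observed under only one shear stress per experiment. Building on the parallel-plate flow chamber, the authors propose for the first time an improved design: a two-dimensional plate bifurcating flow chamber. Numerical simulation gives the distributions of flow velocity and wall shear stress for steady flow. The results show that this two-dimensional bifurcating flow chamber can be used to study the mechanical behavior of wall-cultured cells under shear stresses of different magnitudes. The results are of practical significance for the rational design and use of flow chambers and for analyzing the effect of the spatial distribution of shear stress on endothelial cells.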
Abstract:
A label-free protein microfluidic array for immunoassays based on the combination of imaging ellipsometry and an integrated microfluidic system is presented. Proteins can be patterned homogeneously and simultaneously on a substrate in array format by the microfluidic system. After preparation, the protein array can be packed in the microfluidic system, which is filled with buffer, so that the proteins are not exposed to denaturing conditions. With a simple microfluidic channel junction, the protein microfluidic array can be used in serial or parallel format to analyze single or multiple samples simultaneously. Imaging ellipsometry is used to read the protein array in a label-free format. The biological and medical applications of the label-free protein microfluidic array are demonstrated by screening for antibody–antigen interactions, measuring the concentration of a protein solution, and detecting five markers of hepatitis B.
Abstract:
An experimental investigation was conducted to study the holdup distribution of oil-water two-phase flow in two parallel tubes of unequal diameter. Tests were performed using white oil (viscosity 52 mPa·s, density 860 kg/m³) and tap water as the liquid phases at room temperature and atmospheric outlet pressure. Measurements were taken at water flow rates from 0.5 to 12.5 m³/h and input oil volume fractions from 3 to 94%. The results showed that the flow pattern maps of the run and bypass tubes differed when oil-water two-phase flow was present in the parallel tubes. At low input flow rates, the average oil holdup in the bypass tube deviated considerably from that in the run tube. However, as the input oil fraction increased at constant water flow rate, the holdup in the bypass tube approached that in the run tube. Furthermore, the experimental data showed no significant variation in flow pattern or holdup between the run and main tubes. The drift flux model was used to calculate the holdup for segregated flow.
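Abstract:
A clear trend in biomechanics research is the shift from macroscopic studies to mesoscopic and microscopic ones. Attention has moved from the biomechanics of individuals, organs, and tissues to studies at the cellular and even molecular level. Changes in cell morphology and physiological function under applied force have attracted great interest. The effect of shear stress from fluid flow on cells has drawn particular attention because of its real physiological background: blood flowing in a vessel exerts shear stress on the vascular endothelial cells. Shear stress not only affects the morphology and structure of endothelial cells but also influences many aspects of cell physiology.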
Abstract:
A three-dimensional MHD solver is described in the paper. The solver simulates reacting flows with nonequilibrium between translational-rotational, vibrational, and electron translational modes. The conservation equations are discretized with implicit time marching and the second-order modified Steger-Warming scheme, and the resulting linear system is solved iteratively with the Newton-Krylov-Schwarz method as implemented in the PETSc package. Convergence tests show good scalability and convergence roughly twice as fast as the DPLR method. Five test runs are then conducted to simulate the experiments done at the NASA Ames MHD channel, and the calculated pressures, temperatures, electrical conductivity, back EMF, load factors, and flow accelerations are shown to agree with the experimental data. Our computation shows that the electrical conductivity distribution is not uniform in the powered section of the MHD channel, and that it is important to include Joule heating in order to calculate the correct conductivity and MHD acceleration.
Abstract:
It has long been recognized that many direct parallel tridiagonal solvers are efficient only for solving a single large tridiagonal system, and that they become inefficient when used naively in a three-dimensional ADI solver. To improve the parallel efficiency of an ADI solver that uses a direct parallel solver, we implement the single parallel partition (SPP) algorithm in conjunction with message vectorization, which aggregates several communication messages into one to reduce communication costs. The measured performance shows that the longest allowable message vector length (MVL) is not necessarily the best choice. To understand this observation and optimize the performance, we propose an improved model that takes the cache effect into consideration. The optimal MVL for achieving the best performance is shown to depend on the number of processors and the grid size. A similar dependence of the optimal MVL is also found for the popular block pipelined method.
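As background for this abstract, the sketch below shows the serial Thomas algorithm for one tridiagonal system, applied to a batch of right-hand sides as an ADI sweep would do along each grid line. It is a serial baseline under assumed conventions, not the SPP algorithm or message vectorization scheme of the paper; the function names and the storage layout (`lower[0]` and `upper[-1]` unused) are assumptions made for this example.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm.

    Row i reads: lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1] = rhs[i].
    lower[0] and upper[-1] are ignored. No pivoting is done, so the matrix
    is assumed to be diagonally dominant.
    """
    n = len(diag)
    c = [0.0] * n   # modified upper diagonal
    d = [0.0] * n   # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):                      # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def solve_lines(lower, diag, upper, rhs_batch):
    """Solve the same tridiagonal operator for every grid line in a batch."""
    return [thomas_solve(lower, diag, upper, rhs) for rhs in rhs_batch]
```

In a three-dimensional ADI solver each sweep direction produces many such independent line systems, which is what makes grouping their boundary data into fewer, longer messages possible in a distributed setting.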
MODIFIED DIRECT TWOS-COMPLEMENT PARALLEL ARRAY MULTIPLICATION ALGORITHM FOR COMPLEX MATRIX OPERATION
Abstract:
A direct twos-complement parallel array multiplication algorithm is introduced and modified for digital optical numerical computation. The modified version overcomes the problems encountered in the conventional optical twos-complement algorithm. In the array, all the summands are generated in parallel, and the summands having the same weight are added simultaneously without carries, yielding the product in a mixed twos-complement system. In a two-stage array, complex multiplication is possible using four real subarrays. Furthermore, with a three-stage array architecture, complex matrix operations are accomplished straightforwardly. In the experiment, parallel two-stage array complex multiplication with liquid-crystal panels is demonstrated.
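The idea of generating all summands at once and adding equal-weight summands without carries can be sketched in software as follows. This is an illustrative reading of the abstract, not the paper's optical algorithm: the function names and the 4-bit default width are assumptions, and the sign handling simply uses the fact that the top bit of a twos-complement word has negative weight.

```python
def twos_complement_bits(value, width):
    """Return the bits of `value` in `width`-bit twos complement, LSB first."""
    if not -(1 << (width - 1)) <= value < (1 << (width - 1)):
        raise ValueError("value does not fit in the given width")
    return [(value >> i) & 1 for i in range(width)]


def array_multiply(a, b, width=4):
    """Multiply two signed integers via column sums of bit products.

    Returns (columns, product). columns[k] is the carry-free sum of all
    summands with weight 2**k; its entries may be negative or exceed 1,
    i.e. the result is in a mixed digit form until it is evaluated.
    """
    a_bits = twos_complement_bits(a, width)
    b_bits = twos_complement_bits(b, width)
    columns = [0] * (2 * width - 1)
    for i, ai in enumerate(a_bits):
        for j, bj in enumerate(b_bits):
            # A summand is negative when exactly one factor is a sign bit.
            sign = -1 if (i == width - 1) != (j == width - 1) else 1
            columns[i + j] += sign * ai * bj
    product = sum(digit * (1 << k) for k, digit in enumerate(columns))
    return columns, product
```

For a complex product (a + ib)(c + id), four such real products (ac, bd, ad, bc) are needed, which matches the abstract's remark about four real subarrays.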
Abstract:
On the basis of signed-digit negabinary representation, parallel two-step addition and one-step subtraction can be performed for arbitrary-length negabinary operands. The arithmetic is realized by signed logic operations and optically implemented by spatial encoding and decoding techniques. The proposed algorithm and optical system are simple, reliable, and practicable, and they have the property of parallel processing of two-dimensional data. This leads to an efficient design for the optical arithmetic and logic unit. (C) 1997 Optical Society of America.