894 resultados para HPC parallel computer architecture queues fault tolerance programmability ADAM
Resumo:
In this paper we consider hybrid (fast stochastic approximation and deterministic refinement) algorithms for Matrix Inversion (MI) and Solving Systems of Linear Equations (SLAE). Monte Carlo methods are used for the stochastic approximation, since it is known that they are very efficient in finding a quick rough approximation of the element or a row of the inverse matrix or finding a component of the solution vector. We show how the stochastic approximation of the MI can be combined with a deterministic refinement procedure to obtain MI with the required precision and further solve the SLAE using MI. We employ a splitting A = D – C of a given non-singular matrix A, where D is a diagonal dominant matrix and matrix C is a diagonal matrix. In our algorithm for solving SLAE and MI different choices of D can be considered in order to control the norm of matrix T = D –1C, of the resulting SLAE and to minimize the number of the Markov Chains required to reach given precision. Further we run the algorithms on a mini-Grid and investigate their efficiency depending on the granularity. Corresponding experimental results are presented.
Resumo:
In this paper we introduce a new algorithm, based on the successful work of Fathi and Alexandrov, on hybrid Monte Carlo algorithms for matrix inversion and solving systems of linear algebraic equations. This algorithm consists of two parts, approximate inversion by Monte Carlo and iterative refinement using a deterministic method. Here we present a parallel hybrid Monte Carlo algorithm, which uses Monte Carlo to generate an approximate inverse and that improves the accuracy of the inverse with an iterative refinement. The new algorithm is applied efficiently to sparse non-singular matrices. When we are solving a system of linear algebraic equations, Bx = b, the inverse matrix is used to compute the solution vector x = B(-1)b. We present results that show the efficiency of the parallel hybrid Monte Carlo algorithm in the case of sparse matrices.
Resumo:
Large scale air pollution models are powerful tools, designed to meet the increasing demand in different environmental studies. The atmosphere is the most dynamic component of the environment, where the pollutants can be moved quickly on far distnce. Therefore the air pollution modeling must be done in a large computational domain. Moreover, all relevant physical, chemical and photochemical processes must be taken into account. In such complex models operator splitting is very often applied in order to achieve sufficient accuracy as well as efficiency of the numerical solution. The Danish Eulerian Model (DEM) is one of the most advanced such models. Its space domain (4800 × 4800 km) covers Europe, most of the Mediterian and neighboring parts of Asia and the Atlantic Ocean. Efficient parallelization is crucial for the performance and practical capabilities of this huge computational model. Different splitting schemes, based on the main processes mentioned above, have been implemented and tested with respect to accuracy and performance in the new version of DEM. Some numerical results of these experiments are presented in this paper.
Resumo:
Since its introduction in 1993, the Message Passing Interface (MPI) has become a de facto standard for writing High Performance Computing (HPC) applications on clusters and Massively Parallel Processors (MPPs). The recent emergence of multi-core processor systems presents a new challenge for established parallel programming paradigms, including those based on MPI. This paper presents a new Java messaging system called MPJ Express. Using this system, we exploit multiple levels of parallelism - messaging and threading - to improve application performance on multi-core processors. We refer to our approach as nested parallelism. This MPI-like Java library can support nested parallelism by using Java or Java OpenMP (JOMP) threads within an MPJ Express process. Practicality of this approach is assessed by porting to Java a massively parallel structure formation code from Cosmology called Gadget-2. We introduce nested parallelism in the Java version of the simulation code and report good speed-ups. To the best of our knowledge it is the first time this kind of hybrid parallelism is demonstrated in a high performance Java application. (C) 2009 Elsevier Inc. All rights reserved.
Resumo:
In the 1990s the Message Passing Interface Forum defined MPI bindings for Fortran, C, and C++. With the success of MPI these relatively conservative languages have continued to dominate in the parallel computing community. There are compelling arguments in favour of more modern languages like Java. These include portability, better runtime error checking, modularity, and multi-threading. But these arguments have not converted many HPC programmers, perhaps due to the scarcity of full-scale scientific Java codes, and the lack of evidence for performance competitive with C or Fortran. This paper tries to redress this situation by porting two scientific applications to Java. Both of these applications are parallelized using our thread-safe Java messaging system—MPJ Express. The first application is the Gadget-2 code, which is a massively parallel structure formation code for cosmological simulations. The second application uses the finite-domain time-difference method for simulations in the area of computational electromagnetics. We evaluate and compare the performance of the Java and C versions of these two scientific applications, and demonstrate that the Java codes can achieve performance comparable with legacy applications written in conventional HPC languages. Copyright © 2009 John Wiley & Sons, Ltd.
Resumo:
As consumers demand more functionality) from their electronic devices and manufacturers supply the demand then electrical power and clock requirements tend to increase, however reassessing system architecture can fortunately lead to suitable counter reductions. To maintain low clock rates and therefore reduce electrical power, this paper presents a parallel convolutional coder for the transmit side in many wireless consumer devices. The coder accepts a parallel data input and directly computes punctured convolutional codes without the need for a separate puncturing operation while the coded bits are available at the output of the coder in a parallel fashion. Also as the computation is in parallel then the coder can be clocked at 7 times slower than the conventional shift-register based convolutional coder (using DVB 7/8 rate). The presented coder is directly relevant to the design of modern low-power consumer devices
Resumo:
This paper is concerned with the uniformization of a system of afine recurrence equations. This transformation is used in the design (or compilation) of highly parallel embedded systems (VLSI systolic arrays, signal processing filters, etc.). In this paper, we present and implement an automatic system to achieve uniformization of systems of afine recurrence equations. We unify the results from many earlier papers, develop some theoretical extensions, and then propose effective uniformization algorithms. Our results can be used in any high level synthesis tool based on polyhedral representation of nested loop computations.
Resumo:
The conformation of a model peptide AAKLVFF based on a fragment of the amyloid beta peptide A beta 16-20, KLVFF, is investigated in methanol and water via solution NMR experiments and Molecular dynamics computer simulations. In previous work, we have shown that AAKLVFF forms peptide nanotubes in methanol and twisted fibrils in water. Chemical shift measurements were used to investigate the solubility of the peptide as a function of concentration in methanol and water. This enabled the determination of critical aggregation concentrations, The Solubility was lower in water. In dilute solution, diffusion coefficients revealed the presence of intermediate aggregates in concentrated solution, coexisting with NMR-silent larger aggregates, presumed to be beta-sheets. In water, diffusion coefficients did not change appreciably with concentration, indicating the presence mainly of monomers, coexisting with larger aggregates in more concentrated solution. Concentration-dependent chemical shift measurements indicated a folded conformation for the monomers/intermediate aggregates in dilute methanol, with unfolding at higher concentration. In water, an antiparallel arrangement of strands was indicated by certain ROESY peak correlations. The temperature-dependent solubility of AAKLVFF in methanol was well described by a van't Hoff analysis, providing a solubilization enthalpy and entropy. This pointed to the importance of solvophobic interactions in the self-assembly process. Molecular dynamics Simulations constrained by NOE values from NMR suggested disordered reverse turn structures for the monomer, with an antiparallel twisted conformation for dimers. To model the beta-sheet structures formed at higher concentration, possible model arrangements of strands into beta-sheets with parallel and antiparallel configurations and different stacking sequences were used as the basis for MD simulations; two particular arrangements of antiparallel beta-sheets were found to be stable, one being linear and twisted and the other twisted in two directions. These structures Were used to simulate Circular dichroism spectra. The roles of aromatic stacking interactions and charge transfer effects were also examined. Simulated spectra were found to be similar to those observed experimentally.(in water or methanol) which show a maximum at 215 or 218 nm due to pi-pi* interactions, when allowance is made for a 15-18 nm red-shift that may be due to light scattering effects.
Resumo:
Nonstructural protein 3 of the severe acute respiratory syndrome (SARS) coronavirus includes a "SARS-unique domain" (SUD) consisting of three globular domains separated by short linker peptide segments. This work reports NMR structure determinations of the C-terminal domain (SUD-C) and a two-domain construct (SUD-MC) containing the middle domain (SUD-M) and the C-terminal domain, and NMR data on the conformational states of the N-terminal domain (SUD-N) and the SUD-NM two-domain construct. Both SUD-N and SUD-NM are monomeric and globular in solution; in SUD-NM, there is high mobility in the two-residue interdomain linking sequence, with no preferred relative orientation of the two domains. SUD-C adopts a frataxin like fold and has structural similarity to DNA-binding domains of DNA-modifying enzymes. The structures of both SUD-M (previously determined) and SUD-C (from the present study) are maintained in SUD-MC, where the two domains are flexibly linked. Gel-shift experiments showed that both SUD-C and SUD-MC bind to single-stranded RNA and recognize purine bases more strongly than pyrimidine bases, whereby SUD-MC binds to a more restricted set of purine-containing RNA sequences than SUD-M. NMR chemical shift perturbation experiments with observations of (15)N-labeled proteins further resulted in delineation of RNA binding sites (i.e., in SUD-M, a positively charged surface area with a pronounced cavity, and in SUD-C, several residues of an anti-parallel beta-sheet). Overall, the present data provide evidence for molecular mechanisms involving the concerted actions of SUD-M and SUD-C, which result in specific RNA binding that might be unique to the SUD and, thus, to the SARS coronavirus.
Resumo:
The work reported in this paper is motivated towards handling single node failures for parallel summation algorithms in computer clusters. An agent based approach is proposed in which a task to be executed is decomposed to sub-tasks and mapped onto agents that traverse computing nodes. The agents intercommunicate across computing nodes to share information during the event of a predicted node failure. Two single node failure scenarios are considered. The Message Passing Interface is employed for implementing the proposed approach. Quantitative results obtained from experiments reveal that the agent based approach can handle failures more efficiently than traditional failure handling approaches.
Resumo:
Proposed is a unique cell histogram architecture which will process k data items in parallel to compute 2q histogram bins per time step. An array of m/2q cells computes an m-bin histogram with a speed-up factor of k; k ⩾ 2 makes it faster than current dual-ported memory implementations. Furthermore, simple mechanisms for conflict-free storing of the histogram bins into an external memory array are discussed.
Resumo:
User interaction within a virtual environment may take various forms: a teleconferencing application will require users to speak to each other (Geak, 1993), with computer supported co-operative working; an Engineer may wish to pass an object to another user for examination; in a battle field simulation (McDonough, 1992), users might exchange fire. In all cases it is necessary for the actions of one user to be presented to the others sufficiently quickly to allow realistic interaction. In this paper we take a fresh look at the approach of virtual reality operating systems by tackling the underlying issues of creating real-time multi-user environments.
Resumo:
The development of architecture and the settlement is central to discussions concerning the Neolithic transformation asthe very visible evidence for the changes in society that run parallel to the domestication of plants and animals. Architecture hasbeen used as an important aspect of models of how the transformation occurred, and as evidence for the sharp difference betweenhunter-gatherer and farming societies. We suggest that the emerging evidence for considerable architectural complexity from theearly Neolithic indicates that some of our interpretations depend too much on a very basic understanding of structures which arenormally seen as being primarily for residential purposes and containing households, which become the organising principle for thenew communities which are often seen as fully sedentary and described as villages. Recent work in southern Jordan suggests that inthis region at least there is little evidence for a standard house, and that structures are constructed for a range of diverse primary purposes other than simple domestic shelters.