38 results for MPI


Relevance:

10.00% 10.00%

Publisher:

Abstract:

MPJ Express is our implementation of MPI-like bindings for Java. In this paper we discuss our intermediate buffering layer that makes use of the so-called direct byte buffers introduced in the Java New I/O package. The purpose of this layer is to support the implementation of derived datatypes. MPJ Express is the first Java messaging library that implements this feature using pure Java. In addition, this buffering layer allows efficient implementation of communication devices based on proprietary networks such as Myrinet. In this paper we evaluate the performance of our buffering layer and demonstrate the usefulness of direct byte buffers. Also, we evaluate the performance of MPJ Express against other messaging systems using Myrinet and show that our buffering layer has made it possible to avoid the overheads suffered by other Java systems such as mpiJava that relies on the Java Native Interface.
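
The idea of packing a derived datatype into an intermediate buffer can be illustrated with a small sketch. This is not MPJ Express code and does not use Java direct byte buffers; it is a Python analogy, assuming a strided "vector" datatype and a plain `bytearray` as the staging buffer. The names `pack_vector` and `unpack_vector` are hypothetical.

```python
import struct

def pack_vector(data, offset, count, stride, fmt="d"):
    """Copy `count` items, taken every `stride` elements starting at `offset`,
    into one contiguous byte buffer (the staging area a device would send)."""
    item_size = struct.calcsize("<" + fmt)
    buf = bytearray(count * item_size)
    for i in range(count):
        struct.pack_into("<" + fmt, buf, i * item_size, data[offset + i * stride])
    return buf

def unpack_vector(buf, count, fmt="d"):
    """Read `count` items back out of a contiguous buffer."""
    return list(struct.unpack("<" + str(count) + fmt, buf))

# A 3x4 matrix stored row-major; column 1 is a strided (non-contiguous) selection.
matrix = [float(v) for v in range(12)]
column1 = pack_vector(matrix, offset=1, count=3, stride=4)
received = unpack_vector(column1, 3)
```

Here the non-contiguous column is gathered once into a contiguous buffer, which is the role the abstract assigns to the buffering layer; how MPJ Express lays out its own buffers is not described in the abstract.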

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Since its introduction in 1993, the Message Passing Interface (MPI) has become a de facto standard for writing High Performance Computing (HPC) applications on clusters and Massively Parallel Processors (MPPs). The recent emergence of multi-core processor systems presents a new challenge for established parallel programming paradigms, including those based on MPI. This paper presents a new Java messaging system called MPJ Express. Using this system, we exploit multiple levels of parallelism - messaging and threading - to improve application performance on multi-core processors. We refer to our approach as nested parallelism. This MPI-like Java library can support nested parallelism by using Java or Java OpenMP (JOMP) threads within an MPJ Express process. Practicality of this approach is assessed by porting to Java a massively parallel structure formation code from Cosmology called Gadget-2. We introduce nested parallelism in the Java version of the simulation code and report good speed-ups. To the best of our knowledge it is the first time this kind of hybrid parallelism is demonstrated in a high performance Java application. (C) 2009 Elsevier Inc. All rights reserved.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In the 1990s the Message Passing Interface Forum defined MPI bindings for Fortran, C, and C++. With the success of MPI these relatively conservative languages have continued to dominate in the parallel computing community. There are compelling arguments in favour of more modern languages like Java. These include portability, better runtime error checking, modularity, and multi-threading. But these arguments have not converted many HPC programmers, perhaps due to the scarcity of full-scale scientific Java codes, and the lack of evidence for performance competitive with C or Fortran. This paper tries to redress this situation by porting two scientific applications to Java. Both of these applications are parallelized using our thread-safe Java messaging system, MPJ Express. The first application is the Gadget-2 code, which is a massively parallel structure formation code for cosmological simulations. The second application uses the finite-difference time-domain method for simulations in the area of computational electromagnetics. We evaluate and compare the performance of the Java and C versions of these two scientific applications, and demonstrate that the Java codes can achieve performance comparable with legacy applications written in conventional HPC languages. Copyright © 2009 John Wiley & Sons, Ltd.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

With the transition to multicore processors almost complete, the parallel processing community is seeking efficient ways to port legacy message passing applications to shared memory and multicore processors. MPJ Express is our reference implementation of Message Passing Interface (MPI)-like bindings for the Java language. Starting with the current release, the MPJ Express software can be configured in two modes: the multicore and the cluster mode. In the multicore mode, parallel Java applications execute on shared memory or multicore processors. In the cluster mode, Java applications parallelized using MPJ Express can be executed on distributed memory platforms like compute clusters and clouds. The multicore device has been implemented using Java threads in order to satisfy two main design goals of portability and performance. We also discuss the challenges of integrating the multicore device in the MPJ Express software. This turned out to be a challenging task because the parallel application executes in a single JVM in the multicore mode, whereas in the cluster mode the parallel user application executes in multiple JVMs. Due to these inherent architectural differences between the two modes, the MPJ Express runtime is modified to ensure correct semantics of the parallel program. Finally, we compare the performance of MPJ Express (multicore mode) with other C and Java message passing libraries, including mpiJava, MPJ/Ibis, MPICH2, and MPJ Express (cluster mode), on shared memory and multicore processors. We found that MPJ Express performs significantly better in the multicore mode than in the cluster mode. MPJ Express also performs better than other Java messaging libraries, including mpiJava and MPJ/Ibis, when used in the multicore mode on shared memory or multicore processors. We also demonstrate the effectiveness of the MPJ Express multicore device in Gadget-2, which is a massively parallel astrophysics N-body simulation code.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

How can a bridge be built between autonomic computing approaches and parallel computing systems? The work reported in this paper aims to bridge this gap by proposing a swarm-array computing approach based on ‘Intelligent Agents’ to achieve autonomy for distributed parallel computing systems. In the proposed approach, a task to be executed on parallel computing cores is carried onto a computing core by carrier agents that can seamlessly transfer between processing cores in the event of a predicted failure. The cognitive capabilities of the carrier agents on a parallel processing core serve to achieve the self-ware objectives of autonomic computing, hence applying autonomic computing concepts for the benefit of parallel computing systems. The feasibility of the proposed approach is validated by simulation studies using a multi-agent simulator on an FPGA (Field-Programmable Gate Array) and experimental studies using MPI (Message Passing Interface) on a computer cluster. Preliminary results confirm that applying autonomic computing principles to parallel computing systems is beneficial.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Processor virtualization for process migration in distributed parallel computing systems has formed a significant component of research on load balancing. In contrast, the potential of processor virtualization for fault tolerance has been addressed minimally. The work reported in this paper aims to extend concepts of processor virtualization towards ‘intelligent cores’ as a means to achieve fault tolerance in distributed parallel computing systems. Intelligent cores are an abstraction of the hardware processing cores, with the incorporation of cognitive capabilities, on which parallel tasks can be executed and migrated. When a processing core executing a task is predicted to fail, the task being executed is proactively transferred onto another core. A parallel reduction algorithm incorporating concepts of intelligent cores is implemented on a computer cluster using Adaptive MPI and Charm++. Preliminary results confirm the feasibility of the approach.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

BACKGROUND & AIMS: We studied the role of protease-activated receptor 2 (PAR(2)) and its activating enzymes, trypsins and tryptase, in Clostridium difficile toxin A (TxA)-induced enteritis. METHODS: We injected TxA into ileal loops in PAR(2) or dipeptidyl peptidase I (DPPI) knockout mice or in wild-type mice pretreated with tryptase inhibitors (FUT-175 or MPI-0442352) or soybean trypsin inhibitor. We examined the effect of TxA on expression and activity of PAR(2) and trypsin IV messenger RNA in the ileum and cultured colonocytes. We injected activating peptide (AP), trypsins, tryptase, and p23 in wild-type mice, some pretreated with the neurokinin 1 receptor antagonist SR140333. RESULTS: TxA increased fluid secretion, myeloperoxidase activity in fluid and tissue, and histologic damage. PAR(2) deletion decreased TxA-induced ileitis, reduced luminal fluid secretion by 20%, decreased tissue and fluid myeloperoxidase by 50%, and diminished epithelial damage, edema, and neutrophil infiltration. DPPI deletion reduced secretion by 20% and fluid myeloperoxidase by 55%. In wild-type mice, FUT-175 or MPI-0442352 inhibited secretion by 24%-28% and tissue and fluid myeloperoxidase by 31%-71%. Soybean trypsin inhibitor reduced secretion to background levels and tissue myeloperoxidase by up to 50%. TxA increased expression of PAR(2) and trypsin IV in enterocytes and colonocytes and caused a 2-fold increase in Ca(2+) responses to PAR(2) AP. AP, tryptase, and trypsin isozymes (trypsin I/II, trypsin IV, p23) caused ileitis. SR140333 prevented AP-induced ileitis. CONCLUSIONS: PAR(2) and its activators are proinflammatory in TxA-induced enteritis. TxA stimulates existing PAR(2) and up-regulates PAR(2) and activating proteases, and PAR(2) causes inflammation by neurogenic mechanisms.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Java is becoming an increasingly popular language for developing distributed and parallel scientific and engineering applications. Jini is a Java-based infrastructure developed by Sun that can allegedly provide all the services necessary to support distributed applications. It is the aim of this paper to explore and investigate the services and properties that Jini actually provides and match these against the needs of high performance distributed and parallel applications written in Java. The motivation for this work is the need to develop a distributed infrastructure to support an MPI-like interface to Java known as MPJ. In the first part of the paper we discuss the needs of MPJ, the parallel environment that we wish to support. In particular we look at aspects such as reliability and ease of use. We then move on to sketch out the Jini architecture and review the components and services that Jini provides. In the third part of the paper we critically explore a Jini infrastructure that could be used to support MPJ. Here we are particularly concerned with Jini's ability to support reliably a cocoon of MPJ processes executing in a heterogeneous environment. In the final part of the paper we summarise our findings and report on future work being undertaken on Jini and MPJ.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Ensembles of extended Atmospheric Model Intercomparison Project (AMIP) runs from the general circulation models of the National Centers for Environmental Prediction (formerly the National Meteorological Center) and the Max-Planck Institute (Hamburg, Germany) are used to estimate the potential predictability (PP) of an index of the Pacific–North America (PNA) mode of climate change. The PP of this pattern in “perfect” prediction experiments is 20%–25% of the index’s variance. The models, particularly that from MPI, capture virtually all of this variance in their hindcasts of the winter PNA for the period 1970–93. The high levels of internally generated model noise in the PNA simulations reconfirm the need for an ensemble averaging approach to climate prediction. This means that the forecasts ought to be expressed in a probabilistic manner. It is shown that the models’ skills are higher by about 50% during strong SST events in the tropical Pacific, so the probabilistic forecasts need to be conditional on the tropical SST. Taken together with earlier studies, the present results suggest that the original set of AMIP integrations (single 10-yr runs) is not adequate to reliably test the participating models’ simulations of interannual climate variability in the midlatitudes.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

A statistical–dynamical downscaling (SDD) approach is applied to determine present day and future high-resolution rainfall distributions in the catchment of the river Aksu at the southern slopes of the Tienshan Mountains, Central Asia. First, a circulation weather type (CWT) classification is employed to define typical lower atmospheric flow regimes from ERA-40 reanalysis data. Selected representatives of each CWT are dynamically downscaled with the regional climate model COSMO-CLM 4.8 at a horizontal grid resolution of 0.0625°, using the ERA-40 reanalysis data as boundary conditions. Finally, the simulated representatives are recombined to obtain a high-resolution rainfall climatology for present day climate. The methodology is also applied to ensemble simulations of three different scenarios of the global climate model ECHAM5/MPI-OM1 to derive projections of rainfall changes until 2100. Comparisons of downscaled seasonal and annual rainfall with observational data suggest that the statistical–dynamical approach is appropriate to capture the observed present-day precipitation climatology over the lowlands and the first elevations of the Tienshan Mountains. On the other hand, a strong bias is found at higher altitudes, where precipitation is clearly underestimated by SDD. The application of SDD to the ECHAM5/MPI-OM1 ensemble reveals that precipitation changes by the end of the 21st century depend on the season. While for autumn an increase of seasonal precipitation is found for all simulations, a decrease in precipitation is obtained during winter for most parts of the Aksu catchment. The spread between different ECHAM5/MPI-OM1 ensemble members is strongest in spring, where trends of opposite sign are found. The largest changes in rainfall are simulated for the summer season, which also shows the most pronounced spatial heterogeneity. Most ECHAM5/MPI-OM1 realizations indicate a decrease of annual precipitation over large parts of the Tienshan, and an increase restricted to the southeast of the study area. These results provide a good basis for downscaling present-day and future rainfall distributions for hydrological purposes.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Precipitation indices are commonly used as climate change indicators. Considering four Climate Variability and Predictability-recommended indices, this study assesses possible changes in their spatial patterns over Portugal under future climatic conditions. Precipitation data from the regional climate model Consortium for Small-Scale Modelling–Climate version of the Local Model (CCLM) ensemble simulations with ECHAM5/MPI-OM1 boundary conditions are used for this purpose. For recent–past, medians and probability density functions of the CCLM-based indices are validated against station-based and gridded observational dataset from ENSEMBLES-based (gridded daily precipitation data provided by the European Climate Assessment & Dataset project) indices. It is demonstrated that the model is able to realistically reproduce not only precipitation but also the corresponding extreme indices. Climate change projections for 2071–2100 (A1B and B1 SRES scenarios) reveal significant decreases in total precipitation, particularly in autumn over northwestern and southern Portugal, though changes exhibit distinct local and seasonal patterns and are typically stronger for A1B than for B1. The increase in winter precipitation over northeastern Portugal in A1B is the most important exception to the overall drying trend. Contributions of extreme precipitation events to total precipitation are also expected to increase, mainly in winter and spring over northeastern Portugal. Strong projected increases in the dry spell lengths in autumn and spring are also noteworthy, giving evidence for an extension of the dry season from summer to spring and autumn. Although no coupling analysis is undertaken, these changes are qualitatively related to modifications in the large-scale circulation over the Euro-Atlantic area, more specifically to shifts in the position of the Azores High and associated changes in the large-scale pressure gradient over the area.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

We present projections of winter storm-induced insured losses in the German residential building sector for the 21st century. With this aim, two structurally most independent downscaling methods and one hybrid downscaling method are applied to a 3-member ensemble of ECHAM5/MPI-OM1 A1B scenario simulations. One method uses dynamical downscaling of intense winter storm events in the global model, and a transfer function to relate regional wind speeds to losses. The second method is based on a reshuffling of present day weather situations and sequences taking into account the change of their frequencies according to the linear temperature trends of the global runs. The third method uses statistical-dynamical downscaling, considering frequency changes of the occurrence of storm-prone weather patterns, and translation into loss by using empirical statistical distributions. The A1B scenario ensemble was downscaled by all three methods until 2070, and by the (statistical-) dynamical methods until 2100. Furthermore, all methods assume a constant statistical relationship between meteorology and insured losses and no developments other than climate change, such as in constructions or claims management. The study utilizes data provided by the German Insurance Association encompassing 24 years and with district-scale resolution. Compared to 1971–2000, the downscaling methods indicate an increase of 10-year return values (i.e. loss ratios per return period) of 6–35 % for 2011–2040, of 20–30 % for 2041–2070, and of 40–55 % for 2071–2100, respectively. Convolving various sources of uncertainty in one confidence statement (data-, loss model-, storm realization-, and Pareto fit-uncertainty), the return-level confidence interval for a return period of 15 years expands by more than a factor of two. Finally, we suggest how practitioners can deal with alternative scenarios or possible natural excursions of observed losses.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

We have optimised the atmospheric radiation algorithm of the FAMOUS climate model on several hardware platforms. The optimisation involved translating the Fortran code to C and restructuring the algorithm around the computation of a single air column. Instead of the existing MPI-based domain decomposition, we used a task queue and a thread pool to schedule the computation of individual columns on the available processors. Finally, four air columns are packed together in a single data structure and computed simultaneously using Single Instruction Multiple Data operations. The modified algorithm runs more than 50 times faster on the CELL’s Synergistic Processing Elements than on its main PowerPC processing element. On Intel-compatible processors, the new radiation code runs 4 times faster. On the tested graphics processor, using OpenCL, we find a speed-up of more than 2.5 times as compared to the original code on the main CPU. Because the radiation code takes more than 60% of the total CPU time, FAMOUS executes more than twice as fast. Our version of the algorithm returns bit-wise identical results, which demonstrates the robustness of our approach. We estimate that this project required around two and a half man-years of work.
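
The scheduling scheme described in this abstract (a task queue and thread pool working on packs of four columns) can be sketched as follows. This is a minimal illustration in Python, not the FAMOUS code: `radiate_pack` is a hypothetical stand-in kernel, the factor 0.5 is arbitrary, and the inner lane loop only mimics in plain Python what SIMD hardware would do in a single instruction.

```python
from concurrent.futures import ThreadPoolExecutor

PACK = 4  # columns processed together, mirroring the four-wide packing

def radiate_pack(columns):
    """Hypothetical kernel: walk the levels once, updating every lane of the
    pack at each level (the lane loop plays the role of one SIMD operation)."""
    depth = len(columns[0])
    totals = [0.0] * len(columns)
    for level in range(depth):
        for lane in range(len(columns)):
            totals[lane] += columns[lane][level] * 0.5
    return totals

def run_radiation(columns, workers=4):
    """Split the columns into packs and let a thread pool drain them."""
    packs = [columns[i:i + PACK] for i in range(0, len(columns), PACK)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The executor keeps an internal task queue; map() returns results
        # in submission order, so output order matches column order.
        results = pool.map(radiate_pack, packs)
        return [total for pack in results for total in pack]

# Ten columns of three levels each; the last pack holds only two columns.
columns = [[float(c + level) for level in range(3)] for c in range(10)]
fluxes = run_radiation(columns)
```

Because each pack is independent, idle workers simply take the next pack from the queue, which is the load-balancing property that distinguishes this scheme from a fixed domain decomposition. In CPython the global interpreter lock prevents pure-Python threads from running in parallel, so the sketch shows the structure of the scheme rather than its speed-up.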