50 results for Thread safe parallel run-time

in CentAUR: Central Archive University of Reading - UK


Relevance:

100.00% 100.00%

Publisher:

Abstract:

MPJ Express is a thread-safe Java messaging library that provides a full implementation of the mpiJava 1.2 API specification. This specification defines MPI-like bindings for the Java language. We have implemented two communication devices as part of our library: the first, called niodev, is based on the Java New I/O package, and the second, called mxdev, is based on the Myrinet eXpress library. MPJ Express comes with an experimental runtime, which allows portable bootstrapping of Java Virtual Machines across a cluster or network of computers. In this paper we describe the implementation of MPJ Express. We also present a performance comparison against various other C and Java messaging systems. A beta version of MPJ Express was released in September 2005.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In the 1990s the Message Passing Interface Forum defined MPI bindings for Fortran, C, and C++. With the success of MPI these relatively conservative languages have continued to dominate in the parallel computing community. There are compelling arguments in favour of more modern languages like Java. These include portability, better runtime error checking, modularity, and multi-threading. But these arguments have not converted many HPC programmers, perhaps due to the scarcity of full-scale scientific Java codes and the lack of evidence for performance competitive with C or Fortran. This paper tries to redress this situation by porting two scientific applications to Java. Both of these applications are parallelized using our thread-safe Java messaging system, MPJ Express. The first application is the Gadget-2 code, which is a massively parallel structure formation code for cosmological simulations. The second application uses the finite-difference time-domain (FDTD) method for simulations in the area of computational electromagnetics. We evaluate and compare the performance of the Java and C versions of these two scientific applications, and demonstrate that the Java codes can achieve performance comparable with legacy applications written in conventional HPC languages. Copyright © 2009 John Wiley & Sons, Ltd.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The increasing demand for cheaper, faster and better services anytime and anywhere has made radio network optimisation much more complex than ever before. In order to dynamically optimise the serving network, Dynamic Network Optimisation (DNO) is proposed as the ultimate solution and future trend. The realization of DNO, however, has been hindered by a significant bottleneck: optimisation speed falls as network complexity grows. This paper presents a multi-threaded parallel solution to accelerate complicated proprietary network optimisation algorithms under a rigid condition of numerical consistency. The ariesoACP product from Arieso Ltd serves as the platform for parallelisation. The parallel solution has been benchmarked, and the results exhibit high scalability and a run-time reduction of 11% to 42% in comparison with the original version, depending on the technology, subscriber density and blocking rate of a given network. Further, it is essential that the parallel version produces equivalent optimisation quality, in the sense of identical optimisation outputs.
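The numerical-consistency requirement above can be illustrated with a minimal Python sketch. It is not the ariesoACP implementation, which the abstract does not describe; `score_candidate` is a hypothetical stand-in for an expensive, independent evaluation. The idea shown is one common way to keep outputs identical: evaluate independent tasks in parallel, but gather and accumulate their results in the original serial order, because floating-point addition is not associative.

```python
from concurrent.futures import ThreadPoolExecutor


def score_candidate(candidate):
    # Hypothetical stand-in for an expensive, independent evaluation step.
    return sum((candidate * 0.1 + i) ** 0.5 for i in range(100))


def serial_total(candidates):
    # Reference version: evaluate and accumulate in list order.
    total = 0.0
    for candidate in candidates:
        total += score_candidate(candidate)
    return total


def parallel_total(candidates, workers=4):
    # Evaluate candidates concurrently; Executor.map returns results in
    # input order regardless of which thread finishes first.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(score_candidate, candidates))
    # Accumulate in the same order as the serial loop so the sum is
    # bit-for-bit identical to serial_total.
    total = 0.0
    for score in scores:
        total += score
    return total
```

In CPython, threads will not speed up a CPU-bound function like this toy one; the sketch is about ordering, not performance.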

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Hybrid multiprocessor architectures which combine re-configurable computing and multiprocessors on a chip are being proposed to transcend the performance of standard multi-core parallel systems. Both fine-grained and coarse-grained parallel algorithm implementations are feasible in such hybrid frameworks. A compositional strategy for designing fine-grained multi-phase regular processor arrays to target hybrid architectures is presented in this paper. The method is based on deriving component designs using classical regular array techniques and composing the components into a unified global design. Effective designs with phase-changes and data routing at run-time are characteristics of these designs. In order to describe the data transfer between phases, the concept of communication domain is introduced so that the producer–consumer relationship arising from multi-phase computation can be treated in a unified way as a data routing phase. This technique is applied to derive new designs of multi-phase regular arrays with different dataflow between phases of computation.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A rapid capillary electrophoresis method was developed to determine simultaneously the artificial sweeteners, preservatives and colours used as additives in carbonated soft drinks. Resolution between all additives occurring together in soft drinks was successfully achieved within a 15-min run-time by employing the micellar electrokinetic chromatography mode with a 20 mM carbonate buffer at pH 9.5 as the aqueous phase and 62 mM sodium dodecyl sulfate as the micellar phase. By using a diode-array detector to monitor the UV-visible range (190-600 nm), the identity of sample components, suggested by migration time, could be confirmed by spectral matching relative to standards.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Whole fresh goat's milk was heat treated at 135 degrees C for 4 s using a miniature UHT plant. The temperature of the milk in the preheating and sterilizer sections, and the milk flow rate were monitored to evaluate the overall heat transfer coefficient (OHTC). The decrease in OHTC was used to estimate the extent of fouling. Goat's milk fouled very quickly and run times of the UHT plant were short. The use of sodium hexametaphosphate, trisodium citrate and cation exchange resins to reduce ionic calcium prior to UHT processing, increased the pH and alcohol stability of the milk and markedly increased the run time of the UHT plant.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Pull-pipelining, a pipeline technique in which data is pulled by successor stages from predecessor stages, is proposed. Control circuits using a synchronous, a semi-synchronous and an asynchronous approach are given. Simulation examples for a DLX generic RISC datapath show that the overhead of common pipeline control circuits is avoided using the proposal. Applications to linear systolic arrays, in cases where computation finishes at early stages in the array, are foreseen. This would allow run-time, data-driven digital frequency modulation of synchronous pipelined designs, with applications to implementing algorithms that exhibit average-case processing time using a synchronous approach.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A reconfigurable scalar quantiser capable of accepting n-bit input data is presented. The data length n can be varied in the range 1 to N-1 under partial run-time reconfiguration (p-RTR). Issues such as the improvement in throughput obtained with this reconfigurable quantiser under p-RTR, compared with RTR, for data of variable length are considered. The quantiser design, referred to as the priority quantiser (PQ), is then compared against a direct design of the quantiser (DIQ). The evaluation shows that, for practical quantiser sizes, PQ has better area usage when both are targeted onto the same FPGA. Other benefits are also identified.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

The real-time parallel computation of histograms using an array of pipelined cells is proposed and prototyped in this paper, with application to consumer imaging products. The array operates in two modes: histogram computation and histogram reading. The proposed parallel computation method does not use any memory blocks. The resulting histogram bins can be stored into an external memory block in a pipelined fashion for subsequent reading or streaming of the results. The array of cells can be tuned to accommodate the required data path width in a VLSI image processing engine, as present in many consumer imaging devices. FPGA synthesis of the architectures presented in this paper is shown to compute the real-time histogram of images streamed at over 36 megapixels at 30 frames/s by processing 1, 2 or 4 pixels per clock cycle in parallel.
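As a rough software analogy for the cell array described above, the following Python sketch assumes one cell per histogram bin, each holding a counter register and comparing itself against every pixel presented in a clock cycle. The function name, the cell behaviour and the grouping of pixels per cycle are illustrative assumptions; the abstract does not give the actual cell design.

```python
def histogram_cells(pixels, num_bins, pixels_per_cycle):
    # One counter register per cell; no shared memory block is modelled.
    counters = [0] * num_bins
    # Each iteration of the outer loop stands for one clock cycle in which
    # `pixels_per_cycle` pixels are presented to every cell at once.
    for start in range(0, len(pixels), pixels_per_cycle):
        group = pixels[start:start + pixels_per_cycle]
        # In hardware all cells would update simultaneously; here they are
        # visited in turn, which gives the same counts.
        for bin_index in range(num_bins):
            matches = sum(1 for pixel in group if pixel == bin_index)
            counters[bin_index] += matches
    return counters
```

Calling `histogram_cells(pixels, 256, 4)` on a list of 8-bit pixel values would model the 4-pixels-per-cycle configuration; the counts do not depend on the group size, only the number of simulated cycles does.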

Relevance:

40.00% 40.00%

Publisher:

Abstract:

We have optimised the atmospheric radiation algorithm of the FAMOUS climate model on several hardware platforms. The optimisation involved translating the Fortran code to C and restructuring the algorithm around the computation of a single air column. Instead of the existing MPI-based domain decomposition, we used a task queue and a thread pool to schedule the computation of individual columns on the available processors. Finally, four air columns are packed together in a single data structure and computed simultaneously using Single Instruction Multiple Data operations. The modified algorithm runs more than 50 times faster on the CELL’s Synergistic Processing Elements than on its main PowerPC processing element. On Intel-compatible processors, the new radiation code runs 4 times faster. On the tested graphics processor, using OpenCL, we find a speed-up of more than 2.5 times as compared to the original code on the main CPU. Because the radiation code takes more than 60% of the total CPU time, FAMOUS executes more than twice as fast. Our version of the algorithm returns bit-wise identical results, which demonstrates the robustness of our approach. We estimate that this project required around two and a half man-years of work.
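The scheduling pattern described above, a task queue feeding a pool of worker threads with one task per air column, can be sketched in Python as follows. The original work was done in C, and `compute_column` here is a hypothetical toy calculation, not the FAMOUS radiation code. Each result is written to a slot indexed by its column, so the output does not depend on which thread processes which column.

```python
import queue
import threading


def compute_column(column):
    # Hypothetical placeholder for the per-column radiation calculation.
    return sum(level * 0.5 for level in column)


def run_columns(columns, num_workers=4):
    # Fill the task queue with column indices before any worker starts.
    tasks = queue.Queue()
    for index in range(len(columns)):
        tasks.put(index)

    # One preallocated result slot per column.
    results = [None] * len(columns)

    def worker():
        # Pull column indices until the queue is empty.
        while True:
            try:
                index = tasks.get_nowait()
            except queue.Empty:
                return
            results[index] = compute_column(columns[index])

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Because each column is computed by exactly the same arithmetic as in a serial loop, the results match the serial version exactly, which mirrors the bit-wise identical property reported in the abstract.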

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This paper exploits a structural time series approach to model the time pattern of multiple and resurgent food scares and their direct and cross-product impacts on consumer response. A structural time series Almost Ideal Demand System (STS-AIDS) is embedded in a vector error correction framework to allow for dynamic effects (VEC-STS-AIDS). Italian aggregate household data on meat demand is used to assess the time-varying impact of a resurgent BSE crisis (1996 and 2000) and the 1999 Dioxin crisis. The VEC-STS-AIDS model monitors the short-run impacts and performs satisfactorily in terms of residuals diagnostics, overcoming the major problems encountered by the customary vector error correction approach.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The expression of proteins using recombinant baculoviruses is a mature and widely used technology. However, some aspects of the technology continue to detract from high throughput use and the basis of the final observed expression level is poorly understood. Here, we describe the design and use of a set of vectors developed around a unified cloning strategy that allow parallel expression of target proteins in the baculovirus system as N-terminal or C-terminal fusions. Using several protein kinases as tests we found that amino-terminal fusion to maltose binding protein rescued expression of the poorly expressed human kinase Cot but had only a marginal effect on expression of a well-expressed kinase IKK-2. In addition, MBP fusion proteins were found to be secreted from the expressing cell. Use of a carboxyl-terminal GFP tagging vector showed that fluorescence measurement paralleled expression level and was a convenient readout in the context of insect cell expression, an observation that was further supported with additional non-kinase targets. The expression of the target proteins using the same vectors in vitro showed that differences in expression level were wholly dependent on the environment of the expressing cell and an investigation of the time course of expression showed it could affect substantially the observed expression level for poorly but not well-expressed proteins. Our vector suite approach shows that rapid expression survey can be achieved within the baculovirus system and in addition, goes some way to identifying the underlying basis of the expression level obtained. (c) 2006 Elsevier Inc. All rights reserved.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Time-resolved kinetic studies of the reaction of silylene, SiH2, generated by laser flash photolysis of phenylsilane, have been carried out to obtain rate constants for its bimolecular reaction with HCl. The reaction was studied in the gas phase at 10 Torr total pressure in SF6 bath gas, at five temperatures in the range of 296-611 K. The second-order rate constants fitted the Arrhenius equation: $\log(k/\mathrm{cm^{3}\,molecule^{-1}\,s^{-1}}) = (-11.51 \pm 0.06) + (1.92 \pm 0.47\ \mathrm{kJ\,mol^{-1}})/(RT \ln 10)$. Experiments at other pressures showed that these rate constants were unaffected by pressure in the range of 10-100 Torr, but showed small decreases in value of no more than 20% (±10%) at 1 Torr, at both the highest and lowest temperatures. The data are consistent with formation of an initial weakly bound donor-acceptor complex, which reacts by two parallel pathways. The first is by chlorine-to-silicon H-shift to make vibrationally excited chlorosilane, SiH3Cl*, which yields HSiCl by H2 elimination from silicon. In the second pathway, the complex proceeds via H2 elimination (4-center process) to make chlorosilylene, HSiCl, directly. This interpretation is supported by ab initio quantum calculations carried out at the G3 level, which reveal the direct H2 elimination route for the first time. RRKM modeling predicts the approximate magnitude of the pressure effect but is unable to determine the proportions of each pathway. The experimental data agree with the only previous measurements at room temperature. Comparisons with other reactions of SiH2 are also drawn.
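As a worked illustration of the fitted Arrhenius expression, the sketch below reads it as log10 k = A + B/(RT ln 10), with A = -11.51 and B = 1.92 kJ/mol taken from the abstract, and evaluates only the central values (the quoted uncertainties are ignored). The function name and the use of the CODATA value of the gas constant are assumptions made for this example.

```python
import math

# Molar gas constant in J mol^-1 K^-1 (CODATA value).
R = 8.314462618


def rate_constant(temperature_k, log_a=-11.51, b_kj_mol=1.92):
    # log10(k) = log_a + B / (R * T * ln 10), with B converted from kJ to J.
    exponent = log_a + (b_kj_mol * 1000.0) / (R * temperature_k * math.log(10))
    # Returns k in cm^3 molecule^-1 s^-1.
    return 10.0 ** exponent
```

Because the temperature-dependent term is positive, the computed rate constant decreases slightly as the temperature rises from 296 K to 611 K, which corresponds to a small negative activation energy under this reading of the fit.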

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The aim of the present study was to compare the response of a range of atherogenic and thrombogenic risk markers to two dietary levels of saturated fatty acid (SFA) substitution with monounsaturated fatty acids (MUFA) in students living in a university hall of residence. Although the benefits of such diets have been reported for plasma lipoproteins in high-risk groups, more needs to be known about effects of more modest SFA-MUFA substitutions over the long term and in young healthy adults. In a parallel design over 16 weeks, fifty-one healthy young subjects were randomised to one of two diets: (1) a moderate-MUFA diet in which 16 g dietary SFA/100 g total fatty acids were substituted with MUFA (n 25); (2) a high-MUFA diet in which 33 g dietary SFA/100 g total fatty acids were substituted with MUFA (n 26). All subjects followed an 8-week run-in diet (reference diet), with a fatty acid composition close to the UK average values. There were no differences in plasma lipid responses between the two diets over 16 weeks of the study with similar reductions in total cholesterol (P<0.001) and LDL-cholesterol (P<0.01) in both groups; a small but significant reduction in HDL-cholesterol was also observed in both groups (P<0.01). Platelet responses to ADP (P<0.01) and arachidonic acid (P<0.05) differed with time on the two diets; at 16 weeks, platelet aggregatory response to ADP was significantly lower on the high-MUFA than the moderate-MUFA (P<0.01) diet; ADP responses were also significantly lower within this group at 8 (P< 0.05) and 16 (P< 0.01) weeks compared with baseline. There were no differences in fasting factor VII activity (factors VIII and VIIag), fibrinogen concentration or tissue-type plasminogen activator activity between the diets. 
There were no differences in postprandial factor VIII responses to a standard meal (area under the curve) between the diets after 16 weeks, but postprandial factor VIII response was lower on the high-MUFA diet compared with baseline (P<0.01). In conclusion, a high-MUFA diet sustains potentially beneficial effects on platelet aggregation and postprandial activation of factor VII. Moderate or high substitution of MUFA for SFA achieves similar reductions in fasting blood lipids in young healthy subjects.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This paper presents a parallel Two-Pass Hexagonal (TPA) algorithm, composed of the Linear Hashtable Motion Estimation Algorithm (LHMEA) and Hexagonal Search (HEXBS), for motion estimation. In the TPA, Motion Vectors (MVs) are generated by the first-pass LHMEA and are used as predictors for the second-pass HEXBS motion estimation, which searches only a small number of Macroblocks (MBs). We introduce the hashtable into video processing and complete a parallel implementation. We propose and evaluate parallel implementations of the LHMEA of the TPA on clusters of workstations for real-time video compression. The paper discusses how parallel video coding on load-balanced multiprocessor systems can help, especially for motion estimation, and how load balancing improves performance. The performance of the algorithm is evaluated using standard video sequences, and the results are compared to current algorithms.