336 results for MPI


Relevance: 10.00%

Abstract:

A simulation program has been developed to calculate the power-spectral density of thin avalanche photodiodes (APDs), which are used in optical networks. The program extends the time-domain analysis of the dead-space multiplication model to compute the autocorrelation function of the APD impulse response. However, the computation requires a large amount of memory and is very time consuming. We describe our experiences in parallelizing the code using both MPI and OpenMP. Several array partitioning schemes and scheduling policies are implemented and tested. Our results show that the OpenMP code is scalable up to 64 processors on an SGI Origin 2000 machine and has small average errors.
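The abstract does not say which array partitioning schemes were tested. As an illustration only, the sketch below shows two schemes commonly used when distributing array indices over workers: block (contiguous chunks) and cyclic (round-robin). The function names `block_partition` and `cyclic_partition` are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: two common ways to split n array indices over
# p workers. Function names are hypothetical, not from the paper.

def block_partition(n, p):
    """Contiguous chunks; the first n % p workers get one extra index."""
    base, extra = divmod(n, p)
    parts = []
    start = 0
    for rank in range(p):
        size = base + (1 if rank < extra else 0)
        parts.append(range(start, start + size))
        start += size
    return parts


def cyclic_partition(n, p):
    """Round-robin assignment: worker r gets indices r, r + p, r + 2p, ..."""
    return [range(rank, n, p) for rank in range(p)]


if __name__ == "__main__":
    for name, scheme in (("block", block_partition), ("cyclic", cyclic_partition)):
        print(name, [list(r) for r in scheme(10, 4)])
```

Block partitioning keeps each worker's data contiguous, while cyclic partitioning tends to balance the load better when the cost per index varies systematically along the array.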

Relevance: 10.00%

Abstract:

An important factor for high-speed optical communication is the availability of ultrafast and low-noise photodetectors. Among the semiconductor photodetectors that are commonly used in today’s long-haul and metro-area fiber-optic systems, avalanche photodiodes (APDs) are often preferred over p-i-n photodiodes due to their internal gain, which significantly improves the receiver sensitivity and alleviates the need for optical pre-amplification. Unfortunately, the random nature of carrier impact ionization, the very process that generates the gain, is inherently noisy and results in fluctuations not only in the gain but also in the time response. Recently, we developed a theory characterizing the autocorrelation function of APDs which incorporates the dead-space effect, an effect that is very significant in thin, high-performance APDs. The research extends the time-domain analysis of the dead-space multiplication model to compute the autocorrelation function of the APD impulse response. However, the computation requires a large amount of memory and is very time consuming. In this research, we describe our experiences in parallelizing the code in MPI and OpenMP using CAPTools. Several array partitioning schemes and scheduling policies are implemented and tested. Our results show that the code is scalable up to 64 processors on an SGI Origin 2000 machine and has small average errors.

Relevance: 10.00%

Abstract:

The intrinsically independent steps in the search for optimal codebook cubes in fractal video compression systems are examined and exploited. The design of a suitable parallel algorithm reflecting this concept is presented. The Message Passing Interface (MPI) is chosen as the communication tool for implementing the parallel algorithm on distributed-memory parallel computers. Experimental results show that the parallel algorithm is able to reduce the compression time and achieve a high speed-up without changing the compression ratio or the quality of the decompressed image. A scalability test was also performed, and the results show that this parallel algorithm is scalable.
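The abstract gives no algorithmic detail, so the following is only a simplified sketch of why such a search parallelizes well: each block's best codebook match can be computed without reference to any other block. It assumes a plain squared-error match on short 1-D blocks (real fractal coders also fit a scale and offset and work on 3-D cubes), and it emulates a static, MPI-style assignment of blocks to workers in a single process. All names are hypothetical.

```python
# Simplified sketch, not the paper's algorithm: a nearest-codebook search in
# which every block is matched independently, so the work can be split across
# workers without changing the result. All names are hypothetical.
import random


def sq_error(a, b):
    """Sum of squared differences between two equal-length blocks."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_match(block, codebook):
    """Index of the codebook entry with the smallest squared error."""
    return min(range(len(codebook)), key=lambda j: sq_error(block, codebook[j]))


def serial_search(blocks, codebook):
    return [best_match(b, codebook) for b in blocks]


def partitioned_search(blocks, codebook, n_workers):
    """Emulate a static decomposition: worker r handles blocks r, r + n, ..."""
    result = [None] * len(blocks)
    for rank in range(n_workers):
        for i in range(rank, len(blocks), n_workers):
            result[i] = best_match(blocks[i], codebook)
    return result


if __name__ == "__main__":
    rng = random.Random(0)
    codebook = [[rng.random() for _ in range(8)] for _ in range(16)]
    blocks = [[rng.random() for _ in range(8)] for _ in range(40)]
    assert partitioned_search(blocks, codebook, 4) == serial_search(blocks, codebook)
    print("partitioned result matches serial result")
```

Because the partitioned search returns exactly the same matches as the serial one, the encoded output is unchanged, which is consistent with the abstract's claim that compression ratio and image quality are unaffected. In an actual MPI program each rank would run its own share of the loop and the results would be collected with a call such as `MPI_Gather`.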

Relevance: 10.00%

Abstract:

A FORTRAN 90 program is presented which calculates the total cross sections and the electron energy spectra of the singly and doubly differential cross sections for the single target ionization of neutral atoms ranging from hydrogen up to and including argon. The code is applicable to both high- and low-Z projectile impact in fast ion-atom collisions. The theoretical models provided for the program user are based on two quantum mechanical approximations which have proved to be very successful in the study of ionization in ion-atom collisions: the continuum-distorted-wave (CDW) and continuum-distorted-wave eikonal-initial-state (CDW-EIS) approximations. The codes presented here extend previously published codes for single ionization of target hydrogen [Crothers and McCartney, Comput. Phys. Commun. 72 (1992) 288], target helium [Nesbitt, O'Rourke and Crothers, Comput. Phys. Commun. 114 (1998) 385] and target atoms ranging from lithium to neon [O'Rourke, McSherry and Crothers, Comput. Phys. Commun. 131 (2000) 129]. Cross sections for all of these target atoms may be obtained as limiting cases from the present code.

Program summary

Title of program: ARGON

Catalogue identifier: ADSE

Program summary URL: http://cpc.cs.qub.ac.uk/cpc/summaries/ADSE

Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland

Licensing provisions: none

Computer for which the program is designed and others on which it is operable: Computers: Four by 200 MHz Pentium Pro Linux server, DEC Alpha 21164; Four by 400 MHz Pentium 2 Xeon 450 Linux server, IBM SP2 and SUN Enterprise 3500. Installations: Queen's University, Belfast

Operating systems under which the program has been tested: Red-hat Linux 5.2, Digital UNIX Version 4.0d, AIX, Solaris SunOS 5.7

Compilers: PGI workstations, DEC CAMPUS

Programming language used: FORTRAN 90 with MPI directives

No. of bits in a word: 64, except on Linux servers 32

Number of processors used: any number

Has the code been vectorized or parallelized?: Parallelized using MPI

No. of bytes in distributed program, including test data, etc.: 32 189

Distribution format: tar gzip file

Keywords: Single ionization, cross sections, continuum-distorted-wave model, continuum-distorted-wave eikonal-initial-state model, target atoms, wave treatment

Nature of physical problem: The code calculates total and differential cross sections for the single ionization of target atoms ranging from hydrogen up to and including argon by both light and heavy ion impact.

Method of solution: ARGON allows the user to calculate the cross sections using either the CDW or CDW-EIS [J. Phys. B 16 (1983) 3229] models within the wave treatment.

Restrictions on the complexity of the program: Both the CDW and CDW-EIS models are two-state perturbative approximations.

Typical running time: Times vary according to input data and number of processors. For one processor the test input data for double differential cross sections (40 points) took less than one second, whereas the test input for total cross sections (20 points) took 32 minutes.

Unusual features of the program: none

(C) 2003 Elsevier B.V. All rights reserved.

Relevance: 10.00%

Abstract:

Aim: The aim of this study is to assess the murine heart of normal embryos, neonates, and juveniles using high-frequency ultrasound. Methods: Diastolic function was measured with the E/A ratio (E wave velocity/A wave velocity) and isovolumetric relaxation time (IRT); systolic function with isovolumetric contraction time (ICT), percentage fractional shortening (FS%), and percentage ejection fraction (EF%). Global cardiac performance was quantified using the myocardial performance index (MPI). Results: Isovolumetric relaxation time remained stable from E10.5 to 3 weeks. Systolic function (ICT) improved with gestation and remained stable from E18.5 onward. Myocardial performance index showed improvement in embryonic life (0.82-0.63) and then stabilized from 1 to 3 weeks (0.60-0.58). Percentage ejection fraction remained high during gestation (77%-69%) and then decreased from neonate to juvenile (68%-51%). Conclusion: The ultrasound biomicroscope allows for noninvasive, in-depth assessment of cardiac function in embryos and pups. Detailed physiological and functional cardiac readouts can be obtained, which is invaluable for comparison with mouse models of disease.
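The abstract does not state how the indices were computed. For orientation, the standard echocardiographic definitions are given below; they are assumed here, not confirmed by the source. The symbols ET (ejection time), LVIDd and LVIDs (left-ventricular internal diameter in diastole and systole), and EDV and ESV (end-diastolic and end-systolic volume) do not appear in the abstract.

```latex
\mathrm{MPI} = \frac{\mathrm{ICT} + \mathrm{IRT}}{\mathrm{ET}}, \qquad
\mathrm{FS\%} = \frac{\mathrm{LVIDd} - \mathrm{LVIDs}}{\mathrm{LVIDd}} \times 100, \qquad
\mathrm{EF\%} = \frac{\mathrm{EDV} - \mathrm{ESV}}{\mathrm{EDV}} \times 100
```

Under this definition a lower MPI means that less of the cardiac cycle is spent in the isovolumetric phases relative to ejection, which is why the reported fall from 0.82 to 0.63 is described as an improvement.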

Relevance: 10.00%

Abstract:

Performance evaluation of parallel software and architectural exploration of innovative hardware support face a common challenge with emerging manycore platforms: they are limited by the slow running time and the low accuracy of software simulators. Manycore FPGA prototypes are difficult to build, but they offer great rewards. Software running on such prototypes runs orders of magnitude faster than on current simulators. Moreover, researchers gain significant architectural insight during the modeling process. We use the Formic FPGA prototyping board [1], which specifically targets scalable and cost-efficient multi-board prototyping, to build and test a 64-board model of a 512-core, MicroBlaze-based, non-coherent hardware prototype with a full network-on-chip in a 3D-mesh topology. We expand the hardware architecture to include the ARM Versatile Express platforms and build a 520-core heterogeneous prototype of 8 Cortex-A9 cores and 512 MicroBlaze cores. We then develop an MPI library for the prototype and evaluate it extensively using several bare-metal and MPI benchmarks. We find that our processor prototype is highly scalable, faithfully models single-chip multicore architectures, and is a very efficient platform for parallel programming research, being 50,000 times faster than software simulation.

Relevance: 10.00%

Abstract:

Enhancing sampling and analyzing simulations are central issues in molecular simulation. Recently, we introduced PLUMED, an open-source plug-in that provides some of the most popular molecular dynamics (MD) codes with implementations of a variety of enhanced sampling algorithms and collective variables (CVs). The rapid changes in this field, in particular new directions in enhanced sampling and dimensionality reduction together with new hardware, require a code that is more flexible and more efficient. We therefore present PLUMED 2 here, a complete rewrite of the code in an object-oriented programming language (C++). This new version introduces greater flexibility and greater modularity, which both extend its core capabilities and make it far easier to add new methods and CVs. It also has a simpler interface with the MD engines and provides a single software library containing both tools and core facilities. Ultimately, the new code better serves the ever-growing community of users and contributors in coping with the new challenges arising in the field.

Program summary

Program title: PLUMED 2

Catalogue identifier: AEEE_v2_0

Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEEE_v2_0.html

Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland

Licensing provisions: Yes

No. of lines in distributed program, including test data, etc.: 700646

No. of bytes in distributed program, including test data, etc.: 6618136

Distribution format: tar.gz

Programming language: ANSI-C++.

Computer: Any computer capable of running an executable produced by a C++ compiler.

Operating system: Linux operating system, Unix OSs.

Has the code been vectorized or parallelized?: Yes, parallelized using MPI.

RAM: Depends on the number of atoms, the method chosen and the collective variables used.

Classification: 3, 7.7, 23.

Catalogue identifier of previous version: AEEE_v1_0.

Journal reference of previous version: Comput. Phys. Comm. 180 (2009) 1961.

External routines: GNU libmatheval, LAPACK, BLAS, MPI.

(C) 2013 Elsevier B.V. All rights reserved.

Relevance: 10.00%

Abstract:

We present DRASync, a region-based allocator that implements a global address space abstraction for MPI programs with pointer-based data structures. The main features of DRASync are: (a) it amortizes communication among nodes to allow efficient parallel allocation in a global address space; (b) it takes advantage of bulk deallocation and good locality with pointer-based data structures; (c) it supports ownership semantics of regions by nodes akin to reader–writer locks, which makes for a high-level, intuitive synchronization tool in MPI programs, without sacrificing message-passing performance. We evaluate DRASync against a state-of-the-art distributed allocator and find that it produces comparable performance while offering a higher-level abstraction to programmers.

Relevance: 10.00%

Abstract:

Doctoral thesis, Biochemistry, Faculdade de Ciências e Tecnologia, Universidade do Algarve, 2015

Relevance: 10.00%

Abstract:

Thesis (Master's)--University of Washington, 2012

Relevance: 10.00%

Abstract:

Embedded real-time applications increasingly have high computation requirements that must be met within specific deadlines, yet their processing patterns are highly variable, depending on the data available at a given instant. The current trend of providing parallel processing in the embedded domain delivers higher processing power; however, it does not address the variability in the processing pattern. Dimensioning each device for its worst-case scenario implies lower average utilization and a larger amount of available but unusable processing capacity in the overall system. A solution to this problem is to extend the parallel execution of applications, allowing networked nodes to distribute their workload to neighbour nodes in peak situations. In this context, this report proposes a framework for developing parallel and distributed real-time embedded applications, transparently using OpenMP and the Message Passing Interface (MPI), within a programming model based on OpenMP. The technical report also devises an integrated timing model, which enables structured reasoning about the timing behaviour of these hybrid architectures.

Relevance: 10.00%

Abstract:

INTRODUCTION: Diabetic patients are at high risk for coronary artery disease (CAD), which is the leading cause of death in this population. The Swiss Society of Endocrinology-Diabetology (SSED) recommends CAD screening for diabetic patients with ≥ 2 additional cardiovascular risk factors (CVRF), by stress echocardiography (SE) or myocardial perfusion imaging (MPI). The aim of this study was to assess the application of these guidelines and the treatment of CVRF in the diabetes outpatient clinics of the five Swiss University Hospitals. METHODS: The study was initiated in Lausanne and the study questionnaires were circulated to the endocrinologists of the five Swiss University Hospitals. Practitioners were asked to include consecutive patients attending the diabetes outpatient clinics over one month. Prevalence of CAD, screening methods for CAD, prevalence of CVRF, biological analyses over the last 6 months and medical therapy were recorded. RESULTS: A total of 302 subjects were included. The mean age was 53 ± 14 years, 68% had type 2 diabetes, 27% type 1 and 5% other types. Among T2DM with ≥ 2 CVRF, 45% were screened for CAD according to SSED guidelines. In T2DM 25% had blood pressure ≤ 130/80 mm Hg, 15% a lipid profile within target, 23% HbA1c ≤ 7.0%. Overall, 2% achieved all 3 targets. CONCLUSIONS: Only 45% of T2DM with ≥ 2 CVRF were screened for CAD according to SSED guidelines and 2% of T2DM had proper control over all CVRF. Efforts are still necessary to improve CAD prevention and screening of diabetic patients in Swiss University Hospitals.

Relevance: 10.00%

Abstract:

This thesis presents an implementation of lazy task creation for distributed-memory multiprocessor systems. It provides a subset of the functionality of the Message-Passing Interface and makes it possible to parallelize certain problems that are difficult to partition statically, by means of dynamic partitioning and load balancing. To do so, it builds on Multilisp, a dialect of Scheme oriented toward parallel processing, and implements on top of it an MPI-like interface that enables distributed multi-process computation. The system offers a much richer and more expressive language than C and considerably reduces the work a programmer needs to do to develop programs equivalent to those written in MPI. Finally, dynamic partitioning makes it possible to design programs that would be very complex to implement in MPI. Tests were carried out on a local 16-processor system and on a 16-processor cluster; the system achieves good speed-ups compared with equivalent sequential programs and acceptable performance relative to MPI. This thesis demonstrates that using futures as a dynamic partitioning technique is feasible on distributed-memory multiprocessors.

Relevance: 10.00%

Abstract:

This paper describes our plans to evaluate the present state of affairs concerning parallel programming and its systems. Three subprojects are proposed: a survey among programmers and scientists, a comparison of parallel programming systems using a standard set of test programs, and a wiki resource for the parallel programming community - the Parawiki. We would like to invite you to participate and turn these subprojects into true community efforts.

Relevance: 10.00%

Abstract:

In this publication, we report on an online survey that was carried out among parallel programmers. More than 250 people worldwide have submitted answers to our questions, and their responses are analyzed here. Although not statistically sound, the data we provide give useful insights about which parallel programming systems and languages are known and in actual use. For instance, the collected data indicate that for our survey group MPI and (to a lesser extent) C are the most widely used parallel programming system and language, respectively.