10 resultados para Running time
em QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast
Resumo:
A FORTRAN 90 program is presented which calculates the total cross sections, and the electron energy spectra of the singly and doubly differential cross sections for the single target ionization of neutral atoms ranging from hydrogen up to and including argon. The code is applicable for the case of both high and low Z projectile impact in fast ion-atom collisions. The theoretical models provided for the program user are based on two quantum mechanical approximations which have proved to be very successful in the study of ionization in ion-atom collisions. These are the continuum-distorted-wave (CDW) and continuum-distorted-wave eikonal-initial-state (CDW-EIS) approximations. The codes presented here extend previously published. codes for single ionization of. target hydrogen [Crothers and McCartney, Comput. Phys. Commun. 72 (1992) 288], target helium [Nesbitt, O'Rourke and Crothers, Comput. Phys. Commun. 114 (1998) 385] and target atoms ranging from lithium to neon [O'Rourke, McSherry and Crothers, Comput. Phys. Commun. 131 (2000) 129]. Cross sections for all of these target atoms may be obtained as limiting cases from the present code. Title of program: ARGON Catalogue identifier: ADSE Program summary URL: http://cpc.cs.qub.ac.uk/cpc/summaries/ADSE Program obtainable from: CPC Program Library Queen's University of Belfast, N. Ireland Licensing provisions: none Computer for which the program is designed and others on which it is operable: Computers: Four by 200 MHz Pro Pentium Linux server, DEC Alpha 21164; Four by 400 MHz Pentium 2 Xeon 450 Linux server, IBM SP2 and SUN Enterprise 3500 Installations: Queen's University, Belfast Operating systems under which the program has been tested: Red-hat Linux 5.2, Digital UNIX Version 4.0d, AIX, Solaris SunOS 5.7 Compilers: PGI workstations, DEC CAMPUS Programming language used: FORTRAN 90 with MPI directives No. of bits in a word: 64, except on Linux servers 32 Number of processors used: any number Has the code been vectorized or parallelized? Parallelized using MPI No. of bytes in distributed program, including test data, etc.: 32 189 Distribution format: tar gzip file Keywords: Single ionization, cross sections, continuum-distorted-wave model, continuum- distorted-wave eikonal-initial-state model, target atoms, wave treatment Nature of physical problem: The code calculates total, and differential cross sections for the single ionization of target atoms ranging from hydrogen up to and including argon by both light and heavy ion impact. Method of solution: ARGON allows the user to calculate the cross sections using either the CDW or CDW-EIS [J. Phys. B 16 (1983) 3229] models within the wave treatment. Restrictions on the complexity of the program: Both the CDW and CDW-EIS models are two-state perturbative approximations. Typical running time: Times vary according to input data and number of processors. For one processor the test input data for double differential cross sections (40 points) took less than one second, whereas the test input for total cross sections (20 points) took 32 minutes. Unusual features of the program: none (C) 2003 Elsevier B.V All rights reserved.
Resumo:
This paper concerns randomized leader election in synchronous distributed networks. A distributed leader election algorithm is presented for complete n-node networks that runs in O(1) rounds and (with high probability) takes only O(n-vlog3/2n) messages to elect a unique leader (with high probability). This algorithm is then extended to solve leader election on any connected non-bipartiten-node graph G in O(t(G)) time and O(t(G)n-vlog3/2n) messages, where t(G) is the mixing time of a random walk on G. The above result implies highly efficient (sublinear running time and messages) leader election algorithms for networks with small mixing times, such as expanders and hypercubes. In contrast, previous leader election algorithms had at least linear message complexity even in complete graphs. Moreover, super-linear message lower bounds are known for time-efficientdeterministic leader election algorithms. Finally, an almost-tight lower bound is presented for randomized leader election, showing that O(n-v) messages are needed for any O(1) time leader election algorithm which succeeds with high probability. It is also shown that O(n 1/3) messages are needed by any leader election algorithm that succeeds with high probability, regardless of the number of the rounds. We view our results as a step towards understanding the randomized complexity of leader election in distributed networks.
Resumo:
Performance evaluation of parallel software and architectural exploration of innovative hardware support face a common challenge with emerging manycore platforms: they are limited by the slow running time and the low accuracy of software simulators. Manycore FPGA prototypes are difficult to build, but they offer great rewards. Software running on such prototypes runs orders of magnitude faster than current simulators. Moreover, researchers gain significant architectural insight during the modeling process. We use the Formic FPGA prototyping board [1], which specifically targets scalable and cost-efficient multi-board prototyping, to build and test a 64-board model of a 512-core, MicroBlaze-based, non-coherent hardware prototype with a full network-on-chip in a 3D-mesh topology. We expand the hardware architecture to include the ARM Versatile Express platforms and build a 520-core heterogeneous prototype of 8 Cortex-A9 cores and 512 MicroBlaze cores. We then develop an MPI library for the prototype and evaluate it extensively using several bare-metal and MPI benchmarks. We find that our processor prototype is highly scalable, models faithfully single-chip multicore architectures, and is a very efficient platform for parallel programming research, being 50,000 times faster than software simulation.
Resumo:
This paper concerns randomized leader election in synchronous distributed networks. A distributed leader election algorithm is presented for complete n-node networks that runs in O(1) rounds and (with high probability) uses only O(√ √nlog<sup>3/2</sup>n) messages to elect a unique leader (with high probability). When considering the "explicit" variant of leader election where eventually every node knows the identity of the leader, our algorithm yields the asymptotically optimal bounds of O(1) rounds and O(. n) messages. This algorithm is then extended to one solving leader election on any connected non-bipartite n-node graph G in O(τ(. G)) time and O(τ(G)n√log<sup>3/2</sup>n) messages, where τ(. G) is the mixing time of a random walk on G. The above result implies highly efficient (sublinear running time and messages) leader election algorithms for networks with small mixing times, such as expanders and hypercubes. In contrast, previous leader election algorithms had at least linear message complexity even in complete graphs. Moreover, super-linear message lower bounds are known for time-efficient deterministic leader election algorithms. Finally, we present an almost matching lower bound for randomized leader election, showing that Ω(n) messages are needed for any leader election algorithm that succeeds with probability at least 1/. e+. ε, for any small constant ε. >. 0. We view our results as a step towards understanding the randomized complexity of leader election in distributed networks.
Resumo:
Authenticated encryption algorithms protect both the confidentiality and integrity of messages in a single processing pass. We show how to utilize the L◦P ◦S transform of the Russian GOST R 34.11-2012 standard hash “Streebog” to build an efficient, lightweight algorithm for Authenticated Encryption with Associated Data (AEAD) via the Sponge construction. The proposed algorithm “StriBob” has attractive security properties, is faster than the Streebog hash alone, twice as fast as the GOST 28147-89 encryption algorithm, and requires only a modest amount of running-time memory. StriBob is a Round 1 candidate in the CAESAR competition.
Resumo:
We use images of high spatial and temporal resolution, obtained using both ground- and space-based instrumentation, to investigate the role magnetic field inclination angles play in the propagation characteristics of running penumbral waves in the solar chromosphere. Analysis of a near-circular sunspot, close to the center of the solar disk, reveals a smooth rise in oscillatory period as a function of distance from the umbral barycenter. However, in one directional quadrant, corresponding to the north direction, a pronounced kink in the period-distance diagram is found. Utilizing a combination of the inversion of magnetic Stokes vectors and force-free field extrapolations, we attribute this behavior to the cut-off frequency imposed by the magnetic field geometry in this location. A rapid, localized inclination of the magnetic field lines in the north direction results in a faster increase in the dominant periodicity due to an accelerated reduction in the cut-off frequency. For the first time, we reveal how the spatial distribution of dominant wave periods, obtained with one of the highest resolution solar instruments currently available, directly reflects the magnetic geometry of the underlying sunspot, thus opening up a wealth of possibilities in future magnetohydrodynamic seismology studies. In addition, the intrinsic relationships we find between the underlying magnetic field geometries connecting the photosphere to the chromosphere, and the characteristics of running penumbral waves observed in the upper chromosphere, directly supports the interpretation that running penumbral wave phenomena are the chromospheric signature of upwardly propagating magneto-acoustic waves generated in the photosphere.
Resumo:
We present a mathematically rigorous Quality-of-Service (QoS) metric which relates the achievable quality of service metric (QoS) for a real-time analytics service to the server energy cost of offering the service. Using a new iso-QoS evaluation methodology, we scale server resources to meet QoS targets and directly rank the servers in terms of their energy-efficiency and by extension cost of ownership. Our metric and method are platform-independent and enable fair comparison of datacenter compute servers with significant architectural diversity, including micro-servers. We deploy our metric and methodology to compare three servers running financial option pricing workloads on real-life market data. We find that server ranking is sensitive to data inputs and desired QoS level and that although scale-out micro-servers can be up to two times more energy-efficient than conventional heavyweight servers for the same target QoS, they are still six times less energy efficient than high-performance computational accelerators.
Resumo:
We present a rigorous methodology and new metrics for fair comparison of server and microserver platforms. Deploying our methodology and metrics, we compare a microserver with ARM cores against two servers with ×86 cores running the same real-time financial analytics workload. We define workload-specific but platform-independent performance metrics for platform comparison, targeting both datacenter operators and end users. Our methodology establishes that a server based on the Xeon Phi co-processor delivers the highest performance and energy efficiency. However, by scaling out energy-efficient microservers, we achieve competitive or better energy efficiency than a power-equivalent server with two Sandy Bridge sockets, despite the microserver's slower cores. Using a new iso-QoS metric, we find that the ARM microserver scales enough to meet market throughput demand, that is, a 100% QoS in terms of timely option pricing, with as little as 55% of the energy consumed by the Sandy Bridge server.
Resumo:
FPGAs and GPUs are often used when real-time performance in video processing is required. An accelerated processor is chosen based on task-specific priorities (power consumption, processing time and detection accuracy), and this decision is normally made once at design time. All three characteristics are important, particularly in battery-powered systems. Here we propose a method for moving selection of processing platform from a single design-time choice to a continuous run time one.We implement Histogram of Oriented Gradients (HOG) detectors for cars and people and Mixture of Gaussians (MoG) motion detectors running across FPGA, GPU and CPU in a heterogeneous system. We use this to detect illegally parked vehicles in urban scenes. Power, time and accuracy information for each detector is characterised. An anomaly measure is assigned to each detected object based on its trajectory and location, when compared to learned contextual movement patterns. This drives processor and implementation selection, so that scenes with high behavioural anomalies are processed with faster but more power hungry implementations, but routine or static time periods are processed with power-optimised, less accurate, slower versions. Real-time performance is evaluated on video datasets including i-LIDS. Compared to power-optimised static selection, automatic dynamic implementation mapping is 10% more accurate but draws 12W extra power in our testbed desktop system.