99 resultados para PROCESSOR
Resumo:
A FORTRAN 90 program is presented which calculates the total cross sections, and the electron energy spectra of the singly and doubly differential cross sections for the single target ionization of neutral atoms ranging from hydrogen up to and including argon. The code is applicable for the case of both high and low Z projectile impact in fast ion-atom collisions. The theoretical models provided for the program user are based on two quantum mechanical approximations which have proved to be very successful in the study of ionization in ion-atom collisions. These are the continuum-distorted-wave (CDW) and continuum-distorted-wave eikonal-initial-state (CDW-EIS) approximations. The codes presented here extend previously published. codes for single ionization of. target hydrogen [Crothers and McCartney, Comput. Phys. Commun. 72 (1992) 288], target helium [Nesbitt, O'Rourke and Crothers, Comput. Phys. Commun. 114 (1998) 385] and target atoms ranging from lithium to neon [O'Rourke, McSherry and Crothers, Comput. Phys. Commun. 131 (2000) 129]. Cross sections for all of these target atoms may be obtained as limiting cases from the present code. Title of program: ARGON Catalogue identifier: ADSE Program summary URL: http://cpc.cs.qub.ac.uk/cpc/summaries/ADSE Program obtainable from: CPC Program Library Queen's University of Belfast, N. Ireland Licensing provisions: none Computer for which the program is designed and others on which it is operable: Computers: Four by 200 MHz Pro Pentium Linux server, DEC Alpha 21164; Four by 400 MHz Pentium 2 Xeon 450 Linux server, IBM SP2 and SUN Enterprise 3500 Installations: Queen's University, Belfast Operating systems under which the program has been tested: Red-hat Linux 5.2, Digital UNIX Version 4.0d, AIX, Solaris SunOS 5.7 Compilers: PGI workstations, DEC CAMPUS Programming language used: FORTRAN 90 with MPI directives No. of bits in a word: 64, except on Linux servers 32 Number of processors used: any number Has the code been vectorized or parallelized? Parallelized using MPI No. of bytes in distributed program, including test data, etc.: 32 189 Distribution format: tar gzip file Keywords: Single ionization, cross sections, continuum-distorted-wave model, continuum- distorted-wave eikonal-initial-state model, target atoms, wave treatment Nature of physical problem: The code calculates total, and differential cross sections for the single ionization of target atoms ranging from hydrogen up to and including argon by both light and heavy ion impact. Method of solution: ARGON allows the user to calculate the cross sections using either the CDW or CDW-EIS [J. Phys. B 16 (1983) 3229] models within the wave treatment. Restrictions on the complexity of the program: Both the CDW and CDW-EIS models are two-state perturbative approximations. Typical running time: Times vary according to input data and number of processors. For one processor the test input data for double differential cross sections (40 points) took less than one second, whereas the test input for total cross sections (20 points) took 32 minutes. Unusual features of the program: none (C) 2003 Elsevier B.V All rights reserved.
Resumo:
The utilization of the computational Grid processor network has become a common method for researchers and scientists without access to local processor clusters to avail of the benefits of parallel processing for compute-intensive applications. As a result, this demand requires effective and efficient dynamic allocation of available resources. Although static scheduling and allocation techniques have proved effective, the dynamic nature of the Grid requires innovative techniques for reacting to change and maintaining stability for users. The dynamic scheduling process requires quite powerful optimization techniques, which can themselves lack the performance required in reaction time for achieving an effective schedule solution. Often there is a trade-off between solution quality and speed in achieving a solution. This paper presents an extension of a technique used in optimization and scheduling which can provide the means of achieving this balance and improves on similar approaches currently published.
Resumo:
High-speed field-programmable gate array (FPGA) implementations of an adaptive least mean square (LMS) filter with application in an electronic support measures (ESM) digital receiver, are presented. They employ "fine-grained" pipelining, i.e., pipelining within the processor and result in an increased output latency when used in the LMS recursive system. Therefore, the major challenge is to maintain a low latency output whilst increasing the pipeline stage in the filter for higher speeds. Using the delayed LMS (DLMS) algorithm, fine-grained pipelined FPGA implementations using both the direct form (DF) and the transposed form (TF) are considered and compared. It is shown that the direct form LMS filter utilizes the FPGA resources more efficiently thereby allowing a 120 MHz sampling rate.
Resumo:
In this paper, a parallel-matching processor architecture with early jump-out (EJO) control is proposed to carry out high-speed biometric fingerprint database retrieval. The processor performs the fingerprint retrieval by using minutia point matching. An EJO method is applied to the proposed architecture to speed up the large database retrieval. The processor is implemented on a Xilinx Virtex-E, and occupies 6,825 slices and runs at up to 65 MHz. The software/hardware co-simulation benchmark with a database of 10,000 fingerprints verifies that the matching speed can achieve the rate of up to 1.22 million fingerprints per second. EJO results in about a 22% gain in computing efficiency.
Resumo:
A high-sample rate 3D median filtering processor architecture is proposed, based on a novel 3D median filtering algorithm, that can reduce the computing complexity in comparison with the traditional bubble sorting algorithm. A 3 x 3 x 3 filter processor is implemented in VHDL, and the simulation verifies that the processor can process a 128 x 128 x 96 MRI image in 0.03 seconds while running at 50 MHz.
Resumo:
The technical challenges in the design and programming of signal processors for multimedia communication are discussed. The development of terminal equipment to meet such demand presents a significant technical challenge, considering that it is highly desirable that the equipment be cost effective, power efficient, versatile, and extensible for future upgrades. The main challenges in the design and programming of signal processors for multimedia communication are, general-purpose signal processor design, application-specific signal processor design, operating systems and programming support and application programming. The size of FFT is programmable so that it can be used for various OFDM-based communication systems, such as digital audio broadcasting (DAB), digital video broadcasting-terrestrial (DVB-T) and digital video broadcasting-handheld (DVB-H). The clustered architecture design and distributed ping-pong register files in the PAC DSP raise new challenges of code generation.
Resumo:
The efficient generation of parallel code for multi-processor environments, is a large and complicated issue. Attempts to address this problem have always resulted in significant input from users. Because of constraints on user knowledge and time, the automation of the process is a promising and practically important research area. In recent years heuristic approaches have been used to capture available knowledge and make it available for the parallelisation process. Here, the introduction of a novel approach of neural network techniques is combined with an expert system technique to enhance the availability of knowledge to aid in the automatic generation of parallel code.
Resumo:
A novel most significant digit first CORDIC architecture is presented that is suitable for the VLSI design of systolic array processor cells for performing QR decomposition. This is based on an on-line CORDIC algorithm with a constant scale factor and a latency independent of the wordlength. This has been derived through the extension of previously published CORDIC algorithms. It is shown that simplifying the calculation of convergence bounds also greatly simplifies the derivation of suitable VLSI architectures. Design studies, based on a 0.35-µ CMOS standard cell process, indicate that 20 such QR processor cells operating at rates suitable for radar beamfoming can be readily accommodated on a single chip.
Resumo:
This paper presents the design of a novel single chip adaptive beamformer capable of performing 50 Gflops, (Giga-floating-point operations/second). The core processor is a QR array implemented on a fully efficient linear systolic architecture, derived using a mapping that allows individual processors for boundary and internal cell operations. In addition, the paper highlights a number of rapid design techniques that have been used to realise this system. These include an architecture synthesis tool for quickly developing the circuit architecture and the utilisation of a library of parameterisable silicon intellectual property (IP) cores, to rapidly develop detailed silicon designs.
Resumo:
Screening for residues of anabolic steroids frequently requires extraction from tissues and fluids before analysis. Chemical procedures for these extractions can be complicated, expensive to perform and not ideal for the simultaneous extraction of analytes with different solubilities. Extraction by multi-immunoaffinity chromatography (MIAC) may be used as an alternative. Samples are passed through a column containing a range of antibodies immobilized on an inert support. The desired analytes are bound to their respective antibodies, washed and then eluted by a suitable solvent. The purified extracts can then be incorporated into the analytical tests, The analytes that can be extracted presently are alpha-nortestosterone, zeranol, trenbolone, diethylstilboestrol, boldenone and dexamethasone. Manually, the MIAC procedure is limited to about six columns per operator but bq automating the process using a robotic sample processor (RSP), 48 columns can be run simultaneously during the day or night. The RSP has also been adapted to transfer extracts and reagents on to ELISA plates. The automated system has proved to be a robust and reliable means of screening large numbers of samples for anabolic agents with minimal manual input
Resumo:
This paper describes the design, application, and evaluation of a user friendly, flexible, scalable and inexpensive Advanced Educational Parallel (AdEPar) digital signal processing (DSP) system based on TMS320C25 digital processors to implement DSP algorithms. This system will be used in the DSP laboratory by graduate students to work on advanced topics such as developing parallel DSP algorithms. The graduating senior students who have gained some experience in DSP can also use the system. The DSP laboratory has proved to be a useful tool in the hands of the instructor to teach the mathematically oriented topics of DSP that are often difficult for students to grasp. The DSP laboratory with assigned projects has greatly improved the ability of the students to understand such complex topics as the fast Fourier transform algorithm, linear and circular convolution, the theory and design of infinite impulse response (IIR) and finite impulse response (FIR) filters. The user friendly PC software support of the AdEPar system makes it easy to develop DSP programs for students. This paper gives the architecture of the AdEPar DSP system. The communication between processors and the PC-DSP processor communication are explained. The parallel debugger kernels and the restrictions of the system are described. The programming in the AdEPar is explained, and two benchmarks (parallel FFT and DES) are presented to show the system performance.
Resumo:
Apparatus for scanning a moving object includes a visible waveband sensor oriented to collect a series of images of the object as it passes through a field of view. An image processor uses the series of images to form a composite image. The image processor stores image pixel data for a current image and predecessor image in the series. It uses information in the current image and its predecessor to analyse images and derive likelihood measures indicating probabilities that current image pixels correspond to parts of the object. The image processor estimates motion between the current image and its predecessor from likelihood weighted pixels. It generates the composite image from frames positioned according to respective estimates of object image motion. Image motion may alternatively be detected be a speed sensor such as Doppler radar sensing object motion directly and providing image timing signals
Resumo:
Simultaneous multithreading processors dynamically share processor resources between multiple threads. In general, shared SMT resources may be managed explicitly, for instance, by dynamically setting queue occupation bounds for each thread as in the DCRA and Hill-Climbing policies. Alternatively, resources may be managed implicitly; that is, resource usage is controlled by placing the desired instruction mix in the resources. In this case, the main resource management tool is the instruction fetch policy which must predict the behavior of each thread (branch mispredictions, long-latency loads, etc.) as it fetches instructions.
Resumo:
Speeding up sequential programs on multicores is a challenging problem that is in urgent need of a solution. Automatic parallelization of irregular pointer-intensive codes, exempli?ed by the SPECint codes, is a very hard problem. This paper shows that, with a helping hand, such auto-parallelization is possible and fruitful. This paper makes the following contributions: (i) A compiler framework for extracting pipeline-like parallelism from outer program loops is presented. (ii) Using a light-weight programming model based on annotations, the programmer helps the compiler to ?nd thread-level parallelism. Each of the annotations speci?es only a small piece of semantic information that compiler analysis misses, e.g. stating that a variable is dead at a certain program point. The annotations are designed such that correctness is easily veri?ed. Furthermore, we present a tool for suggesting annotations to the programmer. (iii) The methodology is applied to autoparallelize several SPECint benchmarks. For the benchmark with most parallelism (hmmer), we obtain a scalable 7-fold speedup on an AMD quad-core dual processor. The annotations constitute a parallel programming model that relies extensively on a sequential program representation. Hereby, the complexity of debugging is not increased and it does not obscure the source code. These properties could prove valuable to increase the ef?ciency of parallel programming.
Resumo:
Branch prediction feeds a speculative execution processor core with instructions. Branch mispredictions are inevitable and have negative effects on performance and energy consumption. With the advent of highly accurate conditional branch predictors, nonconditional branch instructions are gaining importance.