955 results for Quadratic, sieve, CUDA, OpenMP, SOC, Tegrak1
Abstract:
Parallelizing code with OpenMP for shared memory systems is relatively easy compared with using message passing for distributed memory systems. Even so, it remains a challenge to use OpenMP to parallelize application codes in a way that yields effective, scalable performance on a shared memory parallel system. We describe an environment that assists the programmer in the various tasks of code parallelization, greatly reducing both the time frame and the level of skill required. The parallelization environment includes a number of tools that address the main tasks of parallelism detection, OpenMP source code generation, debugging and optimization. These tools include a high-quality, fully interprocedural dependence analysis with user interaction capabilities to facilitate the generation of efficient parallel code, an automatic relative debugging tool to identify erroneous user decisions made during that interaction, and performance profiling to identify bottlenecks. Finally, experiences of parallelizing some NASA application codes are presented to illustrate some of the benefits of using the evolving environment.
Abstract:
Employing Bak’s dimension theory, we investigate the nonstable quadratic K-group $K_{1,2n}(A,\Lambda) = G_{2n}(A,\Lambda)/E_{2n}(A,\Lambda)$, $n \geq 3$, where $G_{2n}(A,\Lambda)$ denotes the general quadratic group of rank $n$ over a form ring $(A,\Lambda)$ and $E_{2n}(A,\Lambda)$ its elementary subgroup. Considering form rings as a category with dimension in the sense of Bak, we obtain a dimension filtration $G_{2n}(A,\Lambda) \supseteq G_{2n}^{0}(A,\Lambda) \supseteq G_{2n}^{1}(A,\Lambda) \supseteq \cdots \supseteq E_{2n}(A,\Lambda)$ of the general quadratic group $G_{2n}(A,\Lambda)$ such that $G_{2n}(A,\Lambda)/G_{2n}^{0}(A,\Lambda)$ is Abelian, $G_{2n}^{0}(A,\Lambda) \supseteq G_{2n}^{1}(A,\Lambda) \supseteq \cdots$ is a descending central series, and $G_{2n}^{d(A)}(A,\Lambda) = E_{2n}(A,\Lambda)$ whenever $d(A)$ = (Bass–Serre dimension of $A$) is finite. In particular, $K_{1,2n}(A,\Lambda)$ is solvable when $d(A) < \infty$.
Abstract:
A generic architecture for implementing a QR array processor in silicon is presented. This improves on previous research by considerably simplifying the derivation of timing schedules for a QR system implemented as a folded linear array, where account has to be taken of processor cell latency and timing at the detailed circuit level. The architecture and scheduling derived have been used to create a generator for the rapid design of System-on-a-Chip (SoC) cores for QR decomposition. This is demonstrated through the design of a single-chip architecture for implementing an adaptive beamformer for radar applications. Published in IEEE Transactions on Circuits and Systems Part II: Analog and Digital Signal Processing, April 2003 (not Express Briefs). Parts I and II of the journal have since been reorganised into Regular Papers and Express Briefs.
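As an illustration of the computation such a QR array carries out, the sketch below folds one new data row at a time into an upper-triangular factor R. It assumes Givens rotations, which are commonly used in QR array processors but are not named in the abstract. The function name `givens_qr_update` and the sample data are hypothetical, and the code models only the arithmetic, not the folded-array timing schedule or cell latency that the paper addresses.

```python
import math

def givens_qr_update(R, row):
    """Fold one new data row into the upper-triangular matrix R, in place.

    R is an n x n list of lists; row is a sequence of n values.
    Assumption: Givens rotations are used to annihilate the new row.
    """
    n = len(R)
    x = list(row)
    for i in range(n):
        # Rotation that zeroes x[i] against the diagonal element R[i][i].
        r = math.hypot(R[i][i], x[i])
        if r == 0.0:
            continue  # nothing to rotate in this column
        c, s = R[i][i] / r, x[i] / r
        # Apply the same rotation to the rest of row i and the incoming row.
        for j in range(i, n):
            rij, xj = R[i][j], x[j]
            R[i][j] = c * rij + s * xj
            x[j] = -s * rij + c * xj
    return R

# Hypothetical 2 x 2 example: stream the rows of A = [[3, 1], [4, 2]].
R = [[0.0, 0.0], [0.0, 0.0]]
for data_row in ([3.0, 1.0], [4.0, 2.0]):
    givens_qr_update(R, data_row)
```

After both rows are absorbed, R satisfies R^T R = A^T A, which is the property an adaptive beamformer relies on when it solves its least-squares problem from R alone.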
Abstract:
A novel application-specific instruction set processor (ASIP) for use in the construction of modern signal processing systems is presented. This is a flexible device that can be used in the construction of array processor systems for the real-time implementation of functions such as singular-value decomposition (SVD) and QR decomposition (QRD), as well as other important matrix computations. It uses a coordinate rotation digital computer (CORDIC) module to perform arithmetic operations, and several approaches are adopted to achieve high performance, including pipelining of the micro-rotations, the use of parallel instructions and a dual-bus architecture. In addition, a novel method for scale factor correction is presented which only needs to be applied once at the end of the computation. This also reduces computation time and enhances performance. Methods are described which allow this processor to be used in reduced dimension (i.e., folded) array processor structures that allow tradeoffs between hardware and performance. The net result is a flexible matrix computational processing element (PE) whose functionality can be changed under program control for use in a wider range of scenarios than previous work. Details are presented of the results of a design study, which considers the application of this decomposition PE architecture in a combined SVD/QRD system and demonstrates that a combination of high performance and efficient silicon implementation is achievable. © 2005 IEEE.
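To make the idea of CORDIC micro-rotations with a single, deferred scale-factor correction concrete, here is a minimal floating-point sketch of the textbook rotation-mode algorithm. It is not the paper's novel correction method or its pipelined hardware. The function name `cordic_rotate` and the iteration count are assumptions chosen for illustration.

```python
import math

def cordic_rotate(x, y, angle, iterations=24):
    """Rotate (x, y) by `angle` radians with textbook rotation-mode CORDIC.

    Converges for |angle| up to roughly 1.74 rad. Each micro-rotation uses a
    power-of-two factor (a shift in hardware); the accumulated gain is
    divided out once, after the last micro-rotation.
    """
    z = angle
    gain = 1.0
    for i in range(iterations):
        d = 1.0 if z >= 0.0 else -1.0      # direction of this micro-rotation
        t = 2.0 ** -i                      # tan of the elementary angle
        x, y = x - d * y * t, y + d * x * t
        z -= d * math.atan(t)              # residual angle still to rotate
        gain *= math.sqrt(1.0 + t * t)     # growth introduced by this step
    # Single scale-factor correction at the end of the computation.
    return x / gain, y / gain

# Rotating the unit vector (1, 0) by 30 degrees approximates (cos, sin).
cx, cy = cordic_rotate(1.0, 0.0, math.pi / 6)
```

Because the number of micro-rotations is fixed, the gain is a constant that a hardware design can precompute. The sketch accumulates it in the loop only to keep the example self-contained.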
Abstract:
This research was published in the foremost international journal in information theory and shows the interplay between complex random matrix theory and multiantenna information theory. Dr T. Ratnarajah is a leader in this area of research, and his work has contributed to the development of graduate curricula (course reader) at the Massachusetts Institute of Technology (MIT), USA, by Professor Alan Edelman. The course name is "The Mathematics and Applications of Random Matrices"; see http://web.mit.edu/18.338/www/projects.html
Abstract:
The standard linear-quadratic survival model for radiotherapy is used to investigate different schedules of radiation treatment planning to study how these may be affected by different tumour repopulation kinetics between treatments.
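For readers unfamiliar with the linear-quadratic model, the sketch below evaluates the usual surviving fraction exp(-n(αd + βd²)) for n fractions of dose d and multiplies it by an exponential regrowth term. Exponential repopulation with a fixed doubling time is an assumption made here for illustration. The abstract does not specify which repopulation kinetics are compared, and the function name and all parameter values are hypothetical.

```python
import math

def surviving_fraction(n_fractions, dose_per_fraction, alpha, beta,
                       overall_time_days=0.0, doubling_time_days=None):
    """Linear-quadratic surviving fraction after a fractionated schedule.

    alpha (1/Gy) and beta (1/Gy^2) are the LQ radiosensitivity parameters.
    If doubling_time_days is given, exponential repopulation over
    overall_time_days is assumed (an illustrative choice of kinetics).
    """
    d = dose_per_fraction
    log_kill = n_fractions * (alpha * d + beta * d * d)
    regrowth = 0.0
    if doubling_time_days is not None:
        regrowth = math.log(2.0) * overall_time_days / doubling_time_days
    return math.exp(-log_kill + regrowth)

# Hypothetical schedule: 30 fractions of 2 Gy over 40 days.
no_regrowth = surviving_fraction(30, 2.0, alpha=0.3, beta=0.03)
with_regrowth = surviving_fraction(30, 2.0, alpha=0.3, beta=0.03,
                                   overall_time_days=40.0,
                                   doubling_time_days=5.0)
```

Comparing the two values shows how repopulation between treatments offsets part of the cell kill, which is why the overall treatment time matters when schedules are compared.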
Abstract:
Sulfoxidation reactions of 4,6-dimethyl-2-methylthiopyrimidine have been performed using titanosilicate catalysts in ionic liquids, dioxane and ethanol. The ionic liquid reactions showed superior reactivity compared with molecular solvents. Moreover, on examination of the recycling of the catalyst, a significant increase in the stability of the catalyst was found, both in terms of recycling activity and leaching of the titanium from the catalyst. The mechanism by which the ionic liquid reduces the solubilisation of the catalysts is explored.
Abstract:
The standard linear-quadratic (LQ) survival model for external beam radiotherapy is reviewed with particular emphasis on studying how different schedules of radiation treatment planning may be affected by different tumour repopulation kinetics. The LQ model is further examined in the context of tumour control probability (TCP) models. The application of the Zaider and Minerbo non-Poissonian TCP model incorporating the effect of cellular repopulation is reviewed. In particular the recent development of a cell cycle model within the original Zaider and Minerbo TCP formalism is highlighted. Application of this TCP cell-cycle model in clinical treatment plans is explored and analysed.
Abstract:
Hardware synthesis from dataflow graphs of signal processing systems is a growing research area as focus shifts to high-level design methodologies. For data-intensive systems, dataflow-based synthesis can lead to an inefficient usage of memory due to the restrictive nature of synchronous dataflow and its inability to easily model data reuse. This paper explores how dataflow graph changes can be used to drive both the on-chip and off-chip memory organisation and how these memory architectures can be mapped to a hardware implementation. By exploiting the data reuse inherent to many image processing algorithms and by creating memory hierarchies, off-chip memory bandwidth can be reduced by a factor of a thousand from the original dataflow graph level specification of a motion estimation algorithm, with a minimal increase in memory size. This analysis is verified using results gathered from implementation of the motion estimation algorithm on a Xilinx Virtex-4 FPGA, where the delay between the memories and processing elements drops from 14.2 ns down to 1.878 ns through the refinement of the memory architecture. Care must be taken when modeling these algorithms, however, as inefficiencies in these models can easily translate into overuse of hardware resources.