957 results for data dependence
Abstract:
The use of bit-level systolic array circuits as building blocks in the construction of larger word-level systolic systems is investigated. It is shown that the overall structure and detailed timing of such systems may be derived quite simply using the dependence graph and cut-set procedure developed by S. Y. Kung (1988). This provides an attractive and intuitive approach to the bit-level design of many VLSI signal processing components. The technique can be applied to ripple-through and partly pipelined circuits as well as fully systolic designs. It therefore provides a means of examining the relative tradeoff between levels of pipelining, chip area, power consumption, and throughput rate within a given VLSI design.
Abstract:
The highly structured nature of many digital signal processing operations allows these to be directly implemented as regular VLSI circuits. This feature has been successfully exploited in the design of a number of commercial chips, some examples of which are described. While many of the architectures on which such chips are based were originally derived on a heuristic basis, there is increasing interest in the development of systematic design techniques for the direct mapping of computations onto regular VLSI arrays. The purpose of this paper is to show how the technique proposed by Kung can be readily extended to the design of VLSI signal processing chips where the organisation of computations at the level of individual data bits is of paramount importance. The technique in question allows architectures to be derived using the projection and retiming of data dependence graphs.
Abstract:
Thesis (M. S.)--University of Illinois at Urbana-Champaign.
Abstract:
The performance of a program will ultimately be limited by its serial (scalar) portion, as pointed out by Amdahl's Law. Studies of instruction-level parallelism reported thus far have mixed data-parallel program portions with scalar program portions, often leading to contradictory and controversial results. We report an instruction-level behavioral characterization of scalar code containing minimal data-parallelism, extracted from highly vectorized programs of the PERFECT benchmark suite running on a Cray Y-MP system. We classify scalar basic blocks according to their instruction mix, characterize the data dependencies seen in each class, and, as a first step, measure the maximum intrablock instruction-level parallelism available. We observe skewed rather than balanced instruction distributions in scalar code and in individual basic block classes of scalar code; nonuniform distribution of parallelism across instruction classes; and, as expected, limited available intrablock parallelism. We identify frequently occurring data-dependence patterns and discuss new instructions to reduce latency. Toward effective scalar hardware, we study latency-pipelining trade-offs and restricted multiple instruction issue mechanisms.
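As a minimal sketch of the bound this abstract refers to, the textbook form of Amdahl's Law gives the speedup as 1 / (s + (1 - s) / N), where s is the serial fraction and N the number of processors. The function name and the example values below are illustrative assumptions and do not come from the paper.

```python
def amdahl_speedup(serial_fraction: float, n_processors: int) -> float:
    """Upper bound on speedup under Amdahl's Law.

    serial_fraction: share of run time that cannot be parallelised (0..1).
    n_processors:    number of processors applied to the parallel share.
    Both parameter names are hypothetical, chosen for this illustration.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


if __name__ == "__main__":
    # Illustrative values only: a 10% scalar portion caps speedup below 10x.
    for n in (1, 10, 100, 1000):
        print(n, round(amdahl_speedup(0.10, n), 3))
```

With a 10% serial portion the speedup approaches but never reaches 10, however many processors are added, which is why the scalar portion is characterised separately here.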
Abstract:
An embedded architecture for an optical vector matrix multiplier (OVMM) is presented. The embedded architecture aims to optimise the data flow of the vector matrix multiplier (VMM) to improve its performance. Data dependence is discussed for the case where the OVMM is connected to a cluster system. A simulator is built to analyse the performance of the architecture. Based on the simulation, Amdahl's law is used to analyse the hybrid opto-electronic system. It is found that the electronic part and its interaction with the optical part form the bottleneck of the system.
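Abstract:
In recent years, advanced compiler optimizations based on data dependence analysis have become an important area of research and development for modern compilers. To address the problem of testing such optimizations, a method for automatically generating test programs is proposed that can produce test programs with specified data dependence characteristics. The LoSpec language is first designed to describe test programs; a procedure graph, a model well suited to representing data dependence relations, is then adopted as the intermediate representation to implement automatic test program generation, and an automatic testing tool, LoTester, is developed. Compared with existing methods, this method is more specifically targeted at advanced optimizations and offers a higher degree of automation. LoTester is currently being used in the development of EECC, an optimizing compiler for multimedia applications, with good results.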
Abstract:
In this work we show how automatic relative debugging can be used to find differences in computation between a correct serial program and an OpenMP parallel version of that program that does not yield correct results. Backtracking and re-execution are used to determine the first OpenMP parallel region that produces a difference in computation that may lead to an incorrect value the user has indicated. Our approach also lends itself to finding differences between parallel computations, where executing with M threads produces expected results but an N thread execution does not (M, N > 1, M ≠ N). OpenMP programs created using a parallelization tool are addressed by utilizing static analysis and directive information from the tool. Hand-parallelized programs, where OpenMP directives are inserted by the user, are addressed by performing data dependence and directive analysis.
Abstract:
The simulations and figures were produced with the R software.
Abstract:
Graduate Program in Agronomy (Energy in Agriculture) - FCA
Abstract:
Unstructured mesh codes for modelling continuum physics phenomena have evolved to provide the facility to model complex interacting systems. Parallelisation of such codes using Single Program Multiple Data (SPMD) domain decomposition techniques implemented with message passing has been demonstrated to provide high parallel efficiency, scalability to large numbers of processors P and portability across a wide range of parallel platforms. High efficiency, especially for large P, requires that load balance be achieved in each parallel loop. For a code in which loops span a variety of mesh entity types, for example, elements, faces and vertices, some compromise is required between load balance for each entity type and the quantity of inter-processor communication required to satisfy data dependence between processors.
Abstract:
The European Nature Information System (EUNIS) has been implemented for the establishment of a marine European habitats inventory. Its hierarchical classification is defined and relies on environmental variables which primarily constrain biological communities (e.g. substrate types, sea energy level, depth and light penetration). The EUNIS habitat classification scheme relies on thresholds (e.g. fraction of light and energy) which are based on expert judgment or on the empirical analysis of the above environmental data. The present paper proposes to establish and validate an appropriate threshold for energy classes (high, moderate and low) and for subtidal biological zonation (infralittoral and circalittoral) suitable for EUNIS habitat classification of the Western Iberian coast. Kinetic wave-induced energy and the fraction of photosynthetically available light exerted on the marine bottom were respectively assigned to the presence of kelp (Saccorhiza polyschides, Laminaria hyperborea and Laminaria ochroleuca) and seaweed species in general. Both data sets were statistically described, ordered from the largest to the smallest, and percentile analyses were independently performed. The threshold between infralittoral and circalittoral was based on the first quartile while the 'moderate energy' class was established between the 12.5 and 87.5 percentiles. To avoid data dependence on sampling locations and assess the confidence interval, a bootstrap technique was applied. According to this analysis, more than 75% of seaweeds are present at locations where more than 3.65% of the surface light reaches the sea bottom. The range of energy levels estimated using S. polyschides data indicates that on the Iberian West coast the 'moderate energy' areas are between 0.00303 and 0.04385 N/m² of wave-induced energy. The lack of agreement between different studies in different regions of Europe suggests the need for more standardization in the future. However, the thresholds obtained in the present study will be very useful in the near future to implement and establish the Iberian EUNIS habitats inventory.
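The percentile-plus-bootstrap procedure described in this abstract can be sketched roughly as follows. This is an assumed, generic percentile bootstrap, not the authors' implementation: the function names, the number of resamples, the interpolation rule and the synthetic input values are all hypothetical.

```python
import random


def percentile(sorted_values, q):
    """Percentile q (0..100) of an already sorted list, linear interpolation."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def bootstrap_percentile(values, q, n_resamples=1000, alpha=0.05, seed=0):
    """Point estimate and percentile-bootstrap interval for percentile q.

    Resamples the observations with replacement so that the estimate does
    not hinge on one particular set of sampling locations.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    estimates = sorted(
        percentile(sorted(rng.choices(values, k=len(values))), q)
        for _ in range(n_resamples)
    )
    point = percentile(sorted(values), q)
    lower = percentile(estimates, 100.0 * alpha / 2.0)
    upper = percentile(estimates, 100.0 * (1.0 - alpha / 2.0))
    return point, lower, upper


if __name__ == "__main__":
    # Synthetic stand-in data (made up), e.g. percent of surface light per site.
    light = [float(v) for v in range(1, 102)]
    print(bootstrap_percentile(light, 25))  # first-quartile threshold and interval
```

The first quartile is used in the example because the abstract bases the infralittoral/circalittoral threshold on it; the same function could be called with 12.5 and 87.5 for the 'moderate energy' bounds.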