994 resultados para Instruction set
Resumo:
A novel application-specific instruction set processor (ASIP) for use in the construction of modern signal processing systems is presented. This is a flexible device that can be used in the construction of array processor systems for the real-time implementation of functions such as singular-value decomposition (SVD) and QR decomposition (QRD), as well as other important matrix computations. It uses a coordinate rotation digital computer (CORDIC) module to perform arithmetic operations and several approaches are adopted to achieve high performance including pipelining of the micro-rotations, the use of parallel instructions and a dual-bus architecture. In addition, a novel method for scale factor correction is presented which only needs to be applied once at the end of the computation. This also reduces computation time and enhances performance. Methods are described which allow this processor to be used in reduced dimension (i.e., folded) array processor structures that allow tradeoffs between hardware and performance. The net result is a flexible matrix computational processing element (PE) whose functionality can be changed under program control for use in a wider range of scenarios than previous work. Details are presented of the results of a design study, which considers the application of this decomposition PE architecture in a combined SVD/QRD system and demonstrates that a combination of high performance and efficient silicon implementation are achievable. © 2005 IEEE.
Resumo:
An application specific programmable processor (ASIP) suitable for the real-time implementation of matrix computations such as Singular Value and QR Decomposition is presented. The processor incorporates facilities for the issue of parallel instructions and a dual-bus architecture that are designed to achieve high performance. Internally, it uses a CORDIC module to perform arithmetic operations, with pipelining of the internal recursive loop exploited to multiplex the two independent micro-rotations onto a single piece of hardware. The net result is a flexible processing element whose functionality can be changed under program control, which combines high performance with efficient silicon implementation. This is illustrated through the results of a detailed silicon design study and the applications of the techniques to a combined SVD/QRD system.
Resumo:
Thesis (M.S.)--University of Illinois at Urbana-Champaign.
Resumo:
We propose an ISA extension that decouples the data access and register write operations in a load instruction. We describe system and hardware support for decoupled loads. Furthermore, we show how compilers can generate better static instruction schedules by hoisting a decoupled load’s data access above may-alias stores and branches. We find that decoupled loads improve performance with geometric mean speedups of 8.4%.
Resumo:
With the increasing importance of Application Domain Specific Processor (ADSP) design, a significant challenge is to identify special-purpose operations for implementation as a customized instruction. While many methodologies have been proposed for this purpose, they all work for a single algorithm chosen from the target application domain. Such algorithm-specific approaches are not suitable for designing instruction sets applicable to a whole family of related algorithms. For an entire range of related algorithms, this paper develops a methodology for identifying compound operations, as a basis for designing “domain-specific” Instruction Set Architectures (ISAs) that can efficiently run most of the algorithms in a given domain. Our methodology combines three different static analysis techniques to identify instruction sequences common to several related algorithms: identification of (non-branching) instruction sequences that occur commonly across the algorithms; identification of instruction sequences nested within iterative constructs that are thus executed frequently; and identification of commonly-occurring instruction sequences that span basic blocks. Choosing different combinations of these results enables us to design domain-specific special operations with different desired characteristics, such as performance or suitability as a library function. To demonstrate our approach, case studies are carried out for a family of thirteen string matching algorithms. Finally, the validity of our static analysis results is confirmed through independent dynamic analysis experiments and performance improvement measurements.
Resumo:
In this paper we present a framework for realizing arbitrary instruction set extensions (IE) that are identified post-silicon. The proposed framework has two components viz., an IE synthesis methodology and the architecture of a reconfigurable data-path for realization of the such IEs. The IE synthesis methodology ensures maximal utilization of resources on the reconfigurable data-path. In this context we present the techniques used to realize IEs for applications that demand high throughput or those that must process data streams. The reconfigurable hardware called HyperCell comprises a reconfigurable execution fabric. The fabric is a collection of interconnected compute units. A typical use case of HyperCell is where it acts as a co-processor with a host and accelerates execution of IEs that are defined post-silicon. We demonstrate the effectiveness of our approach by evaluating the performance of some well-known integer kernels that are realized as IEs on HyperCell. Our methodology for realizing IEs through HyperCells permits overlapping of potentially all memory transactions with computations. We show significant improvement in performance for streaming applications over general purpose processor based solutions, by fully pipelining the data-path. (C) 2014 Elsevier B.V. All rights reserved.
Resumo:
Touch keyboarding as a vocational skill is disappearing at a time when students and educators across alleducational sectors are expected to use a computer keyboard on a regular basis. there is documentation surrounding the embedding of Information and Communication Technology (ICT) within the curricula and yet within the National Training Packages touch keyboarding, previously considered a core component, is now an elective in the Business Services framework. This situation is an odds with current practice overseas where touch keyboarding is a component of primary and secondary curricula. From Rhetoric to Practice explores the current issues and practice in teaching and learning touch keyboarding in primary, secondary and tertiary institutions. Through structured interview participants detailed current practice of teachers and their students. Further, tertiary students participated in a training program aimed at achquiring touch keyboarding as a skill to enhance their studies. The researcher's background experience of fifteen years teaching touch keyboarding and computer literacty to adults and 30 years in Business Services trade provides a strong basis for this project. The teaching experience is enhanced by industry experience in administration, course coordination in technical, community and tertiary institutions and a strong commitment to the efficient usage of a computer by all. The findings of this project identified coursework expectations requiring all students from kindergarten to tertiary to use a computer keyboard on a weekly basis and that neither teaching nor learning tough keyboarding appears in the primary, secondary and tertiary curricula in New South Wales. Further, teachers recognised tough keyboarding as the prefered style over 'hunt and peck' keyboarding while acknowledging the teaching and learning difficulties of time constraints, the need for qualified touch keyboarding teachers and issues arising when retraining students from existing poor habits. In conclusion, this project recommends that computer keyboarding be defined as a writing tool for education, vocation and life, with early instruction set in primary schooling area and embedding touch keyboarding with the secondary, technical and tertiary areas and finally to draw the attention of educational authorities to the Duty Of Care aspects associated with computer keyboarding in the classroom.
Resumo:
An Application Specific Instruction-set Processor (ASIP) is a specialized processor tailored to run a particular application/s efficiently. However, when there are multiple candidate applications in the application’s domain it is difficult and time consuming to find optimum set of applications to be implemented. Existing ASIP design approaches perform this selection manually based on a designer’s knowledge. We help in cutting down the number of candidate applications by devising a classification method to cluster similar applications based on the special-purpose operations they share. This provides a significant reduction in the comparison overhead while resulting in customized ASIP instruction sets which can benefit a whole family of related applications. Our method gives users the ability to quantify the degree of similarity between the sets of shared operations to control the size of clusters. A case study involving twelve algorithms confirms that our approach can successfully cluster similar algorithms together based on the similarity of their component operations.
Resumo:
Emerging embedded applications are based on evolving standards (e.g., MPEG2/4, H.264/265, IEEE802.11a/b/g/n). Since most of these applications run on handheld devices, there is an increasing need for a single chip solution that can dynamically interoperate between different standards and their derivatives. In order to achieve high resource utilization and low power dissipation, we propose REDEFINE, a polymorphic ASIC in which specialized hardware units are replaced with basic hardware units that can create the same functionality by runtime re-composition. It is a ``future-proof'' custom hardware solution for multiple applications and their derivatives in a domain. In this article, we describe a compiler framework and supporting hardware comprising compute, storage, and communication resources. Applications described in high-level language (e.g., C) are compiled into application substructures. For each application substructure, a set of compute elements on the hardware are interconnected during runtime to form a pattern that closely matches the communication pattern of that particular application. The advantage is that the bounded CEs are neither processor cores nor logic elements as in FPGAs. Hence, REDEFINE offers the power and performance advantage of an ASIC and the hardware reconfigurability and programmability of that of an FPGA/instruction set processor. In addition, the hardware supports custom instruction pipelining. Existing instruction-set extensible processors determine a sequence of instructions that repeatedly occur within the application to create custom instructions at design time to speed up the execution of this sequence. We extend this scheme further, where a kernel is compiled into custom instructions that bear strong producer-consumer relationship (and not limited to frequently occurring sequences of instructions). Custom instructions, realized as hardware compositions effected at runtime, allow several instances of the same to be active in parallel. A key distinguishing factor in majority of the emerging embedded applications is stream processing. To reduce the overheads of data transfer between custom instructions, direct communication paths are employed among custom instructions. In this article, we present the overview of the hardware-aware compiler framework, which determines the NoC-aware schedule of transports of the data exchanged between the custom instructions on the interconnect. The results for the FFT kernel indicate a 25% reduction in the number of loads/stores, and throughput improves by log(n) for n-point FFT when compared to sequential implementation. Overall, REDEFINE offers flexibility and a runtime reconfigurability at the expense of 1.16x in power and 8x in area when compared to an ASIC. REDEFINE implementation consumes 0.1x the power of an FPGA implementation. In addition, the configuration overhead of the FPGA implementation is 1,000x more than that of REDEFINE.
Resumo:
ASICs offer the best realization of DSP algorithms in terms of performance, but the cost is prohibitive, especially when the volumes involved are low. However, if the architecture synthesis trajectory for such algorithms is such that the target architecture can be identified as an interconnection of elementary parameterized computational structures, then it is possible to attain a close match, both in terms of performance and power with respect to an ASIC, for any algorithmic parameters of the given algorithm. Such an architecture is weakly programmable (configurable) and can be viewed as an application specific instruction-set processor (ASIP). In this work, we present a methodology to synthesize ASIPs for DSP algorithms.
Resumo:
A avaliação da qualidade de vida tem sido cada vez mais utilizada pelos profissionais da área de saúde para mensurar o impacto de doenças na vida dos pacientes, bem como para avaliar os resultados dos tratamentos realizados. O crescente interesse por protocolos de pesquisa clínica em doenças não degenerativas do quadril tem encontrado muitos obstáculos na avaliação objetiva de seus resultados, principalmente nos estudos de observação de novas intervenções terapêuticas, como a artroscopia. O Nonarthritic Hip Score (NAHS) é um instrumento de avaliação clínica, desenvolvido originalmente em inglês, cujo objetivo é avaliar a função da articulação do quadril em pacientes jovens e fisicamente ativos. O objetivo desse estudo foi traduzir esse instrumento para a língua portuguesa, adaptá-lo para a cultura brasileira e validá-lo para que possa ser utilizado na avaliação de qualidade de vida de pacientes brasileiros com dor no quadril, sem doença degenerativa. A metodologia utilizada é a sugerida por Guillemin et al. (1993) e revisado por Beaton et al. (2000), que propuseram um conjunto de instruções padronizadas para adaptação cultural de instrumentos de qualidade de vida, incluindo cinco etapas: tradução, tradução de volta, revisão pelo comitê, pré-teste e teste, com reavaliação dos pesos dos escores, se relevante. A versão de consenso foi aplicada em 30 indivíduos. As questões sobre atividades esportivas e tarefas domésticas foram modificadas, para melhor adaptação à cultura brasileira. A versão brasileira do Nonarthritic Hip Score (NAHS-Brasil) foi respondida por 64 pacientes com dor no quadril, a fim de avaliar as propriedades de medida do instrumento: reprodutibilidade, consistência interna e validade. A reprodutibilidade foi 0,9, mostrando uma forte correlação; a consistência interna mostrou correlação entre 0,8 e 0,9, considerada boa e excelente; a validade foi considerada respectivamente boa e excelente; a correlação entre NAHS-Brasil e WOMAC foi 0,9; e a correlação entre o NAHS-Brasil e Questionário Algofuncional de Lequesne foi 0,79. O Nonarthritic Hip Score foi traduzido para a língua portuguesa e adaptado à cultura brasileira, de acordo com o conjunto de instruções padronizadas para adaptação cultural de instrumentos de qualidade de vida. Sua reprodutibilidade, consistência interna e validade foram também demonstradas.
Resumo:
The Message-Driven Processor is a node of a large-scale multiprocessor being developed by the Concurrent VLSI Architecture Group. It is intended to support fine-grained, message passing, parallel computation. It contains several novel architectural features, such as a low-latency network interface, extensive type-checking hardware, and on-chip memory that can be used as an associative lookup table. This document is a programmer's guide to the MDP. It describes the processor's register architecture, instruction set, and the data types supported by the processor. It also details the MDP's message sending and exception handling facilities.
Resumo:
Quadsim is an intermediate code simulator. It allows you to "run" programs that your compiler generates in intermediate code format. Its user interface is similar to most debuggers in that you can step through your program, instruction by instruction, set breakpoints, examine variable values, and so on. The intermediate code format used by Quadsim is that described in [Aho 86]. If your compiler generates intermediate code in this format, you will be able to take intermediate-code files generated by your compiler, load them into the simulator, and watch them "run." You are provided with functions that hide the internal representation of intermediate code. You can use these functions within your compiler to generate intermediate code files that can be read by the simulator. Quadsim was inspired and greatly influenced by [Aho 86]. The material in chapter 8 (Intermediate Code Generation) of [Aho 86] should be considered background material for users of Quadsim.
Resumo:
As the commoditization of sensing, actuation and communication hardware increases, so does the potential for dynamically tasked sense and respond networked systems (i.e., Sensor Networks or SNs) to replace existing disjoint and inflexible special-purpose deployments (closed-circuit security video, anti-theft sensors, etc.). While various solutions have emerged to many individual SN-centric challenges (e.g., power management, communication protocols, role assignment), perhaps the largest remaining obstacle to widespread SN deployment is that those who wish to deploy, utilize, and maintain a programmable Sensor Network lack the programming and systems expertise to do so. The contributions of this thesis centers on the design, development and deployment of the SN Workbench (snBench). snBench embodies an accessible, modular programming platform coupled with a flexible and extensible run-time system that, together, support the entire life-cycle of distributed sensory services. As it is impossible to find a one-size-fits-all programming interface, this work advocates the use of tiered layers of abstraction that enable a variety of high-level, domain specific languages to be compiled to a common (thin-waist) tasking language; this common tasking language is statically verified and can be subsequently re-translated, if needed, for execution on a wide variety of hardware platforms. snBench provides: (1) a common sensory tasking language (Instruction Set Architecture) powerful enough to express complex SN services, yet simple enough to be executed by highly constrained resources with soft, real-time constraints, (2) a prototype high-level language (and corresponding compiler) to illustrate the utility of the common tasking language and the tiered programming approach in this domain, (3) an execution environment and a run-time support infrastructure that abstract a collection of heterogeneous resources into a single virtual Sensor Network, tasked via this common tasking language, and (4) novel formal methods (i.e., static analysis techniques) that verify safety properties and infer implicit resource constraints to facilitate resource allocation for new services. This thesis presents these components in detail, as well as two specific case-studies: the use of snBench to integrate physical and wireless network security, and the use of snBench as the foundation for semester-long student projects in a graduate-level Software Engineering course.
Resumo:
In the field of embedded systems design, coprocessors play an important role as a component to increase performance. Many embedded systems are built around a small General Purpose Processor (GPP). If the GPP cannot meet the performance requirements for a certain operation, a coprocessor can be included in the design. The GPP can then offload the computationally intensive operation to the coprocessor; thus increasing the performance of the overall system. A common application of coprocessors is the acceleration of cryptographic algorithms. The work presented in this thesis discusses coprocessor architectures for various cryptographic algorithms that are found in many cryptographic protocols. Their performance is then analysed on a Field Programmable Gate Array (FPGA) platform. Firstly, the acceleration of Elliptic Curve Cryptography (ECC) algorithms is investigated through the use of instruction set extension of a GPP. The performance of these algorithms in a full hardware implementation is then investigated, and an architecture for the acceleration the ECC based digital signature algorithm is developed. Hash functions are also an important component of a cryptographic system. The FPGA implementation of recent hash function designs from the SHA-3 competition are discussed and a fair comparison methodology for hash functions presented. Many cryptographic protocols involve the generation of random data, for keys or nonces. This requires a True Random Number Generator (TRNG) to be present in the system. Various TRNG designs are discussed and a secure implementation, including post-processing and failure detection, is introduced. Finally, a coprocessor for the acceleration of operations at the protocol level will be discussed, where, a novel aspect of the design is the secure method in which private-key data is handled