984 resultados para Register transfer level
Resumo:
Today, modern System-on-a-Chip (SoC) systems have grown rapidly due to the increased processing power, while maintaining the size of the hardware circuit. The number of transistors on a chip continues to increase, but current SoC designs may not be able to exploit the potential performance, especially with energy consumption and chip area becoming two major concerns. Traditional SoC designs usually separate software and hardware. Thus, the process of improving the system performance is a complicated task for both software and hardware designers. The aim of this research is to develop hardware acceleration workflow for software applications. Thus, system performance can be improved with constraints of energy consumption and on-chip resource costs. The characteristics of software applications can be identified by using profiling tools. Hardware acceleration can have significant performance improvement for highly mathematical calculations or repeated functions. The performance of SoC systems can then be improved, if the hardware acceleration method is used to accelerate the element that incurs performance overheads. The concepts mentioned in this study can be easily applied to a variety of sophisticated software applications. The contributions of SoC-based hardware acceleration in the hardware-software co-design platform include the following: (1) Software profiling methods are applied to H.264 Coder-Decoder (CODEC) core. The hotspot function of aimed application is identified by using critical attributes such as cycles per loop, loop rounds, etc. (2) Hardware acceleration method based on Field-Programmable Gate Array (FPGA) is used to resolve system bottlenecks and improve system performance. The identified hotspot function is then converted to a hardware accelerator and mapped onto the hardware platform. Two types of hardware acceleration methods – central bus design and co-processor design, are implemented for comparison in the proposed architecture. (3) System specifications, such as performance, energy consumption, and resource costs, are measured and analyzed. The trade-off of these three factors is compared and balanced. Different hardware accelerators are implemented and evaluated based on system requirements. 4) The system verification platform is designed based on Integrated Circuit (IC) workflow. Hardware optimization techniques are used for higher performance and less resource costs. Experimental results show that the proposed hardware acceleration workflow for software applications is an efficient technique. The system can reach 2.8X performance improvements and save 31.84% energy consumption by applying the Bus-IP design. The Co-processor design can have 7.9X performance and save 75.85% energy consumption.
Resumo:
Today, modern System-on-a-Chip (SoC) systems have grown rapidly due to the increased processing power, while maintaining the size of the hardware circuit. The number of transistors on a chip continues to increase, but current SoC designs may not be able to exploit the potential performance, especially with energy consumption and chip area becoming two major concerns. Traditional SoC designs usually separate software and hardware. Thus, the process of improving the system performance is a complicated task for both software and hardware designers. The aim of this research is to develop hardware acceleration workflow for software applications. Thus, system performance can be improved with constraints of energy consumption and on-chip resource costs. The characteristics of software applications can be identified by using profiling tools. Hardware acceleration can have significant performance improvement for highly mathematical calculations or repeated functions. The performance of SoC systems can then be improved, if the hardware acceleration method is used to accelerate the element that incurs performance overheads. The concepts mentioned in this study can be easily applied to a variety of sophisticated software applications. The contributions of SoC-based hardware acceleration in the hardware-software co-design platform include the following: (1) Software profiling methods are applied to H.264 Coder-Decoder (CODEC) core. The hotspot function of aimed application is identified by using critical attributes such as cycles per loop, loop rounds, etc. (2) Hardware acceleration method based on Field-Programmable Gate Array (FPGA) is used to resolve system bottlenecks and improve system performance. The identified hotspot function is then converted to a hardware accelerator and mapped onto the hardware platform. Two types of hardware acceleration methods – central bus design and co-processor design, are implemented for comparison in the proposed architecture. (3) System specifications, such as performance, energy consumption, and resource costs, are measured and analyzed. The trade-off of these three factors is compared and balanced. Different hardware accelerators are implemented and evaluated based on system requirements. 4) The system verification platform is designed based on Integrated Circuit (IC) workflow. Hardware optimization techniques are used for higher performance and less resource costs. Experimental results show that the proposed hardware acceleration workflow for software applications is an efficient technique. The system can reach 2.8X performance improvements and save 31.84% energy consumption by applying the Bus-IP design. The Co-processor design can have 7.9X performance and save 75.85% energy consumption.
Resumo:
The original contribution of this thesis to knowledge are novel digital readout architectures for hybrid pixel readout chips. The thesis presents asynchronous bus-based architecture, a data-node based column architecture and a network-based pixel matrix architecture for data transportation. It is shown that the data-node architecture achieves readout efficiency 99% with half the output rate as a bus-based system. The network-based solution avoids “broken” columns due to some manufacturing errors, and it distributes internal data traffic more evenly across the pixel matrix than column-based architectures. An improvement of > 10% to the efficiency is achieved with uniform and non-uniform hit occupancies. Architectural design has been done using transaction level modeling (TLM) and sequential high-level design techniques for reducing the design and simulation time. It has been possible to simulate tens of column and full chip architectures using the high-level techniques. A decrease of > 10 in run-time is observed using these techniques compared to register transfer level (RTL) design technique. Reduction of 50% for lines-of-code (LoC) for the high-level models compared to the RTL description has been achieved. Two architectures are then demonstrated in two hybrid pixel readout chips. The first chip, Timepix3 has been designed for the Medipix3 collaboration. According to the measurements, it consumes < 1 W/cm^2. It also delivers up to 40 Mhits/s/cm^2 with 10-bit time-over-threshold (ToT) and 18-bit time-of-arrival (ToA) of 1.5625 ns. The chip uses a token-arbitrated, asynchronous two-phase handshake column bus for internal data transfer. It has also been successfully used in a multi-chip particle tracking telescope. The second chip, VeloPix, is a readout chip being designed for the upgrade of Vertex Locator (VELO) of the LHCb experiment at CERN. Based on the simulations, it consumes < 1.5 W/cm^2 while delivering up to 320 Mpackets/s/cm^2, each packet containing up to 8 pixels. VeloPix uses a node-based data fabric for achieving throughput of 13.3 Mpackets/s from the column to the EoC. By combining Monte Carlo physics data with high-level simulations, it has been demonstrated that the architecture meets requirements of the VELO (260 Mpackets/s/cm^2 with efficiency of 99%).
Resumo:
La conception de systèmes hétérogènes exige deux étapes importantes, à savoir : la modélisation et la simulation. Habituellement, des simulateurs sont reliés et synchronisés en employant un bus de co-simulation. Les approches courantes ont beaucoup d’inconvénients : elles ne sont pas toujours adaptées aux environnements distribués, le temps d’exécution de simulation peut être très décevant, et chaque simulateur a son propre noyau de simulation. Nous proposons une nouvelle approche qui consiste au développement d’un simulateur compilé multi-langage où chaque modèle peut être décrit en employant différents langages de modélisation tel que SystemC, ESyS.Net ou autres. Chaque modèle contient généralement des modules et des moyens de communications entre eux. Les modules décrivent des fonctionnalités propres à un système souhaité. Leur description est réalisée en utilisant la programmation orientée objet et peut être décrite en utilisant une syntaxe que l’utilisateur aura choisie. Nous proposons ainsi une séparation entre le langage de modélisation et la simulation. Les modèles sont transformés en une même représentation interne qui pourrait être vue comme ensemble d’objets. Notre environnement compile les objets internes en produisant un code unifié au lieu d’utiliser plusieurs langages de modélisation qui ajoutent beaucoup de mécanismes de communications et des informations supplémentaires. Les optimisations peuvent inclure différents mécanismes tels que le regroupement des processus en un seul processus séquentiel tout en respectant la sémantique des modèles. Nous utiliserons deux niveaux d’abstraction soit le « register transfer level » (RTL) et le « transaction level modeling » (TLM). Le RTL permet une modélisation à bas niveau d’abstraction et la communication entre les modules se fait à l’aide de signaux et des signalisations. Le TLM est une modélisation d’une communication transactionnelle à un plus haut niveau d’abstraction. Notre objectif est de supporter ces deux types de simulation, mais en laissant à l’usager le choix du langage de modélisation. De même, nous proposons d’utiliser un seul noyau au lieu de plusieurs et d’enlever le bus de co-simulation pour accélérer le temps de simulation.
Resumo:
Uma arquitetura reconfigurável e multiprocessada para a implementação física de Redes de Petri foi desenvolvida em VHDL e mapeada sobre um FPGA. Convencionalmente, as Redes de Petri são transformadas em uma linguagem de descrição de hardware no nível de transferências entre registradores e um processo de síntese de alto nível é utilizado para gerar as funções booleanas e tabelas de transição de estado para que se possa, finalmente, mapeá-las num FPGA (Morris et al., 2000) (Soto and Pereira, 2001). A arquitetura proposta possui blocos lógicos reconfiguráveis desenvolvidos exclusivamente para a implementação dos lugares e das transições da rede, não sendo necessária a descrição da rede em níveis de abstração intermediários e nem a utilização de um processo de síntese para realizar o mapeamento da rede na arquitetura. A arquitetura permite o mapeamento de modelos de Redes de Petri com diferenciação entre as marcas e associação de tempo no disparo das transições, sendo composta por um arranjo de processadores reconfiguráveis, cada um dos quais representando o comportamento de uma transição da Rede de Petri a ser mapeada e por um sistema de comunicação, implementado por um conjunto de roteadores que são capazes de enviar pacotes de dados de um processador reconfigurável a outro. A arquitetura proposta foi validada num FPGA de 10.570 elementos lógicos com uma topologia que permitiu a implementação de Redes de Petri de até 9 transições e 36 lugares, atingindo uma latência de 15,4ns e uma vazão de até 17,12GB/s com uma freqüência de operação de 64,58MHz.
Resumo:
The constant increase in digital systems complexity definitely demands the automation of the corresponding synthesis process. This paper presents a computational environment designed to produce both software and hardware implementations of a system. The tool for code generation has been named ACG8051. As for the hardware synthesis there has been produced a larger environment consisting of four programs, namely: PIPE2TAB, AGPS, TABELA, and TAB2VHDL. ACG8051 and PIPE2TAB use place/transition net descriptions from PIPE as inputs. ACG8051 is aimed at generating assembly code for the 8051 micro-controller. PIPE2TAB produces a tabular version of a Mealy type finite state machine of the system, its output is fed into AGPS that is used for state allocation. The resulting digital system is then input to TABELA, which minimizes control functions and outputs of the digital system. Finally, the output generated by TABELA is fed to TAB2VHDL that produces a VHDL description of the system at the register transfer level. Thus, we present here a set of tools designed to take a high-level description of a digital system, represented by a place/transition net, and produces as output both an assembly code that can be immediately run on an 8051 micro-controller, and a VHDL description that can be used to directly implement the hardware parts either on an FPGA or as an ASIC.
Resumo:
Field-programmable gate arrays are ideal hosts to custom accelerators for signal, image, and data processing but de- mand manual register transfer level design if high performance and low cost are desired. High-level synthesis reduces this design burden but requires manual design of complex on-chip and off-chip memory architectures, a major limitation in applications such as video processing. This paper presents an approach to resolve this shortcoming. A constructive process is described that can derive such accelerators, including on- and off-chip memory storage from a C description such that a user-defined throughput constraint is met. By employing a novel statement-oriented approach, dataflow intermediate models are derived and used to support simple ap- proaches for on-/off-chip buffer partitioning, derivation of custom on-chip memory hierarchies and architecture transformation to ensure user-defined throughput constraints are met with minimum cost. When applied to accelerators for full search motion estima- tion, matrix multiplication, Sobel edge detection, and fast Fourier transform, it is shown how real-time performance up to an order of magnitude in advance of existing commercial HLS tools is enabled whilst including all requisite memory infrastructure. Further, op- timizations are presented that reduce the on-chip buffer capacity and physical resource cost by up to 96% and 75%, respectively, whilst maintaining real-time performance.
Resumo:
International audience
Resumo:
Este trabalho tem como foco a aplicação de técnicas de otimização de potência no alto nível de abstração para circuitos CMOS, e em particular no nível arquitetural e de transferência de registrados (Register Transfer Leve - RTL). Diferentes arquiteturas para projetos especificos de algorítmos de filtros FIR e transformada rápida de Fourier (FFT) são implementadas e comparadas. O objetivo é estabelecer uma metodologia de projeto para baixa potência neste nível de abstração. As técnicas de redução de potência abordadas tem por obetivo a redução da atividade de chaveamento através das técnicas de exploração arquitetural e codificação de dados. Um dos métodos de baixa potência que tem sido largamente utilizado é a codificação de dados para a redução da atividade de chaveamento em barramentos. Em nosso trabalho, é investigado o processo de codificação dos sinais para a obtenção de módulos aritméticos eficientes em termos de potência que operam diretamente com esses códigos. O objetivo não consiste somente na redução da atividade de chavemanto nos barramentos de dados mas também a minimização da complexidade da lógica combinacional dos módulos. Nos algorítmos de filtros FIR e FFT, a representação dos números em complemento de 2 é a forma mais utilizada para codificação de operandos com sinal. Neste trabalho, apresenta-se uma nova arquitetura para operações com sinal que mantém a mesma regularidade um multiplicador array convencional. Essa arquitetura pode operar com números na base 2m, o que permite a redução do número de linhas de produtos parciais, tendo-se desta forma, ganhos significativos em desempenho e redução de potência. A estratégia proposta apresenta resultados significativamente melhores em relação ao estado da arte. A flexibilidade da arquitetura proposta permite a construção de multiplicadores com diferentes valores de m. Dada a natureza dos algoritmos de filtro FIR e FFT, que envolvem o produto de dados por apropriados coeficientes, procura-se explorar o ordenamento ótimo destes coeficientes nos sentido de minimizar o consumo de potência das arquiteturas implementadas.
Digital signal processing and digital system design using discrete cosine transform [student course]
Resumo:
The discrete cosine transform (DCT) is an important functional block for image processing applications. The implementation of a DCT has been viewed as a specialized research task. We apply a micro-architecture based methodology to the hardware implementation of an efficient DCT algorithm in a digital design course. Several circuit optimization and design space exploration techniques at the register-transfer and logic levels are introduced in class for generating the final design. The students not only learn how the algorithm can be implemented, but also receive insights about how other signal processing algorithms can be translated into a hardware implementation. Since signal processing has very broad applications, the study and implementation of an extensively used signal processing algorithm in a digital design course significantly enhances the learning experience in both digital signal processing and digital design areas for the students.
Resumo:
Phase change problems arise in many practical applications such as air-conditioning and refrigeration, thermal energy storage systems and thermal management of electronic devices. The physical phenomenon in such applications are complex and are often difficult to be studied in detail with the help of only experimental techniques. The efforts to improve computational techniques for analyzing two-phase flow problems with phase change are therefore gaining momentum. The development of numerical methods for multiphase flow has been motivated generally by the need to account more accurately for (a) large topological changes such as phase breakup and merging, (b) sharp representation of the interface and its discontinuous properties and (c) accurate and mass conserving motion of the interface. In addition to these considerations, numerical simulation of multiphase flow with phase change introduces additional challenges related to discontinuities in the velocity and the temperature fields. Moreover, the velocity field is no longer divergence free. For phase change problems, the focus of developmental efforts has thus been on numerically attaining a proper conservation of energy across the interface in addition to the accurate treatment of fluxes of mass and momentum conservation as well as the associated interface advection. Among the initial efforts related to the simulation of bubble growth in film boiling applications the work in \cite{Welch1995} was based on the interface tracking method using a moving unstructured mesh. That study considered moderate interfacial deformations. A similar problem was subsequently studied using moving, boundary fitted grids \cite{Son1997}, again for regimes of relatively small topological changes. A hybrid interface tracking method with a moving interface grid overlapping a static Eulerian grid was developed \cite{Juric1998} for the computation of a range of phase change problems including, three-dimensional film boiling \cite{esmaeeli2004computations}, multimode two-dimensional pool boiling \cite{Esmaeeli2004} and film boiling on horizontal cylinders \cite{Esmaeeli2004a}. The handling of interface merging and pinch off however remains a challenge with methods that explicitly track the interface. As large topological changes are crucial for phase change problems, attention has turned in recent years to front capturing methods utilizing implicit interfaces that are more effective in treating complex interface deformations. The VOF (Volume of Fluid) method was adopted in \cite{Welch2000} to simulate the one-dimensional Stefan problem and the two-dimensional film boiling problem. The approach employed a specific model for mass transfer across the interface involving a mass source term within cells containing the interface. This VOF based approach was further coupled with the level set method in \cite{Son1998}, employing a smeared-out Heaviside function to avoid the numerical instability related to the source term. The coupled level set, volume of fluid method and the diffused interface approach was used for film boiling with water and R134a at the near critical pressure condition \cite{Tomar2005}. The effect of superheat and saturation pressure on the frequency of bubble formation were analyzed with this approach. The work in \cite{Gibou2007} used the ghost fluid and the level set methods for phase change simulations. A similar approach was adopted in \cite{Son2008} to study various boiling problems including three-dimensional film boiling on a horizontal cylinder, nucleate boiling in microcavity \cite{lee2010numerical} and flow boiling in a finned microchannel \cite{lee2012direct}. The work in \cite{tanguy2007level} also used the ghost fluid method and proposed an improved algorithm based on enforcing continuity and divergence-free condition for the extended velocity field. The work in \cite{sato2013sharp} employed a multiphase model based on volume fraction with interface sharpening scheme and derived a phase change model based on local interface area and mass flux. Among the front capturing methods, sharp interface methods have been found to be particularly effective both for implementing sharp jumps and for resolving the interfacial velocity field. However, sharp velocity jumps render the solution susceptible to erroneous oscillations in pressure and also lead to spurious interface velocities. To implement phase change, the work in \cite{Hardt2008} employed point mass source terms derived from a physical basis for the evaporating mass flux. To avoid numerical instability, the authors smeared the mass source by solving a pseudo time-step diffusion equation. This measure however led to mass conservation issues due to non-symmetric integration over the distributed mass source region. The problem of spurious pressure oscillations related to point mass sources was also investigated by \cite{Schlottke2008}. Although their method is based on the VOF, the large pressure peaks associated with sharp mass source was observed to be similar to that for the interface tracking method. Such spurious fluctuation in pressure are essentially undesirable because the effect is globally transmitted in incompressible flow. Hence, the pressure field formation due to phase change need to be implemented with greater accuracy than is reported in current literature. The accuracy of interface advection in the presence of interfacial mass flux (mass flux conservation) has been discussed in \cite{tanguy2007level,tanguy2014benchmarks}. The authors found that the method of extending one phase velocity to entire domain suggested by Nguyen et al. in \cite{nguyen2001boundary} suffers from a lack of mass flux conservation when the density difference is high. To improve the solution, the authors impose a divergence-free condition for the extended velocity field by solving a constant coefficient Poisson equation. The approach has shown good results with enclosed bubble or droplet but is not general for more complex flow and requires additional solution of the linear system of equations. In current thesis, an improved approach that addresses both the numerical oscillation of pressure and the spurious interface velocity field is presented by featuring (i) continuous velocity and density fields within a thin interfacial region and (ii) temporal velocity correction steps to avoid unphysical pressure source term. Also I propose a general (iii) mass flux projection correction for improved mass flux conservation. The pressure and the temperature gradient jump condition are treated sharply. A series of one-dimensional and two-dimensional problems are solved to verify the performance of the new algorithm. Two-dimensional and cylindrical film boiling problems are also demonstrated and show good qualitative agreement with the experimental observations and heat transfer correlations. Finally, a study on Taylor bubble flow with heat transfer and phase change in a small vertical tube in axisymmetric coordinates is carried out using the new multiphase, phase change method.
Resumo:
Two oyster species are currently present along the French coasts : the indigenous European flat oyster (Ostrea edulis), and the Pacific cupped oyster (Crassostrea gigas), that has been introduced from Japan since the beginning of the 70ies. The flat oyster successively suffered from two protozoan diseases during the 60ies and its production decreased from 20 000 tons/year by that time to 1 500 tons/year nowadays. Consequently, the oyster production is principally (99%) based upon the Pacific oyster species with approximately 150 000 tons/year among which 90% are grown from the natural spat. However, the hatchery production of this species is developing and was estimated to 400 to 800 millions spat in 2002. Moreover, strengthened relationships between IFREMER and the 5 commercial hatcheries, that all joined the SYSAAF (Union of the French poultry, shellfish and fish farming selectors), allow to plan for new genetic breeding programs. At the end of the 80ies, IFREMER initiated a genetic breeding program for the resistance of the European flat oyster to the bonamiosis, and obtained strains more tolerant to this disease. After two generations of massal selection, molecular markers had identified a reduced genetic basis in this program. It was then reoriented to an intra-familial selection. However, we were confronted to a zootechnic problem to manage such a scheme and we compromised by an intra-cohorts of families selection scheme managed using molecular markers. The program has now reached the transfer level with experimentation at a professional scale. Concerning the Pacific cupped oyster, and in parallel with the obtaining and the study of polyploids, performance of different Asian cupped oyster strains were compared to the one introduced in France thirty years ago and currently suffering from summer mortalities. The local strain exhibited better performance, certainly based upon a good local adaptation. In other respects, although early growth is a relevant criteria for selection for growth to commercial stage, it is not to be privileged in the context of an oyster producing region with a limited food availability. Contrary, the spat summer mortality became a priority for numerous teams (genetic, physiology, pathology, ecology,...) joined in the MOREST program. The first results showed important survival differences between fullsib and halsib families. They indicate a genetic determinism to this character "survival" and promote for its selection.
Resumo:
Gene transfer and expression in eukaryotes is often limited by a number of stably maintained gene copies and by epigenetic silencing effects. Silencing may be limited by the use of epigenetic regulatory sequences such as matrix attachment regions (MAR). Here, we show that successive transfections of MAR-containing vectors allow a synergistic increase of transgene expression. This finding is partly explained by an increased entry into the cell nuclei and genomic integration of the DNA, an effect that requires both the MAR element and iterative transfections. Fluorescence in situ hybridization analysis often showed single integration events, indicating that DNAs introduced in successive transfections could recombine. High expression was also linked to the cell division cycle, so that nuclear transport of the DNA occurs when homologous recombination is most active. Use of cells deficient in either non-homologous end-joining or homologous recombination suggested that efficient integration and expression may require homologous recombination-based genomic integration of MAR-containing plasmids and the lack of epigenetic silencing events associated with tandem gene copies. We conclude that MAR elements may promote homologous recombination, and that cells and vectors can be engineered to take advantage of this property to mediate highly efficient gene transfer and expression.
Resumo:
The present paper focuses on the analysis and discussion of a likelihood ratio (LR) development for propositions at a hierarchical level known in the context as 'offence level'. Existing literature on the topic has considered LR developments for so-called offender to scene transfer cases. These settings involve-in their simplest form-a single stain found on a crime scene, but with possible uncertainty about the degree to which that stain is relevant (i.e. that it has been left by the offender). Extensions to multiple stains or multiple offenders have also been reported. The purpose of this paper is to discuss a development of a LR for offence level propositions when case settings involve potential transfer in the opposite direction, i.e. victim/scene to offender transfer. This setting has previously not yet been considered. The rationale behind the proposed LR is illustrated through graphical probability models (i.e. Bayesian networks). The role of various uncertain parameters is investigated through sensitivity analyses as well as simulations.
Resumo:
The mechanism involved in the Tm3+ (F-3(4))-->Ho3+ (I-5(7)) energy transfer and Tm3+ (H-3(4), H-3(6))-->Tm3+ (F-3(4), F-3(4)) cross relaxation as a function of the donor and acceptor concentrations was investigated in Tm-Ho-codoped fluorozirconate glasses. The experimental transfer rates were determined for the Tm-->Ho energy transfer from the best fit of the acceptor luminescence decay using an expression which takes into account the Inokuti-Hirayama model and localized donor-to-acceptor interaction solution. The original acceptor solution derived from the Inokuti-Hirayama model fits well the acceptor luminescence transient only for low-concentrated systems. The results showed that a fast excitation diffusion that occurs in a very short time (t<