951 results for Field programmable gate arrays
Abstract:
Low-power processors and accelerators that were originally designed for the embedded systems market are emerging as building blocks for servers. Power capping has been actively explored as a technique to reduce the energy footprint of high-performance processors. The opportunities and limitations of power capping on the new low-power processor and accelerator ecosystem are less understood. This paper presents an efficient power capping and management infrastructure for heterogeneous SoCs based on hybrid ARM/FPGA designs. The infrastructure coordinates dynamic voltage and frequency scaling with task allocation on a customised Linux system for the Xilinx Zynq SoC. We present a compiler-assisted power model to guide voltage and frequency scaling, in conjunction with workload allocation between the ARM cores and the FPGA, under given power caps. The model achieves less than 5% estimation bias to mean power consumption. In an FFT case study, the proposed power capping schemes achieve on average 97.5% of the performance of the optimal execution and match the optimal execution in 87.5% of the cases, while always meeting power constraints.
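The selection step that a power-capping scheme of this kind performs can be sketched as follows. This is a minimal illustration, not the paper's infrastructure: the `OperatingPoint` fields, the candidate list and every number in it are hypothetical, and the power estimates stand in for whatever a power model would supply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    freq_mhz: int          # ARM clock frequency (hypothetical values)
    fpga_share: float      # fraction of the workload sent to the FPGA
    est_power_w: float     # power estimate from a model (hypothetical)
    est_throughput: float  # estimated performance, arbitrary units


def pick_operating_point(candidates, power_cap_w):
    """Return the fastest candidate whose estimated power fits under the cap."""
    feasible = [p for p in candidates if p.est_power_w <= power_cap_w]
    if not feasible:
        return None  # no configuration meets the cap
    return max(feasible, key=lambda p: p.est_throughput)


# Hypothetical candidates: two frequencies, with and without FPGA offload.
candidates = [
    OperatingPoint(333, 0.0, 1.2, 10.0),
    OperatingPoint(667, 0.0, 1.9, 18.0),
    OperatingPoint(333, 0.5, 1.6, 16.0),
    OperatingPoint(667, 0.5, 2.4, 25.0),
]
best = pick_operating_point(candidates, power_cap_w=2.0)
print(best)
```

The sketch filters before it maximises, so the cap is never exceeded as long as the estimates are trustworthy; a real controller would also need to handle estimation error.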
Abstract:
The overall aim of the work presented in this paper has been to develop Montgomery modular multiplication architectures suitable for implementation on modern reconfigurable hardware. Accordingly, novel high-radix systolic array Montgomery multiplier designs are presented, as we believe that their inherent regular structure and absence of global interconnect make them well suited to implementation on modern FPGAs. Unlike previous approaches, each processing element (PE) comprises both an adder and a multiplier. Including a multiplier in the PE avoids the need to pre-compute or store any multiples of the operands. It also allows very high-radix implementations to be realised, further reducing the number of clock cycles per modular multiplication while still maintaining a competitive critical delay. For demonstrative purposes, 512-bit and 1024-bit FPGA implementations using radices of 2^8 and 2^16 are presented. The resulting throughput rates are the fastest reported to date.
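The word-serial form of Montgomery multiplication in radix 2^k can be sketched in software as follows. This is a generic sketch of the arithmetic, not the systolic array architecture of the paper; the function name and the parameters `k` (bits per digit) and `s` (number of digits) are illustrative assumptions. Each loop iteration handles one digit of the operand, so a higher radix means fewer iterations.

```python
def montgomery_multiply(a, b, n, k, s):
    """Return a*b*R^-1 mod n, with R = 2**(k*s).

    Requires n odd, 0 <= a, b < n and n < R. Needs Python 3.8+ for pow(n, -1, m).
    """
    radix = 1 << k
    mask = radix - 1
    n_prime = (-pow(n, -1, radix)) % radix  # -n^-1 mod 2^k
    acc = 0
    for i in range(s):
        a_i = (a >> (k * i)) & mask          # i-th radix-2^k digit of a
        t = acc + a_i * b
        q = ((t & mask) * n_prime) & mask    # makes t + q*n divisible by 2^k
        acc = (t + q * n) >> k               # exact division by the radix
    return acc - n if acc >= n else acc      # single final correction


# Small illustrative operands (hypothetical values).
n, a, b = 1000003, 123456, 654321
k, s = 8, 3                                  # R = 2^24 > n
result = montgomery_multiply(a, b, n, k, s)
print(result == (a * b * pow(1 << (k * s), -1, n)) % n)
```

The quotient digit `q` depends only on the low `k` bits of the running sum, which is why the operation maps onto a regular array of identical processing elements.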
Abstract:
The need for biosensors that are highly sensitive yet simple to prepare and use is constantly growing, particularly in the biomedical field. Colloidal crystals formed from polymer microspheres have already shown strong potential as biosensors, because they combine the properties of polymers with the diffraction of visible light by the periodic structure. However, a better understanding of the behaviour of these structures is essential before efficient and versatile sensors can be developed. This work studies the formation and properties of colloidal crystals resulting from the self-assembly of polymer microspheres in aqueous media. To this end, particles with different characteristics were synthesized and characterized in order to correlate the properties of the particles with the behaviour of the crystalline structure. First, anionic and cationic crosslinked polystyrene microspheres were prepared by surfactant-free emulsion polymerization. By varying the amount of charged comonomer, vinylbenzyltrimethylammonium chloride or sodium styrene sulfonate, particles of different sizes, shapes, polydispersities and surface charges were obtained. Indeed, increasing the amount of ionic comonomer electrostatically stabilizes a larger surface area and thus decreases the particle size. Above a certain concentration, however, polymerization of the comonomer in solution becomes non-negligible, which broadens the size distribution. When the polydispersity is low, these charged microspheres, even those that are not perfectly spherical, can self-assemble and form colloidal crystals that diffract visible light.
It appears that the electrostatic repulsions created by the surface charges favour the formation of the periodic structure over a wide range of concentrations and improve its stability in the presence of salt. Second, the need for a stimuli-responsive component led us to core-shell structures. These microspheres, synthesized in two steps by surfactant-free emulsion polymerization, consist of a polystyrene core and a hydrogel shell. Different hydrogels were used to obtain different properties: poly(acrylic acid) for its pH sensitivity, poly(N-isopropylacrylamide) for its thermosensitivity, and the copolymer poly(N-isopropylacrylamide-co-acrylic acid) for dual sensitivity. These microspheres form colloidal crystals that diffract visible light above a critical concentration and over a wide range of concentrations. According to the changes observed in the diffraction spectra, the stimuli affect the crystalline structure, but the magnitude of this effect varies with concentration. This behaviour appears to result from the changes that the volume phase transition induces in the interparticle interactions rather than from the change in size. The attractive van der Waals interactions and the steric repulsions are clearly affected by the volume phase transition of the poly(N-isopropylacrylamide) shell. For the pH-sensitive microspheres, electrostatic interactions must also be considered. The effect of concentration can then be related to the range of these interactions. Finally, with the future aim of developing glucose biosensors, the core-shell microspheres were functionalized with 3-aminophenylboronic acid to make them glucose-sensitive.
The effects of functionalization and of complexation with glucose on the particles and their periodic packing were examined. The crystalline structure is visibly affected by the presence of glucose, although the mechanism involved remains to be elucidated.
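For orientation, the diffraction of visible light by a colloidal crystal is commonly described by the Bragg-Snell relation. The abstract does not state which model the thesis uses, so the relation below is given only as the standard textbook form; the symbols are assumptions of this note: $m$ is the diffraction order, $d$ the spacing between diffracting planes, $n_{\mathrm{eff}}$ the effective refractive index of the crystal and $\theta$ the angle of incidence measured from the surface normal.

```latex
m\,\lambda = 2\,d\,\sqrt{n_{\mathrm{eff}}^{2} - \sin^{2}\theta}
```

Under this relation, any stimulus that changes the interplanar spacing or the effective index shifts the diffracted wavelength, which is the basis for using such crystals as optical sensors.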
Abstract:
This paper formally derives a blocked version, with blocks of size two, of a new path-based neural branch prediction algorithm (FPP), yielding a lower-cost hardware solution while maintaining input-output characteristics similar to those of the original algorithm. The blocked solution, here referred to as the B2P algorithm, is obtained using graph theory and retiming methods. Verification with a known framework for branch prediction evaluation showed that the prediction performance of the FPP and B2P algorithms differs by less than one mis-prediction per thousand instructions. For a chosen FPGA device, circuits generated from the B2P algorithm showed average area savings of over 25% against circuits for the FPP algorithm with similar timing performance, making the proposed blocked predictor superior from a practical viewpoint.
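As background, the basic idea of neural branch prediction can be sketched with a plain perceptron predictor. This is deliberately not the FPP or B2P algorithm described above, which are path-based and are not specified in the abstract; the class name, table size, history length and training threshold are all hypothetical choices for illustration.

```python
class PerceptronPredictor:
    """Plain (non-path-based) perceptron branch predictor, for illustration only."""

    def __init__(self, num_rows=64, history_len=8, threshold=30):
        self.threshold = threshold
        # One weight vector per table row: a bias weight plus one weight per history bit.
        self.weights = [[0] * (history_len + 1) for _ in range(num_rows)]
        self.history = [-1] * history_len  # +1 = taken, -1 = not taken, newest first

    def _output(self, pc):
        row = self.weights[pc % len(self.weights)]
        return row[0] + sum(w * h for w, h in zip(row[1:], self.history))

    def predict(self, pc):
        return self._output(pc) >= 0  # True means "predict taken"

    def update(self, pc, taken):
        y = self._output(pc)
        t = 1 if taken else -1
        # Train on a mis-prediction or when the output is not yet confident.
        if (y >= 0) != taken or abs(y) <= self.threshold:
            row = self.weights[pc % len(self.weights)]
            row[0] += t
            for i, h in enumerate(self.history):
                row[i + 1] += t * h
        self.history = [t] + self.history[:-1]


# Usage: a branch at a hypothetical address that alternates taken / not taken.
predictor = PerceptronPredictor()
mispredictions = 0
for step in range(400):
    outcome = step % 2 == 0
    if step >= 300 and predictor.predict(0x40) != outcome:
        mispredictions += 1
    predictor.update(0x40, outcome)
print(mispredictions)
```

The dot product over the history is the part that dominates hardware cost and latency, which is why restructuring that computation matters for an FPGA implementation.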
Abstract:
This paper develops cycle-level FPGA circuits of an organization for a fast path-based neural branch predictor. Our results suggest that practical sizes of prediction tables are limited to around 32 KB to 64 KB in current FPGA technology, mainly because of the FPGA logic resources needed to maintain the tables. However, the predictor scales well in terms of prediction speed. Table size alone should not be used as the only metric for hardware budget when comparing neural-based predictors with predictors of totally different organizations. This paper also gives early evidence for shifting attention to the latency of recovery from mis-prediction, rather than prediction latency, as the most critical factor impacting prediction accuracy for this class of branch predictors.
Abstract:
This paper evaluates the power consumption of a clocking technique for pipelined designs. The technique shows a dynamic power saving of around 30% over a conventional global clocking mechanism. The results were obtained from a series of experiments on a systolic circuit implemented in Virtex-II devices. Converting a globally clocked pipelined design to the proposed technique is straightforward and preserves the original datapath design. The savings can be used immediately either as a power reduction or to increase the operating frequency of a design for the same power consumption.
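The trade-off in the last sentence can be made concrete with the usual first-order model of dynamic power. The model and the symbols below (activity factor $\alpha$, switched capacitance $C$, supply voltage $V$, clock frequency $f$) are assumptions of this note, not figures from the paper.

```latex
P_{\mathrm{dyn}} = \alpha\, C\, V^{2} f
\qquad\Longrightarrow\qquad
\frac{f_{\mathrm{new}}}{f_{\mathrm{old}}} \approx \frac{1}{1 - 0.30} \approx 1.43
\quad \text{(same power, fixed } V\text{)}
```

Under this simplified model, a 30% dynamic power saving at constant voltage would leave room for roughly a 43% higher clock frequency, provided the design still meets timing at that frequency.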
Abstract:
Functionalized, polymer-supported planar phospholipid model membrane systems were prepared and characterized in detail at every stage of preparation. Thin polysaccharide films were deposited on oxide surfaces in the form of swellable gels and examined with respect to their swelling behaviour and surface properties as a function of water content. Lipid monolayers of various compositions were transferred onto polymer substrates by Langmuir-Blodgett transfer and studied by fluorescent probe methods (fluorescence recovery after photobleaching (FRAP), fluorescence microscopy and fluorescence spectroscopy) with respect to the strength of the lipid/polymer interaction, the lateral self-diffusion as a function of water activity, the spreading behaviour of the monomolecular membrane on the substrate as a function of water activity and of the lateral pressure of the monolayer, and the degree of hydration in the headgroup region of the lipid membrane as a function of water activity. The diffusion and spreading behaviour of amphiphilic monolayers on polymer substrates was discussed on the basis of physical models developed in this work. A second monolayer was transferred onto the polymer-supported lipid monolayers by Langmuir-Schäfer transfer. The resulting lipid bilayer membranes were examined by fluorescence microscopy, FRAP and reflection interference contrast microscopy (RICM) with respect to their stability, lateral structure, lateral self-diffusion, spreading onto uncovered regions, and the strength of the membrane/substrate interaction. Finally, integral membrane proteins acting as proton pumps were incorporated into substrate-supported lipid bilayer membranes.
The lateral self-diffusion of the reconstituted protein molecules was analysed by FRAP, and the functional activity of the proton pumps with an ion-sensitive field-effect transistor array.
Abstract:
This thesis deals with the development and construction of an experiment for the high-precision determination of the g-factor of bound electrons in highly charged ions. The g-factor of a particle is a dimensionless constant that describes the strength of its interaction with a magnetic field. For an electron bound to a highly charged ion, it serves as one of the most precise tests of bound-state quantum electrodynamics (BS-QED). The measurement is carried out in a triple Penning trap system and is based on the continuous Stern-Gerlach effect. The first part of this thesis reviews the current state of knowledge on magnetic moments and motivates the chosen experimental setup. The experimental requirements and the measurement techniques used are then explained. Charge breeding of the ions, one of the most important tasks of this work, is presented. Its realization is based on a field-emission-point array, which enables measurement of the cross section for electron-impact ionization. The last part of the thesis is devoted to the development and construction of the Penning trap system and to the implementation of the detection process. At present, the setup for producing highly charged ions and for the associated g-factor measurement is complete, including the control program for the first data taking. Ion production and charge breeding will be the next steps.
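As a reminder of how a g-factor is typically extracted in a Penning trap, the relation below follows from the definitions of the Larmor frequency of the bound electron and the free cyclotron frequency of the ion in the same magnetic field $B$. The abstract itself gives no formula, so the symbols are assumptions of this note: $\nu_L$ is the Larmor frequency, $\nu_c$ the ion cyclotron frequency, $q$ and $M$ the charge and mass of the ion, and $e$ and $m_e$ the elementary charge and electron mass.

```latex
\nu_L = \frac{g}{2}\,\frac{e}{2\pi m_e}\,B, \qquad
\nu_c = \frac{1}{2\pi}\,\frac{q}{M}\,B
\qquad\Longrightarrow\qquad
g = 2\,\frac{\nu_L}{\nu_c}\,\frac{q}{e}\,\frac{m_e}{M}
```

The magnetic field cancels in the ratio, so the precision of the result rests on the frequency ratio and on the independently known mass ratio.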
Abstract:
Systems relying on fixed hardware components with a static level of parallelism can suffer from underuse of logic resources, since they have to be designed for the worst-case scenario. This problem is especially important in video applications due to the emergence of new flexible standards, like Scalable Video Coding (SVC), which offer several levels of scalability. In this paper, Dynamic and Partial Reconfiguration (DPR) of modern FPGAs is used to achieve run-time variable parallelism by using scalable architectures whose size can be adapted at run-time. Based on this proposal, a scalable Deblocking Filter core (DF), compliant with the H.264/AVC and SVC standards, has been designed. This scalable DF allows run-time addition or removal of computational units working in parallel. Scalability is offered together with a scalable parallelization strategy at the macroblock (MB) level, such that when the size of the architecture changes, the MB filtering order is modified accordingly.
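One way a macroblock filtering order can adapt to the number of available units is a wavefront schedule, sketched below. The abstract does not describe the paper's actual strategy, so this is only an assumed illustration: it supposes that each MB can be filtered once its left and upper neighbours are done, and the function name and parameters are hypothetical.

```python
def wavefront_schedule(mb_cols, mb_rows, num_units):
    """Return a list of time steps; each step lists the (x, y) MBs filtered in parallel.

    Assumes MB (x, y) depends only on (x - 1, y) and (x, y - 1), so all MBs on
    the same anti-diagonal x + y are mutually independent.
    """
    steps = []
    for wave in range(mb_cols + mb_rows - 1):
        ready = [(wave - y, y) for y in range(mb_rows) if 0 <= wave - y < mb_cols]
        # With fewer units than ready MBs, the wave is split over several steps.
        for i in range(0, len(ready), num_units):
            steps.append(ready[i:i + num_units])
    return steps


# Usage: the same 4x3 frame scheduled on 1, 2 and 3 parallel units.
for units in (1, 2, 3):
    print(units, len(wavefront_schedule(4, 3, units)))
```

Changing `num_units` changes only how each wave is split, so the dependency order is preserved whatever the current size of the architecture.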
Abstract:
This paper proposes an automatic framework for the seamless integration of hardware accelerators, starting from an OpenMP-based application and an XML file describing the HW/SW partitioning. It extends a fully software architecture by generating and integrating the cores, along with the proper interfaces, and the code for scheduling and synchronization. Experimental results show that it is possible to validate different solutions only by varying the input code.
Abstract:
Adaptive hardware requires some reconfiguration capabilities. FPGAs with native dynamic partial reconfiguration (DPR) support pose a dilemma for system designers: whether to use native DPR or to build a virtual reconfigurable circuit (VRC) on top of the FPGA, which allows alternative functions to be selected by a multiplexing scheme. The VRC solution allows much faster reconfiguration, but with higher resource overhead. This paper discusses the advantages of both implementations for a 2D image processing matrix. Results show that a higher operating frequency is obtained for the matrix using DPR. However, the VRC compensates for this during evolution thanks to its comparatively negligible reconfiguration time. Regarding area, the DPR implementation consumes slightly more resources due to the reconfiguration engine, but adds further capabilities to the system.
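The interplay between clock frequency and reconfiguration time during evolution can be illustrated with a back-of-the-envelope model. Everything in it is an assumption for illustration: the function name, the one-pixel-per-cycle evaluation, and all clock, timing and population figures are hypothetical and are not taken from the paper.

```python
def evolution_time_s(generations, candidates_per_gen, reconfig_time_s, pixels, clock_hz):
    """Total evolution time if every candidate is reconfigured and then evaluated once."""
    eval_time_s = pixels / clock_hz  # assumes one pixel processed per clock cycle
    return generations * candidates_per_gen * (reconfig_time_s + eval_time_s)


# Hypothetical figures: DPR runs at a faster clock but reconfigures slowly,
# while the VRC runs at a slower clock but switches almost instantly.
pixels = 128 * 128
dpr_time = evolution_time_s(1000, 8, reconfig_time_s=1e-3, pixels=pixels, clock_hz=200e6)
vrc_time = evolution_time_s(1000, 8, reconfig_time_s=1e-6, pixels=pixels, clock_hz=100e6)
print(f"DPR: {dpr_time:.2f} s, VRC: {vrc_time:.2f} s")
```

With these made-up numbers the per-candidate reconfiguration cost dominates, so the slower-clocked VRC finishes evolution first; once evolution is over and the circuit is fixed, only the clock frequency matters.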
Abstract:
Evolvable Hardware (EH) is a technique that consists of using reconfigurable hardware devices whose configuration is controlled by an Evolutionary Algorithm (EA). Our system consists of a scalable EH platform implemented entirely on an FPGA, where the Reconfigurable processing Core (RC) can adaptively increase or decrease in size. Figure 1 shows the architecture of the proposed System-on-Programmable-Chip (SoPC), consisting of a MicroBlaze processor responsible for controlling the whole system operation, a Reconfiguration Engine (RE), and a Reconfigurable processing Core that can change its size in both height and width. This system is used to implement image filters, which are generated autonomously by the evolutionary process. The system is complemented with a camera that enables the platform to be used for real-time applications.
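The control loop of an evolutionary algorithm that drives a reconfigurable core can be sketched as a simple (1+λ) evolution strategy. The abstract does not say which EA the platform uses, so this is an assumed, generic example: the genome stands in for a configuration of the core, and the fitness function, target and all parameters are hypothetical.

```python
import random


def evolve(fitness, genome_len, gene_values, generations=200, offspring=4, seed=0):
    """(1+lambda) evolution strategy: lower fitness is better."""
    rng = random.Random(seed)
    parent = [rng.choice(gene_values) for _ in range(genome_len)]
    best = fitness(parent)
    history = [best]
    for _ in range(generations):
        children = []
        for _ in range(offspring):
            child = list(parent)
            child[rng.randrange(genome_len)] = rng.choice(gene_values)  # point mutation
            children.append(child)
        candidate = min(children, key=fitness)
        score = fitness(candidate)
        if score <= best:  # accept ties so the search can drift across plateaus
            parent, best = candidate, score
        history.append(best)
    return parent, best, history


# Hypothetical stand-in for "configure the core, filter an image, measure the error":
# the number of genes that differ from a reference configuration.
target = [3, 1, 4, 1, 5, 9, 2, 6]


def config_error(genome):
    return sum(g != t for g, t in zip(genome, target))


genome, error, history = evolve(config_error, len(target), list(range(10)))
print(error, history[0])
```

On a hardware platform, each call to the fitness function would correspond to one reconfiguration of the core followed by one pass over the training image.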
Abstract:
GaN and AlN are piezoelectric group III-V semiconductor materials. The AlGaN/GaN heterojunction exhibits a large polarization charge, both piezoelectric and spontaneous, at the interface, which generates a 2DEG of high concentration and mobility nearby. This 2DEG produces a very high output power, which in turn generates a high lattice temperature. The gate and drain voltages induce converse piezoelectric stress, which can affect the piezoelectric polarization charge and thus influence the 2DEG density and the output characteristics. The device physics is therefore relevant to all of its electrical, thermal and mechanical aspects. In this thesis, the commercial software COMSOL, based on the finite element method (FEM), is used to simulate the fully coupled electro-thermal, electro-mechanical and electro-thermo-mechanical behaviour of GaN HEMTs. The coupled parts include the drift-diffusion model for electron transport, thermal conduction and the piezoelectric effect. Through simulations and some experimental characterization of the devices, we have analysed thermal, strain and trap effects. The impact of device geometry on self-heating was studied through electro-thermal simulations and some electrical characterization. Among the most notable results, we find that, for the same output power, the distance between the gate and drain contacts influences heat generation in the channel and hence its temperature. Diamond has a high thermal conductivity. Integrating diamond into the device can spread the heat produced and thus reduce self-heating; several electro-thermal simulations were carried out on this topic.
If the diamond is integrated on top of the transistor, the factors that determine the heat-spreading capability are the thickness of the diamond layer, its thermal conductivity and its distance from the heat source. This top-side heat spreading can also reduce the impact of the thermal boundary resistance between the buffer layer and the substrate. The very low electrical conductivity of diamond allows it to contact the gate metal directly (very close to the heat source), which is very convenient for reducing self-heating of the device under pulsed bias. The device was also simulated with diamond deposited in trenches etched into the substrate as heat-dissipation paths (back-side heat spreading). Here competing factors influence the dissipation capability: the etched trench tends to raise the device temperature because of the reduced heat sink, whereas the diamond would lower that temperature thanks to its high thermal conductivity. Relatively thick diamond layers are therefore needed to reduce the self-heating effect. The simulated local strain at the drain-side gate edge was compared for standard and field-plate gate structures, which could be highly relevant to mechanical failure of the device. Other simulations focused on the effect of the intrinsic strain of the diamond layer on the electrical behaviour of the device. The simulated strain and electrical output characteristics were compared with experimental data obtained by micro-Raman spectroscopy and electrical measurements, respectively. The results show that the intrinsic stress in the layer produces a non-uniform distribution of the 2DEG in the channel and the access region.
Besides increasing the output power of the device, the intrinsic strain in the diamond layer could improve device reliability by modulating the local strain at the drain-side gate edge. Finally, the effects of traps located at the surface, in the buffer and in the barrier were also simulated in this work. Pulsed measurements show that both long gates and large separations between the gate and drain contacts increase the ratio of pulsed current to DC current (lag ratio), that is, they reduce current collapse. This effect was explained through simulations of surface trap effects. The simulations of buffer traps focused on dynamic trapping effects and their impact on device self-heating. A model is also presented that describes trapping and detrapping in the barrier: trapping is due to direct tunnelling of electrons from the gate metal, whereas detrapping consists of electron emission into the conduction band through phonon-assisted tunnelling. The model also simulates the gate current, which is due to temperature- and electric-field-dependent electron emission. The temperature- and electric-field-dependent drain current is also illustrated.
ABSTRACT GaN and AlN are group III-V piezoelectric semiconductor materials. The AlGaN/GaN heterojunction presents a large piezoelectric and spontaneous polarization charge at the interface, leading to a high 2DEG density close to the interface. The high 2DEG density and mobility yield a high power output, which leads to an elevated lattice temperature.
The gate and drain biases induce converse piezoelectric stress that can influence the piezoelectric polarization charge and, in turn, the 2DEG density and output characteristics. Therefore, the device physics is relevant to all the electrical, thermal, and mechanical aspects. In this dissertation, using the commercial finite-element-method (FEM) software COMSOL, we simulated GaN HEMTs with full electro-thermal, electro-mechanical, and electro-thermo-mechanical coupling. The coupled parts include the drift-diffusion model for electron transport, thermal conduction, and the piezoelectric effect. Through simulations and some experimental characterizations, we studied the thermal, stress, and trap effects in the device, as described in the following. The impact of device geometry on self-heating was studied by electro-thermal simulations and electrical characterizations. Among the interesting results obtained, we found that, for the same power output, the distance between the gate and drain contacts can influence the distribution of heat generation in the channel and thus the channel temperature. Diamond possesses high thermal conductivity. Integrating diamond with the device can spread the generated heat and thus potentially reduce the device self-heating effect. Electro-thermal simulations on this topic were performed. For diamond integration on top of the device (top-side heat spreading), the determining factors for the heat spreading ability are the diamond thickness, its thermal conductivity, and its distance to the heat source. Top-side heat spreading can also reduce the impact of the thermal boundary resistance between the buffer and the substrate on the device thermal behavior. The very low electrical conductivity of diamond allows it to contact the gate metal directly (which is very close to the heat source), which is convenient for reducing self-heating of the device under pulsed bias.
Diamond coated in vias etched in the substrate as a heat spreading path (back-side heat spreading) was also simulated. Competing mechanisms influence the heat spreading ability: the etched vias would increase the device temperature because of the reduced heat sink, while the coated diamond would decrease the device temperature because of its higher thermal conductivity. Therefore, a relatively thick diamond coating is needed in order to reduce the self-heating effect. The simulated local stress at the drain-side gate edge was compared for devices with standard and field-plate gate structures; this stress would be relevant to device mechanical failure. Other stress simulations focused on the impact of the intrinsic stress in the diamond capping layer on the device electrical behavior. The simulated stress and electrical output characteristics were compared to experimental data obtained by micro-Raman spectroscopy and electrical characterization, respectively. Results showed that the intrinsic stress in the capping layer caused a non-uniform distribution of the 2DEG in the channel and the access region. Besides enhancing the device power output, intrinsic stress in the capping layer can potentially improve device reliability by modulating the local stress at the drain-side gate edge. Finally, the effects of surface, buffer, and barrier traps were simulated in this work. Pulsed measurements showed that long gates and large distances between the gate and drain contacts can increase the gate lag ratio (decrease the current collapse). This was explained by simulations of the surface trap effect. The simulations of buffer trap effects focused on illustrating the dynamic trapping/detrapping in the buffer and the impact of self-heating on the device transient drain current. A model was presented to describe the trapping and detrapping in the barrier.
Trapping was modeled as direct tunneling of electrons from the gate metal, while detrapping was modeled as electron emission into the conduction band by phonon-assisted tunneling. The reverse gate current was simulated based on this model; its mechanism can be attributed to temperature- and electric-field-dependent electron emission in the barrier. Furthermore, the mechanism by which the device bias, via self-heating and the electric field, affects the electron emission and the transient drain current was also illustrated.
Abstract:
Cyber-Physical Systems need to handle increasingly complex tasks, which may additionally have operating conditions that vary over time. Dynamic resource management is therefore required to adapt the system to different needs. This paper presents a new bus-based architecture, called ARTICo3, which uses Dynamic Partial Reconfiguration to replicate hardware tasks in order to support module redundancy, multi-thread operation, or dual-rail solutions for enhanced protection against side-channel attacks. A configuration-aware data transaction unit permits data dispatching to more than one module in parallel, or provides coalesced data dispatching among different units to maximize the advantages of burst transactions. The selection of a given configuration is application independent but context-aware, which may be achieved by combining a multi-thread model similar to the CUDA kernel model specification with a dynamic thread/task/kernel scheduler. A multi-kernel application for face recognition is used as an example to show one scenario of the ARTICo3 architecture.
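The module-redundancy mode mentioned above can be illustrated in software: the same data block is dispatched to several replicas of a task and the outputs are compared by majority vote. This is a conceptual sketch only, not the ARTICo3 data transaction unit; the function names and the stand-in replica functions are hypothetical.

```python
from collections import Counter


def vote(outputs):
    """Majority vote over replica outputs; returns (value, indices of disagreeing replicas)."""
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise ValueError("no majority among replicas")
    return value, [i for i, out in enumerate(outputs) if out != value]


def dispatch_redundant(block, replicas):
    """Send the same block to every replica and vote on the results."""
    return vote([replica(block) for replica in replicas])


# Hypothetical replicas of one hardware task; the third one has a stuck output.
def healthy(block):
    return tuple(2 * x for x in block)


def faulty(block):
    return tuple(0 for _ in block)


value, suspects = dispatch_redundant((1, 2, 3), [healthy, healthy, faulty])
print(value, suspects)
```

In a reconfigurable system, the list of disagreeing replicas is what would let a scheduler decide which module to reload.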