957 results for On-Chip Multiprocessor (OCM)
Abstract:
Due to the growth of design size and complexity, design verification is an important aspect of the logic circuit development process. The purpose of verification is to validate that the design meets the system requirements and specification. This is done by either functional or formal verification. The most popular approach to functional verification is the use of simulation-based techniques, in which models replicate the behaviour of an actual system. In this thesis, a software/data-structure architecture without explicit locks is proposed to accelerate logic gate circuit simulation. We call this system ZSIM. The ZSIM software architecture simulator targets low-cost SIMD multi-core machines. Its performance is evaluated on the Intel Xeon Phi and two other machines (Intel Xeon and AMD Opteron). The aim of these experiments is to:
• Verify that the data structure used allows SIMD acceleration, particularly on machines with gather instructions (section 5.3.1).
• Verify that, on sufficiently large circuits, substantial gains can be made from multicore parallelism (section 5.3.2).
• Show that a simulator using this approach outperforms an existing commercial simulator on a standard workstation (section 5.3.3).
• Show that the performance on a cheap Xeon Phi card is competitive with results reported elsewhere on much more expensive supercomputers (section 5.3.5).
To evaluate ZSIM, two types of test circuits were used:
1. Circuits from the IWLS benchmark suite [1], which allow direct comparison with other published studies of parallel simulators.
2. Circuits generated by a parametrised circuit synthesizer. The synthesizer used an algorithm that has been shown to generate circuits that are statistically representative of real logic circuits, and it allowed testing of a range of very large circuits, larger than those for which open source files could be obtained.
The experimental results show that with SIMD acceleration and multicore parallelism, ZSIM achieved a peak parallelisation factor of 300 on the Intel Xeon Phi and 11 on the Intel Xeon. With only SIMD enabled, ZSIM achieved a maximum parallelisation gain of 10 on the Intel Xeon Phi and 4 on the Intel Xeon. Furthermore, it was shown that this software architecture simulator running on a SIMD machine is much faster than, and can handle much bigger circuits than, a widely used commercial simulator (Xilinx) running on a workstation. The performance achieved by ZSIM was also compared with similar pre-existing work on logic simulation targeting GPUs and supercomputers. The ZSIM simulator running on a Xeon Phi machine gives simulation performance comparable to the IBM Blue Gene supercomputer at very much lower cost, and the experiments showed that the Xeon Phi is competitive with simulation on GPUs while handling much larger circuits than have been reported for GPU simulation. When targeting the Xeon Phi architecture, the automatic cache management of the Xeon Phi handles the on-chip local store without any explicit mention of the local store in the architecture of the simulator itself; on GPUs, by contrast, explicit cache management in the program increases the complexity of the software architecture. Furthermore, one of the strongest points of the ZSIM simulator is its portability: the same code was tested on both AMD and Xeon Phi machines, and the same architecture that performs efficiently on the Xeon Phi was ported to a 64-core NUMA AMD Opteron.
To conclude, the two main achievements are restated as follows. The primary achievement of this work was showing that the ZSIM architecture is faster than previously published logic simulators on low-cost platforms. The secondary achievement was the development of a synthetic testing suite that went beyond the scale range previously publicly available, based on prior work showing that the synthesis technique is valid.
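To make the data-structure claim concrete, here is a minimal sketch (Python/NumPy; every name here is an illustrative assumption, not ZSIM's actual code) of a gather-friendly, lock-free gate representation: all gates of one topological level are evaluated in bulk, with NumPy fancy indexing standing in for the SIMD gather instructions mentioned above.

```python
import numpy as np

def simulate_level(values, in_a, in_b, op_is_and):
    """Evaluate one topological level of 2-input AND/OR gates.

    values    -- current 0/1 value of every net in the circuit
    in_a/in_b -- per-gate indices of the two fan-in nets (gather lists)
    op_is_and -- boolean mask: True where the gate is AND, else OR
    """
    a = values[in_a]   # vectorised gather of first inputs
    b = values[in_b]   # vectorised gather of second inputs
    return np.where(op_is_and, a & b, a | b)

# Tiny example: nets 0..3 are primary inputs feeding two gates.
values = np.array([1, 0, 1, 1], dtype=np.uint8)
out = simulate_level(values,
                     in_a=np.array([0, 2]),
                     in_b=np.array([1, 3]),
                     op_is_and=np.array([True, False]))
print(out)  # [0 1]: AND(1,0)=0, OR(1,1)=1
```

Because levels are processed one at a time and each gate writes only its own output slot, no explicit locks are required, which is the property the abstract attributes to ZSIM's architecture.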
Abstract:
Cellular models are important tools in various research areas related to colorectal biology and associated diseases. Herein, we review the most widely used cell lines and the different techniques to grow them: as cell monolayers, as polarized two-dimensional epithelia on membrane filters, or as three-dimensional spheres in scaffold-free or matrix-supported culture conditions. Moreover, recent developments, such as gut-on-chip devices and the ex vivo growth of biopsy-derived organoids, are also discussed. We provide an overview of the potential applications as well as the limitations of each of these techniques, while evaluating their contribution to providing more reliable cellular models for research, diagnostic testing, or pharmacological validation related to colon physiology and pathophysiology.
Abstract:
The focus of this research is to explore the applications of the finite difference formulation based on the latency insertion method (LIM) to the analysis of circuit interconnects. Special attention is devoted to addressing the issues that arise in very large networks such as on-chip signal and power distribution networks. We demonstrate that the LIM has the power and flexibility to handle various types of analysis required at different stages of circuit design. The LIM is particularly suitable for simulations of very large scale linear networks and can significantly outperform conventional circuit solvers (such as SPICE).
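As a rough illustration of the finite-difference formulation behind the LIM (a sketch using an assumed lossless LC-ladder topology and assumed values, not the thesis's actual formulation), the method relies on the latency in every branch and node so that voltages and currents can be updated in a leapfrog fashion:

```python
import numpy as np

n = 100                        # nodes in the ladder
C = np.full(n, 1e-12)          # node capacitances to ground [F]
L = np.full(n - 1, 1e-9)       # branch inductances [H]
dt = 0.1 * np.sqrt(L[0] * C[0])   # step well below the LIM stability limit

v = np.zeros(n)                # node voltages at integer time steps
i = np.zeros(n - 1)            # branch currents at half time steps

for step in range(2000):
    # current injected at node 0: a simple Gaussian pulse source
    src = 1e-3 * np.exp(-((step * dt - 1e-10) / 2e-11) ** 2)

    # node update: V^{n+1} = V^n + dt/C * (net current into the node)
    net = np.zeros(n)
    net[:-1] -= i              # branch k leaves node k ...
    net[1:] += i               # ... and enters node k+1
    net[0] += src
    v += dt / C * net

    # branch update: I^{n+3/2} = I^{n+1/2} + dt/L * (V_i - V_j)
    i += dt / L * (v[:-1] - v[1:])
```

Each update touches only a node's immediate neighbours, which is why the scheme scales to very large networks where a dense matrix solve, as in SPICE, becomes expensive.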
Abstract:
The performance, energy-efficiency and cost improvements due to traditional technology scaling have begun to slow down and present diminishing returns. Underlying reasons for this trend include fundamental physical limits of transistor scaling, the growing significance of quantum effects as transistors shrink, and a growing mismatch between transistors and interconnects regarding size, speed and power. Continued Moore's Law scaling will not come from technology scaling alone; it must involve improvements to design tools and the development of new disruptive technologies such as 3D integration. 3D integration offers potential improvements to interconnect power and delay by moving the routing problem into a third dimension, and facilitates transistor density scaling independent of technology node. Furthermore, 3D IC technology opens up a new architectural design space of heterogeneously integrated high-bandwidth CPUs. Vertical integration promises to provide the CPU architectures of the future by integrating high-performance processors with on-chip high-bandwidth memory systems and highly connected network-on-chip structures. Such techniques can overcome the well-known CPU performance bottlenecks referred to as the memory wall and the communication wall. However, the promising improvements to performance and energy efficiency offered by 3D CPUs do not come without cost, both in the financial investment to develop the technology and in the increased complexity of design. Two main limitations of 3D IC technology have been heat removal and TSV reliability. Transistor stacking increases power density, current density and thermal resistance in air-cooled packages. Furthermore, the technology introduces vertical through-silicon vias (TSVs) that create new points of failure in the chip and require the development of new BEOL technologies. Although these issues can be controlled to some extent using thermal- and reliability-aware physical and architectural 3D design techniques, high-performance embedded cooling schemes, such as micro-fluidic (MF) cooling, are fundamentally necessary to unlock the true potential of 3D ICs. A new paradigm is being put forth which integrates the computational, electrical, physical, thermal and reliability views of a system. The unification of these diverse aspects of integrated circuits is called Co-Design. Independent design and optimization of each aspect leads to sub-optimal designs due to a lack of understanding of cross-domain interactions and their impact on the feasibility region of the architectural design space. Co-Design enables optimization across layers with a multi-domain view and thus unlocks new high-performance and energy-efficient configurations. Although the co-design paradigm is becoming increasingly necessary in all fields of IC design, it is even more critical in 3D ICs where, as we show, the inter-layer coupling and higher degree of connectivity between components exacerbate the interdependence between architectural parameters, physical design parameters and the multitude of metrics of interest to the designer (i.e. power, performance, temperature and reliability). In this dissertation we present a framework for multi-domain co-simulation and co-optimization of 3D CPU architectures with both air and MF cooling solutions. Finally, we propose an approach for design space exploration and modeling within the new Co-Design paradigm, and discuss possible avenues for future improvement of this work.
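As a back-of-the-envelope illustration of the heat-removal limitation described above (all numbers below are assumed for the example, not taken from the dissertation): in a two-tier stack cooled from the top, the lower die's heat must also cross the inter-tier bond, so its junction runs hotter.

```python
# Series thermal-resistance model of a two-die stack with a top heat sink.
T_amb   = 45.0    # heat-sink base / ambient temperature [C]
R_sink  = 0.30    # heat sink + package thermal resistance [K/W]
R_inter = 0.15    # inter-tier bond + TSV layer resistance [K/W]
P_top, P_bottom = 60.0, 60.0   # per-die power [W]

T_top    = T_amb + (P_top + P_bottom) * R_sink   # 81.0 C
T_bottom = T_top + P_bottom * R_inter            # 90.0 C
print(f"top die: {T_top:.1f} C, bottom die: {T_bottom:.1f} C")
```

Even this crude model shows the stacked configuration concentrating the total power under one heat sink while adding series resistance for the buried tier, which is why embedded cooling such as MF cooling becomes attractive.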
Abstract:
This thesis addresses the implementation of a complete active vision system that captures and generates images of variable spatial resolution. The whole system is integrated into a single AP SoC (All Programmable System on Chip) device, which allows us to carry out its hardware-software co-design, implementing the intensive pre-processing blocks in the logic fabric and the more complex control algorithms in software. The goal is to process a moderate frame rate while working with a visual field on the order of megapixels. The multiresolution images are generated from uniform-resolution sensors with zero latency, so that the variable-resolution image is ready at the very instant the capture of the original image finishes. As an innovation with respect to the first contributions related to this thesis, images are processed with full colour information. This entails designing converters between different colour spaces to adapt the information to the type of processing to be performed on it. These blocks are integrated without altering the delivery latency of successive frames. Processing these multiresolution images generates a saliency map that allows the fovea to be moved towards the region considered most relevant in the scene. The image content is structured into a hierarchy of abstraction levels. Unlike other architectures of this kind, such as the regular pyramid and the foveal polygon, which work with uniform-resolution images at the different levels of the hierarchy, the foveal irregular pyramid proposed in this thesis combines the idea of working with a truly multiresolution image, covering the complete field of view spanned by sensor and optics, with the hierarchical processing characteristic of irregular pyramids. To this end, this thesis proposes the implementation of an irregular decimation algorithm that, taking the multiresolution image as its basis, yields a pyramidal structure whose levels are not images but graphs oriented to solving the segmentation and saliency-estimation problem. The whole system is integrated around the AXI bus architecture, which interconnects all the cores developed in the logic fabric and provides access to the memory shared with the algorithms implemented in software. This is made possible by the AXI-VDMA direct memory access blocks, in a configuration that allows both the perfectly coordinated transfer of the generated multiresolution image to the working area of the segmentation algorithm and its retrieval for later visualization of the result, all at a processing rate that improves on the results of similar platforms.
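The thesis produces these variable-resolution images in programmable logic with zero added latency; purely as a functional illustration of foveated decimation (a software sketch with assumed parameters, not the hardware design), the idea can be expressed as block-averaging rings whose blocks grow with distance from the fovea:

```python
import numpy as np

def foveate(img, cx, cy, fovea_r, ring_w):
    """Keep full resolution inside the fovea; average 2^r x 2^r blocks in ring r."""
    out = img.astype(float).copy()
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    dist = np.hypot(yy - cy, xx - cx)
    ring = np.maximum(0, ((dist - fovea_r) // ring_w).astype(int) + 1)
    for r in range(1, int(ring.max()) + 1):
        block = 2 ** r                       # coarser blocks further out
        hh, ww = h - h % block, w - w % block
        small = img[:hh, :ww].reshape(hh // block, block,
                                      ww // block, block).mean(axis=(1, 3))
        coarse = np.kron(small, np.ones((block, block)))
        mask = ring[:hh, :ww] == r
        out[:hh, :ww][mask] = coarse[mask]
    return out

img = np.random.randint(0, 256, (480, 640))
fov = foveate(img, cx=320, cy=240, fovea_r=60, ring_w=60)
```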
Abstract:
Integrated circuit scaling has enabled a huge growth in processing capability, which necessitates a corresponding increase in inter-chip communication bandwidth. As bandwidth requirements for chip-to-chip interconnection scale, the deficiencies of electrical channels become more apparent. Optical links present a viable alternative due to their low frequency-dependent loss and higher bandwidth density in the form of wavelength division multiplexing. As integrated photonics and bonding technologies mature, commercialization of hybrid-integrated optical links is becoming a reality. Increasing silicon integration leads to better performance in optical links but necessitates a corresponding co-design strategy spanning both electronics and photonics. In this light, holistic design of high-speed optical links, with an in-depth understanding of photonics and state-of-the-art electronics, brings their performance to unprecedented levels. This thesis presents developments in high-speed optical links achieved by co-designing and co-integrating the primary elements of an optical link: receiver, transmitter, and clocking.
In the first part of this thesis a 3D-integrated CMOS/silicon-photonic receiver will be presented. The electronic chip features a novel design that employs a low-bandwidth TIA front-end, double-sampling, and equalization through dynamic offset modulation. Measured results show -14.9dBm sensitivity and an energy efficiency of 170fJ/b at 25Gb/s. The same receiver front-end is also used to implement a source-synchronous 4-channel WDM-based parallel optical receiver. Quadrature ILO-based clocking is employed for synchronization, together with a novel frequency-tracking method that exploits the dynamics of injection locking in a quadrature ring oscillator to increase the effective locking range. An adaptive body-biasing circuit is designed to keep the per-bit energy consumption constant across a wide range of data rates. The prototype measurements indicate a record-low power consumption of 153fJ/b at 32Gb/s. The receiver sensitivity is measured to be -8.8dBm at 32Gb/s.
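As a quick check on what these figures of merit mean in absolute terms, energy per bit times data rate gives average power; the conversion below uses only the numbers quoted above.

```python
# P = (energy per bit) * (data rate); fJ/b * Gb/s conveniently lands in mW.
for e_fj, rate_gbps in [(170, 25), (153, 32)]:
    power_mw = e_fj * 1e-15 * rate_gbps * 1e9 * 1e3
    print(f"{e_fj} fJ/b at {rate_gbps} Gb/s = {power_mw:.2f} mW")
# -> 170 fJ/b at 25 Gb/s = 4.25 mW; 153 fJ/b at 32 Gb/s = 4.90 mW
```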
Next, on the optical transmitter side, three new techniques will be presented. The first is a differential ring modulator that breaks the optical bandwidth/quality-factor trade-off known to limit the speed of high-Q ring modulators. This structure maintains a constant energy in the ring to avoid pattern-dependent power droop. As a first proof of concept, a prototype has been fabricated and measured up to 10Gb/s. The second technique is thermal stabilization of micro-ring resonator modulators through direct measurement of temperature using a monolithic PTAT temperature sensor. The measured temperature is used in a feedback loop to adjust the thermal tuner of the ring. A prototype has been fabricated, and a closed-loop feedback system is demonstrated to operate at 20Gb/s in the presence of temperature fluctuations. The third technique is a switched-capacitor-based pre-emphasis technique designed to extend the inherently low bandwidth of carrier-injection micro-ring modulators. A measured prototype of the optical transmitter achieves an energy efficiency of 342fJ/bit at 10Gb/s, and the wavelength-stabilization circuit based on the monolithic PTAT sensor consumes 0.29mW.
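The thermal-stabilization loop lends itself to a compact illustration. The sketch below is a conceptual stand-in for the mixed-signal implementation: an integral controller trims an assumed heater from PTAT temperature readings, and the first-order plant, gains, and setpoint are all illustrative assumptions.

```python
def thermal_lock(read_ptat_celsius, set_heater_mw,
                 setpoint=55.0, ki=0.05, steps=1000):
    """Integral controller: nudge the thermal tuner until PTAT reads setpoint."""
    heater_mw = 0.0
    for _ in range(steps):
        error = setpoint - read_ptat_celsius()     # positive: ring is too cold
        heater_mw = max(0.0, heater_mw + ki * error)
        set_heater_mw(heater_mw)
    return heater_mw

class RingPlant:
    """Toy first-order thermal model standing in for ring + heater."""
    def __init__(self, ambient=48.0, k_heater=0.4, alpha=0.1):
        self.temp, self.ambient = ambient, ambient
        self.k, self.alpha, self.heater_mw = k_heater, alpha, 0.0
    def read_ptat_celsius(self):
        target = self.ambient + self.k * self.heater_mw
        self.temp += self.alpha * (target - self.temp)  # relax toward target
        return self.temp
    def set_heater_mw(self, mw):
        self.heater_mw = mw

plant = RingPlant()
thermal_lock(plant.read_ptat_celsius, plant.set_heater_mw)
print(round(plant.temp, 1))   # settles near the 55.0 C setpoint
```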
Lastly, a first-order frequency synthesizer suitable for high-speed on-chip clock generation will be discussed. The proposed design features an architecture combining an LC quadrature VCO, two sample-and-holds, a phase interpolator (PI), digital coarse-tuning, and rotational frequency detection for fine-tuning. In addition to an electrical reference clock, the prototype chip can, as an extra feature, receive a low-jitter optical reference clock generated by a high-repetition-rate mode-locked laser. The output clock at 8GHz has an integrated RMS jitter of 490fs, peak-to-peak periodic jitter of 2.06ps, and total RMS jitter of 680fs. The reference spurs are measured to be 64.3dB below the carrier. At 8GHz the system consumes 2.49mW from a 1V supply.
Abstract:
This dissertation presents detailed experimental and theoretical investigations of nonlinear and nonreciprocal effects in magnetic garnet films, and thus comprises two major sections. The first section concentrates on the study of a new class of nonlinear magneto-optic thin-film materials possessing strong higher-order magnetic susceptibility for nonlinear optical applications. The focus was on enhancing the nonlinear performance of ferrite garnet films through strain generation and compositional gradients in the sputter-deposition growth of these films. Under this project, several bismuth-substituted yttrium iron garnet (Bi,Y)3(Fe,Ga)5O12 (Bi:YIG) films were sputter-deposited over gadolinium gallium garnet (Gd3Ga5O12) substrates and characterized for their nonlinear optical response. One of the important findings of this work is that lattice-mismatch strain drives the second harmonic (SH) signal in the Bi:YIG films, in agreement with theoretical predictions, whereas micro-strain was found not to correlate significantly with the SH signal at the micro-strain levels present in these films. This study also elaborates on the role of the films' constitutive elements and their concentration gradients in the nonlinear response of the films. The ultrahigh sensitivity delivered by second harmonic generation provides an exciting new tool for studying magnetized surfaces and buried interfaces, making this work important from both a fundamental and an application point of view. The second part of the dissertation addresses an important technological need: the development of an on-chip optical isolator for use in photonic integrated circuits. It is based on two related novel effects, nonreciprocal and unidirectional optical Bloch oscillations (BOs), recently proposed and developed by Professor Miguel Levy and myself. This dissertation work has established a comprehensive theoretical background for the implementation of these effects in magneto-optic waveguide arrays. The model systems we developed consist of photonic lattices in the form of one-dimensional waveguide arrays in which an optical force, introduced through geometrical design, turns the beam sideways. Laterally displaced photons are periodically returned to a central guide by photonic crystal action. The effect leads to a novel oscillatory optical phenomenon that can be magnetically controlled and rendered unidirectional. An on-chip optical isolator was designed based on the unidirectionality of the magneto-optic Bloch oscillatory motion. The proposed device delivers an isolation ratio as high as 36 dB that remains above 30 dB over a 0.7 nm wavelength bandwidth at the telecommunication wavelength of 1.55 μm. Slight modifications in the isolator design allow one to achieve an even more impressive isolation ratio of ~55 dB, but at the expense of smaller bandwidth. Moreover, the device allows multifunctionality, such as optical switching with a simultaneous isolation function, well suited for photonic integrated circuits.
Abstract:
DNA, as a powerful building molecule, is widely used for the assembly of molecular structures and dynamic molecular devices with different potential applications, ranging from synthetic biology to diagnostics. The feature of sequence programmability, which makes it possible to predict how single-stranded DNA molecules fold and interact with one another, has allowed the development of spatiotemporally controlled nanostructures and the engineering of supramolecular devices. The first part of this thesis addresses the development of an integrated chemiluminescence (CL)-based lab-on-chip sensor for detection of the adenosine-5'-triphosphate (ATP) life biomarker in extra-terrestrial environments. Subsequently, we investigated whether it is possible to study the interaction and recognition between biomolecules and their targets while mimicking the intracellular environment in terms of crowding, confinement and compartmentalization. To this purpose, we developed a split G-quadruplex DNAzyme platform for the chemiluminescent and quantitative detection of antibodies, based on an antibody-induced co-localization proximity mechanism in which a split G-quadruplex DNAzyme is led to reassemble into the functional native G-quadruplex conformation as the effect of guided spatial nanoconfinement. The following part of this thesis aims at developing chemiluminescent nanoparticles for bioimaging and photodynamic therapy applications. In chapter 5, a realistic and accurate evaluation of the potential of electrochemistry and chemiluminescence (CL) for biosensor development (i.e., is it better to "measure an electron or a photon"?) is presented. In chapter 6, the emission anisotropy phenomenon for an emitting dipole bound to the interface between two media with different refractive indices is investigated for chemiluminescence detection.
Abstract:
The first topic analyzed in the thesis will be Neural Architecture Search (NAS). I will focus on two different tools that I developed: one to optimize the architecture of Temporal Convolutional Networks (TCNs), a recently emerged convolutional model for time-series processing, and one to optimize the data precision of tensors inside CNNs. The first NAS explicitly targets the optimization of the most peculiar architectural parameters of TCNs, namely dilation, receptive field, and the number of features in each layer; note that this is the first NAS that explicitly targets these networks. The second NAS instead focuses on finding the most efficient data format for a target CNN, at the granularity of the layer filter. Applying these two NASes in sequence allows an "application designer" to minimize the structure of the neural network employed, minimizing the number of operations or the memory usage of the network. After that, the second topic described is the optimization of neural network deployment on edge devices. Importantly, exploiting edge platforms' scarce resources is critical for efficient NN execution on MCUs. To do so, I will introduce DORY (Deployment Oriented to memoRY), an automatic tool to deploy CNNs on low-cost MCUs. DORY, in different steps, can automatically manage the different levels of memory inside the MCU, offload the computation workload (i.e., the different layers of a neural network) to dedicated hardware accelerators, and automatically generate ANSI C code that orchestrates off- and on-chip transfers with the computation phases. On top of this, I will introduce two optimized computation libraries that DORY can exploit to deploy TCNs and Transformers efficiently on edge devices. I conclude the thesis with two different applications on bio-signal analysis, i.e., heart-rate tracking and sEMG-based gesture recognition.
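To illustrate the architectural coupling the first NAS explores: for stride-1 dilated causal convolutions, the receptive field is RF = 1 + the sum over layers of (kernel - 1) * dilation, so searching over dilations is effectively searching over the temporal context the TCN sees. The helper and the candidate configurations below are illustrative, not taken from the tool.

```python
def tcn_receptive_field(kernel_sizes, dilations):
    """Receptive field of a stack of stride-1 dilated causal convolutions."""
    assert len(kernel_sizes) == len(dilations)
    return 1 + sum((k - 1) * d for k, d in zip(kernel_sizes, dilations))

# Doubling dilations: four kernel-3 layers already see 31 time steps...
print(tcn_receptive_field([3, 3, 3, 3], [1, 2, 4, 8]))   # 31
# ...while the same depth without dilation sees only 9.
print(tcn_receptive_field([3, 3, 3, 3], [1, 1, 1, 1]))   # 9
```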
Abstract:
At the intersection of biology, chemistry, and engineering, biosensors are a multidisciplinary innovation that provides a cost-effective alternative to traditional laboratory techniques. Due to their advantages, biosensors are used in medical diagnostics, environmental monitoring, food safety and many other fields. The first part of the thesis reviews the state of the art of paper-based immunosensors with bioluminescent (BL) and chemiluminescent (CL) detection. The use of biospecific assays combined with CL detection and paper-based technology offers an optimal approach to creating analytical tools for on-site applications, and we focus on the specific areas that require further attention to ensure a future practical implementation of these methods in routine analyses. The subsequent part of the thesis addresses the development of an autonomous lab-on-chip platform for performing chemiluminescence-based bioassays in the space environment, exploiting a CubeSat platform for astrobiological investigations. An origami-inspired microfluidic paper-based analytical device has been developed with the purpose of assessing its performance in space and evaluating its functionality and the resilience of the (bio)molecules when exposed to a radiation-rich environment. Subsequently, we designed a paper-based assay to detect traces of ovalbumin in food samples, creating a user-friendly immunosensing platform. To this purpose, we developed an origami device that exploits a competitive immunoassay coupled with chemiluminescence detection and magnetic microbeads used to immobilize ovalbumin on paper. Finally, with the aim of exploring the use of biomimetic materials, a hydrogel-based chemiluminescence biosensor for the detection of H2O2 and glucose was developed. A guanosine hydrogel was prepared and loaded with luminol and hemin, mimicking DNAzyme activity. Subsequently, the hydrogel was modified by incorporating glucose oxidase enzyme to enable glucose biosensing. The emitted photons were detected using a portable device equipped with a smartphone CMOS (complementary metal-oxide-semiconductor) camera for CL emission detection.
Abstract:
Photoplethysmography (PPG) sensors allow for noninvasive and comfortable heart-rate (HR) monitoring, suitable for compact wearable devices. However, PPG signals collected from such devices often suffer from corruption caused by motion artifacts. This is typically addressed by combining the PPG signal with acceleration measurements from an inertial sensor. Recently, different energy-efficient deep learning approaches for heart-rate estimation have been proposed. To test these new solutions, in this work we developed a highly wearable platform (42 mm x 48 mm x 1.2 mm) for PPG signal acquisition and processing, based on GAP9, a parallel ultra-low-power system-on-chip featuring a nine-core RISC-V compute cluster with a neural network accelerator and a single-core RISC-V controller. The hardware platform also integrates a complete commercial optical biosensing module and an ARM Cortex-M4 microcontroller unit (MCU) with Bluetooth Low Energy connectivity. To demonstrate the capabilities of the system, a deep learning-based approach for PPG-based HR estimation has been deployed. Thanks to the reduced power consumption of the digital computational platform, the total power budget is just 2.67 mW, providing up to 5 days of operation on a 105 mAh battery.
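The quoted battery life follows from the power budget; in the sanity check below, the nominal cell voltage and usable-capacity fraction are our assumptions, not figures from the work.

```python
capacity_mah = 105      # battery capacity quoted above
p_total_mw = 2.67       # total power budget quoted above
v_nom = 3.7             # assumed Li-Po nominal voltage [V]
usable = 0.85           # assumed usable fraction of rated capacity

hours = capacity_mah * v_nom * usable / p_total_mw   # mWh / mW
print(f"{hours:.0f} h = {hours / 24:.1f} days")      # ~124 h = ~5.2 days
```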
Abstract:
This master's thesis investigates different aspects of the Dual-Active-Bridge (DAB) converter and extends them to Multi-Active Bridges (MAB). The thesis starts with an overview of the applications of the DAB and MAB and their importance. The analytical part of the thesis includes the derivation of the peak and RMS currents, which are required for finding the losses present in the system. The power converters considered in this thesis are the DAB, the Triple-Active Bridge (TAB) and the Quad-Active Bridge (QAB). All theoretical calculations are compared with simulation results from the PLECS software to verify the correctness of the reviewed and developed theory. A Hardware-in-the-Loop (HIL) simulation is conducted to check the control operation in real time with the help of the RT Box from Plexim. Additionally, since in real systems a digital signal processor (DSP), system-on-chip or field-programmable gate array is employed for the control of power electronic systems, the execution of the control in the real-time simulation (RTS) is performed by a DSP.
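For context, single-phase-shift DAB analysis typically starts from the standard power-transfer relation P = n*V1*V2*phi*(pi - |phi|) / (2*pi^2*f*L), from which the peak and RMS current expressions are then derived. The sketch below evaluates this relation at an assumed design point; all component values are ours, not the thesis's.

```python
import math

def dab_power(v1, v2, n, f, l_leak, phi):
    """Transferred power [W] for phase shift phi [rad], |phi| <= pi/2."""
    return n * v1 * v2 * phi * (math.pi - abs(phi)) / (2 * math.pi**2 * f * l_leak)

# Assumed design point: 400 V to 48 V, 8:1 transformer, 100 kHz, 60 uH leakage.
for deg in (15, 30, 45, 90):
    phi = math.radians(deg)
    print(f"{deg:2d} deg -> {dab_power(400, 48, 8, 100e3, 60e-6, phi):5.0f} W")
```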
Abstract:
LLF (Least Laxity First) scheduling, which assigns a higher priority to a task with smaller laxity, is known to be an optimal preemptive scheduling algorithm on a single-processor platform. However, little work has been done to illuminate its characteristics on multiprocessor platforms. In this paper, we identify the dynamics of laxity from the system's viewpoint and translate these dynamics into an LLF multiprocessor schedulability analysis. More specifically, we first characterize laxity properties under LLF scheduling, focusing on the laxity dynamics associated with a deadline miss. These laxity dynamics describe a lower bound, leading to the deadline miss, on the number of tasks with certain laxity values at certain time instants. This lower bound is significant because it represents an invariant over highly dynamic system parameters (laxity values). Since the laxity of a task depends on the amount of interference from higher-priority tasks, we can then derive a set of conditions to check whether a given task system can enter the laxity dynamics that lead towards a deadline miss. This way, to the authors' best knowledge, we propose the first LLF multiprocessor schedulability test based on its own laxity properties. We also develop an improved schedulability test that exploits slack values. We mathematically prove that the proposed LLF tests dominate the state-of-the-art EDZL tests. We also present simulation results to quantitatively evaluate the schedulability performance of both the original and improved LLF tests.
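The laxity notion at the core of the analysis is simple to state: laxity(t) = deadline - t - remaining execution time, and global LLF always runs the m least-lax ready tasks on the m processors. The discrete-time toy scheduler below illustrates the rule and how a negative laxity certifies an unavoidable deadline miss; it is an illustration only, not the paper's schedulability test.

```python
def llf_schedule(jobs, m, horizon):
    """jobs: dicts with 'rem' (remaining work) and 'dl' (absolute deadline)."""
    for t in range(horizon):
        ready = [j for j in jobs if j['rem'] > 0]
        for j in ready:
            if j['dl'] - t - j['rem'] < 0:               # negative laxity
                return f"deadline miss inevitable at t={t}"
        ready.sort(key=lambda j: j['dl'] - t - j['rem'])  # least laxity first
        for j in ready[:m]:                               # m processors, 1 unit each
            j['rem'] -= 1
    return "all jobs met their deadlines"

jobs = [{'rem': 2, 'dl': 3}, {'rem': 2, 'dl': 3}, {'rem': 3, 'dl': 6}]
print(llf_schedule(jobs, m=2, horizon=6))
```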
Abstract:
Graphics processors were originally developed for rendering graphics but have recently evolved into an architecture for general-purpose computation. They are also expected to become important parts of embedded systems hardware, and not just for graphics. However, this necessitates the development of appropriate timing analysis techniques, because techniques developed for CPU scheduling are not applicable: we are not interested in how long any given GPU thread takes to complete, but rather in how long it takes for all of them to complete. We therefore develop a simple method for finding an upper bound on the makespan of a group of GPU threads executing the same program and competing for the resources of a single streaming multiprocessor (whose architecture is based on NVIDIA Fermi, with some simplifying assumptions). We then build upon this method to formulate the derivation of the exact worst-case makespan (and the corresponding schedule) as an optimization problem. Addressing the issue of tractability, we also present a technique for efficiently computing a safe estimate of the worst-case makespan with minimal pessimism, which may be used when finding the exact value would take too long.
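To give the flavour of such a bound (a coarse illustration with assumed parameters, not the paper's actual analysis): if a streaming multiprocessor keeps at most w warps resident and each warp needs at most c execution cycles plus at most b cycles blocked on shared resources, charging every batch with worst-case interference yields a safe, if pessimistic, makespan bound.

```python
import math

def makespan_upper_bound(n_warps, w_resident, c_exec, b_block):
    """Safe bound: serialise batches of resident warps at worst-case cost each."""
    rounds = math.ceil(n_warps / w_resident)
    return rounds * (c_exec + b_block)

print(makespan_upper_bound(n_warps=48, w_resident=16, c_exec=1200, b_block=300))
# 3 batches * 1500 cycles = 4500 cycles
```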
Abstract:
Known algorithms capable of scheduling implicit-deadline sporadic tasks over identical processors at up to 100% utilisation invariably involve numerous preemptions and migrations. To the challenge of devising a scheduling scheme with as few preemptions and migrations as possible, for a given guaranteed utilisation bound, we respond with the algorithm NPS-F. It is configurable with a parameter that trades off guaranteed schedulable utilisation (up to 100%) against preemptions. For any possible configuration, NPS-F introduces fewer preemptions than any other known algorithm matching its utilisation bound. A clustered variant of the algorithm, for systems made of multicore chips, eliminates costly off-chip task migrations by dividing processors into disjoint clusters formed by cores on the same chip (with the cluster size being a parameter). Clusters are independently scheduled (each using non-clustered NPS-F). The utilisation bound is only moderately affected. We also formulate an important extension (applicable to both clustered and non-clustered NPS-F) which optimises the supply of processing time to executing tasks and makes it more granular. This reduces the processing capacity requirements for schedulability without increasing preemptions.
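The first step of an NPS-F-style scheme can be sketched compactly: tasks are packed first-fit into "notional processors" (servers), which are then time-multiplexed onto the physical cores. NPS-F additionally inflates each server's processing reserve as a function of its configuration parameter; that inflation, and the mapping onto physical processors, are omitted from the illustrative snippet below.

```python
def pack_first_fit(utilisations):
    """Assign task utilisations to notional processors, first-fit."""
    bins = []                       # each bin is one notional processor
    for u in utilisations:
        for b in bins:
            if sum(b) + u <= 1.0:   # task fits in this notional processor
                b.append(u)
                break
        else:
            bins.append([u])
    return bins

tasks = [0.6, 0.5, 0.4, 0.3, 0.2]
print(pack_first_fit(tasks))   # [[0.6, 0.4], [0.5, 0.3, 0.2]]
```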