10 resultados para Computer Engineering|Electrical engineering
em DRUM (Digital Repository at the University of Maryland)
Resumo:
An abstract of this work will be presented at the Compiler, Architecture and Tools Conference (CATC), Intel Development Center, Haifa, Israel November 23, 2015.
Resumo:
Heterogeneous computing systems have become common in modern processor architectures. These systems, such as those released by AMD, Intel, and Nvidia, include both CPU and GPU cores on a single die available with reduced communication overhead compared to their discrete predecessors. Currently, discrete CPU/GPU systems are limited, requiring larger, regular, highly-parallel workloads to overcome the communication costs of the system. Without the traditional communication delay assumed between GPUs and CPUs, we believe non-traditional workloads could be targeted for GPU execution. Specifically, this thesis focuses on the execution model of nested parallel workloads on heterogeneous systems. We have designed a simulation flow which utilizes widely used CPU and GPU simulators to model heterogeneous computing architectures. We then applied this simulator to non-traditional GPU workloads using different execution models. We also have proposed a new execution model for nested parallelism allowing users to exploit these heterogeneous systems to reduce execution time.
Resumo:
Due to increasing integration density and operating frequency of today's high performance processors, the temperature of a typical chip can easily exceed 100 degrees Celsius. However, the runtime thermal state of a chip is very hard to predict and manage due to the random nature in computing workloads, as well as the process, voltage and ambient temperature variability (together called PVT variability). The uneven nature (both in time and space) of the heat dissipation of the chip could lead to severe reliability issues and error-prone chip behavior (e.g. timing errors). Many dynamic power/thermal management techniques have been proposed to address this issue such as dynamic voltage and frequency scaling (DVFS), clock gating and etc. However, most of such techniques require accurate knowledge of the runtime thermal state of the chip to make efficient and effective control decisions. In this work we address the problem of tracking and managing the temperature of microprocessors which include the following sub-problems: (1) how to design an efficient sensor-based thermal tracking system on a given design that could provide accurate real-time temperature feedback; (2) what statistical techniques could be used to estimate the full-chip thermal profile based on very limited (and possibly noise-corrupted) sensor observations; (3) how do we adapt to changes in the underlying system's behavior, since such changes could impact the accuracy of our thermal estimation. The thermal tracking methodology proposed in this work is enabled by on-chip sensors which are already implemented in many modern processors. We first investigate the underlying relationship between heat distribution and power consumption, then we introduce an accurate thermal model for the chip system. Based on this model, we characterize the temperature correlation that exists among different chip modules and explore statistical approaches (such as those based on Kalman filter) that could utilize such correlation to estimate the accurate chip-level thermal profiles in real time. Such estimation is performed based on limited sensor information because sensors are usually resource constrained and noise-corrupted. We also took a further step to extend the standard Kalman filter approach to account for (1) nonlinear effects such as leakage-temperature interdependency and (2) varying statistical characteristics in the underlying system model. The proposed thermal tracking infrastructure and estimation algorithms could consistently generate accurate thermal estimates even when the system is switching among workloads that have very distinct characteristics. Through experiments, our approaches have demonstrated promising results with much higher accuracy compared to existing approaches. Such results can be used to ensure thermal reliability and improve the effectiveness of dynamic thermal management techniques.
Resumo:
Malware is a foundational component of cyber crime that enables an attacker to modify the normal operation of a computer or access sensitive, digital information. Despite the extensive research performed to identify such programs, existing schemes fail to detect evasive malware, an increasingly popular class of malware that can alter its behavior at run-time, making it difficult to detect using today’s state of the art malware analysis systems. In this thesis, we present DVasion, a comprehensive strategy that exposes such evasive behavior through a multi-execution technique. DVasion successfully detects behavior that would have been missed by traditional, single-execution approaches, while addressing the limitations of previously proposed multi-execution systems. We demonstrate the accuracy of our system through strong parallels with existing work on evasive malware, as well as uncover the hidden behavior within 167 of 1,000 samples.
Resumo:
A poster of this paper will be presented at the 25th International Conference on Parallel Architecture and Compilation Technology (PACT ’16), September 11-15, 2016, Haifa, Israel.
Resumo:
The performance, energy efficiency and cost improvements due to traditional technology scaling have begun to slow down and present diminishing returns. Underlying reasons for this trend include fundamental physical limits of transistor scaling, the growing significance of quantum effects as transistors shrink, and a growing mismatch between transistors and interconnects regarding size, speed and power. Continued Moore's Law scaling will not come from technology scaling alone, and must involve improvements to design tools and development of new disruptive technologies such as 3D integration. 3D integration presents potential improvements to interconnect power and delay by translating the routing problem into a third dimension, and facilitates transistor density scaling independent of technology node. Furthermore, 3D IC technology opens up a new architectural design space of heterogeneously-integrated high-bandwidth CPUs. Vertical integration promises to provide the CPU architectures of the future by integrating high performance processors with on-chip high-bandwidth memory systems and highly connected network-on-chip structures. Such techniques can overcome the well-known CPU performance bottlenecks referred to as memory and communication wall. However the promising improvements to performance and energy efficiency offered by 3D CPUs does not come without cost, both in the financial investments to develop the technology, and the increased complexity of design. Two main limitations to 3D IC technology have been heat removal and TSV reliability. Transistor stacking creates increases in power density, current density and thermal resistance in air cooled packages. Furthermore the technology introduces vertical through silicon vias (TSVs) that create new points of failure in the chip and require development of new BEOL technologies. Although these issues can be controlled to some extent using thermal-reliability aware physical and architectural 3D design techniques, high performance embedded cooling schemes, such as micro-fluidic (MF) cooling, are fundamentally necessary to unlock the true potential of 3D ICs. A new paradigm is being put forth which integrates the computational, electrical, physical, thermal and reliability views of a system. The unification of these diverse aspects of integrated circuits is called Co-Design. Independent design and optimization of each aspect leads to sub-optimal designs due to a lack of understanding of cross-domain interactions and their impacts on the feasibility region of the architectural design space. Co-Design enables optimization across layers with a multi-domain view and thus unlocks new high-performance and energy efficient configurations. Although the co-design paradigm is becoming increasingly necessary in all fields of IC design, it is even more critical in 3D ICs where, as we show, the inter-layer coupling and higher degree of connectivity between components exacerbates the interdependence between architectural parameters, physical design parameters and the multitude of metrics of interest to the designer (i.e. power, performance, temperature and reliability). In this dissertation we present a framework for multi-domain co-simulation and co-optimization of 3D CPU architectures with both air and MF cooling solutions. Finally we propose an approach for design space exploration and modeling within the new Co-Design paradigm, and discuss the possible avenues for improvement of this work in the future.
Resumo:
As the semiconductor industry struggles to maintain its momentum down the path following the Moore's Law, three dimensional integrated circuit (3D IC) technology has emerged as a promising solution to achieve higher integration density, better performance, and lower power consumption. However, despite its significant improvement in electrical performance, 3D IC presents several serious physical design challenges. In this dissertation, we investigate physical design methodologies for 3D ICs with primary focus on two areas: low power 3D clock tree design, and reliability degradation modeling and management. Clock trees are essential parts for digital system which dissipate a large amount of power due to high capacitive loads. The majority of existing 3D clock tree designs focus on minimizing the total wire length, which produces sub-optimal results for power optimization. In this dissertation, we formulate a 3D clock tree design flow which directly optimizes for clock power. Besides, we also investigate the design methodology for clock gating a 3D clock tree, which uses shutdown gates to selectively turn off unnecessary clock activities. Different from the common assumption in 2D ICs that shutdown gates are cheap thus can be applied at every clock node, shutdown gates in 3D ICs introduce additional control TSVs, which compete with clock TSVs for placement resources. We explore the design methodologies to produce the optimal allocation and placement for clock and control TSVs so that the clock power is minimized. We show that the proposed synthesis flow saves significant clock power while accounting for available TSV placement area. Vertical integration also brings new reliability challenges including TSV's electromigration (EM) and several other reliability loss mechanisms caused by TSV-induced stress. These reliability loss models involve complex inter-dependencies between electrical and thermal conditions, which have not been investigated in the past. In this dissertation we set up an electrical/thermal/reliability co-simulation framework to capture the transient of reliability loss in 3D ICs. We further derive and validate an analytical reliability objective function that can be integrated into the 3D placement design flow. The reliability aware placement scheme enables co-design and co-optimization of both the electrical and reliability property, thus improves both the circuit's performance and its lifetime. Our electrical/reliability co-design scheme avoids unnecessary design cycles or application of ad-hoc fixes that lead to sub-optimal performance. Vertical integration also enables stacking DRAM on top of CPU, providing high bandwidth and short latency. However, non-uniform voltage fluctuation and local thermal hotspot in CPU layers are coupled into DRAM layers, causing a non-uniform bit-cell leakage (thereby bit flip) distribution. We propose a performance-power-resilience simulation framework to capture DRAM soft error in 3D multi-core CPU systems. In addition, a dynamic resilience management (DRM) scheme is investigated, which adaptively tunes CPU's operating points to adjust DRAM's voltage noise and thermal condition during runtime. The DRM uses dynamic frequency scaling to achieve a resilience borrow-in strategy, which effectively enhances DRAM's resilience without sacrificing performance. The proposed physical design methodologies should act as important building blocks for 3D ICs and push 3D ICs toward mainstream acceptance in the near future.
Resumo:
Strawberries harvested for processing as frozen fruits are currently de-calyxed manually in the field. This process requires the removal of the stem cap with green leaves (i.e. the calyx) and incurs many disadvantages when performed by hand. Not only does it necessitate the need to maintain cutting tool sanitation, but it also increases labor time and exposure of the de-capped strawberries before in-plant processing. This leads to labor inefficiency and decreased harvest yield. By moving the calyx removal process from the fields to the processing plants, this new practice would reduce field labor and improve management and logistics, while increasing annual yield. As labor prices continue to increase, the strawberry industry has shown great interest in the development and implementation of an automated calyx removal system. In response, this dissertation describes the design, operation, and performance of a full-scale automatic vision-guided intelligent de-calyxing (AVID) prototype machine. The AVID machine utilizes commercially available equipment to produce a relatively low cost automated de-calyxing system that can be retrofitted into existing food processing facilities. This dissertation is broken up into five sections. The first two sections include a machine overview and a 12-week processing plant pilot study. Results of the pilot study indicate the AVID machine is able to de-calyx grade-1-with-cap conical strawberries at roughly 66 percent output weight yield at a throughput of 10,000 pounds per hour. The remaining three sections describe in detail the three main components of the machine: a strawberry loading and orientation conveyor, a machine vision system for calyx identification, and a synchronized multi-waterjet knife calyx removal system. In short, the loading system utilizes rotational energy to orient conical strawberries. The machine vision system determines cut locations through RGB real-time feature extraction. The high-speed multi-waterjet knife system uses direct drive actuation to locate 30,000 psi cutting streams to precise coordinates for calyx removal. Based on the observations and studies performed within this dissertation, the AVID machine is seen to be a viable option for automated high-throughput strawberry calyx removal. A summary of future tasks and further improvements is discussed at the end.
Resumo:
The thesis aims to exploit properties of thin films for applications such as spintronics, UV detection and gas sensing. Nanoscale thin films devices have myriad advantages and compatibility with Si-based integrated circuits processes. Two distinct classes of material systems are investigated, namely ferromagnetic thin films and semiconductor oxides. To aid the designing of devices, the surface properties of the thin films were investigated by using electron and photon characterization techniques including Auger electron spectroscopy (AES), X-ray photoelectron spectroscopy (XPS), grazing incidence X-ray diffraction (GIXRD), and energy-dispersive X-ray spectroscopy (EDS). These are complemented by nanometer resolved local proximal probes such as atomic force microscopy (AFM), magnetic force microscopy (MFM), electric force microscopy (EFM), and scanning tunneling microscopy to elucidate the interplay between stoichiometry, morphology, chemical states, crystallization, magnetism, optical transparency, and electronic properties. Specifically, I studied the effect of annealing on the surface stoichiometry of the CoFeB/Cu system by in-situ AES and discovered that magnetic nanoparticles with controllable areal density can be produced. This is a good alternative for producing nanoparticles using a maskless process. Additionally, I studied the behavior of magnetic domain walls of the low coercivity alloy CoFeB patterned nanowires. MFM measurement with the in-plane magnetic field showed that, compared to their permalloy counterparts, CoFeB nanowires require a much smaller magnetization switching field , making them promising for low-power-consumption domain wall motion based devices. With oxides, I studied CuO nanoparticles on SnO2 based UV photodetectors (PDs), and discovered that they promote the responsivity by facilitating charge transfer with the formed nanoheterojunctions. I also demonstrated UV PDs with spectrally tunable photoresponse with the bandgap engineered ZnMgO. The bandgap of the alloyed ZnMgO thin films was tailored by varying the Mg contents and AES was demonstrated as a surface scientific approach to assess the alloying of ZnMgO. With gas sensors, I discovered the rf-sputtered anatase-TiO2 thin films for a selective and sensitive NO2 detection at room temperature, under UV illumination. The implementation of UV enhances the responsivity, response and recovery rate of the TiO2 sensor towards NO2 significantly. Evident from the high resolution XPS and AFM studies, the surface contamination and morphology of the thin films degrade the gas sensing response. I also demonstrated that surface additive metal nanoparticles on thin films can improve the response and the selectivity of oxide based sensors. I employed nanometer-scale scanning probe microscopy to study a novel gas senor scheme consisting of gallium nitride (GaN) nanowires with functionalizing oxides layer. The results suggested that AFM together with EFM is capable of discriminating low-conductive materials at the nanoscale, providing a nondestructive method to quantitatively relate sensing response to the surface morphology.
Resumo:
Nanostructures are highly attractive for future electrical energy storage devices because they enable large surface area and short ion transport time through thin electrode layers for high power devices. Significant enhancement in power density of batteries has been achieved by nano-engineered structures, particularly anode and cathode nanostructures spatially separated far apart by a porous membrane and/or a defined electrolyte region. A self-aligned nanostructured battery fully confined within a single nanopore presents a powerful platform to determine the rate performance and cyclability limits of nanostructured storage devices. Atomic layer deposition (ALD) has enabled us to create and evaluate such structures, comprised of nanotubular electrodes and electrolyte confined within anodic aluminum oxide (AAO) nanopores. The V2O5- V2O5 symmetric nanopore battery displays exceptional power-energy performance and cyclability when tested as a massively parallel device (~2billion/cm2), each with ~1m3 volume (~1fL). Cycled between 0.2V and 1.8V, this full cell has capacity retention of 95% at 5C rate and 46% at 150C, with more than 1000 charge/discharge cycles. These results demonstrate the promise of ultrasmall, self-aligned/regular, densely packed nanobattery structures as a testbed to study ionics and electrodics at the nanoscale with various geometrical modifications and as a building block for high performance energy storage systems[1, 2]. Further increase of full cell output potential is also demonstrated in asymmetric full cell configurations with various low voltage anode materials. The asymmetric full cell nanopore batteries, comprised of V2O5 as cathode and prelithiated SnO2 or anatase phase TiO2 as anode, with integrated nanotubular metal current collectors underneath each nanotubular storage electrode, also enabled by ALD. By controlling the amount of lithium ion prelithiated into SnO2 anode, we can tune full cell output voltage in the range of 0.3V and 3V. This asymmetric nanopore battery array displays exceptional rate performance and cyclability. When cycled between 1V and 3V, it has capacity retention of approximately 73% at 200C rate compared to 1C, with only 2% capacity loss after more than 500 charge/discharge cycles. With increased full cell output potential, the asymmetric V2O5-SnO2 nanopore battery shows significantly improved energy and power density. This configuration presents a more realistic test - through its asymmetric (vs symmetric) configuration – of performance and cyclability in nanoconfined environment. This dissertation covers (1) Ultra small electrochemical storage platform design and fabrication, (2) Electron and ion transport in nanostructured electrodes inside a half cell configuration, (3) Ion transport between anode and cathode in confined nanochannels in symmetric full cells, (4) Scale up energy and power density with geometry optimization and low voltage anode materials in asymmetric full cell configurations. As a supplement, selective growth of ALD to improve graphene conductance will also be discussed[3]. References: 1. Liu, C., et al., (Invited) A Rational Design for Batteries at Nanoscale by Atomic Layer Deposition. ECS Transactions, 2015. 69(7): p. 23-30. 2. Liu, C.Y., et al., An all-in-one nanopore battery array. Nature Nanotechnology, 2014. 9(12): p. 1031-1039. 3. Liu, C., et al., Improving Graphene Conductivity through Selective Atomic Layer Deposition. ECS Transactions, 2015. 69(7): p. 133-138.