990 resultados para Compute unified device architectures
Resumo:
Organic solar cells show great promise as an economically and environmentally friendly technology to utilize solar energy because of their simple fabrication processes and minimal material usage. However, new innovations and breakthroughs are needed for organic solar cell technology to become competitive in the future. This article reviews research efforts and accomplishments focusing on three issues: power conversion efficiency, device stability and processability for mass production, followed by an outlook for optimizing OSC performance through device engineering and new architecture designs to realize next generation organic solar cells.
Resumo:
We propose a unified model for large signal and small signal non-quasi-static analysis of long channel symmetric double gate MOSFET. The model is physics based and relies only on the very basic approximation needed for a charge-based model. It is based on the EKV formalism Enz C, Vittoz EA. Charge based MOS transistor modeling. Wiley; 2006] and is valid in all regions of operation and thus suitable for RF circuit design. Proposed model is verified with professional numerical device simulator and excellent agreement is found. (C) 2010 Elsevier Ltd. All rights reserved.
Resumo:
FACTS controllers are emerging as viable and economic solutions to the problems of large interconnected ne networks, which can endanger the system security. These devices are characterized by their fast response, absence of inertia, and minimum maintenance requirements. Thyristor controlled equipment like Thyristor Controlled Series Capacitor (TCSC), Static Var Compensator (SVC), Thyristor Controlled Phase angle Regulator (TCPR) etc. which involve passive elements result in devices of large sizes with substantial cost and significant labour for installation. An all solid-state device using GTOs leads to reduction in equipment size and has improved performance. The Unified Power Flow Controller (UPFC) is a versatile controller which can be used to control the active and reactive power in the Line independently. The concept of UPFC makes it possible to handle practically all power flow control and transmission line compensation problems, using solid-state controllers, which provide functional flexibility, generally not attainable by conventional thyristor controlled systems. In this paper, we present the development of a control scheme for the series injected voltage of the UPFC to damp the power oscillations and improve transient stability in a power system. (C) 1998 Elsevier Science Ltd. All rights reserved.
Resumo:
Frequent accesses to the register file make it one of the major sources of energy consumption in ILP architectures. The large number of functional units connected to a large unified register file in VLIW architectures make power dissipation in the register file even worse because of the need for a large number of ports. High power dissipation in a relatively smaller area occupied by a register file leads to a high power density in the register file and makes it one of the prime hot-spots. This makes it highly susceptible to the possibility of a catastrophic heatstroke. This in turn impacts the performance and cost because of the need for periodic cool down and sophisticated packaging and cooling techniques respectively. Clustered VLIW architectures partition the register file among clusters of functional units and reduce the number of ports required thereby reducing the power dissipation. However, we observe that the aggregate accesses to register files in clustered VLIW architectures (and associated energy consumption) become very high compared to the centralized VLIW architectures and this can be attributed to a large number of explicit inter-cluster communications. Snooping based clustered VLIW architectures provide very limited but very fast way of inter-cluster communication by allowing some of the functional units to directly read some of the operands from the register file of some of the other clusters. In this paper, we propose instruction scheduling algorithms that exploit the limited snooping capability to reduce the register file energy consumption on an average by 12% and 18% and improve the overall performance by 5% and 11% for a 2-clustered and a 4-clustered machine respectively, over an earlier state-of-the-art clustered scheduling algorithm when evaluated in the context of snooping based clustered VLIW architectures.
Resumo:
Programming for parallel architectures that do not have a shared address space is extremely difficult due to the need for explicit communication between memories of different compute devices. A heterogeneous system with CPUs and multiple GPUs, or a distributed-memory cluster are examples of such systems. Past works that try to automate data movement for distributed-memory architectures can lead to excessive redundant communication. In this paper, we propose an automatic data movement scheme that minimizes the volume of communication between compute devices in heterogeneous and distributed-memory systems. We show that by partitioning data dependences in a particular non-trivial way, one can generate data movement code that results in the minimum volume for a vast majority of cases. The techniques are applicable to any sequence of affine loop nests and works on top of any choice of loop transformations, parallelization, and computation placement. The data movement code generated minimizes the volume of communication for a particular configuration of these. We use a combination of powerful static analyses relying on the polyhedral compiler framework and lightweight runtime routines they generate, to build a source-to-source transformation tool that automatically generates communication code. We demonstrate that the tool is scalable and leads to substantial gains in efficiency. On a heterogeneous system, the communication volume is reduced by a factor of 11X to 83X over state-of-the-art, translating into a mean execution time speedup of 1.53X. On a distributed-memory cluster, our scheme reduces the communication volume by a factor of 1.4X to 63.5X over state-of-the-art, resulting in a mean speedup of 1.55X. In addition, our scheme yields a mean speedup of 2.19X over hand-optimized UPC codes.
Resumo:
Coarse Grained Reconfigurable Architectures (CGRA) are emerging as embedded application processing units in computing platforms for Exascale computing. Such CGRAs are distributed memory multi- core compute elements on a chip that communicate over a Network-on-chip (NoC). Numerical Linear Algebra (NLA) kernels are key to several high performance computing applications. In this paper we propose a systematic methodology to obtain the specification of Compute Elements (CE) for such CGRAs. We analyze block Matrix Multiplication and block LU Decomposition algorithms in the context of a CGRA, and obtain theoretical bounds on communication requirements, and memory sizes for a CE. Support for high performance custom computations common to NLA kernels are met through custom function units (CFUs) in the CEs. We present results to justify the merits of such CFUs.
Resumo:
The sun has the potential to power the Earth's total energy needs, but electricity from solar power still constitutes an extremely small fraction of our power generation because of its high cost relative to traditional energy sources. Therefore, the cost of solar must be reduced to realize a more sustainable future. This can be achieved by significantly increasing the efficiency of modules that convert solar radiation to electricity. In this thesis, we consider several strategies to improve the device and photonic design of solar modules to achieve record, ultrahigh (> 50%) solar module efficiencies. First, we investigate the potential of a new passivation treatment, trioctylphosphine sulfide, to increase the performance of small GaAs solar cells for cheaper and more durable modules. We show that small cells (mm2), which currently have a significant efficiency decrease (~ 5%) compared to larger cells (cm2) because small cells have a higher fraction of recombination-active surface from the sidewalls, can achieve significantly higher efficiencies with effective passivation of the sidewalls. We experimentally validate the passivation qualities of treatment by trioctylphosphine sulfide (TOP:S) through four independent studies and show that this facile treatment can enable efficient small devices. Then, we discuss our efforts toward the design and prototyping of a spectrum-splitting module that employs optical elements to divide the incident spectrum into different color bands, which allows for higher efficiencies than traditional methods. We present a design, the polyhedral specular reflector, that has the potential for > 50% module efficiencies even with realistic losses from combined optics, cell, and electrical models. Prototyping efforts of one of these designs using glass concentrators yields an optical module whose combined spectrum-splitting and concentration should correspond to a record module efficiency of 42%. Finally, we consider how the manipulation of radiatively emitted photons from subcells in multijunction architectures can be used to achieve even higher efficiencies than previously thought, inspiring both optimization of incident and radiatively emitted photons for future high efficiency designs. In this thesis work, we explore novel device and photonic designs that represent a significant departure from current solar cell manufacturing techniques and ultimately show the potential for much higher solar cell efficiencies.
Resumo:
This paper describes a unified approach to modelling the polysilicon thin film transistor (TFT) for the purposes of circuit design. The approach uses accurate methods of predicting the channel conductance and then fitting the resulting data with a polynomial. Two methods are proposed to find the channel conductance: a device model and measurement. The approach is suitable because the TFT does not have a well defined threshold voltage. The polynomial conductance is then integrated generally to find the drain current and channel charge, necessary for a complete circuit model. © 1991 The Japan Society of Applied Physics.
Resumo:
We present a simple and semi-physical analytical description of the current-voltage characteristics of amorphous oxide semiconductor thin-film transistors in the above-threshold and sub-threshold regions. Both regions are described by single unified expression that employs the same set of model parameter values directly extracted from measured terminal characteristics. The model accurately reproduces measured characteristics of amorphous semiconductor thin film transistors in general, yielding a scatter of < 4%. © 1980-2012 IEEE.
Resumo:
In this theoretical paper, the analysis of the effect that ON-state active-device resistance has on the performance of a Class-E tuned power amplifier using a shunt inductor topology is presented. The work is focused on the relatively unexplored area of design facilitation of Class-E tuned amplifiers where intrinsically low-output-capacitance monolithic microwave integrated circuit switching devices such as pseudomorphic high electron mobility transistors are used. In the paper, the switching voltage and current waveforms in the presence of ON-resistance are analyzed in order to provide insight into circuit properties such as RF output power, drain efficiency, and power-output capability. For a given amplifier specification, a design procedure is illustrated whereby it is possible to compute optimal circuit component values which account for prescribed switch resistance loss. Furthermore, insight into how ON-resistance affects transistor selection in terms of peak switch voltage and current requirements is described. Finally, a design example is given in order to validate the theoretical analysis against numerical simulation.
Resumo:
Task dataflow languages simplify the specification of parallel programs by dynamically detecting and enforcing dependencies between tasks. These languages are, however, often restricted to a single level of parallelism. This language design is reflected in the runtime system, where a master thread explicitly generates a task graph and worker threads execute ready tasks and wake-up their dependents. Such an approach is incompatible with state-of-the-art schedulers such as the Cilk scheduler, that minimize the creation of idle tasks (work-first principle) and place all task creation and scheduling off the critical path. This paper proposes an extension to the Cilk scheduler in order to reconcile task dependencies with the work-first principle. We discuss the impact of task dependencies on the properties of the Cilk scheduler. Furthermore, we propose a low-overhead ticket-based technique for dependency tracking and enforcement at the object level. Our scheduler also supports renaming of objects in order to increase task-level parallelism. Renaming is implemented using versioned objects, a new type of hyper object. Experimental evaluation shows that the unified scheduler is as efficient as the Cilk scheduler when tasks have no dependencies. Moreover, the unified scheduler is more efficient than SMPSS, a particular implementation of a task dataflow language.
Resumo:
Passive person detection and localization is an emerging area in UWB localization systems, whereby people are not required to carry any UWB ranging device. Based on experimental data, we propose a novel method to detect static persons in the absence of template waveforms, and to compute distances to these persons. Our method makes very little assumptions on the environment and can achieve ranging performances on the order of 50 cm, using off-the-shelf UWB devices. © 2013 IEEE.
Resumo:
General-purpose computing devices allow us to (1) customize computation after fabrication and (2) conserve area by reusing expensive active circuitry for different functions in time. We define RP-space, a restricted domain of the general-purpose architectural space focussed on reconfigurable computing architectures. Two dominant features differentiate reconfigurable from special-purpose architectures and account for most of the area overhead associated with RP devices: (1) instructions which tell the device how to behave, and (2) flexible interconnect which supports task dependent dataflow between operations. We can characterize RP-space by the allocation and structure of these resources and compare the efficiencies of architectural points across broad application characteristics. Conventional FPGAs fall at one extreme end of this space and their efficiency ranges over two orders of magnitude across the space of application characteristics. Understanding RP-space and its consequences allows us to pick the best architecture for a task and to search for more robust design points in the space. Our DPGA, a fine- grained computing device which adds small, on-chip instruction memories to FPGAs is one such design point. For typical logic applications and finite- state machines, a DPGA can implement tasks in one-third the area of a traditional FPGA. TSFPGA, a variant of the DPGA which focuses on heavily time-switched interconnect, achieves circuit densities close to the DPGA, while reducing typical physical mapping times from hours to seconds. Rigid, fabrication-time organization of instruction resources significantly narrows the range of efficiency for conventional architectures. To avoid this performance brittleness, we developed MATRIX, the first architecture to defer the binding of instruction resources until run-time, allowing the application to organize resources according to its needs. Our focus MATRIX design point is based on an array of 8-bit ALU and register-file building blocks interconnected via a byte-wide network. With today's silicon, a single chip MATRIX array can deliver over 10 Gop/s (8-bit ops). On sample image processing tasks, we show that MATRIX yields 10-20x the computational density of conventional processors. Understanding the cost structure of RP-space helps us identify these intermediate architectural points and may provide useful insight more broadly in guiding our continual search for robust and efficient general-purpose computing structures.
Resumo:
Performance and manufacturability are two important issues that must be taken into account during MEMS design. Existing MEMS design models or systems follow a process-driven design paradigm, that is, design starts from the specification of process sequence or the customization of foundry-ready process template. There has been essentially no methodology or model that supports generic, high-level design synthesis for MEMS conceptual design. As a result, there lacks a basis for specifying the initial process sequences. To address this problem, this paper proposes a performance-driven, microfabrication-oriented methodology for MEMS conceptual design. A unified behaviour representation method is proposed which incorporates information of both physical interactions and chemical/biological/other reactions. Based on this method, a behavioural process based design synthesis model is proposed, which exploits multidisciplinary phenomena for design solutions, including both the structural components and their configuration for the MEMS device, as well as the necessary substances for the chemical/biological/other reactions. The model supports both forward and backward synthetic search for suitable phenomena. To ensure manufacturability, a strategy of using microfabrication-oriented phenomena as design knowledge is proposed, where the phenomena are developed from existing MEMS devices that have associated MEMS-specific microfabrication processes or foundry-ready process templates. To test the applicability of the proposed methodology, the paper also studies microfluidic device design and uses a micro-pump design for the case study.
Resumo:
Proposed is a unique cell histogram architecture which will process k data items in parallel to compute 2q histogram bins per time step. An array of m/2q cells computes an m-bin histogram with a speed-up factor of k; k ⩾ 2 makes it faster than current dual-ported memory implementations. Furthermore, simple mechanisms for conflict-free storing of the histogram bins into an external memory array are discussed.