2 resultados para Parallelism
em CaltechTHESIS
Resumo:
Technology scaling has enabled drastic growth in the computational and storage capacity of integrated circuits (ICs). This constant growth drives an increasing demand for high-bandwidth communication between and within ICs. In this dissertation we focus on low-power solutions that address this demand. We divide communication links into three subcategories depending on the communication distance. Each category has a different set of challenges and requirements and is affected by CMOS technology scaling in a different manner. We start with short-range chip-to-chip links for board-level communication. Next we will discuss board-to-board links, which demand a longer communication range. Finally on-chip links with communication ranges of a few millimeters are discussed.
Electrical signaling is a natural choice for chip-to-chip communication due to efficient integration and low cost. IO data rates have increased to the point where electrical signaling is now limited by the channel bandwidth. In order to achieve multi-Gb/s data rates, complex designs that equalize the channel are necessary. In addition, a high level of parallelism is central to sustaining bandwidth growth. Decision feedback equalization (DFE) is one of the most commonly employed techniques to overcome the limited bandwidth problem of the electrical channels. A linear and low-power summer is the central block of a DFE. Conventional approaches employ current-mode techniques to implement the summer, which require high power consumption. In order to achieve low-power operation we propose performing the summation in the charge domain. This approach enables a low-power and compact realization of the DFE as well as crosstalk cancellation. A prototype receiver was fabricated in 45nm SOI CMOS to validate the functionality of the proposed technique and was tested over channels with different levels of loss and coupling. Measurement results show that the receiver can equalize channels with maximum 21dB loss while consuming about 7.5mW from a 1.2V supply. We also introduce a compact, low-power transmitter employing passive equalization. The efficacy of the proposed technique is demonstrated through implementation of a prototype in 65nm CMOS. The design achieves up to 20Gb/s data rate while consuming less than 10mW.
An alternative to electrical signaling is to employ optical signaling for chip-to-chip interconnections, which offers low channel loss and cross-talk while providing high communication bandwidth. In this work we demonstrate the possibility of building compact and low-power optical receivers. A novel RC front-end is proposed that combines dynamic offset modulation and double-sampling techniques to eliminate the need for a short time constant at the input of the receiver. Unlike conventional designs, this receiver does not require a high-gain stage that runs at the data rate, making it suitable for low-power implementations. In addition, it allows time-division multiplexing to support very high data rates. A prototype was implemented in 65nm CMOS and achieved up to 24Gb/s with less than 0.4pJ/b power efficiency per channel. As the proposed design mainly employs digital blocks, it benefits greatly from technology scaling in terms of power and area saving.
As the technology scales, the number of transistors on the chip grows. This necessitates a corresponding increase in the bandwidth of the on-chip wires. In this dissertation, we take a close look at wire scaling and investigate its effect on wire performance metrics. We explore a novel on-chip communication link based on a double-sampling architecture and dynamic offset modulation technique that enables low power consumption and high data rates while achieving high bandwidth density in 28nm CMOS technology. The functionality of the link is demonstrated using different length minimum-pitch on-chip wires. Measurement results show that the link achieves up to 20Gb/s of data rate (12.5Gb/s/$\mu$m) with better than 136fJ/b of power efficiency.
Resumo:
A neural network is a highly interconnected set of simple processors. The many connections allow information to travel rapidly through the network, and due to their simplicity, many processors in one network are feasible. Together these properties imply that we can build efficient massively parallel machines using neural networks. The primary problem is how do we specify the interconnections in a neural network. The various approaches developed so far such as outer product, learning algorithm, or energy function suffer from the following deficiencies: long training/ specification times; not guaranteed to work on all inputs; requires full connectivity.
Alternatively we discuss methods of using the topology and constraints of the problems themselves to design the topology and connections of the neural solution. We define several useful circuits-generalizations of the Winner-Take-All circuitthat allows us to incorporate constraints using feedback in a controlled manner. These circuits are proven to be stable, and to only converge on valid states. We use the Hopfield electronic model since this is close to an actual implementation. We also discuss methods for incorporating these circuits into larger systems, neural and nonneural. By exploiting regularities in our definition, we can construct efficient networks. To demonstrate the methods, we look to three problems from communications. We first discuss two applications to problems from circuit switching; finding routes in large multistage switches, and the call rearrangement problem. These show both, how we can use many neurons to build massively parallel machines, and how the Winner-Take-All circuits can simplify our designs.
Next we develop a solution to the contention arbitration problem of high-speed packet switches. We define a useful class of switching networks and then design a neural network to solve the contention arbitration problem for this class. Various aspects of the neural network/switch system are analyzed to measure the queueing performance of this method. Using the basic design, a feasible architecture for a large (1024-input) ATM packet switch is presented. Using the massive parallelism of neural networks, we can consider algorithms that were previously computationally unattainable. These now viable algorithms lead us to new perspectives on switch design.