23 results for Trade creation
at Indian Institute of Science - Bangalore - India
Abstract:
This paper presents a power, latency and throughput trade-off study on NoCs by varying microarchitectural (e.g. pipelining) and circuit-level (e.g. frequency and voltage) parameters. We change pipelining depth, operating frequency and supply voltage for 3 example NoCs: a 16-node 2D Torus, a Tree network and a Reduced 2D Torus. We use an in-house NoC exploration framework capable of topology generation and comparison using parameterized models of routers and links developed in SystemC. The framework utilizes interconnect power and delay models from a low-level modelling tool called Intacte [1]. We find that increased pipelining can actually reduce latency. We also find that there exists an optimal degree of pipelining which is the most energy efficient in terms of minimizing energy-delay product.
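The energy-delay product criterion mentioned in this abstract can be illustrated with a minimal Python sketch. The function names and the sample numbers below are hypothetical placeholders chosen for illustration; they are not taken from the paper or from its framework.

```python
from typing import Dict, Tuple


def energy_delay_product(energy_j: float, delay_s: float) -> float:
    """Energy-delay product: energy per transfer times latency."""
    return energy_j * delay_s


def most_efficient_depth(samples: Dict[int, Tuple[float, float]]) -> int:
    """Return the pipeline depth whose (energy, delay) pair minimizes EDP."""
    return min(samples, key=lambda depth: energy_delay_product(*samples[depth]))


# Hypothetical (energy in joules, latency in seconds) per pipeline depth.
# These are made-up placeholders, NOT measurements from the paper.
samples = {
    1: (1.0e-9, 8.0e-9),
    2: (1.2e-9, 5.0e-9),
    4: (1.8e-9, 4.0e-9),
    8: (3.0e-9, 3.5e-9),
}

best_depth = most_efficient_depth(samples)
```

With these placeholder values, deeper pipelining keeps lowering latency, but the growing energy cost means the product is smallest at an intermediate depth, which is the kind of optimum the abstract describes.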
Abstract:
A very concise and diversity-oriented approach to rapidly access frondosin-related frameworks from commercially available building blocks is outlined.
Abstract:
Abstract is not available.
Abstract:
CMPs enable simultaneous execution of multiple applications on the same platform, sharing cache resources. Diversity in the cache access patterns of these simultaneously executing applications can potentially trigger inter-application interference, leading to cache pollution. Whereas a large cache can ameliorate this problem, the larger power consumption that comes with increasing cache size, amplified at sub-100nm technologies, makes this solution prohibitive. In this paper, to address the issues relating to power-aware performance of caches, we propose a caching structure that addresses the following: 1. Definition of application-specific cache partitions as an aggregation of caching units (molecules). The parameters of each molecule, namely size, associativity and line size, are chosen so that its power consumption and access time are optimal for the given technology. 2. Application-specific resizing of cache partitions with variable and adaptive associativity per cache line, way size and variable line size. 3. A replacement policy that is transparent to the partition in terms of size, heterogeneity in associativity and line size. Through simulation studies we establish the superiority of the molecular cache (a cache built as an aggregation of molecules), which offers a 29% power advantage over an equivalently performing traditional cache.
Abstract:
Clustered architecture processors are preferred for embedded systems because centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption. Although clustering helps by improving clock speed, reducing energy consumption of the logic, and making the design simpler, it introduces extra overheads by way of inter-cluster communication. This communication happens over long global wires, which leads to delay in execution and significantly high energy consumption. In this paper, we propose a new instruction scheduling algorithm that exploits scheduling slacks of instructions and communication slacks of data values together to achieve better energy-performance trade-offs for clustered architectures with heterogeneous interconnect. Our instruction scheduling algorithm achieves 35% and 40% reduction in communication energy, whereas the overall energy-delay product improves by 4.5% and 6.5%, respectively, for 2-cluster and 4-cluster machines, with a marginal increase (1.6% and 1.1%) in execution time. Our test bed uses the Trimaran compiler infrastructure.
Abstract:
A common and practical paradigm in cooperative communication systems is the use of a dynamically selected 'best' relay to decode and forward information from a source to a destination. Such systems use two phases: a relay selection phase, in which the system uses transmission time and energy to select the best relay, and a data transmission phase, in which it uses the spatial diversity benefits of selection to transmit data. In this paper, we derive closed-form expressions for the overall throughput and energy consumption, and study the time and energy trade-off between the selection and data transmission phases. To this end, we analyze a baseline non-adaptive system and several adaptive systems that adapt the selection phase, relay transmission power, or transmission time. Our results show that while selection yields significant benefits, the selection phase's time and energy overhead can be significant. In fact, at the optimal point, the selection can be far from perfect, and depends on the number of relays and the mode of adaptation. The results also provide guidelines about the optimal system operating point for different modes of adaptation. The analysis also offers new insights into the fast splitting-based algorithm considered in this paper for relay selection.
Abstract:
A new conformal creation field cosmology is considered and it is found that a negative energy scalar field nonminimally coupled to the gravitational field gives rise to creation and, in contrast to Hoyle-Narlikar theory, no a priori assumption about the rate of creation is required to solve the field equations.
Abstract:
We describe a SystemC-based framework we are developing to explore the impact of various architectural and microarchitectural level parameters of the on-chip interconnection network elements on its power and performance. The framework enables one to choose from a variety of architectural options like topology, routing policy, etc., and allows experimentation with various microarchitectural options for the individual links like length, wire width, pitch, pipelining, supply voltage and frequency. The framework also supports a flexible traffic generation and communication model. We provide preliminary results of using this framework to study the power, latency and throughput of a 4x4 multi-core processing array using mesh, torus and folded torus, for two different communication patterns of dense and sparse linear algebra. The traffic consists of both Request-Response messages (mimicking cache accesses) and One-Way messages. We find that the average latency can be reduced by increasing the pipeline depth, as it enables higher link frequencies. We also find that there exists an optimum degree of pipelining which minimizes energy-delay product.
Abstract:
This paper describes the efforts at MILE lab, IISc, to create a 100,000-word database each in Kannada and Tamil for the design and development of Online Handwritten Recognition. It has been collected from over 600 users in order to capture the variations in writing style. We describe features of the scripts and how the number of symbols was reduced to be able to effectively train the data for recognition. The list of words includes all the characters, Kannada and Indo-Arabic numerals, punctuation marks and other symbols. A semi-automated tool for the annotation of data from stroke to word level is used. It segments each word into stroke groups and also acts as a validation mechanism for segmentation. The tool displays the strokes, stroke groups and aksharas of a word and hence can be used to study the various styles of writing and delayed strokes, and to assign quality tags to the words. The tool is currently being used for annotating Tamil and Kannada data. The output is stored in a standard XML format.
Abstract:
Electronic exchanges are double-sided marketplaces that allow multiple buyers to trade with multiple sellers, with aggregation of demand and supply across the bids to maximize the revenue in the market. Two important issues in the design of exchanges are (1) trade determination (determining the number of goods traded between any buyer-seller pair) and (2) pricing. In this paper we address the trade determination issue for one-shot, multi-attribute exchanges that trade multiple units of the same good. The bids are configurable with separable additive price functions over the attributes and each function is continuous and piecewise linear. We model trade determination as mixed integer programming problems for different possible bid structures and show that even in two-attribute exchanges, trade determination is NP-hard for certain bid structures. We also make some observations on the pricing issues that are closely related to the mixed integer formulations.
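As a rough illustration of what trade determination means, the following Python sketch handles only the simplest case: a single good, price-only bids, and multiple units. In that case a greedy match of the highest bids against the lowest asks maximizes total surplus. This is not the paper's mixed integer formulation for multi-attribute, piecewise linear bids; the `Order` type and `determine_trades` function are hypothetical names introduced here.

```python
from typing import List, Tuple

# Hypothetical order format: (trader name, unit price, quantity).
Order = Tuple[str, float, int]


def determine_trades(bids: List[Order], asks: List[Order]) -> List[Tuple[str, str, int]]:
    """Greedy surplus-maximizing matching for price-only, multi-unit orders.

    Returns (buyer, seller, quantity) triples. Assumes a single attribute
    (price); the multi-attribute case in the abstract needs a MIP instead.
    """
    buyers = sorted(bids, key=lambda o: -o[1])   # highest bid first
    sellers = sorted(asks, key=lambda o: o[1])   # lowest ask first
    trades: List[Tuple[str, str, int]] = []
    i = j = 0
    buy_left = buyers[0][2] if buyers else 0
    sell_left = sellers[0][2] if sellers else 0
    # Keep matching while the best remaining bid covers the best remaining ask.
    while i < len(buyers) and j < len(sellers) and buyers[i][1] >= sellers[j][1]:
        qty = min(buy_left, sell_left)
        if qty > 0:
            trades.append((buyers[i][0], sellers[j][0], qty))
        buy_left -= qty
        sell_left -= qty
        if buy_left == 0:            # buyer fully served, move to next buyer
            i += 1
            if i < len(buyers):
                buy_left = buyers[i][2]
        if sell_left == 0:           # seller sold out, move to next seller
            j += 1
            if j < len(sellers):
                sell_left = sellers[j][2]
    return trades


# Illustrative orders only; the names and prices are made up.
example_bids = [("b1", 10.0, 5), ("b2", 7.0, 3)]
example_asks = [("s1", 6.0, 4), ("s2", 8.0, 4)]
example_trades = determine_trades(example_bids, example_asks)
```

In this example, buyer `b2` does not trade because its bid of 7.0 is below the only remaining ask of 8.0. Once bids carry several attributes with piecewise linear price functions, this simple ordering argument no longer applies, which is consistent with the hardness result stated in the abstract.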
Abstract:
We studied the feasibility of the measurement of Higgs pair creation at a photon linear collider. From the sensitivity to the anomalous self-coupling of the Higgs boson, the optimum $\gamma\gamma$ collision energy was found to be around 270 GeV for a Higgs mass of 120 GeV/$c^2$. We found that large backgrounds such as $\gamma\gamma \to W^+W^-$, $ZZ$, and $b\bar{b}b\bar{b}$ can be suppressed if correct assignment of tracks to parent partons is achieved, and that Higgs pair events can be observed with a statistical significance of approximately $5\sigma$ by operating the photon linear collider for 5 years.
Abstract:
The experimental implementation of a quantum algorithm requires the decomposition of unitary operators. Here we treat unitary-operator decomposition as an optimization problem, and use a genetic algorithm (a global-optimization method inspired by nature's evolutionary process) for operator decomposition. We apply this method to NMR quantum information processing, and find a probabilistic way of performing universal quantum computation using global hard pulses. We also demonstrate the efficient creation of the singlet state (a special type of Bell state) directly from thermal equilibrium, using an optimum sequence of pulses. © 2012 American Physical Society.
Abstract:
The impact of gate-to-source/drain overlap length on performance and variability of 65 nm CMOS is presented. The device and circuit variability is investigated as a function of three significant process parameters, namely gate length, gate oxide thickness, and halo dose. The comparison is made with three different values of gate-to-source/drain overlap length namely 5 nm, 0 nm, and -5 nm and at two different leakage currents of 10 nA and 100 nA. The Worst-Case-Analysis approach is used to study the inverter delay fluctuations at the process corners. The drive current of the device for device robustness and stage delay of an inverter for circuit robustness are taken as performance metrics. The design trade-off between performance and variability is demonstrated both at the device level and circuit level. It is shown that larger overlap length leads to better performance, while smaller overlap length results in better variability. Performance trades with variability as overlap length is varied. An optimal value of overlap length of 0 nm is recommended at 65 nm gate length, for a reasonable combination of performance and variability.
Abstract:
The experimental implementation of a quantum algorithm requires the decomposition of unitary operators. Here we treat unitary-operator decomposition as an optimization problem, and use a genetic algorithm (a global-optimization method inspired by nature's evolutionary process) for operator decomposition. We apply this method to NMR quantum information processing, and find a probabilistic way of performing universal quantum computation using global hard pulses. We also demonstrate the efficient creation of the singlet state (a special type of Bell state) directly from thermal equilibrium, using an optimum sequence of pulses.
Abstract:
We consider a complex, additive, white Gaussian noise channel with flat fading. We study its diversity order versus transmission rate for some known power allocation schemes. The capacity region is divided into three regions. For one power allocation scheme, the diversity order is exponential throughout the capacity region. For the selective channel inversion (SCI) scheme, the diversity order is exponential in the low and high rate regions but polynomial in the mid rate region. For the fast fading case, we also provide a new upper bound on block error probability and a power allocation scheme that minimizes it. The diversity order behaviour of this scheme is the same as for SCI, but it provides lower BER than the other policies.