22 results for Architecture and state
in Indian Institute of Science - Bangalore - India
Abstract:
Packet forwarding is a memory-intensive application requiring multiple accesses through a trie structure. With the requirement to process packets at line rates, high-performance routers need to forward millions of packets every second, with each packet needing up to seven memory accesses. Earlier work shows that a single cache for the nodes of a trie can reduce the number of external memory accesses. It is observed that the locality characteristics of the level-one nodes of a trie are significantly different from those of lower-level nodes. Hence, we propose a heterogeneously segmented cache architecture (HSCA) which uses separate caches for level-one and lower-level nodes, each with carefully chosen sizes. Besides reducing misses, segmenting the cache allows us to focus on optimizing the more frequently accessed level-one node segment. We find that, due to the nonuniform distribution of nodes among cache sets, the level-one nodes cache is susceptible to high conflict misses. We reduce conflict misses by introducing a novel two-level mapping-based cache placement framework. We also propose an elegant way to fit the modified placement function into the cache organization with minimal increase in access time. Further, we propose an attribute-preserving trace generation methodology which emulates real traces and can generate traces with varying locality. Performance results reveal that our HSCA scheme results in a 32 percent speedup in average memory access time over a unified nodes cache. Also, HSCA outperforms IHARC, a cache for lookup results, with as high as a 10-fold speedup in average memory access time. Two-level mapping further enhances the performance of the base HSCA by up to 13 percent, leading to an overall improvement of up to 40 percent over the unified scheme.
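The segmentation idea described above can be illustrated with a minimal sketch. It assumes small fully associative LRU caches, which is a simplification and not the set-associative organization or two-level mapping of the paper; the class names, capacities and trace are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny fully associative LRU cache that only tracks which keys are resident."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def access(self, key):
        """Return True on a hit; on a miss, insert the key and evict the LRU entry."""
        if key in self.entries:
            self.entries.move_to_end(key)
            return True
        self.entries[key] = None
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)
        return False


class SegmentedNodeCache:
    """Separate caches for level-one trie nodes and for all lower-level nodes."""

    def __init__(self, level_one_capacity, lower_capacity):
        self.level_one = LRUCache(level_one_capacity)
        self.lower = LRUCache(lower_capacity)

    def access(self, level, node_id):
        segment = self.level_one if level == 1 else self.lower
        return segment.access((level, node_id))


def count_external_accesses(cache, lookups):
    """Each lookup is a list of (level, node_id) visits; every miss goes to external memory."""
    misses = 0
    for path in lookups:
        for level, node_id in path:
            if not cache.access(level, node_id):
                misses += 1
    return misses


# Hypothetical trace: lookups that share a level-one node but diverge below it.
trace = [
    [(1, 10), (2, 100), (3, 1000)],
    [(1, 10), (2, 101), (3, 1001)],
    [(1, 10), (2, 100), (3, 1000)],
]
segmented = SegmentedNodeCache(level_one_capacity=1, lower_capacity=2)
external = count_external_accesses(segmented, trace)
print("external memory accesses:", external)
```

Because the level-one segment is never polluted by lower-level nodes, its size and placement policy can be tuned independently, which is the property the abstract exploits.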
Abstract:
The influence of strain rate and state of stress on the formation of ferrite in stainless steel types AISI 304L, 304 and as-cast 304 during hot working has been studied. Compression and torsion tests were conducted on these materials in the temperature range 1100 to 1250 degrees C and the strain rate range 0.001 to 100 s^-1. Ferrite formation occurs during deformation at temperatures above 1150 degrees C and strain rates above 10 s^-1 in stainless steel types AISI 304L and 304. The tendency to form ferrite is greater in as-cast 304 than in wrought 304: in as-cast 304, ferrite forms at lower temperatures and strain rates. The tendency to form ferrite is also greater in torsion than in compression.
Abstract:
1. Resilience-based approaches are increasingly being called upon to inform ecosystem management, particularly in arid and semi-arid regions. This requires management frameworks that can assess ecosystem dynamics, both within and between alternative states, at relevant time scales. 2. We analysed long-term vegetation records from two representative sites in the North American sagebrush-steppe ecosystem, spanning nine decades, to determine if empirical patterns were consistent with resilience theory, and to determine if cheatgrass (Bromus tectorum) invasion led to thresholds as currently envisioned by expert-based state-and-transition models (STM). These data span the entire history of cheatgrass invasion at these sites and provide a unique opportunity to assess the impacts of biotic invasion on ecosystem resilience. 3. We used univariate and multivariate statistical tools to identify unique plant communities and document the magnitude, frequency and directionality of community transitions through time. Community transitions were characterized by 37-47% dissimilarity in species composition; they were not evenly distributed through time, their frequency was not correlated with precipitation, and they could not be readily attributed to fire or grazing. Instead, at both sites, the majority of community transitions occurred within an 8-10 year period of increasing cheatgrass density, and transitions became infrequent after cheatgrass density peaked. 4. Greater cheatgrass density, replacement of native species and indication of asymmetry in community transitions suggest that thresholds may have been exceeded in response to cheatgrass invasion at one site (more arid), but not at the other site (less arid). Asymmetry in the direction of community transitions also identified communities that were 'at-risk' of cheatgrass invasion, as well as potential restoration pathways for recovery of pre-invasion states. 5. Synthesis and applications. These results illustrate the complexities associated with threshold identification, and indicate that criteria describing the frequency, magnitude, directionality and temporal scale of community transitions may provide greater insight into resilience theory and its application for ecosystem management. These criteria are likely to vary across biogeographic regions that are susceptible to cheatgrass invasion, and necessitate more in-depth assessments of thresholds and alternative states than are currently available.
Abstract:
In this paper, using the intrinsically disordered oncoprotein Myc as an example, we present a mathematical model to help explain how protein oscillatory dynamics can influence state switching. Earlier studies have demonstrated that, while Myc overexpression can facilitate state switching and transform a normal cell into a cancer phenotype, its downregulation can reverse state-switching. A fundamental aspect of the model is that a Myc threshold determines cell fate in cells expressing p53. We demonstrate that a non-cooperative positive feedback loop coupled with Myc sequestration at multiple binding sites can generate bistable Myc levels. Normal quiescent cells with Myc levels below the threshold can respond to mitogenic signals to activate the cyclin/cdk oscillator for limited cell divisions but the p53/Mdm2 oscillator remains nonfunctional. In response to stress, the p53/Mdm2 oscillator is activated in pulses that are critical to DNA repair. But if stress causes Myc levels to cross the threshold, Myc inactivates the p53/Mdm2 oscillator, abrogates p53 pulses, and pushes the cyclin/cdk oscillator into overdrive sustaining unchecked proliferation seen in cancer. However, if Myc is downregulated, the cyclin/cdk oscillator is inactivated and the p53/Mdm2 oscillator is reset and the cancer phenotype is reversed. (C) 2015 Elsevier Ltd. All rights reserved.
Abstract:
The effects of contact architecture, graphene defect density and metal-semiconductor work function difference on the resistivity of metal-graphene contacts have been investigated. An architecture with metal on the bottom of graphene is found to yield resistivities that are lower, by a factor of four, and more consistent than those obtained with metal on top of graphene. Growth defects in the graphene film were found to further reduce resistivity by a factor of two. By combining the contact method and the choice of metal, the contact resistivity of graphene has been decreased by a factor of 10, to 1200 ± 250 Ω µm, using palladium as the contact metal. While the improved consistency is due to the metal being able to contact uncontaminated graphene in the metal-on-the-bottom architecture, the lower contact resistivities observed on defective graphene with the same metal are attributed to the increased number of modes of quantum transport in the channel.
Abstract:
This paper proposes a Petri net model for a commercial network processor (Intel IXP architecture), which is a multithreaded multiprocessor architecture. We consider and model three different applications, viz. IPv4 forwarding, network address translation, and IP security, running on the IXP 2400/2850. A salient feature of the Petri net model is its ability to model the application, the architecture and their interaction in great detail. The model is validated using the Intel proprietary tool (SDK 3.51 for IXP architecture) over a range of configurations. We conduct a detailed performance evaluation, identify the bottleneck resource, and propose a few architectural extensions and evaluate them in detail.
Abstract:
This paper presents the architecture and the VHDL design of an integer 2-D DCT used in H.264/AVC. The 2-D DCT computation is performed by exploiting its orthogonality and separability properties. The symmetry of the forward and inverse transforms is used in this implementation. To reduce the computation overhead of the addition, subtraction and multiplication operations, we analyze the suitability of the carry-free, position-independent residue number system (RNS) for the implementation of the 2-D DCT. The implementation has been carried out in VHDL for an Altera FPGA. We used the negative number representation in RNS, bit-width analysis of the transforms and the dedicated registers present in the logic element of the FPGA to optimize the area. The complexity and efficiency analysis shows that the proposed architecture could provide higher throughput.
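As an illustration of the two ideas named above, the following sketch computes the H.264/AVC 4x4 forward core transform Y = CF · X · CF^T by separability (a column pass followed by a row pass), with every addition and multiplication done channel-wise in an RNS. The moduli set, the function names and the sample block are assumptions made for this example; the paper's actual moduli and hardware mapping are not given in the abstract.

```python
from math import prod

# Forward core transform matrix of the H.264/AVC 4x4 integer transform.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]

# Hypothetical pairwise-coprime moduli; M/2 must exceed the largest output magnitude.
MODULI = (13, 15, 16, 17)
M = prod(MODULI)  # dynamic range of 53040 values


def to_rns(x):
    """Encode a signed integer as residues; negatives wrap modulo each modulus."""
    return tuple(x % m for m in MODULI)


def from_rns(residues):
    """Chinese remainder reconstruction, mapped back to the signed range [-M/2, M/2)."""
    total = 0
    for r, m in zip(residues, MODULI):
        partial = M // m
        total += r * partial * pow(partial, -1, m)
    total %= M
    return total - M if 2 * total >= M else total


def rns_mac(acc, a, b):
    """Channel-wise multiply-accumulate: no carries propagate between channels."""
    return tuple((s + x * y) % m for s, x, y, m in zip(acc, a, b, MODULI))


def transform_1d_rns(vec):
    """Apply CF to a length-4 vector of RNS-encoded samples."""
    out = []
    for row in CF:
        acc = to_rns(0)
        for coeff, sample in zip(row, vec):
            acc = rns_mac(acc, to_rns(coeff), sample)
        out.append(acc)
    return out


def forward_transform_rns(block):
    """2-D transform via separability: column pass gives CF*X, row pass gives (CF*X)*CF^T."""
    enc = [[to_rns(v) for v in row] for row in block]
    cols = [transform_1d_rns([enc[r][c] for r in range(4)]) for c in range(4)]
    tmp = [[cols[c][r] for c in range(4)] for r in range(4)]
    out = [transform_1d_rns(row) for row in tmp]
    return [[from_rns(v) for v in row] for row in out]


def forward_transform_direct(block):
    """Reference computation with ordinary integer arithmetic."""
    tmp = [[sum(CF[i][k] * block[k][j] for k in range(4)) for j in range(4)] for i in range(4)]
    return [[sum(tmp[i][k] * CF[j][k] for k in range(4)) for j in range(4)] for i in range(4)]


# Hypothetical 4x4 sample block.
block = [
    [5, 11, 8, 10],
    [9, 8, 4, 12],
    [1, 10, 11, 4],
    [19, 6, 15, 7],
]
print(forward_transform_rns(block) == forward_transform_direct(block))
```

Each residue channel works on a few bits only and never exchanges carries with the others, which is the property that makes RNS attractive for the add and multiply datapath; the price is the conversion in and out of the residue domain.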
Abstract:
A theoretical study on the propagation of plane waves in the presence of a hot mean flow in a uniform pipe is presented. The temperature variation in the pipe is taken to be a linear temperature gradient along the axis. The theoretical studies include the formulation of a wave equation based on continuity, momentum, and state equation, and derivation of a general four-pole matrix, which is shown to yield the well-known transfer matrices for several other simpler cases.
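For orientation, one of the simpler cases referred to above is the classical transfer matrix of a uniform pipe with a stationary medium at uniform temperature. The notation below is an assumption and is not taken from the abstract: $p$ is acoustic pressure, $v$ is acoustic mass velocity, subscripts 1 and 2 denote the upstream and downstream ends, $l$ is the pipe length, $S$ the cross-sectional area, $c_0$ the sound speed and $\omega$ the angular frequency.

```latex
\begin{bmatrix} p_1 \\ v_1 \end{bmatrix}
=
\begin{bmatrix}
\cos k_0 l & j Y_0 \sin k_0 l \\
\dfrac{j}{Y_0} \sin k_0 l & \cos k_0 l
\end{bmatrix}
\begin{bmatrix} p_2 \\ v_2 \end{bmatrix},
\qquad
k_0 = \frac{\omega}{c_0},
\quad
Y_0 = \frac{c_0}{S}.
```

The determinant of this matrix is $\cos^2 k_0 l + \sin^2 k_0 l = 1$, as expected for a reciprocal element; a general four-pole matrix with mean flow and a temperature gradient should reduce to this form when both are set to zero.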
Abstract:
Run-time interoperability between different applications based on H.264/AVC is an emerging need in networked infotainment, where media delivery must match the desired resolution and quality of the end terminals. In this paper, we describe the architecture and design of a polymorphic ASIC to support this. The H.264 decoding flow is partitioned into modules such that the polymorphic ASIC meets the design goals of low power, low area, high flexibility, high throughput and fast interoperability between different profiles and levels of H.264. We demonstrate the idea with a multi-mode decoder that can decode baseline, main and high profile H.264 streams and can interoperate at run-time across these profiles. The decoder is capable of processing frame sizes of up to 1024 × 768 at 30 fps. The design, synthesized with UMC 0.13 µm technology, occupies 250 k gates and runs at 100 MHz.
Abstract:
REDEFINE is a reconfigurable SoC architecture that provides a unique platform for high-performance and low-power computing by exploiting the synergistic interaction between a coarse-grain dynamic dataflow model of computation (to expose abundant parallelism in applications) and runtime composition of efficient compute structures (on the reconfigurable computation resources). We propose and study the throttling of execution in REDEFINE to maximize the architecture efficiency. A feature-specific, fast hybrid (mixed-level) simulation framework is developed and implemented for early design-phase study, making the huge design space exploration practical. Our performance modeling covers the selection of important performance criteria and the ranking of the explored throttling schemes, and we investigate the effectiveness of the design space exploration using statistical hypothesis testing. We find throttling schemes that simultaneously give an appreciable (24.8%) overall performance gain in the architecture and a 37% resource usage gain in the throttling unit.
Abstract:
The design of a dual-DSP microprocessor system and its application to parallel FFT and two-dimensional convolution are explained. The system is based on a master-slave configuration. Two ADSP-2101s are configured as slave processors and a PC/AT serves as the master. The master serves as a control processor to transfer the program code and data to the DSPs. The system architecture and the algorithms for the two applications, viz. FFT and two-dimensional convolution, are discussed.
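One common way to split an FFT across two processors, sketched here under assumptions since the abstract does not state the partitioning used, is a radix-2 decimation-in-time split: one slave transforms the even-indexed samples, the other the odd-indexed samples, and the master merges the two half-size spectra with a final butterfly stage. The function names are hypothetical, and the two slave computations run sequentially in this sketch.

```python
import cmath


def combine(even, odd):
    """Butterfly stage that merges two half-size spectra into the full spectrum."""
    half = len(even)
    n = 2 * half
    out = [0j] * n
    for k in range(half):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + half] = even[k] - twiddle
    return out


def fft(samples):
    """Recursive radix-2 decimation-in-time FFT; the length must be a power of two."""
    if len(samples) == 1:
        return [complex(samples[0])]
    return combine(fft(samples[0::2]), fft(samples[1::2]))


def dual_processor_fft(samples):
    """Master splits the input, each slave transforms one half, master merges."""
    slave0_result = fft(samples[0::2])  # work assigned to the first slave
    slave1_result = fft(samples[1::2])  # work assigned to the second slave
    return combine(slave0_result, slave1_result)


def naive_dft(samples):
    """Direct O(N^2) DFT used only as a reference."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# Hypothetical 8-point test signal.
signal = [1, 2, 3, 4, 0, -1, -2, -3]
spectrum = dual_processor_fft(signal)
reference = naive_dft(signal)
print(max(abs(a - b) for a, b in zip(spectrum, reference)) < 1e-9)
```

The two half-size transforms share no data, so they need no communication until the merge step; only the final butterfly stage requires both results in one place.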
Abstract:
This article aims to identify the research issues and challenges that need to be addressed to achieve a sustainable transportation system for Indian cities. It does so by examining the current system and the trends of urbanization, motorization and modal shares in India, and their impact on mobility and safety (the two basic goals of transportation) as well as on the environment. Further, the article explores the efforts by the central and state governments in India to address sustainability issues, and the problems that remain over and above the present efforts to achieve sustainability. The article concludes by summarizing the research issues with respect to planning/modelling, non-motorized transport, public transport, driver behaviour and road safety, and traffic management. It is expected that these research issues will provide potential directions for further research aimed at achieving a sustainable transport system for Indian cities.
Abstract:
The prevalent virtualization technologies provide QoS support within the software layers of the virtual machine monitor (VMM) or the operating system of the virtual machine (VM). The QoS features are mostly provided as extensions to the existing software used for accessing the I/O device, because of which the applications sharing the I/O device experience loss of performance due to crosstalk effects or usable bandwidth. In this paper we examine the NIC sharing effects across VMs on a Xen virtualized server and present an alternate paradigm that improves the shared bandwidth and reduces the crosstalk effect on the VMs. We implement the proposed hardware-software changes in a layered queuing network (LQN) model and use simulation techniques to evaluate the architecture. We find that simple changes in the device architecture and associated system software lead to application throughput improvement of up to 60%. The architecture also enables finer QoS controls at device level and increases the scalability of device sharing across multiple virtual machines. We find that the performance improvement derived using the LQN model is comparable to that reported by similar but real implementations.
Abstract:
In this paper we explore an implementation of a high-throughput streaming application on REDEFINE-v2, which is an enhancement of REDEFINE. REDEFINE is a polymorphic ASIC combining the flexibility of a programmable solution with the execution speed of an ASIC. In REDEFINE, Compute Elements are arranged in an 8x8 grid connected via a Network on Chip (NoC) called RECONNECT, to realize the various macrofunctional blocks of an equivalent ASIC. For a 1024-point FFT we carry out an application-architecture design space exploration by examining the various characterizations of Compute Elements in terms of the size of the instruction store. We further study the impact of using application-specific, vectorized FUs. By setting up different partitions of the FFT algorithm for persistent execution on REDEFINE-v2, we derive the benefits of setting up pipelined execution for higher performance. The impact of the REDEFINE-v2 micro-architecture for any arbitrary N-point FFT (N > 4096) is also analyzed. We report the various algorithm-architecture tradeoffs in terms of area and execution speed compared with an ASIC implementation. In addition, we compare the performance gain with respect to a GPP.
Abstract:
Frequent accesses to the register file make it one of the major sources of energy consumption in ILP architectures. The large number of functional units connected to a large unified register file in VLIW architectures makes power dissipation in the register file even worse because of the need for a large number of ports. High power dissipation in the relatively small area occupied by a register file leads to a high power density in the register file and makes it one of the prime hot-spots. This makes it highly susceptible to the possibility of a catastrophic heatstroke. This in turn impacts performance and cost because of the need for periodic cool-down and for sophisticated packaging and cooling techniques, respectively. Clustered VLIW architectures partition the register file among clusters of functional units and reduce the number of ports required, thereby reducing the power dissipation. However, we observe that the aggregate accesses to register files in clustered VLIW architectures (and the associated energy consumption) become very high compared to centralized VLIW architectures, and this can be attributed to a large number of explicit inter-cluster communications. Snooping-based clustered VLIW architectures provide a very limited but very fast way of inter-cluster communication by allowing some of the functional units to directly read some of the operands from the register files of some of the other clusters. In this paper, we propose instruction scheduling algorithms that exploit the limited snooping capability to reduce the register file energy consumption on average by 12% and 18% and improve the overall performance by 5% and 11% for a 2-clustered and a 4-clustered machine respectively, over an earlier state-of-the-art clustered scheduling algorithm when evaluated in the context of snooping-based clustered VLIW architectures.