39 results for High Performance


Relevance:

70.00%

Abstract:

High-end network security applications demand high-speed operation and support for large rule sets. Packet classification is the core function that demands high throughput in such applications. This paper proposes a packet classification architecture to meet this throughput requirement. We have implemented a firewall with this architecture in reconfigurable hardware. We propose an extension to the Distributed Crossproducting of Field Labels (DCFL) technique to achieve a scalable, high-performance architecture. The implemented firewall takes advantage of the inherent structure and redundancy of the rule set by using our DCFL Extended (DCFLE) algorithm. The DCFLE algorithm yields both speed and area improvements when implemented in hardware. Although we restrict ourselves to standard 5-tuple matching, the architecture supports additional fields. High-throughput classification invariably uses Ternary Content Addressable Memory (TCAM) for prefix matching, though TCAM fares poorly in terms of area and power efficiency. Using TCAM for port range matching is expensive, because range-to-prefix conversion produces a large number of prefixes, leading to storage inefficiency. Extended TCAM (ETCAM) is fast and the most storage-efficient solution for range matching. We present, for the first time, a reconfigurable hardware implementation of ETCAM. We have implemented our firewall as an embedded system on a Virtex-II Pro FPGA-based platform, running Linux with the packet classification in hardware. The firewall was tested in real time with a 1 Gbps Ethernet link and 128 sample rules. The packet classification hardware uses a quarter of the logic resources and slightly over one third of the memory resources of the XC2VP30 FPGA. It achieves a maximum classification throughput of 50 million packets/s, corresponding to a 16 Gbps link rate for the worst-case packet size. A firewall rule update involves only memory re-initialization in software, without any hardware change.
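To illustrate why range-to-prefix conversion is costly for a plain TCAM, the following minimal sketch splits an inclusive port range into the prefixes that would each occupy one TCAM entry. The function name `range_to_prefixes` and its `(value, prefix_length)` return format are assumptions made for illustration; they are not taken from the paper, and the sketch does not model DCFLE or ETCAM.

```python
def range_to_prefixes(lo, hi, width=16):
    """Split the inclusive range [lo, hi] into (value, prefix_length) pairs.

    Each pair stands for one prefix of a `width`-bit field, i.e. one TCAM entry.
    """
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block aligned at lo (lo == 0 is aligned to everything).
        size = (lo & -lo) if lo else (1 << width)
        # Shrink the block until it fits inside the remaining span.
        while size > hi - lo + 1:
            size >>= 1
        # A block of 2**k addresses corresponds to a prefix of length width - k.
        prefixes.append((lo, width - size.bit_length() + 1))
        lo += size
    return prefixes


unprivileged_ports = range_to_prefixes(1024, 65535)
worst_case = range_to_prefixes(1, 65534)
print(len(unprivileged_ports), len(worst_case))
```

For a 16-bit port field, the range 1024-65535 expands to 6 prefixes, while 1-65534 expands to 30, which is the 2W - 2 worst case for a W-bit field. A rule with ranges on both source and destination ports multiplies these counts.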

Relevance:

70.00%

Abstract:

Conducting and semiconducting polymers are important materials in the development of printed, flexible, large-area electronics such as flat-panel displays and photovoltaic cells. There has been rapid progress in developing conjugated polymers with the high transport mobility required for high-performance field-effect transistors (FETs), beginning (ref. 1) with mobilities around 10⁻⁴ cm² V⁻¹ s⁻¹ and reaching a recent report (ref. 2) of 1 cm² V⁻¹ s⁻¹ for poly(2,5-bis(3-tetradecylthiophen-2-yl)thieno[3,2-b]thiophene) (PBTTT). Here, the electrical properties of PBTTT are studied at high charge densities, both as the semiconductor layer in FETs and in electrochemically doped films, to determine the transport mechanism. We show that data obtained using a wide range of parameters (temperature, gate-induced carrier density, source-drain voltage and doping level) scale onto the universal curve predicted for transport in the Luttinger liquid description of the one-dimensional 'metal'.

Relevance:

70.00%

Abstract:

Clustered architecture processors are preferred for embedded systems because centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption. Although clustering helps by improving clock speed, reducing the energy consumption of the logic, and simplifying the design, it introduces extra overhead in the form of inter-cluster communication. This communication takes place over long global wires, which leads to execution delays and significantly higher energy consumption. In this paper, we propose a new instruction scheduling algorithm that exploits the scheduling slack of instructions and the communication slack of data values together to achieve better energy-performance trade-offs for clustered architectures with heterogeneous interconnect. Our instruction scheduling algorithm achieves 35% and 40% reductions in communication energy, and the overall energy-delay product improves by 4.5% and 6.5%, for 2-cluster and 4-cluster machines respectively, with a marginal increase (1.6% and 1.1%) in execution time. Our test bed uses the Trimaran compiler infrastructure.
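The reported figures can be combined to estimate the change in overall energy. The sketch below assumes that the energy-delay product is energy multiplied by execution time and that all percentages are measured against the same baseline; neither assumption is stated in the abstract, and the helper name `implied_energy_change` is hypothetical.

```python
def implied_energy_change(edp_improvement, delay_increase):
    """Relative change in total energy implied by an EDP improvement and a delay increase.

    Assumes EDP = energy * delay, with both fractions measured against
    the same baseline (an assumption, not stated in the abstract).
    """
    edp_ratio = 1.0 - edp_improvement     # new EDP / baseline EDP
    delay_ratio = 1.0 + delay_increase    # new delay / baseline delay
    return edp_ratio / delay_ratio - 1.0  # new energy / baseline energy, minus 1


two_cluster = implied_energy_change(0.045, 0.016)
four_cluster = implied_energy_change(0.065, 0.011)
print(f"2-cluster: {two_cluster:+.1%}, 4-cluster: {four_cluster:+.1%}")
```

Under these assumptions, overall energy falls by roughly 6% on the 2-cluster machine and roughly 7.5% on the 4-cluster machine, considerably less than the 35% and 40% savings in communication energy alone.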

Relevance:

70.00%

Abstract:

Processor architects face the challenging task of evaluating a large design space consisting of several interacting parameters and optimizations. To assist architects in making crucial design decisions, we build linear regression models that relate processor performance to micro-architectural parameters, using simulation-based experiments. We obtain good approximate models using an iterative process in which Akaike's information criterion is used to extract a good linear model from a small set of simulations, and limited further simulation is guided by the model using D-optimal experimental designs. The iterative process is repeated until the desired error bounds are achieved. We used this procedure to establish the relationship of the CPI performance response to 26 key micro-architectural parameters using a detailed cycle-by-cycle superscalar processor simulator. The resulting models provide a significance ordering on all micro-architectural parameters and their interactions, and explain the performance variations of micro-architectural techniques.
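For reference, Akaike's information criterion for a candidate model is given below in its general form, followed by the standard form for a least-squares fit with Gaussian errors. The symbols are notation introduced here rather than taken from the paper: k is the number of estimated parameters, \hat{L} the maximized likelihood, n the number of simulations, and RSS the residual sum of squares.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L},
\qquad
\mathrm{AIC}_{\mathrm{LS}} = n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2k + \text{const}
```

Among candidate models fitted to the same data, the one with the lowest AIC is preferred; the 2k term penalizes adding parameters or interaction terms that do not reduce the residual enough.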

Relevance:

70.00%

Abstract:

Digest caches have been proposed as an effective method to speed up packet classification in network processors. In this paper, we show that the presence of a large number of small flows and a few large flows in the Internet has an adverse impact on the performance of these digest caches. In the Internet, a few large flows transfer a majority of the packets whereas the contribution of several small flows to the total number of packets transferred is small. In such a scenario, the LRU cache replacement policy, which gives maximum priority to the most recently accessed digest, tends to evict digests belonging to the few large flows. We propose a new cache management algorithm called Saturating Priority (SP) which aims at improving the performance of digest caches in network processors by exploiting the disparity between the number of flows and the number of packets transferred. Our experimental results demonstrate that SP performs better than the widely used LRU cache replacement policy in size-constrained caches. Further, we characterize the misses experienced by flow identifiers in digest caches.
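The LRU behaviour described above can be reproduced with a small simulation. This is a minimal sketch under stated assumptions: flows are represented directly by integer digests, the traffic pattern is synthetic, and the helper name `lru_hits` is hypothetical. It does not implement Saturating Priority, whose details are not given in the abstract.

```python
from collections import OrderedDict


def lru_hits(digests, capacity):
    """Replay a stream of flow digests through an LRU cache; return hit counts per digest."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = {}
    for d in digests:
        if d in cache:
            cache.move_to_end(d)  # refresh recency on a hit
            hits[d] = hits.get(d, 0) + 1
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used digest
            cache[d] = True
    return hits


# Synthetic traffic: one large flow (digest 0) sends a packet after every
# five single-packet small flows (digests 1, 2, 3, ...).
stream = []
next_small = 1
for _ in range(20):
    stream.append(0)
    for _ in range(5):
        stream.append(next_small)
        next_small += 1

hits = lru_hits(stream, capacity=4)
print("hits for the large flow:", hits.get(0, 0))
```

With a four-entry cache, the five small flows between consecutive packets of the large flow always push its digest out, so the large flow never hits even though it carries the most packets of any single flow.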

Relevance:

70.00%

Abstract:

The paper presents an adaptive Fourier filtering technique and a relaying scheme based on a combination of a digital band-pass filter and a three-sample algorithm, for applications in high-speed numerical distance protection. To enhance the performance of these techniques, a high-speed fault detector has been used. MATLAB-based simulation studies show that the adaptive Fourier filtering technique provides fast tripping for near faults and security for farther faults. The digital relaying scheme based on a combination of a digital band-pass filter and a three-sample data window algorithm also provides accurate and high-speed detection of faults. The paper also proposes a hardware scheme based on a high-performance 16-bit fixed-point DSP (Texas Instruments TMS320LF2407A), suitable for implementing the above techniques. To evaluate the performance of the proposed relaying scheme under steady-state and transient conditions, PC-based, menu-driven relay test procedures are developed using National Instruments LabVIEW software. The test signals are generated in real time using LabVIEW-compatible analog output modules. The results obtained from the simulation studies as well as the hardware implementations are also presented.
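As an illustration of how three samples can characterize a sinusoid, the sketch below uses one common three-sample identity for a pure sinusoid of known angular frequency: x_mid² - x_prev·x_next = A²·sin²(ω·Δt). The abstract does not state which three-sample formulation the paper uses, so this choice, the function name `three_sample_amplitude`, and the 50 Hz / 1 kHz values are all assumptions.

```python
import math


def three_sample_amplitude(x_prev, x_mid, x_next, omega, dt):
    """Peak amplitude of a pure sinusoid from three consecutive, equally spaced samples.

    Uses the identity x_mid**2 - x_prev * x_next = A**2 * sin(omega * dt)**2.
    """
    s = math.sin(omega * dt)
    if s == 0.0:
        raise ValueError("omega * dt must not be a multiple of pi")
    # Clamp at zero: noise can make the difference slightly negative.
    return math.sqrt(max(x_mid**2 - x_prev * x_next, 0.0)) / abs(s)


omega = 2 * math.pi * 50.0  # 50 Hz system frequency (assumed)
dt = 1.0 / 1000.0           # 1 kHz sampling rate (assumed)
samples = [10.0 * math.sin(omega * k * dt + 0.3) for k in range(3)]
amplitude = three_sample_amplitude(*samples, omega, dt)
print(round(amplitude, 6))
```

Because the identity holds only for a single frequency component, a window this short is sensitive to harmonics and decaying DC offset, which is consistent with pairing it with a band-pass filter as the abstract describes.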

Relevance:

70.00%

Abstract:

Data prefetchers identify and make use of any regularity present in the history/training stream to predict future references and prefetch them into the cache. The training information used is typically the primary misses seen at a particular cache level, which is a filtered version of the accesses seen by the cache. In this work we demonstrate that extending the training information to include secondary misses and hits along with primary misses helps improve the performance of prefetchers. In addition to empirical evaluation, we use the information-theoretic metric of entropy to quantify the regularity present in extended histories. Entropy measurements indicate that extended histories are more regular than the default primary-miss-only training stream. Entropy measurements also help corroborate our empirical findings. With extended histories, further benefits can be achieved by also triggering prefetches during secondary misses. In this paper we explore the design space of extended prefetch histories and alternative prefetch trigger points for delta correlation prefetchers. We observe that different prefetch schemes benefit to different extents from extended histories and alternative trigger points. Also, the best-performing design point varies on a per-benchmark basis. To meet these requirements, we propose a simple adaptive scheme that identifies the best-performing design point for a benchmark-prefetcher combination at runtime. In SPEC2000 benchmarks, using all the L2 accesses as history for the prefetcher improves performance, in terms of both IPC and misses reduced, over techniques that use only primary misses as history. The adaptive scheme improves the performance of the CZone prefetcher over the baseline by 4.6% on average. These performance gains are accompanied by a moderate reduction in memory traffic requirements.
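One simple way to quantify the regularity of a training stream is the Shannon entropy of its address deltas: a lower value means the next delta is easier to predict. The sketch below is only an illustration. The abstract does not say how entropy is computed over histories, so the choice of first-order delta entropy, the function name `delta_entropy`, and the example address streams are assumptions.

```python
import math
from collections import Counter


def delta_entropy(addresses):
    """Shannon entropy, in bits, of the differences between consecutive addresses."""
    deltas = [b - a for a, b in zip(addresses, addresses[1:])]
    if not deltas:
        return 0.0
    n = len(deltas)
    counts = Counter(deltas)
    # Sum of p * log2(1 / p) over the observed delta values.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


regular = [0x1000 + 64 * i for i in range(9)]             # constant 64-byte stride
alternating = [0, 64, 192, 256, 384, 448, 576, 640, 768]  # deltas alternate 64, 128
print(delta_entropy(regular), delta_entropy(alternating))
```

The constant-stride stream has zero entropy, while the stream whose deltas alternate between two values carries one bit per delta. Comparing such figures for a primary-miss-only stream and an extended history would show which one exposes more regularity to the prefetcher.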

Relevance:

70.00%

Abstract:

Hafnium dioxide (HfO2) films, deposited using electron beam evaporation, are optimized for high-performance back-gated graphene transistors. Bilayer graphene is identified on an HfO2/Si substrate using an optical microscope and subsequently confirmed with Raman spectroscopy. A back-gated graphene transistor with a 32 nm thick HfO2 gate dielectric has been fabricated, with a very high transconductance of 60 µS. From the hysteresis of the current-voltage characteristics, we estimate the trap density in HfO2 to be in the mid-10¹¹ cm⁻² range, comparable to SiO2.

Relevance:

70.00%

Abstract:

The advent of a new class of high-mobility semiconducting polymers opens up a window to address fundamental issues in electrical transport mechanisms, such as transport between localized states versus extended-state conduction. Here, we investigate the origin of the ultralow degree of disorder (E_a ~ 16 meV) and the "bandlike" negative temperature (T) coefficient of the field-effect electron mobility, μ_e^FET(T), in a high-performance (μ_e^FET > 2.5 cm² V⁻¹ s⁻¹) diketopyrrolopyrrole-based semiconducting polymer. Models based on the framework of a mobility edge with an exponential density of states are invoked to explain the trends in transport. The temperature window over which the system demonstrates delocalized transport was tuned by a systematic introduction of disorder at the transport interface. Additionally, the Hall mobility (μ_e^Hall) extracted from Hall voltage measurements in these devices was found to be comparable to the field-effect mobility (μ_e^FET) in the high-T bandlike regime. Comprehensive studies with different combinations of dielectrics and semiconductors demonstrate the effectiveness of rational molecular design, which emphasizes a uniform energetic landscape and low reorganization energy.