105 results for scalable

at Indian Institute of Science - Bangalore - India


Relevance:

20.00%

Publisher:

Abstract:

High-end network security applications demand high-speed operation and large rule-set support. Packet classification is the core functionality that demands high throughput in such applications. This paper proposes a packet classification architecture to meet such high throughput. We have implemented a Firewall with this architecture in reconfigurable hardware. We propose an extension to the Distributed Crossproducting of Field Labels (DCFL) technique to achieve a scalable and high-performance architecture. The implemented Firewall takes advantage of the inherent structure and redundancy of the rule set by using our DCFL Extended (DCFLE) algorithm. The use of the DCFLE algorithm results in both speed and area improvements when it is implemented in hardware. Although we restrict ourselves to standard 5-tuple matching, the architecture supports additional fields. High-throughput classification invariably uses Ternary Content Addressable Memory (TCAM) for prefix matching, though TCAM fares poorly in terms of area and power efficiency. Use of TCAM for port range matching is expensive, as the range-to-prefix conversion results in a large number of prefixes, leading to storage inefficiency. Extended TCAM (ETCAM) is fast and the most storage-efficient solution for range matching. We present for the first time a reconfigurable hardware implementation of ETCAM. We have implemented our Firewall as an embedded system on a Virtex-II Pro FPGA based platform, running Linux with the packet classification in hardware. The Firewall was tested in real time with a 1 Gbps Ethernet link and 128 sample rules. The packet classification hardware uses a quarter of the logic resources and slightly over one third of the memory resources of the XC2VP30 FPGA. It achieves a maximum classification throughput of 50 million packets/s, corresponding to a 16 Gbps link rate for the worst-case packet size. A Firewall rule update involves only memory re-initialization in software, without any hardware change.
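As a side illustration of the range-expansion cost mentioned above (not taken from the paper), the sketch below implements the standard range-to-prefix conversion for a 16-bit port field: a worst-case range such as [1, 65534] expands into 30 prefixes, i.e. 30 separate TCAM entries for a single range field, which is the storage blow-up that ETCAM avoids.

```python
def range_to_prefixes(lo, hi, width=16):
    """Expand an inclusive port range [lo, hi] into ternary prefixes.

    Each prefix is (value, prefix_len); a TCAM entry matches any port whose
    top prefix_len bits equal those of value.
    """
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block aligned at lo that still fits in the range.
        size = lo & -lo if lo > 0 else 1 << width
        while size > hi - lo + 1:
            size >>= 1
        prefix_len = width - size.bit_length() + 1
        prefixes.append((lo, prefix_len))
        lo += size
    return prefixes

# Worst case for 16-bit ports: [1, 65534] needs 30 prefixes, i.e. 30 TCAM
# entries for one range field.
print(len(range_to_prefixes(1, 65534)))     # 30
print(len(range_to_prefixes(1024, 65535)))  # 6
```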


Relevance:

20.00%

Publisher:

Abstract:

Modern database systems incorporate a query optimizer to identify the most efficient "query execution plan" for executing the declarative SQL queries submitted by users. A dynamic-programming-based approach is used to exhaustively enumerate the combinatorially large search space of plan alternatives and, using a cost model, to identify the optimal choice. While dynamic programming (DP) works very well for moderately complex queries with up to around a dozen base relations, it usually fails to scale beyond this stage due to its inherent exponential space and time complexity. Therefore, DP becomes practically infeasible for complex queries with a large number of base relations, such as those found in current decision-support and enterprise management applications. To address the above problem, a variety of approaches have been proposed in the literature. Some completely jettison the DP approach and resort to alternative techniques such as randomized algorithms, whereas others have retained DP by using heuristics to prune the search space to computationally manageable levels. In the latter class, a well-known strategy is "iterative dynamic programming" (IDP) wherein DP is employed bottom-up until it hits its feasibility limit, and then iteratively restarted with a significantly reduced subset of the execution plans currently under consideration. The experimental evaluation of IDP indicated that by appropriate choice of algorithmic parameters, it was possible to almost always obtain "good" (within a factor of two of the optimal) plans, and in the few remaining cases, mostly "acceptable" (within an order of magnitude of the optimal) plans, and rarely, a "bad" plan. While IDP is certainly an innovative and powerful approach, we have found that there are a variety of common query frameworks wherein it can fail to consistently produce good plans, let alone the optimal choice. This is especially so when star or clique components are present, increasing the complexity of the join graphs. Worse, this shortcoming is exacerbated when the number of relations participating in the query is scaled upwards.
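For readers unfamiliar with the baseline, here is a minimal sketch of the bottom-up dynamic-programming enumeration that IDP truncates, restricted to left-deep plans and using a toy cost model (output cardinality); the relations, cardinalities and selectivities are made up. The table it builds has one entry per non-empty subset of relations, which is the exponential memory cost discussed above.

```python
from itertools import combinations

def dp_join_order(relations, card, sel):
    """Bottom-up DP over left-deep join orders (System-R-style sketch).

    relations : list of relation names
    card      : dict name -> cardinality
    sel       : dict frozenset({a, b}) -> join selectivity (1.0 if absent)

    best[S] = (cost, plan, cardinality) for every non-empty subset S, i.e.
    2^n - 1 entries in total.
    """
    best = {frozenset([r]): (0.0, r, float(card[r])) for r in relations}
    for size in range(2, len(relations) + 1):
        for subset in combinations(relations, size):
            S = frozenset(subset)
            for r in subset:                      # split S into (S - r) join r
                left = S - {r}
                lcost, lplan, lcard = best[left]
                s = 1.0
                for x in left:                    # combined selectivity towards r
                    s *= sel.get(frozenset({x, r}), 1.0)
                out = lcard * card[r] * s
                cost = lcost + out                # toy cost: output cardinality
                if S not in best or cost < best[S][0]:
                    best[S] = (cost, f"({lplan} JOIN {r})", out)
    return best[frozenset(relations)]

rels = {"A": 1000, "B": 100, "C": 10_000, "D": 50}
sels = {frozenset({"A", "B"}): 0.01, frozenset({"B", "C"}): 0.001,
        frozenset({"C", "D"}): 0.05}
print(dp_join_order(list(rels), rels, sels))
```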

Relevance:

20.00%

Publisher:

Abstract:

RECONNECT is a Network-on-Chip using a honeycomb topology. In this paper we focus on properties of general rules, applicable to a variety of routing algorithms for the NoC, which take into account the missing links of the honeycomb topology when compared to a mesh. We also extend the original proposal [5] and show a method to insert and extract data to and from the network. Access Routers at the boundary of the execution fabric establish connections to multiple periphery modules and create a torus to decrease the node distances. Our approach is scalable and ensures homogeneity among the compute elements in the NoC. We synthesized and evaluated the proposed enhancement in terms of power dissipation and area. Our results indicate that the impact of the necessary alterations to the fabric is negligible and affects the data transfer between the fabric and the periphery only marginally.
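The following sketch (not from the paper) only illustrates the torus effect on node distance: it compares Manhattan hop distances on a plain n x n grid with and without wrap-around links, ignoring the honeycomb-specific missing links that the routing rules above must handle.

```python
def hop_distance(a, b, n, torus=False):
    """Manhattan hop distance between nodes a=(x1, y1) and b=(x2, y2) on an
    n x n grid. With torus=True, wrap-around links at the boundary (here
    standing in for the Access Router connections) let each dimension take
    the shorter way around."""
    dist = 0
    for d1, d2 in zip(a, b):
        delta = abs(d1 - d2)
        dist += min(delta, n - delta) if torus else delta
    return dist

n = 8
nodes = [(x, y) for x in range(n) for y in range(n)]
for torus in (False, True):
    dists = [hop_distance(a, b, n, torus) for a in nodes for b in nodes if a != b]
    print(f"torus={torus}: max hops={max(dists)}, "
          f"avg hops={sum(dists) / len(dists):.2f}")
```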

Relevance:

20.00%

Publisher:

Abstract:

In this letter, we propose the design and report a simulation study of a novel transistor, called HFinFET, which is a hybrid of a HEMT and a FinFET, to obtain excellent performance and good OFF-state control. Following the description of the design, 3-D device simulations have been performed to predict the characteristics of the device. The device has been benchmarked against published state-of-the-art HEMT data, as well as planar and nonplanar Si n-MOSFET data of comparable gate length, using standard benchmarking techniques.

Relevance:

20.00%

Publisher:

Abstract:

In this paper, we describe an efficient coordinated checkpointing and recovery algorithm which works even when the channels are non-FIFO and messages may be lost. Nodes are assumed to be autonomous, and they do not block while taking checkpoints. Based on local conditions, any process can request the previous coordinator for 'permission' to initiate a new checkpoint. Allowing multiple initiators of checkpoints avoids the bottleneck associated with a single initiator, but the algorithm permits only a single instance of the checkpointing process at any given time, thus reducing much of the overhead associated with multiple initiators of distributed algorithms.
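A minimal sketch of the 'single active instance' rule described above, with a shared-memory stand-in for the coordinator; the message-passing details (non-FIFO, lossy channels) and the actual checkpointing protocol are omitted, and the class and method names are illustrative only.

```python
import threading

class CheckpointCoordinator:
    """Only one checkpointing instance may be active at a time: any process may
    ask the previous coordinator for permission, but a request is granted only
    if no other checkpoint is in progress."""
    def __init__(self):
        self._lock = threading.Lock()
        self._active = None          # id of the process currently checkpointing

    def request_permission(self, pid):
        with self._lock:
            if self._active is None:
                self._active = pid   # pid becomes the new initiator
                return True
            return False             # another checkpoint instance is running

    def checkpoint_done(self, pid):
        with self._lock:
            if self._active == pid:
                self._active = None

coord = CheckpointCoordinator()
print(coord.request_permission("P1"))  # True  -> P1 initiates a checkpoint
print(coord.request_permission("P2"))  # False -> must retry later
coord.checkpoint_done("P1")
print(coord.request_permission("P2"))  # True
```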

Relevance:

20.00%

Publisher:

Abstract:

This paper discusses a method for scaling SVMs with the Gaussian kernel function to handle large data sets by using a selective sampling strategy for the training set. It employs a scalable hierarchical clustering algorithm to construct cluster indexing structures of the training data in the kernel-induced feature space. These are then used for selective sampling of the training data for the SVM, imparting scalability to the training process. Empirical studies on real-world data sets show that the proposed strategy performs well on large data sets.
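A rough sketch of the selective-sampling idea, with two simplifications that are mine rather than the paper's: plain k-means in input space stands in for the scalable hierarchical clustering in the kernel-induced feature space, and a fixed number of points per cluster stands in for its sampling strategy.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Toy data standing in for a large training set.
X, y = make_classification(n_samples=20000, n_features=20, random_state=0)

# Cluster each class separately and keep a few representatives per cluster.
keep = []
for label in np.unique(y):
    idx = np.where(y == label)[0]
    km = MiniBatchKMeans(n_clusters=50, n_init=3, random_state=0).fit(X[idx])
    for c in range(50):
        members = idx[km.labels_ == c]
        keep.extend(members[:10])            # small sample from each cluster
keep = np.array(keep)

# Train the Gaussian-kernel SVM on the selected subset only.
svm = SVC(kernel="rbf", gamma="scale").fit(X[keep], y[keep])
print("trained on", len(keep), "of", len(X), "points;",
      "accuracy on the full set:", round(svm.score(X, y), 3))
```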

Relevance:

20.00%

Publisher:

Abstract:

In this paper we develop a multithreaded VLSI processor linear-array architecture to render complex environments based on the radiosity approach. The processing elements are identical and multithreaded, and they work in Single Program Multiple Data (SPMD) mode. A new algorithm for the radiosity computations, based on the progressive refinement approach [2], is proposed. Simulation results indicate that the architecture is latency tolerant and scalable. It is shown that a linear array of 128 uni-threaded processing elements sustains a throughput close to 0.4 million patches/sec.
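A minimal sketch of the progressive-refinement (shooting) radiosity iteration referenced above, assuming equal patch areas and given form factors; the mapping onto the linear array of multithreaded processing elements is not modeled here.

```python
import numpy as np

def progressive_radiosity(F, rho, E, iters=100):
    """Progressive-refinement radiosity: repeatedly shoot the patch with the
    most unshot radiosity to all other patches.

    F[i, j] : form factor from patch i to patch j (equal patch areas assumed)
    rho     : per-patch reflectivity, E : per-patch emission
    """
    B = E.copy()                          # current radiosity estimate
    dB = E.copy()                         # unshot radiosity
    for _ in range(iters):
        i = int(np.argmax(dB))           # brightest unshot patch
        if dB[i] <= 1e-9:
            break
        delta = rho * F[i] * dB[i]       # what every patch receives from i
        B += delta
        dB += delta
        dB[i] = 0.0                      # patch i has shot all its energy
    return B

# Tiny 3-patch example with made-up reflectivities, emissions and form factors.
F = np.array([[0.0, 0.3, 0.2],
              [0.3, 0.0, 0.4],
              [0.2, 0.4, 0.0]])
rho = np.array([0.7, 0.5, 0.8])
E = np.array([1.0, 0.0, 0.0])
print(progressive_radiosity(F, rho, E))
```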

Relevance:

20.00%

Publisher:

Abstract:

The study reports the first indication of a lyotropic liquid crystalline phase of an aqueous solution of the polysaccharide xanthan gum as a physical-parameter-dependent, scalable, and reversible weak alignment medium for the enantiodiscrimination of water-soluble chiral molecules.

Relevance:

20.00%

Publisher:

Abstract:

With the extensive use of dynamic voltage scaling (DVS) there is an increasing need for voltage-scalable models. Similarly, leakage being very sensitive to temperature motivates the need for a temperature-scalable model as well. We characterize standard cell libraries for statistical leakage analysis based on models for transistor stacks. Modeling stacks has the advantage of using a single model across many gates, thereby reducing the number of models that need to be characterized. Our experiments on 15 different gates show that we needed only 23 models to predict the leakage across 126 input vector combinations. We investigate the use of neural networks for the combined PVT model for the stacks, which can capture the effect of inter-die and intra-gate variations, supply voltage (0.6-1.2 V) and temperature (0-100 °C) on leakage. Results show that neural-network-based stack models can predict the PDF of leakage current across supply voltage and temperature accurately, with the average error in the mean being less than 2% and that in the standard deviation being less than 5% across the range of voltage and temperature.
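To make the statistical-leakage setting concrete, the sketch below runs a Monte Carlo over inter-die and intra-gate threshold-voltage shifts through an illustrative closed-form subthreshold-leakage model for one stack; the paper replaces such a closed form with a per-stack neural-network model, and all constants here are made-up placeholders.

```python
import numpy as np

def stack_leakage(vdd, temp_c, dvth, i0=1e-7, vth0=0.35, dibl=0.1, n=1.5):
    """Illustrative subthreshold-leakage model for one transistor stack (A),
    with exponential Vth/temperature dependence and a crude supply term."""
    vt = 8.617e-5 * (temp_c + 273.15)          # thermal voltage kT/q in volts
    vth = vth0 + dvth - dibl * (vdd - 1.0)     # DIBL-like supply dependence
    return i0 * np.exp(-vth / (n * vt))

rng = np.random.default_rng(0)
samples = 100_000
d_inter = rng.normal(0.0, 0.03, samples)       # inter-die Vth shift (V)
d_intra = rng.normal(0.0, 0.01, samples)       # intra-gate / local Vth shift (V)

# Mean and standard deviation of leakage at a few (V, T) corners.
for vdd, temp_c in [(0.6, 25), (1.2, 25), (1.2, 100)]:
    leak = stack_leakage(vdd, temp_c, d_inter + d_intra)
    print(f"V={vdd} V, T={temp_c} C: mean={leak.mean():.3e} A, "
          f"std={leak.std():.3e} A")
```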

Relevance:

20.00%

Publisher:

Abstract:

We investigate the feasibility of developing comprehensive gate delay and slew models which incorporate output load, input edge slew, supply voltage, temperature, global process variations and local process variations all in the same model. We find that standard polynomial models cannot handle such a large heterogeneous set of input variables. We instead use neural networks, which are well known for their ability to approximate any arbitrary continuous function. Our initial experiments with a small subset of standard cell gates of an industrial 65 nm library show promising results, with error in the mean less than 1%, error in the standard deviation less than 3% and maximum error less than 11% compared to SPICE, for models covering 0.9-1.1 V of supply, -40 °C to 125 °C of temperature, load, slew, and global and local process parameters. Enhancing conventional libraries to be voltage and temperature scalable with similar accuracy requires, on average, 4x more SPICE characterization runs.
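A small sketch of the kind of neural-network fit described above, using synthetic data in place of SPICE characterization: a made-up delay function of load, slew, supply, temperature and a process parameter is fitted with an MLP, and the same error-in-mean, error-in-standard-deviation and maximum-error metrics are reported.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 5000

# Synthetic "characterization" points: load (fF), input slew (ps), Vdd (V),
# temperature (C), normalized process shift. The paper uses SPICE runs instead.
load = rng.uniform(1, 50, n)
slew = rng.uniform(5, 200, n)
vdd = rng.uniform(0.9, 1.1, n)
temp = rng.uniform(-40, 125, n)
proc = rng.normal(0.0, 1.0, n)

# Made-up ground-truth delay (ps), nonlinear in all five inputs.
delay = (20 + 0.8 * load + 0.15 * slew) / (vdd - 0.55) \
        * (1 + 0.0008 * (temp + 40)) * (1 + 0.05 * proc)

X = np.column_stack([load, slew, vdd, temp, proc])
X_tr, X_te, y_tr, y_te = train_test_split(X, delay, random_state=0)

model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(32, 32),
                                   max_iter=3000, random_state=0))
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print("error in mean (%):", abs(pred.mean() - y_te.mean()) / y_te.mean() * 100)
print("error in std  (%):", abs(pred.std() - y_te.std()) / y_te.std() * 100)
print("max error     (%):", (abs(pred - y_te) / y_te).max() * 100)
```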


Relevance:

20.00%

Publisher:

Abstract:

Context-sensitive points-to analysis is critical for several program optimizations. However, as the number of contexts grows exponentially, storage requirements for the analysis increase tremendously for large programs, making the analysis non-scalable. We propose a scalable flow-insensitive context-sensitive inclusion-based points-to analysis that uses a specially designed multi-dimensional Bloom filter to store the points-to information. Two key observations motivate our proposal: (i) points-to information (between pointer and object and between pointer and pointer) is sparse, and (ii) moving from an exact to an approximate representation of points-to information only leads to reduced precision without affecting the correctness of the (may-points-to) analysis. By using an approximate representation, a multi-dimensional Bloom filter can significantly reduce the memory requirements with a probabilistic bound on the loss in precision. Experimental evaluation on SPEC 2000 benchmarks and two large open source programs reveals that with an average storage requirement of 4 MB, our approach achieves almost the same precision (98.6%) as the exact implementation. By increasing the average memory to 27 MB, it achieves precision of up to 99.7% for these benchmarks. Using Mod/Ref analysis as the client, we find that the client analysis is not affected that often, even when there is some loss of precision in the points-to representation. We find that the NoModRef percentage is within 2% of the exact analysis, while requiring 4 MB (maximum 15 MB) memory and less than 4 minutes on average for the points-to analysis. Another major advantage of our technique is that it allows trading off precision for the memory usage of the analysis.
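To illustrate the approximate representation, here is a plain single-array Bloom filter storing (context, pointer, object) facts; the paper's structure is a specially designed multi-dimensional filter, so this is only a sketch of the false-positives-but-no-false-negatives property that keeps the may-points-to analysis sound.

```python
import hashlib

class PointsToBloom:
    """Store may-points-to facts as (context, pointer, object) triples in a
    Bloom filter. Queries may return false positives (lost precision) but
    never false negatives, so a may-points-to client remains correct."""
    def __init__(self, bits=8 * 1024 * 1024, hashes=4):
        self.bits, self.hashes = bits, hashes
        self.bitset = bytearray(bits // 8)

    def _positions(self, fact):
        data = "|".join(map(str, fact)).encode()
        for k in range(self.hashes):
            h = hashlib.sha256(data + bytes([k])).digest()
            yield int.from_bytes(h[:8], "big") % self.bits

    def add(self, context, pointer, obj):
        for p in self._positions((context, pointer, obj)):
            self.bitset[p // 8] |= 1 << (p % 8)

    def may_point_to(self, context, pointer, obj):
        return all(self.bitset[p // 8] & (1 << (p % 8))
                   for p in self._positions((context, pointer, obj)))

# Hypothetical facts, purely for illustration.
pt = PointsToBloom()
pt.add("main>foo", "p", "heap@12")
print(pt.may_point_to("main>foo", "p", "heap@12"))   # True
print(pt.may_point_to("main>bar", "p", "heap@12"))   # False, barring a collision
```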