443 results for Supercomputer


Relevance: 10.00%

Abstract:

The effectiveness of the last-level shared cache is crucial to the performance of a multi-core system. In this paper, we observe and make use of the DelinquentPC - Next-Use characteristic to improve shared cache performance. We propose a new PC-centric cache organization, NUcache, for the shared last level cache of multi-cores. NUcache logically partitions the associative ways of a cache set into MainWays and DeliWays. While all lines have access to the MainWays, only lines brought in by a subset of delinquent PCs, selected by a PC selection mechanism, are allowed to enter the DeliWays. The PC selection mechanism is an intelligent cost-benefit analysis based algorithm that utilizes Next-Use information to select the set of PCs that can maximize the hits experienced in DeliWays. Performance evaluation reveals that NUcache improves the performance over a baseline design by 9.6%, 30% and 33% respectively for dual, quad and eight core workloads comprised of SPEC benchmarks. We also show that NUcache is more effective than other well-known cache-partitioning algorithms.
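
The MainWays/DeliWays split described above can be illustrated with a toy model of a single cache set. This is only a sketch under stated assumptions, not the authors' design: the class name `NUCacheSet`, LRU replacement in the MainWays, FIFO replacement in the DeliWays, and the rule that a line enters the DeliWays when it is evicted from the MainWays are all assumptions made for illustration. The set of selected PCs is passed in directly, so the cost-benefit PC selection mechanism is not modelled.

```python
from collections import OrderedDict, deque


class NUCacheSet:
    """Toy model of one cache set split into MainWays and DeliWays.

    Assumptions (not taken from the abstract): MainWays use LRU, DeliWays
    use FIFO, and a line moves into the DeliWays when it is evicted from
    the MainWays and was brought in by a selected (delinquent) PC.
    """

    def __init__(self, main_ways, deli_ways, selected_pcs):
        self.main_ways = main_ways
        self.deli_ways = deli_ways
        self.selected_pcs = set(selected_pcs)
        self.main = OrderedDict()  # tag -> PC that brought the line in (LRU order)
        self.deli = deque()        # (tag, pc) pairs in FIFO order

    def access(self, tag, pc):
        """Return True on a hit and False on a miss."""
        if tag in self.main:
            self.main.move_to_end(tag)  # refresh the LRU position
            return True
        if any(entry[0] == tag for entry in self.deli):
            return True                 # hit in the DeliWays; line stays put
        # Miss: make room in the MainWays if the set is full.
        if len(self.main) >= self.main_ways:
            victim_tag, victim_pc = self.main.popitem(last=False)
            if victim_pc in self.selected_pcs and self.deli_ways > 0:
                if len(self.deli) >= self.deli_ways:
                    self.deli.popleft()  # FIFO eviction from the DeliWays
                self.deli.append((victim_tag, victim_pc))
        self.main[tag] = pc
        return False
```

With two MainWays and one DeliWay, a line brought in by a selected PC survives one extra eviction and can still hit, whereas the same access pattern misses when no PC is selected.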

Relevance: 10.00%

Abstract:

A palindrome is a sequence of characters that reads the same forwards and backwards. Since the discovery of palindromic peptide sequences two decades ago, little effort has been made to understand their structural, functional and evolutionary significance. An algorithm has therefore been developed to identify all perfect palindromes (excluding the palindromic subset and tandem repeats) in a single protein sequence. The proposed algorithm does not impose any restriction on the number of residues in the input sequence. It will aid in the identification of palindromic peptide sequences of varying lengths in a single protein sequence.
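
As an illustration of the kind of search described above, the sketch below reports the longest perfect palindrome around each centre of a protein sequence, which drops palindromes nested inside a longer one sharing the same centre. It is not the authors' algorithm: the function name `maximal_palindromes`, the `min_length` cutoff and the expand-around-centre strategy are assumptions, and no filtering of tandem repeats is attempted.

```python
def maximal_palindromes(sequence, min_length=3):
    """Return (start, substring) for the longest palindrome around each centre.

    Centres alternate between a single residue (odd-length palindromes)
    and the gap between two residues (even-length palindromes).
    """
    n = len(sequence)
    found = []
    for centre in range(2 * n - 1):
        left = centre // 2
        right = left + centre % 2
        # Grow outwards while the flanking residues match.
        while left >= 0 and right < n and sequence[left] == sequence[right]:
            left -= 1
            right += 1
        start, end = left + 1, right  # half-open interval [start, end)
        if end - start >= min_length:
            found.append((start, sequence[start:end]))
    return found
```

For the hypothetical input `"MKAYAKL"`, the only palindrome of at least three residues is `KAYAK`, starting at index 1. The loop is quadratic in the worst case but places no limit on the sequence length.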

Relevance: 10.00%

Abstract:

An operator-splitting finite element method for solving high-dimensional parabolic equations is presented. The stability and the error estimates are derived for the proposed numerical scheme. Furthermore, two variants of fully practical operator-splitting finite element algorithms, based on the quadrature points and the nodal points, respectively, are presented. Both the quadrature and the nodal point based operator-splitting algorithms are validated using a three-dimensional (3D) test problem. The numerical results obtained with the full 3D computations and the operator-split 2D + 1D computations are found to be in good agreement with the analytical solution. Further, the optimal order of convergence is obtained in both variants of the operator-splitting algorithms. (C) 2012 Elsevier Inc. All rights reserved.

Relevance: 10.00%

Abstract:

Network Intrusion Detection Systems (NIDS) intercept the traffic at an organization's network periphery to thwart intrusion attempts. A signature-based NIDS compares the intercepted packets against its database of known vulnerabilities and malware signatures to detect such cyber attacks. These signatures are represented using Regular Expressions (REs) and strings. Regular Expressions, because of their higher expressive power, are preferred over simple strings for writing these signatures. We present the Cascaded Automata Architecture, which performs memory-efficient Regular Expression pattern matching using existing string matching solutions. The proposed architecture performs two-stage Regular Expression pattern matching. We replace the substring and character class components of the Regular Expression with new symbols. We address the challenges involved in this approach. We augment the Word-based Automata, obtained from the rewritten Regular Expressions, with counter-based states and length-bound transitions to perform Regular Expression pattern matching. We evaluated our architecture on Regular Expressions taken from Snort rulesets. We were able to reduce the number of automata states by 50% to 85%. Additionally, we could reduce the number of transitions by a factor of 3, leading to a further reduction in the memory requirements.
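
The two-stage idea of replacing substring components with new symbols can be sketched as follows. This toy is only an illustration under strong assumptions: it uses Python's `re` module for both stages instead of a string matching engine and Word-based Automata, the helper name `build_two_stage_matcher` and the private-use symbols are hypothetical, and overlapping substrings, character classes, counter-based states and length-bound transitions are all ignored.

```python
import re


def build_two_stage_matcher(components, rewritten_pattern):
    """Build a matcher from a substring-to-symbol map and a rewritten RE.

    components: dict mapping each literal substring of the original RE to
        a single symbol that cannot occur in the input (assumption).
    rewritten_pattern: the original RE with those substrings replaced by
        their symbols.
    """
    # Stage 1: find the literal substrings, trying longer ones first.
    literals = sorted(components, key=len, reverse=True)
    stage1 = re.compile("|".join(re.escape(s) for s in literals))
    # Stage 2: match the rewritten RE over the symbol stream.
    stage2 = re.compile(rewritten_pattern)

    def matches(payload):
        symbol_stream = stage1.sub(lambda m: components[m.group(0)], payload)
        return stage2.search(symbol_stream) is not None

    return matches


# Hypothetical signature GET.*passwd, rewritten over two new symbols.
SYMBOLS = {"GET": "\ue000", "passwd": "\ue001"}
match_signature = build_two_stage_matcher(SYMBOLS, "\ue000.*\ue001")
```

Here `match_signature("GET /etc/passwd HTTP/1.1")` is true, while a payload containing the two substrings in the opposite order does not match, because the second stage only sees the symbols and the characters between them.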

Relevance: 10.00%

Abstract:

Acoustic modeling using mixtures of multivariate Gaussians is the prevalent approach for many speech processing problems. Computing likelihoods against a large set of Gaussians is required as a part of many speech processing systems, and it is the computationally dominant phase for LVCSR systems. We express the likelihood computation as a multiplication of matrices representing augmented feature vectors and Gaussian parameters. The computational gain of this approach over traditional methods comes from exploiting the structure of these matrices and from an efficient implementation of their multiplication. In particular, we explore direct low-rank approximation of the Gaussian parameter matrix and indirect derivation of low-rank factors of the Gaussian parameter matrix by optimum approximation of the likelihood matrix. We show that both methods lead to similar speedups, but the latter has a far smaller impact on the recognition accuracy. Experiments on a 1138-word vocabulary RM1 task using the Sphinx 3.7 system show that, for a typical case, the matrix multiplication approach leads to an overall speedup of 46%. Both low-rank approximation methods increase the speedup to around 60%, with the former method increasing the word error rate (WER) from 3.2% to 6.6%, while the latter increases the WER from 3.2% to 3.5%.
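
To make the matrix formulation concrete, the sketch below writes the log-likelihood of a feature vector under a Gaussian as a dot product between an augmented feature vector and a parameter row, so that scoring many frames against many Gaussians becomes one matrix product. It assumes diagonal covariances, which the abstract does not state, and the function names are hypothetical. The low-rank approximations are not shown.

```python
import math


def augment(x):
    """Map a feature vector x to [x_1^2, ..., x_D^2, x_1, ..., x_D, 1]."""
    return [v * v for v in x] + list(x) + [1.0]


def gaussian_row(mean, var):
    """Parameter row whose dot product with augment(x) equals
    log N(x; mean, diag(var)) for a diagonal-covariance Gaussian."""
    const = -0.5 * sum(m * m / s + math.log(2.0 * math.pi * s)
                       for m, s in zip(mean, var))
    quadratic = [-0.5 / s for s in var]
    linear = [m / s for m, s in zip(mean, var)]
    return quadratic + linear + [const]


def log_likelihoods(features, gaussians):
    """Return a frames-by-Gaussians matrix of log-likelihoods.

    features: list of feature vectors.
    gaussians: list of (mean, var) pairs with per-dimension variances.
    """
    a = [augment(x) for x in features]
    g = [gaussian_row(mean, var) for mean, var in gaussians]
    return [[sum(p * q for p, q in zip(row, col)) for col in g] for row in a]
```

The plain Python loops stand in for an optimized matrix multiplication routine; the point is only that all per-Gaussian constants are folded into the parameter matrix ahead of time.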

Relevance: 10.00%

Abstract:

Effects of dynamic contact angle models on the flow dynamics of an impinging droplet in sharp interface simulations are presented in this article. In the considered finite element scheme, the free surface is tracked using the arbitrary Lagrangian-Eulerian approach. The contact angle is incorporated into the model by replacing the curvature with the Laplace-Beltrami operator and integration by parts. Further, the Navier-slip with friction boundary condition is used to avoid stress singularities at the contact line. Our study demonstrates that the contact angle models have almost no influence on the flow dynamics of the non-wetting droplets. In computations of the wetting and partially wetting droplets, different contact angle models induce different flow dynamics, especially during recoiling. It is shown that a large value for the slip number has to be used in computations of the wetting and partially wetting droplets in order to reduce the effects of the contact angle models. Among all models, the equilibrium model is simple and easy to implement. Further, the equilibrium model also incorporates the contact angle hysteresis. Thus, the equilibrium contact angle model is preferred in sharp interface numerical schemes.

Relevance: 10.00%

Abstract:

Ferric uptake regulator (Fur) is a transcriptional regulator that controls the expression of genes involved in iron homeostasis and plays an important role in pathogenesis. Fur-regulated sRNAs/CDSs were found to have upstream Fur Binding Sites (FBS). We have constructed a Positional Weight Matrix from 100 known FBS (19 nt) and tracked the 'Orphan' FBSs. Possible Fur-regulated sRNAs and CDSs were identified by comparing their genomic locations with the 'Orphan' FBSs identified. Thirty-eight 'novel' and all known Fur-regulated sRNAs in nine proteobacteria were identified. In addition, we identified high-scoring FBSs in the promoter regions of 304 CDSs, 68 of which were involved in siderophore biosynthesis, iron transporters, two-component systems, starch/sugar metabolism, sulphur/methane metabolism, etc. The present study shows that the Fur regulator controls the expression of genes involved in diverse metabolic activities and is not limited to iron metabolism alone. (C) 2012 Elsevier B.V. All rights reserved.
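
A positional weight matrix scan of the kind mentioned above can be sketched in a few lines. The example is illustrative only: the function names, the pseudocount, the uniform background of 0.25 and the short toy sites are assumptions, whereas the study used 100 known FBS of 19 nt each.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length DNA sites."""
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in sequence."""
    width = len(pwm)
    return max(
        (sum(pwm[i][sequence[start + i]] for i in range(width)), start)
        for start in range(len(sequence) - width + 1)
    )
```

For the toy sites `["GATA", "GATA", "GATT"]`, scanning `"CCGATACC"` places the best window at index 2. In practice a score threshold would decide which windows count as candidate binding sites.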

Relevance: 10.00%

Abstract:

MATLAB is an array language that was initially popular for rapid prototyping but is now increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control-flow-dominated scalar regions that have an impact on the program's execution time. Today's computer systems have tremendous computing power in the form of traditional CPU cores and throughput-oriented accelerators such as graphics processing units (GPUs). Thus, an approach that maps the control-flow-dominated regions to the CPU and the data parallel regions to the GPU can significantly improve program performance. In this paper, we present the design and implementation of MEGHA, a compiler that automatically compiles MATLAB programs to enable synergistic execution on heterogeneous processors. Our solution is fully automated and does not require programmer input for identifying data parallel regions. We propose a set of compiler optimizations tailored for MATLAB. Our compiler identifies data parallel regions of the program and composes them into kernels. The problem of combining statements into kernels is formulated as a constrained graph clustering problem. Heuristics are presented to map identified kernels to either the CPU or the GPU so that kernel execution on the CPU and the GPU happens synergistically and the amount of data transfer needed is minimized. To ensure the required data movement for dependencies across basic blocks, we propose a data flow analysis and edge splitting strategy. Thus our compiler automatically handles composition of kernels, mapping of kernels to CPU and GPU, scheduling, and insertion of the required data transfers. The proposed compiler was implemented, and experimental evaluation using a set of MATLAB benchmarks shows that our approach achieves a geometric mean speedup of 19.8X for data parallel benchmarks over native execution of MATLAB.

Relevance: 10.00%

Abstract:

Video decoders used in emerging applications need to be flexible enough to handle a large variety of video formats and to deliver scalable performance across wide variations in workloads. In this paper, we propose a unified software and hardware architecture for video decoding that achieves scalable performance with flexibility. The lightweight processor tiles and the reconfigurable hardware tiles in our architecture enable software and hardware implementations to co-exist, while a programmable interconnect enables dynamic interconnection of the tiles. Our process-network-oriented compilation flow achieves realization-agnostic application partitioning and enables seamless migration across uniprocessor, multi-processor, semi-hardware and full-hardware implementations of a video decoder. An application quality-of-service-aware scheduler monitors and controls the operation of the entire system. We prove the concept through a prototype of the architecture on an off-the-shelf FPGA. The FPGA prototype shows a scaling in performance from QCIF to 1080p resolutions in four discrete steps. We also demonstrate that the reconfiguration time is short enough to allow migration from one configuration to another without any frame loss.

Relevance: 10.00%

Abstract:

We report a novel space cloth based on a resistor grid network for application in single- and multi-layer radar absorbers. The space cloth is analyzed, and relations are derived for the sheet resistance in terms of the resistor in the grid network. Design curves are drawn using MATLAB, and the space cloth is analyzed using HFSS™ software in a Salisbury screen for the S, C and X bands. Next, prediction and simulation results are reported for a three-layer Jaumann absorber using a square-grid resistor network with a Radar Cross Section Reduction (RCSR) of -15 dB over the C, X and Ku bands. The simulation results are encouraging and have led to the fabrication of a prototype broadband radar absorber; experimental work is in progress.
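
For orientation, the reflection from an idealized Salisbury screen can be estimated with the textbook transmission-line model: a resistive sheet in parallel with a short-circuited air line formed by the spacer and the ground plane. The sketch below is that generic model at normal incidence, not the authors' resistor-grid relations or their HFSS setup; the function name and the ideal, frequency-independent sheet resistance are assumptions.

```python
import math

Z0 = 376.73              # impedance of free space in ohms (approximate)
C_LIGHT = 299_792_458.0  # speed of light in m/s


def salisbury_reflection(sheet_resistance, spacing, frequency):
    """Complex reflection coefficient of an ideal Salisbury screen.

    sheet_resistance: sheet resistance in ohms per square.
    spacing: air gap between sheet and ground plane in metres (must be > 0).
    frequency: frequency in Hz, normal incidence assumed.
    """
    beta_d = 2.0 * math.pi * frequency * spacing / C_LIGHT
    # Admittance of the shorted line, 1 / (j * Z0 * tan(beta_d)), written
    # with cos/sin so that the quarter-wave point stays finite.
    y_line = math.cos(beta_d) / (1j * Z0 * math.sin(beta_d))
    y_in = 1.0 / sheet_resistance + y_line
    y0 = 1.0 / Z0
    return (y0 - y_in) / (y0 + y_in)
```

With the sheet resistance equal to `Z0` and the spacing set to a quarter wavelength at 10 GHz, the reflection coefficient is essentially zero at 10 GHz and rises to a magnitude of about 0.447 at 5 GHz, which is why multi-layer Jaumann designs are used to widen the bandwidth.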

Relevance: 10.00%

Abstract:

The Wheeler-Feynman (WF) absorber theory of radiation, though no longer of interest for explaining the self-interaction of an electron, can be very useful in today's research on small-scale optical systems. The significance of the WF absorber is its use of the time-symmetrical solution of Maxwell's equations, as opposed to only the retarded solution. The radiative coupling of emitters to nanowires in the near field, and the change in their lifetimes due to small-mode-volume enclosures, have previously been elucidated with the retarded solutions. These solutions have also been shown to agree with quantum electrodynamics, thus allowing for classical electromagnetic approaches to such problems. It is assumed here that the radiative coupling of the emitter with a body is in proportion to its contribution to the classical force of radiative reaction as derived in the WF absorber theory. Representing such nanostructures as a partial WF absorber acting on the emitter makes the computations considerably easier than conventional electromagnetic solutions for full boundary conditions.

Relevance: 10.00%

Abstract:

In this paper, we present a fast-learning neural network classifier for human action recognition. The proposed classifier is a fully complex-valued neural network with a single hidden layer. The neurons in the hidden layer employ the fully complex-valued hyperbolic secant as an activation function. The parameters of the hidden layer are chosen randomly, and the output weights are estimated analytically as a minimum-norm least-squares solution to a set of linear equations. The fast-learning fully complex-valued neural classifier is used for recognizing human actions accurately. Optical flow-based features extracted from the video sequences are utilized to recognize 10 different human actions. The feature vectors are computationally simple first-order statistics of the optical flow vectors, obtained from coarse-to-fine rectangular patches centered around the object. The results indicate the superior performance of the complex-valued neural classifier for action recognition. This superior performance stems from the fact that motion, by nature, consists of two components, one along each of the axes.

Relevance: 10.00%

Abstract:

Mobile WiMAX is a burgeoning network technology with diverse applications, one of which is VANETs. Performance metrics such as Mean Throughput and Packet Loss Ratio for the operation of VANETs adopting 802.16e are computed through simulation. Next, we evaluate the same metrics for VANETs employing 802.11p, also known as WAVE (Wireless Access in Vehicular Environment). The proposed simulation model is close to reality, as we generated mobility traces for both cases using a traffic simulator (SUMO) and fed them into a network simulator (NS2), based on their operation in a typical urban scenario for VANETs. Subsequently, a VANET application called 'Street Congestion Alert' is developed to assess the performance of these two technologies. For this application, TraCI is used to couple SUMO and NS2 in a feedback loop to set up a realistic simulation scenario. Our results show that Mobile WiMAX performs better than WAVE for larger network sizes.