105 results for BENCHMARK
Abstract:
In this work, we investigate the intrinsic limits of the subthreshold slope in a dual-gated bilayer graphene transistor using a coupled self-consistent Poisson-bandstructure solver. We benchmark the solver by matching its bias-dependent band-gap results against published experimental data. We show that the intrinsic bias dependence of the electronic structure and the self-consistent electrostatics limit the subthreshold slope obtainable in such a transistor to well above the Boltzmann limit of 60 mV/decade at room temperature, yet well below the values reported experimentally to date, indicating room for technological improvement of bilayer graphene transistors.
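For reference, the 60 mV/decade figure follows from the thermionic (Boltzmann) form of the subthreshold current, a standard result rather than anything specific to this paper:

```latex
SS = \left( \frac{\partial \log_{10} I_D}{\partial V_G} \right)^{-1}
   \geq \ln(10)\,\frac{k_B T}{q}
   \approx 59.5\ \text{mV/decade at } T = 300\ \text{K}.
```

In the paper's setting, the bias-dependent band gap and electrostatics degrade gate efficiency, which is what pushes the achievable slope above this floor.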
Abstract:
A new performance metric, the Peak-Error Ratio (PER), is presented to benchmark the performance of a class of neuron circuits that realize the neuron activation function (NAF) and its derivative (DNAF). Neuron circuits biased in the subthreshold region, based on the asymmetric cross-coupled differential-pair configuration and on the conventional configuration of applying a small external offset voltage at the input, are compared on the basis of PER. It is shown that the technique of using transistor asymmetry in a cross-coupled differential pair performs on par with that of applying an external offset voltage. The neuron circuits have been prototyped and experimentally characterized as a proof of concept in a 1.5 µm AMI technology.
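The abstract does not define PER, so the sketch below is only a loose illustration: it models the tanh-shaped transfer characteristic of a subthreshold differential pair (the standard reason such pairs realize a NAF) and computes a hypothetical peak-error ratio, taken here as the peak deviation from the ideal curve normalized by the ideal peak. The parameter values and the PER definition are assumptions, not the paper's.

```python
import numpy as np

# Subthreshold differential pair: I_out ~ I_b * tanh(dV / (2*n*Vt)),
# which is why such circuits naturally realize a tanh-shaped NAF.
n, Vt = 1.5, 0.02585                      # slope factor, thermal voltage (V)
dV = np.linspace(-0.2, 0.2, 401)          # differential input voltage (V)

naf_ideal = np.tanh(dV / (2 * n * Vt))        # target activation function
dnaf_ideal = 1.0 - naf_ideal**2               # its derivative (DNAF), up to scale
naf_circuit = np.tanh(dV / (2 * 1.65 * Vt))   # stand-in for a mismatched circuit

# Hypothetical peak-error ratio: peak deviation over the ideal curve's peak.
per = np.max(np.abs(naf_circuit - naf_ideal)) / np.max(np.abs(naf_ideal))
print(f"illustrative PER = {per:.3f}")
```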
Abstract:
Partition of unity methods, such as the extended finite element method, allow discontinuities to be simulated independently of the mesh (Int. J. Numer. Meth. Engng. 1999; 45:601-620). This eliminates the need for the mesh to be aligned with the discontinuity, or for cumbersome re-meshing as the discontinuity evolves. However, to compute the stiffness matrix of the elements intersected by the discontinuity, a subdivision of the elements into quadrature subcells aligned with the discontinuity is commonly adopted. In this paper, we use a simple integration technique, proposed for polygonal domains (Int. J. Numer. Meth. Engng 2009; 80(1):103-134. DOI: 10.1002/nme.2589), to suppress the need for element subdivision. Numerical results presented for a few benchmark problems in the context of linear elastic fracture mechanics and a multi-material problem show that the proposed method yields accurate results. Owing to its simplicity, the proposed integration technique can be easily incorporated into any existing code. Copyright (C) 2010 John Wiley & Sons, Ltd.
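To see why subcell alignment is the default, consider quadrature of an integrand with a jump inside one element: a single Gauss rule straddling the jump converges poorly, while splitting the integration domain at the discontinuity restores accuracy. A minimal 1D illustration of this motivating issue (not the polygonal-domain technique of the cited paper):

```python
import numpy as np

def f(x):
    """Integrand with a jump: H(x - 0.3) * x."""
    return np.where(x > 0.3, x, 0.0)

exact = (1.0 - 0.3**2) / 2.0                  # analytic value of the integral
xg, wg = np.polynomial.legendre.leggauss(4)   # 4-point Gauss rule on [-1, 1]

def gauss(a, b):
    """Gauss quadrature of f over [a, b]."""
    x = 0.5 * (b - a) * xg + 0.5 * (a + b)
    return 0.5 * (b - a) * np.dot(wg, f(x))

naive = gauss(0.0, 1.0)                       # one rule straddling the jump
split = gauss(0.0, 0.3) + gauss(0.3, 1.0)     # subcells aligned with the jump
print(f"naive error: {abs(naive - exact):.1e}, split error: {abs(split - exact):.1e}")
```

The cited technique aims to recover the accuracy of the split rule without physically subdividing the element.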
Abstract:
The Integrated Force Method (IFM) is a novel matrix formulation developed for analyzing civil, mechanical, and aerospace engineering structures. In this method, all independent/internal forces are treated as unknown variables, which are calculated by simultaneously imposing the equations of equilibrium and the compatibility conditions. This paper presents a new 12-node serendipity quadrilateral plate bending element, MQP12, for the analysis of thin and thick plate problems using IFM. The Mindlin-Reissner plate theory is employed in the formulation, which accounts for the effect of shear deformation. The performance of the new element with respect to accuracy and convergence is studied by analyzing many standard benchmark plate bending problems. The results of the new element MQP12 are compared with those of displacement-based 12-node plate bending elements available in the literature, as well as with exact solutions. The new element MQP12 is free from shear locking and performs excellently in both thin and moderately thick plate bending situations.
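For context, IFM couples equilibrium and compatibility into a single square system for the n independent force unknowns {F}; in its usual schematic form (a standard statement of the method, not specific to MQP12):

```latex
[S]\{F\} =
\begin{bmatrix} [B] \\ [C][G] \end{bmatrix} \{F\} =
\begin{Bmatrix} \{P\} \\ \{\delta R\} \end{Bmatrix},
```

where [B] contains the m equilibrium equations, [C][G] the r = n - m compatibility conditions ([G] being the flexibility matrix), {P} the applied loads, and {δR} the effective initial deformations; displacements are recovered afterwards from the computed forces.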
Abstract:
An efficient algorithm within the finite deformation framework is developed for the finite element implementation of a recently proposed isotropic, Mohr-Coulomb type material model, which captures the elastic-viscoplastic, pressure-sensitive and plastically dilatant response of bulk metallic glasses. The constitutive equations are first reformulated and implemented using an implicit numerical integration procedure based on the backward Euler method. The resulting system of nonlinear algebraic equations is solved by the Newton-Raphson procedure. This is achieved by developing the principal-space return mapping technique for the present model, which involves simultaneous shearing and dilatation on multiple potential slip systems. The complete stress update algorithm is presented and the expressions for the viscoplastic consistent tangent moduli are derived. The stress update scheme and the viscoplastic consistent tangent are implemented in the commercial finite element code ABAQUS/Standard. The accuracy and performance of the numerical implementation are verified by considering several benchmark examples, which include a simulation of multiple shear bands in a 3D prismatic bar under uniaxial compression.
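The backward-Euler return mapping in such implementations reduces, at each integration point, to a small Newton solve for the plastic increment. A generic scalar sketch for a rate-dependent (Perzyna-type) overstress model, not the paper's full Mohr-Coulomb formulation:

```python
import numpy as np

def return_map(sig_trial, sig_y, E, eta, dt, tol=1e-10, maxit=50):
    """Backward-Euler update of a 1D elastic-viscoplastic model.

    Residual: r(dg) = dg - (dt/eta) * <f(dg)>, with the yield function
    f = |sig_trial| - E*dg - sig_y evaluated at the end of the step.
    """
    dg = 0.0                                  # viscoplastic multiplier increment
    for _ in range(maxit):
        f = abs(sig_trial) - E * dg - sig_y   # overstress at end of step
        r = dg - (dt / eta) * max(f, 0.0)     # backward-Euler residual
        if abs(r) < tol:
            break
        drddg = (1.0 + (dt / eta) * E) if f > 0.0 else 1.0  # exact Jacobian
        dg -= r / drddg                       # Newton-Raphson correction
    sig = sig_trial - np.sign(sig_trial) * E * dg           # return-mapped stress
    return sig, dg

print(return_map(sig_trial=120.0, sig_y=100.0, E=200e3, eta=1e4, dt=1e-3))
```

Here `drddg` is the scalar analogue of the consistent tangent the paper derives; using it, rather than a continuum tangent, is what preserves quadratic convergence of the global Newton-Raphson iterations.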
Abstract:
In this work, we present a new monolithic strategy for solving fluid-structure interaction problems involving incompressible fluids, within the context of the finite element method. This strategy, like the underlying continuum dynamics, conserves certain properties, and thus provides a rational basis for the design of the time-stepping strategy; detailed proofs of the conservation of these properties are provided. The proposed algorithm works with displacement and velocity variables for the structure and fluid, respectively, and introduces no new variables to enforce velocity or traction continuity. Any existing structural dynamics algorithm can be used without change in the proposed method. Use of the exact tangent stiffness matrix ensures that the algorithm converges quadratically within each time step. An analytical solution is presented for one of the benchmark problems used in the literature, namely, the piston problem. A number of benchmark problems, including problems involving free surfaces such as sloshing and the breaking-dam problem, are used to demonstrate the good performance of the proposed method. Copyright (C) 2010 John Wiley & Sons, Ltd.
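The quadratic convergence claim is the hallmark of a monolithic scheme: the fluid and structure residuals are assembled into one system and Newton's method is applied with the exact coupled Jacobian, off-diagonal coupling blocks included. A toy two-field sketch of that structure (purely illustrative; the residuals below are made up and merely stand in for the discretized fluid and structure equations):

```python
import numpy as np

# Toy coupled system: fluid unknown v, structure unknown d.
# R1(v, d) = v + 0.3*d**2 - 1 = 0   (fluid-like residual)
# R2(v, d) = d - 0.4*sin(v)    = 0  (structure-like residual)
def residual(x):
    v, d = x
    return np.array([v + 0.3 * d**2 - 1.0, d - 0.4 * np.sin(v)])

def tangent(x):
    v, d = x
    # Exact Jacobian, including the off-diagonal coupling blocks.
    return np.array([[1.0, 0.6 * d], [-0.4 * np.cos(v), 1.0]])

x = np.zeros(2)
for k in range(8):                       # monolithic Newton loop
    r = residual(x)
    print(f"iter {k}: |R| = {np.linalg.norm(r):.2e}")
    if np.linalg.norm(r) < 1e-12:
        break
    x -= np.linalg.solve(tangent(x), r)  # quadratic convergence from exact tangent
```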
Abstract:
This paper addresses the problem of minimizing the number of columns with superdiagonal nonzeroes (viz., spiked columns) in a square, nonsingular linear system of equations which is to be solved by Gaussian elimination. The exact focus is on a class of min-spike heuristics in which the rows and columns of the coefficient matrix are first permuted to block lower-triangular form. Subsequently, the number of spiked columns in each irreducible block and their heights above the diagonal are minimized heuristically. We show that if every column in an irreducible block has exactly two nonzeroes, i.e., is a doubleton, then there is exactly one spiked column. Further, if there is at least one non-doubleton column, there is always an optimal permutation of rows and columns under which none of the doubleton columns are spiked. An analysis of a few benchmark linear programs suggests that singleton and doubleton columns can abound in practice. Hence, it appears that the results of this paper can be practically useful. In the rest of the paper, we develop a polynomial-time min-spike heuristic based on the above results and on a graph-theoretic interpretation of doubleton columns.
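Since singleton and doubleton columns drive the analysis, note that detecting them is a one-pass count over the sparsity pattern; a minimal sketch:

```python
import numpy as np

A = np.array([[1, 0, 2, 0],
              [3, 4, 0, 0],
              [0, 5, 6, 0],
              [0, 0, 0, 8]], dtype=float)

nnz_per_col = np.count_nonzero(A, axis=0)    # nonzero count per column
singletons = np.where(nnz_per_col == 1)[0]   # columns with exactly one nonzero
doubletons = np.where(nnz_per_col == 2)[0]   # columns with exactly two nonzeroes
print("singleton columns:", singletons, "doubleton columns:", doubletons)
```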
Abstract:
We propose a family of 3D versions of a smooth finite element method (Sunilkumar and Roy 2010), wherein the globally smooth shape functions are derivable through the condition of polynomial reproduction, with the tetrahedral B-splines (DMS-splines) or tensor-product forms of triangular B-splines and 1D NURBS bases acting as the kernel functions. While the domain decomposition is accomplished through tetrahedral or triangular prism elements, an additional requirement here is an appropriate generation of knotclouds around the element vertices or corners. The possibility of sensitive dependence of numerical solutions on the placements of knotclouds is largely arrested by enforcing the condition of polynomial reproduction whilst deriving the shape functions. Nevertheless, given the higher complexity of forming the knotclouds for tetrahedral elements, especially when higher demand is placed on the order of continuity of the shape functions across inter-element boundaries, we presently emphasize an exploration of the triangular prism based formulation in the context of several benchmark problems of interest in linear solid mechanics. In the absence of a more rigorous convergence analysis, the numerical exercise reported herein helps establish the method as one of remarkable accuracy and robust performance against numerical ill-conditioning (such as locking of different kinds) vis-a-vis the conventional FEM.
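The polynomial reproduction condition invoked above requires the derived shape functions N_i to reproduce all monomials up to the chosen order p exactly:

```latex
\sum_i N_i(\mathbf{x})\,\mathbf{x}_i^{\boldsymbol{\alpha}} = \mathbf{x}^{\boldsymbol{\alpha}},
\qquad |\boldsymbol{\alpha}| \le p,
```

so that, in particular, the shape functions form a partition of unity (α = 0) and are linearly complete (|α| = 1) regardless of where the knotclouds are placed, which is what damps the sensitivity to knotcloud placement.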
Abstract:
The performance of a program will ultimately be limited by its serial (scalar) portion, as pointed out by Amdahl's Law. Reported studies of instruction-level parallelism have thus far mixed data-parallel program portions with scalar program portions, often leading to contradictory and controversial results. We report an instruction-level behavioral characterization of scalar code containing minimal data-parallelism, extracted from highly vectorized programs of the PERFECT benchmark suite running on a Cray Y-MP system. We classify scalar basic blocks according to their instruction mix, characterize the data dependencies seen in each class, and, as a first step, measure the maximum intrablock instruction-level parallelism available. We observe skewed rather than balanced instruction distributions in scalar code and in individual basic block classes of scalar code; nonuniform distribution of parallelism across instruction classes; and, as expected, limited available intrablock parallelism. We identify frequently occurring data-dependence patterns and discuss new instructions to reduce latency. Toward effective scalar hardware, we study latency-pipelining trade-offs and restricted multiple-instruction-issue mechanisms.
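Amdahl's Law, invoked above, bounds the overall speedup when only a fraction f of execution time is parallelizable with factor s:

```latex
\text{Speedup}(f, s) = \frac{1}{(1 - f) + f/s}
\;\xrightarrow{\,s \to \infty\,}\; \frac{1}{1 - f}.
```

Even with f = 0.9, the speedup can never exceed 10; hence the paper's focus on characterizing and accelerating the scalar residue.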
Abstract:
We propose a novel formulation of points-to analysis as a system of linear equations. With this, the efficiency of points-to analysis can be significantly improved by leveraging advances in solution procedures for systems of linear equations. However, such a formulation is non-trivial and becomes challenging due to various factors, namely, multiple pointer indirections, address-of operators, and multiple assignments to the same variable. Further, the problem is exacerbated by the need to keep the transformed equations linear. Despite this, we successfully model all the pointer operations. We propose a novel inclusion-based context-sensitive points-to analysis algorithm based on prime factorization, which can model all the pointer operations. Experimental evaluation on SPEC 2000 benchmarks and two large open-source programs reveals that our approach is competitive with the state-of-the-art algorithms. With an average memory requirement of a mere 21 MB, our context-sensitive points-to analysis algorithm analyzes each benchmark in 55 seconds on average.
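The abstract does not spell out the encoding, but one natural reading of a prime-factorization scheme (stated here as our assumption, not the paper's definition) is to assign each abstract memory location a distinct prime and encode a points-to set as the product of its members' primes, so union becomes an lcm and membership a divisibility test:

```python
from math import gcd

# Hypothetical encoding: each abstract memory location gets a distinct prime.
code = {"a": 2, "b": 3, "c": 5, "d": 7}

def pts(*members):
    """Encode a points-to set as the product of its members' primes."""
    n = 1
    for m in members:
        n *= code[m]
    return n

def union(s, t):
    """Set union corresponds to the lcm of the two encodings."""
    return s * t // gcd(s, t)

def contains(s, loc):
    """Membership corresponds to divisibility by the location's prime."""
    return s % code[loc] == 0

p = pts("a", "b")                  # p may point to a or b
q = union(p, pts("b", "c"))        # q = {a, b} ∪ {b, c}
print(contains(q, "c"), contains(q, "d"))   # True False
```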
Abstract:
Random Access Scan, which addresses individual flip-flops in a design using a memory-array-like row and column decoder architecture, has recently attracted widespread attention due to its potential for lower test application time, test data volume, and test power dissipation compared to traditional Serial Scan. This is because typically only a very limited number of random "care" bits in a test response need be modified to create the next test vector. Unlike traditional scan, most flip-flops need not be updated. Test application efficiency can be further improved by organizing the access by word instead of by bit. In this paper we present a new decoder structure that takes advantage of basis vectors and linear algebra to further significantly optimize test application in RAS by performing the write operations on multiple bits consecutively. Simulations performed on benchmark circuits show an average speedup of 2-3 times in test write time compared to conventional RAS.
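One way to read the basis-vector idea (our illustration, not necessarily the paper's decoder): if the hardware can apply a fixed set of write patterns, then any required multi-bit update is an XOR combination of those patterns over GF(2), recoverable with the standard XOR-basis construction:

```python
def add_pattern(basis, v, idx):
    """Add pattern v (a bitmask) to a GF(2) basis, tracking which original
    patterns XOR together to form each stored pivot."""
    combo = {idx}
    while v:
        bit = v.bit_length() - 1              # leading bit of v
        if bit not in basis:
            basis[bit] = (v, combo)           # new pivot for this bit
            return
        pv, pc = basis[bit]
        v, combo = v ^ pv, combo ^ pc         # eliminate the leading bit

def decompose(basis, target):
    """Indices of patterns whose XOR equals target, or None if unreachable."""
    combo = set()
    while target:
        bit = target.bit_length() - 1
        if bit not in basis:
            return None                       # target outside the span
        pv, pc = basis[bit]
        target, combo = target ^ pv, combo ^ pc
    return combo

patterns = [0b1100, 0b0110, 0b0011]           # hypothetical decoder patterns
basis = {}
for i, p in enumerate(patterns):
    add_pattern(basis, p, i)
print(decompose(basis, 0b1010))               # {0, 1}: 1100 ^ 0110 == 1010
```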
Abstract:
We describe a compiler for the Flat Concurrent Prolog language on a message-passing multiprocessor architecture. This compiler permits symbolic and declarative programming in the syntax of Guarded Horn Rules. The implementation has been verified and tested on the 64-node PARAM parallel computer developed by C-DAC (Centre for Development of Advanced Computing, India). Flat Concurrent Prolog (FCP) is a logic programming language designed for concurrent programming and parallel execution. It is a process-oriented language, which embodies dataflow synchronization and guarded commands as its basic control mechanisms. An identical algorithm is executed on every processor in the network. We assume regular network topologies such as mesh, ring, etc. Each node has a local memory. The algorithm comprises two important parts: reduction and communication. The most difficult task is to integrate the solutions of the problems that arise in the implementation in a coherent and efficient manner. We have tested the efficacy of the compiler on various benchmark problems of the ICOT project that have been reported in the recent book by Evan Tick. These problems include Quicksort, 8-queens, and prime number generation. The results of the preliminary tests are favourable. We are currently examining issues such as indexing and load balancing to further optimize our compiler.
Abstract:
Genetic Algorithms are robust search and optimization techniques. A Genetic Algorithm based approach for determining the optimal input distributions for generating random test vectors is proposed in this paper. A cost function based on the COP testability measure for determining the efficacy of the input distributions is discussed. A brief overview of Genetic Algorithms (GAs) and the specific details of our implementation are described. Experimental results based on the ISCAS-85 benchmark circuits are presented, and the performance of our GA-based approach is compared with previous results. While the GA generates more efficient input distributions than the previous methods, which are based on gradient-descent search, the overheads of the GA in computing the input distributions are larger. To account for the relatively quick convergence of the gradient-descent methods, we analyze the landscape of the COP-based cost function. We prove that the cost function is unimodal in the search space. This feature makes the cost function amenable to optimization by gradient-descent techniques as compared to random-search methods such as Genetic Algorithms.
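For context, COP estimates each line's probability of being 1 under a given input distribution with a single forward pass through the netlist, assuming signal independence, which is what makes a COP-based cost function cheap enough to drive either a GA or gradient descent. A minimal sketch of the propagation rules (our simplification):

```python
def cop_and(p_inputs):
    """P(output = 1) of an AND gate under the COP independence assumption."""
    p = 1.0
    for pi in p_inputs:
        p *= pi
    return p

def cop_or(p_inputs):
    """P(output = 1) of an OR gate: 1 - P(all inputs are 0)."""
    q = 1.0
    for pi in p_inputs:
        q *= (1.0 - pi)
    return 1.0 - q

def cop_not(p_in):
    return 1.0 - p_in

# Example: c = AND(a, b), d = OR(c, NOT(a)), with biased input distributions.
p_a, p_b = 0.8, 0.5          # non-uniform input 1-probabilities being optimized
p_c = cop_and([p_a, p_b])    # 0.4
p_d = cop_or([p_c, cop_not(p_a)])
print(p_c, p_d)              # 0.4, 1 - 0.6*0.8 = 0.52
```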
Abstract:
In this paper we develop an analytical heat transfer model capable of analyzing the cyclic melting and solidification processes of a phase change material used in electronics cooling systems. The model is essentially based on conduction heat transfer, with treatments for convection and radiation embedded inside. The whole solution domain is first divided into two main sub-domains, namely, the melting sub-domain and the solidification sub-domain. Each sub-domain is then analyzed for a number of temporal regimes. Accordingly, analytical solutions for the temperature distribution within each sub-domain are formulated, either using a semi-infinite domain assumption or employing a quasi-steady-state method, depending on applicability. The solution modules are subsequently united, leading to a closed-form solution for the entire problem. The analytical solutions are then compared with experimental and numerical solutions for a benchmark problem quoted in the literature, and excellent agreement is observed.
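The quasi-steady-state treatment mentioned above is the classical Stefan-problem approximation: conduction through the already-melted layer is taken as steady at each instant, so the energy balance at a melt front of depth s(t) gives (a textbook one-phase result, not the paper's full multi-regime solution):

```latex
\rho L \frac{ds}{dt} = \frac{k\,\Delta T}{s(t)}
\quad\Longrightarrow\quad
s(t) = \sqrt{\frac{2k\,\Delta T\,t}{\rho L}},
```

where k is the thermal conductivity, ΔT the wall superheat above the melting point, ρ the density, and L the latent heat of fusion.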