350 resultados para Linear Algebra

em Indian Institute of Science - Bangalore - Índia


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Numerical Linear Algebra (NLA) kernels are at the heart of all computational problems. These kernels require hardware acceleration for increased throughput. NLA Solvers for dense and sparse matrices differ in the way the matrices are stored and operated upon although they exhibit similar computational properties. While ASIC solutions for NLA Solvers can deliver high performance, they are not scalable, and hence are not commercially viable. In this paper, we show how NLA kernels can be accelerated on REDEFINE, a scalable runtime reconfigurable hardware platform. Compared to a software implementation, Direct Solver (Modified Faddeev's algorithm) on REDEFINE shows a 29X improvement on an average and Iterative Solver (Conjugate Gradient algorithm) shows a 15-20% improvement. We further show that solution on REDEFINE is scalable over larger problem sizes without any notable degradation in performance.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Gauss and Fourier have together provided us with the essential techniques for symbolic computation with linear arithmetic constraints over the reals and the rationals. These variable elimination techniques for linear constraints have particular significance in the context of constraint logic programming languages that have been developed in recent years. Variable elimination in linear equations (Guassian Elimination) is a fundamental technique in computational linear algebra and is therefore quite familiar to most of us. Elimination in linear inequalities (Fourier Elimination), on the other hand, is intimately related to polyhedral theory and aspects of linear programming that are not quite as familiar. In addition, the high complexity of elimination in inequalities has forces the consideration of intricate specializations of Fourier's original method. The intent of this survey article is to acquaint the reader with these connections and developments. The latter part of the article dwells on the thesis that variable elimination in linear constraints over the reals extends quite naturally to constraints in certain discrete domains.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We examine the following question: Suppose R is a principal ideal domain, and that F is an n × m matrix with elements in R, with n>m. When does there exist an m × n matrix G such that GF = Im, and such that certain prescribed minors of G equal zero? We show that there is a simple necessary condition for the existence of such a G, but that this condition is not sufficient in general. However, if the set of minors of G that are required to be zero has a certain pattern, then the condition is necessary as well as sufficient. We then show that the pattern mentioned above arises naturally in connection with the question of the existence of decentralized stabilizing controllers for a given plant. Hence our result allows us to derive an extremely simple proof of the fact that a necessary and sufficient condition for the existence of decentralized stabilizing controllers is the absence of unstable decentralized fixed modes, as well as to derive a very clean expression for these fixed modes. In addition to the application to decentralized stabilization, we believe that the result is of independent interest.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

CTRU, a public key cryptosystem was proposed by Gaborit, Ohler and Sole. It is analogue of NTRU, the ring of integers replaced by the ring of polynomials $\mathbb{F}_2[T]$ . It attracted attention as the attacks based on either LLL algorithm or the Chinese Remainder Theorem are avoided on it, which is most common on NTRU. In this paper we presents a polynomial-time algorithm that breaks CTRU for all recommended parameter choices that were derived to make CTRU secure against popov normal form attack. The paper shows if we ascertain the constraints for perfect decryption then either plaintext or private key can be achieved by polynomial time linear algebra attack.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper presents an improved version of Dolezal's theorem, in the area of linear algebra with continuously parametrized elements. An extension of the theorem is also presented, and applications of these results to system theory are indicated.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Diffuse optical tomographic image reconstruction uses advanced numerical models that are computationally costly to be implemented in the real time. The graphics processing units (GPUs) offer desktop massive parallelization that can accelerate these computations. An open-source GPU-accelerated linear algebra library package is used to compute the most intensive matrix-matrix calculations and matrix decompositions that are used in solving the system of linear equations. These open-source functions were integrated into the existing frequency-domain diffuse optical image reconstruction algorithms to evaluate the acceleration capability of the GPUs (NVIDIA Tesla C 1060) with increasing reconstruction problem sizes. These studies indicate that single precision computations are sufficient for diffuse optical tomographic image reconstruction. The acceleration per iteration can be up to 40, using GPUs compared to traditional CPUs in case of three-dimensional reconstruction, where the reconstruction problem is more underdetermined, making the GPUs more attractive in the clinical settings. The current limitation of these GPUs in the available onboard memory (4 GB) that restricts the reconstruction of a large set of optical parameters, more than 13, 377. (C) 2010 Society of Photo-Optical Instrumentation Engineers. DOI: 10.1117/1.3506216]

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Random Access Scan, which addresses individual flip-flops in a design using a memory array like row and column decoder architecture, has recently attracted widespread attention, due to its potential for lower test application time, test data volume and test power dissipation when compared to traditional Serial Scan. This is because typically only a very limited number of random ``care'' bits in a test response need be modified to create the next test vector. Unlike traditional scan, most flip-flops need not be updated. Test application efficiency can be further improved by organizing the access by word instead of by bit. In this paper we present a new decoder structure that takes advantage of basis vectors and linear algebra to further significantly optimize test application in RAS by performing the write operations on multiple bits consecutively. Simulations performed on benchmark circuits show an average of 2-3 times speed up in test write time compared to conventional RAS.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

An analytical expression for the LL(T) decomposition for the Gaussian Toeplitz matrix with elements T(ij) = [1/(2-pi)1/2-sigma] exp[-(i - j)2/2-sigma-2] is derived. An exact expression for the determinant and bounds on the eigenvalues follows. An analytical expression for the inverse T-1 is also derived.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Analysis of precipitation reactions is extremely important in the technology of production of fine particles from the liquid phase. The control of composition and particle size in precipitation processes requires careful analysis of the several reactions that comprise the precipitation system. Since precipitation systems involve several, rapid ionic dissociation reactions among other slower ones, the faster reactions may be assumed to be nearly at equilibrium. However, the elimination of species, and the consequent reduction of the system of equations, is an aspect of analysis fraught with the possibility of subtle errors related to the violation of conservation principles. This paper shows how such errors may be avoided systematically by relying on the methods of linear algebra. Applications are demonstrated by analyzing the reactions leading to the precipitation of calcium carbonate in a stirred tank reactor as well as in a single emulsion drop. Sample calculations show that supersaturation dynamics can assume forms that can lead to subsequent dissolution of particles that have once been precipitated.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We describe a System-C based framework we are developing, to explore the impact of various architectural and microarchitectural level parameters of the on-chip interconnection network elements on its power and performance. The framework enables one to choose from a variety of architectural options like topology, routing policy, etc., as well as allows experimentation with various microarchitectural options for the individual links like length, wire width, pitch, pipelining, supply voltage and frequency. The framework also supports a flexible traffic generation and communication model. We provide preliminary results of using this framework to study the power, latency and throughput of a 4x4 multi-core processing array using mesh, torus and folded torus, for two different communication patterns of dense and sparse linear algebra. The traffic consists of both Request-Response messages (mimicing cache accesses)and One-Way messages. We find that the average latency can be reduced by increasing the pipeline depth, as it enables higher link frequencies. We also find that there exists an optimum degree of pipelining which minimizes energy-delay product.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In the world of high performance computing huge efforts have been put to accelerate Numerical Linear Algebra (NLA) kernels like QR Decomposition (QRD) with the added advantage of reconfigurability and scalability. While popular custom hardware solution in form of systolic arrays can deliver high performance, they are not scalable, and hence not commercially viable. In this paper, we show how systolic solutions of QRD can be realized efficiently on REDEFINE, a scalable runtime reconfigurable hardware platform. We propose various enhancements to REDEFINE to meet the custom need of accelerating NLA kernels. We further do the design space exploration of the proposed solution for any arbitrary application of size n × n. We determine the right size of the sub-array in accordance with the optimal pipeline depth of the core execution units and the number of such units to be used per sub-array.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper we approach the problem of computing the characteristic polynomial of a matrix from the combinatorial viewpoint. We present several combinatorial characterizations of the coefficients of the characteristic polynomial, in terms of walks and closed walks of different kinds in the underlying graph. We develop algorithms based on these characterizations, and show that they tally with well-known algorithms arrived at independently from considerations in linear algebra.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We introduce the defect sequence for a contractive tuple of Hilbert space operators and investigate its properties. The defect sequence is a sequence of numbers, called defect dimensions associated with a contractive tuple. We show that there are upper bounds for the defect dimensions. The tuples for which these upper bounds are obtained, are called maximal contractive tuples. The upper bounds are different in the non-commutative and in the commutative case. We show that the creation operators on the full Fock space and the coordinate multipliers on the Drury-Arveson space are maximal. We also study pure tuples and see how the defect dimensions play a role in their irreducibility. (C) 2012 Elsevier Inc. All rights reserved.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Task-parallel languages are increasingly popular. Many of them provide expressive mechanisms for intertask synchronization. For example, OpenMP 4.0 will integrate data-driven execution semantics derived from the StarSs research language. Compared to the more restrictive data-parallel and fork-join concurrency models, the advanced features being introduced into task-parallelmodels in turn enable improved scalability through load balancing, memory latency hiding, mitigation of the pressure on memory bandwidth, and, as a side effect, reduced power consumption. In this article, we develop a systematic approach to compile loop nests into concurrent, dynamically constructed graphs of dependent tasks. We propose a simple and effective heuristic that selects the most profitable parallelization idiom for every dependence type and communication pattern. This heuristic enables the extraction of interband parallelism (cross-barrier parallelism) in a number of numerical computations that range from linear algebra to structured grids and image processing. The proposed static analysis and code generation alleviates the burden of a full-blown dependence resolver to track the readiness of tasks at runtime. We evaluate our approach and algorithms in the PPCG compiler, targeting OpenStream, a representative dataflow task-parallel language with explicit intertask dependences and a lightweight runtime. Experimental results demonstrate the effectiveness of the approach.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

3-Dimensional Diffuse Optical Tomographic (3-D DOT) image reconstruction algorithm is computationally complex and requires excessive matrix computations and thus hampers reconstruction in real time. In this paper, we present near real time 3D DOT image reconstruction that is based on Broyden approach for updating Jacobian matrix. The Broyden method simplifies the algorithm by avoiding re-computation of the Jacobian matrix in each iteration. We have developed CPU and heterogeneous CPU/GPU code for 3D DOT image reconstruction in C and MatLab programming platform. We have used Compute Unified Device Architecture (CUDA) programming framework and CUDA linear algebra library (CULA) to utilize the massively parallel computational power of GPUs (NVIDIA Tesla K20c). The computation time achieved for C program based implementation for a CPU/GPU system for 3 planes measurement and FEM mesh size of 19172 tetrahedral elements is 806 milliseconds for an iteration.