168 resultados para general matrix-matrix multiplication

em Indian Institute of Science - Bangalore - Índia


Relevância:

100.00% 100.00%

Publicador:

Resumo:

A parallel matrix multiplication algorithm is presented, and studies of its performance and estimation are discussed. The algorithm is implemented on a network of transputers connected in a ring topology. An efficient scheme for partitioning the input matrices is introduced which enables overlapping computation with communication. This makes the algorithm achieve near-ideal speed-up for reasonably large matrices. Analytical expressions for the execution time of the algorithm have been derived by analysing its computation and communication characteristics. These expressions are validated by comparing the theoretical results of the performance with the experimental values obtained on a four-transputer network for both square and irregular matrices. The analytical model is also used to estimate the performance of the algorithm for a varying number of transputers and varying problem sizes. Although the algorithm is implemented on transputers, the methodology and the partitioning scheme presented in this paper are quite general and can be implemented on other processors which have the capability of overlapping computation with communication. The equations for performance prediction can also be extended to other multiprocessor systems.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We consider the problem of maintaining information about the rank of a matrix $M$ under changes to its entries. For an $n \times n$ matrix $M$, we show an amortized upper bound of $O(n^{\omega-1})$ arithmetic operations per change for this problem, where $\omega < 2.376$ is the exponent for matrix multiplication, under the assumption that there is a {\em lookahead} of up to $\Theta(n)$ locations. That is, we know up to the next $\Theta(n)$ locations $(i_1,j_1),(i_2,j_2),\ldots,$ whose entries are going to change, in advance; however we do not know the new entries in these locations in advance. We get the new entries in these locations in a dynamic manner.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Acoustic modeling using mixtures of multivariate Gaussians is the prevalent approach for many speech processing problems. Computing likelihoods against a large set of Gaussians is required as a part of many speech processing systems and it is the computationally dominant phase for LVCSR systems. We express the likelihood computation as a multiplication of matrices representing augmented feature vectors and Gaussian parameters. The computational gain of this approach over traditional methods is by exploiting the structure of these matrices and efficient implementation of their multiplication.In particular, we explore direct low-rank approximation of the Gaussian parameter matrix and indirect derivation of low-rank factors of the Gaussian parameter matrix by optimum approximation of the likelihood matrix. We show that both the methods lead to similar speedups but the latter leads to far lesser impact on the recognition accuracy. Experiments on a 1138 word vocabulary RM1 task using Sphinx 3.7 system show that, for a typical case the matrix multiplication approach leads to overall speedup of 46%. Both the low-rank approximation methods increase the speedup to around 60%, with the former method increasing the word error rate (WER) from 3.2% to 6.6%, while the latter increases the WER from 3.2% to 3.5%.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Data flow computers are high-speed machines in which an instruction is executed as soon as all its operands are available. This paper describes the EXtended MANchester (EXMAN) data flow computer which incorporates three major extensions to the basic Manchester machine. As extensions we provide a multiple matching units scheme, an efficient, implementation of array data structure, and a facility to concurrently execute reentrant routines. A simulator for the EXMAN computer has been coded in the discrete event simulation language, SIMULA 67, on the DEC 1090 system. Performance analysis studies have been conducted on the simulated EXMAN computer to study the effectiveness of the proposed extensions. The performance experiments have been carried out using three sample problems: matrix multiplication, Bresenham's line drawing algorithm, and the polygon scan-conversion algorithm.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The velocity ratio algorithm developed from a heuristic study of transfer matrix multiplication has been employed to bring out the relative effects of the elements constituting a linear, one-dimensional acoustic filter, the overall dimensions of which are fixed, and synthesize a suitable straight-through configuration for a low-pass, wide-band, non-dissipative acoustic filter. The potential of the foregoing approach in applications to the rational design of practical acoustic filters such as automotive mufflers is indicated.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The transfer matrix method is known to be well suited for a complete analysis of a lumped as well as distributed element, one-dimensional, linear dynamical system with a marked chain topology. However, general subroutines of the type available for classical matrix methods are not available in the current literature on transfer matrix methods. In the present article, general expressions for various aspects of analysis-viz., natural frequency equation, modal vectors, forced response and filter performance—have been evaluated in terms of a single parameter, referred to as velocity ratio. Subprograms have been developed for use with the transfer matrix method for the evaluation of velocity ratio and related parameters. It is shown that a given system, branched or straight-through, can be completely analysed in terms of these basic subprograms, on a stored program digital computer. It is observed that the transfer matrix method with the velocity ratio approach has certain advantages over the existing general matrix methods in the analysis of one-dimensional systems.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

In an earlier paper [1], it has been shown that velocity ratio, defined with reference to the analogous circuit, is a basic parameter in the complete analysis of a linear one-dimensional dynamical system. In this paper it is shown that the terms constituting velocity ratio can be readily determined by means of an algebraic algorithm developed from a heuristic study of the process of transfer matrix multiplication. The algorithm permits the set of most significant terms at a particular frequency of interest to be identified from a knowledge of the relative magnitudes of the impedances of the constituent elements of a proposed configuration. This feature makes the algorithm a potential tool in a first approach to a rational design of a complex dynamical filter. This algorithm is particularly suited for the desk analysis of a medium size system with lumped as well as distributed elements.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

We consider the problem of computing an approximate minimum cycle basis of an undirected edge-weighted graph G with m edges and n vertices; the extension to directed graphs is also discussed. In this problem, a {0,1} incidence vector is associated with each cycle and the vector space over F-2 generated by these vectors is the cycle space of G. A set of cycles is called a cycle basis of G if it forms a basis for its cycle space. A cycle basis where the sum of the weights of the cycles is minimum is called a minimum cycle basis of G. Cycle bases of low weight are useful in a number of contexts, e.g. the analysis of electrical networks, structural engineering, chemistry, and surface reconstruction. We present two new algorithms to compute an approximate minimum cycle basis. For any integer k >= 1, we give (2k - 1)-approximation algorithms with expected running time 0(kmn(1+2/k) + mn((1+1/k)(omega-1))) and deterministic running time 0(n(3+2/k)), respectively. Here omega is the best exponent of matrix multiplication. It is presently known that omega < 2.376. Both algorithms are o(m(omega)) for dense graphs. This is the first time that any algorithm which computes sparse cycle bases with a guarantee drops below the Theta(m(omega)) bound. We also present a 2-approximation algorithm with O(m(omega) root n log n) expected running time, a linear time 2-approximation algorithm for planar graphs and an O(n(3)) time 2.42-approximation algorithm for the complete Euclidean graph in the plane.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

We consider the problem of computing an approximate minimum cycle basis of an undirected non-negative edge-weighted graph G with m edges and n vertices; the extension to directed graphs is also discussed. In this problem, a {0,1} incidence vector is associated with each cycle and the vector space over F-2 generated by these vectors is the cycle space of G. A set of cycles is called a cycle basis of G if it forms a basis for its cycle space. A cycle basis where the sum of the weights of the cycles is minimum is called a minimum cycle basis of G. Cycle bases of low weight are useful in a number of contexts, e.g. the analysis of electrical networks, structural engineering, chemistry, and surface reconstruction. Although in most such applications any cycle basis can be used, a low weight cycle basis often translates to better performance and/or numerical stability. Despite the fact that the problem can be solved exactly in polynomial time, we design approximation algorithms since the performance of the exact algorithms may be too expensive for some practical applications. We present two new algorithms to compute an approximate minimum cycle basis. For any integer k >= 1, we give (2k - 1)-approximation algorithms with expected running time O(kmn(1+2/k) + mn((1+1/k)(omega-1))) and deterministic running time O(n(3+2/k) ), respectively. Here omega is the best exponent of matrix multiplication. It is presently known that omega < 2.376. Both algorithms are o(m(omega)) for dense graphs. This is the first time that any algorithm which computes sparse cycle bases with a guarantee drops below the Theta(m(omega) ) bound. We also present a 2-approximation algorithm with expected running time O(M-omega root n log n), a linear time 2-approximation algorithm for planar graphs and an O(n(3)) time 2.42-approximation algorithm for the complete Euclidean graph in the plane.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Abstract. Let G = (V,E) be a weighted undirected graph, with non-negative edge weights. We consider the problem of efficiently computing approximate distances between all pairs of vertices in G. While many efficient algorithms are known for this problem in unweighted graphs, not many results are known for this problem in weighted graphs. Zwick [14] showed that for any fixed ε> 0, stretch 1 1 + ε distances between all pairs of vertices in a weighted directed graph on n vertices can be computed in Õ(n ω) time, where ω < 2.376 is the exponent of matrix multiplication and n is the number of vertices. It is known that finding distances of stretch less than 2 between all pairs of vertices in G is at least as hard as Boolean matrix multiplication of two n×n matrices. It is also known that all-pairs stretch 3 distances can be computed in Õ(n 2) time and all-pairs stretch 7/3 distances can be computed in Õ(n 7/3) time. Here we consider efficient algorithms for the problem of computing all-pairs stretch (2+ε) distances in G, for any 0 < ε < 1. We show that all pairs stretch (2 + ε) distances for any fixed ε> 0 in G can be computed in expected time O(n 9/4 logn). This algorithm uses a fast rectangular matrix multiplication subroutine. We also present a combinatorial algorithm (that is, it does not use fast matrix multiplication) with expected running time O(n 9/4) for computing all-pairs stretch 5/2 distances in G. 1

Relevância:

90.00% 90.00%

Publicador:

Resumo:

We consider the problem of computing a minimum cycle basis in a directed graph G. The input to this problem is a directed graph whose arcs have positive weights. In this problem a {- 1, 0, 1} incidence vector is associated with each cycle and the vector space over Q generated by these vectors is the cycle space of G. A set of cycles is called a cycle basis of G if it forms a basis for its cycle space. A cycle basis where the sum of weights of the cycles is minimum is called a minimum cycle basis of G. The current fastest algorithm for computing a minimum cycle basis in a directed graph with m arcs and n vertices runs in O(m(w+1)n) time (where w < 2.376 is the exponent of matrix multiplication). If one allows randomization, then an (O) over tilde (m(3)n) algorithm is known for this problem. In this paper we present a simple (O) over tilde (m(2)n) randomized algorithm for this problem. The problem of computing a minimum cycle basis in an undirected graph has been well-studied. In this problem a {0, 1} incidence vector is associated with each cycle and the vector space over F-2 generated by these vectors is the cycle space of the graph. The fastest known algorithm for computing a minimum cycle basis in an undirected graph runs in O(m(2)n + mn(2) logn) time and our randomized algorithm for directed graphs almost matches this running time.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Writing the hindered rotor (hr) partition function as the trace of (rho) over cap = e(-beta(H) over cap hr), we approximate it by the sum of contributions from a set of points in position space. The contribution of the density matrix from each point is approximated by performing a local harmonic expansion around it. The highlight of this method is that it can be easily extended to multidimensional systems. Local harmonic expansion leads to a breakdown of the method a low temperatures. In order to calculate the partition function at low temperatures, we suggest a matrix multiplication procedure. The results obtained using these methods closely agree with the exact partition function at all temperature ranges. Our method bypasses the evaluation of eigenvalues and eigenfunctions and evaluates the density matrix for internal rotation directly. We also suggest a procedure to account for the antisymmetry of the total wavefunction in the same. (C) 2012 Elsevier B.V. All rights reserved.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Acoustic modeling using mixtures of multivariate Gaussians is the prevalent approach for many speech processing problems. Computing likelihoods against a large set of Gaussians is required as a part of many speech processing systems and it is the computationally dominant phase for Large Vocabulary Continuous Speech Recognition (LVCSR) systems. We express the likelihood computation as a multiplication of matrices representing augmented feature vectors and Gaussian parameters. The computational gain of this approach over traditional methods is by exploiting the structure of these matrices and efficient implementation of their multiplication. In particular, we explore direct low-rank approximation of the Gaussian parameter matrix and indirect derivation of low-rank factors of the Gaussian parameter matrix by optimum approximation of the likelihood matrix. We show that both the methods lead to similar speedups but the latter leads to far lesser impact on the recognition accuracy. Experiments on 1,138 work vocabulary RM1 task and 6,224 word vocabulary TIMIT task using Sphinx 3.7 system show that, for a typical case the matrix multiplication based approach leads to overall speedup of 46 % on RM1 task and 115 % for TIMIT task. Our low-rank approximation methods provide a way for trading off recognition accuracy for a further increase in computational performance extending overall speedups up to 61 % for RM1 and 119 % for TIMIT for an increase of word error rate (WER) from 3.2 to 3.5 % for RM1 and for no increase in WER for TIMIT. We also express pairwise Euclidean distance computation phase in Dynamic Time Warping (DTW) in terms of matrix multiplication leading to saving of approximately of computational operations. In our experiments using efficient implementation of matrix multiplication, this leads to a speedup of 5.6 in computing the pairwise Euclidean distances and overall speedup up to 3.25 for DTW.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Coarse Grained Reconfigurable Architectures (CGRA) are emerging as embedded application processing units in computing platforms for Exascale computing. Such CGRAs are distributed memory multi- core compute elements on a chip that communicate over a Network-on-chip (NoC). Numerical Linear Algebra (NLA) kernels are key to several high performance computing applications. In this paper we propose a systematic methodology to obtain the specification of Compute Elements (CE) for such CGRAs. We analyze block Matrix Multiplication and block LU Decomposition algorithms in the context of a CGRA, and obtain theoretical bounds on communication requirements, and memory sizes for a CE. Support for high performance custom computations common to NLA kernels are met through custom function units (CFUs) in the CEs. We present results to justify the merits of such CFUs.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A computational scheme for determining the dynamic stiffness coefficients of a linear, inclined, translating and viscously/hysteretically damped cable element is outlined. Also taken into account is the coupling between inplane transverse and longitudinal forms of cable vibration. The scheme is based on conversion of the governing set of quasistatic boundary value problems into a larger equivalent set of initial value problems, which are subsequently numerically integrated in a spatial domain using marching algorithms. Numerical results which bring out the nature of the dynamic stiffness coefficients are presented. A specific example of random vibration analysis of a long span cable subjected to earthquake support motions modeled as vector gaussian random processes is also discussed. The approach presented is versatile and capable of handling many complicating effects in cable dynamics in a unified manner.