991 results for general matrix-matrix multiplication


Relevance:

100.00%

Publisher:

Abstract:

The mapping of matrix-matrix multiplication onto both word-level and bit-level systolic arrays has been investigated. It has been found that well-defined word-level and bit-level data flow constraints must be satisfied within such circuits. An efficient and highly regular bit-level array has been generated by exploiting the basic compatibilities in data flow symmetries at each level of the problem. The resulting circuit is described, and some details of its practical implementation are discussed.
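The word-level data flow mentioned in this abstract can be illustrated with a small timing model. The sketch below is not the circuit from the paper: it assumes a generic output-stationary n x n array in which row i of A and column j of B are each delayed by i and j steps, so that the pair (a[i][k], b[k][j]) meets in cell (i, j) at step t = i + j + k. The function name `systolic_matmul` is hypothetical, and the bit-level array is not modelled at all.

```python
def systolic_matmul(a, b):
    """Simulate a word-level, output-stationary systolic array (illustrative only).

    Cell (i, j) holds the running sum for c[i][j]. Under the assumed input skew,
    operand pair (a[i][k], b[k][j]) arrives at that cell at step t = i + j + k.
    Returns the product and the number of active cells at each step.
    """
    n = len(a)
    c = [[0] * n for _ in range(n)]
    active_per_step = []
    for t in range(3 * n - 2):  # the last pair meets at t = 3(n - 1)
        active = 0
        for i in range(n):
            for j in range(n):
                k = t - i - j
                if 0 <= k < n:  # this cell receives a valid operand pair now
                    c[i][j] += a[i][k] * b[k][j]
                    active += 1
        active_per_step.append(active)
    return c, active_per_step


product, activity = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product, activity)
```

The activity list shows the wavefront filling and draining the array, which is the kind of regular data flow constraint that a hardware mapping has to respect.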

Relevance:

100.00%

Publisher:

Abstract:

We give a general matrix formula for computing the second-order skewness of maximum likelihood estimators. The formula was first presented in a tensorial version by Bowman and Shenton (1998). Our matrix formulation has numerical advantages, since it requires only simple operations on matrices and vectors. We apply the second-order skewness formula to a normal model with a generalized parametrization and to an ARMA model. (c) 2010 Elsevier B.V. All rights reserved.

Relevance:

100.00%

Publisher:

Abstract:

With the growing popularity of cloud computing, outsourced computing has recently attracted much research effort. A computationally weak client can delegate its heavy computing tasks, such as large matrix multiplications, to the cloud server. Critical requirements for such tasks include guaranteeing the unforgeability of computing results and preserving the privacy of clients. On the one hand, the result computed by the cloud server needs to be verified, since the cloud server cannot be assumed to be fully honest. On the other hand, as the data involved in the computation may contain sensitive information about the client, the data should not be identifiable by the cloud server. In this paper, we address these issues by developing an Efficient and Secure Outsourcing scheme for Large Matrix Multiplication, named ESO-LMM. Security analysis demonstrates that ESO-LMM achieves the security requirements in terms of unforgeability of proof and privacy protection of outsourced data. Furthermore, performance evaluation indicates that ESO-LMM is much more efficient than existing works in terms of computation, communication and storage overhead.
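To make the verification requirement concrete, the sketch below uses Freivalds' randomized check, a classic textbook technique and not the ESO-LMM scheme itself, whose construction is not given in this abstract. It shows how a weak client could test a claimed product C = AB with only matrix-vector products. It provides no privacy protection; all function names are hypothetical.

```python
import random


def matmul(a, b):
    """Naive cubic-time product, standing in for the server's heavy computation."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matvec(m, v):
    """Matrix-vector product, quadratic in the matrix dimension."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def freivalds_check(a, b, c, rounds=20, rng=None):
    """Probabilistically test whether a*b == c (Freivalds' algorithm).

    A correct c always passes. A wrong c passes one round with probability
    at most 1/2, so it survives all rounds with probability at most 2**-rounds.
    """
    rng = rng or random.Random()
    for _ in range(rounds):
        r = [rng.randint(0, 1) for _ in range(len(c[0]))]
        # Compare A(Br) with Cr without ever forming A*B on the client.
        if matvec(a, matvec(b, r)) != matvec(c, r):
            return False
    return True


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
claimed = matmul(a, b)  # what an honest server would return
print(freivalds_check(a, b, claimed))
```

Each round costs three matrix-vector products, which is why a client that cannot afford the full multiplication can still afford the check.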

Relevance:

100.00%

Publisher:

Abstract:

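This paper analyzes the implementation mechanisms of the GOTOBLAS library (GOTO), in particular its general matrix multiplication component. Drawing on recent research results, it discusses how to implement matrix multiplication efficiently and elevates the influence of the memory hierarchy on program performance to the level of the computational model. Comparative experiments show that the GOTO library performs far better than generic BLAS libraries that do not take the memory hierarchy into account. This demonstrates both the performance advantage of the GOTO library and the need to incorporate the memory hierarchy into the computational model.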
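The role of the memory hierarchy can be sketched with loop blocking. The code below is a generic tiled matrix multiplication and should not be read as the library's implementation: operand packing and hand-tuned inner kernels are not modelled here, and the block size is an arbitrary assumption. The names `gemm_naive` and `gemm_blocked` are hypothetical.

```python
def gemm_naive(a, b):
    """Reference triple loop; it walks b column-wise, with poor locality."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c


def gemm_blocked(a, b, block=2):
    """Tiled product: each block x block tile is reused while it is still 'hot'.

    In a compiled language the tile size would be chosen to fit a cache level;
    here the value of `block` only illustrates the loop structure.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for ii in range(0, n, block):
        for pp in range(0, k, block):
            for jj in range(0, m, block):
                # Multiply one tile of a by one tile of b into one tile of c.
                for i in range(ii, min(ii + block, n)):
                    row_a, row_c = a[i], c[i]
                    for p in range(pp, min(pp + block, k)):
                        a_ip, row_b = row_a[p], b[p]
                        for j in range(jj, min(jj + block, m)):
                            row_c[j] += a_ip * row_b[j]
    return c


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
b = [[1, 0, 2, 0, 3], [0, 1, 0, 2, 0], [4, 0, 1, 0, 2], [0, 3, 0, 1, 0]]
print(gemm_blocked(a, b) == gemm_naive(a, b))
```

Pure Python will not show the speed-up, since interpreter overhead dominates; the point is only that both loop nests compute the same result while touching memory in a different order.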

Relevance:

100.00%

Publisher:

Abstract:

Single-processor architectures are unable to provide the performance required by high-performance embedded systems. Parallel processing based on general-purpose processors can achieve this performance, but with a considerable increase in required resources. However, in many cases, simplified optimized parallel cores can be used instead of general-purpose processors, achieving better performance at lower resource utilization. In this paper, we propose a configurable many-core architecture to serve as a co-processor for high-performance embedded computing on Field-Programmable Gate Arrays. The architecture consists of an array of configurable simple cores with support for floating-point operations, interconnected by a configurable interconnection network. For each core it is possible to configure the size of the internal memory, the supported operations and the number of interfacing ports. The architecture was tested on a ZYNQ-7020 FPGA in the execution of several parallel algorithms. The results show that the proposed many-core architecture achieves better performance than a parallel general-purpose processor and that up to 32 floating-point cores can be implemented in a ZYNQ-7020 SoC FPGA.

Relevance:

90.00%

Publisher:

Abstract:

Authenticated Encryption (AE) is the cryptographic process of providing simultaneous confidentiality and integrity protection to messages. This approach is more efficient than applying a two-step process of providing confidentiality for a message by encrypting the message, and in a separate pass providing integrity protection by generating a Message Authentication Code (MAC). AE using symmetric ciphers can be provided by either stream ciphers with built-in authentication mechanisms or block ciphers using appropriate modes of operation. However, stream ciphers have the potential for higher performance and smaller footprint in hardware and/or software than block ciphers. This property makes stream ciphers suitable for resource-constrained environments, where storage and computational power are limited. There have been several recent stream cipher proposals that claim to provide AE. These ciphers can be analysed using existing techniques that consider confidentiality or integrity separately; however, there is currently no framework for the analysis of AE stream ciphers that analyses these two properties simultaneously. This thesis introduces a novel framework for the analysis of AE using stream cipher algorithms. This thesis analyzes the mechanisms for providing confidentiality and for providing integrity in AE algorithms using stream ciphers. There is a greater emphasis on the analysis of the integrity mechanisms, as there is little in the public literature on this in the context of authenticated encryption. The thesis has four main contributions as follows. The first contribution is the design of a framework that can be used to classify AE stream ciphers based on three characteristics. The first classification applies Bellare and Namprempre's work on the order in which encryption and authentication processes take place. The second classification is based on the method used for accumulating the input message (either directly or indirectly) into the internal states of the cipher to generate a MAC. The third classification is based on whether the sequence that is used to provide encryption and authentication is generated using a single key and initial vector, or two keys and two initial vectors. The second contribution is the application of an existing algebraic method to analyse the confidentiality algorithms of two AE stream ciphers; namely SSS and ZUC. The algebraic method is based on considering the nonlinear filter (NLF) of these ciphers as a combiner with memory. This method enables us to construct equations for the NLF that relate the inputs, outputs and memory of the combiner to the output keystream. We show that both of these ciphers are secure from this type of algebraic attack. We conclude that using a key-dependent SBox in the NLF twice, and using two different SBoxes in the NLF of ZUC, prevents this type of algebraic attack. The third contribution is a new general matrix based model for MAC generation where the input message is injected directly into the internal state. This model describes the accumulation process when the input message is injected directly into the internal state of a nonlinear filter generator. We show that three recently proposed AE stream ciphers can be considered as instances of this model; namely SSS, NLSv2 and SOBER-128. Our model is more general than previous investigations into direct injection. Possible forgery attacks against this model are investigated. It is shown that using a nonlinear filter in the accumulation process of the input message, when either the input message or the initial states of the register are unknown, prevents forgery attacks based on collisions. The last contribution is a new general matrix based model for MAC generation where the input message is injected indirectly into the internal state. This model uses the input message as a controller to accumulate a keystream sequence into an accumulation register. We show that three current AE stream ciphers can be considered as instances of this model; namely ZUC, Grain-128a and Sfinks. We establish the conditions under which the model is susceptible to forgery and side-channel attacks.

Relevance:

90.00%

Publisher:

Abstract:

A common problem with the use of tensor modeling in generating quality recommendations for large datasets is scalability. In this paper, we propose the Tensor-based Recommendation using Probabilistic Ranking method, which generates the reconstructed tensor using block-striped parallel matrix multiplication and then probabilistically calculates the preferences of users to rank the recommended items. Empirical analysis on two real-world datasets shows that the proposed method is scalable for large tensor datasets and is able to outperform the benchmarking methods in terms of accuracy.
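The striping idea behind block-striped parallel matrix multiplication can be sketched as follows. This is a minimal row-striped variant under stated assumptions, not the method from the paper: the rows of the left operand are split into contiguous stripes, each stripe is multiplied by the full right operand independently, and the partial results are concatenated. The function names and the use of a thread pool are illustrative choices; no tensor reconstruction is performed.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_stripe(stripe, b):
    """Multiply a horizontal stripe of rows by the whole matrix b."""
    cols = list(zip(*b))  # columns of b, built once per stripe
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in stripe]


def block_striped_matmul(a, b, workers=2):
    """Row-striped product: one contiguous stripe of a per task."""
    n = len(a)
    size = max(1, -(-n // workers))  # ceiling division: rows per stripe
    stripes = [a[i:i + size] for i in range(0, n, size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves stripe order, so the parts can be concatenated.
        parts = list(pool.map(lambda s: multiply_stripe(s, b), stripes))
    return [row for part in parts for row in part]


print(block_striped_matmul([[1, 2], [3, 4], [5, 6]], [[1, 0], [0, 1]], workers=2))
```

Because stripes share no output rows, no synchronisation is needed beyond collecting the results. In CPython, threads will not speed up this arithmetic; a process pool or a distributed framework would be the natural substitute in practice.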

Relevance:

90.00%

Publisher:

Abstract:

Data flow computers are high-speed machines in which an instruction is executed as soon as all its operands are available. This paper describes the EXtended MANchester (EXMAN) data flow computer, which incorporates three major extensions to the basic Manchester machine. As extensions we provide a multiple matching units scheme, an efficient implementation of the array data structure, and a facility to concurrently execute reentrant routines. A simulator for the EXMAN computer has been coded in the discrete event simulation language SIMULA 67 on the DEC 1090 system. Performance analysis studies have been conducted on the simulated EXMAN computer to study the effectiveness of the proposed extensions. The performance experiments have been carried out using three sample problems: matrix multiplication, Bresenham's line drawing algorithm, and the polygon scan-conversion algorithm.

Relevance:

90.00%

Publisher:

Abstract:

The velocity ratio algorithm developed from a heuristic study of transfer matrix multiplication has been employed to bring out the relative effects of the elements constituting a linear, one-dimensional acoustic filter, the overall dimensions of which are fixed, and synthesize a suitable straight-through configuration for a low-pass, wide-band, non-dissipative acoustic filter. The potential of the foregoing approach in applications to the rational design of practical acoustic filters such as automotive mufflers is indicated.

Relevance:

90.00%

Publisher:

Abstract:

The transfer matrix method is known to be well suited for a complete analysis of a lumped as well as distributed element, one-dimensional, linear dynamical system with a marked chain topology. However, general subroutines of the type available for classical matrix methods are not available in the current literature on transfer matrix methods. In the present article, general expressions for various aspects of analysis (viz., natural frequency equation, modal vectors, forced response and filter performance) have been evaluated in terms of a single parameter, referred to as velocity ratio. Subprograms have been developed for use with the transfer matrix method for the evaluation of velocity ratio and related parameters. It is shown that a given system, branched or straight-through, can be completely analysed in terms of these basic subprograms on a stored-program digital computer. It is observed that the transfer matrix method with the velocity ratio approach has certain advantages over the existing general matrix methods in the analysis of one-dimensional systems.

Relevance:

90.00%

Publisher:

Abstract:

In an earlier paper [1], it has been shown that velocity ratio, defined with reference to the analogous circuit, is a basic parameter in the complete analysis of a linear one-dimensional dynamical system. In this paper it is shown that the terms constituting velocity ratio can be readily determined by means of an algebraic algorithm developed from a heuristic study of the process of transfer matrix multiplication. The algorithm permits the set of most significant terms at a particular frequency of interest to be identified from a knowledge of the relative magnitudes of the impedances of the constituent elements of a proposed configuration. This feature makes the algorithm a potential tool in a first approach to a rational design of a complex dynamical filter. This algorithm is particularly suited for the desk analysis of a medium size system with lumped as well as distributed elements.

Relevance:

90.00%

Publisher:

Abstract:

We consider the problem of computing an approximate minimum cycle basis of an undirected edge-weighted graph $G$ with $m$ edges and $n$ vertices; the extension to directed graphs is also discussed. In this problem, a $\{0,1\}$ incidence vector is associated with each cycle and the vector space over $\mathbb{F}_2$ generated by these vectors is the cycle space of $G$. A set of cycles is called a cycle basis of $G$ if it forms a basis for its cycle space. A cycle basis where the sum of the weights of the cycles is minimum is called a minimum cycle basis of $G$. Cycle bases of low weight are useful in a number of contexts, e.g. the analysis of electrical networks, structural engineering, chemistry, and surface reconstruction. We present two new algorithms to compute an approximate minimum cycle basis. For any integer $k \ge 1$, we give $(2k-1)$-approximation algorithms with expected running time $O(kmn^{1+2/k} + mn^{(1+1/k)(\omega-1)})$ and deterministic running time $O(n^{3+2/k})$, respectively. Here $\omega$ is the best exponent of matrix multiplication. It is presently known that $\omega < 2.376$. Both algorithms are $o(m^{\omega})$ for dense graphs. This is the first time that any algorithm which computes sparse cycle bases with a guarantee drops below the $\Theta(m^{\omega})$ bound. We also present a 2-approximation algorithm with $O(m^{\omega}\sqrt{n \log n})$ expected running time, a linear time 2-approximation algorithm for planar graphs and an $O(n^3)$ time 2.42-approximation algorithm for the complete Euclidean graph in the plane.
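The definitions in this abstract (incidence vectors, cycle space over F_2, cycle basis) can be checked on a toy graph. The sketch below is not one of the approximation algorithms from the paper. It encodes each cycle as a bitmask over edge indices, computes the rank over F_2 by Gaussian elimination, and compares it with the dimension m - n + c of the cycle space, a standard fact for a graph with c connected components. All names are hypothetical, and the code assumes without verifying that each edge set passed in really is a cycle.

```python
def cycle_vector(cycle_edges):
    """Encode a cycle as a {0,1} incidence vector, stored as an integer bitmask."""
    v = 0
    for e in cycle_edges:
        v |= 1 << e
    return v


def gf2_rank(vectors):
    """Rank over F_2 via Gaussian elimination; addition in F_2 is XOR."""
    pivots = {}  # leading bit position -> reduced vector with that leading bit
    for v in vectors:
        while v:
            lead = v.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = v
                break
            v ^= pivots[lead]  # cancel the leading bit and keep reducing
    return len(pivots)


def is_cycle_basis(cycles, num_edges, num_vertices, num_components=1):
    """True if the given cycles are independent and span the cycle space."""
    dim = num_edges - num_vertices + num_components
    vecs = [cycle_vector(c) for c in cycles]
    return len(vecs) == dim and gf2_rank(vecs) == dim


# Toy graph on vertices 0..3 with edges e0=(0,1), e1=(1,2), e2=(2,0), e3=(2,3), e4=(3,0).
tri_a = [0, 1, 2]        # cycle 0-1-2-0
tri_b = [2, 3, 4]        # cycle 0-2-3-0
square = [0, 1, 3, 4]    # cycle 0-1-2-3-0, the sum over F_2 of the two triangles
print(is_cycle_basis([tri_a, tri_b], num_edges=5, num_vertices=4))
```

Here the cycle space has dimension 5 - 4 + 1 = 2, so any two of the three cycles form a basis; a minimum cycle basis is the pair with the smallest total edge weight.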

Relevance:

90.00%

Publisher:

Abstract:

We consider the problem of computing an approximate minimum cycle basis of an undirected non-negative edge-weighted graph $G$ with $m$ edges and $n$ vertices; the extension to directed graphs is also discussed. In this problem, a $\{0,1\}$ incidence vector is associated with each cycle and the vector space over $\mathbb{F}_2$ generated by these vectors is the cycle space of $G$. A set of cycles is called a cycle basis of $G$ if it forms a basis for its cycle space. A cycle basis where the sum of the weights of the cycles is minimum is called a minimum cycle basis of $G$. Cycle bases of low weight are useful in a number of contexts, e.g. the analysis of electrical networks, structural engineering, chemistry, and surface reconstruction. Although in most such applications any cycle basis can be used, a low weight cycle basis often translates to better performance and/or numerical stability. Despite the fact that the problem can be solved exactly in polynomial time, we design approximation algorithms since the exact algorithms may be too expensive for some practical applications. We present two new algorithms to compute an approximate minimum cycle basis. For any integer $k \ge 1$, we give $(2k-1)$-approximation algorithms with expected running time $O(kmn^{1+2/k} + mn^{(1+1/k)(\omega-1)})$ and deterministic running time $O(n^{3+2/k})$, respectively. Here $\omega$ is the best exponent of matrix multiplication. It is presently known that $\omega < 2.376$. Both algorithms are $o(m^{\omega})$ for dense graphs. This is the first time that any algorithm which computes sparse cycle bases with a guarantee drops below the $\Theta(m^{\omega})$ bound. We also present a 2-approximation algorithm with expected running time $O(m^{\omega}\sqrt{n \log n})$, a linear time 2-approximation algorithm for planar graphs and an $O(n^3)$ time 2.42-approximation algorithm for the complete Euclidean graph in the plane.

Relevance:

90.00%

Publisher:

Abstract:

Let $G = (V,E)$ be a weighted undirected graph, with non-negative edge weights. We consider the problem of efficiently computing approximate distances between all pairs of vertices in $G$. While many efficient algorithms are known for this problem in unweighted graphs, not many results are known for this problem in weighted graphs. Zwick [14] showed that for any fixed $\varepsilon > 0$, stretch $1 + \varepsilon$ distances between all pairs of vertices in a weighted directed graph on $n$ vertices can be computed in $\tilde{O}(n^{\omega})$ time, where $\omega < 2.376$ is the exponent of matrix multiplication and $n$ is the number of vertices. It is known that finding distances of stretch less than 2 between all pairs of vertices in $G$ is at least as hard as Boolean matrix multiplication of two $n \times n$ matrices. It is also known that all-pairs stretch 3 distances can be computed in $\tilde{O}(n^2)$ time and all-pairs stretch 7/3 distances can be computed in $\tilde{O}(n^{7/3})$ time. Here we consider efficient algorithms for the problem of computing all-pairs stretch $(2+\varepsilon)$ distances in $G$, for any $0 < \varepsilon < 1$. We show that all-pairs stretch $(2 + \varepsilon)$ distances for any fixed $\varepsilon > 0$ in $G$ can be computed in expected time $O(n^{9/4} \log n)$. This algorithm uses a fast rectangular matrix multiplication subroutine. We also present a combinatorial algorithm (that is, it does not use fast matrix multiplication) with expected running time $O(n^{9/4})$ for computing all-pairs stretch 5/2 distances in $G$.
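The link between distances and matrix products that this abstract relies on can be seen in the min-plus (distance) product. The sketch below computes exact all-pairs distances by repeated min-plus squaring in cubic time per product. It is a textbook baseline and not the stretch $(2+\varepsilon)$ algorithm or the fast rectangular multiplication subroutine from the paper; the function names are hypothetical.

```python
INF = float("inf")


def min_plus(a, b):
    """Min-plus product: entry (i, j) is the minimum over k of a[i][k] + b[k][j]."""
    n = len(a)
    return [[min(a[i][k] + b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def all_pairs_distances(weights):
    """Exact distances by repeated min-plus squaring of the weight matrix.

    weights[i][j] is the edge weight, or INF if there is no edge. After s
    squarings the matrix holds shortest paths using at most 2**s edges.
    """
    n = len(weights)
    d = [[0 if i == j else weights[i][j] for j in range(n)] for i in range(n)]
    hops = 1
    while hops < n - 1:  # shortest paths use at most n - 1 edges
        d = min_plus(d, d)
        hops *= 2
    return d


w = [[0, 1, 5, INF],
     [1, 0, 2, INF],
     [5, 2, 0, 1],
     [INF, INF, 1, 0]]
print(all_pairs_distances(w)[0])
```

An estimate has stretch $t$ if it lies between the true distance and $t$ times the true distance, so the output here has stretch 1; approximate algorithms trade a larger stretch for a running time below that of the exact computation.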

Relevance:

90.00%

Publisher:

Abstract:

We consider the problem of computing a minimum cycle basis in a directed graph $G$. The input to this problem is a directed graph whose arcs have positive weights. In this problem a $\{-1, 0, 1\}$ incidence vector is associated with each cycle and the vector space over $\mathbb{Q}$ generated by these vectors is the cycle space of $G$. A set of cycles is called a cycle basis of $G$ if it forms a basis for its cycle space. A cycle basis where the sum of weights of the cycles is minimum is called a minimum cycle basis of $G$. The current fastest algorithm for computing a minimum cycle basis in a directed graph with $m$ arcs and $n$ vertices runs in $O(m^{\omega+1} n)$ time (where $\omega < 2.376$ is the exponent of matrix multiplication). If one allows randomization, then an $\tilde{O}(m^3 n)$ algorithm is known for this problem. In this paper we present a simple $\tilde{O}(m^2 n)$ randomized algorithm for this problem. The problem of computing a minimum cycle basis in an undirected graph has been well-studied. In this problem a $\{0, 1\}$ incidence vector is associated with each cycle and the vector space over $\mathbb{F}_2$ generated by these vectors is the cycle space of the graph. The fastest known algorithm for computing a minimum cycle basis in an undirected graph runs in $O(m^2 n + mn^2 \log n)$ time and our randomized algorithm for directed graphs almost matches this running time.