990 resultados para sparse matrix-vector multiplication


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we discuss a fast Bayesian extension to kriging algorithms which has been used successfully for fast, automatic mapping in emergency conditions in the Spatial Interpolation Comparison 2004 (SIC2004) exercise. The application of kriging to automatic mapping raises several issues such as robustness, scalability, speed and parameter estimation. Various ad-hoc solutions have been proposed and used extensively but they lack a sound theoretical basis. In this paper we show how observations can be projected onto a representative subset of the data, without losing significant information. This allows the complexity of the algorithm to grow as O(n m 2), where n is the total number of observations and m is the size of the subset of the observations retained for prediction. The main contribution of this paper is to further extend this projective method through the application of space-limited covariance functions, which can be used as an alternative to the commonly used covariance models. In many real world applications the correlation between observations essentially vanishes beyond a certain separation distance. Thus it makes sense to use a covariance model that encompasses this belief since this leads to sparse covariance matrices for which optimised sparse matrix techniques can be used. In the presence of extreme values we show that space-limited covariance functions offer an additional benefit, they maintain the smoothness locally but at the same time lead to a more robust, and compact, global model. We show the performance of this technique coupled with the sparse extension to the kriging algorithm on synthetic data and outline a number of computational benefits such an approach brings. To test the relevance to automatic mapping we apply the method to the data used in a recent comparison of interpolation techniques (SIC2004) to map the levels of background ambient gamma radiation. © Springer-Verlag 2007.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

An embedded architecture of optical vector matrix multiplier (OVMM) is presented. The embedded architecture is aimed at optimising the data flow of vector matrix multiplier (VMM) to promote its performance. Data dependence is discussed when the OVMM is connected to a cluster system. A simulator is built to analyse the performance according to the architecture. According to the simulation, Amdahl's law is used to analyse the hybrid opto-electronic system. It is found that the electronic part and its interaction with optical part form the bottleneck of system.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

In this paper we introduce a new algorithm, based on the successful work of Fathi and Alexandrov, on hybrid Monte Carlo algorithms for matrix inversion and solving systems of linear algebraic equations. This algorithm consists of two parts, approximate inversion by Monte Carlo and iterative refinement using a deterministic method. Here we present a parallel hybrid Monte Carlo algorithm, which uses Monte Carlo to generate an approximate inverse and that improves the accuracy of the inverse with an iterative refinement. The new algorithm is applied efficiently to sparse non-singular matrices. When we are solving a system of linear algebraic equations, Bx = b, the inverse matrix is used to compute the solution vector x = B(-1)b. We present results that show the efficiency of the parallel hybrid Monte Carlo algorithm in the case of sparse matrices.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Least square problem with l1 regularization has been proposed as a promising method for sparse signal reconstruction (e.g., basis pursuit de-noising and compressed sensing) and feature selection (e.g., the Lasso algorithm) in signal processing, statistics, and related fields. These problems can be cast as l1-regularized least-square program (LSP). In this paper, we propose a novel monotonic fixed point method to solve large-scale l1-regularized LSP. And we also prove the stability and convergence of the proposed method. Furthermore we generalize this method to least square matrix problem and apply it in nonnegative matrix factorization (NMF). The method is illustrated on sparse signal reconstruction, partner recognition and blind source separation problems, and the method tends to convergent faster and sparser than other l1-regularized algorithms.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Nonnegative matrix factorization (NMF) is a widely used method for blind spectral unmixing (SU), which aims at obtaining the endmembers and corresponding fractional abundances, knowing only the collected mixing spectral data. It is noted that the abundance may be sparse (i.e., the endmembers may be with sparse distributions) and sparse NMF tends to lead to a unique result, so it is intuitive and meaningful to constrain NMF with sparseness for solving SU. However, due to the abundance sum-to-one constraint in SU, the traditional sparseness measured by L0/L1-norm is not an effective constraint any more. A novel measure (termed as S-measure) of sparseness using higher order norms of the signal vector is proposed in this paper. It features the physical significance. By using the S-measure constraint (SMC), a gradient-based sparse NMF algorithm (termed as NMF-SMC) is proposed for solving the SU problem, where the learning rate is adaptively selected, and the endmembers and abundances are simultaneously estimated. In the proposed NMF-SMC, there is no pure index assumption and no need to know the exact sparseness degree of the abundance in prior. Yet, it does not require the preprocessing of dimension reduction in which some useful information may be lost. Experiments based on synthetic mixtures and real-world images collected by AVIRIS and HYDICE sensors are performed to evaluate the validity of the proposed method.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

This paper describes a methodology for solving efficiently the sparse network equations on multiprocessor computers. The methodology is based on the matrix inverse factors (W-matrix) approach to the direct solution phase of A(x) = b systems. A partitioning scheme of W-matrix , based on the leaf-nodes of the factorization path tree, is proposed. The methodology allows the performance of all the updating operations on vector b in parallel, within each partition, using a row-oriented processing. The approach takes advantage of the processing power of the individual processors. Performance results are presented and discussed.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In automatic facial expression recognition, an increasing number of techniques had been proposed for in the literature that exploits the temporal nature of facial expressions. As all facial expressions are known to evolve over time, it is crucially important for a classifier to be capable of modelling their dynamics. We establish that the method of sparse representation (SR) classifiers proves to be a suitable candidate for this purpose, and subsequently propose a framework for expression dynamics to be efficiently incorporated into its current formulation. We additionally show that for the SR method to be applied effectively, then a certain threshold on image dimensionality must be enforced (unlike in facial recognition problems). Thirdly, we determined that recognition rates may be significantly influenced by the size of the projection matrix \Phi. To demonstrate these, a battery of experiments had been conducted on the CK+ dataset for the recognition of the seven prototypic expressions - anger, contempt, disgust, fear, happiness, sadness and surprise - and comparisons have been made between the proposed temporal-SR against the static-SR framework and state-of-the-art support vector machine.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The generation of a correlation matrix for set of genomic sequences is a common requirement in many bioinformatics problems such as phylogenetic analysis. Each sequence may be millions of bases long and there may be thousands of such sequences which we wish to compare, so not all sequences may fit into main memory at the same time. Each sequence needs to be compared with every other sequence, so we will generally need to page some sequences in and out more than once. In order to minimize execution time we need to minimize this I/O. This paper develops an approach for faster and scalable computing of large-size correlation matrices through the maximal exploitation of available memory and reducing the number of I/O operations. The approach is scalable in the sense that the same algorithms can be executed on different computing platforms with different amounts of memory and can be applied to different bioinformatics problems with different correlation matrix sizes. The significant performance improvement of the approach over previous work is demonstrated through benchmark examples.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The efficient computation of matrix function vector products has become an important area of research in recent times, driven in particular by two important applications: the numerical solution of fractional partial differential equations and the integration of large systems of ordinary differential equations. In this work we consider a problem that combines these two applications, in the form of a numerical solution algorithm for fractional reaction diffusion equations that after spatial discretisation, is advanced in time using the exponential Euler method. We focus on the efficient implementation of the algorithm on Graphics Processing Units (GPU), as we wish to make use of the increased computational power available with this hardware. We compute the matrix function vector products using the contour integration method in [N. Hale, N. Higham, and L. Trefethen. Computing Aα, log(A), and related matrix functions by contour integrals. SIAM J. Numer. Anal., 46(5):2505–2523, 2008]. Multiple levels of preconditioning are applied to reduce the GPU memory footprint and to further accelerate convergence. We also derive an error bound for the convergence of the contour integral method that allows us to pre-determine the appropriate number of quadrature points. Results are presented that demonstrate the effectiveness of the method for large two-dimensional problems, showing a speedup of more than an order of magnitude compared to a CPU-only implementation.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

We consider the problem of computing an approximate minimum cycle basis of an undirected edge-weighted graph G with m edges and n vertices; the extension to directed graphs is also discussed. In this problem, a {0,1} incidence vector is associated with each cycle and the vector space over F-2 generated by these vectors is the cycle space of G. A set of cycles is called a cycle basis of G if it forms a basis for its cycle space. A cycle basis where the sum of the weights of the cycles is minimum is called a minimum cycle basis of G. Cycle bases of low weight are useful in a number of contexts, e.g. the analysis of electrical networks, structural engineering, chemistry, and surface reconstruction. We present two new algorithms to compute an approximate minimum cycle basis. For any integer k >= 1, we give (2k - 1)-approximation algorithms with expected running time 0(kmn(1+2/k) + mn((1+1/k)(omega-1))) and deterministic running time 0(n(3+2/k)), respectively. Here omega is the best exponent of matrix multiplication. It is presently known that omega < 2.376. Both algorithms are o(m(omega)) for dense graphs. This is the first time that any algorithm which computes sparse cycle bases with a guarantee drops below the Theta(m(omega)) bound. We also present a 2-approximation algorithm with O(m(omega) root n log n) expected running time, a linear time 2-approximation algorithm for planar graphs and an O(n(3)) time 2.42-approximation algorithm for the complete Euclidean graph in the plane.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Mandelstam�s argument that PCAC follows from assigning Lorentz quantum numberM=1 to the massless pion is examined in the context of multiparticle dual resonance model. We construct a factorisable dual model for pions which is formulated operatorially on the harmonic oscillator Fock space along the lines of Neveu-Schwarz model. The model has bothm ? andm ? as arbitrary parameters unconstrained by the duality requirement. Adler self-consistency condition is satisfied if and only if the conditionm?2?m?2=1/2 is imposed, in which case the model reduces to the chiral dual pion model of Neveu and Thorn, and Schwarz. The Lorentz quantum number of the pion in the dual model is shown to beM=0.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

We consider the problem of computing an approximate minimum cycle basis of an undirected non-negative edge-weighted graph G with m edges and n vertices; the extension to directed graphs is also discussed. In this problem, a {0,1} incidence vector is associated with each cycle and the vector space over F-2 generated by these vectors is the cycle space of G. A set of cycles is called a cycle basis of G if it forms a basis for its cycle space. A cycle basis where the sum of the weights of the cycles is minimum is called a minimum cycle basis of G. Cycle bases of low weight are useful in a number of contexts, e.g. the analysis of electrical networks, structural engineering, chemistry, and surface reconstruction. Although in most such applications any cycle basis can be used, a low weight cycle basis often translates to better performance and/or numerical stability. Despite the fact that the problem can be solved exactly in polynomial time, we design approximation algorithms since the performance of the exact algorithms may be too expensive for some practical applications. We present two new algorithms to compute an approximate minimum cycle basis. For any integer k >= 1, we give (2k - 1)-approximation algorithms with expected running time O(kmn(1+2/k) + mn((1+1/k)(omega-1))) and deterministic running time O(n(3+2/k) ), respectively. Here omega is the best exponent of matrix multiplication. It is presently known that omega < 2.376. Both algorithms are o(m(omega)) for dense graphs. This is the first time that any algorithm which computes sparse cycle bases with a guarantee drops below the Theta(m(omega) ) bound. We also present a 2-approximation algorithm with expected running time O(M-omega root n log n), a linear time 2-approximation algorithm for planar graphs and an O(n(3)) time 2.42-approximation algorithm for the complete Euclidean graph in the plane.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

A parallel matrix multiplication algorithm is presented, and studies of its performance and estimation are discussed. The algorithm is implemented on a network of transputers connected in a ring topology. An efficient scheme for partitioning the input matrices is introduced which enables overlapping computation with communication. This makes the algorithm achieve near-ideal speed-up for reasonably large matrices. Analytical expressions for the execution time of the algorithm have been derived by analysing its computation and communication characteristics. These expressions are validated by comparing the theoretical results of the performance with the experimental values obtained on a four-transputer network for both square and irregular matrices. The analytical model is also used to estimate the performance of the algorithm for a varying number of transputers and varying problem sizes. Although the algorithm is implemented on transputers, the methodology and the partitioning scheme presented in this paper are quite general and can be implemented on other processors which have the capability of overlapping computation with communication. The equations for performance prediction can also be extended to other multiprocessor systems.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

A direct twos-complement parallel array multiplication algorithm is introduced and modified for digital optical numerical computation. The modified version overcomes the problems encountered in the conventional optical twos-complement algorithm. In the array, all the summands are generated in parallel, and the relevant summands having the same weights are added simultaneously without carries, resulting in the product expressed in a mixed twos-complement system. In a two-stage array, complex multiplication is possible with using four real subarrays. Furthermore, with a three-stage array architecture, complex matrix operation is straightforwardly accomplished. In the experiment, parallel two-stage array complex multiplication with liquid-crystal panels is demonstrated.