943 resultados para sparse matrices
Resumo:
Processors with large numbers of cores are becoming commonplace. In order to utilise the available resources in such systems, the programming paradigm has to move towards increased parallelism. However, increased parallelism does not necessarily lead to better performance. Parallel programming models have to provide not only flexible ways of defining parallel tasks, but also efficient methods to manage the created tasks. Moreover, in a general-purpose system, applications residing in the system compete for the shared resources. Thread and task scheduling in such a multiprogrammed multithreaded environment is a significant challenge. In this thesis, we introduce a new task-based parallel reduction model, called the Glasgow Parallel Reduction Machine (GPRM). Our main objective is to provide high performance while maintaining ease of programming. GPRM supports native parallelism; it provides a modular way of expressing parallel tasks and the communication patterns between them. Compiling a GPRM program results in an Intermediate Representation (IR) containing useful information about tasks, their dependencies, as well as the initial mapping information. This compile-time information helps reduce the overhead of runtime task scheduling and is key to high performance. Generally speaking, the granularity and the number of tasks are major factors in achieving high performance. These factors are even more important in the case of GPRM, as it is highly dependent on tasks, rather than threads. We use three basic benchmarks to provide a detailed comparison of GPRM with Intel OpenMP, Cilk Plus, and Threading Building Blocks (TBB) on the Intel Xeon Phi, and with GNU OpenMP on the Tilera TILEPro64. GPRM shows superior performance in almost all cases, only by controlling the number of tasks. GPRM also provides a low-overhead mechanism, called “Global Sharing”, which improves performance in multiprogramming situations. We use OpenMP, as the most popular model for shared-memory parallel programming as the main GPRM competitor for solving three well-known problems on both platforms: LU factorisation of Sparse Matrices, Image Convolution, and Linked List Processing. We focus on proposing solutions that best fit into the GPRM’s model of execution. GPRM outperforms OpenMP in all cases on the TILEPro64. On the Xeon Phi, our solution for the LU Factorisation results in notable performance improvement for sparse matrices with large numbers of small blocks. We investigate the overhead of GPRM’s task creation and distribution for very short computations using the Image Convolution benchmark. We show that this overhead can be mitigated by combining smaller tasks into larger ones. As a result, GPRM can outperform OpenMP for convolving large 2D matrices on the Xeon Phi. Finally, we demonstrate that our parallel worksharing construct provides an efficient solution for Linked List processing and performs better than OpenMP implementations on the Xeon Phi. The results are very promising, as they verify that our parallel programming framework for manycore processors is flexible and scalable, and can provide high performance without sacrificing productivity.
Resumo:
Miralls deformables més i més grans, amb cada cop més actuadors estan sent utilitzats actualment en aplicacions d'òptica adaptativa. El control dels miralls amb centenars d'actuadors és un tema de gran interès, ja que les tècniques de control clàssiques basades en la seudoinversa de la matriu de control del sistema es tornen massa lentes quan es tracta de matrius de dimensions tan grans. En aquesta tesi doctoral es proposa un mètode per l'acceleració i la paral.lelitzacó dels algoritmes de control d'aquests miralls, a través de l'aplicació d'una tècnica de control basada en la reducció a zero del components més petits de la matriu de control (sparsification), seguida de l'optimització de l'ordenació dels accionadors de comandament atenent d'acord a la forma de la matriu, i finalment de la seva posterior divisió en petits blocs tridiagonals. Aquests blocs són molt més petits i més fàcils de fer servir en els càlculs, el que permet velocitats de càlcul molt superiors per l'eliminació dels components nuls en la matriu de control. A més, aquest enfocament permet la paral.lelització del càlcul, donant una com0onent de velocitat addicional al sistema. Fins i tot sense paral. lelització, s'ha obtingut un augment de gairebé un 40% de la velocitat de convergència dels miralls amb només 37 actuadors, mitjançant la tècnica proposada. Per validar això, s'ha implementat un muntatge experimental nou complet , que inclou un modulador de fase programable per a la generació de turbulència mitjançant pantalles de fase, i s'ha desenvolupat un model complert del bucle de control per investigar el rendiment de l'algorisme proposat. Els resultats, tant en la simulació com experimentalment, mostren l'equivalència total en els valors de desviació després de la compensació dels diferents tipus d'aberracions per als diferents algoritmes utilitzats, encara que el mètode proposat aquí permet una càrrega computacional molt menor. El procediment s'espera que sigui molt exitós quan s'aplica a miralls molt grans.
Resumo:
We show that the Hausdorff dimension of the spectral measure of a class of deterministic, i.e. nonrandom, block-Jacobi matrices may be determined with any degree of precision, improving a result of Zlatos [Andrej Zlatos,. Sparse potentials with fractional Hausdorff dimension, J. Funct. Anal. 207 (2004) 216-252]. (C) 2010 Elsevier Inc. All rights reserved.
Resumo:
This paper describes a methodology for solving a linear system of equations on vector computer. The methodology combines direct and inverse factors. The decomposition and implementation of the direct solution in a CRAY Y-MPZE/232, and the performance results are discussed.
Resumo:
Using methods of statistical physics, we study the average number and kernel size of general sparse random matrices over GF(q), with a given connectivity profile, in the thermodynamical limit of large matrices. We introduce a mapping of GF(q) matrices onto spin systems using the representation of the cyclic group of order q as the q-th complex roots of unity. This representation facilitates the derivation of the average kernel size of random matrices using the replica approach, under the replica symmetric ansatz, resulting in saddle point equations for general connectivity distributions. Numerical solutions are then obtained for particular cases by population dynamics. Similar techniques also allow us to obtain an expression for the exact and average number of random matrices for any general connectivity profile. We present numerical results for particular distributions.
Resumo:
Typical properties of sparse random matrices over finite (Galois) fields are studied, in the limit of large matrices, using techniques from the physics of disordered systems. For the case of a finite field GF(q) with prime order q, we present results for the average kernel dimension, average dimension of the eigenvector spaces and the distribution of the eigenvalues. The number of matrices for a given distribution of entries is also calculated for the general case. The significance of these results to error-correcting codes and random graphs is also discussed.
Resumo:
We detail the automatic construction of R matrices corresponding to (the tensor products of) the (O-m\alpha(n)) families of highest-weight representations of the quantum superalgebras Uq[gl(m\n)]. These representations are irreducible, contain a free complex parameter a, and are 2(mn)-dimensional. Our R matrices are actually (sparse) rank 4 tensors, containing a total of 2(4mn) components, each of which is in general an algebraic expression in the two complex variables q and a. Although the constructions are straightforward, we describe them in full here, to fill a perceived gap in the literature. As the algorithms are generally impracticable for manual calculation, we have implemented the entire process in MATHEMATICA; illustrating our results with U-q [gl(3\1)]. (C) 2002 Published by Elsevier Science B.V.
Resumo:
In this work, we consider the numerical solution of a large eigenvalue problem resulting from a finite rank discretization of an integral operator. We are interested in computing a few eigenpairs, with an iterative method, so a matrix representation that allows for fast matrix-vector products is required. Hierarchical matrices are appropriate for this setting, and also provide cheap LU decompositions required in the spectral transformation technique. We illustrate the use of freely available software tools to address the problem, in particular SLEPc for the eigensolvers and HLib for the construction of H-matrices. The numerical tests are performed using an astrophysics application. Results show the benefits of the data-sparse representation compared to standard storage schemes, in terms of computational cost as well as memory requirements.
Resumo:
Sparse coding aims to find a more compact representation based on a set of dictionary atoms. A well-known technique looking at 2D sparsity is the low rank representation (LRR). However, in many computer vision applications, data often originate from a manifold, which is equipped with some Riemannian geometry. In this case, the existing LRR becomes inappropriate for modeling and incorporating the intrinsic geometry of the manifold that is potentially important and critical to applications. In this paper, we generalize the LRR over the Euclidean space to the LRR model over a specific Rimannian manifold—the manifold of symmetric positive matrices (SPD). Experiments on several computer vision datasets showcase its noise robustness and superior performance on classification and segmentation compared with state-of-the-art approaches.
Resumo:
The modern GPUs are well suited for intensive computational tasks and massive parallel computation. Sparse matrix multiplication and linear triangular solver are the most important and heavily used kernels in scientific computation, and several challenges in developing a high performance kernel with the two modules is investigated. The main interest it to solve linear systems derived from the elliptic equations with triangular elements. The resulting linear system has a symmetric positive definite matrix. The sparse matrix is stored in the compressed sparse row (CSR) format. It is proposed a CUDA algorithm to execute the matrix vector multiplication using directly the CSR format. A dependence tree algorithm is used to determine which variables the linear triangular solver can determine in parallel. To increase the number of the parallel threads, a coloring graph algorithm is implemented to reorder the mesh numbering in a pre-processing phase. The proposed method is compared with parallel and serial available libraries. The results show that the proposed method improves the computation cost of the matrix vector multiplication. The pre-processing associated with the triangular solver needs to be executed just once in the proposed method. The conjugate gradient method was implemented and showed similar convergence rate for all the compared methods. The proposed method showed significant smaller execution time.
Resumo:
Let P be a system of n linear nonhomogeneous ordinary differential polynomials in a set U of n-1 differential indeterminates. Differential resultant formulas are presented to eliminate the differential indeterminates in U from P. These formulas are determinants of coefficient matrices of appropriate sets of derivatives of the differential polynomials in P, or in a linear perturbation Pe of P. In particular, the formula dfres(P) is the determinant of a matrix M(P) having no zero columns if the system P is ``super essential". As an application, if the system PP is sparse generic, such formulas can be used to compute the differential resultant dres(PP) introduced by Li, Gao and Yuan.
Resumo:
Differential resultant formulas are defined, for a system $\cP$ of $n$ ordinary Laurent differential polynomials in $n-1$ differential variables. These are determinants of coefficient matrices of an extended system of polynomials obtained from $\cP$ through derivations and multiplications by Laurent monomials. To start, through derivations, a system $\ps(\cP)$ of $L$ polynomials in $L-1$ algebraic variables is obtained, which is non sparse in the order of derivation. This enables the use of existing formulas for the computation of algebraic resultants, of the multivariate sparse algebraic polynomials in $\ps(\cP)$, to obtain polynomials in the differential elimination ideal generated by $\cP$. The formulas obtained are multiples of the sparse differential resultant defined by Li, Yuan and Gao, and provide order and degree bounds in terms of mixed volumes in the generic case.
Resumo:
Mode of access: Internet.
Resumo:
The basic reproduction number is a key parameter in mathematical modelling of transmissible diseases. From the stability analysis of the disease free equilibrium, by applying Routh-Hurwitz criteria, a threshold is obtained, which is called the basic reproduction number. However, the application of spectral radius theory on the next generation matrix provides a different expression for the basic reproduction number, that is, the square root of the previously found formula. If the spectral radius of the next generation matrix is defined as the geometric mean of partial reproduction numbers, however the product of these partial numbers is the basic reproduction number, then both methods provide the same expression. In order to show this statement, dengue transmission modelling incorporating or not the transovarian transmission is considered as a case study. Also tuberculosis transmission and sexually transmitted infection modellings are taken as further examples.
Resumo:
Universidade Estadual de Campinas . Faculdade de Educação Física