156 results for Speedup
Abstract:
In this paper we present a novel macroblock mode decision algorithm to speed up H.264/SVC Intra frame encoding. We replace the complex mode-decision calculations with a classifier that has been trained specifically to minimize the reduction in RD performance, which results in a significant encoding speedup. The results show that machine learning has great potential and can reduce complexity substantially: the proposed method reduces encoding time to about 70% in the base layer and up to 50% in the enhancement layer of the reference implementation, with a negligible loss in quality.
Abstract:
Instruction reuse is a microarchitectural technique that improves the execution time of a program by removing redundant computations at run-time. Although this is the job of an optimizing compiler, compilers often fail to do so because they have limited knowledge of run-time data. In this paper we examine instruction reuse of integer ALU and load instructions in network processing applications. Specifically, this paper attempts to answer the following questions: (1) How much instruction reuse is inherent in network processing applications? (2) Can reuse be improved by reducing interference in the reuse buffer? (3) What characteristics of network applications can be exploited to improve reuse? (4) What is the effect of reuse on resource contention and memory accesses? We propose an aggregation scheme that combines a high-level concept of network traffic, i.e. "flows", with a low-level microarchitectural feature of programs, i.e. repetition of instructions and data, along with an architecture that exploits temporal locality in incoming packet data to improve reuse. We find that, for the benchmarks considered, 1% to 50% of instructions are reused, while the speedup achieved varies between 1% and 24%. As a side effect, instruction reuse reduces memory traffic and can therefore be considered a low-power scheme.
Abstract:
We consider one-dimensional random walks in random environment which are transient to the right. Our main interest is in the study of the sub-ballistic regime, where at time $n$ the particle is typically at a distance of order $O(n^{\kappa})$ from the origin, $\kappa \in (0, 1)$. We investigate the probabilities of moderate deviations from this behaviour. Specifically, we are interested in quenched and annealed probabilities of slowdown (at time $n$, the particle is at a distance of order $O(n^{\nu_0})$ from the origin, $\nu_0 \in (0, \kappa)$) and speedup (at time $n$, the particle is at a distance of order $n^{\nu_1}$ from the origin, $\nu_1 \in (\kappa, 1)$), for the current location of the particle and for the hitting times. We also study probabilities of backtracking: at time $n$, the particle is located around $(-n^{\nu})$, thus making an unusual excursion to the left. For the slowdown, our results are valid in the ballistic case as well.
Abstract:
Although cluster environments have enormous potential processing power, real applications that take advantage of this power remain an elusive goal. This is due, in part, to a lack of understanding of the characteristics of the applications best suited to these environments. This paper focuses on Master/Slave applications for large heterogeneous clusters. It defines application, cluster and execution models to derive an analytic expression for the execution time. It defines speedup and derives speedup bounds based on the inherent parallelism of the application and the aggregated computing power of the cluster. The paper derives an analytical expression for efficiency and uses it to define scalability of the algorithm-cluster combination based on the isoefficiency metric. Furthermore, the paper establishes necessary and sufficient conditions for an algorithm-cluster combination to be scalable, which are easy to verify and use in practice. Finally, it covers the impact of network contention as the number of processors grows. (C) 2007 Elsevier B.V. All rights reserved.
Abstract:
Virtual platforms are of paramount importance for design space exploration, and their usage in early software development and verification is crucial. In particular, enabling accurate and fast simulation is especially useful, but these features usually conflict and tradeoffs have to be made. In this paper we describe how we integrated TLM communication mechanisms into a state-of-the-art, cycle-accurate MPSoC simulation platform. More specifically, we show how we adapted ArchC fast functional instruction set simulators to the MPARM platform in order to achieve both fast simulation speed and accuracy. Our implementation led to a much faster hybrid platform, reaching speedups of up to 2.9x, and 2.1x on average, with negligible impact on power estimation accuracy (average 3.26% and 2.25% of standard deviation). © 2011 IEEE.
Abstract:
In general, pattern recognition techniques carry a high computational burden for learning the discriminating functions that are responsible for separating samples from distinct classes. As such, several studies have sought to employ machine learning algorithms in the context of big data classification problems. Research in this area ranges from Graphics Processing Unit-based implementations to mathematical optimizations, the main drawback of the former being their dependence on the graphics card. Here, we propose an architecture-independent optimization approach for the optimum-path forest (OPF) classifier, designed using a theoretical formulation that relates the minimum spanning tree to the minimum spanning forest generated by the OPF over the training dataset. The experiments have shown that the proposed approach can be faster than the traditional one on five public datasets, while being as accurate as the original OPF. (C) 2014 Elsevier B. V. All rights reserved.
Abstract:
We use interferometric synthetic aperture radar observations recorded in a land-terminating sector of western Greenland to characterise the ice sheet surface hydrology and to quantify spatial variations in the seasonality of ice sheet flow. Our data reveal a non-uniform pattern of late-summer ice speedup that, in places, extends over 100 km inland. We show that the degree of late-summer speedup is positively correlated with modelled runoff within the 10 glacier catchments of our survey, and that the pattern of late-summer speedup follows that of water routed at the ice sheet surface. In late summer, ice within the largest catchment flows on average 48% faster than during winter, whereas changes in smaller catchments are less pronounced. Our observations show that the routing of seasonal runoff at the ice sheet surface plays an important role in shaping the magnitude and extent of seasonal ice sheet speedup.
Abstract:
A unified solution framework is presented for one-, two- or three-dimensional complex non-symmetric eigenvalue problems, respectively governing linear modal instability of incompressible fluid flows in rectangular domains having two, one or no homogeneous spatial directions. The solution algorithm is based on subspace iteration in which the spatial discretization matrix is formed, stored and inverted serially. Results delivered by spectral collocation based on the Chebyshev-Gauss-Lobatto (CGL) points and by a suite of high-order finite-difference methods have been compared from the point of view of accuracy and efficiency in standard validation cases of temporal local and BiGlobal linear instability. The suite comprises the Dispersion-Relation-Preserving (DRP) and Padé finite-difference schemes previously employed for this type of work, as well as the Summation-by-parts (SBP) scheme and the new high-order finite-difference scheme of order q (FD-q). The FD-q method has been found to significantly outperform all other finite-difference schemes in solving classic linear local, BiGlobal, and TriGlobal eigenvalue problems, as regards both memory and CPU time requirements. Results shown in the present study disprove the paradigm that spectral methods are superior to finite-difference methods in terms of computational cost at equal accuracy, with FD-q spatial discretization delivering a speedup of $O(10^4)$. Consequently, accurate solutions of the three-dimensional (TriGlobal) eigenvalue problems may be obtained on typical desktop computers with modest computational effort.
Abstract:
Originally presented as the author's thesis, University of Illinois at Urbana-Champaign.
Abstract:
Vita.
Abstract:
Streaming SIMD Extensions (SSE) is a unique feature embedded in the Pentium III and P4 classes of microprocessors. By fully exploiting SSE, parallel algorithms can be implemented on a standard personal computer and a theoretical speedup of four can be achieved. In this paper, we demonstrate the implementation of a parallel LU matrix decomposition algorithm for solving power systems network equations with SSE and discuss advantages and disadvantages of this approach.
Abstract:
Streaming SIMD Extensions (SSE) is a unique feature embedded in the Pentium III class of microprocessors. By fully exploiting SSE, parallel algorithms can be implemented on a standard personal computer and a theoretical speedup of four can be achieved. In this paper, we demonstrate the implementation of a parallel LU matrix decomposition algorithm for solving power systems network equations with SSE and discuss advantages and disadvantages of this approach.
Abstract:
Streaming SIMD Extensions (SSE) is a unique feature embedded in the Pentium III and IV classes of microprocessors. By fully exploiting SSE, parallel algorithms can be implemented on a standard personal computer and a theoretical speedup of four can be achieved. In this paper, we demonstrate the implementation of a parallel LU matrix decomposition algorithm for solving linear systems with SSE and discuss advantages and disadvantages of this approach based on our experimental study.
Abstract:
Symmetric multi-processor (SMP) systems, or multiple-CPU servers, are suitable for implementing parallel algorithms because they employ dedicated communication devices to enhance the inter-processor communication bandwidth, so that better performance can be obtained. However, the cost of a multiple-CPU server is high, and the server is therefore usually shared among many users. The workload due to other users will affect the performance of parallel programs, so it is desirable to derive a method to optimize parallel programs under different loading conditions. In this paper, we present a simple method, applicable to SPMD-type parallel programs, that improves the speedup by controlling the number of threads within the programs.
Abstract:
Computer simulation is a versatile and commonly used tool for the design and evaluation of systems with different degrees of complexity. Power distribution systems and electric railway networks are areas in which computer simulation is heavily applied. A dominant factor in evaluating the performance of a software simulator is its processing time, especially in the case of real-time simulation. Parallel processing provides a viable means to reduce the computing time and is therefore suitable for building real-time simulators. In this paper, we present different issues related to solving the power distribution system with parallel computing based on a multiple-CPU server, concentrating in particular on the speedup performance of such an approach.