103 results for Speedup


Relevance:

20.00% 20.00%

Publisher:

Abstract:

This work analyzes and develops improvements in the speed and scalability of a distributed fish-school simulator. The results were obtained with a new communication strategy for the logical processes (LPs) and with changes to the neighbor-selection algorithm applied to each fish at every simulation step. The proposed idea lets each logical process anticipate its neighbors' future data needs, which reduces communication time by limiting the number of messages exchanged between LPs. The new neighbor-selection algorithm was developed to avoid unnecessary work: it lowers the number of instructions executed at each simulation step and for each simulated fish, significantly reducing simulation time.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We consider one-dimensional random walks in random environment which are transient to the right. Our main interest is in the study of the sub-ballistic regime, where at time $n$ the particle is typically at a distance of order $O(n^{\kappa})$ from the origin, $\kappa \in (0, 1)$. We investigate the probabilities of moderate deviations from this behaviour. Specifically, we are interested in quenched and annealed probabilities of slowdown (at time $n$, the particle is at a distance of order $O(n^{\nu_0})$ from the origin, $\nu_0 \in (0, \kappa)$), and speedup (at time $n$, the particle is at a distance of order $n^{\nu_1}$ from the origin, $\nu_1 \in (\kappa, 1)$), for the current location of the particle and for the hitting times. Also, we study probabilities of backtracking: at time $n$, the particle is located around $-n^{\nu}$, thus making an unusual excursion to the left. For the slowdown, our results are valid in the ballistic case as well.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Although cluster environments have enormous potential processing power, real applications that take advantage of this power remain an elusive goal. This is due, in part, to a lack of understanding of the characteristics of the applications best suited to these environments. This paper focuses on Master/Slave applications for large heterogeneous clusters. It defines application, cluster and execution models to derive an analytic expression for the execution time. It defines speedup and derives speedup bounds based on the inherent parallelism of the application and the aggregated computing power of the cluster. The paper derives an analytical expression for efficiency and uses it to define the scalability of the algorithm-cluster combination based on the isoefficiency metric. Furthermore, the paper establishes necessary and sufficient conditions for an algorithm-cluster combination to be scalable, which are easy to verify and use in practice. Finally, it covers the impact of network contention as the number of processors grows. (C) 2007 Elsevier B.V. All rights reserved.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Virtual platforms are of paramount importance for design space exploration, and their usage in early software development and verification is crucial. In particular, enabling accurate and fast simulation is especially useful, but these features usually conflict and tradeoffs have to be made. In this paper we describe how we integrated TLM communication mechanisms into a state-of-the-art, cycle-accurate MPSoC simulation platform. More specifically, we show how we adapted ArchC fast functional instruction set simulators to the MPARM platform in order to achieve both simulation speed and accuracy. Our implementation led to a much faster hybrid platform, reaching speedups of up to 2.9x and 2.1x on average, with negligible impact on power estimation accuracy (3.26% on average, with a standard deviation of 2.25%). © 2011 IEEE.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In general, pattern recognition techniques carry a high computational burden for learning the discriminating functions that are responsible for separating samples from distinct classes. As such, several studies have sought to employ machine learning algorithms in the context of big data classification problems. Research in this area ranges from implementations based on Graphics Processing Units to mathematical optimizations, the main drawback of the former being their dependence on the graphics card. Here, we propose an architecture-independent optimization approach for the optimum-path forest (OPF) classifier, designed using a theoretical formulation that relates the minimum spanning tree to the minimum spanning forest generated by the OPF over the training dataset. The experiments have shown that the proposed approach can be faster than the traditional one on five public datasets, while being as accurate as the original OPF. (C) 2014 Elsevier B. V. All rights reserved.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We use interferometric synthetic aperture radar observations recorded in a land-terminating sector of western Greenland to characterise the ice sheet surface hydrology and to quantify spatial variations in the seasonality of ice sheet flow. Our data reveal a non-uniform pattern of late-summer ice speedup that, in places, extends over 100 km inland. We show that the degree of late-summer speedup is positively correlated with modelled runoff within the 10 glacier catchments of our survey, and that the pattern of late-summer speedup follows that of water routed at the ice sheet surface. In late summer, ice within the largest catchment flows on average 48% faster than during winter, whereas changes in smaller catchments are less pronounced. Our observations show that the routing of seasonal runoff at the ice sheet surface plays an important role in shaping the magnitude and extent of seasonal ice sheet speedup.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

A unified solution framework is presented for one-, two- or three-dimensional complex non-symmetric eigenvalue problems, respectively governing linear modal instability of incompressible fluid flows in rectangular domains having two, one or no homogeneous spatial directions. The solution algorithm is based on subspace iteration in which the spatial discretization matrix is formed, stored and inverted serially. Results delivered by spectral collocation based on the Chebyshev-Gauss-Lobatto (CGL) points and by a suite of high-order finite-difference methods have been compared from the point of view of accuracy and efficiency in standard validation cases of temporal local and BiGlobal linear instability. The suite comprises the Dispersion-Relation-Preserving (DRP) and Padé finite-difference schemes previously employed for this type of work, as well as the Summation-by-parts (SBP) scheme and the new high-order finite-difference scheme of order q (FD-q). The FD-q method has been found to significantly outperform all other finite-difference schemes in solving classic linear local, BiGlobal, and TriGlobal eigenvalue problems, as regards both memory and CPU time requirements. Results shown in the present study disprove the paradigm that spectral methods are superior to finite-difference methods in terms of computational cost at equal accuracy, with FD-q spatial discretization delivering a speedup of $O(10^4)$. Consequently, accurate solutions of three-dimensional (TriGlobal) eigenvalue problems may be obtained on typical desktop computers with modest computational effort.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Originally presented as the author's thesis, University of Illinois at Urbana-Champaign.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Vita.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The existence of quantum correlation other than entanglement (as revealed by quantum discord), and its role in quantum-information processing (QIP), is a current subject of discussion. In particular, it has been suggested that this nonclassical correlation may provide computational speedup for some quantum algorithms. In this regard, bulk nuclear magnetic resonance (NMR) has been successfully used as a test bench for many QIP implementations, although it has also been continuously criticized for not presenting entanglement in most of the systems used so far. In this paper, we report a theoretical and experimental study on the dynamics of quantum and classical correlations in an NMR quadrupolar system. We present a method for computing the correlations from experimental NMR deviation-density matrices and show that, given the action of the nuclear-spin environment, the relaxation produces a monotonic time decay in the correlations. Although the experimental realizations were performed in a specific quadrupolar system, the main results presented here can be applied to any system that uses a deviation-density matrix formalism.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The cost of spatial join processing can be very high because of the large sizes of spatial objects and the computation-intensive spatial operations. While parallel processing seems a natural solution to this problem, it is not clear how spatial data can be partitioned for this purpose. Various spatial data partitioning methods are examined in this paper. A framework combining the data-partitioning techniques used by most parallel join algorithms in relational databases and the filter-and-refine strategy for spatial operation processing is proposed for parallel spatial join processing. Object duplication caused by multi-assignment in spatial data partitioning can result in extra CPU cost as well as extra communication cost. We find that the key to overcoming this problem is to preserve spatial locality in task decomposition. We show in this paper that a near-optimal speedup can be achieved for parallel spatial join processing using our new algorithms.
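
The filter-and-refine strategy mentioned in this abstract can be illustrated with a minimal, sequential sketch. It is not the paper's algorithm: the `Circle` type, the function names, and the choice of circles as the spatial objects are all assumptions made for illustration, and the parallel partitioning step is left out. The filter step compares cheap minimum bounding rectangles (MBRs), and the refine step runs the exact geometric test only on the surviving candidates.

```python
import math
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Circle:
    """Hypothetical spatial object used only for this illustration."""
    x: float
    y: float
    r: float

    def mbr(self):
        # Minimum bounding rectangle as (xmin, ymin, xmax, ymax).
        return (self.x - self.r, self.y - self.r, self.x + self.r, self.y + self.r)


def mbrs_overlap(a, b):
    """Cheap filter test: do two axis-aligned rectangles overlap?"""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1


def circles_intersect(a, b):
    """Exact refine test: do the two circles actually touch or overlap?"""
    return math.hypot(a.x - b.x, a.y - b.y) <= a.r + b.r


def filter_and_refine_join(left, right):
    """Return (matching pairs, number of candidates that passed the filter)."""
    # Filter: keep only pairs whose bounding rectangles overlap.
    candidates = [(a, b) for a, b in product(left, right)
                  if mbrs_overlap(a.mbr(), b.mbr())]
    # Refine: run the exact test on the surviving candidates only.
    matches = [(a, b) for a, b in candidates if circles_intersect(a, b)]
    return matches, len(candidates)


left = [Circle(0.0, 0.0, 1.0)]
right = [Circle(1.5, 1.5, 1.0), Circle(5.0, 5.0, 1.0), Circle(1.0, 0.0, 1.0)]
matches, n_candidates = filter_and_refine_join(left, right)
```

In this toy input, the circle at (1.5, 1.5) passes the filter but is rejected by the refine step, which is the kind of false hit the two-phase design is meant to absorb cheaply.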

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Coset enumeration is a most important procedure for investigating finitely presented groups. We present a practical parallel procedure for coset enumeration on shared memory processors. The shared memory architecture is particularly interesting because such parallel computation is both faster and cheaper. The lower cost comes when the program requires large amounts of memory, and additional CPUs allow us to lower the time that the expensive memory is being used. Rather than report on a suite of test cases, we take a single, typical case and analyze the performance factors in depth. The parallelization is achieved through a master-slave architecture. This results in an interesting phenomenon, whereby the CPU time is divided into a sequential and a parallel portion, and the parallel part demonstrates a speedup that is linear in the number of processors. We describe an early version for which only 40% of the program was parallelized, and we describe how this was modified to achieve 90% parallelization while using 15 slave processors and a master. In the latter case, a sequential time of 158 seconds was reduced to 29 seconds using 15 slaves.
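
The figures quoted in this abstract can be set against the textbook Amdahl's-law model, as a rough sanity check. The sketch below is not taken from the paper: the function names are hypothetical, only the 15 slaves are counted as workers (the master is ignored), and the parallel portion is assumed to scale perfectly linearly.

```python
def speedup(t_serial, t_parallel):
    """Measured speedup: serial run time divided by parallel run time."""
    return t_serial / t_parallel


def amdahl_speedup(parallel_fraction, n_workers):
    """Amdahl's-law speedup when `parallel_fraction` of the work scales linearly."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_workers)


def implied_parallel_fraction(measured_speedup, n_workers):
    """Invert Amdahl's law: the parallel fraction that would yield this speedup."""
    return (1.0 - 1.0 / measured_speedup) / (1.0 - 1.0 / n_workers)


measured = speedup(158.0, 29.0)                     # about 5.45
predicted_90 = amdahl_speedup(0.90, 15)             # about 6.25
predicted_40 = amdahl_speedup(0.40, 15)             # about 1.60
implied = implied_parallel_fraction(measured, 15)   # about 0.875
```

Under these assumptions, the measured speedup of about 5.45 sits somewhat below the 6.25 that a 90% parallel fraction would allow on 15 workers. It corresponds to an effective parallel fraction of roughly 87.5%, which is plausible once communication and master overhead are considered.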

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The QU-GENE Computing Cluster (QCC) is a hardware and software solution to the automation and speedup of large QU-GENE (QUantitative GENEtics) simulation experiments that are designed to examine the properties of genetic models, particularly those that involve factorial combinations of treatment levels. QCC automates the management of the distribution of components of the simulation experiments among the networked single-processor computers to achieve the speedup.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This research falls within the topic of the decentralization of environmental management in Brazil and analyzes the main reasons that have led municipalities to institutionalize environmental matters at the local level. To that end, municipalities in the southern region of Santa Catarina were selected, in view of the recent process of creating local environmental management bodies and the peculiarity that the great majority of these municipalities are opting to establish municipal public environmental foundations. The investigation found that one of the main factors behind the creation of municipal environmental management organizations has been the need for greater speed in environmental licensing processes. As for the choice of the foundation as a legal form, the municipalities' main argument has been greater autonomy and independence from the municipal Executive in carrying out their actions.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

A novel high-throughput and scalable unified architecture for computing the transform operations in video codecs for advanced standards is presented in this paper. This structure can be used as a hardware accelerator in modern embedded systems to efficiently compute all the two-dimensional 4 x 4 and 2 x 2 transforms of the H.264/AVC standard. Moreover, its highly flexible design and hardware efficiency allow it to be easily scaled in terms of performance and hardware cost to meet the specific requirements of any given video coding application. Experimental results obtained using a Xilinx Virtex-5 FPGA demonstrated the superior performance and hardware efficiency of the proposed structure, which presents a higher throughput per unit of area than other similar recently published designs targeting the H.264/AVC standard. The results also showed that, when integrated in a multi-core embedded system, this architecture provides speedup factors of about 120x over pure software implementations of the transform algorithms, allowing all the above-mentioned transforms to be computed in real time for Ultra High Definition Video (UHDV) sequences (4,320 x 7,680 @ 30 fps).
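
To give a sense of the real-time workload implied by the quoted UHDV format, the short sketch below counts how many 4 x 4 blocks per second would have to be transformed. It is only a back-of-the-envelope estimate under stated assumptions: the function name is hypothetical, only one sample plane (for example luma) is counted, and each 4 x 4 block is assumed to be transformed exactly once per frame.

```python
def blocks_per_second(width, height, fps, block=4):
    """Number of block x block tiles per second for a single sample plane."""
    if width % block or height % block:
        raise ValueError("frame dimensions must be multiples of the block size")
    # Tiles per frame times frames per second.
    return (width // block) * (height // block) * fps


# 7,680 x 4,320 samples at 30 frames per second, as quoted in the abstract.
uhdv_rate = blocks_per_second(7680, 4320, 30)
```

Under these assumptions the rate comes to 62,208,000 blocks per second, before chroma planes or any 2 x 2 transforms are counted.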