Biblioteca Digital

836 resultados para Overhead conductors

A Novel Scalable Deblocking-Filter Architecture for H.264/AVC and SVC Video Codecs

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A highly parallel and scalable Deblocking Filter (DF) hardware architecture for H.264/AVC and SVC video codecs is presented in this paper. The proposed architecture mainly consists on a coarse grain systolic array obtained by replicating a unique and homogeneous Functional Unit (FU), in which a whole Deblocking-Filter unit is implemented. The proposal is also based on a novel macroblock-level parallelization strategy of the filtering algorithm which improves the final performance by exploiting specific data dependences. This way communication overhead is reduced and a more intensive parallelism in comparison with the existing state-of-the-art solutions is obtained. Furthermore, the architecture is completely flexible, since the level of parallelism can be changed, according to the application requirements. The design has been implemented in a Virtex-5 FPGA, and it allows filtering 4CIF (704 × 576 pixels @30 fps) video sequences in real-time at frequencies lower than 10.16 Mhz.

A methodology for granularity-based control of parallelism in logic programs

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Several types of parallelism can be exploited in logic programs while preserving correctness and efficiency, i.e. ensuring that the parallel execution obtains the same results as the sequential one and the amount of work performed is not greater. However, such results do not take into account a number of overheads which appear in practice, such as process creation and scheduling, which can induce a slow-down, or, at least, limit speedup, if they are not controlled in some way. This paper describes a methodology whereby the granularity of parallel tasks, i.e. the work available under them, is efficiently estimated and used to limit parallelism so that the effect of such overheads is controlled. The run-time overhead associated with the approach is usually quite small, since as much work is done at compile time as possible. Also,a number of run-time optimizations are proposed. Moreover, a static analysis of the overhead associated with the granularity control process is performed in order to decide its convenience. The performance improvements resulting from the incorporation of grain size control are shown to be quite good, specially for systems with medium to large parallel execution overheads.

Scalable software architecture for on-line multi-camera video processing

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this paper we present a scalable software architecture for on-line multi-camera video processing, that guarantees a good trade oﬀ between computational power, scalability and ﬂexibility. The software system is modular and its main blocks are the Processing Units (PUs), and the Central Unit. The Central Unit works as a supervisor of the running PUs and each PU manages the acquisition phase and the processing phase. Furthermore, an approach to easily parallelize the desired processing application has been presented. In this paper, as case study, we apply the proposed software architecture to a multi-camera system in order to eﬃciently manage multiple 2D object detection modules in a real-time scenario. System performance has been evaluated under diﬀerent load conditions such as number of cameras and image sizes. The results show that the software architecture scales well with the number of camera and can easily works with diﬀerent image formats respecting the real time constraints. Moreover, the parallelization approach can be used in order to speed up the processing tasks with a low level of overhead

Designing a high performance parallel logic programming system

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Compilation techniques such as those portrayed by the Warren Abstract Machine(WAM) have greatly improved the speed of execution of logic programs. The research presented herein is geared towards providing additional performance to logic programs through the use of parallelism, while preserving the conventional semantics of logic languages. Two áreas to which special attention is given are the preservation of sequential performance and storage efficiency, and the use of low overhead mechanisms for controlling parallel execution. Accordingly, the techniques used for supporting parallelism are efficient extensions of those which have brought high inferencing speeds to sequential implementations. At a lower level, special attention is also given to design and simulation detail and to the architectural implications of the execution model behavior. This paper offers an overview of the basic concepts and techniques used in the parallel design, simulation tools used, and some of the results obtained to date.

Efficient local unfolding with ancestor stacks

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The most successful unfolding rules used nowadays in the partial evaluation of logic programs are based on well quasi orders (wqo) applied over (covering) ancestors, i.e., a subsequence of the atoms selected during a derivation. Ancestor (sub)sequences are used to increase the specialization power of unfolding while still guaranteeing termination and also to reduce the number of atoms for which the wqo has to be checked. Unfortunately, maintaining the structure of the ancestor relation during unfolding introduces significant overhead. We propose an efficient, practical local unfolding rule based on the notion of covering ancestors which can be used in combination with a wqo and allows a stack-based implementation without losing any opportunities for specialization. Using our technique, certain non-leftmost unfoldings are allowed as long as local unfolding is performed, i.e., we cover depth-first strategies. To deal with practical programs, we propose assertion-based techniques which allow our approach to treat programs that include (Prolog) built-ins and external predicates in a very extensible manner, for the case of leftmost unfolding. Finally, we report on our mplementation of these techniques embedded in a practical partial evaluator, which shows that our techniques, in addition to dealing with practical programs, are also significantly more efficient in time and somewhat more efficient in memory than traditional tree-based implementations. To appear in Theory and Practice of Logic Programming (TPLP).

Global flow analysis as a practical compilation tool

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper addresses the issue of the practicality of global flow analysis in logic program compilation, in terms of speed of the analysis, precisión, and usefulness of the information obtained. To this end, design and implementation aspects are discussed for two practical abstract interpretation-based flow analysis systems: MA , the MCC And-parallel Analyzer and Annotator; and Ms, an experimental mode inference system developed for SB-Prolog. The paper also provides performance data obtained (rom these implementations and, as an example of an application, a study of the usefulness of the mode information obtained in reducing run-time checks in independent and-parallelism.Based on the results obtained, it is concluded that the overhead of global flow analysis is not prohibitive, while the results of analysis can be quite precise and useful.

StreamCloud: An Elastic Parallel-Distributed Stream Processing Engine

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In recent years, applications in domains such as telecommunications, network security or large scale sensor networks showed the limits of the traditional store-then-process paradigm. In this context, Stream Processing Engines emerged as a candidate solution for all these applications demanding for high processing capacity with low processing latency guarantees. With Stream Processing Engines, data streams are not persisted but rather processed on the fly, producing results continuously. Current Stream Processing Engines, either centralized or distributed, do not scale with the input load due to single-node bottlenecks. Moreover, they are based on static configurations that lead to either under or over-provisioning. This Ph.D. thesis discusses StreamCloud, an elastic paralleldistributed stream processing engine that enables for processing of large data stream volumes. Stream- Cloud minimizes the distribution and parallelization overhead introducing novel techniques that split queries into parallel subqueries and allocate them to independent sets of nodes. Moreover, Stream- Cloud elastic and dynamic load balancing protocols enable for effective adjustment of resources depending on the incoming load. Together with the parallelization and elasticity techniques, Stream- Cloud defines a novel fault tolerance protocol that introduces minimal overhead while providing fast recovery. StreamCloud has been fully implemented and evaluated using several real word applications such as fraud detection applications or network analysis applications. The evaluation, conducted using a cluster with more than 300 cores, demonstrates the large scalability, the elasticity and fault tolerance effectiveness of StreamCloud. Resumen En los útimos años, aplicaciones en dominios tales como telecomunicaciones, seguridad de redes y redes de sensores de gran escala se han encontrado con múltiples limitaciones en el paradigma tradicional de bases de datos. En este contexto, los sistemas de procesamiento de flujos de datos han emergido como solución a estas aplicaciones que demandan una alta capacidad de procesamiento con una baja latencia. En los sistemas de procesamiento de flujos de datos, los datos no se persisten y luego se procesan, en su lugar los datos son procesados al vuelo en memoria produciendo resultados de forma continua. Los actuales sistemas de procesamiento de flujos de datos, tanto los centralizados, como los distribuidos, no escalan respecto a la carga de entrada del sistema debido a un cuello de botella producido por la concentración de flujos de datos completos en nodos individuales. Por otra parte, éstos están basados en configuraciones estáticas lo que conducen a un sobre o bajo aprovisionamiento. Esta tesis doctoral presenta StreamCloud, un sistema elástico paralelo-distribuido para el procesamiento de flujos de datos que es capaz de procesar grandes volúmenes de datos. StreamCloud minimiza el coste de distribución y paralelización por medio de una técnica novedosa la cual particiona las queries en subqueries paralelas repartiéndolas en subconjuntos de nodos independientes. Ademas, Stream- Cloud posee protocolos de elasticidad y equilibrado de carga que permiten una optimización de los recursos dependiendo de la carga del sistema. Unidos a los protocolos de paralelización y elasticidad, StreamCloud define un protocolo de tolerancia a fallos que introduce un coste mínimo mientras que proporciona una rápida recuperación. StreamCloud ha sido implementado y evaluado mediante varias aplicaciones del mundo real tales como aplicaciones de detección de fraude o aplicaciones de análisis del tráfico de red. La evaluación ha sido realizada en un cluster con más de 300 núcleos, demostrando la alta escalabilidad y la efectividad tanto de la elasticidad, como de la tolerancia a fallos de StreamCloud.

Integrating software testing and run-time checking in an assertion verification framework

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We have designed and implemented a framework that unifies unit testing and run-time verification (as well as static verification and static debugging). A key contribution of our approach is that a unified assertion language is used for all of these tasks. We first propose methods for compiling runtime checks for (parts of) assertions which cannot be verified at compile-time via program transformation. This transformation allows checking preconditions and postconditions, including conditional postconditions, properties at arbitrary program points, and certain computational properties. The implemented transformation includes several optimizations to reduce run-time overhead. We also propose a minimal addition to the assertion language which allows defining unit tests to be run in order to detect possible violations of the (partial) specifications expressed by the assertions. This language can express for example the input data for performing the unit tests or the number of times that the unit tests should be repeated. We have implemented the framework within the Ciao/CiaoPP system and effectively applied it to the verification of ISO-prolog compliance and to the detection of different types of bugs in the Ciao system source code. Several experimental results are presented that ¡Ilústrate different trade-offs among program size, running time, or levéis of verbosity of the messages shown to the user.

A syntactic approach to combining functional notation, lazy evaluation, and higher-order in LP systems

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Nondeterminism and partially instantiated data structures give logic programming expressive power beyond that of functional programming. However, functional programming often provides convenient syntactic features, such as having a designated implicit output argument, which allow function cali nesting and sometimes results in more compact code. Functional programming also sometimes allows a more direct encoding of lazy evaluation, with its ability to deal with infinite data structures. We present a syntactic functional extensión, used in the Ciao system, which can be implemented in ISO-standard Prolog systems and covers function application, predefined evaluable functors, functional definitions, quoting, and lazy evaluation. The extensión is also composable with higher-order features and can be combined with other extensions to ISO-Prolog such as constraints. We also highlight the features of the Ciao system which help implementation and present some data on the overhead of using lazy evaluation with respect to eager evaluation.

Efficient local unfolding with ancestor stacks for full prolog

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The integration of powerful partial evaluation methods into practical compilers for logic programs is still far from reality. This is related both to 1) efficiency issues and to 2) the complications of dealing with practical programs. Regarding efnciency, the most successful unfolding rules used nowadays are based on structural orders applied over (covering) ancestors, i.e., a subsequence of the atoms selected during a derivation. Unfortunately, maintaining the structure of the ancestor relation during unfolding introduces significant overhead. We propose an efficient, practical local unfolding rule based on the notion of covering ancestors which can be used in combination with any structural order and allows a stack-based implementation without losing any opportunities for specialization. Regarding the second issue, we propose assertion-based techniques which allow our approach to deal with real programs that include (Prolog) built-ins and external predicates in a very extensible manner. Finally, we report on our implementation of these techniques in a practical partial evaluator, embedded in a state of the art compiler which uses global analysis extensively (the Ciao compiler and, specifically, its preprocessor CiaoPP). The performance analysis of the resulting system shows that our techniques, in addition to dealing with practical programs, are also significantly more efficient in time and somewhat more efficient in memory than traditional tree-based implementations.

Towards granularity based control of parallelism in logic programs

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Several types of parallelism can be exploited in logic programs while preserving correctness and efficiency, i.e. ensuring that the parallel execution obtains the same results as the sequential one and the amount of work performed is not greater. However, such results do not take into account a number of overheads which appear in practice, such as process creation and scheduling, which can induce a slow-down, or, at least, limit speedup, if they are not controlled in some way. This paper describes a methodology whereby the granularity of parallel tasks, i.e. the work available under them, is efficiently estimated and used to limit parallelism so that the effect of such overheads is controlled. The run-time overhead associated with the approach is usually quite small, since as much work is done at compile time as possible. Also, a number of run-time optimizations are proposed. Moreover, a static analysis of the overhead associated with the granularity control process is performed in order to decide its convenience. The performance improvements resulting from the incorporation of grain size control are shown to be quite good, specially for systems with médium to large parallel execution overheads.

Analyzing logic programs with dynamic scheduling

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Traditional logic programming languages, such as Prolog, use a fixed left-to-right atom scheduling rule. Recent logic programming languages, however, usually provide more flexible scheduling in which computation generally proceeds leftto- right but in which some calis are dynamically "delayed" until their arguments are sufRciently instantiated to allow the cali to run efficiently. Such dynamic scheduling has a significant cost. We give a framework for the global analysis of logic programming languages with dynamic scheduling and show that program analysis based on this framework supports optimizations which remove much of the overhead of dynamic scheduling.

Task granularity analysis in logic programs

Relevância:

10.00% 10.00%

Publicador:

Resumo:

While logic programming languages offer a great deal of scope for parallelism, there is usually some overhead associated with the execution of goals in parallel because of the work involved in task creation and scheduling. In practice, therefore, the "granularity" of a goal, i.e. an estimate of the work available under it, should be taken into account when deciding whether or not to execute a goal concurrently as a sepárate task. This paper describes a method for estimating the granularity of a goal at compile time. The runtime overhead associated with our approach is usually quite small, and the performance improvements resulting from the incorporation of grainsize control can be quite good. This is shown by means of experimental results.

The CDG, UDG, and MEL methods for automatic compile-time parallelization of logic programs for independent and-parallelism

Relevância:

10.00% 10.00%

Publicador:

Resumo:

There has been significant interest in parallel execution models for logic programs which exploit Independent And-Parallelism (IAP). In these models, it is necessary to determine which goals are independent and therefore eligible for parallel execution and which goals have to wait for which others during execution. Although this can be done at run-time, it can imply a very heavy overhead. In this paper, we present three algorithms for automatic compiletime parallelization of logic programs using IAP. This is done by converting a clause into a graph-based computational form and then transforming this graph into linear expressions based on &-Prolog, a language for IAP. We also present an algorithm which, given a clause, determines if there is any loss of parallelism due to linearization, for the case in which only unconditional parallelism is desired. Finally, the performance of these annotation algorithms is discussed for some benchmark programs.

Complete and efficient methods for supporting side effects in independent/restricted and-parallelism

Relevância:

10.00% 10.00%

Publicador:

Resumo:

It has been shown that it is possible to exploit Independent/Restricted And-parallelism in logic programs while retaining the conventional "don't know" semantics of such programs. In particular, it is possible to parallelize pure Prolog programs while maintaining the semantics of the language. However, when builtin side-effects (such as write or assert) appear in the program, if an identical observable behaviour to that of sequential Prolog implementations is to be preserved, such side-effects have to be properly sequenced. Previously proposed solutions to this problem are either incomplete (lacking, for example, backtracking semantics) or they force sequentialization of significant portions of the execution graph which could otherwise run in parallel. In this paper a series of side-effect synchronization methods are proposed which incur lower overhead and allow more parallelism than those previously proposed. Most importantly, and unlike previous proposals, they have well-defined backward execution behaviour and require only a small modification to a given (And-parallel) Prolog implementation.

«
1
2
...
39
40
41
42
43
44
45
...
55
56
»