43 resultados para Parallel programming (computer)

em Universidad Politécnica de Madrid


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Although studies of a number of parallel implementations of logic programming languages are now available, their results are difficult to interpret due to the multiplicity of factors involved, the effect of each of which is difficult to seprate. In this paper we present the results of a high-level simulation study of or- and independent and-parallelism with a wide selection of Prolog programs that aims to determine the intrinsic amount of parallelism, independently of implementation factors, thus facilitating this separation. We expect this study will be instrumental in better understanding and comparing results from actual implementations, as shown by some examples provided in the paper. In addition, the paper examines some of the issues and tradeoffs associated with the combination of and- and or-parallelism and proposes reasonable solutions based on the simulation data obtained.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

A raz de la aparicin de los procesadores dotados de varios cores, la programacin paralela, un concepto que, por otra parte no era nada nuevo y se conoca desde hace dcadas, sufri un nuevo impulso, pues se crea que se poda superar el techo tecnolgico que haba estado limitando el rendimiento de esta programacin durante aos. Este impulso se ha ido manteniendo hasta la actualidad, movido por la necesidad de sistemas cada vez ms potentes y gracias al abaratamiento de los costes de fabricacin. Esta tendencia ha motivado la aparicin de nuevo software y lenguajes con componentes orientados precisamente al campo de la programacin paralela. Este es el caso del lenguaje Go, desarrollado por Google y lanzado en 2009. Este lenguaje se basa en modelos de concurrencia que lo hacen muy adecuados para abordar desarrollos de naturaleza paralela. Sin embargo, la programacin paralela es un campo complejo y heterogneo, y los programadores son reticentes a utilizar herramientas nuevas, en beneficio de aquellas que ya conocen y les son familiares. Un buen ejemplo son aquellas implementaciones de lenguajes conocidos, pero orientadas a programacin paralela, y que siguen las directrices de un estndar ampliamente reconocido y aceptado. Este es el caso del estndar OpenMP, un Interfaz de Programacin de Aplicaciones (API) flexible, portable y escalable, orientado a la programacin paralela multiproceso en arquitecturas multi-core o multinucleo. Dicho estndar posee actualmente implementaciones en los lenguajes C, C++ y Fortran. Este proyecto nace como un intento de aunar ambos conceptos: un lenguaje emergente con interesantes posibilidades en el campo de la programacin paralela, y un estndar reputado y ampliamente extendido, con el que los programadores se encuentran familiarizados. El objetivo principal es el desarrollo de un conjunto de libreras del sistema (que engloben directivas de compilacin o pragmas, libreras de ejecucin y variables de entorno), soportadas por las caractersticas y los modelos de concurrencia propios de Go; y que aadan funcionalidades propias del estndar OpenMP. La idea es aadir funcionalidades que permitan programar en lenguaje Go utilizando la sintaxis que OpenMP proporciona para otros lenguajes, como Fortan y C/C++ (concretamente, similar a esta ltima), y, de esta forma, dotar al usuario de Go de herramientas para programar estructuras de procesamiento paralelo de forma sencilla y transparente, de la misma manera que lo hara utilizando C/C++.---ABSTRACT---As a result of the appearance of processors equipped with multiple "cores ", parallel programming, a concept which, moreover, it was not new and it was known for decades, suffered a new impulse, because it was believed they could overcome the technological ceiling had been limiting the performance of this program for years. This impulse has been maintained until today, driven by the need for ever more powerful systems and thanks to the decrease in manufacturing costs. This trend has led to the emergence of new software and languages with components guided specifically to the field of parallel programming. This is the case of Go language, developed by Google and released in 2009. This language is based on concurrency models that make it well suited to tackle developments in parallel nature. However, parallel programming is a complex and heterogeneous field, and developers are reluctant to use new tools to benefit those who already know and are familiar. A good example are those implementations from well-known languages, but parallel programming oriented, and witch follow the guidelines of a standard widely recognized and accepted. This is the case of the OpenMP standard, an application programming interface (API), flexible, portable and scalable, parallel programming oriented, and designed for multi-core architectures. This standard currently has implementations in C, C ++ and Fortran. This project was born as an attempt to combine two concepts: an emerging language, with interesting possibilities in the field of parallel programming, and a reputed and widespread standard, with which programmers are familiar with. The main objective is to develop a set of system libraries (which includes compiler directives or pragmas, runtime libraries and environment variables), supported by the characteristics and concurrency patterns of Go; and that add custom features from the OpenMP standard. The idea is to add features that allow programming in Go language using the syntax OpenMP provides for other languages, like Fortran and C / C ++ (specifically, similar to the latter ), and, in this way, provide Go users with tools for programming parallel structures easily and, in the same way they would using C / C ++.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Compilation techniques such as those portrayed by the Warren Abstract Machine(WAM) have greatly improved the speed of execution of logic programs. The research presented herein is geared towards providing additional performance to logic programs through the use of parallelism, while preserving the conventional semantics of logic languages. Two reas to which special attention is given are the preservation of sequential performance and storage efficiency, and the use of low overhead mechanisms for controlling parallel execution. Accordingly, the techniques used for supporting parallelism are efficient extensions of those which have brought high inferencing speeds to sequential implementations. At a lower level, special attention is also given to design and simulation detail and to the architectural implications of the execution model behavior. This paper offers an overview of the basic concepts and techniques used in the parallel design, simulation tools used, and some of the results obtained to date.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

The term "Logic Programming" refers to a variety of computer languages and execution models which are based on the traditional concept of Symbolic Logic. The expressive power of these languages offers promise to be of great assistance in facing the programming challenges of present and future symbolic processing applications in Artificial Intelligence, Knowledge-based systems, and many other areas of computing. The sequential execution speed of logic programs has been greatly improved since the advent of the first interpreters. However, higher inference speeds are still required in order to meet the demands of applications such as those contemplated for next generation computer systems. The execution of logic programs in parallel is currently considered a promising strategy for attaining such inference speeds. Logic Programming in turn appears as a suitable programming paradigm for parallel architectures because of the many opportunities for parallel execution present in the implementation of logic programs. This dissertation presents an efficient parallel execution model for logic programs. The model is described from the source language level down to an "Abstract Machine" level suitable for direct implementation on existing parallel systems or for the design of special purpose parallel architectures. Few assumptions are made at the source language level and therefore the techniques developed and the general Abstract Machine design are applicable to a variety of logic (and also functional) languages. These techniques offer efficient solutions to several areas of parallel Logic Programming implementation previously considered problematic or a source of considerable overhead, such as the detection and handling of variable binding conflicts in AND-Parallelism, the specification of control and management of the execution tree, the treatment of distributed backtracking, and goal scheduling and memory management issues, etc. A parallel Abstract Machine design is offered, specifying data areas, operation, and a suitable instruction set. This design is based on extending to a parallel environment the techniques introduced by the Warren Abstract Machine, which have already made very fast and space efficient sequential systems a reality. Therefore, the model herein presented is capable of retaining sequential execution speed similar to that of high performance sequential systems, while extracting additional gains in speed by efficiently implementing parallel execution. These claims are supported by simulations of the Abstract Machine on sample programs.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

We show a method for parallelizing top down dynamic programs in a straightforward way by a careful choice of a lock-free shared hash table implementation and randomization of the order in which the dynamic program computes its subproblems. This generic approach is applied to dynamic programs for knapsack, shortest paths, and RNA structure alignment, as well as to a state-of-the-art solution for minimizing the mximum number of open stacks. Experimental results are provided on three different modern multicore architectures which show that this parallelization is effective and reasonably scalable. In particular, we obtain over 10 times speedup for 32 threads on the open stacks problem.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

We present a technique to estimate accurate speedups for parallel logic programs with relative independence from characteristics of a given implementation or underlying parallel hardware. The proposed technique is based on gathering accurate data describing one execution at run-time, which is fed to a simulator. Alternative schedulings are then simulated and estimates computed for the corresponding speedups. A tool implementing the aforementioned techniques is presented, and its predictions are compared to the performance of real systems, showing good correlation.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Incorporating the possibility of attaching attributes to variables in a logic programming system has been shown to allow the addition of general constraint solving capabilities to it. This approach is very attractive in that by adding a few primitives any logic programming system can be turned into a generic constraint logic programming system in which constraint solving can be user dened, and at source level - an extreme example of the "glass box" approach. In this paper we propose a different and novel use for the concept of attributed variables: developing a generic parallel/concurrent (constraint) logic programming system, using the same "glass box" flavor. We arge that a system which implements attributed variables and a few additional primitives can be easily customized at source level to implement many of the languages and execution models of parallelism and concurrency currently proposed, in both shared memory and distributed systems. We illustrate this through examples and report on an implementation of our ideas.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

We propose a computational methodology -"B-LOG"-, which offers the potential for an effective implementation of Logic Programming in a parallel computer. We also propose a weighting scheme to guide the search process through the graph and we apply the concepts of parallel "branch and bound" algorithms in order to perform a "best-first" search using an information theoretic bound. The concept of "session" is used to speed up the search process in a succession of similar queries. Within a session, we strongly modify the bounds in a local database, while bounds kept in a global database are weakly modified to provide a better initial condition for other sessions. We also propose an implementation scheme based on a database machine using "semantic paging", and the "B-LOG processor" based on a scoreboard driven controller.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Incorporating the possibility of attaching attributes to variables in a logic programming system has been shown to allow the addition of general constraint solving capabilities to it. This approach is very attractive in that by adding a few primitives any logic programming system can be turned into a generic constraint logic programming system in which constraint solving can be user defined, and at source level - an extreme example of the "glass box" approach. In this paper we propose a different and novel use for the concept of attributed variables: developing a generic parallel/concurrent (constraint) logic programming system, using the same "glass box" flavor. We arge that a system which implements attributed variables and a few additional primitives can be easily customized at source level to implement many of the languages and execution models of parallelism and concurrency currently proposed, in both shared memory and distributed systems. We illustrate this through examples.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Computer programming is known to be one of the most difficult courses for students in the first year of engineering. They are faced with the challenge of abstract thinking and gaining programming skills for the first time. These skills are acquired by continuous practicing from the start of the course. In order to enhance the motivation and dynamism of the learning and assessment processes, we have proposed the use of three educational resources namely screencasts, self-assessment questionnaires and automated grading of assignments. These resources have been made available in Moodle which is a Learning Management System widely used in education environments and adopted by the Telecommunications Engineering School at the Universidad Politcnica de Madrid (UPM). Both teachers and students can enhance the learning and assessment processes through the use of new educational activities such as self-assessment questionnaires and automated grading of assignments. On the other hand, multimedia resources such as screencasts can guide students in complex topics. The resources proposed allow teachers to improve their tutorial actions since they provide immediate feedback and comments to students without the enormous effort of manual correction and evaluation by teachers specially taking into account the large number of students enrolled in the course. In this paper we present the case study where three proposed educational resources were applied. We describe the special features of the course and explain why the use of these resources can both enhance the students? motivation and improve the teaching and learning processes. Our research work was carried out on students attending the "Computer programming" course offered in the first year of a Telecommunications Engineering degree at UPM. This course is mandatory and has more than 450 enrolled students. Our purpose is to encourage the motivation and dynamism of the learning and assessment processes.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

La evolucin de los telfonos mviles inteligentes, dotados de cmaras digitales, est provocando una creciente demanda de aplicaciones cada vez ms complejas que necesitan algoritmos de visin artificial en tiempo real; puesto que el tamao de las seales de vdeo no hace sino aumentar y en cambio el rendimiento de los procesadores de un solo ncleo se ha estancado, los nuevos algoritmos que se diseen para visin artificial han de ser paralelos para poder ejecutarse en mltiples procesadores y ser computacionalmente escalables. Una de las clases de procesadores ms interesantes en la actualidad se encuentra en las tarjetas grficas (GPU), que son dispositivos que ofrecen un alto grado de paralelismo, un excelente rendimiento numrico y una creciente versatilidad, lo que los hace interesantes para llevar a cabo computacin cientfica. En esta tesis se exploran dos aplicaciones de visin artificial que revisten una gran complejidad computacional y no pueden ser ejecutadas en tiempo real empleando procesadores tradicionales. En cambio, como se demuestra en esta tesis, la paralelizacin de las distintas subtareas y su implementacin sobre una GPU arrojan los resultados deseados de ejecucin con tasas de refresco interactivas. Asimismo, se propone una tcnica para la evaluacin rpida de funciones de complejidad arbitraria especialmente indicada para su uso en una GPU. En primer lugar se estudia la aplicacin de tcnicas de sntesis de imgenes virtuales a partir de nicamente dos cmaras lejanas y no paralelasen contraste con la configuracin habitual en TV 3D de cmaras cercanas y paralelascon informacin de color y profundidad. Empleando filtros de mediana modificados para la elaboracin de un mapa de profundidad virtual y proyecciones inversas, se comprueba que estas tcnicas son adecuadas para una libre eleccin del punto de vista. Adems, se demuestra que la codificacin de la informacin de profundidad con respecto a un sistema de referencia global es sumamente perjudicial y debera ser evitada. Por otro lado se propone un sistema de deteccin de objetos mviles basado en tcnicas de estimacin de densidad con funciones locales. Este tipo de tcnicas es muy adecuada para el modelado de escenas complejas con fondos multimodales, pero ha recibido poco uso debido a su gran complejidad computacional. El sistema propuesto, implementado en tiempo real sobre una GPU, incluye propuestas para la estimacin dinmica de los anchos de banda de las funciones locales, actualizacin selectiva del modelo de fondo, actualizacin de la posicin de las muestras de referencia del modelo de primer plano empleando un filtro de partculas multirregin y seleccin automtica de regiones de inters para reducir el coste computacional. Los resultados, evaluados sobre diversas bases de datos y comparados con otros algoritmos del estado del arte, demuestran la gran versatilidad y calidad de la propuesta. Finalmente se propone un mtodo para la aproximacin de funciones arbitrarias empleando funciones continuas lineales a tramos, especialmente indicada para su implementacin en una GPU mediante el uso de las unidades de filtraje de texturas, normalmente no utilizadas para cmputo numrico. La propuesta incluye un riguroso anlisis matemtico del error cometido en la aproximacin en funcin del nmero de muestras empleadas, as como un mtodo para la obtencin de una particin cuasiptima del dominio de la funcin para minimizar el error. ABSTRACT The evolution of smartphones, all equipped with digital cameras, is driving a growing demand for ever more complex applications that need to rely on real-time computer vision algorithms. However, video signals are only increasing in size, whereas the performance of single-core processors has somewhat stagnated in the past few years. Consequently, new computer vision algorithms will need to be parallel to run on multiple processors and be computationally scalable. One of the most promising classes of processors nowadays can be found in graphics processing units (GPU). These are devices offering a high parallelism degree, excellent numerical performance and increasing versatility, which makes them interesting to run scientific computations. In this thesis, we explore two computer vision applications with a high computational complexity that precludes them from running in real time on traditional uniprocessors. However, we show that by parallelizing subtasks and implementing them on a GPU, both applications attain their goals of running at interactive frame rates. In addition, we propose a technique for fast evaluation of arbitrarily complex functions, specially designed for GPU implementation. First, we explore the application of depth-imagebased rendering techniques to the unusual configuration of two convergent, wide baseline cameras, in contrast to the usual configuration used in 3D TV, which are narrow baseline, parallel cameras. By using a backward mapping approach with a depth inpainting scheme based on median filters, we show that these techniques are adequate for free viewpoint video applications. In addition, we show that referring depth information to a global reference system is ill-advised and should be avoided. Then, we propose a background subtraction system based on kernel density estimation techniques. These techniques are very adequate for modelling complex scenes featuring multimodal backgrounds, but have not been so popular due to their huge computational and memory complexity. The proposed system, implemented in real time on a GPU, features novel proposals for dynamic kernel bandwidth estimation for the background model, selective update of the background model, update of the position of reference samples of the foreground model using a multi-region particle filter, and automatic selection of regions of interest to reduce computational cost. The results, evaluated on several databases and compared to other state-of-the-art algorithms, demonstrate the high quality and versatility of our proposal. Finally, we propose a general method for the approximation of arbitrarily complex functions using continuous piecewise linear functions, specially formulated for GPU implementation by leveraging their texture filtering units, normally unused for numerical computation. Our proposal features a rigorous mathematical analysis of the approximation error in function of the number of samples, as well as a method to obtain a suboptimal partition of the domain of the function to minimize approximation error.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Since the early days of logic programming, researchers in the field realized the potential for exploitation of parallelism present in the execution of logic programs. Their high-level nature, the presence of nondeterminism, and their referential transparency, among other characteristics, make logic programs interesting candidates for obtaining speedups through parallel execution. At the same time, the fact that the typical applications of logic programming frequently involve irregular computations, make heavy use of dynamic data structures with logical variables, and involve search and speculation, makes the techniques used in the corresponding parallelizing compilers and run-time systems potentially interesting even outside the field. The objective of this article is to provide a comprehensive survey of the issues arising in parallel execution of logic programming languages along with the most relevant approaches explored to date in the field. Focus is mostly given to the challenges emerging from the parallel execution of Prolog programs. The article describes the major techniques used for shared memory implementation of Or-parallelism, And-parallelism, and combinations of the two. We also explore some related issues, such as memory management, compile-time analysis, and execution visualization.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper describes new improvements for BB-MaxClique (San Segundo et al. in Comput Oper Resour 38(2):571581, 2011 ), a leading maximum clique algorithm which uses bit strings to efficiently compute basic operations during search by bit masking. Improvements include a recently described recoloring strategy in Tomita et al. (Proceedings of the 4th International Workshop on Algorithms and Computation. Lecture Notes in Computer Science, vol 5942. Springer, Berlin, pp 191203, 2010 ), which is now integrated in the bit string framework, as well as different optimization strategies for fast bit scanning. Reported results over DIMACS and random graphs show that the new variants improve over previous BB-MaxClique for a vast majority of cases. It is also established that recoloring is mainly useful for graphs with high densities.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Irregular computations pose sorne of the most interesting and challenging problems in automatic parallelization. Irregularity appears in certain kinds of numerical problems and is pervasive in symbolic applications. Such computations often use dynamic data structures, which make heavy use of pointers. This complicates all the steps of a parallelizing compiler, from independence detection to task partitioning and placement. Starting in the mid 80s there has been significant progress in the development of parallelizing compilers for logic programming (and more recently, constraint programming) resulting in quite capable parallelizers. The typical applications of these paradigms frequently involve irregular computations, and make heavy use of dynamic data structures with pointers, since logical variables represent in practice a well-behaved form of pointers. This arguably makes the techniques used in these compilers potentially interesting. In this paper, we introduce in a tutoral way, sorne of the problems faced by parallelizing compilers for logic and constraint programs and provide pointers to sorne of the significant progress made in the area. In particular, this work has resulted in a series of achievements in the areas of inter-procedural pointer aliasing analysis for independence detection, cost models and cost analysis, cactus-stack memory management, techniques for managing speculative and irregular computations through task granularity control and dynamic task allocation such as work-stealing schedulers), etc.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We present the design and implementation of the and-parallel component of ACE. ACE is a computational model for the full Prolog language that simultaneously exploits both or-parallelism and independent and-parallelism. A high performance implementation of the ACE model has been realized and its performance reported in this paper. We discuss how some of the standard problems which appear when implementing and-parallel systems are solved in ACE. We then propose a number of optimizations aimed at reducing the overheads and the increased memory consumption which occur in such systems when using previously proposed solutions. Finally, we present results from an implementation of ACE which includes the optimizations proposed. The results show that ACE exploits and-parallelism with high efficiency and high speedups. Furthermore, they also show that the proposed optimizations, which are applicable to many other and-parallel systems, significantly decrease memory consumption and increase speedups and absolute performance both in forwards execution and during backtracking.