997 results for structured parallel computations


Relevance:

30.00%

Publisher:

Abstract:

Computers of a non-dedicated cluster are often idle (users attend meetings, have lunch or coffee breaks) or lightly loaded (users carry out simple computations). These underutilized computers can be employed to execute parallel applications, not only during weekends and at night but also during office hours. They must therefore be shared by parallel and sequential applications, which could improve their execution performance. However, there is a lack of experimental studies showing how parallel and sequential applications behave and perform when executing concurrently on clusters. We present here the results of an experimental study into load-balancing-based scheduling of a mixture of parallel and sequential applications on a non-dedicated cluster.

Relevance:

30.00%

Publisher:

Abstract:

Computers of a non-dedicated cluster are often idle (users attend meetings, have lunch or coffee breaks) or lightly loaded (users carry out simple computations to support problem-solving activities). These underutilised computers can be employed to execute parallel applications and can thus be shared by parallel and sequential applications, which could improve their execution performance. However, there is a lack of experimental studies showing application performance and system utilization when parallel and sequential applications execute concurrently, or when multiple parallel applications execute concurrently, on a non-dedicated cluster. Here we present the results of an experimental study into load-balancing-based scheduling of mixtures of NAS Parallel Benchmarks and BYTE sequential applications on a very low-cost non-dedicated cluster. The study showed that the proposed sharing provided a performance boost, compared with executing the parallel load in isolation on a reduced number of computers, as well as better cluster utilization. The results were used not only to validate other researchers' results generated by simulation but also to support our research mission of widening the use of non-dedicated clusters. These promising results could prompt further studies to convince universities, business and industry, which require large amounts of computing resources, to run parallel applications on the non-dedicated clusters they already own.

Relevance:

30.00%

Publisher:

Abstract:

Stable wakefulness requires orexin/hypocretin neurons (OHNs) and OHR2 receptors. OHNs sense diverse environmental cues and control arousal accordingly. For unknown reasons, OHNs contain multiple excitatory transmitters, including OH peptides and glutamate. To analyze their cotransmission within computational frameworks for control, we optogenetically stimulated OHNs and examined resulting outputs (spike patterns) in a downstream arousal regulator, the histamine neurons (HANs). OHR2s were essential for sustained HAN outputs. OHR2-dependent HAN output increased linearly during constant OHN input, suggesting that the OHN→HAN(OHR2) module may function as an integral controller. OHN stimulation evoked OHR2-dependent slow postsynaptic currents, similar to mid-nanomolar OH concentrations. Conversely, glutamate-dependent output transiently communicated OHN input onset, peaking rapidly and then decaying alongside OHN→HAN glutamate currents. Blocking glutamate-driven spiking did not affect OH-driven spiking and vice versa, suggesting isolation (low cross-modulation) of outputs. Therefore, in arousal regulators, cotransmitters may translate distinct features of OHN activity into parallel, nonredundant control signals for downstream effectors.
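
A minimal sketch (not from the paper) of the control-theoretic claim: an integral controller turns a constant input into a linearly ramping output, which is the signature reported for OHR2-dependent HAN firing during constant OHN stimulation. The gain and time step below are illustrative.

```python
# Hedged sketch: a discrete-time integrator ramps linearly for constant input,
# mirroring the reported OHR2-dependent linear increase in HAN output.

def integral_controller(inputs, gain=1.0, dt=0.1):
    """Accumulate the input over time; output = gain * integral of input."""
    integral = 0.0
    outputs = []
    for u in inputs:
        integral += u * dt              # integrate the (constant) OHN drive
        outputs.append(gain * integral)
    return outputs

constant_input = [1.0] * 50             # constant drive for 5 s at dt = 0.1 s
ramp = integral_controller(constant_input)
print(ramp[:5], ramp[-1])               # output grows linearly toward 5.0
```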

Relevance:

30.00%

Publisher:

Abstract:

Since the early days of logic programming, researchers in the field realized the potential for exploitation of parallelism present in the execution of logic programs. Their high-level nature, the presence of nondeterminism, and their referential transparency, among other characteristics, make logic programs interesting candidates for obtaining speedups through parallel execution. At the same time, the fact that the typical applications of logic programming frequently involve irregular computations, make heavy use of dynamic data structures with logical variables, and involve search and speculation, makes the techniques used in the corresponding parallelizing compilers and run-time systems potentially interesting even outside the field. The objective of this article is to provide a comprehensive survey of the issues arising in parallel execution of logic programming languages along with the most relevant approaches explored to date in the field. Focus is mostly given to the challenges emerging from the parallel execution of Prolog programs. The article describes the major techniques used for shared memory implementation of Or-parallelism, And-parallelism, and combinations of the two. We also explore some related issues, such as memory management, compile-time analysis, and execution visualization.
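
A hedged sketch of the goal-level independent and-parallelism the survey discusses: two subgoals that share no unbound variables can run simultaneously, and their solutions are combined afterwards. The goals below are hypothetical stand-ins for Prolog subgoals, expressed in Python for illustration.

```python
# Hedged sketch: independent and-parallel execution of two "goals".
from concurrent.futures import ProcessPoolExecutor

def goal_a(x):
    return [x + i for i in range(3)]      # stand-in: solutions for p(X)

def goal_b(y):
    return [y * i for i in range(1, 3)]   # stand-in: solutions for q(Y)

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        fa = pool.submit(goal_a, 10)      # fork: independent goals in parallel
        fb = pool.submit(goal_b, 5)
        # join: combine solutions, as the conjunction p(X), q(Y) would
        print([(a, b) for a in fa.result() for b in fb.result()])
```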

Relevance:

30.00%

Publisher:

Abstract:

In parallel to the effort of creating Linked Open Data for the World Wide Web, there are a number of projects aimed at developing the same technologies for use in closed environments such as private enterprises. In this paper, we present the results of research on interlinking structured data for use in Idea Management Systems, a still rare breed of knowledge management systems dedicated to innovation management. In our study, we show the process of extending an ontology that initially covers only the Idea Management System structure towards linking with distributed enterprise data and public data using Semantic Web technologies. Furthermore, we point out how the established links can help to solve key problems of contemporary Idea Management Systems.
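
A hedged sketch of what such an interlink could look like using the rdflib library (assumed available). The IMS namespace, the idea URI and the DBpedia target are illustrative placeholders, not the paper's actual ontology.

```python
# Hedged sketch: linking an Idea Management System record to public Linked Data.
from rdflib import Graph, Namespace, URIRef, Literal
from rdflib.namespace import OWL, RDF, RDFS

IMS = Namespace("http://example.org/ims#")   # hypothetical IMS ontology
g = Graph()

idea = URIRef("http://example.org/ideas/42")
g.add((idea, RDF.type, IMS.Idea))
g.add((idea, RDFS.label, Literal("Reusable packaging for shipping")))
# Interlink: relate the idea's topic to a public DBpedia concept
g.add((IMS.hasTopic, RDF.type, OWL.ObjectProperty))
g.add((idea, IMS.hasTopic, URIRef("http://dbpedia.org/resource/Packaging")))

print(g.serialize(format="turtle"))
```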

Relevance:

30.00%

Publisher:

Abstract:

Irregular computations pose some of the most interesting and challenging problems in automatic parallelization. Irregularity appears in certain kinds of numerical problems and is pervasive in symbolic applications. Such computations often use dynamic data structures, which make heavy use of pointers. This complicates all the steps of a parallelizing compiler, from independence detection to task partitioning and placement. Starting in the mid-80s there has been significant progress in the development of parallelizing compilers for logic programming (and, more recently, constraint programming), resulting in quite capable parallelizers. The typical applications of these paradigms frequently involve irregular computations and make heavy use of dynamic data structures with pointers, since logical variables represent in practice a well-behaved form of pointers. This arguably makes the techniques used in these compilers potentially interesting. In this paper, we introduce in a tutorial way some of the problems faced by parallelizing compilers for logic and constraint programs and provide pointers to some of the significant progress made in the area. In particular, this work has resulted in a series of achievements in the areas of inter-procedural pointer aliasing analysis for independence detection, cost models and cost analysis, cactus-stack memory management, and techniques for managing speculative and irregular computations through task granularity control and dynamic task allocation (such as work-stealing schedulers).
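
A hedged sketch of the task granularity control mentioned above: run a subcomputation in parallel only when a cheap cost estimate exceeds a threshold, otherwise execute it sequentially to avoid parallelisation overhead. The threshold and cost model are illustrative placeholders.

```python
# Hedged sketch: granularity control via a cost-model threshold.
from concurrent.futures import ThreadPoolExecutor

GRANULARITY_THRESHOLD = 10_000   # assumed cut-off, tuned per platform

def estimated_cost(task):
    return task["size"]          # stand-in for a compile-time cost model

def run(task):
    return sum(range(task["size"]))

def schedule(tasks):
    results, pool = [], ThreadPoolExecutor()
    for t in tasks:
        if estimated_cost(t) >= GRANULARITY_THRESHOLD:
            results.append(pool.submit(run, t))   # big enough: go parallel
        else:
            results.append(run(t))                # too small: stay sequential
    return [r.result() if hasattr(r, "result") else r for r in results]

print(schedule([{"size": 100}, {"size": 50_000}]))
```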

Relevance:

30.00%

Publisher:

Abstract:

We present a parallel graph narrowing machine, which is used to implement a functional logic language on a shared memory multiprocessor. It is an extension of an abstract machine for a purely functional language. The result is a programmed graph reduction machine which integrates the mechanisms of unification, backtracking, and independent and-parallelism. In the machine, the subexpressions of an expression can run in parallel. In the case of backtracking, the structure of an expression is used to avoid the reevaluation of subexpressions as far as possible. Deterministic computations are detected. Their results are maintained and need not be reevaluated after backtracking.
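
A hedged sketch of the last point: results of subexpressions detected to be deterministic are cached so backtracking never forces their reevaluation. The determinism flag below is a placeholder; the machine in the paper detects determinism itself.

```python
# Hedged sketch: keep results of deterministic subexpressions across backtracks.
cache = {}

def evaluate(expr, deterministic):
    if deterministic and expr in cache:
        return cache[expr]          # reuse the result after backtracking
    result = sum(expr)              # stand-in for graph narrowing of expr
    if deterministic:
        cache[expr] = result        # maintain the result for later backtracks
    return result

print(evaluate((1, 2, 3), deterministic=True))   # evaluated once
print(evaluate((1, 2, 3), deterministic=True))   # served from the cache
```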

Relevance:

30.00%

Publisher:

Abstract:

Irregular computations pose some of the most interesting and challenging problems in automatic parallelization. Irregularity appears in certain kinds of numerical problems and is pervasive in symbolic applications. Such computations often use dynamic data structures which make heavy use of pointers. This complicates all the steps of a parallelizing compiler, from independence detection to task partitioning and placement. In the past decade there has been significant progress in the development of parallelizing compilers for logic programming and, more recently, constraint programming. The typical applications of these paradigms frequently involve irregular computations, which arguably makes the techniques used in these compilers potentially interesting. In this paper we introduce in a tutorial way some of the problems faced by parallelizing compilers for logic and constraint programs. These include the need for inter-procedural pointer aliasing analysis for independence detection and having to manage speculative and irregular computations through task granularity control and dynamic task allocation. We also provide pointers to some of the progress made in these areas. In the associated talk we demonstrate representatives of several generations of these parallelizing compilers.
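
A hedged sketch of the dynamic task allocation mentioned above, in the work-stealing style: each worker owns a deque, the owner pushes and pops at one end, and idle workers steal from the other end of a victim's deque. Locking is omitted for illustration.

```python
# Hedged sketch: work stealing with per-worker deques.
from collections import deque

class Worker:
    def __init__(self):
        self.tasks = deque()

    def pop(self):                       # owner takes from the "hot" end
        return self.tasks.pop() if self.tasks else None

    def steal_from(self, victim):        # thief takes from the "cold" end
        return victim.tasks.popleft() if victim.tasks else None

workers = [Worker() for _ in range(4)]
workers[0].tasks.extend(range(8))        # all work initially on one worker

thief = workers[3]
task = thief.steal_from(workers[0])      # an idle worker rebalances the load
print("stolen task:", task)
```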

Relevance:

30.00%

Publisher:

Abstract:

It is very often the case that programs require passing, maintaining, and updating some notion of state. Prolog programs often implement such stateful computations by carrying this state in predicate arguments (or, alternatively, in the internal database). This often causes code obfuscation, complicates code reuse, introduces dependencies on the data model, and is prone to incorrect propagation of the state information among predicate calls. To partly solve these problems, we introduce contexts as a consistent mechanism for specifying implicit arguments and their threading in clause goals. We propose a notation and an interpretation for contexts, ranging from single goals to complete programs, give an intuitive semantics, and describe a translation into standard Prolog. We also discuss a particular lightweight implementation in Ciao Prolog, and we show the usefulness of our proposals on a series of examples and applications, including code directly using contexts, DCGs, extended DCGs, logical loops and other custom control structures.
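
A hedged sketch of the kind of translation described: a clause written against an implicit context is expanded so the state is threaded through every call as an explicit extra argument pair, much as DCG expansion does. The predicate names are hypothetical, rendered in Python for illustration.

```python
# Hedged sketch: implicit context threaded explicitly, DCG-style.
# A hypothetical "count --> add(1)." expands to "count(S0, S) :- add(1, S0, S)."

def step(s0):
    return add(1, s0)    # expanded clause: context S0 passed on, S returned

def add(n, s0):
    return s0 + n        # the context value is updated, never mutated in place

s = 0                    # initial context
for _ in range(3):
    s = step(s)          # state threaded explicitly: S0 -> S -> ...
print(s)                 # 3
```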

Relevance:

30.00%

Publisher:

Abstract:

We consider the problem of supporting goal-level independent and-parallelism (IAP) in the presence of non-determinism. IAP is exploited when two or more goals which will not interfere at run time are scheduled for simultaneous execution. Backtracking over non-deterministic parallel goals runs into the well-known trapped goal and garbage slot problems. The proposed solutions for these problems generally require complex low-level machinery which makes systems difficult to maintain and extend, and in some cases can even affect sequential execution performance. In this paper we propose a novel solution to the problem of trapped nondeterministic goals and garbage slots which is based on a single stack reordering operation and offers several advantages over previous proposals. While the implementation of this operation itself is not simple, in return it does not impose constraints on the scheduler. As a result, the scheduler and the rest of the run-time machinery can safely ignore the trapped goal and garbage slot problems and their implementation is greatly simplified. Also, standard sequential execution remains unaffected. In addition to describing the solution we report on an implementation and provide performance results. We also suggest other possible applications of the proposed approach beyond parallel execution.
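
A hedged toy sketch of the idea behind the reordering operation: a nondeterministic goal "trapped" under younger stack segments is moved to the top of the stack so it can backtrack normally. A real implementation must also relocate pointers into the moved segment; this version only reorders.

```python
# Hedged sketch: untrap a nondeterministic goal by reordering stack segments.

def reorder(stack, trapped_index):
    """Move the trapped segment above everything pushed after it."""
    trapped = stack.pop(trapped_index)
    stack.append(trapped)
    return stack

stack = ["goal_a (det)", "goal_b (nondet, trapped)", "goal_c (det)"]
print(reorder(stack, 1))
# ['goal_a (det)', 'goal_c (det)', 'goal_b (nondet, trapped)']
```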

Relevance:

30.00%

Publisher:

Abstract:

Nowadays, robots have made their way into real applications that were prohibitive and unthinkable thirty years ago. This is mainly due to the increase in computational power and the evolution of the theoretical fields of robotics and control. Even though there is plenty of information in the current literature on these topics, it is not easy to find clear guidance on how to proceed in order to design and implement a controller for a robot. In general, the design of a controller requires a complete understanding and knowledge of the system to be controlled. Therefore, for advanced control techniques, the system must first be identified. This objective is cumbersome and never straightforward, requires great expertise, and demands that certain criteria be adopted. The problem of designing a controller is even more complex when dealing with Parallel Manipulators (PMs), since their closed-loop structures give rise to highly nonlinear systems. On this basis the current work is developed, which intends to summarize and gather all the concepts and experience involved in the control of a hydraulic Parallel Manipulator. The main objective of this thesis is to provide a guide covering all the steps involved in designing advanced control techniques for PMs. The analysis of the PM under study is carried down to the core of the mechanism: the hydraulic actuators. The actuators are modeled and experimentally identified. Additionally, some considerations regarding traditional PID controllers are presented, and an adaptive controller is implemented. From a macro perspective, the kinematic and dynamic models of the PM are presented. Based on the model of the system, and extending the adaptive controller of the actuator, a control strategy for the PM is developed and its performance is analyzed in simulation.
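
A hedged sketch of the traditional PID baseline the thesis mentions, u = Kp·e + Ki·∫e dt + Kd·de/dt, closed around a toy first-order plant. The gains and plant are illustrative, not identified from the actual hydraulic rig.

```python
# Hedged sketch: discrete PID control loop on a toy first-order actuator.

def pid_step(e, state, kp=2.0, ki=0.5, kd=0.1, dt=0.01):
    integral, prev_e = state
    integral += e * dt                  # I term: accumulate error
    derivative = (e - prev_e) / dt      # D term: error rate of change
    u = kp * e + ki * integral + kd * derivative
    return u, (integral, e)

setpoint, y, state = 1.0, 0.0, (0.0, 0.0)
for _ in range(500):
    u, state = pid_step(setpoint - y, state)
    y += (u - y) * 0.01                 # toy first-order plant response
print(round(y, 3))                      # output approaches the setpoint
```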

Relevance:

30.00%

Publisher:

Abstract:

The evolution of smartphones equipped with digital cameras is driving a growing demand for ever more complex applications that need to rely on real-time computer vision algorithms. However, video signals are only increasing in size, whereas the performance of single-core processors has stagnated in recent years. Consequently, new computer vision algorithms need to be parallel, so that they can run on multiple processors and be computationally scalable. One of the most promising classes of processors nowadays is found in graphics processing units (GPUs): devices offering a high degree of parallelism, excellent numerical performance and increasing versatility, which makes them attractive for scientific computing. In this thesis, we explore two computer vision applications whose high computational complexity precludes them from running in real time on traditional uniprocessors, and show that by parallelizing their subtasks and implementing them on a GPU, both applications attain their goal of running at interactive frame rates. In addition, we propose a technique for the fast evaluation of arbitrarily complex functions, specially designed for GPU implementation.

First, we explore the application of depth-image-based rendering techniques to the unusual configuration of two convergent, wide-baseline cameras, in contrast to the narrow-baseline, parallel cameras usual in 3D TV. Using a backward-mapping approach with a depth-inpainting scheme based on median filters, we show that these techniques are adequate for free-viewpoint video applications. We also show that referring depth information to a global reference system is ill-advised and should be avoided.

Second, we propose a background subtraction system based on kernel density estimation. Such techniques are very well suited to modelling complex scenes with multimodal backgrounds, but have seen little use due to their large computational and memory cost. The proposed system, implemented in real time on a GPU, features novel proposals for dynamic kernel bandwidth estimation in the background model, selective update of the background model, update of the positions of the reference samples of the foreground model using a multi-region particle filter, and automatic selection of regions of interest to reduce computational cost. The results, evaluated on several databases and compared with other state-of-the-art algorithms, demonstrate the high quality and versatility of our proposal.

Finally, we propose a general method for approximating arbitrarily complex functions using continuous piecewise linear functions, specially formulated for GPU implementation by leveraging the texture filtering units, normally unused for numerical computation. Our proposal includes a rigorous mathematical analysis of the approximation error as a function of the number of samples, as well as a method to obtain a quasi-optimal partition of the domain of the function to minimize the approximation error.
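
A hedged sketch of the final idea: sample an arbitrary function at uniformly spaced knots and evaluate it by linear interpolation, which is the operation GPU texture filtering units compute in hardware. Here numpy.interp plays the role of the texture fetch; the sample function and knot count are illustrative.

```python
# Hedged sketch: continuous piecewise linear approximation via interpolation.
import numpy as np

def make_pwl(f, lo, hi, n_samples):
    xs = np.linspace(lo, hi, n_samples)      # uniform partition of the domain
    ys = f(xs)                               # precomputed "texture" contents
    return lambda x: np.interp(x, xs, ys)    # linear "texture filtering" lookup

f = lambda x: np.exp(-x) * np.sin(5 * x)     # arbitrary, moderately complex f
f_pwl = make_pwl(f, 0.0, 2.0, 64)

x = np.linspace(0.0, 2.0, 1000)
print("max abs error:", np.max(np.abs(f(x) - f_pwl(x))))   # shrinks with knots
```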

Relevance:

30.00%

Publisher:

Abstract:

Purpose – The purpose of this paper is to investigate an underexplored aspect of outsourcing involving a mixed strategy in which parallel production is continued in-house at the same time as outsourcing occurs. Design/methodology/approach – The study applied a multiple case study approach and drew on qualitative data collected through in-depth interviews with wood product manufacturing companies. Findings – The paper posits that there should be a variety of mixed strategies between the two governance forms of “make” or “buy.” In order to address how companies should consider the extent to which they outsource, the analysis was structured around two ends of a continuum: in-house dominance or outsourcing dominance. With an in-house-dominant strategy, outsourcing complements an organization's own production to optimize capacity utilization and outsource less cost-efficient production, or is used as a tool to learn how to outsource. With an outsourcing-dominant strategy, in-house production helps maintain complementary competencies and avoids lock-in risk. Research limitations/implications – This paper takes initial steps toward an exploration of different mixed strategies. Additional research is required to understand the costs of different mixed strategies compared with insourcing and outsourcing, and to study parallel production from a supplier viewpoint. Practical implications – This paper suggests that managers should think twice before rushing to a “me too” outsourcing strategy in which in-house capacities are completely closed. It is important to take a dynamic view of outsourcing that maintains a mixed strategy as an option, particularly in situations that involve an underdeveloped supplier market and/or as a way to develop resources over the long term. Originality/value – The concept of combining both “make” and “buy” is not new. However, little if any research has focussed explicitly on exploring the variety of different types of mixed strategies that exist on the continuum between insourcing and outsourcing.

Relevance:

30.00%

Publisher:

Abstract:

A large class of computational problems is characterised by frequent synchronisation and computational requirements which change as a function of time. When such a problem is solved on a message-passing multiprocessor machine [5], the combination of these characteristics leads to system performance which deteriorates over time. As the communication performance of parallel hardware steadily improves, load balance becomes a dominant factor in obtaining high parallel efficiency. Performance can be improved by periodic redistribution of computational load; however, redistribution can sometimes be very costly. We study the issue of deciding when to invoke a global load re-balancing mechanism. Such a decision policy must actively weigh the costs of remapping against the performance benefits, and should be general enough to apply automatically to a wide range of computations. This paper discusses a generic strategy for Dynamic Load Balancing (DLB) in unstructured mesh computational mechanics applications. The strategy is intended to handle varying levels of load change throughout the run. The major issues involved in a generic dynamic load balancing scheme are investigated, together with techniques to automate the implementation of a dynamic load balancing mechanism within the Computer Aided Parallelisation Tools (CAPTools) environment, a semi-automatic tool for the parallelisation of mesh-based FORTRAN codes.
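
A hedged sketch of the remapping decision: invoke global load balancing only when the projected cost of continuing with the current imbalance exceeds the one-off cost of redistributing. All costs below are illustrative placeholders, not the paper's cost model.

```python
# Hedged sketch: weigh remapping cost against projected imbalance penalty.

def should_rebalance(loads, remap_cost, remaining_steps):
    max_load = max(loads)
    avg_load = sum(loads) / len(loads)
    imbalance_penalty_per_step = max_load - avg_load   # idle time per step
    return imbalance_penalty_per_step * remaining_steps > remap_cost

loads = [90.0, 60.0, 55.0, 50.0]   # per-processor work in one iteration
print(should_rebalance(loads, remap_cost=500.0, remaining_steps=100))  # True
print(should_rebalance(loads, remap_cost=500.0, remaining_steps=10))   # False
```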

Relevance:

30.00%

Publisher:

Abstract:

A method is outlined for optimising graph partitions which arise in mapping unstructured mesh calculations to parallel computers. The method employs a combination of iterative techniques to both evenly balance the workload and minimise the number and volume of interprocessor communications. The techniques are designed to work efficiently in parallel as well as sequentially and, when combined with a fast direct partitioning technique (such as the Greedy algorithm) to give an initial partition, the resulting two-stage process proves to be both a powerful and flexible solution to the static graph-partitioning problem. The algorithms can also be used for dynamic load-balancing, and a clustering technique can additionally be employed to speed up the whole process. Experiments indicate that the resulting parallel code can provide high-quality partitions, independent of the initial partition, within a few seconds.
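
A hedged toy sketch of the two-stage idea: given an initial partition (e.g. from the Greedy algorithm), an iterative pass moves boundary vertices whenever the move reduces the edge cut without exceeding a load-imbalance tolerance. This is far simpler than the paper's parallel optimiser but shares its balance-plus-cut objective; the graph and tolerance are illustrative.

```python
# Hedged sketch: refine a two-way graph partition by gain-based vertex moves.

def cut_size(graph, part):
    """Number of edges whose endpoints lie in different partitions."""
    return sum(1 for u, vs in graph.items() for v in vs
               if u < v and part[u] != part[v])

def refine(graph, part, max_imbalance=2):
    sizes = [sum(1 for p in part.values() if p == s) for s in (0, 1)]
    for u in graph:
        src, dst = part[u], 1 - part[u]
        if abs((sizes[src] - 1) - (sizes[dst] + 1)) > max_imbalance:
            continue                                  # move would unbalance load
        gain = (sum(part[v] == dst for v in graph[u])
                - sum(part[v] == src for v in graph[u]))
        if gain > 0:                                  # move reduces the cut
            part[u] = dst
            sizes[src] -= 1
            sizes[dst] += 1
    return part

graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
         3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}          # two triangles + a bridge
part = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}           # poor initial partition
print("cut before:", cut_size(graph, part))           # 5
print("cut after:", cut_size(graph, refine(graph, part)))  # 1
```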