936 resultados para Thread safe parallel run-time
Resumo:
Logic programming systems which exploit and-parallelism among non-deterministic goals rely on notions of independence among those goals in order to ensure certain efficiency properties. "Non-strict" independence (NSI) is a more relaxed notion than the traditional notion of "strict" independence (SI) which still ensures the relevant efficiency properties and can allow considerable more parallelism than SI. However, all compilation technology developed to date has been based on SI, because of the intrinsic complexity of exploiting NSI. This is related to the fact that NSI cannot be determined "a priori" as SI. This paper filis this gap by developing a technique for compile-time detection and annotation of NSI. It also proposes algorithms for combined compiletime/ run-time detection, presenting novel run-time checks for this type of parallelism. Also, a transformation procedure to eliminate shared variables among parallel goals is presented, aimed at performing as much work as possible at compile-time. The approach is based on the knowledge of certain properties regarding the run-time instantiations of program variables —sharing and freeness— for which compile-time technology is available, with new approaches being currently proposed. Thus, the paper does not deal with the analysis itself, but rather with how the analysis results can be used to parallelize programs.
Resumo:
This paper presents and develops a generalized concept of Non-Strict Independent And Parallelism (NSIAP). NSIAP extends the applicability of Independent And- Parallelism (IAP) by enlarging the class of goals which are eligible for parallel execution. At the same time it maintains IAP's ability to run non-deterministic goals in parallel and to preserve the computational complexity expected in the execution of the program by the programmer. First, a parallel execution framework is defined and some fundamental correctness results, in the sense of equivalence of solutions with the sequential model, are discussed for this framework. The issue of efficiency is then considered. Two new definitions of NSI are given for the cases of puré and impure goals respectively and efficiency results are provided for programs parallelized under these definitions which include treatment of the case of goal failure: not only is reduction of execution time guaranteed (modulo run-time overheads) in the absence of failure but it is also shown that in the worst case of failure no speed-down will occur. In addition to applying to NSI, these results carry over and complete previous results shown in the context of IAP which did not deal with the case of goal failure. Finally, some practical examples of the application of the NSIAP concept to the parallelization of a set of programs are presented and performance results, showing the advantage of using NSI, are given.
Resumo:
This paper presents and proves some fundamental results for independent and-parallelism (IAP). First, the paper treats the issues of correctness and efficiency: after defining strict and non-strict goal independence, it is proved that if strictly independent goals are executed in parallel the solutions obtained are the same as those produced by standard sequential execution. It is also shown that, in the absence of failure, the parallel proof procedure doesn't genérate any additional work (with respect to standard SLDresolution) while the actual execution time is reduced. The same results hold even if non-strictly independent goals are executed in parallel, provided a trivial rewriting of such goals is performed. In addition, and most importantly, treats the issue of compile-time generation of IAP by proposing conditions, to be written at compile-time, to efficiently check strict and non-strict goal independence at run-time and proving the sufficiency of such conditions. It is also shown how simpler conditions can be constructed if some information regarding the binding context of the goals to be executed in parallel is available to the compiler trough either local or program-level analysis. These results therefore provide a formal basis for the automatic compile-time generation of IAP. As a corollary of such results, the paper also proves that negative goals are always non-strictly independent, and that goals which share a first occurrence of an existential variable are never independent.
Resumo:
Traditional schemes for abstract interpretation-based global analysis of logic programs generally focus on obtaining procedure argument mode and type information. Variable sharing information is often given only the attention needed to preserve the correctness of the analysis. However, such sharing information can be very useful. In particular, it can be used for predicting run-time goal independence, which can eliminate costly run-time checks in and-parallel execution. In this paper, a new algorithm for doing abstract interpretation in logic programs is described which infers the dependencies of the terms bound to program variables with increased precisión and at all points in the execution of the program, rather than just at a procedure level. Algorithms are presented for computing abstract entry and success substitutions which extensively keep track of variable aliasing and term dependence information. The algorithms are illustrated with examples.
Resumo:
Distributed parallel execution systems speed up applications by splitting tasks into processes whose execution is assigned to different receiving nodes in a high-bandwidth network. On the distributing side, a fundamental problem is grouping and scheduling such tasks such that each one involves sufñcient computational cost when compared to the task creation and communication costs and other such practical overheads. On the receiving side, an important issue is to have some assurance of the correctness and characteristics of the code received and also of the kind of load the particular task is going to pose, which can be specified by means of certificates. In this paper we present in a tutorial way a number of general solutions to these problems, and illustrate them through their implementation in the Ciao multi-paradigm language and program development environment. This system includes facilities for parallel and distributed execution, an assertion language for specifying complex programs properties (including safety and resource-related properties), and compile-time and run-time tools for performing automated parallelization and resource control, as well as certification of programs with resource consumption assurances and efñcient checking of such certificates.
Resumo:
Esta tesis doctoral se enmarca dentro de la computación con membranas. Se trata de un tipo de computación bio-inspirado, concretamente basado en las células de los organismos vivos, en las que se producen múltiples reacciones de forma simultánea. A partir de la estructura y funcionamiento de las células se han definido diferentes modelos formales, denominados P sistemas. Estos modelos no tratan de modelar el comportamiento biológico de una célula, sino que abstraen sus principios básicos con objeto de encontrar nuevos paradigmas computacionales. Los P sistemas son modelos de computación no deterministas y masivamente paralelos. De ahí el interés que en los últimos años estos modelos han suscitado para la resolución de problemas complejos. En muchos casos, consiguen resolver de forma teórica problemas NP-completos en tiempo polinómico o lineal. Por otra parte, cabe destacar también la aplicación que la computación con membranas ha tenido en la investigación de otros muchos campos, sobre todo relacionados con la biología. Actualmente, una gran cantidad de estos modelos de computación han sido estudiados desde el punto de vista teórico. Sin embargo, el modo en que pueden ser implementados es un reto de investigación todavía abierto. Existen varias líneas en este sentido, basadas en arquitecturas distribuidas o en hardware dedicado, que pretenden acercarse en lo posible a su carácter no determinista y masivamente paralelo, dentro de un contexto de viabilidad y eficiencia. En esta tesis doctoral se propone la realización de un análisis estático del P sistema, como vía para optimizar la ejecución del mismo en estas plataformas. Se pretende que la información recogida en tiempo de análisis sirva para configurar adecuadamente la plataforma donde se vaya a ejecutar posteriormente el P sistema, obteniendo como consecuencia una mejora en el rendimiento. Concretamente, en esta tesis se han tomado como referencia los P sistemas de transiciones para llevar a cabo el estudio de dicho análisis estático. De manera un poco más específica, el análisis estático propuesto en esta tesis persigue que cada membrana sea capaz de determinar sus reglas activas de forma eficiente en cada paso de evolución, es decir, aquellas reglas que reúnen las condiciones adecuadas para poder ser aplicadas. En esta línea, se afronta el problema de los estados de utilidad de una membrana dada, que en tiempo de ejecución permitirán a la misma conocer en todo momento las membranas con las que puede comunicarse, cuestión que determina las reglas que pueden aplicarse en cada momento. Además, el análisis estático propuesto en esta tesis se basa en otra serie de características del P sistema como la estructura de membranas, antecedentes de las reglas, consecuentes de las reglas o prioridades. Una vez obtenida toda esta información en tiempo de análisis, se estructura en forma de árbol de decisión, con objeto de que en tiempo de ejecución la membrana obtenga las reglas activas de la forma más eficiente posible. Por otra parte, en esta tesis se lleva a cabo un recorrido por un número importante de arquitecturas hardware y software que diferentes autores han propuesto para implementar P sistemas. Fundamentalmente, arquitecturas distribuidas, hardware dedicado basado en tarjetas FPGA y plataformas basadas en microcontroladores PIC. El objetivo es proponer soluciones que permitan implantar en dichas arquitecturas los resultados obtenidos del análisis estático (estados de utilidad y árboles de decisión para reglas activas). En líneas generales, se obtienen conclusiones positivas, en el sentido de que dichas optimizaciones se integran adecuadamente en las arquitecturas sin penalizaciones significativas. Summary Membrane computing is the focus of this doctoral thesis. It can be considered a bio-inspired computing type. Specifically, it is based on living cells, in which many reactions take place simultaneously. From cell structure and operation, many different formal models have been defined, named P systems. These models do not try to model the biological behavior of the cell, but they abstract the basic principles of the cell in order to find out new computational paradigms. P systems are non-deterministic and massively parallel computational models. This is why, they have aroused interest when dealing with complex problems nowadays. In many cases, they manage to solve in theory NP problems in polynomial or lineal time. On the other hand, it is important to note that membrane computing has been successfully applied in many researching areas, specially related to biology. Nowadays, lots of these computing models have been sufficiently characterized from a theoretical point of view. However, the way in which they can be implemented is a research challenge, that it is still open nowadays. There are some lines in this way, based on distributed architectures or dedicated hardware. All of them are trying to approach to its non-deterministic and parallel character as much as possible, taking into account viability and efficiency. In this doctoral thesis it is proposed carrying out a static analysis of the P system in order to optimize its performance in a computing platform. The general idea is that after data are collected in analysis time, they are used for getting a suitable configuration of the computing platform in which P system is going to be performed. As a consequence, the system throughput will improve. Specifically, this thesis has made use of Transition P systems for carrying out the study in static analysis. In particular, the static analysis proposed in this doctoral thesis tries to achieve that every membrane can efficiently determine its active rules in every evolution step. These rules are the ones that can be applied depending on the system configuration at each computational step. In this line, we are going to tackle the problem of the usefulness states for a membrane. This state will allow this membrane to know the set of membranes with which communication is possible at any time. This is a very important issue in determining the set of rules that can be applied. Moreover, static analysis in this thesis is carried out taking into account other properties such as membrane structure, rule antecedents, rule consequents and priorities among rules. After collecting all data in analysis time, they are arranged in a decision tree structure, enabling membranes to obtain the set of active rules as efficiently as possible in run-time system. On the other hand, in this doctoral thesis is going to carry out an overview of hardware and software architectures, proposed by different authors in order to implement P systems, such as distributed architectures, dedicated hardware based on PFGA, and computing platforms based on PIC microcontrollers. The aim of this overview is to propose solutions for implementing the results of the static analysis, that is, usefulness states and decision trees for active rules. In general, conclusions are satisfactory, because these optimizations can be properly integrated in most of the architectures without significant penalties.
Resumo:
We consider the problem of supporting goal-level, independent andparallelism (IAP) in the presence of non-determinism. IAP is exploited when two or more goals which will not interfere at run time are scheduled for simultaneous execution. Backtracking over non-deterministic parallel goals runs into the wellknown trapped goal and garbage slot problems. The proposed solutions for these problems generally require complex low-level machinery which makes systems difficult to maintain and extend, and in some cases can even affect sequential execution performance. In this paper we propose a novel solution to the problem of trapped nondeterministic goals and garbage slots which is based on a single stack reordering operation and offers several advantages over previous proposals. While the implementation of this operation itself is not simple, in return it does not impose constraints on the scheduler. As a result, the scheduler and the rest of the run-time machinery can safely ignore the trapped goal and garbage slot problems and their implementation is greatly simplified. Also, standard sequential execution remains unaffected. In addition to describing the solution we report on an implementation and provide performance results. We also suggest other possible applications of the proposed approach beyond parallel execution.
Resumo:
Logic programming systems which exploit and-parallelism among non-deterministic goals rely on notions of independence among those goals in order to ensure certain efficiency properties. "Non-strict" independence (NSI) is a more relaxed notion than the traditional notion of "strict" independence (SI) which still ensures the relevant efficiency properties and can allow considerable more parallelism than SI. However, all compilation technology developed to date has been based on SI, presumably because of the intrinsic complexity of exploiting NSI. This is related to the fact that NSI cannot be determined "a priori" as SI. This paper fills this gap by developing a technique for compile-time detection and annotation of NSI. It also proposes algorithms for combined compile- time/run-time detection, presenting novel run-time checks for this type of parallelism. Also, a transformation procedure to eliminate shared variables among parallel goals is presented, attempting to perform as much work as possible at compiletime. The approach is based on the knowledge of certain properties about run-time instantiations of program variables —sharing and freeness— for which compile-time technology is available, with new approaches being currently proposed.
Resumo:
Future high-quality consumer electronics will contain a number of applications running in a highly dynamic environment, and their execution will need to be efficiently arbitrated by the underlying platform software. The multimedia applications that currently execute in such similar contexts face frequent run-time variations in their resource demands, originated by the greedy nature of the multimedia processing itself. Changes in resource demands are triggered by numerous reasons (e.g. a switch in the input media compression format). Such situations require real-time adaptation mechanisms to adjust the system operation to the new requirements, and this must be done seamlessly to satisfy the user experience. One solution for efficiently managing application execution is to apply quality of service resource management techniques, based on assigning and enforcing resource contracts to applications. Most resource management solutions provide temporal isolation by enforcing resource assignments and avoiding any resource overruns. However, this has a clear limitation over the cost-effective resource usage. This paper presents a simple priority assignment scheme based on uniform priority bands to allow that greedy multimedia tasks incur in safe overruns that increase resource usage and do not threaten the timely execution of non-overrunning tasks. Experimental results show that the proposed priority assignment scheme in combination with a resource accounting mechanism preserves timely multimedia execution and delivery, achieves a higher cost-effective processor usage, and guarantees the execution isolation of non-overrunning tasks.
Resumo:
Dynamically Reconfigurable Systems are attracting a growing interest, mainly due to the emergence of novel applications based on this technology. However, commercial tools do not provide enough flexibility to design solutions, while keeping an acceptable design productivity. In this paper, a novel design flow is proposed, targeting dynamically reconfigurable systems. It is fully supported by a tool called Dreams, which is able to implement flexible systems, starting from a set of netlists corresponding to the modules, as well as a system description provided by the user. The tool automatically post-processes the nets, implementing a solution for the communications between reconfigurable regions, as well as the handling of routing conflicts, by means of a custom router. Since the design process of every module and the static system are independent, the proposed flow is compatible with system upgrade at run-time. In this paper, a use case corresponding to the design of a highly regular and parallel mesh-type architecture is described, in order to show the architectural flexibility offered by the tool.
Resumo:
We discuss experiences gained by porting a Software Validation Facility (SVF) and a satellite Central Software (CSW) to a platform with support for Time and Space Partitioning (TSP). The SVF and CSW are part of the EagleEye Reference mission of the European Space Agency (ESA). As a reference mission, EagleEye is a perfect candidate to evaluate practical aspects of developing satellite CSW for and on TSP platforms. The specific TSP platform we used consists of a simulate D LEON3 CPU controlled by the XtratuM separation micro-kernel. On top of this, we run five separate partitions. Each partition ru n s its own real-time operating system or Ada run-time kernel, which in turn are running the application software of the CSW. We describe issues related to partitioning; inter-partition communication; scheduling; I/O; and fault-detection, isolation, and recovery (FDIR)
Resumo:
In this paper, an architecture based on a scalable and flexible set of Evolvable Processing arrays is presented. FPGA-native Dynamic Partial Reconfiguration (DPR) is used for evolution, which is done intrinsically, letting the system to adapt autonomously to variable run-time conditions, including the presence of transient and permanent faults. The architecture supports different modes of operation, namely: independent, parallel, cascaded or bypass mode. These modes of operation can be used during evolution time or during normal operation. The evolvability of the architecture is combined with fault-tolerance techniques, to enhance the platform with self-healing features, making it suitable for applications which require both high adaptability and reliability. Experimental results show that such a system may benefit from accelerated evolution times, increased performance and improved dependability, mainly by increasing fault tolerance for transient and permanent faults, as well as providing some fault identification possibilities. The evolvable HW array shown is tailored for window-based image processing applications.
Resumo:
This thesis concerns mixed flows (which are characterized by the simultaneous occurrence of free-surface and pressurized flow in sewers, tunnels, culverts or under bridges), and contributes to the improvement of the existing numerical tools for modelling these phenomena. The classic Preissmann slot approach is selected due to its simplicity and capability of predicting results comparable to those of a more recent and complex two-equation model, as shown here with reference to a laboratory test case. In order to enhance the computational efficiency, a local time stepping strategy is implemented in a shock-capturing Godunov-type finite volume numerical scheme for the integration of the de Saint-Venant equations. The results of different numerical tests show that local time stepping reduces run time significantly (between −29% and −85% CPU time for the test cases considered) compared to the conventional global time stepping, especially when only a small region of the flow field is surcharged, while solution accuracy and mass conservation are not impaired. The second part of this thesis is devoted to the modelling of the hydraulic effects of potentially pressurized structures, such as bridges and culverts, inserted in open channel domains. To this aim, a two-dimensional mixed flow model is developed first. The classic conservative formulation of the 2D shallow water equations for free-surface flow is adapted by assuming that two fictitious vertical slots, normally intersecting, are added on the ceiling of each integration element. Numerical results show that this schematization is suitable for the prediction of 2D flooding phenomena in which the pressurization of crossing structures can be expected. Given that the Preissmann model does not allow for the possibility of bridge overtopping, a one-dimensional model is also presented in this thesis to handle this particular condition. The flows below and above the deck are considered as parallel, and linked to the upstream and downstream reaches of the channel by introducing suitable internal boundary conditions. The comparison with experimental data and with the results of HEC-RAS simulations shows that the proposed model can be a useful and effective tool for predicting overtopping and backwater effects induced by the presence of bridges and culverts.
Resumo:
A new method for solving some hard combinatorial optimization problems is suggested, admitting a certain reformulation. Considering such a problem, several different similar problems are prepared which have the same set of solutions. They are solved on computer in parallel until one of them will be solved, and that solution is accepted. Notwithstanding the evident overhead, the whole run-time could be significantly reduced due to dispersion of velocities of combinatorial search in regarded cases. The efficiency of this approach is investigated on the concrete problem of finding short solutions of non-deterministic system of linear logical equations.
Resumo:
FPGAs and GPUs are often used when real-time performance in video processing is required. An accelerated processor is chosen based on task-specific priorities (power consumption, processing time and detection accuracy), and this decision is normally made once at design time. All three characteristics are important, particularly in battery-powered systems. Here we propose a method for moving selection of processing platform from a single design-time choice to a continuous run time one.We implement Histogram of Oriented Gradients (HOG) detectors for cars and people and Mixture of Gaussians (MoG) motion detectors running across FPGA, GPU and CPU in a heterogeneous system. We use this to detect illegally parked vehicles in urban scenes. Power, time and accuracy information for each detector is characterised. An anomaly measure is assigned to each detected object based on its trajectory and location, when compared to learned contextual movement patterns. This drives processor and implementation selection, so that scenes with high behavioural anomalies are processed with faster but more power hungry implementations, but routine or static time periods are processed with power-optimised, less accurate, slower versions. Real-time performance is evaluated on video datasets including i-LIDS. Compared to power-optimised static selection, automatic dynamic implementation mapping is 10% more accurate but draws 12W extra power in our testbed desktop system.