54 resultados para PDE-based parallel preconditioner


Relevância:

30.00% 30.00%

Publicador:

Resumo:

In recent years, applications in domains such as telecommunications, network security or large scale sensor networks showed the limits of the traditional store-then-process paradigm. In this context, Stream Processing Engines emerged as a candidate solution for all these applications demanding for high processing capacity with low processing latency guarantees. With Stream Processing Engines, data streams are not persisted but rather processed on the fly, producing results continuously. Current Stream Processing Engines, either centralized or distributed, do not scale with the input load due to single-node bottlenecks. Moreover, they are based on static configurations that lead to either under or over-provisioning. This Ph.D. thesis discusses StreamCloud, an elastic paralleldistributed stream processing engine that enables for processing of large data stream volumes. Stream- Cloud minimizes the distribution and parallelization overhead introducing novel techniques that split queries into parallel subqueries and allocate them to independent sets of nodes. Moreover, Stream- Cloud elastic and dynamic load balancing protocols enable for effective adjustment of resources depending on the incoming load. Together with the parallelization and elasticity techniques, Stream- Cloud defines a novel fault tolerance protocol that introduces minimal overhead while providing fast recovery. StreamCloud has been fully implemented and evaluated using several real word applications such as fraud detection applications or network analysis applications. The evaluation, conducted using a cluster with more than 300 cores, demonstrates the large scalability, the elasticity and fault tolerance effectiveness of StreamCloud. Resumen En los útimos años, aplicaciones en dominios tales como telecomunicaciones, seguridad de redes y redes de sensores de gran escala se han encontrado con múltiples limitaciones en el paradigma tradicional de bases de datos. En este contexto, los sistemas de procesamiento de flujos de datos han emergido como solución a estas aplicaciones que demandan una alta capacidad de procesamiento con una baja latencia. En los sistemas de procesamiento de flujos de datos, los datos no se persisten y luego se procesan, en su lugar los datos son procesados al vuelo en memoria produciendo resultados de forma continua. Los actuales sistemas de procesamiento de flujos de datos, tanto los centralizados, como los distribuidos, no escalan respecto a la carga de entrada del sistema debido a un cuello de botella producido por la concentración de flujos de datos completos en nodos individuales. Por otra parte, éstos están basados en configuraciones estáticas lo que conducen a un sobre o bajo aprovisionamiento. Esta tesis doctoral presenta StreamCloud, un sistema elástico paralelo-distribuido para el procesamiento de flujos de datos que es capaz de procesar grandes volúmenes de datos. StreamCloud minimiza el coste de distribución y paralelización por medio de una técnica novedosa la cual particiona las queries en subqueries paralelas repartiéndolas en subconjuntos de nodos independientes. Ademas, Stream- Cloud posee protocolos de elasticidad y equilibrado de carga que permiten una optimización de los recursos dependiendo de la carga del sistema. Unidos a los protocolos de paralelización y elasticidad, StreamCloud define un protocolo de tolerancia a fallos que introduce un coste mínimo mientras que proporciona una rápida recuperación. StreamCloud ha sido implementado y evaluado mediante varias aplicaciones del mundo real tales como aplicaciones de detección de fraude o aplicaciones de análisis del tráfico de red. La evaluación ha sido realizada en un cluster con más de 300 núcleos, demostrando la alta escalabilidad y la efectividad tanto de la elasticidad, como de la tolerancia a fallos de StreamCloud.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Abstract machines provide a certain separation between platformdependent and platform-independent concerns in compilation. Many of the differences between architectures are encapsulated in the speciflc abstract machine implementation and the bytecode is left largely architecture independent. Taking advantage of this fact, we present a framework for estimating upper and lower bounds on the execution times of logic programs running on a bytecode-based abstract machine. Our approach includes a one-time, programindependent proflling stage which calculates constants or functions bounding the execution time of each abstract machine instruction. Then, a compile-time cost estimation phase, using the instruction timing information, infers expressions giving platform-dependent upper and lower bounds on actual execution time as functions of input data sizes for each program. Working at the abstract machine level makes it possible to take into account low-level issues in new architectures and platforms by just reexecuting the calibration stage instead of having to tailor the analysis for each architecture and platform. Applications of such predicted execution times include debugging/veriflcation of time properties, certiflcation of time properties in mobile code, granularity control in parallel/distributed computing, and resource-oriented specialization.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We present a technique to estimate accurate speedups for parallel logic programs with relative independence from characteristics of a given implementation or underlying parallel hardware. The proposed technique is based on gathering accurate data describing one execution at run-time, which is fed to a simulator. Alternative schedulings are then simulated and estimates computed for the corresponding speedups. A tool implementing the aforementioned techniques is presented, and its predictions are compared to the performance of real systems, showing good correlation.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In recent years a lot of research has been invested in parallel processing of numerical applications. However, parallel processing of Symbolic and AI applications has received less attention. This paper presents a system for parallel symbolic computitig, narned ACE, based on the logic programming paradigm. ACE is a computational model for the full Prolog language, capable of exploiting Or-parall< lism and Independent And-parallelism. In this paper vve focus on the implementation of the and-parallel part of the ACE system (ralled &ACE) on a shared memory multiprocessor, d< scribing its organization, some optimizations, and presenting some performance figures, proving the abilhy of &ACE to efficiently exploit parallelism.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Several types of parallelism can be exploited in logic programs while preserving correctness and efficiency, i.e. ensuring that the parallel execution obtains the same results as the sequential one and the amount of work performed is not greater. However, such results do not take into account a number of overheads which appear in practice, such as process creation and scheduling, which can induce a slow-down, or, at least, limit speedup, if they are not controlled in some way. This paper describes a methodology whereby the granularity of parallel tasks, i.e. the work available under them, is efficiently estimated and used to limit parallelism so that the effect of such overheads is controlled. The run-time overhead associated with the approach is usually quite small, since as much work is done at compile time as possible. Also, a number of run-time optimizations are proposed. Moreover, a static analysis of the overhead associated with the granularity control process is performed in order to decide its convenience. The performance improvements resulting from the incorporation of grain size control are shown to be quite good, specially for systems with médium to large parallel execution overheads.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We informally discuss several issues related to the parallel execution of logic programming systems and concurrent logic programming systems, and their generalization to constraint programming. We propose a new view of these systems, based on a particular definition of parallelism. We argüe that, under this view, a large number of the actual systems and models can be explained through the application, at different levéis of granularity, of only a few basic principies: determinism, non-failure, independence (also referred to as stability), granularity, etc. Also, and based on the convergence of concepts that this view brings, we sketch a model for the implementation of several parallel constraint logic programming source languages and models based on a common, generic abstract machine and an intermedíate kernel language.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper addresses the design of visual paradigms for observing the parallel execution of logic programs. First, an intuitive method is proposed for arriving at the design of a paradigm and its implementation as a tool for a given model of parallelism. This method is based on stepwise reñnement starting from the deñnition of basic notions such as events and observables and some precedence relationships among events which hold for the given model of parallelism. The method is then applied to several types of parallel execution models for logic programs (Orparallelism, Determinate Dependent And parallelism, Restricted and-parallelism) for which visualization paradigms are designed. Finally, VisAndOr, a tool which implements all of these paradigms is presented, together with a discussion of its usefulness through examples.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The interactions among three important issues involved in the implementation of logic programs in parallel (goal scheduling, precedence, and memory management) are discussed. A simplified, parallel memory management model and an efficient, load-balancing goal scheduling strategy are presented. It is shown how, for systems which support "don't know" non-determinism, special care has to be taken during goal scheduling if the space recovery characteristics of sequential systems are to be preserved. A solution based on selecting only "newer" goals for execution is described, and an algorithm is proposed for efficiently maintaining and determining precedence relationships and variable ages across parallel goals. It is argued that the proposed schemes and algorithms make it possible to extend the storage performance of sequential systems to parallel execution without the considerable overhead previously associated with it. The results are applicable to a wide class of parallel and coroutining systems, and they represent an efficient alternative to "all heap" or "spaghetti stack" allocation models.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We propose a computational methodology -"B-LOG"-, which offers the potential for an effective implementation of Logic Programming in a parallel computer. We also propose a weighting scheme to guide the search process through the graph and we apply the concepts of parallel "branch and bound" algorithms in order to perform a "best-first" search using an information theoretic bound. The concept of "session" is used to speed up the search process in a succession of similar queries. Within a session, we strongly modify the bounds in a local database, while bounds kept in a global database are weakly modified to provide a better initial condition for other sessions. We also propose an implementation scheme based on a database machine using "semantic paging", and the "B-LOG processor" based on a scoreboard driven controller.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Irregular computations pose some of the most interesting and challenging problems in automatic parallelization. Irregularity appears in certain kinds of numerical problems and is pervasive in symbolic applications. Such computations often use dynamic data structures which make heavy use of pointers. This complicates all the steps of a parallelizing compiler, from independence detection to task partitioning and placement. In the past decade there has been significant progress in the development of parallelizing compilers for logic programming and, more recently, constraint programming. The typical applications of these paradigms frequently involve irregular computations, which arguably makes the techniques used in these compilers potentially interesting. In this paper we introduce in a tutorial way some of the problems faced by parallelizing compilers for logic and constraint programs. These include the need for inter-procedural pointer aliasing analysis for independence detection and having to manage speculative and irregular computations through task granularity control and dynamic task allocation. We also provide pointers to some of the progress made in these áreas. In the associated talk we demónstrate representatives of several generations of these parallelizing compilers.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We present an overview of the stack-based memory management techniques that we used in our non-deterministic and-parallel Prolog systems: &-Prolog and DASWAM. We believe that the problems associated with non-deterministic and-parallel systems are more general than those encountered in or-parallel and deterministic and-parallel systems, which can be seen as subsets of this more general case. We develop on the previously proposed "marker scheme", lifting some of the restrictions associated with the selection of goals while keeping (virtual) memory consumption down. We also review some of the other problems associated with the stack-based management scheme, such as handling of forward and backward execution, cut, and roll-backs.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We informally discuss several issues related to the parallel execution of logic programming systems and concurrent logic programming systems, and their generalization to constraint programming. We propose a new view of these systems, based on a particular definition of parallelism. We argüe that, under this view, a large number of the actual systems and models can be explained through the application, at different levéis of granularity, of only a few basic principies: determinism, non-failure, independence (also referred to as stability), granularity, etc. Also, and based on the convergence of concepts that this view brings, we sketch a model for the implementation of several parallel constraint logic programming source languages and models based on a common, generic abstract machine and an intermedíate kernel language.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

There are many the requirements that modern power converters should fulfill. Most of the applications where these converters are used, demand smaller converters with high efficiency, improved power density and a fast dynamic response. For instance, loads like microprocessors demand aggressive current steps with very high slew rates (100A/mus and higher); besides, during these load steps, the supply voltage of the microprocessor should be kept within tight limits in order to ensure its correct performance. The accomplishment of these requirements is not an easy task; complex solutions like advanced topologies - such as multiphase converters- as well as advanced control strategies are often needed. Besides, it is also necessary to operate the converter at high switching frequencies and to use capacitors with high capacitance and low ESR. Improving the dynamic response of power converters does not rely only on the control strategy but also the power topology should be suited to enable a fast dynamic response. Moreover, in later years, a fast dynamic response does not only mean accomplishing fast load steps but output voltage steps are gaining importance as well. At least, two applications that require fast voltage changes can be named: Low power microprocessors. In these devices, the voltage supply is changed according to the workload and the operating frequency of the microprocessor is changed at the same time. An important reduction in voltage dependent losses can be achieved with such changes. This technique is known as Dynamic Voltage Scaling (DVS). Another application where important energy savings can be achieved by means of changing the supply voltage are Radio Frequency Power Amplifiers. For example, RF architectures based on ‘Envelope Tracking’ and ‘Envelope Elimination and Restoration’ techniques can take advantage of voltage supply modulation and accomplish important energy savings in the power amplifier. However, in order to achieve these efficiency improvements, a power converter with high efficiency and high enough bandwidth (hundreds of kHz or even tens of MHz) is necessary in order to ensure an adequate supply voltage. The main objective of this Thesis is to improve the dynamic response of DC-DC converters from the point of view of the power topology. And the term dynamic response refers both to the load steps and the voltage steps; it is also interesting to modulate the output voltage of the converter with a specific bandwidth. In order to accomplish this, the question of what is it that limits the dynamic response of power converters should be answered. Analyzing this question leads to the conclusion that the dynamic response is limited by the power topology and specifically, by the filter inductance of the converter which is found in series between the input and the output of the converter. The series inductance is the one that determines the gain of the converter and provides the regulation capability. Although the energy stored in the filter inductance enables the regulation and the capability of filtering the output voltage, it imposes a limitation which is the concern of this Thesis. The series inductance stores energy and prevents the current from changing in a fast way, limiting the slew rate of the current through this inductor. Different solutions are proposed in the literature in order to reduce the limit imposed by the filter inductor. Many publications proposing new topologies and improvements to known topologies can be found in the literature. Also, complex control strategies are proposed with the objective of improving the dynamic response in power converters. In the proposed topologies, the energy stored in the series inductor is reduced; examples of these topologies are Multiphase converters, Buck converter operating at very high frequency or adding a low impedance path in parallel with the series inductance. Control techniques proposed in the literature, focus on adjusting the output voltage as fast as allowed by the power stage; examples of these control techniques are: hysteresis control, V 2 control, and minimum time control. In some of the proposed topologies, a reduction in the value of the series inductance is achieved and with this, the energy stored in this magnetic element is reduced; less stored energy means a faster dynamic response. However, in some cases (as in the high frequency Buck converter), the dynamic response is improved at the cost of worsening the efficiency. In this Thesis, a drastic solution is proposed: to completely eliminate the series inductance of the converter. This is a more radical solution when compared to those proposed in the literature. If the series inductance is eliminated, the regulation capability of the converter is limited which can make it difficult to use the topology in one-converter solutions; however, this topology is suitable for power architectures where the energy conversion is done by more than one converter. When the series inductor is eliminated from the converter, the current slew rate is no longer limited and it can be said that the dynamic response of the converter is independent from the switching frequency. This is the main advantage of eliminating the series inductor. The main objective, is to propose an energy conversion strategy that is done without series inductance. Without series inductance, no energy is stored between the input and the output of the converter and the dynamic response would be instantaneous if all the devices were ideal. If the energy transfer from the input to the output of the converter is done instantaneously when a load step occurs, conceptually it would not be necessary to store energy at the output of the converter (no output capacitor COUT would be needed) and if the input source is ideal, the input capacitor CIN would not be necessary. This last feature (no CIN with ideal VIN) is common to all power converters. However, when the concept is actually implemented, parasitic inductances such as leakage inductance of the transformer and the parasitic inductance of the PCB, cannot be avoided because they are inherent to the implementation of the converter. These parasitic elements do not affect significantly to the proposed concept. In this Thesis, it is proposed to operate the converter without series inductance in order to improve the dynamic response of the converter; however, on the other side, the continuous regulation capability of the converter is lost. It is said continuous because, as it will be explained throughout the Thesis, it is indeed possible to achieve discrete regulation; a converter without filter inductance and without energy stored in the magnetic element, is capable to achieve a limited number of output voltages. The changes between these output voltage levels are achieved in a fast way. The proposed energy conversion strategy is implemented by means of a multiphase converter where the coupling of the phases is done by discrete two-winding transformers instead of coupledinductors since transformers are, ideally, no energy storing elements. This idea is the main contribution of this Thesis. The feasibility of this energy conversion strategy is first analyzed and then verified by simulation and by the implementation of experimental prototypes. Once the strategy is proved valid, different options to implement the magnetic structure are analyzed. Three different discrete transformer arrangements are studied and implemented. A converter based on this energy conversion strategy would be designed with a different approach than the one used to design classic converters since an additional design degree of freedom is available. The switching frequency can be chosen according to the design specifications without penalizing the dynamic response or the efficiency. Low operating frequencies can be chosen in order to favor the efficiency; on the other hand, high operating frequencies (MHz) can be chosen in order to favor the size of the converter. For this reason, a particular design procedure is proposed for the ‘inductorless’ conversion strategy. Finally, applications where the features of the proposed conversion strategy (high efficiency with fast dynamic response) are advantageus, are proposed. For example, in two-stage power architectures where a high efficiency converter is needed as the first stage and there is a second stage that provides the fine regulation. Another example are RF power amplifiers where the voltage is modulated following an envelope reference in order to save power; in this application, a high efficiency converter, capable of achieving fast voltage steps is required. The main contributions of this Thesis are the following: The proposal of a conversion strategy that is done, ideally, without storing energy in the magnetic element. The validation and the implementation of the proposed energy conversion strategy. The study of different magnetic structures based on discrete transformers for the implementation of the proposed energy conversion strategy. To elaborate and validate a design procedure. To identify and validate applications for the proposed energy conversion strategy. It is important to remark that this work is done in collaboration with Intel. The particular features of the proposed conversion strategy enable the possibility of solving the problems related to microprocessor powering in a different way. For example, the high efficiency achieved with the proposed conversion strategy enables it as a good candidate to be used for power conditioning, as a first stage in a two-stage power architecture for powering microprocessors.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Adaptive embedded systems are required in various applications. This work addresses these needs in the area of adaptive image compression in FPGA devices. A simplified version of an evolution strategy is utilized to optimize wavelet filters of a Discrete Wavelet Transform algorithm. We propose an adaptive image compression system in FPGA where optimized memory architecture, parallel processing and optimized task scheduling allow reducing the time of evolution. The proposed solution has been extensively evaluated in terms of the quality of compression as well as the processing time. The proposed architecture reduces the time of evolution by 44% compared to our previous reports while maintaining the quality of compression unchanged with respect to existing implementations. The system is able to find an optimized set of wavelet filters in less than 2 min whenever the input type of data changes.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

When non linear physical systems of infinite extent are modelled, such as tunnels and perforations, it is necessary to simulate suitably the solution in the infinite as well as the non linearity. The finite element method (FEM) is a well known procedure for simulating the non linear behavior. However, the treatment of the infinite field with domain truncations is often questionable. On the other hand, the boundary element method (BEM) is suitable to simulate the infinite behavior without truncations. Because of this, by the combination of both methods, suitable use of the advantages of each one may be obtained. Several possibilities of FEM-BEM coupling and their performance in some practical cases are discussed in this paper. Parallelizable coupling algorithms based on domain decomposition are developed and compared with the most traditional coupling methods.