54 results for PDE-based parallel preconditioner
at Universidad Politécnica de Madrid
Abstract:
A contribution is presented, intended to provide theoretical foundations for the ongoing efforts to employ global instability theory in the analysis of the classic boundary-layer flow, and to address the associated issue of appropriate inflow/outflow boundary conditions for closing the PDE-based global eigenvalue problem in open flows. Starting from a theoretically clean and numerically simple application, whose results are also known analytically and thus serve as guidance for assessing the performance of the numerical methods employed herein, a sequence of issues is systematically built into the target application, until we arrive at one representative of the open systems whose instability is presently addressed by global linear theory; this last application is neither theoretically tractable nor straightforward to solve numerically. The experience gained along the way is documented. It concerns the quantification of the departure of the numerical solution from the analytical one in the simple problem; the generation of numerical boundary layers at artificially truncated boundaries, no matter how far these are placed from the region of highest flow gradients; and, ultimately, the impractically large number of (direct and adjoint) modes necessary to project an arbitrary initial perturbation and follow its temporal evolution by a global analysis approach. This last finding may call into question the robustness, reported in the literature, of recovering optimal perturbations from global analyses that yield under-resolved eigenspectra.
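The projection step mentioned above can be illustrated on a toy problem. The sketch below assumes a hypothetical 2×2 non-normal matrix A = [[λ1, a], [0, λ2]] in place of a discretized flow operator; its direct (right) and adjoint (left) eigenvectors are known in closed form, and the coefficient of each mode follows from biorthogonality, c_k = ⟨w_k, q0⟩ / ⟨w_k, v_k⟩. Even in this tiny case the modal coefficients can be much larger than the perturbation itself and cancel each other, which hints at why many modes may be needed.

```python
import math

def modes_2x2_upper(l1, l2, a):
    """Direct (right) and adjoint (left) eigenvectors of A = [[l1, a], [0, l2]], l1 != l2 (real case)."""
    direct = [(1.0, 0.0), (a / (l2 - l1), 1.0)]
    adjoint = [(1.0, a / (l1 - l2)), (0.0, 1.0)]
    return direct, adjoint

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def project(q0, direct, adjoint):
    """Modal coefficients c_k = <w_k, q0> / <w_k, v_k>, using <w_j, v_k> = 0 for j != k."""
    return [dot(w, q0) / dot(w, v) for v, w in zip(direct, adjoint)]

def evolve(coeffs, eigvals, direct, t):
    """Modal reconstruction q(t) = sum_k c_k exp(lambda_k t) v_k."""
    n = len(direct[0])
    return [sum(c * math.exp(lam * t) * v[i] for c, lam, v in zip(coeffs, eigvals, direct))
            for i in range(n)]

if __name__ == "__main__":
    lams = (-0.1, -0.2)  # both modes decay
    direct, adjoint = modes_2x2_upper(*lams, a=5.0)
    coeffs = project((0.0, 1.0), direct, adjoint)
    print("coefficients:", coeffs)
    for t in (0.0, 10.0, 50.0):
        print(t, evolve(coeffs, lams, direct, t))
```

With λ1 = -0.1, λ2 = -0.2 and a = 5, the perturbation q0 = (0, 1) splits into coefficients 50 and 1 whose cancellation breaks down as time advances, so |q(t)| grows transiently even though every mode is stable.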
Abstract:
Instability analysis of compressible orthogonal swept leading-edge boundary-layer flow was performed in the context of BiGlobal linear theory [1, 2]. An algorithm was developed that exploits the sparsity of the matrix discretizing the PDE-based eigenvalue problem. This allowed use of the MUMPS sparse linear algebra package [3] to obtain a direct solution of the linear systems associated with the Arnoldi iteration. The developed algorithm was then applied to efficiently analyze the effect of compressibility on the stability of the swept leading-edge boundary layer and to obtain neutral curves of this flow as a function of the Mach number in the range 0 ≤ Ma ≤ 1. The present numerical results fully confirmed the asymptotic theory results of Theofilis et al. [4] Up to the maximum Mach number studied, an increase in this parameter was found to reduce the critical Reynolds number and the range of unstable spanwise wavenumbers.
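The Arnoldi iteration referred to above builds an orthonormal Krylov basis Q and an upper Hessenberg matrix H whose eigenvalues (Ritz values) approximate eigenvalues of the operator. In a shift-and-invert setting each application of the operator, x ↦ (A − σB)⁻¹Bx, is a sparse linear solve, which is the step a direct solver such as MUMPS would handle. The minimal sketch below is plain Python with a `matvec` callback standing in for that solve; the small non-symmetric tridiagonal operator is a hypothetical test case, not the discretized compressible problem.

```python
import math

def arnoldi(matvec, b, k):
    """Run k Arnoldi steps (modified Gram-Schmidt) on the operator `matvec`.

    Returns Q, a list of k + 1 orthonormal vectors, and H, a (k + 1) x k upper
    Hessenberg matrix satisfying matvec(Q[j]) = sum_i H[i][j] * Q[i] for j < k.
    """
    norm_b = math.sqrt(sum(x * x for x in b))
    Q = [[x / norm_b for x in b]]
    H = [[0.0] * k for _ in range(k + 1)]
    for j in range(k):
        w = matvec(Q[j])  # in shift-invert mode this would be a sparse direct solve
        for i in range(j + 1):
            H[i][j] = sum(wi * qi for wi, qi in zip(w, Q[i]))
            w = [wi - H[i][j] * qi for wi, qi in zip(w, Q[i])]
        H[j + 1][j] = math.sqrt(sum(x * x for x in w))
        if H[j + 1][j] < 1e-12:
            raise ValueError("breakdown: the Krylov space became invariant")
        Q.append([x / H[j + 1][j] for x in w])
    return Q, H

def tridiagonal_matvec(n, sub, diag, sup):
    """Hypothetical test operator: constant-coefficient non-symmetric tridiagonal matrix."""
    def matvec(x):
        y = []
        for i in range(n):
            s = diag * x[i]
            if i > 0:
                s += sub * x[i - 1]
            if i < n - 1:
                s += sup * x[i + 1]
            y.append(s)
        return y
    return matvec

if __name__ == "__main__":
    Q, H = arnoldi(tridiagonal_matvec(8, 1.2, -2.0, 0.8), [1.0] + [0.0] * 7, 5)
    for row in H:
        print(["%.4f" % h for h in row])
```

The Ritz values would then be obtained from the leading k × k block of H with any small dense eigensolver.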
Abstract:
The aim of this thesis is to study the instability mechanisms that occur in swept wings when the angle of attack increases. To this end, a simplified model of the non-orthogonal swept leading-edge boundary layer has been used, together with different numerical techniques for solving the linear stability problem that describes the behavior of perturbations superposed upon this base flow. Two different approaches, matrix-free and matrix-forming methods, have been validated using direct numerical simulations with spectral resolution. In this way, flow instability in the non-orthogonal swept attachment-line boundary layer is addressed in a linear analysis framework via the solution of the pertinent global (Bi-Global) PDE-based eigenvalue problem. Subsequently, a simple extension to non-orthogonal flow is presented of the extended Görtler-Hämmerlin ODE-based polynomial model proposed by Theofilis, Fedorov, Obrist & Dallmann (2003) for orthogonal flow; the extension includes previous models as particular cases and recovers global instability analysis results. Direct numerical simulations have been used to verify the stability results and to unravel the limits of validity of the basic flow model analyzed. The effect of the angle of attack, AoA, on the critical conditions of the non-orthogonal problem has been documented: increasing the angle of attack from AoA = 0 (orthogonal flow) up to values close to π/2, at which the assumptions under which the basic flow is derived become questionable, is found to systematically destabilize the flow. The critical conditions of non-orthogonal flows at 0 ≤ AoA < π/2 are shown to be recoverable from those of orthogonal flow via a simple analytical transformation involving AoA. These results can help to understand the destabilization mechanisms that occur at the attachment line of wings at finite angles of attack. 
Studies taking into account variations of the pressure field in the basic flow or the extension to compressible flows are issues that remain open.
Abstract:
We argue that in order to exploit both Independent And- and Or-parallelism in Prolog programs, there is an advantage in recomputing some of the independent goals, as opposed to reusing all their solutions. We present an abstract model, called the Composition-Tree, for representing and-or parallelism in Prolog programs. The Composition-Tree closely mirrors sequential Prolog execution by recomputing some independent goals rather than fully reusing them. We also outline two environment representation techniques for And-Or parallel execution of full Prolog based on the Composition-Tree model abstraction. We argue that these techniques have advantages over earlier proposals for exploiting and-or parallelism in Prolog.
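The trade-off between recomputing and reusing independent goals can be seen on two hypothetical goals `p(X)` (three solutions) and `q(Y)` (two solutions). Recomputation, as in sequential Prolog, re-runs `q` for every solution of `p`; reuse runs each goal once and combines the stored solution sets by cross product. Both produce the same answers in the same order, but with different call counts and storage needs. This sketch only contrasts the two policies; it is not the Composition-Tree itself.

```python
def p(calls):
    """Hypothetical goal p(X) with three solutions; counts how often it is started."""
    calls["p"] += 1
    yield from ("a", "b", "c")

def q(calls):
    """Hypothetical goal q(Y) with two solutions, independent of p."""
    calls["q"] += 1
    yield from (1, 2)

def recompute(calls):
    # Sequential Prolog order: q is re-executed once per solution of p
    for x in p(calls):
        for y in q(calls):
            yield (x, y)

def reuse(calls):
    # Reuse: each independent goal runs once; answers are combined by cross product
    xs = list(p(calls))
    ys = list(q(calls))
    for x in xs:
        for y in ys:
            yield (x, y)

if __name__ == "__main__":
    for strategy in (recompute, reuse):
        calls = {"p": 0, "q": 0}
        answers = list(strategy(calls))
        print(strategy.__name__, answers, calls)
```

Recomputation starts `q` three times but never stores its answers; reuse starts it once but must keep both solution lists alive.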
Abstract:
In this paper we present a novel execution model for parallel implementation of logic programs which is capable of exploiting both independent and-parallelism and or-parallelism in an efficient way. This model extends the stack-copying approach, which has been successfully applied in the Muse system to implement or-parallelism, by integrating it with proven techniques used to support independent and-parallelism. We show how all solutions to non-deterministic and-parallel goals are found without repetitions. This is done through recomputation as in Prolog (and in various and-parallel systems, like &-Prolog and DDAS), i.e., solutions of and-parallel goals are not shared. We propose a scheme for the efficient management of the address space that is compatible with the apparently incompatible requirements of both and- and or-parallelism. We also show how the full Prolog language, with all its extra-logical features, can be supported in our and-or parallel system so that its sequential semantics is preserved. The resulting system retains the advantages of both purely or-parallel and purely and-parallel systems. The stack-copying scheme together with our proposed memory management scheme can also be used to implement models that combine dependent and-parallelism and or-parallelism, such as Andorra and Prometheus.
Abstract:
We discuss several issues involved in the implementation of ACE, a model capable of exploiting both And-parallelism and Or-parallelism in Prolog in a unified framework. The Or-parallel model that ACE employs is based on the idea of stack copying developed for Muse, while the model of independent And-parallelism is based on the distributed stack approach of &-Prolog. We discuss the organization of the workers, a number of sharing assumptions, techniques for workload detection, and issues related to which parts need to be copied when a flexible and-scheduling strategy is used.
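The question of which parts need to be copied can be illustrated with the idea of incremental copying used in Muse-style stack copying: an idle worker keeps the part of its stacks it already shares with the busy worker and copies only the remainder. The sketch below reduces each worker's state to a list of hypothetical frame identifiers compared by equality and copies only the suffix past the longest common prefix; real systems determine the shared part from the workers' positions in the search tree and also copy heap and trail segments, which is omitted here.

```python
def common_prefix_len(a, b):
    """Number of leading frames two stacks have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def incremental_copy(idle_stack, busy_stack):
    """Backtrack the idle worker to the shared part, then copy only the busy worker's extra frames.

    Frames are hypothetical opaque identifiers compared by equality.
    Returns (new idle stack, number of frames copied).
    """
    k = common_prefix_len(idle_stack, busy_stack)
    copied = busy_stack[k:]
    return idle_stack[:k] + copied, len(copied)

if __name__ == "__main__":
    idle = ["f0", "f1", "f2", "x3"]
    busy = ["f0", "f1", "f2", "g3", "g4"]
    print(incremental_copy(idle, busy))
```

For the stacks in the example only `g3` and `g4` are transferred, instead of all five frames of the busy worker.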
Abstract:
The term "Logic Programming" refers to a variety of computer languages and execution models which are based on the traditional concept of Symbolic Logic. The expressive power of these languages promises to be of great assistance in facing the programming challenges of present and future symbolic processing applications in Artificial Intelligence, Knowledge-based systems, and many other areas of computing. The sequential execution speed of logic programs has been greatly improved since the advent of the first interpreters. However, higher inference speeds are still required in order to meet the demands of applications such as those contemplated for next generation computer systems. The execution of logic programs in parallel is currently considered a promising strategy for attaining such inference speeds. Logic Programming in turn appears as a suitable programming paradigm for parallel architectures because of the many opportunities for parallel execution present in the implementation of logic programs. This dissertation presents an efficient parallel execution model for logic programs. The model is described from the source language level down to an "Abstract Machine" level suitable for direct implementation on existing parallel systems or for the design of special purpose parallel architectures. Few assumptions are made at the source language level and therefore the techniques developed and the general Abstract Machine design are applicable to a variety of logic (and also functional) languages. These techniques offer efficient solutions to several areas of parallel Logic Programming implementation previously considered problematic or a source of considerable overhead, such as the detection and handling of variable binding conflicts in AND-Parallelism, the specification of control and management of the execution tree, the treatment of distributed backtracking, and goal scheduling and memory management issues. 
A parallel Abstract Machine design is offered, specifying data areas, operation, and a suitable instruction set. This design is based on extending to a parallel environment the techniques introduced by the Warren Abstract Machine, which have already made very fast and space efficient sequential systems a reality. Therefore, the model herein presented is capable of retaining sequential execution speed similar to that of high performance sequential systems, while extracting additional gains in speed by efficiently implementing parallel execution. These claims are supported by simulations of the Abstract Machine on sample programs.
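Detecting binding conflicts between AND-parallel goals is often reduced to an independence test on their arguments. A minimal sketch of strict independence follows, assuming terms are nested tuples and logic variables are mutable `Var` objects (both hypothetical representations, far simpler than the WAM's tagged cells): two goals may run in parallel without conflicts if no unbound variable is reachable from both.

```python
class Var:
    """A logic variable; `ref` is None while unbound, otherwise the term it is bound to."""
    def __init__(self, name):
        self.name = name
        self.ref = None

def deref(t):
    # Follow variable-to-term bindings until an unbound variable or a non-variable term
    while isinstance(t, Var) and t.ref is not None:
        t = t.ref
    return t

def free_vars(term, acc=None):
    """Collect unbound variables reachable from term (compound terms are tuples)."""
    if acc is None:
        acc = set()
    t = deref(term)
    if isinstance(t, Var):
        acc.add(t)
    elif isinstance(t, tuple):
        for arg in t:
            free_vars(arg, acc)
    return acc

def strictly_independent(goal_a, goal_b):
    # No shared free variables: neither goal can bind a variable the other one sees
    return not (free_vars(goal_a) & free_vars(goal_b))

if __name__ == "__main__":
    X, Y, Z = Var("X"), Var("Y"), Var("Z")
    print(strictly_independent(("p", X, Y), ("q", Y, Z)))  # Y is shared and unbound
    Y.ref = ("f", "a")                                      # Y becomes ground
    print(strictly_independent(("p", X, Y), ("q", Y, Z)))
```

A ground shared argument does not prevent parallel execution, whereas a variable bound to a term containing another goal's free variable still does.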
Abstract:
In this work, the dimensional synthesis of a spherical Parallel Manipulator (PM) with a -1S kinematic chain is presented. The goal of the synthesis is to find a set of parameters that defines the PM with the best performance in terms of workspace capabilities, dexterity and isotropy. The PM is parameterized in terms of a reference element, and a non-directed search over these parameters is carried out. First, the inverse kinematics and instantaneous kinematics of the mechanism are presented. The latter is found using the screw theory formulation. An algorithm that explores a bounded set of parameters and determines the corresponding value of global indexes is presented. The concepts of a novel global performance index and a compound index are introduced. Simulation results are shown and discussed. The best PMs found in terms of each evaluated performance index are locally analyzed in terms of their workspace and local dexterity. The relationship between the performance of the PM and its parameters is discussed, and a prototype with the best performance in terms of the compound index is presented and analyzed.
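The search procedure (evaluate a global index for every parameter set in a bounded grid and keep the best) can be sketched on a simpler mechanism. The example below swaps in a planar 2R serial arm, whose Jacobian is known in closed form, for the spherical PM; the local index is the inverse condition number of the Jacobian and the global index is its plain average over sampled elbow angles. Both are simplifying assumptions rather than the indices defined in the paper. The first link plays the role of the reference element (length 1).

```python
import math

def jacobian_2r(l1, l2, t1, t2):
    """Jacobian of the end-point of a planar 2R arm (swapped-in example mechanism)."""
    s1, c1 = math.sin(t1), math.cos(t1)
    s12, c12 = math.sin(t1 + t2), math.cos(t1 + t2)
    return ((-l1 * s1 - l2 * s12, -l2 * s12),
            (l1 * c1 + l2 * c12, l2 * c12))

def inverse_condition(J):
    """Local dexterity 1/kappa = sigma_min / sigma_max of a 2x2 matrix, in [0, 1]."""
    (a, b), (c, d) = J
    fro2 = a * a + b * b + c * c + d * d  # trace of J^T J
    det = a * d - b * c
    disc = math.sqrt(max(fro2 * fro2 - 4.0 * det * det, 0.0))
    s_max2 = (fro2 + disc) / 2.0
    s_min2 = max((fro2 - disc) / 2.0, 0.0)
    return math.sqrt(s_min2 / s_max2)

def global_conditioning_index(l1, l2, samples=90):
    # Average local dexterity over sampled elbow angles in (0, pi); t1 does not affect kappa here
    total = 0.0
    for i in range(samples):
        t2 = math.pi * (i + 0.5) / samples
        total += inverse_condition(jacobian_2r(l1, l2, 0.0, t2))
    return total / samples

def grid_search(ratios):
    # Non-directed search: evaluate every candidate link ratio and keep the best one
    return max(ratios, key=lambda r: global_conditioning_index(1.0, r))

if __name__ == "__main__":
    ratios = [0.2 * i for i in range(1, 11)]
    best = grid_search(ratios)
    print(best, global_conditioning_index(1.0, best))
```

At l2/l1 = √2/2 and an elbow angle of 3π/4 this arm is isotropic (1/κ = 1), which gives a quick sanity check of the local index.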
Abstract:
Nowadays robots have made their way into real applications that were prohibitive and unthinkable thirty years ago. This is mainly due to the increase in computing power and to the evolution of the theoretical field of robotics and control. Even though there is plenty of information on these topics in the current literature, it is not easy to find clear guidance on how to design and implement a controller for a robot. In general, the design of a controller requires a complete understanding of the system to be controlled. Therefore, for advanced control techniques, the system must first be identified. This task is cumbersome and never straightforward: it requires great expertise, and some criteria must be adopted. Moreover, designing a controller is even more complex when dealing with Parallel Manipulators (PMs), since their closed-loop structures give rise to highly nonlinear systems. On this basis the current work is developed, which intends to summarize and gather all the concepts and experience involved in the control of a hydraulic Parallel Manipulator. The main objective of this thesis is to provide a guide highlighting all the steps involved in designing advanced control techniques for PMs. The analysis of the PM under study is broken down to the core of the mechanism: the hydraulic actuators. The actuators are modeled and experimentally identified. Additionally, some considerations regarding traditional PID controllers are presented, and an adaptive controller is finally implemented. From a macro perspective, the kinematic and dynamic models of the PM are presented. Based on the model of the system and extending the adaptive controller of the actuator, a control strategy for the PM is developed and its performance is analyzed in simulation.
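As a starting point for the PID considerations mentioned above, the sketch below closes a discrete PID loop around a hypothetical actuator model: piston position x driven by a valve command u as a pure integrator, x' = u, a common first simplification of a valve-controlled hydraulic cylinder. The gains, time step and model are illustrative assumptions, not the identified actuator or the adaptive controller of the thesis.

```python
def simulate_pid(kp, ki, kd, setpoint=1.0, dt=0.01, steps=3000):
    """Discrete PID on a hypothetical actuator modeled as a pure integrator, x' = u.

    Returns the position after `steps` explicit-Euler steps of size dt.
    """
    x = 0.0
    integral = 0.0
    prev_error = None
    for _ in range(steps):
        error = setpoint - x
        integral += error * dt
        # No derivative kick on the first step: there is no previous error yet
        derivative = 0.0 if prev_error is None else (error - prev_error) / dt
        u = kp * error + ki * integral + kd * derivative
        prev_error = error
        x += u * dt  # explicit Euler step of x' = u
    return x

if __name__ == "__main__":
    print(simulate_pid(2.0, 0.5, 0.05))
```

With these gains the continuous closed loop has two real negative poles, so the position settles at the setpoint well within the simulated 30 s; the adaptive scheme of the thesis would instead adjust the controller on line from the identified actuator model.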
Abstract:
We have developed a new projector model specifically tailored for fast list-mode tomographic reconstruction in positron emission tomography (PET) scanners with parallel planar detectors. The model provides an accurate estimation of the probability distribution of coincidence events defined by pairs of scintillating crystals. This distribution is parameterized with 2D elliptical Gaussian functions defined in planes perpendicular to the main axis of the tube of response (TOR). The parameters of these Gaussian functions have been obtained by fitting Monte Carlo simulations that include positron range, acollinearity of gamma rays, as well as detector attenuation and scatter effects. The proposed model has been applied efficiently to list-mode reconstruction algorithms. Evaluation with Monte Carlo simulations of a rotating high-resolution PET scanner indicates that this model yields a better recovery-to-noise ratio in OSEM (ordered-subsets expectation-maximization) reconstruction, compared to list-mode reconstruction with a symmetric circular Gaussian TOR model and to histogram-based OSEM with a precalculated system matrix using Monte Carlo simulated models and symmetries.
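The elliptical Gaussian TOR model can be sketched as a weight evaluated in the plane perpendicular to the line joining two crystal centers. In the sketch below σ_u and σ_v are hypothetical constants and the transverse frame is built from an assumed `up` vector (the scanner's axial direction, which must not be parallel to the TOR); in the described model the Gaussian parameters are fitted to Monte Carlo simulations and would vary along the TOR.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def normalize(a):
    n = math.sqrt(sum(c * c for c in a))
    return tuple(c / n for c in a)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def tor_weight(p1, p2, voxel, sigma_u, sigma_v, up=(0.0, 0.0, 1.0)):
    """Unnormalized elliptical-Gaussian weight of a voxel center for the TOR joining p1 and p2."""
    axis = normalize(tuple(b - a for a, b in zip(p1, p2)))
    e_u = normalize(cross(axis, up))  # transverse direction perpendicular to the TOR axis
    e_v = cross(axis, e_u)            # completes the orthonormal frame of the perpendicular plane
    d = tuple(c - a for c, a in zip(voxel, p1))
    u, v = dot(d, e_u), dot(d, e_v)   # offsets of the voxel in the perpendicular plane
    return math.exp(-0.5 * ((u / sigma_u) ** 2 + (v / sigma_v) ** 2))

if __name__ == "__main__":
    p1, p2 = (-50.0, 0.0, 0.0), (50.0, 0.0, 0.0)
    for voxel in [(0.0, 0.0, 0.0), (10.0, 1.0, 0.0), (-5.0, 0.0, 1.0)]:
        print(voxel, tor_weight(p1, p2, voxel, sigma_u=1.0, sigma_v=2.0))
```

With σ_u ≠ σ_v, voxels at the same distance from the TOR axis receive different weights depending on direction, which is the difference from a symmetric circular Gaussian model.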
Abstract:
This paper outlines the problems found in the parallelization of SPH (Smoothed Particle Hydrodynamics) algorithms using Graphics Processing Units (GPUs). Results of several parallel GPU implementations are shown in terms of speed-up and scalability compared to sequential CPU codes. The most problematic stage in GPU-SPH algorithms is the one responsible for locating neighboring particles and building the vectors where this information is stored, since these specific algorithms raise many difficulties for data-level parallelization. Because neighbor location using linked lists does not expose enough data-level parallelism, two new approaches have been proposed to minimize bank conflicts in the writing and subsequent reading of the neighbor lists. The first strategy proposes an efficient coordination between CPU and GPU, using GPU algorithms for those stages that allow a straightforward parallelization, and sequential CPU algorithms for those instructions that involve some kind of vector reduction. This coordination provides a relatively orderly reading of the neighbor lists in the interactions stage, achieving a speed-up factor of x47 in this stage. However, since the construction of the neighbor lists is quite expensive, the overall speed-up achieved is x41. The second strategy seeks to maximize the use of the GPU in the neighbor location process by executing a specific vector sorting algorithm that allows some data-level parallelism. Although this strategy has succeeded in improving the speed-up of the neighbor location stage, the global speed-up of the interactions stage falls, due to inefficient reading of the neighbor vectors. Some changes to these strategies are proposed, aimed at maximizing the computational load of the GPU and using the GPU texture units, in order to reach the maximum speed-up for such codes. Different practical applications have been added to the mentioned GPU codes. 
First, the classical dam-break problem is studied. Second, the wave impact of the sloshing fluid contained in LNG vessel tanks is also simulated as a practical example of particle methods.
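Sorting-based neighbor search, the general idea behind the second strategy, can be sketched as follows: particles are binned into cells of side h, sorted by cell key so that each cell's particles occupy a contiguous range, and each particle then scans only the 3×3 block of adjacent cells. This is a generic 2D Python illustration, not the paper's GPU kernels; on a GPU the sort would be a parallel sort over integer cell hashes, and the contiguous ranges are what make the subsequent reads orderly.

```python
import math

def build_sorted_cells(positions, h):
    """Sort particle indices by cell key and record the [start, end) range of each cell."""
    keys = [(int(math.floor(x / h)), int(math.floor(y / h))) for x, y in positions]
    order = sorted(range(len(positions)), key=lambda i: keys[i])
    cell_range = {}
    for rank, i in enumerate(order):
        k = keys[i]
        start, _ = cell_range.get(k, (rank, rank))  # first particle of a cell fixes its start
        cell_range[k] = (start, rank + 1)
    return keys, order, cell_range

def neighbors(i, positions, h, keys, order, cell_range):
    """Indices of particles within distance h of particle i, scanning the 3x3 adjacent cells."""
    xi, yi = positions[i]
    cx, cy = keys[i]
    result = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            start, end = cell_range.get((cx + dx, cy + dy), (0, 0))
            for j in order[start:end]:  # contiguous slice of the sorted particle array
                if j != i and (positions[j][0] - xi) ** 2 + (positions[j][1] - yi) ** 2 <= h * h:
                    result.append(j)
    return sorted(result)

if __name__ == "__main__":
    pts = [((i * 37) % 100 / 10.0, (i * 61) % 100 / 10.0) for i in range(100)]
    keys, order, cells = build_sorted_cells(pts, 1.0)
    print(neighbors(0, pts, 1.0, keys, order, cells))
```

Because the cell side equals the support radius h, every neighbor of a particle is guaranteed to lie in one of the nine cells scanned.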
Abstract:
Irregular computations pose some of the most interesting and challenging problems in automatic parallelization. Irregularity appears in certain kinds of numerical problems and is pervasive in symbolic applications. Such computations often use dynamic data structures, which make heavy use of pointers. This complicates all the steps of a parallelizing compiler, from independence detection to task partitioning and placement. Starting in the mid-1980s there has been significant progress in the development of parallelizing compilers for logic programming (and, more recently, constraint programming), resulting in quite capable parallelizers. The typical applications of these paradigms frequently involve irregular computations and make heavy use of dynamic data structures with pointers, since logical variables represent in practice a well-behaved form of pointers. This arguably makes the techniques used in these compilers potentially interesting. In this paper, we introduce, in a tutorial way, some of the problems faced by parallelizing compilers for logic and constraint programs and provide pointers to some of the significant progress made in the area. In particular, this work has resulted in a series of achievements in the areas of inter-procedural pointer aliasing analysis for independence detection, cost models and cost analysis, cactus-stack memory management, techniques for managing speculative and irregular computations through task granularity control and dynamic task allocation (such as work-stealing schedulers), etc.
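Work-stealing schedulers, mentioned above as one dynamic task-allocation technique, give each worker a double-ended queue: the owner pushes and pops at one end, while idle workers steal from the other end, which tends to take older, larger tasks. The sequential simulation below is only a sketch of that policy; the fib-like task tree and the choice of the longest deque as victim (rather than a random one) are hypothetical simplifications made for reproducibility.

```python
from collections import deque

def run_work_stealing(initial_tasks, n_workers):
    """Round-robin simulation of work stealing over integer tasks.

    A task n > 1 spawns children n - 1 and n - 2 (hypothetical fib-like tree).
    Owners pop their newest task (LIFO); thieves steal the oldest task (FIFO).
    Returns the list of tasks executed by each worker.
    """
    deques = [deque() for _ in range(n_workers)]
    deques[0].extend(initial_tasks)
    executed = [[] for _ in range(n_workers)]
    while any(deques):
        for w in range(n_workers):
            if deques[w]:
                task = deques[w].pop()  # owner works at its own end
            else:
                victim = max(range(n_workers), key=lambda v: len(deques[v]))
                if not deques[victim]:
                    continue  # nothing left to steal this round
                task = deques[victim].popleft()  # thief steals from the opposite end
            executed[w].append(task)
            if task > 1:
                deques[w].extend((task - 1, task - 2))
    return executed

if __name__ == "__main__":
    for w, tasks in enumerate(run_work_stealing([10], 4)):
        print(w, len(tasks))
```

Stealing from the opposite end keeps the owner working on recently spawned (small, cache-warm) tasks while thieves take the larger subtrees created earlier.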
Abstract:
This paper presents a theoretical analysis and an optimization method for envelope amplifiers. Highly efficient envelope amplifiers based on a switching converter in parallel or in series with a linear regulator have been analyzed and optimized. The results of the optimization process are shown, and the two architectures are compared in terms of complexity and efficiency. The proposed optimization method is based on prior knowledge of the transmitted signal type (OFDM, WCDMA, ...) and can be applied to any signal type as long as the envelope probability distribution is known. Finally, it is shown that the analyzed architectures have an inherent efficiency limit.
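The role of the envelope probability distribution in such an optimization can be illustrated with a simplified stand-in: a linear regulator fed from a small set of supply levels, always using the lowest level at or above the instantaneous envelope. Assumptions, none of which come from the paper: a resistive load, an envelope normalized to a peak of 1 and modeled as a truncated Rayleigh distribution (a common approximation for OFDM), and a grid search over the intermediate level of a two-level supply. Average efficiency is the ratio of average output power to average input power, both weighted by the envelope distribution.

```python
import math

def rayleigh_histogram(sigma, bins=200):
    """Discretized Rayleigh envelope PDF on (0, 1], renormalized after truncation at the peak 1."""
    vs = [(k + 0.5) / bins for k in range(bins)]
    ws = [v / sigma ** 2 * math.exp(-v * v / (2 * sigma ** 2)) for v in vs]
    total = sum(ws)
    return vs, [w / total for w in ws]

def average_efficiency(levels, vs, ps):
    """Linear regulator fed from the lowest supply level >= v; resistive load assumed."""
    p_out = p_in = 0.0
    for v, p in zip(vs, ps):
        supply = min(level for level in levels if level >= v)
        p_out += p * v * v      # load power ~ v^2 / R (R cancels in the ratio)
        p_in += p * supply * v  # supply power ~ V_supply * v / R
    return p_out / p_in

def best_two_level(vs, ps, candidates):
    # Grid search for the intermediate level of a (m, 1.0) two-level supply
    return max(candidates, key=lambda m: average_efficiency((m, 1.0), vs, ps))

if __name__ == "__main__":
    vs, ps = rayleigh_histogram(0.3)
    m = best_two_level(vs, ps, [k / 100 for k in range(5, 100)])
    print("single level:", average_efficiency((1.0,), vs, ps))
    print("two levels:", m, average_efficiency((m, 1.0), vs, ps))
```

Changing `sigma` (that is, the assumed envelope statistics) moves the optimal intermediate level, which is why such an optimization needs the envelope distribution of the transmitted signal.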
Abstract:
Several types of parallelism can be exploited in logic programs while preserving correctness and efficiency, i.e., ensuring that the parallel execution obtains the same results as the sequential one and that the amount of work performed is not greater. However, such results do not take into account a number of overheads that appear in practice, such as process creation and scheduling, which can induce a slowdown or, at least, limit speedup if they are not controlled in some way. This paper describes a methodology whereby the granularity of parallel tasks, i.e., the work available under them, is efficiently estimated and used to limit parallelism so that the effect of such overheads is controlled. The run-time overhead associated with the approach is usually quite small, since as much work as possible is done at compile time. Also, a number of run-time optimizations are proposed. Moreover, a static analysis of the overhead associated with the granularity control process is performed in order to decide whether it is worthwhile. The performance improvements resulting from the incorporation of grain size control are shown to be quite good, especially for systems with medium to large parallel execution overheads.
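The granularity-control idea can be sketched as a run-time test that compares the estimated cost of a goal with a threshold before spawning it. In the approach described above the cost functions come from compile-time analysis; in the Python sketch below the cost function `cost(n)` (the number of calls made by a naive `fib(n)`) is written by hand, and the threshold is a hypothetical tuning parameter. Because of CPython's global interpreter lock, the threads here show the control structure only, not a real speedup.

```python
import threading
from functools import lru_cache

@lru_cache(maxsize=None)
def cost(n):
    """Hand-written cost function (assumption): number of calls made by fib_seq(n)."""
    return 1 if n < 2 else 1 + cost(n - 1) + cost(n - 2)

def fib_seq(n):
    return n if n < 2 else fib_seq(n - 1) + fib_seq(n - 2)

def fib_gc(n, threshold, spawned):
    """Spawn the first recursive call in parallel only if its estimated work reaches threshold."""
    if n < 2:
        return n
    if cost(n - 1) < threshold:
        return fib_seq(n - 1) + fib_seq(n - 2)  # grain too small: run sequentially
    result = {}

    def left():
        result["a"] = fib_gc(n - 1, threshold, spawned)

    t = threading.Thread(target=left)
    spawned.append(n - 1)  # record the spawned goal (list.append is atomic in CPython)
    t.start()
    b = fib_gc(n - 2, threshold, spawned)
    t.join()
    return result["a"] + b

if __name__ == "__main__":
    for threshold in (10, 1000, 10 ** 9):
        spawned = []
        print(threshold, fib_gc(20, threshold, spawned), len(spawned))
```

Raising the threshold reduces the number of spawned tasks without changing the result, which is exactly the knob that trades parallelism against task-creation overhead.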