933 resultados para PARALLEL COMPUTING
Resumo:
This article describes a maximum likelihood method for estimating the parameters of the standard square-root stochastic volatility model and a variant of the model that includes jumps in equity prices. The model is fitted to data on the S&P 500 Index and the prices of vanilla options written on the index, for the period 1990 to 2011. The method is able to estimate both the parameters of the physical measure (associated with the index) and the parameters of the risk-neutral measure (associated with the options), including the volatility and jump risk premia. The estimation is implemented using a particle filter whose efficacy is demonstrated under simulation. The computational load of this estimation method, which previously has been prohibitive, is managed by the effective use of parallel computing using graphics processing units (GPUs). The empirical results indicate that the parameters of the models are reliably estimated and consistent with values reported in previous work. In particular, both the volatility risk premium and the jump risk premium are found to be significant.
Resumo:
A numerical scheme is presented for accurate simulation of fluid flow using the lattice Boltzmann equation (LBE) on unstructured mesh. A finite volume approach is adopted to discretize the LBE on a cell-centered, arbitrary shaped, triangular tessellation. The formulation includes a formal, second order discretization using a Total Variation Diminishing (TVD) scheme for the terms representing advection of the distribution function in physical space, due to microscopic particle motion. The advantage of the LBE approach is exploited by implementing the scheme in a new computer code to run on a parallel computing system. Performance of the new formulation is systematically investigated by simulating four benchmark flows of increasing complexity, namely (1) flow in a plane channel, (2) unsteady Couette flow, (3) flow caused by a moving lid over a 2D square cavity and (4) flow over a circular cylinder. For each of these flows, the present scheme is validated with the results from Navier-Stokes computations as well as lattice Boltzmann simulations on regular mesh. It is shown that the scheme is robust and accurate for the different test problems studied.
Resumo:
This paper presents an inverse dynamic formulation by the Newton–Euler approach for the Stewart platform manipulator of the most general architecture and models all the dynamic and gravity effects as well as the viscous friction at the joints. It is shown that a proper elimination procedure results in a remarkably economical and fast algorithm for the solution of actuator forces, which makes the method quite suitable for on-line control purposes. In addition, the parallelism inherent in the manipulator and in the modelling makes the algorithm quite efficient in a parallel computing environment, where it can be made as fast as the corresponding formulation for the 6-dof serial manipulator. The formulation has been implemented in a program and has been used for a few trajectories planned for a test manipulator. Results of simulation presented in the paper reveal the nature of the variation of actuator forces in the Stewart platform and justify the dynamic modelling for control.
Resumo:
With the advent of VLSI it has become possible to map parallel algorithms for compute-bound problems directly on silicon. Systolic architecture is very good candidate for VLSI implementation because of its regular and simple design, and regular communication pattern. In this paper, a systolic algorithm and corresponding systolic architecture, a linear systolic array, for the scanline-based hidden surface removal problem in three-dimensional computer graphics have been proposed. The algorithm is based on the concept of sample spans or intervals. The worst case time taken by the algorithm is O(n), n being the number of segments in a scanline. The time taken by the algorithm for a given scene depends on the scene itself, and on an average considerable improvement over the worst case behaviour is expected. A pipeline scheme for handling the I/O process has also been proposed which is suitable for VLSI implementation of the algorithm.
Resumo:
Clustered VLIW architectures solve the scalability problem associated with flat VLIW architectures by partitioning the register file and connecting only a subset of the functional units to a register file. However, inter-cluster communication in clustered architectures leads to increased leakage in functional components and a high number of register accesses. In this paper, we propose compiler scheduling algorithms targeting two previously ignored power-hungry components in clustered VLIW architectures, viz., instruction decoder and register file. We consider a split decoder design and propose a new energy-aware instruction scheduling algorithm that provides 14.5% and 17.3% benefit in the decoder power consumption on an average over a purely hardware based scheme in the context of 2-clustered and 4-clustered VLIW machines. In the case of register files, we propose two new scheduling algorithms that exploit limited register snooping capability to reduce extra register file accesses. The proposed algorithms reduce register file power consumption on an average by 6.85% and 11.90% (10.39% and 17.78%), respectively, along with performance improvement of 4.81% and 5.34% (9.39% and 11.16%) over a traditional greedy algorithm for 2-clustered (4-clustered) VLIW machine. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
A state-of-the-art model of the coupled ocean-atmosphere system, the climate forecast system (CFS), from the National Centres for Environmental Prediction (NCEP), USA, has been ported onto the PARAM Padma parallel computing system at the Centre for Development of Advanced Computing (CDAC), Bangalore and retrospective predictions for the summer monsoon (June-September) season of 2009 have been generated, using five initial conditions for the atmosphere and one initial condition for the ocean for May 2009. Whereas a large deficit in the Indian summer monsoon rainfall (ISMR; June-September) was experienced over the Indian region (with the all-India rainfall deficit by 22% of the average), the ensemble average prediction was for above-average rainfall during the summer monsoon. The retrospective predictions of ISMR with CFS from NCEP for 1981-2008 have been analysed. The retrospective predictions from NCEP for the summer monsoon of 1994 and that from CDAC for 2009 have been compared with the simulations for each of the seasons with the stand-alone atmospheric component of the model, the global forecast system (GFS), and observations. It has been shown that the simulation with GFS for 2009 showed deficit rainfall as observed. The large error in the prediction for the monsoon of 2009 can be attributed to a positive Indian Ocean Dipole event seen in the prediction from July onwards, which was not present in the observations. This suggests that the error could be reduced with improvement of the ocean model over the equatorial Indian Ocean.
Resumo:
This paper presents an introduction to neurocomputers and an overview of the history of neurocomputers. Direct implementation methods of neurocomputers using techniques from microelectronics and photonics are discussed. Emulation methods using special-purpose hardware are highlighted. The role of parallel computing systems for improved performance is introduced. Some commercially available neurocomputers and performance issues of such systems are also presented.
Resumo:
Over past few years, the studies of cultured neuronal networks have opened up avenues for understanding the ion channels, receptor molecules, and synaptic plasticity that may form the basis of learning and memory. The hippocampal neurons from rats are dissociated and cultured on a surface containing a grid of 64 electrodes. The signals from these 64 electrodes are acquired using a fast data acquisition system MED64 (Alpha MED Sciences, Japan) at a sampling rate of 20 K samples with a precision of 16-bits per sample. A few minutes of acquired data runs in to a few hundreds of Mega Bytes. The data processing for the neural analysis is highly compute-intensive because the volume of data is huge. The major processing requirements are noise removal, pattern recovery, pattern matching, clustering and so on. In order to interface a neuronal colony to a physical world, these computations need to be performed in real-time. A single processor such as a desk top computer may not be adequate to meet this computational requirements. Parallel computing is a method used to satisfy the real-time computational requirements of a neuronal system that interacts with an external world while increasing the flexibility and scalability of the application. In this work, we developed a parallel neuronal system using a multi-node Digital Signal processing system. With 8 processors, the system is able to compute and map incoming signals segmented over a period of 200 ms in to an action in a trained cluster system in real time.
Resumo:
Precision, sophistication and economic factors in many areas of scientific research that demand very high magnitude of compute power is the order of the day. Thus advance research in the area of high performance computing is getting inevitable. The basic principle of sharing and collaborative work by geographically separated computers is known by several names such as metacomputing, scalable computing, cluster computing, internet computing and this has today metamorphosed into a new term known as grid computing. This paper gives an overview of grid computing and compares various grid architectures. We show the role that patterns can play in architecting complex systems, and provide a very pragmatic reference to a set of well-engineered patterns that the practicing developer can apply to crafting his or her own specific applications. We are not aware of pattern-oriented approach being applied to develop and deploy a grid. There are many grid frameworks that are built or are in the process of being functional. All these grids differ in some functionality or the other, though the basic principle over which the grids are built is the same. Despite this there are no standard requirements listed for building a grid. The grid being a very complex system, it is mandatory to have a standard Software Architecture Specification (SAS). We attempt to develop the same for use by any grid user or developer. Specifically, we analyze the grid using an object oriented approach and presenting the architecture using UML. This paper will propose the usage of patterns at all levels (analysis. design and architectural) of the grid development.
Resumo:
Unending quest for performance improvement coupled with the advancements in integrated circuit technology have led to the development of new architectural paradigm. Speculative multithreaded architecture (SpMT) philosophy relies on aggressive speculative execution for improved performance. However, aggressive speculative execution comes with a mixed flavor of improving performance, when successful, and adversely affecting the energy consumption (and performance) because of useless computation in the event of mis-speculation. Dynamic instruction criticality information can be usefully applied to control and guide such an aggressive speculative execution. In this paper, we present a model of micro-execution for SpMT architecture that we have developed to determine the dynamic instruction criticality. We have also developed two novel techniques utilizing the criticality information namely delaying the non-critical loads and the criticality based thread-prediction for reducing useless computations and energy consumption. Experimental results showing break-up of critical instructions and effectiveness of proposed techniques in reducing energy consumption are presented in the context of multiscalar processor that implements SpMT architecture. Our experiments show 17.7% and 11.6% reduction in dynamic energy for criticality based thread prediction and criticality based delayed load scheme respectively while the improvement in dynamic energy delay product is 13.9% and 5.5%, respectively. (c) 2012 Published by Elsevier B.V.
Resumo:
Decoherence as an obstacle in quantum computation is viewed as a struggle between two forces [1]: the computation which uses the exponential dimension of Hilbert space, and decoherence which destroys this entanglement by collapse. In this model of decohered quantum computation, a sequential quantum computer loses the battle, because at each time step, only a local operation is carried out but g*(t) number of gates collapse. With quantum circuits computing in parallel way the situation is different- g(t) number of gates can be applied at each time step and number gates collapse because of decoherence. As g(t) ≈ g*(t) competition here is even [1]. Our paper improves on this model by slowing down g*(t) by encoding the circuit in parallel computing architectures and running it in Single Instruction Multiple Data (SIMD) paradigm. We have proposed a parallel ion trap architecture for single-bit rotation of a qubit.
Resumo:
We present a method of rapidly producing computer-generated holograms that exhibit geometric occlusion in the reconstructed image. Conceptually, a bundle of rays is shot from every hologram sample into the object volume.We use z buffering to find the nearest intersecting object point for every ray and add its complex field contribution to the corresponding hologram sample. Each hologram sample belongs to an independent operation, allowing us to exploit the parallel computing capability of modern programmable graphics processing units (GPUs). Unlike algorithms that use points or planar segments as the basis for constructing the hologram, our algorithm's complexity is dependent on fixed system parameters, such as the number of ray-casting operations, and can therefore handle complicated models more efficiently. The finite number of hologram pixels is, in effect, a windowing function, and from analyzing the Wigner distribution function of windowed free-space transfer function we find an upper limit on the cone angle of the ray bundle. Experimentally, we found that an angular sampling distance of 0:01' for a 2:66' cone angle produces acceptable reconstruction quality. © 2009 Optical Society of America.
Resumo:
Computer generated holography is an extremely demanding and complex task when it comes to providing realistic reconstructions with full parallax, occlusion, and shadowing. We present an algorithm designed for data-parallel computing on modern graphics processing units to alleviate the computational burden. We apply Gaussian interpolation to create a continuous surface representation from discrete input object points. The algorithm maintains a potential occluder list for each individual hologram plane sample to keep the number of visibility tests to a minimum.We experimented with two approximations that simplify and accelerate occlusion computation. It is observed that letting several neighboring hologramplane samples share visibility information on object points leads to significantly faster computation without causing noticeable artifacts in the reconstructed images. Computing a reduced sample set via nonuniform sampling is also found to be an effective acceleration technique. © 2009 Optical Society of America.
Resumo:
In this paper we introduce four scenario Cluster based Lagrangian Decomposition (CLD) procedures for obtaining strong lower bounds to the (optimal) solution value of two-stage stochastic mixed 0-1 problems. At each iteration of the Lagrangian based procedures, the traditional aim consists of obtaining the solution value of the corresponding Lagrangian dual via solving scenario submodels once the nonanticipativity constraints have been dualized. Instead of considering a splitting variable representation over the set of scenarios, we propose to decompose the model into a set of scenario clusters. We compare the computational performance of the four Lagrange multiplier updating procedures, namely the Subgradient Method, the Volume Algorithm, the Progressive Hedging Algorithm and the Dynamic Constrained Cutting Plane scheme for different numbers of scenario clusters and different dimensions of the original problem. Our computational experience shows that the CLD bound and its computational effort depend on the number of scenario clusters to consider. In any case, our results show that the CLD procedures outperform the traditional LD scheme for single scenarios both in the quality of the bounds and computational effort. All the procedures have been implemented in a C++ experimental code. A broad computational experience is reported on a test of randomly generated instances by using the MIP solvers COIN-OR and CPLEX for the auxiliary mixed 0-1 cluster submodels, this last solver within the open source engine COIN-OR. We also give computational evidence of the model tightening effect that the preprocessing techniques, cut generation and appending and parallel computing tools have in stochastic integer optimization. Finally, we have observed that the plain use of both solvers does not provide the optimal solution of the instances included in the testbed with which we have experimented but for two toy instances in affordable elapsed time. On the other hand the proposed procedures provide strong lower bounds (or the same solution value) in a considerably shorter elapsed time for the quasi-optimal solution obtained by other means for the original stochastic problem.
Resumo:
This thesis presents a new approach for the numerical solution of three-dimensional problems in elastodynamics. The new methodology, which is based on a recently introduced Fourier continuation (FC) algorithm for the solution of Partial Differential Equations on the basis of accurate Fourier expansions of possibly non-periodic functions, enables fast, high-order solutions of the time-dependent elastic wave equation in a nearly dispersionless manner, and it requires use of CFL constraints that scale only linearly with spatial discretizations. A new FC operator is introduced to treat Neumann and traction boundary conditions, and a block-decomposed (sub-patch) overset strategy is presented for implementation of general, complex geometries in distributed-memory parallel computing environments. Our treatment of the elastic wave equation, which is formulated as a complex system of variable-coefficient PDEs that includes possibly heterogeneous and spatially varying material constants, represents the first fully-realized three-dimensional extension of FC-based solvers to date. Challenges for three-dimensional elastodynamics simulations such as treatment of corners and edges in three-dimensional geometries, the existence of variable coefficients arising from physical configurations and/or use of curvilinear coordinate systems and treatment of boundary conditions, are all addressed. The broad applicability of our new FC elasticity solver is demonstrated through application to realistic problems concerning seismic wave motion on three-dimensional topographies as well as applications to non-destructive evaluation where, for the first time, we present three-dimensional simulations for comparison to experimental studies of guided-wave scattering by through-thickness holes in thin plates.