258 resultados para Decomposition algorithms
em Indian Institute of Science - Bangalore - Índia
Resumo:
In this paper a new parallel algorithm for nonlinear transient dynamic analysis of large structures has been presented. An unconditionally stable Newmark-beta method (constant average acceleration technique) has been employed for time integration. The proposed parallel algorithm has been devised within the broad framework of domain decomposition techniques. However, unlike most of the existing parallel algorithms (devised for structural dynamic applications) which are basically derived using nonoverlapped domains, the proposed algorithm uses overlapped domains. The parallel overlapped domain decomposition algorithm proposed in this paper has been formulated by splitting the mass, damping and stiffness matrices arises out of finite element discretisation of a given structure. A predictor-corrector scheme has been formulated for iteratively improving the solution in each step. A computer program based on the proposed algorithm has been developed and implemented with message passing interface as software development environment. PARAM-10000 MIMD parallel computer has been used to evaluate the performances. Numerical experiments have been conducted to validate as well as to evaluate the performance of the proposed parallel algorithm. Comparisons have been made with the conventional nonoverlapped domain decomposition algorithms. Numerical studies indicate that the proposed algorithm is superior in performance to the conventional domain decomposition algorithms. (C) 2003 Elsevier Ltd. All rights reserved.
Resumo:
Coarse Grained Reconfigurable Architectures (CGRA) are emerging as embedded application processing units in computing platforms for Exascale computing. Such CGRAs are distributed memory multi- core compute elements on a chip that communicate over a Network-on-chip (NoC). Numerical Linear Algebra (NLA) kernels are key to several high performance computing applications. In this paper we propose a systematic methodology to obtain the specification of Compute Elements (CE) for such CGRAs. We analyze block Matrix Multiplication and block LU Decomposition algorithms in the context of a CGRA, and obtain theoretical bounds on communication requirements, and memory sizes for a CE. Support for high performance custom computations common to NLA kernels are met through custom function units (CFUs) in the CEs. We present results to justify the merits of such CFUs.
Resumo:
Parallel execution of computational mechanics codes requires efficient mesh-partitioning techniques. These mesh-partitioning techniques divide the mesh into specified number of submeshes of approximately the same size and at the same time, minimise the interface nodes of the submeshes. This paper describes a new mesh partitioning technique, employing Genetic Algorithms. The proposed algorithm operates on the deduced graph (dual or nodal graph) of the given finite element mesh rather than directly on the mesh itself. The algorithm works by first constructing a coarse graph approximation using an automatic graph coarsening method. The coarse graph is partitioned and the results are interpolated onto the original graph to initialise an optimisation of the graph partition problem. In practice, hierarchy of (usually more than two) graphs are used to obtain the final graph partition. The proposed partitioning algorithm is applied to graphs derived from unstructured finite element meshes describing practical engineering problems and also several example graphs related to finite element meshes given in the literature. The test results indicate that the proposed GA based graph partitioning algorithm generates high quality partitions and are superior to spectral and multilevel graph partitioning algorithms.
Resumo:
Purpose: A computationally efficient algorithm (linear iterative type) based on singular value decomposition (SVD) of the Jacobian has been developed that can be used in rapid dynamic near-infrared (NIR) diffuse optical tomography. Methods: Numerical and experimental studies have been conducted to prove the computational efficacy of this SVD-based algorithm over conventional optical image reconstruction algorithms. Results: These studies indicate that the performance of linear iterative algorithms in terms of contrast recovery (quantitation of optical images) is better compared to nonlinear iterative (conventional) algorithms, provided the initial guess is close to the actual solution. The nonlinear algorithms can provide better quality images compared to the linear iterative type algorithms. Moreover, the analytical and numerical equivalence of the SVD-based algorithm to linear iterative algorithms was also established as a part of this work. It is also demonstrated that the SVD-based image reconstruction typically requires O(NN2) operations per iteration, as contrasted with linear and nonlinear iterative methods that, respectively, requir O(NN3) and O(NN6) operations, with ``NN'' being the number of unknown parameters in the optical image reconstruction procedure. Conclusions: This SVD-based computationally efficient algorithm can make the integration of image reconstruction procedure with the data acquisition feasible, in turn making the rapid dynamic NIR tomography viable in the clinic to continuously monitor hemodynamic changes in the tissue pathophysiology.
Resumo:
In this paper, we exploit the idea of decomposition to match buyers and sellers in an electronic exchange for trading large volumes of homogeneous goods, where the buyers and sellers specify marginal-decreasing piecewise constant price curves to capture volume discounts. Such exchanges are relevant for automated trading in many e-business applications. The problem of determining winners and Vickrey prices in such exchanges is known to have a worst-case complexity equal to that of as many as (1 + m + n) NP-hard problems, where m is the number of buyers and n is the number of sellers. Our method proposes the overall exchange problem to be solved as two separate and simpler problems: 1) forward auction and 2) reverse auction, which turns out to be generalized knapsack problems. In the proposed approach, we first determine the quantity of units to be traded between the sellers and the buyers using fast heuristics developed by us. Next, we solve a forward auction and a reverse auction using fully polynomial time approximation schemes available in the literature. The proposed approach has worst-case polynomial time complexity. and our experimentation shows that the approach produces good quality solutions to the problem. Note to Practitioners- In recent times, electronic marketplaces have provided an efficient way for businesses and consumers to trade goods and services. The use of innovative mechanisms and algorithms has made it possible to improve the efficiency of electronic marketplaces by enabling optimization of revenues for the marketplace and of utilities for the buyers and sellers. In this paper, we look at single-item, multiunit electronic exchanges. These are electronic marketplaces where buyers submit bids and sellers ask for multiple units of a single item. We allow buyers and sellers to specify volume discounts using suitable functions. Such exchanges are relevant for high-volume business-to-business trading of standard products, such as silicon wafers, very large-scale integrated chips, desktops, telecommunications equipment, commoditized goods, etc. The problem of determining winners and prices in such exchanges is known to involve solving many NP-hard problems. Our paper exploits the familiar idea of decomposition, uses certain algorithms from the literature, and develops two fast heuristics to solve the problem in a near optimal way in worst-case polynomial time.
Resumo:
There are a number of large networks which occur in many problems dealing with the flow of power, communication signals, water, gas, transportable goods, etc. Both design and planning of these networks involve optimization problems. The first part of this paper introduces the common characteristics of a nonlinear network (the network may be linear, the objective function may be non linear, or both may be nonlinear). The second part develops a mathematical model trying to put together some important constraints based on the abstraction for a general network. The third part deals with solution procedures; it converts the network to a matrix based system of equations, gives the characteristics of the matrix and suggests two solution procedures, one of them being a new one. The fourth part handles spatially distributed networks and evolves a number of decomposition techniques so that we can solve the problem with the help of a distributed computer system. Algorithms for parallel processors and spatially distributed systems have been described.There are a number of common features that pertain to networks. A network consists of a set of nodes and arcs. In addition at every node, there is a possibility of an input (like power, water, message, goods etc) or an output or none. Normally, the network equations describe the flows amoungst nodes through the arcs. These network equations couple variables associated with nodes. Invariably, variables pertaining to arcs are constants; the result required will be flows through the arcs. To solve the normal base problem, we are given input flows at nodes, output flows at nodes and certain physical constraints on other variables at nodes and we should find out the flows through the network (variables at nodes will be referred to as across variables).The optimization problem involves in selecting inputs at nodes so as to optimise an objective function; the objective may be a cost function based on the inputs to be minimised or a loss function or an efficiency function. The above mathematical model can be solved using Lagrange Multiplier technique since the equalities are strong compared to inequalities. The Lagrange multiplier technique divides the solution procedure into two stages per iteration. Stage one calculates the problem variables % and stage two the multipliers lambda. It is shown that the Jacobian matrix used in stage one (for solving a nonlinear system of necessary conditions) occurs in the stage two also.A second solution procedure has also been imbedded into the first one. This is called total residue approach. It changes the equality constraints so that we can get faster convergence of the iterations.Both solution procedures are found to coverge in 3 to 7 iterations for a sample network.The availability of distributed computer systems — both LAN and WAN — suggest the need for algorithms to solve the optimization problems. Two types of algorithms have been proposed — one based on the physics of the network and the other on the property of the Jacobian matrix. Three algorithms have been deviced, one of them for the local area case. These algorithms are called as regional distributed algorithm, hierarchical regional distributed algorithm (both using the physics properties of the network), and locally distributed algorithm (a multiprocessor based approach with a local area network configuration). The approach used was to define an algorithm that is faster and uses minimum communications. These algorithms are found to converge at the same rate as the non distributed (unitary) case.
Resumo:
In this paper we consider the problem of learning an n × n kernel matrix from m(1) similarity matrices under general convex loss. Past research have extensively studied the m = 1 case and have derived several algorithms which require sophisticated techniques like ACCP, SOCP, etc. The existing algorithms do not apply if one uses arbitrary losses and often can not handle m > 1 case. We present several provably convergent iterative algorithms, where each iteration requires either an SVM or a Multiple Kernel Learning (MKL) solver for m > 1 case. One of the major contributions of the paper is to extend the well knownMirror Descent(MD) framework to handle Cartesian product of psd matrices. This novel extension leads to an algorithm, called EMKL, which solves the problem in O(m2 log n 2) iterations; in each iteration one solves an MKL involving m kernels and m eigen-decomposition of n × n matrices. By suitably defining a restriction on the objective function, a faster version of EMKL is proposed, called REKL,which avoids the eigen-decomposition. An alternative to both EMKL and REKL is also suggested which requires only an SVMsolver. Experimental results on real world protein data set involving several similarity matrices illustrate the efficacy of the proposed algorithms.
Resumo:
A necessary step for the recognition of scanned documents is binarization, which is essentially the segmentation of the document. In order to binarize a scanned document, we can find several algorithms in the literature. What is the best binarization result for a given document image? To answer this question, a user needs to check different binarization algorithms for suitability, since different algorithms may work better for different type of documents. Manually choosing the best from a set of binarized documents is time consuming. To automate the selection of the best segmented document, either we need to use ground-truth of the document or propose an evaluation metric. If ground-truth is available, then precision and recall can be used to choose the best binarized document. What is the case, when ground-truth is not available? Can we come up with a metric which evaluates these binarized documents? Hence, we propose a metric to evaluate binarized document images using eigen value decomposition. We have evaluated this measure on DIBCO and H-DIBCO datasets. The proposed method chooses the best binarized document that is close to the ground-truth of the document.
Resumo:
Precise experimental implementation of unitary operators is one of the most important tasks for quantum information processing. Numerical optimization techniques are widely used to find optimized control fields to realize a desired unitary operator. However, finding high-fidelity control pulses to realize an arbitrary unitary operator in larger spin systems is still a difficult task. In this work, we demonstrate that a combination of the GRAPE algorithm, which is a numerical pulse optimization technique, and a unitary operator decomposition algorithm Ajoy et al., Phys. Rev. A 85, 030303 (2012)] can realize unitary operators with high experimental fidelity. This is illustrated by simulating the mirror-inversion propagator of an XY spin chain in a five-spin dipolar coupled nuclear spin system. Further, this simulation has been used to demonstrate the transfer of entangled states from one end of the spin chain to the other end.
Resumo:
The bilateral filter is a versatile non-linear filter that has found diverse applications in image processing, computer vision, computer graphics, and computational photography. A common form of the filter is the Gaussian bilateral filter in which both the spatial and range kernels are Gaussian. A direct implementation of this filter requires O(sigma(2)) operations per pixel, where sigma is the standard deviation of the spatial Gaussian. In this paper, we propose an accurate approximation algorithm that can cut down the computational complexity to O(1) per pixel for any arbitrary sigma (constant-time implementation). This is based on the observation that the range kernel operates via the translations of a fixed Gaussian over the range space, and that these translated Gaussians can be accurately approximated using the so-called Gauss-polynomials. The overall algorithm emerging from this approximation involves a series of spatial Gaussian filtering, which can be efficiently implemented (in parallel) using separability and recursion. We present some preliminary results to demonstrate that the proposed algorithm compares favorably with some of the existing fast algorithms in terms of speed and accuracy.
Resumo:
Recently, efficient scheduling algorithms based on Lagrangian relaxation have been proposed for scheduling parallel machine systems and job shops. In this article, we develop real-world extensions to these scheduling methods. In the first part of the paper, we consider the problem of scheduling single operation jobs on parallel identical machines and extend the methodology to handle multiple classes of jobs, taking into account setup times and setup costs, The proposed methodology uses Lagrangian relaxation and simulated annealing in a hybrid framework, In the second part of the paper, we consider a Lagrangian relaxation based method for scheduling job shops and extend it to obtain a scheduling methodology for a real-world flexible manufacturing system with centralized material handling.
Resumo:
In the context of removal of organic pollutants from wastewater, sonolysis of CCl4 dissolved in water has been widely investigated. These investigations are either completely experimental or correlate data empirically. In this work, a quantitative model is developed to predict the rate of sonolysis of aqueous CCl4. The model considers the isothermal growth and partially adiabatic collapse of cavitation bubbles containing gas and vapor leading to conditions of high temperatures and pressures in them, attainment of thermodynamic equilibrium at the end of collapse, release of bubble contents into the liquid pool, and reactions in the well-mixed pool. The model successfully predicts the extent of degradation of dissolved CCl4, and the influence of various parameters such as initial concentration of CCl4, temperature, and nature of gas atmosphere above the liquid. in particular, it predicts the results of Hua and Hoffmann (Environ. Sci Technol, 1996, 30, 864-871), who found that degradation is first order with CCl4 and that Argon as well as Ar-O-3 atmospheres give the same results. The framework of the model is capable of quantitatively describing the degradation of many dissolved organics by considering all the involved species.
Resumo:
A computational study for the convergence acceleration of Euler and Navier-Stokes computations with upwind schemes has been conducted in a unified framework. It involves the flux-vector splitting algorithms due to Steger-Warming and Van Leer, the flux-difference splitting algorithms due to Roe and Osher and the hybrid algorithms, AUSM (Advection Upstream Splitting Method) and HUS (Hybrid Upwind Splitting). Implicit time integration with line Gauss-Seidel relaxation and multigrid are among the procedures which have been systematically investigated on an individual as well as cumulative basis. The upwind schemes have been tested in various implicit-explicit operator combinations such that the optimal among them can be determined based on extensive computations for two-dimensional flows in subsonic, transonic, supersonic and hypersonic flow regimes. In this study, the performance of these implicit time-integration procedures has been systematically compared with those corresponding to a multigrid accelerated explicit Runge-Kutta method. It has been demonstrated that a multigrid method employed in conjunction with an implicit time-integration scheme yields distinctly superior convergence as compared to those associated with either of the acceleration procedures provided that effective smoothers, which have been identified in this investigation, are prescribed in the implicit operator.
Resumo:
Doppler weather radars with fast scanning rates must estimate spectral moments based on a small number of echo samples. This paper concerns the estimation of mean Doppler velocity in a coherent radar using a short complex time series. Specific results are presented based on 16 samples. A wide range of signal-to-noise ratios are considered, and attention is given to ease of implementation. It is shown that FFT estimators fare poorly in low SNR and/or high spectrum-width situations. Several variants of a vector pulse-pair processor are postulated and an algorithm is developed for the resolution of phase angle ambiguity. This processor is found to be better than conventional processors at very low SNR values. A feasible approximation to the maximum entropy estimator is derived as well as a technique utilizing the maximization of the periodogram. It is found that a vector pulse-pair processor operating with four lags for clear air observation and a single lag (pulse-pair mode) for storm observation may be a good way to estimate Doppler velocities over the entire gamut of weather phenomena.
Resumo:
Prior ultraviolet irradiation of coal results in catalysing the subsequent thermal decomposition and ignition of coal. Mechanically, it is shown that ultraviolet radiation brings about the catalysis by acting on the inorganic components of coal.