Digital Library

963 results for Picard iteration

Tiling stencil computations to maximize parallelism

Abstract:

Most stencil computations allow tile-wise concurrent start, i.e., there always exists a face of the iteration space and a set of tiling hyperplanes such that all tiles along that face can be started concurrently. This provides load balance and maximizes parallelism. However, existing automatic tiling frameworks often choose hyperplanes that lead to pipelined start-up and load imbalance. We address this issue with a new tiling technique that ensures concurrent start-up as well as perfect load balance whenever possible. We first provide necessary and sufficient conditions on tiling hyperplanes to enable concurrent start for programs with affine data accesses. We then provide an approach to find such hyperplanes. Experimental evaluation on a 12-core Intel Westmere shows that our code outperforms a tuned domain-specific stencil code generator by 4% to 27%, and previous compiler techniques by 2x to 10.14x.
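The abstract does not state the paper's concurrent-start conditions, so the sketch below only illustrates the standard background check those conditions build on: a tiling hyperplane h is legal if h · d ≥ 0 for every dependence distance vector d. The stencil (an assumed 1-D 3-point Jacobi update over time t and space i), the helper names, and the tile size are illustrative assumptions. The diamond-tiling hyperplanes (1, 1) and (1, -1) pass the check, while the purely spatial hyperplane (0, 1) does not.

```python
# Dependence distance vectors (dt, di) of an assumed 1-D 3-point Jacobi stencil:
# A[t+1][i] is computed from A[t][i-1], A[t][i] and A[t][i+1].
DEPS = [(1, -1), (1, 0), (1, 1)]


def dot(h, v):
    return sum(a * b for a, b in zip(h, v))


def is_legal(h, deps=DEPS):
    """Standard tiling legality test: h . d >= 0 for every dependence d."""
    return all(dot(h, d) >= 0 for d in deps)


def tile_coords(point, hyperplanes, size):
    """Tile index of iteration point (t, i): floor(h . p / size) per hyperplane."""
    return tuple(dot(h, point) // size for h in hyperplanes)


if __name__ == "__main__":
    for h in [(1, 1), (1, -1), (0, 1)]:
        print(h, "legal" if is_legal(h) else "illegal")
```

Because a legal hyperplane keeps h · p non-decreasing along every dependence, no dependence ever points into a tile with a smaller coordinate; `tile_coords` lets you check this on a small grid.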

Runtime dependence computation and execution of loops on heterogeneous systems

Abstract:

GPUs have been used for parallel execution of DOALL loops. However, loops with indirect array references can potentially cause cross-iteration dependences that are hard to detect using existing compilation techniques. Applications with such loops cannot easily use the GPU and hence do not benefit from its tremendous compute capabilities. In this paper, we present an algorithm that computes the cross-iteration dependences in such loops at runtime. The algorithm uses both the CPU and the GPU to compute the dependences. Specifically, it effectively uses the compute capabilities of the GPU to quickly collect the memory accesses performed by the iterations, by executing the slice functions generated for the indirect array accesses. Using the dependence information, the loop iterations are levelized such that each level contains independent iterations that can be executed in parallel. Another interesting aspect of the proposed solution is that it pipelines the dependence computation of the next level with the actual computation of the current level to make effective use of the resources available on the GPU. We evaluate our implementation on an NVIDIA Tesla C2070 using benchmarks from the Polybench suite and some synthetic benchmarks. Our experiments show that the proposed technique can achieve an average speedup of 6.4x on loops with a reasonable number of cross-iteration dependences.
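A minimal CPU-only sketch of the levelization step described above, assuming a hypothetical loop of the form `a[writes[i]] = f(a[reads[i]])` whose indirect index arrays `reads` and `writes` have already been collected (the paper gathers these accesses on the GPU by running slice functions; that part is not shown). Each iteration is placed one level above the deepest earlier iteration it has a flow, anti, or output dependence with, so iterations that share a level are independent.

```python
from collections import defaultdict


def levelize(reads, writes):
    """Level of iteration i of the hypothetical loop a[writes[i]] = f(a[reads[i]])."""
    last_write = {}  # address -> level of the most recent writer
    max_read = {}    # address -> highest level among readers so far
    levels = []
    for r, w in zip(reads, writes):
        lvl = 0
        if r in last_write:  # flow dependence (read after write)
            lvl = max(lvl, last_write[r] + 1)
        if w in last_write:  # output dependence (write after write)
            lvl = max(lvl, last_write[w] + 1)
        if w in max_read:    # anti dependence (write after read)
            lvl = max(lvl, max_read[w] + 1)
        levels.append(lvl)
        max_read[r] = max(max_read.get(r, 0), lvl)
        last_write[w] = lvl
    groups = defaultdict(list)
    for i, lvl in enumerate(levels):
        groups[lvl].append(i)
    # Each inner list can run in parallel; the lists run one after another.
    return levels, [groups[k] for k in sorted(groups)]
```

For example, `levelize([3, 0, 1, 3], [0, 1, 0, 2])` yields the levels `[0, 1, 2, 0]`, so iterations 0 and 3 can run together, followed by iteration 1 and then iteration 2.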

A novel MCMC algorithm for near-optimal detection in large-scale uplink multiuser MIMO systems

Abstract:

In this paper, we propose a low-complexity algorithm based on the Markov chain Monte Carlo (MCMC) technique for signal detection on the uplink in large-scale multiuser multiple-input multiple-output (MIMO) systems with tens to hundreds of antennas at the base station (BS) and a similar number of uplink users. The algorithm employs a randomized sampling method (which makes a probabilistic choice between Gibbs sampling and random sampling in each iteration) for detection. The proposed algorithm alleviates the stalling problem encountered at high SNRs in conventional MCMC algorithms and achieves near-optimal performance in large systems with M-QAM. A novel ingredient of the algorithm, responsible for achieving near-optimal performance at low complexity, is the joint use of a randomized MCMC (R-MCMC) strategy and a multiple-restart strategy with an efficient restart criterion. Near-optimal detection performance is demonstrated for large numbers of BS antennas and users (e.g., 64, 128, and 256 BS antennas/users).
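A minimal sketch of the randomized sampling idea, under simplifying assumptions that are not from the paper: a real-valued BPSK model y = Hx + n instead of M-QAM, a mixing probability q = 1/K for the random-sampling step, and a fixed number of restarts in place of the paper's restart criterion. At each coordinate update the sampler either draws a symbol uniformly at random (which helps escape the stalling seen at high SNR) or draws from the Gibbs conditional, and the lowest-cost vector seen across all restarts is returned. The function names are hypothetical.

```python
import numpy as np

SYMBOLS = np.array([-1.0, 1.0])  # assumed BPSK alphabet (the paper treats M-QAM)


def ml_cost(y, H, x):
    """Squared Euclidean distance ||y - Hx||^2."""
    r = y - H @ x
    return float(r @ r)


def rmcmc_detect(y, H, sigma2, sweeps=50, restarts=4, q=None, seed=0):
    """Randomized MCMC detection sketch with a fixed number of restarts."""
    rng = np.random.default_rng(seed)
    K = H.shape[1]
    q = 1.0 / K if q is None else q  # assumed mixing probability
    best_x, best_c = None, np.inf
    for _ in range(restarts):
        x = rng.choice(SYMBOLS, size=K)  # random initial vector per restart
        for _ in range(sweeps):
            for k in range(K):
                if rng.random() < q:
                    # Random-sampling step: uniform draw that ignores the likelihood.
                    x[k] = rng.choice(SYMBOLS)
                else:
                    # Gibbs step: sample x[k] from p(x_k | y, other coordinates).
                    costs = np.empty(len(SYMBOLS))
                    for m, s in enumerate(SYMBOLS):
                        x[k] = s
                        costs[m] = ml_cost(y, H, x)
                    p = np.exp(-(costs - costs.min()) / (2.0 * sigma2))
                    x[k] = SYMBOLS[rng.choice(len(SYMBOLS), p=p / p.sum())]
                c = ml_cost(y, H, x)
                if c < best_c:
                    best_c, best_x = c, x.copy()
    return best_x, best_c
```

Tracking the best vector across all restarts means the random steps can only help exploration; they never make the returned estimate worse than one already visited.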

Large-Scale Elastic Net Regularized Linear Classification SVMs and Logistic Regression

Abstract:

Elastic net regularizers have shown much promise in designing sparse classifiers for linear classification. In this work, we propose an alternating optimization approach to solve the dual problems of elastic net regularized linear classification Support Vector Machines (SVMs) and logistic regression (LR). One of the sub-problems turns out to be a simple projection. The other sub-problem can be solved using dual coordinate descent methods developed for non-sparse L2-regularized linear SVMs and LR, without altering their iteration complexity and convergence properties. Experiments on very large datasets indicate that the proposed dual coordinate descent-projection (DCD-P) methods are fast and achieve comparable generalization performance after the first pass through the data, with extremely sparse models.
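The abstract reuses dual coordinate descent from L2-regularized linear SVMs for one sub-problem. As an illustration of that building block only (the elastic-net projection step and the logistic-regression variant are not shown), here is a sketch of the well-known dual coordinate descent update for an L2-regularized hinge-loss linear SVM, assuming no bias term: each dual variable α_i takes a Newton step on its one-dimensional sub-problem and is projected back onto the box [0, C], while w = Σ α_i y_i x_i is updated incrementally. The function name and defaults are hypothetical.

```python
import numpy as np


def dcd_linear_svm(X, y, C=1.0, epochs=10, seed=0):
    """Dual coordinate descent for min_w 0.5||w||^2 + C * sum_i max(0, 1 - y_i w.x_i).

    Dual: min_a 0.5 a^T Q a - sum(a), 0 <= a_i <= C, Q_ij = y_i y_j x_i.x_j.
    No bias term (assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)
    q_diag = np.einsum("ij,ij->i", X, X)  # Q_ii = ||x_i||^2
    for _ in range(epochs):
        for i in rng.permutation(n):
            if q_diag[i] == 0.0:
                continue
            grad = y[i] * (w @ X[i]) - 1.0  # partial derivative of the dual objective
            new_alpha = min(max(alpha[i] - grad / q_diag[i], 0.0), C)  # project onto [0, C]
            w += (new_alpha - alpha[i]) * y[i] * X[i]  # keep w = sum_i alpha_i y_i x_i
            alpha[i] = new_alpha
    return w, alpha
```

Maintaining w explicitly makes each coordinate update cost O(d), which is what lets this family of methods scale to very large datasets.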

Optimal sequential wireless relay placement on a random lattice path

Abstract:

Our work is motivated by impromptu (or "as-you-go") deployment of wireless relay nodes along a path, a need that arises in many situations. In this paper, the path is modeled as starting at the origin (where the data sink, e.g., the control center, is located) and evolving randomly over a lattice in the positive quadrant. A person walks along the path, deploying relay nodes as he goes. At each step, the path may randomly continue in the same direction, take a turn, or come to an end, at which point a data source (e.g., a sensor) has to be placed that will send packets to the data sink. At each step, a decision has to be made whether or not to place a wireless relay node. Assuming that the packet generation rate at the source is very low, and assuming simple link-by-link scheduling, we consider the problem of sequential relay placement so as to minimize the expectation of an end-to-end cost metric (a linear combination of the sum of convex hop costs and the number of relays placed). This impromptu relay placement problem is formulated as a total-cost Markov decision process. First, we derive the optimal policy in terms of an optimal placement set and show that this set is characterized by a boundary (with respect to the position of the last placed relay) beyond which it is optimal to place the next relay. Next, based on a simpler one-step-look-ahead characterization of the optimal policy, we propose an algorithm that provably converges to the optimal placement set in a finite number of steps and is faster than value iteration. We show by simulations that the distance-threshold-based heuristic usually assumed in the literature is close to optimal, provided that the threshold distance is carefully chosen. © 2014 Elsevier B.V. All rights reserved.

Risk-sensitive control of continuous time Markov chains

Abstract:

We study risk-sensitive control of continuous-time Markov chains taking values in a discrete state space. We study both finite- and infinite-horizon problems. In the finite-horizon problem, we characterize the value function via the Hamilton-Jacobi-Bellman equation and obtain an optimal Markov control. We do the same for the infinite-horizon discounted-cost case. In the infinite-horizon average-cost case, we establish the existence of an optimal stationary control under a certain Lyapunov condition. We also develop a policy iteration algorithm for finding an optimal control.

3-D GPU Based Real Time Diffuse Optical Tomographic System

Abstract:

The 3-Dimensional Diffuse Optical Tomographic (3-D DOT) image reconstruction algorithm is computationally complex and requires extensive matrix computations, which hampers reconstruction in real time. In this paper, we present near-real-time 3-D DOT image reconstruction based on the Broyden approach for updating the Jacobian matrix. The Broyden method simplifies the algorithm by avoiding re-computation of the Jacobian matrix in each iteration. We have developed CPU and heterogeneous CPU/GPU code for 3-D DOT image reconstruction in C and MATLAB. We have used the Compute Unified Device Architecture (CUDA) programming framework and the CUDA linear algebra library (CULA) to utilize the massively parallel computational power of GPUs (NVIDIA Tesla K20c). For the C implementation on a CPU/GPU system, with measurements on 3 planes and a FEM mesh of 19172 tetrahedral elements, one iteration takes 806 milliseconds.
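The Broyden approach mentioned above replaces per-iteration Jacobian recomputation with a rank-one secant update, J ← J + (Δf - J Δx) Δx^T / (Δx^T Δx). The sketch below applies that update to a small, assumed 2-by-2 nonlinear system rather than to a DOT forward model; the function name and tolerances are illustrative.

```python
import numpy as np


def broyden_solve(F, x0, J0, tol=1e-10, max_iter=100):
    """Solve F(x) = 0 with Broyden's ("good") method, starting from Jacobian guess J0."""
    x = np.asarray(x0, dtype=float)
    J = np.array(J0, dtype=float)
    f = F(x)
    for _ in range(max_iter):
        if np.linalg.norm(f) < tol:
            return x
        dx = np.linalg.solve(J, -f)  # quasi-Newton step with the current Jacobian estimate
        x_new = x + dx
        f_new = F(x_new)
        # Rank-one secant update instead of recomputing the Jacobian.
        J += np.outer(f_new - f - J @ dx, dx) / (dx @ dx)
        x, f = x_new, f_new
    raise RuntimeError("Broyden iteration did not converge")
```

Only the initial Jacobian `J0` is formed explicitly; afterwards each iteration costs one evaluation of `F` plus one linear solve, which is the kind of saving the abstract attributes to the Broyden method.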

Bayesian parameter identification in dynamic state space models using modified measurement equations

Abstract:

When Markov chain Monte Carlo (MCMC) samplers are used in problems of system parameter identification, one faces computational difficulties in dealing with large amounts of measurement data and/or low levels of measurement noise. Such exigencies are likely to occur in parameter identification of dynamical systems, where the amount of vibratory measurement data and the number of parameters to be identified can be large. In such cases, the posterior probability density function of the system parameters tends to have regions of narrow support, and a finite-length MCMC chain is unlikely to cover the pertinent regions. The present study proposes strategies based on modification of the measurement equations and subsequent corrections to alleviate this difficulty. This involves artificial enhancement of the measurement noise, assimilation of transformed packets of measurements, and a global iteration strategy to improve the choice of prior models. Illustrative examples cover laboratory studies on a time-variant dynamical system and a bending-torsion coupled, geometrically non-linear building frame under earthquake support motions. © 2015 Elsevier Ltd. All rights reserved.

A New Successive Displacement Type Load Flow Algorithm and its Application to Radial Systems

Abstract:

A new successive-displacement-type load flow method is developed in this paper. This algorithm differs from the conventional Y-bus-based Gauss-Seidel load flow in that the voltage at each bus is updated in every iteration based on the exact solution of the power balance equation at that node, instead of the approximate solution used by the Gauss-Seidel method. It turns out that this modified implementation yields only a marginal improvement in convergence behaviour for load flow solutions of interconnected systems. However, it is demonstrated that the new approach can be adapted, with some additional refinements, into an effective load flow solution technique for radial systems. Numerical results for a number of systems, both interconnected and radial, are provided to validate the proposed approach.
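For reference, the sketch below implements the conventional Y-bus Gauss-Seidel update that the new method is contrasted with, V_i ← (1/Y_ii) [ S_i* / V_i* - Σ_{j≠i} Y_ij V_j ], where S_i is the complex power injected at bus i. It does not implement the paper's exact nodal power-balance solution. It assumes every non-slack bus is a PQ bus, and the two-bus test data (line impedance 0.01 + j0.05 p.u., load 0.5 + j0.2 p.u.) are hypothetical.

```python
import numpy as np


def gauss_seidel_load_flow(Ybus, S, V0, slack=0, tol=1e-10, max_iter=500):
    """Conventional Y-bus Gauss-Seidel load flow; all non-slack buses assumed PQ.

    S[i] is the complex power injected at bus i (loads are negative injections).
    """
    V = np.array(V0, dtype=complex)
    n = len(V)
    for it in range(max_iter):
        max_dv = 0.0
        for i in range(n):
            if i == slack:
                continue  # slack-bus voltage is held fixed
            # Sum of Y_ij * V_j over j != i, using the latest (already updated) voltages.
            coupling = Ybus[i] @ V - Ybus[i, i] * V[i]
            v_new = (np.conj(S[i]) / np.conj(V[i]) - coupling) / Ybus[i, i]
            max_dv = max(max_dv, abs(v_new - V[i]))
            V[i] = v_new
        if max_dv < tol:
            return V, it + 1
    raise RuntimeError("Gauss-Seidel load flow did not converge")
```

Using each freshly updated voltage immediately in the next bus's update is what makes this a successive-displacement (Gauss-Seidel) scheme rather than a Jacobi-type one.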

An Alternating ℓp-ℓ2 Projections Algorithm (ALPA) for Speech Modeling using Sparsity Constraints

Abstract:

We address the problem of separating a speech signal into its excitation and vocal-tract filter components, which falls within the framework of blind deconvolution. Typically, the excitation in the case of voiced speech is assumed to be sparse and the vocal-tract filter stable. We develop an alternating ℓp-ℓ2 projections algorithm (ALPA) to perform deconvolution taking these constraints into account. The algorithm is iterative and alternates between two solution spaces. The initialization is based on the standard linear prediction decomposition of a speech signal into an autoregressive filter and a prediction residue. In every iteration, a sparse excitation is estimated by optimizing an ℓp-norm-based cost, and the vocal-tract filter is derived as the solution to a standard least-squares minimization problem. We validate the algorithm on voiced segments of natural speech signals and show applications to epoch estimation. We also present comparisons with state-of-the-art techniques and show that ALPA gives a sparser, impulse-like excitation, where the impulses directly denote the epochs or instants of significant excitation.

Robust Savitzky-Golay Filters

Abstract:

Local polynomial approximation of data is one approach to signal denoising. Savitzky-Golay (SG) filters are finite-impulse-response kernels that, when convolved with the data, yield a polynomial approximation for a chosen set of filter parameters. When the noise follows Gaussian statistics, minimizing the mean-squared error (MSE) between the noisy signal and its polynomial approximation is optimal in the maximum-likelihood (ML) sense, but the MSE criterion is not optimal under non-Gaussian noise. In this paper, we robustify the SG filter for applications involving noise with a heavy-tailed distribution. The optimal filtering criterion is ℓ1-norm minimization of the error, achieved through the iteratively reweighted least-squares (IRLS) technique. Interestingly, at every stage of the iteration we solve a weighted SG filter problem by minimizing an ℓ2 norm, yet the process converges to the ℓ1-minimized output. The results show consistent improvement over the standard SG filter.
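A minimal sketch of the IRLS idea described above, assuming a sliding window of 2·half_width + 1 samples and a quadratic local model: each window solves a weighted least-squares polynomial fit with weights 1/|r_k| taken from the previous residuals, which drives the fit toward the ℓ1 solution. The window size, iteration count, weight floor `eps`, the function names, and the choice to leave edge samples unfiltered are assumptions, not the paper's settings.

```python
import numpy as np


def l1_poly_fit(t, x, order=2, iters=50, eps=1e-8):
    """Approximate argmin_c sum_k |x_k - p_c(t_k)| by iteratively reweighted least squares."""
    A = np.vander(t, order + 1, increasing=True)  # columns 1, t, t^2, ...
    w = np.ones(len(x))
    coef = None
    for _ in range(iters):
        sw = np.sqrt(w)
        # Weighted LS: minimize sum_k w_k * r_k^2 by scaling rows with sqrt(w_k).
        coef, *_ = np.linalg.lstsq(A * sw[:, None], x * sw, rcond=None)
        r = x - A @ coef
        # At a fixed point, w_k = 1/|r_k| makes sum_k w_k r_k^2 equal sum_k |r_k|.
        w = 1.0 / np.maximum(np.abs(r), eps)
    return coef


def robust_sg_filter(x, half_width=3, order=2, iters=50):
    """Sliding-window l1 polynomial smoother; edge samples are left unfiltered."""
    x = np.asarray(x, dtype=float)
    t = np.arange(-half_width, half_width + 1, dtype=float)
    out = x.copy()
    for c in range(half_width, len(x) - half_width):
        coef = l1_poly_fit(t, x[c - half_width : c + half_width + 1], order, iters)
        out[c] = coef[0]  # local polynomial evaluated at the window center (t = 0)
    return out
```

On a smooth quadratic with a single large spike, this robust filter recovers the underlying value at the spike, whereas an ordinary least-squares SG filter with a 7-point quadratic window passes one third of the spike into the center sample.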

A Study of Composite Beam with Shape Memory Alloy Arbitrarily Embedded under Thermal and Mechanical Loadings

Abstract:

The constitutive relations and kinematic assumptions for a composite beam with an arbitrarily embedded shape memory alloy (SMA) layer are discussed, and the results obtained under different kinematic assumptions are compared. Because the composite beam with the embedded SMA layer is studied with a mechanics-of-materials approach, the kinematic assumption is vital. In this paper, we systematically study the influence of the kinematic assumptions on the deflection and vibration characteristics of the composite beam. The equations of equilibrium/motion differ depending on the kinematic assumption. Here, three widely used kinematic assumptions are presented and the corresponding equations of equilibrium/motion are derived. As the kinematic assumptions progress from simple to complex, the governing equations evolve from linear to nonlinear. For the nonlinear equations of equilibrium, the numerical solution is obtained using the Galerkin discretization method and the Newton-Raphson iteration method. An analysis of the numerical difficulty of using the Galerkin method in the post-buckling analysis is presented. For the post-buckling analysis, the finite element method is applied to avoid the difficulty due to the singularity that occurs in the Galerkin method. The natural frequencies of the composite beam with the nonlinear governing equations, obtained by directly linearizing the equations and by locally linearizing them around each equilibrium, are compared. The influences of the SMA layer thickness and of its offset from the neutral axis on the deflection, buckling, and post-buckling are also investigated. This paper presents a very general way to treat the thermo-mechanical properties of a composite beam with an arbitrarily embedded SMA layer. The governing equations for each kinematic assumption consist of a third-order and a fourth-order differential equation with a total of seven boundary conditions.
Some previous studies on the SMA layer either ignore the thermal constraint effect or implicitly assume that the SMA is symmetrically embedded. Here, a composite beam with an asymmetrically embedded SMA layer is studied, of which symmetric embedding is a special case. The results obtained under different kinematic assumptions differ depending on the deflection magnitude, because of the nonlinear hardening effect due to (large) deflection; this difference is systematically compared for both the deflection and the natural frequencies. For the simple kinematic assumption, the governing equations are linear and an analytical solution is available. But as the deflection grows large, the simple kinematic assumption no longer reflects the structural deflection and the complex one must be used. In the systematic comparison of computational results under the different kinematic assumptions, the application range of the simple kinematic assumption is also evaluated. Besides the equilibrium study of the composite laminate with embedded SMA, the buckling, post-buckling, and free and forced vibrations of the composite beam in different configurations are also studied and compared.

A Continuation Method of Parameter Inversion for Non-Equilibrium Convection-Dispersion Equation

Abstract:

Based on the homotopy mapping, a globally convergent method of parameter inversion for non-equilibrium convection-dispersion equations (CDEs) is developed. Moreover, to further improve the computational efficiency of the algorithm, a suitably smooth function derived from the sigmoid function is employed to update the homotopy parameter during iteration. Numerical results demonstrate the global convergence and high performance of this method. In addition, a good solution can be found even when the measurements are heavily contaminated by noise.
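The CDE inversion problem itself is not reproduced here. As a hedged illustration of the continuation mechanism, the sketch below uses the fixed-point homotopy H(x, λ) = λF(x) + (1 - λ)(x - x0), advances λ from 0 to 1 along a rescaled sigmoid schedule, and applies a few Newton corrector steps at each λ. The schedule, its steepness, the step counts, and the function names are hypothetical stand-ins for the paper's sigmoid-derived update rule.

```python
import numpy as np


def sigmoid_schedule(n_steps, steepness=6.0):
    """Homotopy parameters 0 = lam_0 < ... < lam_n = 1 along a rescaled sigmoid.

    Hypothetical stand-in for the paper's sigmoid-derived update rule.
    """
    z = steepness * (np.linspace(0.0, 1.0, n_steps + 1) - 0.5)
    s = 1.0 / (1.0 + np.exp(-z))
    return (s - s[0]) / (s[-1] - s[0])


def homotopy_solve(F, J, x0, n_steps=20, newton_iters=5):
    """Track H(x, lam) = lam*F(x) + (1 - lam)*(x - x0) = 0 from lam = 0 to lam = 1."""
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    eye = np.eye(x.size)
    for lam in sigmoid_schedule(n_steps)[1:]:
        for _ in range(newton_iters):  # Newton corrector at fixed lam
            H = lam * F(x) + (1.0 - lam) * (x - x0)
            dH = lam * J(x) + (1.0 - lam) * eye
            x = x - np.linalg.solve(dH, H)
    return x
```

The sigmoid schedule takes small steps in λ near both ends of the path and larger ones in the middle. On the toy equation x³ - 2x - 5 = 0 started from x0 = 0, it reaches the real root near 2.0946.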

Vibration of an Adhered Microbeam Under a Periodically Shaking Electrical Force

Abstract:

The vibration analysis of an adhered S-shaped microbeam under an alternating sinusoidal voltage is presented. The shaking force is the electrical force due to the sinusoidal voltage. During vibration, both the microbeam deflection and the adhesion length keep changing. The microbeam deflection and adhesion length are determined numerically by an iterative method. As the adhesion length changes, the domain of the equation of motion for the microbeam (unadhered part) changes correspondingly, which changes the natural frequencies of the structure. For this reason, the system can never reach a steady state. The transient behaviors of the microbeam under different shaking frequencies are compared. We deliberately choose the initial conditions so that our dynamic results can be compared with the existing static theory. The paper also analyzes how the adhesion length changes during vibration, revealing an asymmetric pattern of adhesion length change that may be used to guide the dynamic de-adhering process. The abnormal behavior of the adhered microbeam, which vibrates at almost the same frequency under two quite different shaking frequencies, is also shown. The Galerkin method is used to discretize the equation of motion, and its convergence study is also presented. The model is only applicable when the peel number equals 1. Some other model limitations are also discussed.

A Bayesian wavelet-based multidimensional deconvolution with sub-band emphasis

Abstract:

This paper proposes a new algorithm for wavelet-based multidimensional image deconvolution that employs subband-dependent minimization and the dual-tree complex wavelet transform in an iterative Bayesian framework. In addition, this algorithm employs a new prior instead of the popular ℓ1 norm, and is thus able to embed a learning scheme during the iteration, which helps it achieve better deconvolution results and faster convergence. © 2008 IEEE.
