980 resultados para Algorithm efficiency
Resumo:
The aim of my thesis is to parallelize the Weighting Histogram Analysis Method (WHAM), which is a popular algorithm used to calculate the Free Energy of a molucular system in Molecular Dynamics simulations. WHAM works in post processing in cooperation with another algorithm called Umbrella Sampling. Umbrella Sampling has the purpose to add a biasing in the potential energy of the system in order to force the system to sample a specific region in the configurational space. Several N independent simulations are performed in order to sample all the region of interest. Subsequently, the WHAM algorithm is used to estimate the original system energy starting from the N atomic trajectories. The parallelization of WHAM has been performed through CUDA, a language that allows to work in GPUs of NVIDIA graphic cards, which have a parallel achitecture. The parallel implementation may sensibly speed up the WHAM execution compared to previous serial CPU imlementations. However, the WHAM CPU code presents some temporal criticalities to very high numbers of interactions. The algorithm has been written in C++ and executed in UNIX systems provided with NVIDIA graphic cards. The results were satisfying obtaining an increase of performances when the model was executed on graphics cards with compute capability greater. Nonetheless, the GPUs used to test the algorithm is quite old and not designated for scientific calculations. It is likely that a further performance increase will be obtained if the algorithm would be executed in clusters of GPU at high level of computational efficiency. The thesis is organized in the following way: I will first describe the mathematical formulation of Umbrella Sampling and WHAM algorithm with their apllications in the study of ionic channels and in Molecular Docking (Chapter 1); then, I will present the CUDA architectures used to implement the model (Chapter 2); and finally, the results obtained on model systems will be presented (Chapter 3).
Resumo:
We derive a new class of iterative schemes for accelerating the convergence of the EM algorithm, by exploiting the connection between fixed point iterations and extrapolation methods. First, we present a general formulation of one-step iterative schemes, which are obtained by cycling with the extrapolation methods. We, then square the one-step schemes to obtain the new class of methods, which we call SQUAREM. Squaring a one-step iterative scheme is simply applying it twice within each cycle of the extrapolation method. Here we focus on the first order or rank-one extrapolation methods for two reasons, (1) simplicity, and (2) computational efficiency. In particular, we study two first order extrapolation methods, the reduced rank extrapolation (RRE1) and minimal polynomial extrapolation (MPE1). The convergence of the new schemes, both one-step and squared, is non-monotonic with respect to the residual norm. The first order one-step and SQUAREM schemes are linearly convergent, like the EM algorithm but they have a faster rate of convergence. We demonstrate, through five different examples, the effectiveness of the first order SQUAREM schemes, SqRRE1 and SqMPE1, in accelerating the EM algorithm. The SQUAREM schemes are also shown to be vastly superior to their one-step counterparts, RRE1 and MPE1, in terms of computational efficiency. The proposed extrapolation schemes can fail due to the numerical problems of stagnation and near breakdown. We have developed a new hybrid iterative scheme that combines the RRE1 and MPE1 schemes in such a manner that it overcomes both stagnation and near breakdown. The squared first order hybrid scheme, SqHyb1, emerges as the iterative scheme of choice based on our numerical experiments. It combines the fast convergence of the SqMPE1, while avoiding near breakdowns, with the stability of SqRRE1, while avoiding stagnations. The SQUAREM methods can be incorporated very easily into an existing EM algorithm. They only require the basic EM step for their implementation and do not require any other auxiliary quantities such as the complete data log likelihood, and its gradient or hessian. They are an attractive option in problems with a very large number of parameters, and in problems where the statistical model is complex, the EM algorithm is slow and each EM step is computationally demanding.
Resumo:
We consider nonparametric missing data models for which the censoring mechanism satisfies coarsening at random and which allow complete observations on the variable X of interest. W show that beyond some empirical process conditions the only essential condition for efficiency of an NPMLE of the distribution of X is that the regions associated with incomplete observations on X contain enough complete observations. This is heuristically explained by describing the EM-algorithm. We provide identifiably of the self-consistency equation and efficiency of the NPMLE in order to make this statement rigorous. The usual kind of differentiability conditions in the proof are avoided by using an identity which holds for the NPMLE of linear parameters in convex models. We provide a bivariate censoring application in which the condition and hence the NPMLE fails, but where other estimators, not based on the NPMLE principle, are highly inefficient. It is shown how to slightly reduce the data so that the conditions hold for the reduced data. The conditions are verified for the univariate censoring, double censored, and Ibragimov-Has'minski models.
Resumo:
Linear programs, or LPs, are often used in optimization problems, such as improving manufacturing efficiency of maximizing the yield from limited resources. The most common method for solving LPs is the Simplex Method, which will yield a solution, if one exists, but over the real numbers. From a purely numerical standpoint, it will be an optimal solution, but quite often we desire an optimal integer solution. A linear program in which the variables are also constrained to be integers is called an integer linear program or ILP. It is the focus of this report to present a parallel algorithm for solving ILPs. We discuss a serial algorithm using a breadth-first branch-and-bound search to check the feasible solution space, and then extend it into a parallel algorithm using a client-server model. In the parallel mode, the search may not be truly breadth-first, depending on the solution time for each node in the solution tree. Our search takes advantage of pruning, often resulting in super-linear improvements in solution time. Finally, we present results from sample ILPs, describe a few modifications to enhance the algorithm and improve solution time, and offer suggestions for future work.
Resumo:
This dissertation discusses structural-electrostatic modeling techniques, genetic algorithm based optimization and control design for electrostatic micro devices. First, an alternative modeling technique, the interpolated force model, for electrostatic micro devices is discussed. The method provides improved computational efficiency relative to a benchmark model, as well as improved accuracy for irregular electrode configurations relative to a common approximate model, the parallel plate approximation model. For the configuration most similar to two parallel plates, expected to be the best case scenario for the approximate model, both the parallel plate approximation model and the interpolated force model maintained less than 2.2% error in static deflection compared to the benchmark model. For the configuration expected to be the worst case scenario for the parallel plate approximation model, the interpolated force model maintained less than 2.9% error in static deflection while the parallel plate approximation model is incapable of handling the configuration. Second, genetic algorithm based optimization is shown to improve the design of an electrostatic micro sensor. The design space is enlarged from published design spaces to include the configuration of both sensing and actuation electrodes, material distribution, actuation voltage and other geometric dimensions. For a small population, the design was improved by approximately a factor of 6 over 15 generations to a fitness value of 3.2 fF. For a larger population seeded with the best configurations of the previous optimization, the design was improved by another 7% in 5 generations to a fitness value of 3.0 fF. Third, a learning control algorithm is presented that reduces the closing time of a radiofrequency microelectromechanical systems switch by minimizing bounce while maintaining robustness to fabrication variability. Electrostatic actuation of the plate causes pull-in with high impact velocities, which are difficult to control due to parameter variations from part to part. A single degree-of-freedom model was utilized to design a learning control algorithm that shapes the actuation voltage based on the open/closed state of the switch. Experiments on 3 test switches show that after 5-10 iterations, the learning algorithm lands the switch with an impact velocity not exceeding 0.2 m/s, eliminating bounce.
Resumo:
The numerical solution of the incompressible Navier-Stokes Equations offers an effective alternative to the experimental analysis of Fluid-Structure interaction i.e. dynamical coupling between a fluid and a solid which otherwise is very complex, time consuming and very expensive. To have a method which can accurately model these types of mechanical systems by numerical solutions becomes a great option, since these advantages are even more obvious when considering huge structures like bridges, high rise buildings, or even wind turbine blades with diameters as large as 200 meters. The modeling of such processes, however, involves complex multiphysics problems along with complex geometries. This thesis focuses on a novel vorticity-velocity formulation called the KLE to solve the incompressible Navier-stokes equations for such FSI problems. This scheme allows for the implementation of robust adaptive ODE time integration schemes and thus allows us to tackle the various multiphysics problems as separate modules. The current algorithm for KLE employs a structured or unstructured mesh for spatial discretization and it allows the use of a self-adaptive or fixed time step ODE solver while dealing with unsteady problems. This research deals with the analysis of the effects of the Courant-Friedrichs-Lewy (CFL) condition for KLE when applied to unsteady Stoke’s problem. The objective is to conduct a numerical analysis for stability and, hence, for convergence. Our results confirmthat the time step ∆t is constrained by the CFL-like condition ∆t ≤ const. hα, where h denotes the variable that represents spatial discretization.
Resumo:
This paper proposes the Optimized Power save Algorithm for continuous Media Applications (OPAMA) to improve end-user device energy efficiency. OPAMA enhances the standard legacy Power Save Mode (PSM) of IEEE 802.11 by taking into consideration application specific requirements combined with data aggregation techniques. By establishing a balanced cost/benefit tradeoff between performance and energy consumption, OPAMA is able to improve energy efficiency, while keeping the end-user experience at a desired level. OPAMA was assessed in the OMNeT++ simulator using real traces of variable bitrate video streaming applications. The results showed the capability to enhance energy efficiency, achieving savings up to 44% when compared with the IEEE 802.11 legacy PSM.
Resumo:
With the advent of cloud computing model, distributed caches have become the cornerstone for building scalable applications. Popular systems like Facebook [1] or Twitter use Memcached [5], a highly scalable distributed object cache, to speed up applications by avoiding database accesses. Distributed object caches assign objects to cache instances based on a hashing function, and objects are not moved from a cache instance to another unless more instances are added to the cache and objects are redistributed. This may lead to situations where some cache instances are overloaded when some of the objects they store are frequently accessed, while other cache instances are less frequently used. In this paper we propose a multi-resource load balancing algorithm for distributed cache systems. The algorithm aims at balancing both CPU and Memory resources among cache instances by redistributing stored data. Considering the possible conflict of balancing multiple resources at the same time, we give CPU and Memory resources weighted priorities based on the runtime load distributions. A scarcer resource is given a higher weight than a less scarce resource when load balancing. The system imbalance degree is evaluated based on monitoring information, and the utility load of a node, a unit for resource consumption. Besides, since continuous rebalance of the system may affect the QoS of applications utilizing the cache system, our data selection policy ensures that each data migration minimizes the system imbalance degree and hence, the total reconfiguration cost can be minimized. An extensive simulation is conducted to compare our policy with other policies. Our policy shows a significant improvement in time efficiency and decrease in reconfiguration cost.
Resumo:
This paper describes an approach to solve the inverse kinematics problem of humanoid robots whose construction shows a small but non negligible offset at the hip which prevents any purely analytical solution to be developed. Knowing that a purely numerical solution is not feasible due to variable efficiency problems, the proposed one first neglects the offset presence in order to obtain an approximate “solution” by means of an analytical algorithm based on screw theory, and then uses it as the initial condition of a numerical refining procedure based on the Levenberg‐Marquardt algorithm. In this way, few iterations are needed for any specified attitude, making it possible to implement the algorithm for real‐time applications. As a way to show the algorithm’s implementation, one case of study is considered throughout the paper, represented by the SILO2 humanoid robot.
Resumo:
A Finite Element technique to interpolate general data (function values and its derivatives) has been developped. The technique can be considered as a generalized solution of the classical polynomial interpolation, because the condition for the interpolating function to be a polynomial is replaced by a minimizing condition of a given “smoothing” functional. In this way it is possible to find interpolating functions with a given level of continuity according to the class of finite elements used. Examples have been presented in order to assess the accuracy and efficiency of the procedure.
Resumo:
Heuristic methods are popular tools to find critical slip surfaces in slope stability analyses. A new genetic algorithm (GA) is proposed in this work that has a standard structure but a novel encoding and generation of individuals with custom-designed operators for mutation and crossover that produce kinematically feasible slip surfaces with a high probability. In addition, new indices to assess the efficiency of operators in their search for the minimum factor of safety (FS) are proposed. The proposed GA is applied to traditional benchmark examples from the literature, as well as to a new practical example. Results show that the proposed GA is reliable, flexible and robust: it provides good minimum FS estimates that are not very sensitive to the number of nodes and that are very similar for different replications
Resumo:
We present a modelling method to estimate the 3-D geometry and location of homogeneously magnetized sources from magnetic anomaly data. As input information, the procedure needs the parameters defining the magnetization vector (intensity, inclination and declination) and the Earth's magnetic field direction. When these two vectors are expected to be different in direction, we propose to estimate the magnetization direction from the magnetic map. Then, using this information, we apply an inversion approach based on a genetic algorithm which finds the geometry of the sources by seeking the optimum solution from an initial population of models in successive iterations through an evolutionary process. The evolution consists of three genetic operators (selection, crossover and mutation), which act on each generation, and a smoothing operator, which looks for the best fit to the observed data and a solution consisting of plausible compact sources. The method allows the use of non-gridded, non-planar and inaccurate anomaly data and non-regular subsurface partitions. In addition, neither constraints for the depth to the top of the sources nor an initial model are necessary, although previous models can be incorporated into the process. We show the results of a test using two complex synthetic anomalies to demonstrate the efficiency of our inversion method. The application to real data is illustrated with aeromagnetic data of the volcanic island of Gran Canaria (Canary Islands).
Resumo:
The Iterative Closest Point algorithm (ICP) is commonly used in engineering applications to solve the rigid registration problem of partially overlapped point sets which are pre-aligned with a coarse estimate of their relative positions. This iterative algorithm is applied in many areas such as the medicine for volumetric reconstruction of tomography data, in robotics to reconstruct surfaces or scenes using range sensor information, in industrial systems for quality control of manufactured objects or even in biology to study the structure and folding of proteins. One of the algorithm’s main problems is its high computational complexity (quadratic in the number of points with the non-optimized original variant) in a context where high density point sets, acquired by high resolution scanners, are processed. Many variants have been proposed in the literature whose goal is the performance improvement either by reducing the number of points or the required iterations or even enhancing the complexity of the most expensive phase: the closest neighbor search. In spite of decreasing its complexity, some of the variants tend to have a negative impact on the final registration precision or the convergence domain thus limiting the possible application scenarios. The goal of this work is the improvement of the algorithm’s computational cost so that a wider range of computationally demanding problems from among the ones described before can be addressed. For that purpose, an experimental and mathematical convergence analysis and validation of point-to-point distance metrics has been performed taking into account those distances with lower computational cost than the Euclidean one, which is used as the de facto standard for the algorithm’s implementations in the literature. In that analysis, the functioning of the algorithm in diverse topological spaces, characterized by different metrics, has been studied to check the convergence, efficacy and cost of the method in order to determine the one which offers the best results. Given that the distance calculation represents a significant part of the whole set of computations performed by the algorithm, it is expected that any reduction of that operation affects significantly and positively the overall performance of the method. As a result, a performance improvement has been achieved by the application of those reduced cost metrics whose quality in terms of convergence and error has been analyzed and validated experimentally as comparable with respect to the Euclidean distance using a heterogeneous set of objects, scenarios and initial situations.
Resumo:
Connection establishment is a fundamental function for any connection-oriented network protocol and the efficiency of this function defines the flexibility and responsiveness of the protocol. This process initializes data transmission and performs transmission parameters negotiation, what makes it mandatory process and integral part of entire transmission. Thus, the duration of the connection establishment will affect the transmission process duration. This paper describes an implementation of a handshake algorithm, designed for connection with multiple peers, that is used in Reliable Multi-Destination Transport(RMDT) protocol, its optimization and testing.
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-05