224 results for Parallel computing. Multilayer perceptron. OpenMP
Abstract:
In this paper we propose a framework for optimum steering input determination of all-wheel steer vehicles (AWSV) on rough terrains. The framework computes the steering input which minimizes the tracking error for a given trajectory. Unlike previous methodologies for computing steering inputs of car-like vehicles, the proposed methodology depends explicitly on the vehicle dynamics and can be extended to vehicles having an arbitrary number of steering inputs. A fully generic framework has been used to derive the vehicle dynamics, and a constrained optimization approach based on non-linear programming has been used to compute the steering input considering the instantaneous vehicle dynamics and the no-slip and contact constraints of the vehicle. All-wheel steer vehicles have a special parallel steering ability where the instantaneous centre of rotation (ICR) is at infinity. The proposed framework automatically enables the vehicle to choose between parallel steer and normal operation depending on the error with respect to the desired trajectory. The efficacy of the proposed framework is demonstrated by extensive uneven terrain simulations, for trajectories with continuous or discontinuous velocity profiles.
Abstract:
We present a nonequilibrium strong-coupling approach to inhomogeneous systems of ultracold atoms in optical lattices. We demonstrate its application to the Mott-insulating phase of a two-dimensional Fermi-Hubbard model in the presence of a trap potential. Since the theory is formulated self-consistently, the numerical implementation relies on a massively parallel evaluation of the self-energy and the Green's function at each lattice site, employing thousands of CPUs. While the computation of the self-energy is straightforward to parallelize, the evaluation of the Green's function requires the inversion of a large sparse $10^d \times 10^d$ matrix, with $d > 6$. As a crucial ingredient, our solution relies heavily on the smallness of the hopping compared to the interaction strength and yields a widely scalable realization of a rapidly converging iterative algorithm which evaluates all elements of the Green's function. Results are validated by comparing with the homogeneous case via the local-density approximation. These calculations also show that the local-density approximation is valid in nonequilibrium setups without mass transport.
Abstract:
Precise pointer analysis is a problem of interest to both the compiler and the program verification community. Flow-sensitivity is an important dimension of pointer analysis that affects the precision of the final result computed. Scaling flow-sensitive pointer analysis to millions of lines of code is a major challenge. Recently, staged flow-sensitive pointer analysis has been proposed, which exploits a sparse representation of program code created by staged analysis. In this paper we formulate the staged flow-sensitive pointer analysis as a graph-rewriting problem. Graph-rewriting has already been used for flow-insensitive analysis. However, formulating flow-sensitive pointer analysis as a graph-rewriting problem poses additional challenges due to the nature of flow-sensitivity. We implement our parallel algorithm using Intel Threading Building Blocks and demonstrate considerable scaling (up to 2.6x) for 8 threads on a set of 10 benchmarks. Compared to the sequential implementation of staged flow-sensitive analysis, a single-threaded execution of our implementation performs better in 8 of the benchmarks.
Abstract:
We show that every graph of maximum degree 3 can be represented as the intersection graph of axis parallel boxes in three dimensions, that is, every vertex can be mapped to an axis parallel box such that two boxes intersect if and only if their corresponding vertices are adjacent. In fact, we construct a representation in which any two intersecting boxes touch just at their boundaries.
Abstract:
The boxicity (resp. cubicity) of a graph $G(V, E)$ is the minimum integer $k$ such that $G$ can be represented as the intersection graph of axis-parallel boxes (resp. cubes) in $\mathbb{R}^k$. Equivalently, it is the minimum number of interval graphs (resp. unit interval graphs) on the vertex set $V$ such that the intersection of their edge sets is $E$. The problem of computing boxicity (resp. cubicity) is known to be inapproximable in polynomial time, even for restricted graph classes like bipartite, co-bipartite and split graphs, within an $O(n^{1-\epsilon})$-factor for any $\epsilon > 0$, unless NP = ZPP. For any well-known graph class of unbounded boxicity, there is no known polynomial-time algorithm that gives an $n^{1-\epsilon}$-factor approximation of boxicity, for any $\epsilon > 0$. In this paper, we consider the problem of approximating the boxicity (cubicity) of circular arc graphs, that is, intersection graphs of arcs of a circle. Circular arc graphs are known to have unbounded boxicity, which could be as large as $\Omega(n)$. We give a $(2 + 1/k)$-factor (resp. $(2 + \lceil \log n \rceil / k)$-factor) polynomial-time approximation algorithm for computing the boxicity (resp. cubicity) of any circular arc graph, where $k \geq 1$ is the value of the optimum solution. For normal circular arc (NCA) graphs, with an NCA model given, this can be improved to an additive two approximation algorithm. The time complexity of the algorithms to approximately compute the boxicity (resp. cubicity) is $O(mn + n^2)$ in both these cases, and in $O(mn + kn^2) = O(n^3)$ time we also get their corresponding box (resp. cube) representations, where $n$ is the number of vertices of the graph and $m$ is its number of edges. Our additive two approximation algorithm works directly for any proper circular arc graph, since their NCA models can be computed in polynomial time. (C) 2014 Elsevier B.V. All rights reserved.
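The two equivalent definitions of boxicity given in this abstract can be illustrated with a small sketch. It is not taken from the paper: the helper names and the four example boxes are assumptions chosen for illustration. The boxes represent a 4-cycle a-b-c-d, which is not an interval graph, so this 2D representation shows that its boxicity is exactly 2.

```python
from itertools import combinations

def intervals_overlap(p, q):
    """Closed intervals (lo, hi) overlap iff each starts before the other ends."""
    return p[0] <= q[1] and q[0] <= p[1]

def intersection_graph(boxes):
    """Edge set of the intersection graph of axis-parallel boxes.
    Each box is a list of (lo, hi) intervals, one per dimension."""
    return {
        frozenset((u, v))
        for u, v in combinations(boxes, 2)
        if all(intervals_overlap(p, q) for p, q in zip(boxes[u], boxes[v]))
    }

def axis_interval_graph(boxes, axis):
    """Edge set of the interval graph obtained by projecting onto one axis."""
    return {
        frozenset((u, v))
        for u, v in combinations(boxes, 2)
        if intervals_overlap(boxes[u][axis], boxes[v][axis])
    }

def interval_graph_intersection(boxes):
    """Intersect the edge sets of the per-axis interval graphs."""
    dims = len(next(iter(boxes.values())))
    edge_sets = [axis_interval_graph(boxes, axis) for axis in range(dims)]
    return set.intersection(*edge_sets)

# Hypothetical example: four strips forming a frame, which realise the 4-cycle a-b-c-d.
boxes = {
    "a": [(0, 1), (0, 3)],  # left vertical strip
    "b": [(0, 3), (2, 3)],  # top horizontal strip
    "c": [(2, 3), (0, 3)],  # right vertical strip
    "d": [(0, 3), (0, 1)],  # bottom horizontal strip
}

edges = intersection_graph(boxes)
```

Both `intersection_graph(boxes)` and `interval_graph_intersection(boxes)` produce the same edge set here, which is the equivalence the abstract states.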
Abstract:
The problem of finding an optimal vertex cover in a graph is a classic NP-complete problem, and is a special case of the hitting set question. On the other hand, the hitting set problem, when asked in the context of induced geometric objects, often turns out to be exactly the vertex cover problem on restricted classes of graphs. In this work we explore a particular instance of such a phenomenon. We consider the problem of hitting all axis-parallel slabs induced by a point set P, and show that it is equivalent to the problem of finding a vertex cover on a graph whose edge set is the union of two Hamiltonian paths. We show the latter problem to be NP-complete, and also give an algorithm to find a vertex cover of size at most $k$, on graphs of maximum degree four, whose running time is $1.2637^k n^{O(1)}$.
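For readers unfamiliar with parameterized vertex cover, the following is a minimal sketch of the textbook bounded search tree, which runs in $O(2^k \cdot m)$ time. It is not the $1.2637^k$ algorithm of the paper, whose branching rules are not given in the abstract. The function name and the example graph (the union of two hypothetical Hamiltonian paths on four vertices) are assumptions for illustration.

```python
def vertex_cover_at_most_k(edges, k):
    """Return a vertex cover of size <= k as a set, or None if none exists.

    Simple branching: any cover must contain an endpoint of the first
    uncovered edge, so try each endpoint and recurse with budget k - 1.
    """
    if not edges:
        return set()          # nothing left to cover
    if k == 0:
        return None           # edges remain but the budget is spent
    u, v = edges[0]
    for pick in (u, v):
        remaining = [e for e in edges if pick not in e]
        cover = vertex_cover_at_most_k(remaining, k - 1)
        if cover is not None:
            return cover | {pick}
    return None

# Hypothetical instance: union of the Hamiltonian paths 1-2-3-4 and 2-4-1-3.
path_a = [(1, 2), (2, 3), (3, 4)]
path_b = [(2, 4), (4, 1), (1, 3)]
graph_edges = path_a + path_b

cover = vertex_cover_at_most_k(graph_edges, 3)
```

In this instance the two paths together form a complete graph on four vertices, so no cover of size 2 exists and the call with `k = 2` returns `None`.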
Abstract:
In this paper we present a massively parallel open source solver for the Richards equation, named the RichardsFOAM solver. This solver has been developed in the framework of the open source generalist computational fluid dynamics toolbox OpenFOAM® and is capable of dealing with large-scale problems in both space and time. The source code for RichardsFOAM may be downloaded from the CPC program library website. It exhibits good parallel performance (up to ~90% parallel efficiency with 1024 processors in both strong and weak scaling), and the conditions required for obtaining such performance are analysed and discussed. This performance enables the mechanistic modelling of water fluxes at the scale of experimental watersheds (up to a few square kilometres of surface area), and on time scales of decades to a century. Such a solver can be useful in various applications, such as environmental engineering for long-term transport of pollutants in soils, water engineering for assessing the impact of land settlement on water resources, or the study of weathering processes on watersheds. (C) 2014 Elsevier B.V. All rights reserved.
Abstract:
Contact damage in a curved-interface nano-layered metal/nitride (150 (ZrN)/10 (Zr) nm) multilayer is investigated in order to understand the role of interface morphology in contact damage under indentation. A finite element method (FEM) model was formulated with different wavelengths of 1000 nm, 500 nm and 250 nm and a common height of 50 nm, which gives insight into the effect of different curvatures on the stress field generated under indentation. Elastic-plastic properties were assigned to the metal layer and substrate, while the nitride layer was assigned perfectly elastic properties. Curved-interface multilayers show delamination along the metal/nitride interface and vertical cracks emanating from the ends of the delamination. FEM revealed the presence of tensile stress normal to the interface even under the contact, along with tensile radial stresses, both present at the valley part of the curve, which leads to vertical cracks associated with interfacial delamination. Stress enhancement was seen to be relatively insensitive to curvature. (C) 2014 Elsevier B.V. All rights reserved.
Abstract:
The binding of ligand 5,10,15,20-tetra(N-methyl-4-pyridyl)porphine (TMPyP4) with telomeric and genomic G-quadruplex DNA has been extensively studied. However, a comparative study of interactions of TMPyP4 with different conformations of human telomeric G-quadruplex DNA, namely, parallel propeller-type (PP), antiparallel basket-type (AB), and mixed hybrid-type (MH) G-quadruplex DNA, has not been done. We considered all the possible binding sites in each of the G-quadruplex DNA structures and docked TMPyP4 to each one of them. The resultant most potent sites for binding were analyzed from the mean binding free energy of the complexes. Molecular dynamics simulations were then carried out, and analysis of the binding free energy of the TMPyP4-G-quadruplex complex showed that the binding of TMPyP4 with parallel propeller-type G-quadruplex DNA is preferred over the other two G-quadruplex DNA conformations. The results obtained from the change in solvent excluded surface area (SESA) and solvent accessible surface area (SASA) also support the more pronounced binding of the ligand with the parallel propeller-type G-quadruplex DNA.
Abstract:
We have developed a real-time imaging method for two-color wide-field fluorescence microscopy using a combined approach that integrates multi-spectral imaging and a Bayesian image reconstruction technique. To enable simultaneous observation of two dyes (primary and secondary), we exploit their spectral properties, which allow parallel recording in both channels. The key advantage of this technique is the use of a single wavelength of light to excite both the primary dye and the secondary dye. The primary and secondary dyes respectively give rise to fluorescence and bleed-through signals, which after normalization were merged to obtain two-color 3D images. To realize real-time imaging, we employed maximum likelihood (ML) and maximum a posteriori (MAP) techniques on a high-performance computing platform (GPU). The results show a two-fold improvement in contrast, while the signal-to-background ratio (SBR) is improved by a factor of 4. We report speed-up factors of 52 and 350 for 2D and 3D images respectively. Using this system, we have studied real-time protein aggregation in yeast cells and HeLa cells that exhibit dot-like protein distribution. The proposed technique has the ability to temporally resolve rapidly occurring biological events.
Abstract:
The ac-side terminal voltages of parallel-connected converters are different if the line reactive drops of the individual converters are different. This could result either from differences in per-phase inductances or from differences in the line currents of the converters. In such cases, the modulating signals are different for the converters. Hence, the common-mode (CM) voltages for the converters, injected by conventional space vector pulsewidth modulation (CSVPWM) to increase dc-bus utilization, are different. Consequently, significant low-frequency zero-sequence circulating currents result. This paper proposes a new modulation method for parallel-connected converters with unequal terminal voltages. This method does not cause low-frequency zero-sequence circulating currents and is comparable with CSVPWM in terms of dc-bus utilization and device power loss. Experimental results are presented at a power level of 150 kVA from a circulating-power test setup, where the differences in converter terminal voltages are quite significant.
Abstract:
The 3-dimensional diffuse optical tomographic (3-D DOT) image reconstruction algorithm is computationally complex and requires extensive matrix computations, which hampers reconstruction in real time. In this paper, we present near-real-time 3-D DOT image reconstruction based on the Broyden approach for updating the Jacobian matrix. The Broyden method simplifies the algorithm by avoiding re-computation of the Jacobian matrix in each iteration. We have developed CPU and heterogeneous CPU/GPU code for 3-D DOT image reconstruction on the C and MATLAB programming platforms. We have used the Compute Unified Device Architecture (CUDA) programming framework and the CUDA linear algebra library (CULA) to utilize the massively parallel computational power of GPUs (NVIDIA Tesla K20c). The computation time achieved by the C implementation on a CPU/GPU system, for a 3-plane measurement and an FEM mesh of 19172 tetrahedral elements, is 806 milliseconds per iteration.
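The idea of updating the Jacobian instead of recomputing it can be sketched with the standard rank-one ("good") Broyden update, $J_{k+1} = J_k + \frac{(\Delta y - J_k \Delta x)\,\Delta x^{T}}{\Delta x^{T} \Delta x}$. The abstract does not say which Broyden variant the authors use, so this is an assumption, and the function name and the small dense example are purely illustrative (the paper's implementation is in C, MATLAB and CUDA).

```python
def broyden_update(J, dx, dy):
    """Rank-one Broyden update of a dense Jacobian estimate.

    J  : list of rows (m x n), current Jacobian estimate
    dx : list of length n, change in the parameters between iterations
    dy : list of length m, observed change in the model output
    Returns a new m x n matrix satisfying the secant condition J_new @ dx == dy.
    """
    n = len(dx)
    denom = sum(x * x for x in dx)
    if denom == 0.0:
        raise ValueError("dx must be non-zero")
    # Prediction error of the current Jacobian along the step dx.
    J_dx = [sum(row[j] * dx[j] for j in range(n)) for row in J]
    residual = [dy_i - jdx_i for dy_i, jdx_i in zip(dy, J_dx)]
    # Add the outer product residual * dx^T, scaled by 1 / (dx^T dx).
    return [
        [row[j] + residual[i] * dx[j] / denom for j in range(n)]
        for i, row in enumerate(J)
    ]

# Illustrative values only: start from the identity and apply one update.
J0 = [[1.0, 0.0], [0.0, 1.0]]
step = [1.0, 2.0]
observed = [3.0, 5.0]
J1 = broyden_update(J0, step, observed)
```

The update costs $O(mn)$ per iteration, against a full Jacobian recomputation that typically needs one forward solve per parameter or per measurement.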
Abstract:
The history of computing in India is inextricably intertwined with two interacting forces: the political climate (determined by the political party in power) and the government policies, mainly driven by the technocrats and bureaucrats who acted within the boundaries drawn by the political party in power. There were four break points (which occurred in 1970, 1978, 1991 and 1998) that changed the direction of the development of computers and their applications. This article explains why these breaks occurred and how they affected the history of computing in India.
Abstract:
The low-power requirements of contemporary sensing technology attract research on alternative power sources that can replace batteries. Energy harvesters absorb ambient energy and function as power sources for sensors and other low-power devices. Piezoelectric bimorphs have demonstrated their preeminence in converting the mechanical energy of ambient vibrations into electrical energy. Improving the performance of these harvesters is pivotal, as the energy in ambient vibrations is innately low. In this paper, we focus on enhancing the performance of piezoelectric harvesters through a multilayer and, in particular, a multistep configuration. Partial coverage of piezoelectric material in steps along the length of a cantilever beam results in a multistep piezoelectric energy harvester. We also discuss obtaining an approximate deformation curve for the beam with multiple steps in a computationally efficient manner. We find that the power generated by a multistep beam is almost 90% more than that generated by a multilayer harvester made of the same volume of polyvinylidene fluoride (PVDF), a result further corroborated experimentally. Improvements observed in the power generated prove to be a boon for weakly coupled low-profile piezoelectric materials. Thus, in spite of the weak piezoelectric coupling observed in PVDF, its energy harvesting capability can be improved significantly by using it in a multistep piezoelectric beam configuration.