204 resultados para scalable parallel programming
Resumo:
An efficient and scalable total synthesis of the architecturally challenging sesquiterpenoid (+/-)-penifulvin A has been accomplished via a 12-step sequence with an overall yield of 16%. For the construction of this structurally complex tetracyclic molecule, the key steps used included 1,4-conjugate addition, a Pd(0) catalyzed cross-coupling reaction between an enol phosphate and trimethyl aluminum, Claisen rearrangement using the Johnson orthoester protocol, Ti(III)-mediated reductive epoxide opening-cyclization, Lewis acid catalyzed epoxy-aldehyde rearrangement, and finally a substrate controlled oxidative cascade lactonization process.
Resumo:
It is essential to accurately estimate the working set size (WSS) of an application for various optimizations such as to partition cache among virtual machines or reduce leakage power dissipated in an over-allocated cache by switching it OFF. However, the state-of-the-art heuristics such as average memory access latency (AMAL) or cache miss ratio (CMR) are poorly correlated to the WSS of an application due to 1) over-sized caches and 2) their dispersed nature. Past studies focus on estimating WSS of an application executing on a uniprocessor platform. Estimating the same for a chip multiprocessor (CMP) with a large dispersed cache is challenging due to the presence of concurrently executing threads/processes. Hence, we propose a scalable, highly accurate method to estimate WSS of an application. We call this method ``tagged WSS (TWSS)'' estimation method. We demonstrate the use of TWSS to switch-OFF the over-allocated cache ways in Static and Dynamic NonUniform Cache Architectures (SNUCA, DNUCA) on a tiled CMP. In our implementation of adaptable way SNUCA and DNUCA caches, decision of altering associativity is taken by each L2 controller. Hence, this approach scales better with the number of cores present on a CMP. It gives overall (geometric mean) 26% and 19% higher energy-delay product savings compared to AMAL and CMR heuristics on SNUCA, respectively.
Resumo:
This paper discusses an approach for river mapping and flood evaluation to aid multi-temporal time series analysis of satellite images utilizing pixel spectral information for image classification and region-based segmentation to extract water covered region. Analysis of Moderate Resolution Imaging Spectroradiometer (MODIS) satellite images is applied in two stages: before flood and during flood. For these images the extraction of water region utilizes spectral information for image classification and spatial information for image segmentation. Multi-temporal MODIS images from ``normal'' (non-flood) and flood time-periods are processed in two steps. In the first step, image classifiers such as artificial neural networks and gene expression programming to separate the image pixels into water and non-water groups based on their spectral features. The classified image is then segmented using spatial features of the water pixels to remove the misclassified water region. From the results obtained, we evaluate the performance of the method and conclude that the use of image classification and region-based segmentation is an accurate and reliable for the extraction of water-covered region.
Resumo:
Precise pointer analysis is a problem of interest to both the compiler and the program verification community. Flow-sensitivity is an important dimension of pointer analysis that affects the precision of the final result computed. Scaling flow-sensitive pointer analysis to millions of lines of code is a major challenge. Recently, staged flow-sensitive pointer analysis has been proposed, which exploits a sparse representation of program code created by staged analysis. In this paper we formulate the staged flow-sensitive pointer analysis as a graph-rewriting problem. Graph-rewriting has already been used for flow-insensitive analysis. However, formulating flow-sensitive pointer analysis as a graph-rewriting problem adds additional challenges due to the nature of flow-sensitivity. We implement our parallel algorithm using Intel Threading Building Blocks and demonstrate considerable scaling (upto 2.6x) for 8 threads on a set of 10 benchmarks. Compared to the sequential implementation of staged flow-sensitive analysis, a single threaded execution of our implementation performs better in 8 of the benchmarks.
Resumo:
We show that every graph of maximum degree 3 can be represented as the intersection graph of axis parallel boxes in three dimensions, that is, every vertex can be mapped to an axis parallel box such that two boxes intersect if and only if their corresponding vertices are adjacent. In fact, we construct a representation in which any two intersecting boxes touch just at their boundaries.
Resumo:
A new generalized model predictive static programming technique is presented for rapidly solving a class of finite-horizon nonlinear optimal control problems with hard terminal constraints. Two key features for its high computational efficiency include one-time backward integration of a small-dimensional weighting matrix dynamics, followed bya static optimization formulation that requires only a static Lagrange multiplier to update the control history. It turns out that under Euler integration and rectangular approximation of finite integrals it is equivalent to the existing model predictive static programming technique. In addition to the benchmark double integrator problem, usefulness of the proposed technique is demonstrated by solving a three-dimensional angle-constrained guidance problem for an air-to-ground missile, which demands that the missile must meet constraints on both azimuth and elevation angles at the impact point in addition to achieving near-zero miss distance, while minimizing the lateral acceleration demand throughout its flight path. Simulation studies include maneuvering ground targets along with a first-order autopilot lag. Comparison studies with classical augmented proportional navigation guidance and modern general explicit guidance lead to the conclusion that the proposed guidance is superior to both and has a larger capture region as well.
Resumo:
In this paper we present a massively parallel open source solver for Richards equation, named the RichardsFOAM solver. This solver has been developed in the framework of the open source generalist computational fluid dynamics tool box OpenFOAM (R) and is capable to deal with large scale problems in both space and time. The source code for RichardsFOAM may be downloaded from the CPC program library website. It exhibits good parallel performances (up to similar to 90% parallel efficiency with 1024 processors both in strong and weak scaling), and the conditions required for obtaining such performances are analysed and discussed. These performances enable the mechanistic modelling of water fluxes at the scale of experimental watersheds (up to few square kilometres of surface area), and on time scales of decades to a century. Such a solver can be useful in various applications, such as environmental engineering for long term transport of pollutants in soils, water engineering for assessing the impact of land settlement on water resources, or in the study of weathering processes on the watersheds. (C) 2014 Elsevier B.V. All rights reserved.
Resumo:
The problem addressed in this paper is sound, scalable, demand-driven null-dereference verification for Java programs. Our approach consists conceptually of a base analysis, plus two major extensions for enhanced precision. The base analysis is a dataflow analysis wherein we propagate formulas in the backward direction from a given dereference, and compute a necessary condition at the entry of the program for the dereference to be potentially unsafe. The extensions are motivated by the presence of certain ``difficult'' constructs in real programs, e.g., virtual calls with too many candidate targets, and library method calls, which happen to need excessive analysis time to be analyzed fully. The base analysis is hence configured to skip such a difficult construct when it is encountered by dropping all information that has been tracked so far that could potentially be affected by the construct. Our extensions are essentially more precise ways to account for the effect of these constructs on information that is being tracked, without requiring full analysis of these constructs. The first extension is a novel scheme to transmit formulas along certain kinds of def-use edges, while the second extension is based on using manually constructed backward-direction summary functions of library methods. We have implemented our approach, and applied it on a set of real-life benchmarks. The base analysis is on average able to declare about 84% of dereferences in each benchmark as safe, while the two extensions push this number up to 91%. (C) 2014 Elsevier B.V. All rights reserved.
Resumo:
The recently developed reference-command tracking version of model predictive static programming (MPSP) is successfully applied to a single-stage closed grinding mill circuit. MPSP is an innovative optimal control technique that combines the philosophies of model predictive control (MPC) and approximate dynamic programming. The performance of the proposed MPSP control technique, which can be viewed as a `new paradigm' under the nonlinear MPC philosophy, is compared to the performance of a standard nonlinear MPC technique applied to the same plant for the same conditions. Results show that the MPSP control technique is more than capable of tracking the desired set-point in the presence of model-plant mismatch, disturbances and measurement noise. The performance of MPSP and nonlinear MPC compare very well, with definite advantages offered by MPSP. The computational speed of MPSP is increased through a sequence of innovations such as the conversion of the dynamic optimization problem to a low-dimensional static optimization problem, the recursive computation of sensitivity matrices and using a closed form expression to update the control. To alleviate the burden on the optimization procedure in standard MPC, the control horizon is normally restricted. However, in the MPSP technique the control horizon is extended to the prediction horizon with a minor increase in the computational time. Furthermore, the MPSP technique generally takes only a couple of iterations to converge, even when input constraints are applied. Therefore, MPSP can be regarded as a potential candidate for online applications of the nonlinear MPC philosophy to real-world industrial process plants. (C) 2014 Elsevier Ltd. All rights reserved.
Resumo:
The binding of ligand 5,10,15,20-tetra(N-methyl-4-pyridyl)porphine (TMPyP4) with telomeric and genomic G-quadruplex DNA has been extensively studied. However, a comparative study of interactions of TMPyP4 with different conformations of human telomeric G-quadruplex DNA, namely, parallel propeller-type (PP), antiparallel basket-type (AB), and mixed hybrid-type (MH) G-quadruplex DNA, has not been done. We considered all the possible binding sites in each of the G-quadruplex DNA structures and docked TMPyP4 to each one of them. The resultant most potent sites for binding were analyzed from the mean binding free energy of the complexes. Molecular dynamics simulations were then carried out, and analysis of the binding free energy of the TMPyP4-G-quadruplex complex showed that the binding of TMPyP4 with parallel propeller-type G-quadruplex DNA is preferred over the other two G-quadruplex DNA conformations. The results obtained from the change in solvent excluded surface area (SESA) and solvent accessible surface area (SASA) also support the more pronounced binding of the ligand with the parallel propeller-type G-quadruplex DNA.
Resumo:
The ac-side terminal voltages of parallel-connected converters are different if the line reactive drops of the individual converters are different. This could result either from differences in per-phase inductances or from differences in the line currents of the converters. In such cases, the modulating signals are different for the converters. Hence, the common-mode (CM) voltages for the converters, injected by conventional space vector pulsewidth modulation (CSVPWM) to increase dc-bus utilization, are different. Consequently, significant low-frequency zero-sequence circulating currents result. This paper proposes a new modulation method for parallel-connected converters with unequal terminal voltages. This method does not cause low-frequency zero-sequence circulating currents and is comparable with CSVPWM in terms of dc-bus utilization and device power loss. Experimental results are presented at a power level of 150 kVA from a circulating-power test setup, where the differences in converter terminal voltages are quite significant.
Resumo:
3-Dimensional Diffuse Optical Tomographic (3-D DOT) image reconstruction algorithm is computationally complex and requires excessive matrix computations and thus hampers reconstruction in real time. In this paper, we present near real time 3D DOT image reconstruction that is based on Broyden approach for updating Jacobian matrix. The Broyden method simplifies the algorithm by avoiding re-computation of the Jacobian matrix in each iteration. We have developed CPU and heterogeneous CPU/GPU code for 3D DOT image reconstruction in C and MatLab programming platform. We have used Compute Unified Device Architecture (CUDA) programming framework and CUDA linear algebra library (CULA) to utilize the massively parallel computational power of GPUs (NVIDIA Tesla K20c). The computation time achieved for C program based implementation for a CPU/GPU system for 3 planes measurement and FEM mesh size of 19172 tetrahedral elements is 806 milliseconds for an iteration.
Resumo:
Here, we demonstrate an uninterrupted galvanic replacement reaction (GRR) for the synthesis of metallic (Ag, Cu and Sn) and bimetallic (Cu M, M=Ag, Au, Pt and Pd) sponges/dendrites by sacrificing the low reduction potential metals (Mg in our case) in acidic medium. The acidic medium prevents the oxide formation on Mg surface and facilitates the uninterrupted reaction. The morphology of dendritic/spongy structures is controlled by the volume of acid used for this reaction. The growth mechanism of the spongy/dendritic microstructures is explained by diffusion-limited aggregate model (DLA), which is also largely affected by the volume of acid. The significance of this method is that the yield can be easily predicted, which is a major challenge for the commercialization of the products. Furthermore, the synthesis is complete in 1-2 minutes at room temperature. We show that the sponges/dendrites efficiently act as catalysts to reduce 4-nitrophenol (4-NP) to 4-aminophenol (4-AP) using NaBH4-a widely studied conversion process.
Resumo:
We describe our novel LED communication infrastructure and demonstrate its scalability across platforms. Our system achieves 50 kilo bits per second on very simple SoCs and scales to megabits bits per second rates on dual processor based mobile phone platforms.
Resumo:
This article considers a semi-infinite mathematical programming problem with equilibrium constraints (SIMPEC) defined as a semi-infinite mathematical programming problem with complementarity constraints. We establish necessary and sufficient optimality conditions for the (SIMPEC). We also formulate Wolfe- and Mond-Weir-type dual models for (SIMPEC) and establish weak, strong and strict converse duality theorems for (SIMPEC) and the corresponding dual problems under invexity assumptions.