Digital Library

224 results for Parallel computing. Multilayer perceptron. OpenMP

Efficient asynchronous executions of AMR computations and visualization on a GPU system

Relevance:

20.00%

Abstract:

Adaptive Mesh Refinement (AMR) is a method that dynamically varies the spatio-temporal resolution of localized mesh regions in numerical simulations, based on the strength of the solution features. In-situ visualization plays an important role in analyzing the time-evolving characteristics of the domain structures. Continuous visualization of the output data for various timesteps results in a better study of the underlying domain and the model used for simulating the domain. In this paper, we develop strategies for continuous online visualization of time-evolving data for AMR applications executed on GPUs. We reorder the meshes for computations on the GPU based on the user's input about the subdomain to be visualized. This makes the data available for visualization at a faster rate. We then perform asynchronous executions of the visualization steps and fix-up operations on the CPUs while the GPU advances the solution. By performing experiments on Tesla S1070 and Fermi C2070 clusters, we found that our strategies result in a 60% improvement in response time and a 16% improvement in the rate of visualization of frames over the existing strategy of performing fix-ups and visualization at the end of the timesteps.

Composite cyclodextrin-calcium carbonate porous microparticles and modified multilayer capsules: novel carriers for encapsulation of hydrophobic drugs

Relevance:

20.00%

Abstract:

Novel composite cyclodextrin (CD)-CaCO3 spherical porous microparticles have been synthesized through Ca2+-CD complex formation, which influences the crystal growth of CaCO3. The CDs are entrapped and distributed uniformly in the matrix of CaCO3 microparticles during crystallization. The hydrophobic fluorescent molecules coumarin and Nile red (NR) are efficiently encapsulated into these composite CD-CaCO3 porous particles through supramolecular inclusion complexation between entrapped CDs and hydrophobic molecules. Thermogravimetric (TGA) and infrared spectroscopy (IR) analysis of composite CD-CaCO3 particles reveals the presence of large CDs and their strong interaction with calcium carbonate nanoparticles. The resulting composite CD-CaCO3 microparticles are utilized as sacrificial templates for preparation of CD-modified layer-by-layer (LbL) capsules. After dissolution of the carbonate core, CDs are retained in the interior of the capsules in a network fashion and assist in the encapsulation of hydrophobic molecules. The efficient encapsulation of the hydrophobic fluorescent dye, coumarin, was successfully demonstrated using CD-modified capsules. In vitro release of the encapsulated coumarin from the CD-CaCO3 and CD-modified capsules has been demonstrated.

Evolution of microstructure and texture during deformation and recrystallization of heavily rolled Cu-Cu multilayer

Relevance:

20.00%

Abstract:

A Cu-Cu multilayer processed by accumulative roll bonding was deformed to large strains and further annealed. The texture of the deformed Cu-Cu multilayer differs from the conventional fcc rolling textures in terms of higher fractions of Bs and RD-rotated cube components, compared with the volume fraction of Cu component. The elongated grain shape significantly affects the deformation characteristics. Characteristic microstructural features of both continuous dynamic recrystallization and discontinuous dynamic recrystallization were observed in the microtexture measurements. X-ray texture measurements of annealing of heavily deformed multilayer demonstrate constrained recrystallization and resulted in a bimodal grain size distribution in the annealed material at higher strains. The presence of cube- and BR-oriented grains in the deformed material confirms the oriented nucleation as the major influence on texture change during recrystallization. Persistence of cube component throughout the deformation is attributed to dynamic recrystallization. Evolution of RD-rotated cube is attributed to the deformation of cube components that evolve from dynamic recrystallization. The relaxation of strain components leads to Bs at larger strains. Further, the Bs component is found to recover rather than recrystallize during deformation. The presence of predominantly Cu and Bs orientations surrounding the interface layer suggests constrained annealing behavior.

DMT of parallel-path and layered networks under the half-duplex constraint

Relevance:

20.00%

Abstract:

In this paper, we study the diversity-multiplexing-gain tradeoff (DMT) of wireless relay networks under the half-duplex constraint. It is often unclear what penalty, if any, is imposed by the half-duplex constraint on the DMT of such networks. We study two classes of networks. The first class, called KPP(I) networks, is the class of networks with the relays organized in K parallel paths between the source and the destination. While we assume that there is no direct source-destination path, the K relaying paths can interfere with each other. The second class, termed layered networks, comprises relays organized in layers, where links exist only between adjacent layers. We present a communication scheme based on static schedules and amplify-and-forward relaying for these networks. We also show that for KPP(I) networks with K >= 3, the proposed schemes can achieve full-duplex DMT performance, thus demonstrating that there is no performance hit on the DMT due to the half-duplex constraint. We also show that, for layered networks, a linear DMT of $d_{\max}(1 - r)^{+}$ between the maximum diversity $d_{\max}$ and the maximum multiplexing gain (MG) $r_{\max} = 1$ is achievable. We adapt existing DMT-optimal coding schemes to these networks, thus specifying the end-to-end communication strategy explicitly.
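
As an illustration only, the linear tradeoff stated above for layered networks can be written out as a worked formula. The notation $(x)^{+} = \max(x, 0)$ is the usual convention and is assumed here; the value $d_{\max} = 4$ is a hypothetical choice for the arithmetic, not a figure from the paper.

```latex
\[
d(r) = d_{\max}\,(1 - r)^{+}, \qquad (x)^{+} = \max(x, 0), \qquad 0 \le r \le r_{\max} = 1 .
\]
% Hypothetical example with d_max = 4:
\[
d(0) = 4, \qquad d(0.5) = 4 \times 0.5 = 2, \qquad d(1) = 0 .
\]
```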

Inference for the Component and System Lifetime Distribution of a k-unit Parallel System Based on System Data

Relevance:

20.00%

Abstract:

In this paper, we consider the inference for the component and system lifetime distribution of a k-unit parallel system with independent components based on system data. The components are assumed to have identical Weibull distribution. We obtain the maximum likelihood estimates of the unknown parameters based on system data. The Fisher information matrix has been derived. We propose a β-expectation tolerance interval and a β-content γ-level tolerance interval for the life distribution of the system. Performance of the estimators and tolerance intervals is investigated via simulation study. A simulated dataset is analyzed for illustration.
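
A minimal sketch of the model described above, under stated assumptions: a parallel system fails when its last component fails, so with k independent, identically distributed Weibull components the system CDF is the component CDF raised to the power k. The shape/scale parametrization and all function names are hypothetical choices for illustration; the paper's own notation and estimation procedure may differ.

```python
import math

def weibull_cdf(t, shape, scale):
    """Component CDF, assuming F(t) = 1 - exp(-(t/scale)^shape) for t > 0."""
    if t <= 0:
        return 0.0
    return 1.0 - math.exp(-((t / scale) ** shape))

def weibull_pdf(t, shape, scale):
    """Component density matching weibull_cdf."""
    if t <= 0:
        return 0.0
    z = (t / scale) ** shape
    return (shape / t) * z * math.exp(-z)

def system_cdf(t, k, shape, scale):
    """CDF of the maximum of k i.i.d. component lifetimes: F(t)^k."""
    return weibull_cdf(t, shape, scale) ** k

def system_pdf(t, k, shape, scale):
    """Density of the system lifetime: k * F(t)^(k-1) * f(t)."""
    F = weibull_cdf(t, shape, scale)
    return k * F ** (k - 1) * weibull_pdf(t, shape, scale)

def system_log_likelihood(times, k, shape, scale):
    """Log-likelihood of observed system failure times (all must be > 0)."""
    return sum(math.log(system_pdf(t, k, shape, scale)) for t in times)
```

A maximum likelihood estimate could then be sought by maximizing `system_log_likelihood` over `shape` and `scale` with any numerical optimizer; that step is left out here.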

Least squares QR-based decomposition provides an efficient way of computing optimal regularization parameter in photoacoustic tomography

Relevance:

20.00%

Abstract:

A computationally efficient approach that computes the optimal regularization parameter for the Tikhonov-minimization scheme is developed for photoacoustic imaging. This approach is based on the least squares-QR decomposition which is a well-known dimensionality reduction technique for a large system of equations. It is shown that the proposed framework is effective in terms of quantitative and qualitative reconstructions of initial pressure distribution enabled via finding an optimal regularization parameter. The computational efficiency and performance of the proposed method are shown using a test case of numerical blood vessel phantom, where the initial pressure is exactly known for quantitative comparison. (C) 2013 Society of Photo-Optical Instrumentation Engineers (SPIE)
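
For reference, the Tikhonov-minimization problem mentioned above is commonly written in the following standard form. The symbols $A$ (system matrix), $b$ (measured data), $x$ (unknown initial pressure) and $\lambda$ (regularization parameter) are assumed notation, and the paper may scale the penalty term differently.

```latex
\[
\hat{x}_{\lambda} = \arg\min_{x} \left\{ \|A x - b\|_2^2 + \lambda \|x\|_2^2 \right\}
                  = \left(A^{\top} A + \lambda I\right)^{-1} A^{\top} b , \qquad \lambda > 0 .
\]
```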

Polynomial time and parameterized approximation algorithms for boxicity

Relevance:

20.00%

Abstract:

The boxicity (cubicity) of a graph G, denoted by box(G) (respectively cub(G)), is the minimum integer k such that G can be represented as the intersection graph of axis-parallel boxes (cubes) in $\mathbb{R}^k$. The problem of computing boxicity (cubicity) is known to be inapproximable in polynomial time, even for graph classes like bipartite, co-bipartite and split graphs, within an $O(n^{0.5 - \epsilon})$ factor for any $\epsilon > 0$, unless NP = ZPP. We prove that if a graph G on n vertices has a clique on n − k vertices, then box(G) can be computed in time $n^2 2^{O(k^2 \log k)}$. Using this fact, various FPT approximation algorithms for boxicity are derived. The parameter used is the vertex (or edge) edit distance of the input graph from certain graph families of bounded boxicity, like interval graphs and planar graphs. Using the same fact, we also derive an $O\left(n\sqrt{\log\log n}/\sqrt{\log n}\right)$ factor approximation algorithm for computing boxicity, which, to our knowledge, is the first $o(n)$ factor approximation algorithm for the problem. We also present an FPT approximation algorithm for computing the cubicity of graphs, with vertex cover number as the parameter.
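
To make the definition concrete, here is a small sketch that builds the intersection graph of a set of axis-parallel boxes. It only checks a given box representation; it does not compute boxicity and is not any of the algorithms from the paper. The function names and the example coordinates are hypothetical, and boxes are assumed to be closed (touching boxes count as intersecting).

```python
from itertools import combinations

def boxes_intersect(a, b):
    """Two closed axis-parallel boxes intersect iff their intervals overlap in every dimension."""
    return all(lo1 <= hi2 and lo2 <= hi1 for (lo1, hi1), (lo2, hi2) in zip(a, b))

def intersection_graph(boxes):
    """Return the edge set of the intersection graph; vertex i corresponds to boxes[i]."""
    return {
        frozenset((u, v))
        for u, v in combinations(range(len(boxes)), 2)
        if boxes_intersect(boxes[u], boxes[v])
    }

# A 2-dimensional box representation of the 4-cycle 0-1-2-3-0.
# Each box is a list of (low, high) intervals, one per dimension.
c4_boxes = [
    [(0, 1), (0, 3)],  # vertex 0: left vertical bar
    [(0, 3), (2, 3)],  # vertex 1: top horizontal bar
    [(2, 3), (0, 3)],  # vertex 2: right vertical bar
    [(0, 3), (0, 1)],  # vertex 3: bottom horizontal bar
]
c4_edges = intersection_graph(c4_boxes)
```

Here `c4_edges` contains exactly the four cycle edges, so this representation shows that the 4-cycle has boxicity at most 2.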

A divide and conquer strategy for scaling weather simulations with multiple regions of interest

Relevance:

20.00%

Abstract:

Accurate and timely prediction of weather phenomena, such as hurricanes and flash floods, requires high-fidelity, compute-intensive simulations of multiple finer regions of interest within a coarse simulation domain. Current weather applications execute these nested simulations sequentially using all the available processors, which is sub-optimal due to their sub-linear scalability. In this work, we present a strategy for parallel execution of multiple nested domain simulations based on partitioning the 2-D processor grid into disjoint rectangular regions associated with each domain. We propose a novel combination of performance prediction, processor allocation methods and topology-aware mapping of the regions on torus interconnects. Experiments on IBM Blue Gene systems using WRF show that the proposed strategies result in a performance improvement of up to 33% with topology-oblivious mapping and up to an additional 7% with topology-aware mapping over the default sequential strategy.

TCP: thread contention predictor for parallel programs

Relevance:

20.00%

Abstract:

With the proliferation of chip multicores (CMPs) on desktops and embedded platforms, multi-threaded programs have become ubiquitous. Multiple threads may contend for resources such as the on-chip shared cache and interconnects, depending on how they access those resources. Hence, we propose a tool, Thread Contention Predictor (TCP), to help quantify the number of threads sharing data and their sharing pattern. We demonstrate its use to predict a more profitable shared, last-level on-chip cache (LLC) access policy on CMPs. Our cache configuration predictor is 2.2 times faster than cycle-accurate simulation. We also demonstrate its use for identifying hot data structures in a program that may cause performance degradation due to false data sharing. We fix the layout of such data structures and show up to 10% and 18% improvement in execution time and energy-delay product (EDP), respectively.

Hardness results for computing optimal locally Gabriel graphs

Relevance:

20.00%

Abstract:

Delaunay and Gabriel graphs are widely studied geometric proximity structures. Motivated by applications in wireless routing, relaxed versions of these graphs known as Locally Delaunay Graphs (LDGs) and Locally Gabriel Graphs (LGGs) have been proposed. We propose another generalization of LGGs called Generalized Locally Gabriel Graphs (GLGGs) in the context when certain edges are forbidden in the graph. Unlike a Gabriel Graph, there is no unique LGG or GLGG for a given point set because no edge is necessarily included or excluded. This property allows us to choose an LGG/GLGG that optimizes a parameter of interest in the graph. We show that computing an edge-maximum GLGG for a given problem instance is NP-hard and also APX-hard. We also show that computing an LGG on a given point set with dilation ≤ k is NP-hard. Finally, we give an algorithm to verify whether a given geometric graph G = (V, E) is a valid LGG.
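
The verification task mentioned at the end of the abstract can be sketched naively as follows. This assumes the commonly used definition that a geometric graph is an LGG if, for every edge (u, v), the disk with diameter uv contains no neighbor of u or v; the open-disk convention, the function names and the brute-force approach are assumptions for illustration, not the paper's algorithm.

```python
def in_open_diametral_disk(p, u, v):
    """True if p lies strictly inside the disk with diameter uv.

    By Thales' theorem this holds exactly when the angle u-p-v is obtuse,
    i.e. when the dot product (u - p) . (v - p) is negative.
    """
    return (u[0] - p[0]) * (v[0] - p[0]) + (u[1] - p[1]) * (v[1] - p[1]) < 0

def is_locally_gabriel(points, edges):
    """Brute-force check of the assumed LGG condition for every edge."""
    adj = {i: set() for i in range(len(points))}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    for u, v in edges:
        # Only neighbors of the two endpoints can violate the condition.
        for w in (adj[u] | adj[v]) - {u, v}:
            if in_open_diametral_disk(points[w], points[u], points[v]):
                return False
    return True

pts = [(0, 0), (4, 0), (2, 1)]
only_long_edge = is_locally_gabriel(pts, [(0, 1)])        # True
with_blocking = is_locally_gabriel(pts, [(0, 1), (0, 2)])  # False
```

The first call returns True even though point 2 lies inside the disk of edge (0, 1), because point 2 is not a neighbor of either endpoint; this is what distinguishes the local condition from the ordinary Gabriel graph condition.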

On Computing Amplitude, Phase, and Frequency Modulations Using a Vector Interpretation of the Analytic Signal

Relevance:

20.00%

Abstract:

The amplitude-modulation (AM) and phase-modulation (PM) of an amplitude-modulated frequency-modulated (AM-FM) signal are defined as the modulus and phase angle, respectively, of the analytic signal (AS). The FM is defined as the derivative of the PM. However, this standard definition results in a PM with jump discontinuities in cases when the AM index exceeds unity, resulting in an FM that contains impulses. We propose a new approach to define smooth AM, PM, and FM for the AS, where the PM is computed as the solution to an optimization problem based on a vector interpretation of the AS. Our approach is directly linked to the fractional Hilbert transform (FrHT) and leads to an eigenvalue problem. The resulting PM and AM are shown to be smooth, and in particular, the AM turns out to be bipolar. We show an equivalence of the eigenvalue formulation to the square of the AS, and arrive at a simple method to compute the smooth PM. Some examples on synthesized and real signals are provided to validate the theoretical calculations.
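
The standard definitions that the abstract starts from can be summarized as follows. The symbols $s(t)$ (real signal), $\mathcal{H}$ (Hilbert transform), $z(t)$, $a(t)$, $\phi(t)$ and $\omega(t)$ are assumed notation; this is the conventional definition, not the smooth variant the paper proposes.

```latex
\[
z(t) = s(t) + j\,\mathcal{H}\{s\}(t), \qquad
a(t) = |z(t)|, \qquad
\phi(t) = \arg z(t), \qquad
\omega(t) = \frac{d\phi(t)}{dt}.
\]
```

Because $a(t) = |z(t)|$ is non-negative by construction, a sign change in the underlying envelope has to be absorbed by a jump of $\pi$ in $\phi(t)$, which is consistent with the discontinuities and FM impulses described above.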

Analysis of the degrees-of-freedom of spatial parallel manipulators in regular and singular configurations

Relevance:

20.00%

Abstract:

This paper presents a study of the nature of the degrees-of-freedom of spatial manipulators based on the concept of partition of degrees-of-freedom. In particular, the partitioning of degrees-of-freedom is studied in five lower-mobility spatial parallel manipulators possessing different combinations of degrees-of-freedom. An extension of the existing theory is introduced so as to analyse the nature of the gained degree(s)-of-freedom at a gain-type singularity. The gain of one- and two-degrees-of-freedom is analysed in several well-studied, as well as newly developed manipulators. The formulations also present a basis for the analysis of the velocity kinematics of manipulators of any architecture. (C) 2013 Elsevier Ltd. All rights reserved.

Fast and efficient automatic memory management for GPUs using compiler-assisted runtime coherence scheme

Relevance:

20.00%

Abstract:

Exploiting the performance potential of GPUs requires managing the data transfers to and from them efficiently, which is an error-prone and tedious task. In this paper, we develop a software coherence mechanism to fully automate all data transfers between the CPU and GPU without any assistance from the programmer. Our mechanism uses compiler analysis to identify potential stale accesses and uses a runtime to initiate transfers as necessary. This allows us to avoid redundant transfers that are exhibited by all other existing automatic memory management proposals. We integrate our automatic memory manager into the X10 compiler and runtime, and find that it not only results in smaller and simpler programs, but also eliminates redundant memory transfers. Tested on eight programs ported from the Rodinia benchmark suite, it achieves (i) a 1.06x speedup over hand-tuned manual memory management, and (ii) a 1.29x speedup over another recently proposed compiler-runtime automatic memory management system. Compared to other existing runtime-only and compiler-only proposals, it also transfers 2.2x to 13.3x less data on average.
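
The general idea of initiating a transfer only when an access would otherwise read stale data can be sketched with a toy model. This is a generic illustration, not the mechanism from the paper: the class `CoherentBuffer`, its methods, and the assumption that every write overwrites the whole buffer are all hypothetical, and no real GPU API is involved.

```python
class CoherentBuffer:
    """Toy model of one buffer whose valid copies are tracked per device."""

    def __init__(self, name):
        self.name = name
        self.valid = {"cpu"}   # assumed: new data starts out valid on the host only
        self.transfers = []    # log of (source, destination) copies performed

    def _ensure_valid(self, device):
        # Copy only if the device's copy is stale; otherwise do nothing.
        if device not in self.valid:
            source = next(iter(self.valid))
            self.transfers.append((source, device))
            self.valid.add(device)

    def read(self, device):
        self._ensure_valid(device)

    def write(self, device):
        # Assumed full overwrite: no fetch needed, all other copies become stale.
        self.valid = {device}

buf = CoherentBuffer("a")
buf.write("cpu")   # host initializes the data
buf.read("gpu")    # stale on GPU: one host-to-device copy
buf.read("gpu")    # already valid: no copy
buf.write("gpu")   # kernel overwrites the buffer
buf.read("cpu")    # stale on host: one device-to-host copy
```

After this sequence `buf.transfers` holds two entries, whereas a scheme that copies before and after every kernel would perform more.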

CUDA-for-clusters: a system for efficient execution of CUDA kernels on multi-core clusters

Relevance:

20.00%

Abstract:

Rapid advancements in multi-core processor architectures coupled with low-cost, low-latency, high-bandwidth interconnects have made clusters of multi-core machines a common computing resource. Unfortunately, writing good parallel programs that efficiently utilize all the resources in such a cluster is still a major challenge. Various programming languages have been proposed as a solution to this problem, but are yet to be adopted widely to run performance-critical code mainly due to the relatively immature software framework and the effort involved in re-writing existing code in the new language. In this paper, we motivate and describe our initial study in exploring CUDA as a programming language for a cluster of multi-cores. We develop CUDA-For-Clusters (CFC), a framework that transparently orchestrates execution of CUDA kernels on a cluster of multi-core machines. The well-structured nature of a CUDA kernel, the growing popularity, support and stability of the CUDA software stack collectively make CUDA a good candidate to be considered as a programming language for a cluster. CFC uses a mixture of source-to-source compiler transformations, a work distribution runtime and a light-weight software distributed shared memory to manage parallel executions. Initial results on running several standard CUDA benchmark programs achieve impressive speedups of up to 7.5X on a cluster with 8 nodes, thereby opening up an interesting direction of research for further investigation.
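
One ingredient of such a framework, distributing the thread blocks of a kernel launch across cluster nodes, can be sketched as a simple contiguous partition. The function `partition_blocks` and the even-split policy are hypothetical; the abstract does not state how CFC assigns work.

```python
def partition_blocks(num_blocks, num_nodes):
    """Split block indices 0..num_blocks-1 into contiguous, near-equal chunks."""
    base, extra = divmod(num_blocks, num_nodes)
    chunks, start = [], 0
    for node in range(num_nodes):
        # The first `extra` nodes take one additional block each.
        size = base + (1 if node < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks

chunks = partition_blocks(10, 4)
```

With 10 blocks and 4 nodes, the chunk sizes are 3, 3, 2 and 2, and every block index is assigned to exactly one node.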

Polyelectrolyte/silver nanocomposite multilayer films as multifunctional thin film platforms for remote activated protein and drug delivery

Relevance:

20.00%

Abstract:

We demonstrate a nanoparticle loading protocol to develop a transparent, multifunctional polyelectrolyte multilayer film for externally activated drug and protein delivery. The composite film was designed by alternate adsorption of poly(allylamine hydrochloride) (PAH) and dextran sulfate (DS) on a glass substrate followed by nanoparticle synthesis through a polyol reduction method. The films showed a uniform distribution of spherical silver nanoparticles with an average diameter of 50 ± 20 nm, which increased to 80 ± 20 nm when the AgNO3 concentration was increased from 25 to 50 mM. The porous and supramolecular structure of the polyelectrolyte multilayer film was used to immobilize ciprofloxacin hydrochloride (CH) and bovine serum albumin (BSA) within the polymeric network of the film. When exposed to external triggers such as ultrasonication and laser light, the loaded films were ruptured and released the loaded BSA and CH. The release of CH is faster than that of BSA due to a higher diffusion rate. Circular dichroism measurements confirmed that there was no significant change in the conformation of released BSA in comparison with native BSA. The fabricated films showed significant antibacterial activity against the bacterial pathogen Staphylococcus aureus. Applications envisioned for such drug-loaded films include drug and vaccine delivery through the transdermal route, antimicrobial or anti-inflammatory coatings on implants and drug-releasing coatings for stents. (C) 2013 Acta Materialia Inc. Published by Elsevier Ltd. All rights reserved.
