882 resultados para Parallel computing. Multilayer perceptron. OpenMP
Resumo:
Surface electrode switching of 16-electrode wireless EIT is studied using a Radio Frequency (RF) based digital data transmission technique operating with 8 channel encoder/decoder ICs. An electrode switching module is developed the analog multiplexers and switched with 8-bit parallel digital data transferred by transmitter/receiver module developed with radio frequency technology. 8-bit parallel digital data collected from the receiver module are converted to 16-bit digital data by using binary adder circuits and then used for switching the electrodes in opposite current injection protocol. 8-bit parallel digital data are generated using NI USB 6251 DAQ card in LabVIEW software and sent to the transmission module which transmits the digital data bits to the receiver end. Receiver module supplies the parallel digital bits to the binary adder circuits and adder circuit outputs are fed to the multiplexers of the electrode switching module for surface electrode switching. 1 mA, 50 kHz sinusoidal constant current is injected at the phantom boundary using opposite current injection protocol. The boundary potentials developed at the voltage electrodes are measured and studied to assess the wireless data transmission.
Resumo:
Boxicity of a graph G(V, E) is the minimum integer k such that G can be represented as the intersection graph of k-dimensional axis parallel boxes in Rk. Equivalently, it is the minimum number of interval graphs on the vertex set V such that the intersection of their edge sets is E. It is known that boxicity cannot be approximated even for graph classes like bipartite, co-bipartite and split graphs below O(n0.5-ε)-factor, for any ε > 0 in polynomial time unless NP = ZPP. Till date, there is no well known graph class of unbounded boxicity for which even an nε-factor approximation algorithm for computing boxicity is known, for any ε < 1. In this paper, we study the boxicity problem on Circular Arc graphs - intersection graphs of arcs of a circle. We give a (2+ 1/k)-factor polynomial time approximation algorithm for computing the boxicity of any circular arc graph along with a corresponding box representation, where k ≥ 1 is its boxicity. For Normal Circular Arc(NCA) graphs, with an NCA model given, this can be improved to an additive 2-factor approximation algorithm. The time complexity of the algorithms to approximately compute the boxicity is O(mn+n2) in both these cases and in O(mn+kn2) which is at most O(n3) time we also get their corresponding box representations, where n is the number of vertices of the graph and m is its number of edges. The additive 2-factor algorithm directly works for any Proper Circular Arc graph, since computing an NCA model for it can be done in polynomial time.
Resumo:
Identical parallel-connected converters with unequal load sharing have unequal terminal voltages. The difference in terminal voltages is more pronounced in case of back-to-back connected converters, operated in power-circulation mode for the purpose of endurance tests. In this paper, a synchronous reference frame based analysis is presented to estimate the grid current distortion in interleaved, grid-connected converters with unequal terminal voltages. Influence of carrier interleaving angle on rms grid current ripple is studied theoretically as well as experimentally. Optimum interleaving angle to minimize the rms grid current ripple is investigated for different applications of parallel converters. The applications include unity power factor rectifiers, inverters for renewable energy sources, reactive power compensators, and circulating-power test set-up used for thermal testing of high-power converters. Optimum interleaving angle is shown to be a strong function of the average of the modulation indices of the two converters, irrespective of the application. The findings are verified experimentally on two parallel-connected converters, circulating reactive power of up to 150 kVA between them.
Resumo:
MATLAB is an array language, initially popular for rapid prototyping, but is now being increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control flow dominated scalar regions that have an impact on the program's execution time. Today's computer systems have tremendous computing power in the form of traditional CPU cores and throughput oriented accelerators such as graphics processing units(GPUs). Thus, an approach that maps the control flow dominated regions to the CPU and the data parallel regions to the GPU can significantly improve program performance. In this paper, we present the design and implementation of MEGHA, a compiler that automatically compiles MATLAB programs to enable synergistic execution on heterogeneous processors. Our solution is fully automated and does not require programmer input for identifying data parallel regions. We propose a set of compiler optimizations tailored for MATLAB. Our compiler identifies data parallel regions of the program and composes them into kernels. The problem of combining statements into kernels is formulated as a constrained graph clustering problem. Heuristics are presented to map identified kernels to either the CPU or GPU so that kernel execution on the CPU and the GPU happens synergistically and the amount of data transfer needed is minimized. In order to ensure required data movement for dependencies across basic blocks, we propose a data flow analysis and edge splitting strategy. Thus our compiler automatically handles composition of kernels, mapping of kernels to CPU and GPU, scheduling and insertion of required data transfer. The proposed compiler was implemented and experimental evaluation using a set of MATLAB benchmarks shows that our approach achieves a geometric mean speedup of 19.8X for data parallel benchmarks over native execution of MATLAB.
Resumo:
We show that every graph of maximum degree 3 can be represented as the intersection graph of axis parallel boxes in three dimensions, that is, every vertex can be mapped to an axis parallel box such that two boxes intersect if and only if their corresponding vertices are adjacent. In fact, we construct a representation in which any two intersecting boxes just touch at their boundaries. Further, this construction can be realized in linear time.
Resumo:
Particle Swarm Optimization is a parallel algorithm that spawns particles across a search space searching for an optimized solution. Though inherently parallel, they have distinct synchronizations points which stumbles attempts to create completely distributed versions of it. In this paper, we attempt to create a completely distributed peer-peer particle swarm optimization in a cluster of heterogeneous nodes. Since, the original algorithm requires explicit synchronization points we modified the algorithm in multiple ways to support a peer-peer system of nodes. We also modify certain aspect of the basic PSO algorithm and show how certain numerical problems can take advantage of the same thereby yielding fast convergence.
Resumo:
Adaptive Mesh Refinement is a method which dynamically varies the spatio-temporal resolution of localized mesh regions in numerical simulations, based on the strength of the solution features. In-situ visualization plays an important role for analyzing the time evolving characteristics of the domain structures. Continuous visualization of the output data for various timesteps results in a better study of the underlying domain and the model used for simulating the domain. In this paper, we develop strategies for continuous online visualization of time evolving data for AMR applications executed on GPUs. We reorder the meshes for computations on the GPU based on the users input related to the subdomain that he wants to visualize. This makes the data available for visualization at a faster rate. We then perform asynchronous executions of the visualization steps and fix-up operations on the CPUs while the GPU advances the solution. By performing experiments on Tesla S1070 and Fermi C2070 clusters, we found that our strategies result in 60% improvement in response time and 16% improvement in the rate of visualization of frames over the existing strategy of performing fix-ups and visualization at the end of the timesteps.
Resumo:
Novel composite cyclodextrin (CD)-CaCO3 spherical porous microparticles have been synthesized through Ca2+-CD complex formation, which influences the crystal growth of CaCO3. The CDs are entrapped and distributed uniformly in the matrix of CaCO3 microparticles during crystallization. The hydrophobic fluorescent molecules coumarin and Nile red (NR) are efficiently encapsulated into these composite CD-CaCO3 porous particles through supramolecular inclusion complexation between entrapped CDs and hydrophobic molecules. Thermogravimetric (TGA) and infrared spectroscopy (IR) analysis of composite CD-CaCO3 particles reveals the presence of large CDs and their strong interaction with calcium carbonate nanoparticles. The resulting composite CD-CaCO3 microparticles are utilized as sacrificial templates for preparation of CD-modified layer-by-layer (LbL) capsules. After dissolution of the carbonate core, CDs are retained in the interior of the capsules in a network fashion and assist in the encapsulation of hydrophobic molecules. The efficient encapsulation of the hydrophobic fluorescent dye, coumarin, was successfully demonstrated using CD-modified capsules. In vitro release of the encapsulated coumarin from the CD-CaCO3 and CD-modified capsules has been demonstrated.
Resumo:
A Cu-Cu multilayer processed by accumulative roll bonding was deformed to large strains and further annealed. The texture of the deformed Cu-Cu multilayer differs from the conventional fcc rolling textures in terms of higher fractions of Bs and RD-rotated cube components, compared with the volume fraction of Cu component. The elongated grain shape significantly affects the deformation characteristics. Characteristic microstructural features of both continuous dynamic recrystallization and discontinuous dynamic recrystallization were observed in the microtexture measurements. X-ray texture measurements of annealing of heavily deformed multilayer demonstrate constrained recrystallization and resulted in a bimodal grain size distribution in the annealed material at higher strains. The presence of cube- and BR-oriented grains in the deformed material confirms the oriented nucleation as the major influence on texture change during recrystallization. Persistence of cube component throughout the deformation is attributed to dynamic recrystallization. Evolution of RD-rotated cube is attributed to the deformation of cube components that evolve from dynamic recrystallization. The relaxation of strain components leads to Bs at larger strains. Further, the Bs component is found to recover rather than recrystallize during deformation. The presence of predominantly Cu and Bs orientations surrounding the interface layer suggests constrained annealing behavior.
Resumo:
In this paper, we study the diversity-multiplexing-gain tradeoff (DMT) of wireless relay networks under the half-duplex constraint. It is often unclear what penalty if any, is imposed by the half-duplex constraint on the DMT of such networks. We study two classes of networks; the first class, called KPP(I) networks, is the class of networks with the relays organized in K parallel paths between the source and the destination. While we assume that there is no direct source-destination path, the K relaying paths can interfere with each other. The second class, termed as layered networks, is comprised of relays organized in layers, where links exist only between adjacent layers. We present a communication scheme based on static schedules and amplify-and-forward relaying for these networks. We also show that for KPP(I) networks with K >= 3, the proposed schemes can achieve full-duplex DMT performance, thus demonstrating that there is no performance hit on the DMT due to the half-duplex constraint. We also show that, for layered networks, a linear DMT of d(max)(1 - r)(+) between the maximum diversity d(max) and the maximum MG, r(max) = 1 is achievable. We adapt existing DMT optimal coding schemes to these networks, thus specifying the end-to-end communication strategy explicitly.
Resumo:
In this paper, we consider the inference for the component and system lifetime distribution of a k-unit parallel system with independent components based on system data. The components are assumed to have identical Weibull distribution. We obtain the maximum likelihood estimates of the unknown parameters based on system data. The Fisher information matrix has been derived. We propose -expectation tolerance interval and -content -level tolerance interval for the life distribution of the system. Performance of the estimators and tolerance intervals is investigated via simulation study. A simulated dataset is analyzed for illustration.
Resumo:
A computationally efficient approach that computes the optimal regularization parameter for the Tikhonov-minimization scheme is developed for photoacoustic imaging. This approach is based on the least squares-QR decomposition which is a well-known dimensionality reduction technique for a large system of equations. It is shown that the proposed framework is effective in terms of quantitative and qualitative reconstructions of initial pressure distribution enabled via finding an optimal regularization parameter. The computational efficiency and performance of the proposed method are shown using a test case of numerical blood vessel phantom, where the initial pressure is exactly known for quantitative comparison. (C) 2013 Society of Photo-Optical Instrumentation Engineers (SPIE)
Resumo:
The boxicity (cubicity) of a graph G, denoted by box(G) (respectively cub(G)), is the minimum integer k such that G can be represented as the intersection graph of axis parallel boxes (cubes) in ℝ k . The problem of computing boxicity (cubicity) is known to be inapproximable in polynomial time even for graph classes like bipartite, co-bipartite and split graphs, within an O(n 0.5 − ε ) factor for any ε > 0, unless NP = ZPP. We prove that if a graph G on n vertices has a clique on n − k vertices, then box(G) can be computed in time n22O(k2logk) . Using this fact, various FPT approximation algorithms for boxicity are derived. The parameter used is the vertex (or edge) edit distance of the input graph from certain graph families of bounded boxicity - like interval graphs and planar graphs. Using the same fact, we also derive an O(nloglogn√logn√) factor approximation algorithm for computing boxicity, which, to our knowledge, is the first o(n) factor approximation algorithm for the problem. We also present an FPT approximation algorithm for computing the cubicity of graphs, with vertex cover number as the parameter.
Resumo:
Accurate and timely prediction of weather phenomena, such as hurricanes and flash floods, require high-fidelity compute intensive simulations of multiple finer regions of interest within a coarse simulation domain. Current weather applications execute these nested simulations sequentially using all the available processors, which is sub-optimal due to their sub-linear scalability. In this work, we present a strategy for parallel execution of multiple nested domain simulations based on partitioning the 2-D processor grid into disjoint rectangular regions associated with each domain. We propose a novel combination of performance prediction, processor allocation methods and topology-aware mapping of the regions on torus interconnects. Experiments on IBM Blue Gene systems using WRF show that the proposed strategies result in performance improvement of up to 33% with topology-oblivious mapping and up to additional 7% with topology-aware mapping over the default sequential strategy.
Resumo:
With proliferation of chip multicores (CMPs) on desktops and embedded platforms, multi-threaded programs have become ubiquitous. Existence of multiple threads may cause resource contention, such as, in on-chip shared cache and interconnects, depending upon how they access resources. Hence, we propose a tool - Thread Contention Predictor (TCP) to help quantify the number of threads sharing data and their sharing pattern. We demonstrate its use to predict a more profitable shared, last level on-chip cache (LLC) access policy on CMPs. Our cache configuration predictor is 2.2 times faster compared to the cycle-accurate simulations. We also demonstrate its use for identifying hot data structures in a program which may cause performance degradation due to false data sharing. We fix layout of such data structures and show up-to 10% and 18% improvement in execution time and energy-delay product (EDP), respectively.