190 resultados para PARALLEL COMPUTING
Resumo:
Laminar separation bubbles are thought to be highly non-parallel, and hence global stability studies start from this premise. However, experimentalists have always realized that the flow is more parallel than is commonly believed, for pressure-gradient-induced bubbles, and this is why linear parallel stability theory has been successful in describing their early stages of transition. The present experimental/numerical study re-examines this important issue and finds that the base flow in such a separation bubble becomes nearly parallel due to a strong-interaction process between the separated boundary layer and the outer potential flow. The so-called dead-air region or the region of constant pressure is a simple consequence of this strong interaction. We use triple-deck theory to qualitatively explain these features. Next, the implications of global analysis for the linear stability of separation bubbles are considered. In particular we show that in the initial portion of the bubble, where the flow is nearly parallel, local stability analysis is sufficient to capture the essential physics. It appears that the real utility of the global analysis is perhaps in the rear portion of the bubble, where the flow is highly non-parallel, and where the secondary/nonlinear instability stages are likely to dominate the dynamics.
Resumo:
Fragment Finder 2.0 is a web-based interactive computing server which can be used to retrieve structurally similar protein fragments from 25 and 90% nonredundant data sets. The computing server identifies structurally similar fragments using the protein backbone C alpha angles. In addition, the identified fragments can be superimposed using either of the two structural superposition programs, STAMP and PROFIT, provided in the server. The freely available Java plug-in Jmol has been interfaced with the server for the visualization of the query and superposed fragments. The server is the updated version of a previously developed search engine and employs an in-house-developed fast pattern matching algorithm. This server can be accessed freely over the World Wide Web through the URL http://cluster.physics.iisc.ernet.in/ff/.
Resumo:
Many common activities, like reading, scanning scenes, or searching for an inconspicuous item in a cluttered environment, entail serial movements of the eyes that shift the gaze from one object to another. Previous studies have shown that the primate brain is capable of programming sequential saccadic eye movements in parallel. Given that the onset of saccades directed to a target are unpredictable in individual trials, what prevents a saccade during parallel programming from being executed in the direction of the second target before execution of another saccade in the direction of the first target remains unclear. Using a computational model, here we demonstrate that sequential saccades inhibit each other and share the brain's limited processing resources (capacity) so that the planning of a saccade in the direction of the first target always finishes first. In this framework, the latency of a saccade increases linearly with the fraction of capacity allocated to the other saccade in the sequence, and exponentially with the duration of capacity sharing. Our study establishes a link between the dual-task paradigm and the ramp-to-threshold model of response time to identify a physiologically viable mechanism that preserves the serial order of saccades without compromising the speed of performance.
Resumo:
Carbon nanotubes dispersed in polymer matrix have been aligned in the form of fibers and interconnects and cured electrically and by UV light. Conductivity and effective semiconductor tunneling against reverse to forward bias field have been designed to have differentiable current-voltage response of each of the fiber/channel. The current-voltage response is a function of the strain applied to the fibers along axial direction. Biaxial and shear strains are correlated by differentiating signals from the aligned fibers/channels. Using a small doping of magnetic nanoparticles in these composite fibers, magneto-resistance properties are realized which are strong enough to use the resulting magnetostriction as a state variable for signal processing and computing. Various basic analog signal processing tasks such as addition, convolution and filtering etc. can be performed. These preliminary study shows promising application of the concept in combined analog-digital computation in carbon nanotube based fibers. Various dynamic effects such as relaxation, electric field dependent nonlinearities and hysteresis on the output signals are studied using experimental data and analytical model.
Resumo:
In this paper, we address a scheduling problem for minimizing total weighted flowtime, observed in automobile gear manufacturing. Specifically, the bottleneck operation of the pre-heat treatment stage of gear manufacturing process has been dealt with in scheduling. Many real-life scenarios like unequal release times, sequence dependent setup times, and machine eligibility restrictions have been considered. A mathematical model taking into account dynamic starting conditions has been proposed. The problem is derived to be NP-hard. To approach the problem, a few heuristic algorithms have been proposed. Based on planned computational experiments, the performance of the proposed heuristic algorithms is evaluated: (a) in comparison with optimal solution for small-size problem instances and (b) in comparison with the estimated optimal solution for large-size problem instances. Extensive computational analyses reveal that the proposed heuristic algorithms are capable of consistently yielding near-statistically estimated optimal solutions in a reasonable computational time.
Resumo:
Clustered architecture processors are preferred for embedded systems because centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption. Although clustering helps by improving the clock speed, reducing the energy consumption of the logic, and making the design simpler, it introduces extra overheads by way of inter-cluster communication. This communication happens over long global wires having high load capacitance which leads to delay in execution and significantly high energy consumption. Inter-cluster communication also introduces many short idle cycles, thereby significantly increasing the overall leakage energy consumption in the functional units. The trend towards miniaturization of devices (and associated reduction in threshold voltage) makes energy consumption in interconnects and functional units even worse, and limits the usability of clustered architectures in smaller technologies. However, technological advancements now permit the design of interconnects and functional units with varying performance and power modes. In this paper, we propose scheduling algorithms that aggregate the scheduling slack of instructions and communication slack of data values to exploit the low-power modes of functional units and interconnects. Finally, we present a synergistic combination of these algorithms that simultaneously saves energy in functional units and interconnects to improves the usability of clustered architectures by achieving better overall energy-performance trade-offs. Even with conservative estimates of the contribution of the functional units and interconnects to the overall processor energy consumption, the proposed combined scheme obtains on average 8% and 10% improvement in overall energy-delay product with 3.5% and 2% performance degradation for a 2-clustered and a 4-clustered machine, respectively. We present a detailed experimental evaluation of the proposed schemes. Our test bed uses the Trimaran compiler infrastructure. (C) 2012 Elsevier Inc. All rights reserved.
Resumo:
This paper presents a decentralized/peer-to-peer architecture-based parallel version of the vector evaluated particle swarm optimization (VEPSO) algorithm for multi-objective design optimization of laminated composite plates using message passing interface (MPI). The design optimization of laminated composite plates being a combinatorially explosive constrained non-linear optimization problem (CNOP), with many design variables and a vast solution space, warrants the use of non-parametric and heuristic optimization algorithms like PSO. Optimization requires minimizing both the weight and cost of these composite plates, simultaneously, which renders the problem multi-objective. Hence VEPSO, a multi-objective variant of the PSO algorithm, is used. Despite the use of such a heuristic, the application problem, being computationally intensive, suffers from long execution times due to sequential computation. Hence, a parallel version of the PSO algorithm for the problem has been developed to run on several nodes of an IBM P720 cluster. The proposed parallel algorithm, using MPI's collective communication directives, establishes a peer-to-peer relationship between the constituent parallel processes, deviating from the more common master-slave approach, in achieving reduction of computation time by factor of up to 10. Finally we show the effectiveness of the proposed parallel algorithm by comparing it with a serial implementation of VEPSO and a parallel implementation of the vector evaluated genetic algorithm (VEGA) for the same design problem. (c) 2012 Elsevier Ltd. All rights reserved.
Resumo:
The Reeb graph of a scalar function tracks the evolution of the topology of its level sets. This paper describes a fast algorithm to compute the Reeb graph of a piecewise-linear (PL) function defined over manifolds and non-manifolds. The key idea in the proposed approach is to maximally leverage the efficient contour tree algorithm to compute the Reeb graph. The algorithm proceeds by dividing the input into a set of subvolumes that have loop-free Reeb graphs using the join tree of the scalar function and computes the Reeb graph by combining the contour trees of all the subvolumes. Since the key ingredient of this method is a series of union-find operations, the algorithm is fast in practice. Experimental results demonstrate that it outperforms current generic algorithms by a factor of up to two orders of magnitude, and has a performance on par with algorithms that are catered to restricted classes of input. The algorithm also extends to handle large data that do not fit in memory.
Resumo:
Surface electrodes are essentially required to be switched for boundary data collection in electrical impedance tomography (Ell). Parallel digital data bits are required to operate the multiplexers used, generally, for electrode switching in ELT. More the electrodes in an EIT system more the digital data bits are needed. For a sixteen electrode system. 16 parallel digital data bits are required to operate the multiplexers in opposite or neighbouring current injection method. In this paper a common ground current injection is proposed for EIT and the resistivity imaging is studied. Common ground method needs only two analog multiplexers each of which need only 4 digital data bits and hence only 8 digital bits are required to switch the 16 surface electrodes. Results show that the USB based data acquisition system sequentially generate digital data required for multiplexers operating in common ground current injection method. The profile of the boundary data collected from practical phantom show that the multiplexers are operating in the required sequence in common ground current injection protocol. The voltage peaks obtained for all the inhomogeneity configurations are found at the accurate positions in the boundary data matrix which proved the sequential operation of multiplexers. Resistivity images reconstructed from the boundary data collected from the practical phantom with different configurations also show that the entire digital data generation module is functioning properly. Reconstructed images and their image parameters proved that the boundary data are successfully acquired by the DAQ system which in turn indicates a sequential and proper operation of multiplexers.
Resumo:
Surface electrode switching of 16-electrode wireless EIT is studied using a Radio Frequency (RF) based digital data transmission technique operating with 8 channel encoder/decoder ICs. An electrode switching module is developed the analog multiplexers and switched with 8-bit parallel digital data transferred by transmitter/receiver module developed with radio frequency technology. 8-bit parallel digital data collected from the receiver module are converted to 16-bit digital data by using binary adder circuits and then used for switching the electrodes in opposite current injection protocol. 8-bit parallel digital data are generated using NI USB 6251 DAQ card in LabVIEW software and sent to the transmission module which transmits the digital data bits to the receiver end. Receiver module supplies the parallel digital bits to the binary adder circuits and adder circuit outputs are fed to the multiplexers of the electrode switching module for surface electrode switching. 1 mA, 50 kHz sinusoidal constant current is injected at the phantom boundary using opposite current injection protocol. The boundary potentials developed at the voltage electrodes are measured and studied to assess the wireless data transmission.
Resumo:
Boxicity of a graph G(V, E) is the minimum integer k such that G can be represented as the intersection graph of k-dimensional axis parallel boxes in Rk. Equivalently, it is the minimum number of interval graphs on the vertex set V such that the intersection of their edge sets is E. It is known that boxicity cannot be approximated even for graph classes like bipartite, co-bipartite and split graphs below O(n0.5-ε)-factor, for any ε > 0 in polynomial time unless NP = ZPP. Till date, there is no well known graph class of unbounded boxicity for which even an nε-factor approximation algorithm for computing boxicity is known, for any ε < 1. In this paper, we study the boxicity problem on Circular Arc graphs - intersection graphs of arcs of a circle. We give a (2+ 1/k)-factor polynomial time approximation algorithm for computing the boxicity of any circular arc graph along with a corresponding box representation, where k ≥ 1 is its boxicity. For Normal Circular Arc(NCA) graphs, with an NCA model given, this can be improved to an additive 2-factor approximation algorithm. The time complexity of the algorithms to approximately compute the boxicity is O(mn+n2) in both these cases and in O(mn+kn2) which is at most O(n3) time we also get their corresponding box representations, where n is the number of vertices of the graph and m is its number of edges. The additive 2-factor algorithm directly works for any Proper Circular Arc graph, since computing an NCA model for it can be done in polynomial time.
Resumo:
Identical parallel-connected converters with unequal load sharing have unequal terminal voltages. The difference in terminal voltages is more pronounced in case of back-to-back connected converters, operated in power-circulation mode for the purpose of endurance tests. In this paper, a synchronous reference frame based analysis is presented to estimate the grid current distortion in interleaved, grid-connected converters with unequal terminal voltages. Influence of carrier interleaving angle on rms grid current ripple is studied theoretically as well as experimentally. Optimum interleaving angle to minimize the rms grid current ripple is investigated for different applications of parallel converters. The applications include unity power factor rectifiers, inverters for renewable energy sources, reactive power compensators, and circulating-power test set-up used for thermal testing of high-power converters. Optimum interleaving angle is shown to be a strong function of the average of the modulation indices of the two converters, irrespective of the application. The findings are verified experimentally on two parallel-connected converters, circulating reactive power of up to 150 kVA between them.
Resumo:
MATLAB is an array language, initially popular for rapid prototyping, but is now being increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control flow dominated scalar regions that have an impact on the program's execution time. Today's computer systems have tremendous computing power in the form of traditional CPU cores and throughput oriented accelerators such as graphics processing units(GPUs). Thus, an approach that maps the control flow dominated regions to the CPU and the data parallel regions to the GPU can significantly improve program performance. In this paper, we present the design and implementation of MEGHA, a compiler that automatically compiles MATLAB programs to enable synergistic execution on heterogeneous processors. Our solution is fully automated and does not require programmer input for identifying data parallel regions. We propose a set of compiler optimizations tailored for MATLAB. Our compiler identifies data parallel regions of the program and composes them into kernels. The problem of combining statements into kernels is formulated as a constrained graph clustering problem. Heuristics are presented to map identified kernels to either the CPU or GPU so that kernel execution on the CPU and the GPU happens synergistically and the amount of data transfer needed is minimized. In order to ensure required data movement for dependencies across basic blocks, we propose a data flow analysis and edge splitting strategy. Thus our compiler automatically handles composition of kernels, mapping of kernels to CPU and GPU, scheduling and insertion of required data transfer. The proposed compiler was implemented and experimental evaluation using a set of MATLAB benchmarks shows that our approach achieves a geometric mean speedup of 19.8X for data parallel benchmarks over native execution of MATLAB.
Resumo:
We show that every graph of maximum degree 3 can be represented as the intersection graph of axis parallel boxes in three dimensions, that is, every vertex can be mapped to an axis parallel box such that two boxes intersect if and only if their corresponding vertices are adjacent. In fact, we construct a representation in which any two intersecting boxes just touch at their boundaries. Further, this construction can be realized in linear time.