224 resultados para Parallel computing. Multilayer perceptron. OpenMP
Resumo:
Decoherence as an obstacle in quantum computation is viewed as a struggle between two forces [1]: the computation which uses the exponential dimension of Hilbert space, and decoherence which destroys this entanglement by collapse. In this model of decohered quantum computation, a sequential quantum computer loses the battle, because at each time step, only a local operation is carried out but g*(t) number of gates collapse. With quantum circuits computing in parallel way the situation is different- g(t) number of gates can be applied at each time step and number gates collapse because of decoherence. As g(t) ≈ g*(t) competition here is even [1]. Our paper improves on this model by slowing down g*(t) by encoding the circuit in parallel computing architectures and running it in Single Instruction Multiple Data (SIMD) paradigm. We have proposed a parallel ion trap architecture for single-bit rotation of a qubit.
Resumo:
The contour tree is a topological abstraction of a scalar field that captures evolution in level set connectivity. It is an effective representation for visual exploration and analysis of scientific data. We describe a work-efficient, output sensitive, and scalable parallel algorithm for computing the contour tree of a scalar field defined on a domain that is represented using either an unstructured mesh or a structured grid. A hybrid implementation of the algorithm using the GPU and multi-core CPU can compute the contour tree of an input containing 16 million vertices in less than ten seconds with a speedup factor of upto 13. Experiments based on an implementation in a multi-core CPU environment show near-linear speedup for large data sets.
Resumo:
We discuss the computational bottlenecks in molecular dynamics (MD) and describe the challenges in parallelizing the computation-intensive tasks. We present a hybrid algorithm using MPI (Message Passing Interface) with OpenMP threads for parallelizing a generalized MD computation scheme for systems with short range interatomic interactions. The algorithm is discussed in the context of nano-indentation of Chromium films with carbon indenters using the Embedded Atom Method potential for Cr-Cr interaction and the Morse potential for Cr-C interactions. We study the performance of our algorithm for a range of MPI-thread combinations and find the performance to depend strongly on the computational task and load sharing in the multi-core processor. The algorithm scaled poorly with MPI and our hybrid schemes were observed to outperform the pure message passing scheme, despite utilizing the same number of processors or cores in the cluster. Speed-up achieved by our algorithm compared favorably with that achieved by standard MD packages. (C) 2013 Elsevier Inc. All rights reserved.
Resumo:
A global recursive bisection algorithm is described for computing the complex zeros of a polynomial. It has complexityO(n 3 p) wheren is the degree of the polynomial andp the bit precision requirement. Ifn processors are available, it can be realized in parallel with complexityO(n 2 p); also it can be implemented using exact arithmetic. A combined Wilf-Hansen algorithm is suggested for reduction in complexity.
Resumo:
An axis-parallel k-dimensional box is a Cartesian product R-1 x R-2 x...x R-k where R-i (for 1 <= i <= k) is a closed interval of the form [a(i), b(i)] on the real line. For a graph G, its boxicity box(G) is the minimum dimension k, such that G is representable as the intersection graph of (axis-parallel) boxes in k-dimensional space. The concept of boxicity finds applications in various areas such as ecology, operations research etc. A number of NP-hard problems are either polynomial time solvable or have much better approximation ratio on low boxicity graphs. For example, the max-clique problem is polynomial time solvable on bounded boxicity graphs and the maximum independent set problem for boxicity d graphs, given a box representation, has a left perpendicular1 + 1/c log n right perpendicular(d-1) approximation ratio for any constant c >= 1 when d >= 2. In most cases, the first step usually is computing a low dimensional box representation of the given graph. Deciding whether the boxicity of a graph is at most 2 itself is NP-hard. We give an efficient randomized algorithm to construct a box representation of any graph G on n vertices in left perpendicular(Delta + 2) ln nright perpendicular dimensions, where Delta is the maximum degree of G. This algorithm implies that box(G) <= left perpendicular(Delta + 2) ln nright perpendicular for any graph G. Our bound is tight up to a factor of ln n. We also show that our randomized algorithm can be derandomized to get a polynomial time deterministic algorithm. Though our general upper bound is in terms of maximum degree Delta, we show that for almost all graphs on n vertices, their boxicity is O(d(av) ln n) where d(av) is the average degree.
Resumo:
The move towards IT outsourcing is the first step towards an environment where compute infrastructure is treated as a service. In utility computing this IT service has to honor Service Level Agreements (SLA) in order to meet the desired Quality of Service (QoS) guarantees. Such an environment requires reliable services in order to maximize the utilization of the resources and to decrease the Total Cost of Ownership (TCO). Such reliability cannot come at the cost of resource duplication, since it increases the TCO of the data center and hence the cost per compute unit. We, in this paper, look into aspects of projecting impact of hardware failures on the SLAs and techniques required to take proactive recovery steps in case of a predicted failure. By maintaining health vectors of all hardware and system resources, we predict the failure probability of resources based on observed hardware errors/failure events, at runtime. This inturn influences an availability aware middleware to take proactive action (even before the application is affected in case the system and the application have low recoverability). The proposed framework has been prototyped on a system running HP-UX. Our offline analysis of the prediction system on hardware error logs indicate no more than 10% false positives. This work to the best of our knowledge is the first of its kind to perform an end-to-end analysis of the impact of a hardware fault on application SLAs, in a live system.
Resumo:
The physical design of a VLSI circuit involves circuit partitioning as a subtask. Typically, it is necessary to partition a large electrical circuit into several smaller circuits such that the total cross-wiring is minimized. This problem is a variant of the more general graph partitioning problem, and it is known that there does not exist a polynomial time algorithm to obtain an optimal partition. The heuristic procedure proposed by Kernighan and Lin1,2 requires O(n2 log2n) time to obtain a near-optimal two-way partition of a circuit with n modules. In the VLSI context, due to the large problem size involved, this computational requirement is unacceptably high. This paper is concerned with the hardware acceleration of the Kernighan-Lin procedure on an SIMD architecture. The proposed parallel partitioning algorithm requires O(n) processors, and has a time complexity of O(n log2n). In the proposed scheme, the reduced array architecture is employed with due considerations towards cost effectiveness and VLSI realizability of the architecture.The authors are not aware of any earlier attempts to parallelize a circuit partitioning algorithm in general or the Kernighan-Lin algorithm in particular. The use of the reduced array architecture is novel and opens up the possibilities of using this computing structure for several other applications in electronic design automation.
Resumo:
In this paper we give a generalized predictor-corrector algorithm for solving ordinary differential equations with specified initial values. The method uses multiple correction steps which can be carried out in parallel with a prediction step. The proposed method gives a larger stability interval compared to the existing parallel predictor-corrector methods. A method has been suggested to implement the algorithm in multiple processor systems with efficient utilization of all the processors.
Resumo:
Massively parallel SIMD computing is applied to obtain an order of magnitude improvement in the executional speed of an important algorithm in VLSI design automation. The physical design of a VLSI circuit involves logic module placement as a subtask. The paper is concerned with accelerating the well known Min-cut placement technique for logic cell placement. The inherent parallelism of the Min-cut algorithm is identified, and it is shown that a parallel machine based on the efficient execution of the placement procedure.
Resumo:
This paper discusses the parallel implementation of the solution of a set of linear equations using the Alternative Quadrant Interlocking Factorisation Methods (AQIF), on a star topology. Both the AQIF and LU decomposition methods are mapped onto star topology on an IBM SP2 system, with MPI as the internode communicator. Performance parameters such as speedup, efficiency have been obtained through experimental and theoretical means. The studies demonstrate (i) a mismatch of 15% between the theoretical and experimental results, (ii) scalability of the AQIF algorithm, and (iii) faster executing AQIF algorithm.
Resumo:
As computational Grids are increasingly used for executing long running multi-phase parallel applications, it is important to develop efficient rescheduling frameworks that adapt application execution in response to resource and application dynamics. In this paper, three strategies or algorithms have been developed for deciding when and where to reschedule parallel applications that execute on multi-cluster Grids. The algorithms derive rescheduling plans that consist of potential points in application execution for rescheduling and schedules of resources for application execution between two consecutive rescheduling points. Using large number of simulations, it is shown that the rescheduling plans developed by the algorithms can lead to large decrease in application execution times when compared to executions without rescheduling on dynamic Grid resources. The rescheduling plans generated by the algorithms are also shown to be competitive when compared to the near-optimal plans generated by brute-force methods. Of the algorithms, genetic algorithm yielded the most efficient rescheduling plans with 9-12% smaller average execution times than the other algorithms.
Resumo:
In this paper, we present an algebraic method to study and design spatial parallel manipulators that demonstrate isotropy in the force and moment distributions.We use the force and moment transformation matrices separately,and derive conditions for their isotropy individually as well as in combination. The isotropy conditions are derived in closed-form in terms of the invariants of the quadratic forms associated with these matrices. The formulation has been applied to a class of Stewart platform manipulators. We obtain multi-parameter families of isotropic manipulator analytically. In addition to computing the isotropic configurations of an existing manipulator,we demonstrate a procedure for designing the manipulator for isotropy at a given configuration.
Resumo:
The Morse-Smale complex is a useful topological data structure for the analysis and visualization of scalar data. This paper describes an algorithm that processes all mesh elements of the domain in parallel to compute the Morse-Smale complex of large two-dimensional data sets at interactive speeds. We employ a reformulation of the Morse-Smale complex using Forman's Discrete Morse Theory and achieve scalability by computing the discrete gradient using local accesses only. We also introduce a novel approach to merge gradient paths that ensures accurate geometry of the computed complex. We demonstrate that our algorithm performs well on both multicore environments and on massively parallel architectures such as the GPU.
Resumo:
Information forms the basis of modern technology. To meet the ever-increasing demand for information, means have to be devised for a more efficient and better-equipped technology to intelligibly process data. Advances in photonics have made their impact on each of the four key applications in information processing, i.e., acquisition, transmission, storage and processing of information. The inherent advantages of ultrahigh bandwidth, high speed and low-loss transmission has already established fiber-optics as the backbone of communication technology. However, the optics to electronics inter-conversion at the transmitter and receiver ends severely limits both the speed and bit rate of lightwave communication systems. As the trend towards still faster and higher capacity systems continues, it has become increasingly necessary to perform more and more signal-processing operations in the optical domain itself, i.e., with all-optical components and devices that possess a high bandwidth and can perform parallel processing functions to eliminate the electronic bottleneck.
Resumo:
Fast three-dimensional (3D) imaging requires parallel optical slicing of a specimen with an efficient detection scheme. The generation of multiple localized dot-like excitation structures solves the problem of simultaneous slicing multiple specimen layers, but an efficient detection scheme is necessary. Confocal theta detection (detection at 90 degrees to the optical axis) provides a suitable detection platform that is capable of cross-talk-free fluorescence detection from each nanodot (axial dimension approximate to 150 nm). Additionally, this technique has the unique feature of imaging a specimen at a large working distance with super-resolution capabilities. Polarization studies show distinct field structures for fixed and fluid samples, indicating a non-negligible field-dipole interaction. The realization of the proposed imaging technique will advance and diversify multiphoton fluorescence microscopy for numerous applications in nanobioimaging and optical engineering.