933 resultados para Parallel computing


Relevância:

30.00% 30.00%

Publicador:

Resumo:

We describe a model of computation of the parallel type, which we call 'computing with bio-agents', based on the concept that motions of biological objects such as bacteria or protein molecular motors in confined spaces can be regarded as computations. We begin with the observation that the geometric nature of the physical structures in which model biological objects move modulates the motions of the latter. Consequently, by changing the geometry, one can control the characteristic trajectories of the objects; on the basis of this, we argue that such systems are computing devices. We investigate the computing power of mobile bio-agent systems and show that they are computationally universal in the sense that they are capable of computing any Boolean function in parallel. We argue also that using appropriate conditions, bio-agent systems can solve NP-complete problems in probabilistic polynomial time.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

To cover wide range of pulsed power applications, this paper proposes a modularity concept to improve the performance and flexibility of the pulsed power supply. The proposed scheme utilizes the advantage of parallel and series configurations of flyback modules in obtaining high-voltage levels with fast rise time (dv/dt). Prototypes were implemented using 600-V insulated-gate bipolar transistor (IGBT) switches to generate up to 4-kV output pulses with 1-kHz repetition rate for experimentation. To assess the proposed modular approach for higher number of the modules, prototypes were implemented using 1700-V IGBTs switches, based on ten-series modules, and tested up to 20 kV. Conducted experimental results verified the effectiveness of the proposed method

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Cloud computing allows for vast computational resources to be leveraged quickly and easily in bursts as and when required. Here we describe a technique that allows for Monte Carlo radiotherapy dose calculations to be performed using GEANT4 and executed in the cloud, with relative simulation cost and completion time evaluated as a function of machine count. As expected, simulation completion time decreases as 1=n for n parallel machines, and relative simulation cost is found to be optimal where n is a factor of the total simulation time in hours. Using the technique, we demonstrate the potential usefulness of cloud computing as a solution for rapid Monte Carlo simulation for radiotherapy dose calculation without the need for dedicated local computer hardware as a proof of principal. Funding source Cancer Australia (Department of Health and Ageing) Research Grant 614217

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, we describe a machine-translated parallel English corpus for the NTCIR Chinese, Japanese and Korean (CJK) Wikipedia collections. This document collection is named CJK2E Wikipedia XML corpus. The corpus could be used by the information retrieval research community and knowledge sharing in Wikipedia in many ways; for example, this corpus could be used for experimentations in cross-lingual information retrieval, cross-lingual link discovery, or omni-lingual information retrieval research. Furthermore, the translated CJK articles could be used to further expand the current coverage of the English Wikipedia.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The objective of this PhD research program is to investigate numerical methods for simulating variably-saturated flow and sea water intrusion in coastal aquifers in a high-performance computing environment. The work is divided into three overlapping tasks: to develop an accurate and stable finite volume discretisation and numerical solution strategy for the variably-saturated flow and salt transport equations; to implement the chosen approach in a high performance computing environment that may have multiple GPUs or CPU cores; and to verify and test the implementation. The geological description of aquifers is often complex, with porous materials possessing highly variable properties, that are best described using unstructured meshes. The finite volume method is a popular method for the solution of the conservation laws that describe sea water intrusion, and is well-suited to unstructured meshes. In this work we apply a control volume-finite element (CV-FE) method to an extension of a recently proposed formulation (Kees and Miller, 2002) for variably saturated groundwater flow. The CV-FE method evaluates fluxes at points where material properties and gradients in pressure and concentration are consistently defined, making it both suitable for heterogeneous media and mass conservative. Using the method of lines, the CV-FE discretisation gives a set of differential algebraic equations (DAEs) amenable to solution using higher-order implicit solvers. Heterogeneous computer systems that use a combination of computational hardware such as CPUs and GPUs, are attractive for scientific computing due to the potential advantages offered by GPUs for accelerating data-parallel operations. We present a C++ library that implements data-parallel methods on both CPU and GPUs. The finite volume discretisation is expressed in terms of these data-parallel operations, which gives an efficient implementation of the nonlinear residual function. This makes the implicit solution of the DAE system possible on the GPU, because the inexact Newton-Krylov method used by the implicit time stepping scheme can approximate the action of a matrix on a vector using residual evaluations. We also propose preconditioning strategies that are amenable to GPU implementation, so that all computationally-intensive aspects of the implicit time stepping scheme are implemented on the GPU. Results are presented that demonstrate the efficiency and accuracy of the proposed numeric methods and formulation. The formulation offers excellent conservation of mass, and higher-order temporal integration increases both numeric efficiency and accuracy of the solutions. Flux limiting produces accurate, oscillation-free solutions on coarse meshes, where much finer meshes are required to obtain solutions with equivalent accuracy using upstream weighting. The computational efficiency of the software is investigated using CPUs and GPUs on a high-performance workstation. The GPU version offers considerable speedup over the CPU version, with one GPU giving speedup factor of 3 over the eight-core CPU implementation.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

MapReduce is a computation model for processing large data sets in parallel on large clusters of machines, in a reliable, fault-tolerant manner. A MapReduce computation is broken down into a number of map tasks and reduce tasks, which are performed by so called mappers and reducers, respectively. The placement of the mappers and reducers on the machines directly affects the performance and cost of the MapReduce computation. From the computational point of view, the mappers/reducers placement problem is a generation of the classical bin packing problem, which is NPcomplete. Thus, in this paper we propose a new grouping genetic algorithm for the mappers/reducers placement problem in cloud computing. Compared with the original one, our grouping genetic algorithm uses an innovative coding scheme and also eliminates the inversion operator which is an essential operator in the original grouping genetic algorithm. The new grouping genetic algorithm is evaluated by experiments and the experimental results show that it is much more efficient than four popular algorithms for the problem, including the original grouping genetic algorithm.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The proliferation of the web presents an unsolved problem of automatically analyzing billions of pages of natural language. We introduce a scalable algorithm that clusters hundreds of millions of web pages into hundreds of thousands of clusters. It does this on a single mid-range machine using efficient algorithms and compressed document representations. It is applied to two web-scale crawls covering tens of terabytes. ClueWeb09 and ClueWeb12 contain 500 and 733 million web pages and were clustered into 500,000 to 700,000 clusters. To the best of our knowledge, such fine grained clustering has not been previously demonstrated. Previous approaches clustered a sample that limits the maximum number of discoverable clusters. The proposed EM-tree algorithm uses the entire collection in clustering and produces several orders of magnitude more clusters than the existing algorithms. Fine grained clustering is necessary for meaningful clustering in massive collections where the number of distinct topics grows linearly with collection size. These fine-grained clusters show an improved cluster quality when assessed with two novel evaluations using ad hoc search relevance judgments and spam classifications for external validation. These evaluations solve the problem of assessing the quality of clusters where categorical labeling is unavailable and unfeasible.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A global recursive bisection algorithm is described for computing the complex zeros of a polynomial. It has complexityO(n 3 p) wheren is the degree of the polynomial andp the bit precision requirement. Ifn processors are available, it can be realized in parallel with complexityO(n 2 p); also it can be implemented using exact arithmetic. A combined Wilf-Hansen algorithm is suggested for reduction in complexity.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Stochastic volatility models are of fundamental importance to the pricing of derivatives. One of the most commonly used models of stochastic volatility is the Heston Model in which the price and volatility of an asset evolve as a pair of coupled stochastic differential equations. The computation of asset prices and volatilities involves the simulation of many sample trajectories with conditioning. The problem is treated using the method of particle filtering. While the simulation of a shower of particles is computationally expensive, each particle behaves independently making such simulations ideal for massively parallel heterogeneous computing platforms. In this paper, we present our portable Opencl implementation of the Heston model and discuss its performance and efficiency characteristics on a range of architectures including Intel cpus, Nvidia gpus, and Intel Many-Integrated-Core (mic) accelerators.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

An axis-parallel k-dimensional box is a Cartesian product R-1 x R-2 x...x R-k where R-i (for 1 <= i <= k) is a closed interval of the form [a(i), b(i)] on the real line. For a graph G, its boxicity box(G) is the minimum dimension k, such that G is representable as the intersection graph of (axis-parallel) boxes in k-dimensional space. The concept of boxicity finds applications in various areas such as ecology, operations research etc. A number of NP-hard problems are either polynomial time solvable or have much better approximation ratio on low boxicity graphs. For example, the max-clique problem is polynomial time solvable on bounded boxicity graphs and the maximum independent set problem for boxicity d graphs, given a box representation, has a left perpendicular1 + 1/c log n right perpendicular(d-1) approximation ratio for any constant c >= 1 when d >= 2. In most cases, the first step usually is computing a low dimensional box representation of the given graph. Deciding whether the boxicity of a graph is at most 2 itself is NP-hard. We give an efficient randomized algorithm to construct a box representation of any graph G on n vertices in left perpendicular(Delta + 2) ln nright perpendicular dimensions, where Delta is the maximum degree of G. This algorithm implies that box(G) <= left perpendicular(Delta + 2) ln nright perpendicular for any graph G. Our bound is tight up to a factor of ln n. We also show that our randomized algorithm can be derandomized to get a polynomial time deterministic algorithm. Though our general upper bound is in terms of maximum degree Delta, we show that for almost all graphs on n vertices, their boxicity is O(d(av) ln n) where d(av) is the average degree.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The move towards IT outsourcing is the first step towards an environment where compute infrastructure is treated as a service. In utility computing this IT service has to honor Service Level Agreements (SLA) in order to meet the desired Quality of Service (QoS) guarantees. Such an environment requires reliable services in order to maximize the utilization of the resources and to decrease the Total Cost of Ownership (TCO). Such reliability cannot come at the cost of resource duplication, since it increases the TCO of the data center and hence the cost per compute unit. We, in this paper, look into aspects of projecting impact of hardware failures on the SLAs and techniques required to take proactive recovery steps in case of a predicted failure. By maintaining health vectors of all hardware and system resources, we predict the failure probability of resources based on observed hardware errors/failure events, at runtime. This inturn influences an availability aware middleware to take proactive action (even before the application is affected in case the system and the application have low recoverability). The proposed framework has been prototyped on a system running HP-UX. Our offline analysis of the prediction system on hardware error logs indicate no more than 10% false positives. This work to the best of our knowledge is the first of its kind to perform an end-to-end analysis of the impact of a hardware fault on application SLAs, in a live system.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The physical design of a VLSI circuit involves circuit partitioning as a subtask. Typically, it is necessary to partition a large electrical circuit into several smaller circuits such that the total cross-wiring is minimized. This problem is a variant of the more general graph partitioning problem, and it is known that there does not exist a polynomial time algorithm to obtain an optimal partition. The heuristic procedure proposed by Kernighan and Lin1,2 requires O(n2 log2n) time to obtain a near-optimal two-way partition of a circuit with n modules. In the VLSI context, due to the large problem size involved, this computational requirement is unacceptably high. This paper is concerned with the hardware acceleration of the Kernighan-Lin procedure on an SIMD architecture. The proposed parallel partitioning algorithm requires O(n) processors, and has a time complexity of O(n log2n). In the proposed scheme, the reduced array architecture is employed with due considerations towards cost effectiveness and VLSI realizability of the architecture.The authors are not aware of any earlier attempts to parallelize a circuit partitioning algorithm in general or the Kernighan-Lin algorithm in particular. The use of the reduced array architecture is novel and opens up the possibilities of using this computing structure for several other applications in electronic design automation.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper we give a generalized predictor-corrector algorithm for solving ordinary differential equations with specified initial values. The method uses multiple correction steps which can be carried out in parallel with a prediction step. The proposed method gives a larger stability interval compared to the existing parallel predictor-corrector methods. A method has been suggested to implement the algorithm in multiple processor systems with efficient utilization of all the processors.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Massively parallel SIMD computing is applied to obtain an order of magnitude improvement in the executional speed of an important algorithm in VLSI design automation. The physical design of a VLSI circuit involves logic module placement as a subtask. The paper is concerned with accelerating the well known Min-cut placement technique for logic cell placement. The inherent parallelism of the Min-cut algorithm is identified, and it is shown that a parallel machine based on the efficient execution of the placement procedure.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper discusses the parallel implementation of the solution of a set of linear equations using the Alternative Quadrant Interlocking Factorisation Methods (AQIF), on a star topology. Both the AQIF and LU decomposition methods are mapped onto star topology on an IBM SP2 system, with MPI as the internode communicator. Performance parameters such as speedup, efficiency have been obtained through experimental and theoretical means. The studies demonstrate (i) a mismatch of 15% between the theoretical and experimental results, (ii) scalability of the AQIF algorithm, and (iii) faster executing AQIF algorithm.