922 results for Parallel numerical algorithms
Abstract:
Parallel processing techniques have been used in the past to provide high performance computing resources for activities such as Computational Fluid Dynamics. This is normally achieved using specialized hardware and software, the expense of which would be difficult to justify for many fire engineering practices. In this paper, we demonstrate how typical office-based PCs attached to a local area network have the potential to offer the benefits of parallel processing with minimal costs associated with the purchase of additional hardware or software. A dynamic load balancing scheme was devised to allow the effective use of the software on heterogeneous PC networks. This scheme ensured that the interference between the parallel processing task and other computer users on the network was minimized, thus allowing practical parallel processing within a conventional office environment. Copyright © 2006 John Wiley & Sons, Ltd.
Abstract:
Computer egress simulation has the potential to be used in large-scale incidents to provide live advice to incident commanders. While there are many considerations which must be taken into account when applying such models to live incidents, one of the first concerns the computational speed of simulations. No matter how important the insight provided by the simulation, numerical hindsight will not prove useful to an incident commander. Thus, for this type of application to be useful, it is essential that the simulation can be run many times faster than real time. Parallel processing is a method of reducing run times for very large computational simulations by distributing the workload amongst a number of CPUs. In this paper we examine the development of a parallel version of the buildingEXODUS software. The parallel strategy implemented is based on a systematic partitioning of the problem domain into an arbitrary number of sub-domains. Each sub-domain is computed on a separate processor and runs its own copy of the EXODUS code. The software has been designed to work on typical office-based networked PCs but will also function on a Windows-based cluster. Two evaluation scenarios using the parallel implementation of EXODUS are described: a large open area and a 50-story high-rise building scenario. Speed-ups of up to 3.7 are achieved using up to six computers, with the high-rise building evacuation simulation running 6.4 times faster than real time.
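As a rough illustration of what the reported speed-up implies, the sketch below computes the parallel efficiency and the Karp-Flatt serial fraction. It assumes that the best speed-up of 3.7 was obtained on all six computers, which the abstract does not state explicitly; the function names are hypothetical and are not part of buildingEXODUS.

```python
def parallel_efficiency(speedup: float, n_processors: int) -> float:
    """Fraction of the ideal linear speed-up that was actually achieved."""
    if n_processors <= 0:
        raise ValueError("n_processors must be positive")
    return speedup / n_processors


def karp_flatt_serial_fraction(speedup: float, n_processors: int) -> float:
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    if n_processors < 2:
        raise ValueError("need at least two processors")
    inv_p = 1.0 / n_processors
    return (1.0 / speedup - inv_p) / (1.0 - inv_p)


# Assumption: the 3.7x speed-up corresponds to the six-computer run.
efficiency = parallel_efficiency(3.7, 6)
serial_fraction = karp_flatt_serial_fraction(3.7, 6)
print(f"efficiency = {efficiency:.3f}, serial fraction = {serial_fraction:.3f}")
```

Under that assumption, roughly 62% of the ideal six-fold speed-up is realised, the remainder being lost to communication between sub-domains, load imbalance, and serial work.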
Abstract:
We consider a variety of preemptive scheduling problems with controllable processing times on a single machine and on identical/uniform parallel machines, where the objective is to minimize the total compression cost. In this paper, we propose fast divide-and-conquer algorithms for these scheduling problems. Our approach is based on the observation that each scheduling problem we discuss can be formulated as a polymatroid optimization problem. We develop a novel divide-and-conquer technique for the polymatroid optimization problem and then apply it to each scheduling problem. We show that each scheduling problem can be solved in $O(T_{\rm feas}(n) \times \log n)$ time by using our divide-and-conquer technique, where $n$ is the number of jobs and $T_{\rm feas}(n)$ denotes the time complexity of the corresponding feasible scheduling problem with $n$ jobs. This approach yields faster algorithms for most of the scheduling problems discussed in this paper.
Abstract:
We consider a problem of scheduling jobs on m parallel machines. The machines are dedicated, i.e., for each job the processing machine is known in advance. We mainly concentrate on the model in which at any time there is one unit of an additional resource. Any job may be assigned the resource and this reduces its processing time. A job that is given the resource uses it at each time of its processing. No two jobs are allowed to use the resource simultaneously. The objective is to minimize the makespan. We prove that the two-machine problem is NP-hard in the ordinary sense, describe a pseudopolynomial dynamic programming algorithm and convert it into an FPTAS. For the problem with an arbitrary number of machines we present an algorithm with a worst-case ratio close to 3/2, and close to 3, if a job can be given several units of the resource. For the problem with a fixed number of machines we give a PTAS. Virtually all algorithms rely on a certain variant of the linear knapsack problem (maximization, minimization, multiple-choice, bicriteria). © 2008 Wiley Periodicals, Inc. Naval Research Logistics, 2008
Abstract:
In this paper, we provide a unified approach to solving preemptive scheduling problems with uniform parallel machines and controllable processing times. We demonstrate that a single criterion problem of minimizing total compression cost subject to the constraint that all due dates should be met can be formulated in terms of maximizing a linear function over a generalized polymatroid. This justifies applicability of the greedy approach and allows us to develop fast algorithms for solving the problem with arbitrary release and due dates as well as its special case with zero release dates and a common due date. For the bicriteria counterpart of the latter problem we develop an efficient algorithm that constructs the trade-off curve for minimizing the compression cost and the makespan.
Abstract:
Orthogonal frequency division multiplexing (OFDM) is becoming a fundamental technology in future-generation wireless communications. Call admission control is an effective mechanism to guarantee resilient, efficient, quality-of-service (QoS) services in wireless mobile networks. In this paper, we present several call admission control algorithms for OFDM-based wireless multiservice networks. Call connection requests are differentiated into narrow-band calls and wide-band calls. For either class of calls, the traffic process is characterized as batch arrival, since each call may request multiple subcarriers to satisfy its QoS requirement. The batch size is a random variable following a probability mass function (PMF) with a realistic maximum value. In addition, the service times for wide-band and narrow-band calls are different. Following this, we perform a tele-traffic queueing analysis for OFDM-based wireless multiservice networks. Formulae for the significant performance metrics, call blocking probability and bandwidth utilization, are developed. Numerical investigations are presented to demonstrate the interaction between key parameters and performance metrics. The performance tradeoff among different call admission control algorithms is discussed. Moreover, the analytical model has been validated by simulation. The methodology, as well as the result, provides an efficient tool for planning next-generation OFDM-based broadband wireless access systems.
Abstract:
A novel application-specific instruction set processor (ASIP) for use in the construction of modern signal processing systems is presented. This is a flexible device that can be used in the construction of array processor systems for the real-time implementation of functions such as singular-value decomposition (SVD) and QR decomposition (QRD), as well as other important matrix computations. It uses a coordinate rotation digital computer (CORDIC) module to perform arithmetic operations and several approaches are adopted to achieve high performance including pipelining of the micro-rotations, the use of parallel instructions and a dual-bus architecture. In addition, a novel method for scale factor correction is presented which only needs to be applied once at the end of the computation. This also reduces computation time and enhances performance. Methods are described which allow this processor to be used in reduced dimension (i.e., folded) array processor structures that allow tradeoffs between hardware and performance. The net result is a flexible matrix computational processing element (PE) whose functionality can be changed under program control for use in a wider range of scenarios than previous work. Details are presented of the results of a design study, which considers the application of this decomposition PE architecture in a combined SVD/QRD system and demonstrates that a combination of high performance and efficient silicon implementation are achievable. © 2005 IEEE.
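To make the idea of micro-rotations with a single scale factor correction concrete, here is a minimal sketch of textbook rotation-mode CORDIC in floating point. It is not the scale factor correction method proposed in the paper, whose details the abstract does not give; the function name `cordic_rotate` and the iteration count are assumptions chosen for illustration, and real hardware would use fixed-point shifts and adds.

```python
import math


def cordic_rotate(x: float, y: float, angle: float, iterations: int = 24):
    """Rotate (x, y) by `angle` radians using rotation-mode CORDIC.

    Valid for |angle| up to about 1.74 rad (the CORDIC convergence range).
    """
    z = angle  # residual angle still to be rotated through
    for i in range(iterations):
        d = 1.0 if z >= 0.0 else -1.0   # direction of this micro-rotation
        shift = 2.0 ** -i               # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)       # table lookup in hardware
    # Every micro-rotation stretches the vector by sqrt(1 + 2^(-2i)).
    # The accumulated gain is a constant, so it is removed once, at the end.
    gain = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(iterations))
    return x / gain, y / gain


cx, cy = cordic_rotate(1.0, 0.0, math.pi / 6)
print(cx, cy)  # close to cos(pi/6) and sin(pi/6)
```

Because the gain depends only on the number of iterations, deferring the correction to the end avoids a multiplication per micro-rotation, which is the general motivation for applying scale factor correction once.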
Abstract:
An analysis of a modified series-L/parallel-tuned Class-E power amplifier is presented, which includes the effects that a shunt capacitance placed across the switching device will have on Class-E behaviour. In the original series L/parallel-tuned topology in which the output transistor capacitance is not inherently included in the circuit, zero-current switching (ZCS) and zero-current derivative switching (ZCDS) conditions should be applied to obtain optimum Class-E operation. On the other hand, when the output transistor capacitance is incorporated in the circuit, i.e. in the modified series-L/parallel-tuned topology, the ZCS and ZCDS would not give optimum operation and therefore zero-voltage-switching (ZVS) and zero-voltage-derivative switching (ZVDS) conditions should be applied instead. In the modified series-L/parallel-tuned Class-E configuration, the output-device inductance and the output-device output capacitance, both of which can significantly affect the amplifier's performance at microwave frequencies, furnish part, if not all, of the series inductance L and the shunt capacitance COUT, respectively. Further, when compared with the classic shunt-C/series-tuned topology, the proposed Class-E configuration offers some advantages in terms of 44% higher maximum operating frequency (fMAX) and 4% higher power-output capability (PMAX). As in the classic topology, the fMAX of the proposed amplifier circuit is reached when the output-device output capacitance furnishes all of the capacitance COUT, for a given combination of frequency, output power and DC supply voltage. It is also shown that numerical simulations agree well with theoretical predictions.
Abstract:
Computationally efficient sequential learning algorithms are developed for direct-link resource-allocating networks (DRANs). These are achieved by decomposing existing recursive training algorithms on a layer-by-layer and neuron-by-neuron basis. This allows network weights to be updated in an efficient parallel manner and facilitates the implementation of minimal update extensions that yield a significant reduction in computation load per iteration compared to existing sequential learning methods employed in resource-allocating network (RAN) and minimal RAN (MRAN) approaches. The new algorithms, which also incorporate a pruning strategy to control network growth, are evaluated on three different system identification benchmark problems and shown to outperform existing methods both in terms of training error convergence and computational efficiency. © 2005 Elsevier B.V. All rights reserved.
Abstract:
A novel 3rd-order compact E-plane ridge waveguide filter is presented. Miniaturization is achieved by introducing a configuration of parallel-coupled E-plane ridge waveguide resonators. Furthermore, the proposed filter allows for transmission zeros at finite frequencies. The fabrication simplicity and mass producibility of standard E-plane filters are maintained. Numerical and experimental results are presented to validate the proposed configuration. A miniaturization factor of 2 and a very sharp upper cutoff are achieved. © 2005 Wiley Periodicals, Inc.
Abstract:
An analysis of the operation of a series-L/parallel-tuned class-E amplifier and its equivalence to the classic shunt-C/series-tuned class-E amplifier are presented. The first reported closed-form design equations for the series-L/parallel-tuned topology operating under ideal switching conditions are given. Furthermore, a design procedure is introduced that allows the effect that nonzero switch resistance has on amplifier efficiency to be accounted for. The technique developed allows optimal circuit components to be found for a given device series resistance. For a relatively high switching device ON series resistance of 4 Ω, drain efficiencies of around 66% for the series-L/parallel-tuned topology and 73% for the shunt-C/series-tuned topology appear to be the theoretical limits. At lower switching device series resistance levels, the efficiency performance of the two types is similar, but the series-L/parallel-tuned topology offers some advantages in terms of its potential for MMIC realisation. Theoretical analysis is confirmed by numerical simulation for a 500 mW (27 dBm), 10% bandwidth, 5 V series-L/parallel-tuned and then shunt-C/series-tuned class-E power amplifier, operating at 2.5 GHz, and excellent agreement between theory and simulation results is achieved. The theoretical work presented in the paper should facilitate the design of high-efficiency switched amplifiers at frequencies commensurate with the needs of modern mobile wireless applications in the microwave frequency range, where intrinsically low-output-capacitance MMIC switching devices such as pHEMTs are to be used.
Abstract:
For the purpose of equalisation of rapidly time-variant multipath channels, we derive a novel adaptive algorithm, the amplitude banded LMS (ABLMS), which implements a nonlinear adaptation based on a coefficient matrix. Then we develop the ABLMS algorithm as the adaptation procedure for a linear transversal equaliser (LTE) and a decision feedback equaliser (DFE) where a parallel adaptation scheme is deployed. Computer simulations demonstrate that, with a small increase in computational complexity, the ABLMS-based parallel equalisers provide a significant improvement relative to the conventional LMS DFE and the LMS LTE in the case of a second-order Markov communication channel model.
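For orientation, the conventional LMS update that serves as the baseline in this comparison can be sketched as follows. This is the standard real-valued LMS step for a transversal filter, not the ABLMS algorithm, whose coefficient-matrix adaptation is not specified in the abstract; the function name `lms_step` and the numbers used are illustrative assumptions.

```python
from typing import List, Tuple


def lms_step(weights: List[float], taps: List[float],
             desired: float, mu: float) -> Tuple[List[float], float, float]:
    """One conventional LMS update for a linear transversal equaliser.

    weights: current tap coefficients
    taps:    received samples currently in the delay line
    desired: training symbol (or decision) for this instant
    mu:      step size
    Returns (updated weights, equaliser output, error before the update).
    """
    output = sum(w * x for w, x in zip(weights, taps))
    error = desired - output
    # Stochastic-gradient step: move each weight along error * input.
    new_weights = [w + mu * error * x for w, x in zip(weights, taps)]
    return new_weights, output, error


# Illustrative values only: two taps, zero initial weights.
w0 = [0.0, 0.0]
x = [1.0, 0.5]
w1, y0, e0 = lms_step(w0, x, desired=1.0, mu=0.1)
_, y1, e1 = lms_step(w1, x, desired=1.0, mu=0.1)
print(e0, e1)  # the error for the same input shrinks after one update
```

For a fixed input vector, each update scales the error by 1 - mu * ||x||^2, so the step size must keep that factor below one in magnitude for the adaptation to converge.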
Abstract:
For Variable Stiffness (VS) composites with steered curvilinear tow paths, the fiber orientation angle varies continuously throughout the laminate, and is not required to be straight, parallel and uniform within each ply as in conventional composite laminates. Hence, the thermal properties (conduction), as well as the structural stiffness and strength, vary as functions of location in the laminate, and the associated composite structure is often called a “variable stiffness” composite structure. The steered fibers lead not only to the alteration of mechanical load paths, but also to the alteration of thermal paths that may result in favorable temperature distributions within the laminate and improve the laminate performance. Evaluation of VS laminate performance under thermal loading is the focus of this chapter. Thermal performance evaluations require experimental and numerical analysis of VS laminates under different processing and loading conditions. One of the advantages of using composite materials in many applications is the tailoring capability of the laminate, not only during the design phase but also for manufacturing. Heat transfer through variable conduction and chemical reaction (degree of cure) occurring during manufacturing (curing) plays an important role in the final thermal and mechanical performance, and shape, of composite structures.
Abstract:
Local computation in join trees or acyclic hypertrees has been shown to be linked to a particular algebraic structure, called valuation algebra. There are many models of this algebraic structure, ranging from probability theory to numerical analysis, relational databases and various classical and non-classical logics. It turns out that many interesting models of valuation algebras may be derived from semiring-valued mappings. In this paper we study how valuation algebras are induced by semirings and how the structure of the valuation algebra is related to the algebraic structure of the semiring. In particular, c-semirings with idempotent multiplication induce idempotent valuation algebras and therefore permit particularly efficient architectures for local computation. Also important are semirings whose multiplicative semigroup is embedded in a union of groups. They induce valuation algebras with a partially defined division. For these valuation algebras, the well-known architectures for Bayesian networks apply. We also extend the general computational framework to allow derivation of bounds and approximations when exact computation is not feasible.
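A minimal sketch of how a semiring induces the two valuation-algebra operations may help: combination multiplies values pointwise on the joint domain, and marginalization sums out the eliminated variables, both using the semiring's own operations. The class and function names (`Semiring`, `Valuation`, `combine`, `marginalize`) and the two-variable example are hypothetical illustrations, not taken from the paper.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, Sequence, Tuple


@dataclass(frozen=True)
class Semiring:
    add: Callable[[float, float], float]
    mul: Callable[[float, float], float]


@dataclass
class Valuation:
    variables: Tuple[str, ...]
    table: Dict[Tuple[int, ...], float]  # one value per configuration


def combine(a: Valuation, b: Valuation, sr: Semiring,
            frames: Dict[str, Sequence[int]]) -> Valuation:
    """Combination: semiring product on the union of the two domains."""
    variables = tuple(dict.fromkeys(a.variables + b.variables))
    table = {}
    for config in product(*(frames[v] for v in variables)):
        assign = dict(zip(variables, config))
        key_a = tuple(assign[v] for v in a.variables)
        key_b = tuple(assign[v] for v in b.variables)
        table[config] = sr.mul(a.table[key_a], b.table[key_b])
    return Valuation(variables, table)


def marginalize(a: Valuation, keep: Sequence[str], sr: Semiring) -> Valuation:
    """Marginalization: semiring sum over the variables not in `keep`."""
    kept = tuple(v for v in a.variables if v in keep)
    idx = [a.variables.index(v) for v in kept]
    table: Dict[Tuple[int, ...], float] = {}
    for config, value in a.table.items():
        key = tuple(config[i] for i in idx)
        table[key] = sr.add(table[key], value) if key in table else value
    return Valuation(kept, table)


# Illustrative data: a prior on A and a conditional of B given A.
frames = {"A": (0, 1), "B": (0, 1)}
prior = Valuation(("A",), {(0,): 0.6, (1,): 0.4})
cond = Valuation(("A", "B"),
                 {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8})

sum_product = Semiring(add=lambda u, v: u + v, mul=lambda u, v: u * v)
max_product = Semiring(add=max, mul=lambda u, v: u * v)

marginal_b = marginalize(combine(prior, cond, sum_product, frames), ["B"], sum_product)
best_b = marginalize(combine(prior, cond, max_product, frames), ["B"], max_product)
print(marginal_b.table, best_b.table)
```

Swapping the arithmetic semiring for the max-product one changes the result from a marginal distribution to the value of the most probable configuration, without touching `combine` or `marginalize`; this reuse is what the generic framework is about.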
Abstract:
Recently, a number of most significant digit (msd) first bit-parallel multipliers for recursive filtering have been reported. However, the design approach used has, in general, been heuristic and, consequently, optimality has not always been assured. In this paper, msd-first multiply-accumulate algorithms are described and important relationships governing the dependencies between latency, number representations, etc. are derived. A more systematic approach to designing recursive filters is illustrated by applying the algorithms and associated relationships to the design of cascadable modules for high sample rate IIR filtering and wave digital filtering.