Digital Library

993 results for "parallel efficiency"

Efficiency Comparison of DFT/IDFT Algorithms by Evaluating Diverse Hardware Implementations

Relevance:

30.00%

Publisher:

Abstract:

In this paper we investigate various algorithms for performing the Fast Fourier Transform (FFT)/Inverse Fast Fourier Transform (IFFT), and suitable techniques for maximizing FFT/IFFT execution speed, such as pipelining or parallel processing, and the use of memory structures with pre-computed values (look-up tables, LUTs) or other dedicated hardware components (usually multipliers). Furthermore, we discuss the hardware architectures best suited to the various FFT/IFFT algorithms, along with their ability to exploit parallel processing given the minimal data dependences of the FFT/IFFT calculations. An interesting approach also considered in this paper is the application of the integrated processing-in-memory Intelligent RAM (IRAM) chip to high-speed FFT/IFFT computing. The results of the assessment study emphasize that the execution speed of FFT/IFFT algorithms is tightly connected to the ability of the FFT/IFFT hardware to support the parallelism offered by the given algorithm. Therefore, we suggest that the basic Discrete Fourier Transform (DFT)/Inverse Discrete Fourier Transform (IDFT) can also achieve high performance by utilizing a specialized FFT/IFFT hardware architecture that can exploit the parallelism of the DFT/IDFT operations. The proposed improvements include simplified multiplications of symbols given in the polar coordinate system, using sine and cosine look-up tables, and an approach for performing parallel addition of N input symbols.
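A minimal sketch, assuming a direct O(N²) DFT written in plain Python, of two ideas named in the abstract: sine and cosine look-up tables indexed by (k·m) mod N, so that a table of only N entries serves every twiddle factor, and pairwise (tree) addition of the N products, where all additions within one level are independent and could be performed in parallel by hardware. The function names are hypothetical, this is not the authors' hardware design, and the polar-coordinate multiplication is not modelled.

```python
import math


def build_twiddle_lut(n):
    """Pre-compute cos(2*pi*m/n) and sin(2*pi*m/n) for m = 0..n-1."""
    cos_lut = [math.cos(2 * math.pi * m / n) for m in range(n)]
    sin_lut = [math.sin(2 * math.pi * m / n) for m in range(n)]
    return cos_lut, sin_lut


def tree_sum(values):
    """Pairwise addition: about log2(N) levels; additions within a level are independent."""
    vals = list(values)
    while len(vals) > 1:
        if len(vals) % 2:
            vals.append(0j)  # pad odd levels so every value has a partner
        vals = [vals[i] + vals[i + 1] for i in range(0, len(vals), 2)]
    return vals[0] if vals else 0j


def dft_lut(x, inverse=False):
    """Direct DFT (or IDFT) whose twiddle factors come only from the look-up tables."""
    n = len(x)
    cos_lut, sin_lut = build_twiddle_lut(n)
    sign = 1 if inverse else -1  # e^{-i...} for the forward transform
    out = []
    for k in range(n):
        # (k*m) mod n exploits the periodicity of the twiddle factors
        terms = [x[m] * complex(cos_lut[(k * m) % n], sign * sin_lut[(k * m) % n])
                 for m in range(n)]
        s = tree_sum(terms)
        out.append(s / n if inverse else s)
    return out


X = dft_lut([1, 2, 3, 4])      # X[0] = 10, X[1] = -2+2j, X[2] = -2
x_back = dft_lut(X, inverse=True)
```

Each output bin depends only on the input and the tables, so in a hardware realization the N bins, as well as the additions at each tree level, could in principle be computed concurrently.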

Efficiency Comparison of DFT/IDFT Algorithms by Evaluating Diverse Hardware Implementations, Parallelization Prospects and Possible Improvements

Relevance:

30.00%

Publisher:

Abstract:

In this paper we investigate various algorithms for performing the Fast Fourier Transform (FFT)/Inverse Fast Fourier Transform (IFFT), and suitable techniques for maximizing FFT/IFFT execution speed, such as pipelining or parallel processing, and the use of memory structures with pre-computed values (look-up tables, LUTs) or other dedicated hardware components (usually multipliers). Furthermore, we discuss the hardware architectures best suited to the various FFT/IFFT algorithms, along with their ability to exploit parallel processing given the minimal data dependences of the FFT/IFFT calculations. An interesting approach also considered in this paper is the application of the integrated processing-in-memory Intelligent RAM (IRAM) chip to high-speed FFT/IFFT computing. The results of the assessment study emphasize that the execution speed of FFT/IFFT algorithms is tightly connected to the ability of the FFT/IFFT hardware to support the parallelism offered by the given algorithm. Therefore, we suggest that the basic Discrete Fourier Transform (DFT)/Inverse Discrete Fourier Transform (IDFT) can also achieve high performance by utilizing a specialized FFT/IFFT hardware architecture that can exploit the parallelism of the DFT/IDFT operations. The proposed improvements include simplified multiplications of symbols given in the polar coordinate system, using sine and cosine look-up tables, and an approach for performing parallel addition of N input symbols.

Organisational status and efficiency: the case of the Spanish SOE "Paradores"

Relevance:

30.00%

Publisher:

Abstract:

The purpose of this paper is to provide new evidence on the effect on public enterprises' economic performance of introducing certain changes in organisational status and management practices while keeping the enterprises under public control. Our approach is a case study and relies on the comparative efficiency literature. We identify relevant changes in the organisational status of a large State-owned hotel group over a period of twenty years, then measure its annual efficiency indicators, and finally evaluate to what extent the observed changes in economic performance can be attributed to the corresponding management reforms. We find that the formally most relevant change in organisational status (the enterprise becoming a Limited Company), which implied a substantial increase in the enterprise's autonomy, did not produce a significant improvement in its economic performance, a finding contrary to what we expected from agency theory. However, a second relevant organisational change five years later, when both the principal (government) and the agent (the firm's CEO) changed, is consistently related to a significant improvement in economic performance. As a research implication, we advocate the use of more precise agency theory statements; as a practical implication, we argue that realising the potential improvement brought about by a formal-legal change in the enterprise's status may also require changes in the firm's key personnel positions, supervisor (principal) and CEO (agent), in order to actually improve the firm's efficiency. In other words, a move towards greater autonomy for the enterprise apparently needs to be accompanied by a parallel new 'management culture'. Practical implications: good management practices to apply to the restructuring of other public enterprises in order to improve their efficiency. This is the first study on organisational changes and efficiency for an important Spanish public enterprise.

On the computation of reducible invariant tori on a parallel computer

Relevance:

30.00%

Publisher:

Abstract:

We present an algorithm for the computation of reducible invariant tori of discrete dynamical systems that is suitable for tori of dimensions larger than 1. It is based on a quadratically convergent scheme that approximates, at the same time, the Fourier series of the torus, its Floquet transformation, and its Floquet matrix. The Floquet matrix describes the linearization of the dynamics around the torus and, hence, its linear stability. The algorithm presents a high degree of parallelism, and the computational effort grows linearly with the number of Fourier modes needed to represent the solution. For these reasons it is a very good option to compute quasi-periodic solutions with several basic frequencies. The paper includes some examples (flows) to show the efficiency of the method in a parallel computer. In these flows we compute invariant tori of dimensions up to 5, by taking suitable sections.

Energy Efficiency of a Diesel-Electric Mobile Working Machine

Relevance:

30.00%

Publisher:

Abstract:

The power demand of many mobile working machines, such as mine loaders, straddle carriers and harvesters, varies significantly during operation, and typically the average power demand of a working machine is considerably lower than its maximum power demand. Consequently, for most of the time the diesel engine of a working machine operates at poor efficiency, far from its optimum efficiency range. However, the energy efficiency of diesel-driven working machines can be improved by electric hybridization. In this way, the diesel engine can be dimensioned to operate within its optimum efficiency range, while the electric drive with its energy storage responds to changes in machine loading. A hybrid working machine can be implemented in many ways: as a parallel hybrid, a series hybrid or a combination of the two. The energy efficiency of hybrid working machines can be further enhanced by energy recovery and reuse. This doctoral thesis introduces the component models required in the simulation model of a working machine. Component efficiency maps are applied in the modelling; the efficiency maps for electrical machines are determined analytically over the whole torque–rotational speed plane based on the electrical machine parameters. Furthermore, the thesis provides simulation models for parallel, series and parallel-series hybrid working machines. With these simulation models, the energy consumption of the working machine can be analysed. In addition, the hybridization process is introduced and described. The thesis provides a case example of the hybridization and dimensioning process of a working machine, starting from the work cycle of the machine. The selection and dimensioning of the hybrid system have a significant impact on the energy consumption of a hybrid working machine.
The thesis compares the energy consumption of a working machine implemented with three different hybrid systems (parallel, series and parallel-series) and with different component dimensions. The payback time of a hybrid working machine and the energy storage lifetime are also estimated in the study.
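One common way to express a payback time is the simple, undiscounted payback period: the extra investment for hybridization divided by the yearly saving in fuel cost. The sketch below assumes that definition and uses hypothetical figures; the thesis may use a different (for example discounted) method, and energy storage replacement within the machine's lifetime is ignored here.

```python
def simple_payback_years(extra_investment, fuel_saved_l_per_h, fuel_price_per_l, hours_per_year):
    """Undiscounted payback time: extra cost of hybridization / yearly fuel-cost saving.

    Assumption: savings come only from reduced fuel consumption; maintenance and
    energy storage replacement costs are not included.
    """
    annual_saving = fuel_saved_l_per_h * fuel_price_per_l * hours_per_year
    if annual_saving <= 0:
        raise ValueError("hybridization must save fuel for a payback time to exist")
    return extra_investment / annual_saving


# Hypothetical figures: 30 000 extra investment, 4 l/h saved, 1.5 per litre, 2000 h/year.
years = simple_payback_years(30_000, 4.0, 1.5, 2000)   # 30000 / 12000 = 2.5 years
```

Because the saving scales with operating hours, the same hybrid system pays back faster on a machine with a long yearly duty, which is one reason the work cycle is the starting point for dimensioning.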

Energy-efficient control strategies for variable speed driven parallel pumping systems based on pump operation point monitoring with frequency converters

Relevance:

30.00%

Publisher:

Abstract:

Pumping processes requiring a wide range of flow are often equipped with parallel-connected centrifugal pumps. In parallel pumping systems, variable speed control allows the required process output to be delivered with a varying number of operated pump units and selected rotational speed references. However, optimizing parallel-connected, speed-controlled pump units often requires adaptive modelling of both the parallel pump characteristics and the surrounding system under varying operating conditions. The information available for system modelling in typical parallel pumping applications, such as wastewater treatment and various cooling and water delivery tasks, can be limited, and the lack of real-time operating point monitoring often limits accurate energy efficiency optimization. Hence, easily implementable control strategies that can be adopted with minimal system data are needed. This doctoral thesis concentrates on methods that allow the energy-efficient use of variable-speed-controlled parallel pumps in systems where each parallel pump unit consists of a centrifugal pump, an electric motor and a frequency converter. Firstly, suitable operating conditions for variable-speed-controlled parallel pumps are studied. Secondly, methods for determining the output of each parallel pump unit using characteristic curve-based operating point estimation with a frequency converter are discussed. Thirdly, the implementation of a control strategy based on real-time pump operating point estimation and sub-optimization of each parallel pump unit is studied. The findings of the thesis support the idea that the energy efficiency of pumping can be increased simply by adopting suitable control strategies, without installing new, more efficient components in the system.
An easily implementable and adaptive control strategy for variable-speed-controlled parallel pumping systems can be created by utilizing the pump operating point estimation available in modern frequency converters. Hence, additional real-time flow metering, start-up measurements and a detailed system model are unnecessary, and the pumping task can be fulfilled by determining, for each parallel pump unit, a speed reference that results in energy-efficient operation of the pumping system.
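Characteristic curve-based operating point estimation can be sketched with the pump affinity laws (flow proportional to speed, shaft power proportional to speed cubed): the shaft power and speed reported by the frequency converter are referred back to the nominal speed, the flow is read from the nominal power-flow (QP) curve, and that flow is scaled to the actual speed. The curve values, function names and units below are hypothetical; the method only works where the power curve rises monotonically with flow, and accuracy degrades where it is flat.

```python
from bisect import bisect_left


def interp(x, xs, ys):
    """Piecewise-linear interpolation on increasing xs, clamped at the ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def estimate_flow(power_kw, speed_rpm, nominal_speed_rpm, curve_power_kw, curve_flow_m3h):
    """QP-curve estimate of pump flow from converter-reported shaft power and speed."""
    ratio = nominal_speed_rpm / speed_rpm
    power_at_nominal = power_kw * ratio ** 3        # affinity law: P scales with n**3
    flow_at_nominal = interp(power_at_nominal, curve_power_kw, curve_flow_m3h)
    return flow_at_nominal / ratio                  # affinity law: Q scales with n


# Hypothetical catalogue QP curve at 1450 rpm (power must increase with flow).
CURVE_P = [5.0, 7.0, 9.0, 11.0]
CURVE_Q = [0.0, 50.0, 100.0, 150.0]
q = estimate_flow(4.608, 1160.0, 1450.0, CURVE_P, CURVE_Q)   # about 80 m3/h
```

With one such estimate per unit, a controller can compare each pump's estimated operating point with its best-efficiency region when choosing speed references, without a separate flow meter.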

Virtual Runtime Application Partitions for Resource Management in Massively Parallel Architectures

Relevance:

30.00%

Publisher:

Abstract:

This thesis presents a novel design paradigm, called Virtual Runtime Application Partitions (VRAP), for judiciously utilizing on-chip resources. As the dark silicon era approaches, in which power considerations will allow only a fraction of the chip to be powered on, judicious resource management will become a key consideration in future designs. Most work on resource management treats only physical components (i.e. computation, communication and memory blocks) as resources and manipulates the component-to-application mapping to optimize various parameters (e.g. energy efficiency). To further enhance the optimization potential, in addition to the physical resources we propose to manipulate abstract resources (i.e. the voltage/frequency operating point, the fault-tolerance strength, the degree of parallelism and the configuration architecture). The proposed framework (i.e. VRAP) encapsulates methods, algorithms and hardware blocks to provide each application with the abstract resources tailored to its needs. To test the efficacy of this concept, we have developed three distinct self-adaptive environments: (i) Private Operating Environment (POE), (ii) Private Reliability Environment (PRE) and (iii) Private Configuration Environment (PCE), which collectively ensure that each application meets its deadlines using minimal platform resources. In this work, several novel architectural enhancements, algorithms and policies are presented to realize virtual runtime application partitions efficiently. Considering future design trends, we have chosen Coarse-Grained Reconfigurable Architectures (CGRAs) and Networks-on-Chip (NoCs) to test the feasibility of our approach. Specifically, we have chosen the Dynamically Reconfigurable Resource Array (DRRA) and McNoC as the representative CGRA and NoC platforms. The proposed techniques are compared and evaluated using a variety of quantitative experiments.
Synthesis and simulation results demonstrate that VRAP significantly enhances energy and power efficiency compared to the state of the art.
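To illustrate what treating the voltage/frequency operating point as an abstract resource can mean, the sketch below picks, for one application, the lowest-energy operating point that still meets its deadline. The operating-point table, the energy model (dynamic energy only, proportional to V² per cycle) and all names are hypothetical assumptions; this is a generic DVFS selection, not the POE policy from the thesis.

```python
# Hypothetical voltage/frequency operating points: (frequency in MHz, supply voltage in V).
OPERATING_POINTS = [(200, 0.8), (400, 0.9), (600, 1.0), (800, 1.1)]


def pick_operating_point(cycles, deadline_s, points=OPERATING_POINTS, c_eff=1e-9):
    """Return (energy_J, f_MHz, V) of the lowest-energy point meeting the deadline, or None.

    Assumed energy model: dynamic energy only, E = c_eff * V**2 * cycles
    (static/leakage power is ignored).
    """
    best = None
    for f_mhz, volts in points:
        exec_time_s = cycles / (f_mhz * 1e6)
        if exec_time_s > deadline_s:
            continue                     # too slow: the application would miss its deadline
        energy_j = c_eff * volts ** 2 * cycles
        if best is None or energy_j < best[0]:
            best = (energy_j, f_mhz, volts)
    return best


choice = pick_operating_point(cycles=3e8, deadline_s=1.0)   # 400 MHz at 0.9 V
```

Under this model the slowest feasible point always wins, because voltage grows with frequency; once leakage is included, running faster and then powering down can become cheaper, which is one reason such decisions benefit from runtime adaptation.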

Improving Short DNA Sequence Alignment with Parallel Computing

Relevance:

30.00%

Publisher:

Abstract:

Variations in different types of genomes have been found to be responsible for a large degree of physical diversity, such as appearance and susceptibility to disease. Identifying genomic variations is difficult and can be facilitated by computational analysis of DNA sequences. Newly available technologies can sequence billions of DNA base pairs relatively quickly. These sequences can be used to identify variations within their specific genome, but they must first be mapped to a reference sequence. Aligning these sequences to a reference requires mapping algorithms that use approximate string matching and string indexing methods. To date, few mapping algorithms have been tailored to handle the massive amounts of output generated by newly available sequencing technologies. To handle this large amount of data, we modified the popular mapping software BWA to run in parallel using OpenMPI. Parallel BWA matches the efficiency of multithreaded BWA functions while providing efficient parallelism for BWA functions that do not currently support multithreading. Parallel BWA shows a significant wall-time speedup compared to multithreaded BWA on high-performance computing clusters, and will thus facilitate the analysis of genome sequencing data.
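BWA's string index is based on the Burrows-Wheeler transform (BWT). The sketch below builds a small BWT-based index (suffix array, C table and occurrence table) and runs the classic backward search for exact matches. It is a teaching sketch with assumed names: construction is naive (sorting suffixes), only exact matches are found, whereas BWA also tolerates mismatches and gaps, and nothing here reflects the paper's MPI parallelization.

```python
def build_fm_index(text):
    """Suffix array, C table and occurrence table for text + '$' (naive construction)."""
    text += "$"                                   # sentinel, sorts before A/C/G/T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)        # i == 0 wraps around to the sentinel
    alphabet = sorted(set(text))
    C, total = {}, 0
    for c in alphabet:                            # C[c] = number of symbols smaller than c
        C[c] = total
        total += text.count(c)
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):                  # occ[c][i] = count of c in bwt[:i]
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, C, occ


def backward_search(pattern, sa, C, occ):
    """Exact-match positions of pattern, found by narrowing a suffix-array interval."""
    lo, hi = 0, len(sa)
    for ch in reversed(pattern):                  # extend the match one symbol to the left
        if ch not in C:
            return []
        lo, hi = C[ch] + occ[ch][lo], C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


sa, C, occ = build_fm_index("ACGTACGTTACG")
print(backward_search("ACG", sa, C, occ))   # [0, 4, 9]
```

The index is built once per reference and every read is then searched independently, which is what makes distributing reads across processes a natural unit of parallel work.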

Nutrient cycling and nutrient use efficiency in urban and peri-urban agriculture of Kabul, Afghanistan

Relevance:

30.00%

Publisher:

Abstract:

As elsewhere, urban and peri-urban agriculture (UPA) in Kabul, Afghanistan, has often been accused of being resource-inefficient and unsustainable, causing negative externalities for community health and the surroundings. These arise from the inappropriate management and use of agricultural inputs, often including pesticides and inter-city wastes containing heavy metal residues and pathogens. To address these concerns, parallel studies were conducted for two years in UPA of Kabul, from April 2008 to October 2009, with the aims of quantifying horizontal and vertical fluxes of carbon (C), nitrogen (N), phosphorus (P) and potassium (K), assessing heavy metal and pathogen contamination of UPA produce, and analysing the economics of cereal, vegetable and grape production systems. The results from these three diverse UPA production systems can be summarized as follows. Biennial net balances in vegetable production systems were positive for N (80 kg ha⁻¹), P (75 kg ha⁻¹) and C (3,927 kg ha⁻¹) and negative for K (-205 kg ha⁻¹); in cereal production systems, biennial horizontal balances were positive for P (20 kg ha⁻¹) and C (4,900 kg ha⁻¹) and negative for N (-155 kg ha⁻¹) and K (-355 kg ha⁻¹); and in vineyards the corresponding values were highly positive for N (295 kg ha⁻¹), P (235 kg ha⁻¹) and C (3,362 kg ha⁻¹) and slightly positive for K (5 kg ha⁻¹). Apart from gaseous N and C emissions, yearly leaching losses in selected vegetable gardens ranged from 70 to 205 kg N ha⁻¹ and from 5 to 10 kg P ha⁻¹. Manure and irrigation water contributed on average 12-79% of total N, P, K and C inputs, and 10-53% of total C inputs, in the gardens and fields.
The elevated heavy metal and pathogen loads on fresh UPA vegetables reflected contamination from increasing traffic in the city, deposits from past decades of war, and the lack of collection and treatment of raw inter-city wastes; these call for solutions that protect consumer and producer health and increase the reliability of UPA production. A cost-revenue analysis of all inputs and outputs of the cereal, vegetable and grape production systems over two years showed substantial differences in net UPA household income. More detailed studies are needed to confirm these results, but tailoring the application of inputs to crop needs and managing it optimally will significantly improve farmers' revenues as well as environmental and produce quality.

Scalability of efficient parallel K-Means

Relevance:

30.00%

Publisher:

Abstract:

Clustering is defined as the grouping of similar items in a set, and is an important process within the field of data mining. As the amount of data for various applications continues to increase, in terms of both size and dimensionality, efficient clustering methods are necessary. A popular clustering algorithm is K-Means, which adopts a greedy approach to produce a set of K clusters with associated centres of mass, and uses a squared-error distortion measure to determine convergence. Methods for improving the efficiency of K-Means have been explored largely in two main directions. The amount of computation can be significantly reduced by adopting a more efficient data structure, notably a multi-dimensional binary search tree (KD-Tree) to store either centroids or data points. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient K-Means techniques in parallel computational environments. In this work, we provide a parallel formulation of the KD-Tree based K-Means algorithm and address its load balancing issues.
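The second direction described above, distributing data and computation over processing nodes, can be sketched as a data-parallel Lloyd's K-Means: each node assigns its own points and returns per-cluster sums and counts, and a reduction combines them into new centroids. This is a minimal sketch with hypothetical names; the per-partition calls run sequentially here to stand in for separate nodes, and the KD-Tree acceleration from the paper is not included.

```python
def partial_stats(points, centroids):
    """Work done by one node on its own partition: assign points to the nearest
    centroid and return per-cluster coordinate sums, counts and squared error."""
    k, d = len(centroids), len(centroids[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    sse = 0.0
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        j = min(range(k), key=dists.__getitem__)
        counts[j] += 1
        sse += dists[j]
        for t in range(d):
            sums[j][t] += p[t]
    return sums, counts, sse


def parallel_kmeans(partitions, centroids, max_iter=100, tol=1e-9):
    """Lloyd's K-Means over data split into partitions (one per hypothetical node)."""
    k, d = len(centroids), len(centroids[0])
    prev_sse = float("inf")
    sse = prev_sse
    for _ in range(max_iter):
        # Map step: in a distributed run each call would execute on its own node.
        results = [partial_stats(part, centroids) for part in partitions]
        # Reduce step: corresponds to an all-reduce of sums, counts and error.
        sums = [[sum(r[0][j][t] for r in results) for t in range(d)] for j in range(k)]
        counts = [sum(r[1][j] for r in results) for j in range(k)]
        sse = sum(r[2] for r in results)    # error w.r.t. this iteration's input centroids
        centroids = [[s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
                     for j in range(k)]
        if prev_sse - sse <= tol:           # stop when the distortion no longer decreases
            break
        prev_sse = sse
    return centroids, sse


parts = [[(0, 0), (0, 1)], [(10, 10), (10, 11)], [(1, 0), (11, 10)]]
centroids, sse = parallel_kmeans(parts, [(0, 0), (10, 10)])
```

Only k·(d+1) numbers per node cross the network in each iteration, independent of the partition size; the load imbalance discussed in the abstract arises once KD-Tree pruning makes the per-point cost irregular.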

Dynamic load balancing in parallel KD-tree k-means

Relevance:

30.00%

Publisher:

Abstract:

One of the most influential and popular data mining methods is the k-Means algorithm for cluster analysis. Techniques for improving the efficiency of k-Means have been explored largely in two main directions. The amount of computation can be significantly reduced by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree). These techniques reduce the number of distance computations the algorithm performs at each iteration. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing issue. Three solutions have been developed and tested: two are based on a static partitioning of the data set, and the third incorporates a dynamic load balancing policy.
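Why irregular per-task cost hurts static partitioning can be seen in a toy comparison: a static scheme gives each worker a contiguous block of tasks, while a simple dynamic (self-scheduling) scheme hands the next task to whichever worker becomes free first. The task costs below are hypothetical (think of them as distance computations per KD-Tree subtree), and neither function reproduces the three specific policies evaluated in the paper.

```python
import heapq


def static_block_makespan(costs, workers):
    """Each worker gets one contiguous block of tasks; return the slowest worker's load."""
    n = len(costs)
    loads = []
    for w in range(workers):
        start, end = w * n // workers, (w + 1) * n // workers
        loads.append(sum(costs[start:end]))
    return max(loads)


def dynamic_greedy_makespan(costs, workers):
    """Tasks are handed out in order to the worker that becomes free first."""
    finish_times = [0.0] * workers          # a list of equal values is a valid heap
    for c in costs:
        t = heapq.heappop(finish_times)     # earliest-available worker
        heapq.heappush(finish_times, t + c)
    return max(finish_times)


# Hypothetical irregular costs: two expensive subtrees followed by six cheap ones.
costs = [8, 8, 1, 1, 1, 1, 1, 1]
static = static_block_makespan(costs, 2)     # 18: one worker gets both expensive tasks
dynamic = dynamic_greedy_makespan(costs, 2)  # 11: equal to the ideal 22 / 2
```

Dynamic assignment needs extra coordination (a master or work-stealing protocol) in a distributed memory system, which is the cost it trades against better balance.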

Parallel Hybrid Monte Carlo Algorithms for Matrix Computations

Relevance:

30.00%

Publisher:

Abstract:

In this paper we consider hybrid (fast stochastic approximation and deterministic refinement) algorithms for Matrix Inversion (MI) and Solving Systems of Linear Algebraic Equations (SLAE). Monte Carlo methods are used for the stochastic approximation, since they are known to be very efficient at finding a quick rough approximation of an element or a row of the inverse matrix, or of a component of the solution vector. We show how the stochastic approximation of the MI can be combined with a deterministic refinement procedure to obtain the MI with the required precision, and then solve the SLAE using the MI. We employ a splitting A = D − C of a given non-singular matrix A, where D is a diagonally dominant matrix and C is a diagonal matrix. In our algorithm for solving SLAE and MI, different choices of D can be considered in order to control the norm of the matrix T = D⁻¹C of the resulting SLAE and to minimize the number of Markov chains required to reach a given precision. Further, we run the algorithms on a mini-Grid and investigate their efficiency depending on the granularity. Corresponding experimental results are presented.
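A minimal sketch of the hybrid idea, assuming the simplest case A = I − T with ‖T‖ < 1 rather than the paper's D/C splitting: Markov chains estimate the Neumann series (I − T)⁻¹ = I + T + T² + …, row by row, and a deterministic refinement then polishes that rough inverse. Newton-Schulz iteration is used here as the refinement step; it is a stand-in chosen for brevity, not necessarily the procedure used by the authors. All names and the test matrix are hypothetical.

```python
import random


def mat_mul(X, Y):
    """Product of two dense matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def mc_inverse(T, chains=1000, walk_len=30, seed=1):
    """Monte Carlo estimate of (I - T)^-1 = sum_k T^k via random walks.
    Assumes ||T|| < 1 and that no row of T is all zeros."""
    rng = random.Random(seed)
    n = len(T)
    states = list(range(n))
    probs = []
    for row in T:                            # transition probabilities proportional to |T_jk|
        s = sum(abs(x) for x in row)
        probs.append([abs(x) / s for x in row])
    X = [[0.0] * n for _ in range(n)]
    for i in range(n):                       # rows are independent: a natural unit of parallel work
        for _ in range(chains):
            state, weight = i, 1.0
            X[i][state] += weight            # k = 0 term of the Neumann series
            for _ in range(walk_len):
                nxt = rng.choices(states, weights=probs[state])[0]
                weight *= T[state][nxt] / probs[state][nxt]
                state = nxt
                X[i][state] += weight        # unbiased contribution to (T^k)[i][state]
        X[i] = [v / chains for v in X[i]]
    return X


def refine_inverse(A, X, iterations=6):
    """Newton-Schulz refinement X <- X (2I - A X); converges quadratically if ||I - A X0|| < 1."""
    n = len(A)
    for _ in range(iterations):
        AX = mat_mul(A, X)
        X = mat_mul(X, [[(2.0 if i == j else 0.0) - AX[i][j] for j in range(n)]
                        for i in range(n)])
    return X


T = [[0.1, 0.2, 0.1], [0.2, 0.1, 0.1], [0.1, 0.1, 0.2]]
A = [[(1.0 if i == j else 0.0) - T[i][j] for j in range(3)] for i in range(3)]
X = refine_inverse(A, mc_inverse(T))
b = [1.0, 2.0, 3.0]
x = [sum(X[i][j] * b[j] for j in range(3)) for i in range(3)]   # SLAE solution via MI
```

Since I − AX_{k+1} = (I − AX_k)², a Monte Carlo estimate that is only accurate to a few percent reaches machine precision after a handful of refinement steps, which is the motivation for spending few Markov chains on the stochastic phase.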

A sparse parallel hybrid Monte Carlo algorithm for matrix computations

Relevance:

30.00%

Publisher:

Abstract:

In this paper we introduce a new algorithm, based on the successful work of Fathi and Alexandrov on hybrid Monte Carlo algorithms for matrix inversion and solving systems of linear algebraic equations. This algorithm consists of two parts: approximate inversion by Monte Carlo and iterative refinement using a deterministic method. Here we present a parallel hybrid Monte Carlo algorithm that uses Monte Carlo to generate an approximate inverse and improves the accuracy of that inverse with an iterative refinement. The new algorithm is applied efficiently to sparse non-singular matrices. When solving a system of linear algebraic equations Bx = b, the inverse matrix is used to compute the solution vector x = B⁻¹b. We present results that show the efficiency of the parallel hybrid Monte Carlo algorithm in the case of sparse matrices.

Computational complexity of weighted splitting schemes on parallel computers

Relevance:

30.00%

Publisher:

Abstract:

In models of complicated physical-chemical processes, operator splitting is very often applied in order to achieve sufficient accuracy as well as efficiency of the numerical solution. The recently rediscovered weighted splitting schemes have the great advantage of being parallelizable at the operator level, which allows the computational time to be reduced when parallel computers are used. In this paper, the computational times needed for the weighted splitting methods are compared with those of sequential (S) splitting and Marchuk-Strang (MSt) splitting, and are illustrated by numerical experiments performed with simplified versions of the Danish Eulerian Model (DEM).
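The three splittings can be compared on a small linear stand-in problem u' = (A + B)u, where each sub-step is an exact matrix exponential. The symmetrically weighted scheme averages the results of the orderings "A then B" and "B then A"; those two branches are independent, which is what makes this kind of splitting parallelizable at the operator level. The matrices, step count and helper names are hypothetical and unrelated to the actual DEM operators; the sketch only contrasts the first-order sequential splitting with the second-order Marchuk-Strang and weighted schemes.

```python
def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def mat_vec(X, v):
    return [sum(X[i][k] * v[k] for k in range(len(v))) for i in range(len(X))]


def scaled(M, s):
    return [[s * x for x in row] for row in M]


def expm(M, terms=40):
    """Matrix exponential by truncated Taylor series (adequate for small, well-scaled M)."""
    n = len(M)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, M)]
        result = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(result, term)]
    return result


def integrate(A, B, u0, t_end, steps, scheme):
    tau = t_end / steps
    eA, eB, eA_half = expm(scaled(A, tau)), expm(scaled(B, tau)), expm(scaled(A, tau / 2))
    u = list(u0)
    for _ in range(steps):
        if scheme == "sequential":               # A sub-step, then B sub-step
            u = mat_vec(eB, mat_vec(eA, u))
        elif scheme == "strang":                 # half A, full B, half A
            u = mat_vec(eA_half, mat_vec(eB, mat_vec(eA_half, u)))
        elif scheme == "weighted":               # average of both orderings
            u_ab = mat_vec(eB, mat_vec(eA, u))   # these two branches are independent
            u_ba = mat_vec(eA, mat_vec(eB, u))   # and could run on separate processors
            u = [0.5 * (a + b) for a, b in zip(u_ab, u_ba)]
        else:
            raise ValueError(scheme)
    return u


A = [[-1.0, 0.5], [0.0, -2.0]]
B = [[-0.5, 0.0], [1.0, -1.0]]
u0 = [1.0, 1.0]
exact = mat_vec(expm([[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]), u0)
errors = {s: max(abs(p - q) for p, q in zip(integrate(A, B, u0, 1.0, 10, s), exact))
          for s in ("sequential", "strang", "weighted")}
```

On one processor the weighted scheme costs roughly twice a sequential step; the saving only appears when the two orderings are evaluated concurrently, which is the trade-off the computational-time comparison examines.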

Parallel implementation and one year experiments with the Danish Eulerian Model

Relevance:

30.00%

Publisher:

Abstract:

Large-scale air pollution models are powerful tools designed to meet the increasing demand in various environmental studies. The atmosphere is the most dynamic component of the environment, in which pollutants can be transported quickly over long distances. Therefore, air pollution modeling must be done over a large computational domain. Moreover, all relevant physical, chemical and photochemical processes must be taken into account. In such complex models, operator splitting is very often applied in order to achieve sufficient accuracy as well as efficiency of the numerical solution. The Danish Eulerian Model (DEM) is one of the most advanced such models. Its space domain (4800 × 4800 km) covers Europe, most of the Mediterranean and neighboring parts of Asia and the Atlantic Ocean. Efficient parallelization is crucial for the performance and practical capabilities of this huge computational model. Different splitting schemes, based on the main processes mentioned above, have been implemented in the new version of DEM and tested with respect to accuracy and performance. Some numerical results of these experiments are presented in this paper.
