269 resultados para Distributed Algorithm
Resumo:
We have optimised the atmospheric radiation algorithm of the FAMOUS climate model on several hardware platforms. The optimisation involved translating the Fortran code to C and restructuring the algorithm around the computation of a single air column. Instead of the existing MPI-based domain decomposition, we used a task queue and a thread pool to schedule the computation of individual columns on the available processors. Finally, four air columns are packed together in a single data structure and computed simultaneously using Single Instruction Multiple Data operations. The modified algorithm runs more than 50 times faster on the CELL’s Synergistic Processing Elements than on its main PowerPC processing element. On Intel-compatible processors, the new radiation code runs 4 times faster. On the tested graphics processor, using OpenCL, we find a speed-up of more than 2.5 times as compared to the original code on the main CPU. Because the radiation code takes more than 60% of the total CPU time, FAMOUS executes more than twice as fast. Our version of the algorithm returns bit-wise identical results, which demonstrates the robustness of our approach. We estimate that this project required around two and a half man-years of work.
Resumo:
A new model has been developed for assessing multiple sources of nitrogen in catchments. The model (INCA) is process based and uses reaction kinetic equations to simulate the principal mechanisms operating. The model allows for plant uptake, surface and sub-surface pathways and can simulate up to six land uses simultaneously. The model can be applied to catchment as a semi-distributed simulation and has an inbuilt multi-reach structure for river systems. Sources of nitrogen can be from atmospheric deposition, from the terrestrial environment (e.g. agriculture, leakage from forest systems etc.), from urban areas or from direct discharges via sewage or intensive farm units. The model is a daily simulation model and can provide information in the form of time series at key sites, or as profiles down river systems or as statistical distributions. The process model is described and in a companion paper the model is applied to the River Tywi catchment in South Wales and the Great Ouse in Bedfordshire.
Resumo:
Exascale systems are the next frontier in high-performance computing and are expected to deliver a performance of the order of 10^18 operations per second using massive multicore processors. Very large- and extreme-scale parallel systems pose critical algorithmic challenges, especially related to concurrency, locality and the need to avoid global communication patterns. This work investigates a novel protocol for dynamic group communication that can be used to remove the global communication requirement and to reduce the communication cost in parallel formulations of iterative data mining algorithms. The protocol is used to provide a communication-efficient parallel formulation of the k-means algorithm for cluster analysis. The approach is based on a collective communication operation for dynamic groups of processes and exploits non-uniform data distributions. Non-uniform data distributions can be either found in real-world distributed applications or induced by means of multidimensional binary search trees. The analysis of the proposed dynamic group communication protocol has shown that it does not introduce significant communication overhead. The parallel clustering algorithm has also been extended to accommodate an approximation error, which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.
Resumo:
Climate data are used in a number of applications including climate risk management and adaptation to climate change. However, the availability of climate data, particularly throughout rural Africa, is very limited. Available weather stations are unevenly distributed and mainly located along main roads in cities and towns. This imposes severe limitations to the availability of climate information and services for the rural community where, arguably, these services are needed most. Weather station data also suffer from gaps in the time series. Satellite proxies, particularly satellite rainfall estimate, have been used as alternatives because of their availability even over remote parts of the world. However, satellite rainfall estimates also suffer from a number of critical shortcomings that include heterogeneous time series, short time period of observation, and poor accuracy particularly at higher temporal and spatial resolutions. An attempt is made here to alleviate these problems by combining station measurements with the complete spatial coverage of satellite rainfall estimates. Rain gauge observations are merged with a locally calibrated version of the TAMSAT satellite rainfall estimates to produce over 30-years (1983-todate) of rainfall estimates over Ethiopia at a spatial resolution of 10 km and a ten-daily time scale. This involves quality control of rain gauge data, generating locally calibrated version of the TAMSAT rainfall estimates, and combining these with rain gauge observations from national station network. The infrared-only satellite rainfall estimates produced using a relatively simple TAMSAT algorithm performed as good as or even better than other satellite rainfall products that use passive microwave inputs and more sophisticated algorithms. There is no substantial difference between the gridded-gauge and combined gauge-satellite products over the test area in Ethiopia having a dense station network; however, the combined product exhibits better quality over parts of the country where stations are sparsely distributed.
Resumo:
Global communication requirements and load imbalance of some parallel data mining algorithms are the major obstacles to exploit the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in iterative parallel data mining algorithms. In particular, the analysis focuses on one of the most influential and popular data mining methods, the k-means algorithm for cluster analysis. The straightforward parallel formulation of the k-means algorithm requires a global reduction operation at each iteration step, which hinders its scalability. This work studies a different parallel formulation of the algorithm where the requirement of global communication can be relaxed while still providing the exact solution of the centralised k-means algorithm. The proposed approach exploits a non-uniform data distribution which can be either found in real world distributed applications or can be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error which allows a further reduction of the communication costs.
Resumo:
Distributed generation plays a key role in reducing CO2 emissions and losses in transmission of power. However, due to the nature of renewable resources, distributed generation requires suitable control strategies to assure reliability and optimality for the grid. Multi-agent systems are perfect candidates for providing distributed control of distributed generation stations as well as providing reliability and flexibility for the grid integration. The proposed multi-agent energy management system consists of single-type agents who control one or more gird entities, which are represented as generic sub-agent elements. The agent applies one control algorithm across all elements and uses a cost function to evaluate the suitability of the element as a supplier. The behavior set by the agent's user defines which parameters of an element have greater weight in the cost function, which allows the user to specify the preference on suppliers dynamically. This study shows the ability of the multi-agent energy management system to select suppliers according to the selection behavior given by the user. The optimality of the supplier for the required demand is ensured by the cost function based on the parameters of the element.
Resumo:
Reinforcing the Low Voltage (LV) distribution network will become essential to ensure it remains within its operating constraints as demand on the network increases. The deployment of energy storage in the distribution network provides an alternative to conventional reinforcement. This paper presents a control methodology for energy storage to reduce peak demand in a distribution network based on day-ahead demand forecasts and historical demand data. The control methodology pre-processes the forecast data prior to a planning phase to build in resilience to the inevitable errors between the forecasted and actual demand. The algorithm uses no real time adjustment so has an economical advantage over traditional storage control algorithms. Results show that peak demand on a single phase of a feeder can be reduced even when there are differences between the forecasted and the actual demand. In particular, results are presented that demonstrate when the algorithm is applied to a large number of single phase demand aggregations that it is possible to identify which of these aggregations are the most suitable candidates for the control methodology.
Resumo:
Unorganized traffic is a generalized form of travel wherein vehicles do not adhere to any predefined lanes and can travel in-between lanes. Such travel is visible in a number of countries e.g. India, wherein it enables a higher traffic bandwidth, more overtaking and more efficient travel. These advantages are visible when the vehicles vary considerably in size and speed, in the absence of which the predefined lanes are near-optimal. Motion planning for multiple autonomous vehicles in unorganized traffic deals with deciding on the manner in which every vehicle travels, ensuring no collision either with each other or with static obstacles. In this paper the notion of predefined lanes is generalized to model unorganized travel for the purpose of planning vehicles travel. A uniform cost search is used for finding the optimal motion strategy of a vehicle, amidst the known travel plans of the other vehicles. The aim is to maximize the separation between the vehicles and static obstacles. The search is responsible for defining an optimal lane distribution among vehicles in the planning scenario. Clothoid curves are used for maintaining a lane or changing lanes. Experiments are performed by simulation over a set of challenging scenarios with a complex grid of obstacles. Additionally behaviours of overtaking, waiting for a vehicle to cross and following another vehicle are exhibited.
Resumo:
Flash floods pose a significant danger for life and property. Unfortunately, in arid and semiarid environment the runoff generation shows a complex non-linear behavior with a strong spatial and temporal non-uniformity. As a result, the predictions made by physically-based simulations in semiarid areas are subject to great uncertainty, and a failure in the predictive behavior of existing models is common. Thus better descriptions of physical processes at the watershed scale need to be incorporated into the hydrological model structures. For example, terrain relief has been systematically considered static in flood modelling at the watershed scale. Here, we show that the integrated effect of small distributed relief variations originated through concurrent hydrological processes within a storm event was significant on the watershed scale hydrograph. We model these observations by introducing dynamic formulations of two relief-related parameters at diverse scales: maximum depression storage, and roughness coefficient in channels. In the final (a posteriori) model structure these parameters are allowed to be both time-constant or time-varying. The case under study is a convective storm in a semiarid Mediterranean watershed with ephemeral channels and high agricultural pressures (the Rambla del Albujón watershed; 556 km 2 ), which showed a complex multi-peak response. First, to obtain quasi-sensible simulations in the (a priori) model with time-constant relief-related parameters, a spatially distributed parameterization was strictly required. Second, a generalized likelihood uncertainty estimation (GLUE) inference applied to the improved model structure, and conditioned to observed nested hydrographs, showed that accounting for dynamic relief-related parameters led to improved simulations. The discussion is finally broadened by considering the use of the calibrated model both to analyze the sensitivity of the watershed to storm motion and to attempt the flood forecasting of a stratiform event with highly different behavior.
Resumo:
The equations of Milsom are evaluated, giving the ground range and group delay of radio waves propagated via the horizontally stratified model ionosphere proposed by Bradley and Dudeney. Expressions for the ground range which allow for the effects of the underlying E- and F1-regions are used to evaluate the basic maximum usable frequency or M-factors for single F-layer hops. An algorithm for the rapid calculation of the M-factor at a given range is developed, and shown to be accurate to within 5%. The results reveal that the M(3000)F2-factor scaled from vertical-incidence ionograms using the standard URSI procedure can be up to 7.5% in error. A simple addition to the algorithm effects a correction to ionogram values to make these accurate to 0.5%.
Resumo:
Highly heterogeneous mountain snow distributions strongly affect soil moisture patterns; local ecology; and, ultimately, the timing, magnitude, and chemistry of stream runoff. Capturing these vital heterogeneities in a physically based distributed snow model requires appropriately scaled model structures. This work looks at how model scale—particularly the resolutions at which the forcing processes are represented—affects simulated snow distributions and melt. The research area is in the Reynolds Creek Experimental Watershed in southwestern Idaho. In this region, where there is a negative correlation between snow accumulation and melt rates, overall scale degradation pushed simulated melt to earlier in the season. The processes mainly responsible for snow distribution heterogeneity in this region—wind speed, wind-affected snow accumulations, thermal radiation, and solar radiation—were also independently rescaled to test process-specific spatiotemporal sensitivities. It was found that in order to accurately simulate snowmelt in this catchment, the snow cover needed to be resolved to 100 m. Wind and wind-affected precipitation—the primary influence on snow distribution—required similar resolution. Thermal radiation scaled with the vegetation structure (~100 m), while solar radiation was adequately modeled with 100–250-m resolution. Spatiotemporal sensitivities to model scale were found that allowed for further reductions in computational costs through the winter months with limited losses in accuracy. It was also shown that these modeling-based scale breaks could be associated with physiographic and vegetation structures to aid a priori modeling decisions.
Resumo:
A parallel formulation for the simulation of a branch prediction algorithm is presented. This parallel formulation identifies independent tasks in the algorithm which can be executed concurrently. The parallel implementation is based on the multithreading model and two parallel programming platforms: pthreads and Cilk++. Improvement in execution performance by up to 7 times is observed for a generic 2-bit predictor in a 12-core multiprocessor system.
Resumo:
This paper describes a fast integer sorting algorithm, herein referred as Bit-index sort, which is a non-comparison sorting algorithm for partial per-mutations, with linear complexity order in execution time. Bit-index sort uses a bit-array to classify input sequences of distinct integers, and exploits built-in bit functions in C compilers supported by machine hardware to retrieve the ordered output sequence. Results show that Bit-index sort outperforms in execution time to quicksort and counting sort algorithms. A parallel approach for Bit-index sort using two simultaneous threads is included, which obtains speedups up to 1.6.