54 results for Distributed data


Relevance: 30.00%

Abstract:

A system for continuous data assimilation is presented and discussed. To simulate the dynamical development, a channel version of a balanced barotropic model is used, and geopotential (height) data are assimilated into the model's computations as they become available. In the first experiment the updating is performed every 24, 12 and 6 hours with a given network. The stations are distributed at random in 4 groups in order to simulate 4 areas with different station densities. Optimum interpolation is performed on the difference between the forecast and the valid observations. The RMS error of the analyses decreases with time, and the more frequently the updating is performed, the smaller the error. Updating every 6 hours yields an analysis error smaller than the RMS error of the observations. In a second experiment the updating is performed with data from a moving satellite with a side-scan capability of about 15°. If the satellite data are analysed at every time step before they are introduced into the system, the analysis error falls below the RMS error of the observations after only 24 hours and, on the whole, yields a better result than updating from a fixed network. If the satellite data are introduced without any modification, the analysis error is reduced much more slowly, and it takes about 4 days to reach a result comparable to the case where the data have been analysed.
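
As a minimal illustration of the updating step described above (not the paper's full scheme), the sketch below applies a scalar optimum-interpolation correction to a toy 1-D channel field; the gain, error variances and station layout are illustrative assumptions.

```python
import numpy as np

def oi_update(forecast, obs, obs_idx, var_f, var_o):
    """Blend a gridded forecast with point observations at grid indices.

    The analysis increment at each observed point weights the
    observation-minus-forecast difference by the ratio of forecast
    to total error variance (a scalar form of optimum interpolation).
    """
    gain = var_f / (var_f + var_o)          # weight given to observations
    analysis = forecast.copy()
    analysis[obs_idx] += gain * (obs - forecast[obs_idx])
    return analysis

# Toy assimilation step: geopotential on a 1-D channel, updated as data arrive.
rng = np.random.default_rng(0)
truth = np.sin(np.linspace(0, 2 * np.pi, 100))
forecast = truth + rng.normal(0, 0.3, 100)          # forecast error
obs_idx = rng.choice(100, size=20, replace=False)   # irregular station network
obs = truth[obs_idx] + rng.normal(0, 0.1, 20)       # observation error
analysis = oi_update(forecast, obs, obs_idx, var_f=0.09, var_o=0.01)
print("RMS before:", np.sqrt(np.mean((forecast - truth) ** 2)))
print("RMS after: ", np.sqrt(np.mean((analysis - truth) ** 2)))
```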

Relevance: 30.00%

Abstract:

With the introduction of new observing systems based on asynoptic observations, the analysis problem has changed in character. In the near future we may expect that a considerable part of meteorological observations will be unevenly distributed in four dimensions, i.e. three dimensions in space and one in time. The term analysis, or objective analysis, in meteorology means the process of interpolating meteorological observations from unevenly distributed locations to a network of regularly spaced grid points. Because numerical weather prediction models must solve the governing finite-difference equations on such a grid lattice, objective analysis is a three-dimensional (or, mostly, two-dimensional) interpolation technique. As a consequence of the structure of the conventional synoptic network, with its separate data-sparse and data-dense areas, four-dimensional analysis has in fact been used intensively for many years. Weather services have thus based their analyses not only on synoptic data valid at the analysis time and on climatology, but also on fields predicted from the previous observation hour and valid at the analysis time. The inclusion of the time dimension in objective analysis will be called four-dimensional data assimilation. From one point of view it seems possible to apply the conventional technique to the new data sources by simply reducing the time interval in the analysis-forecasting cycle. This could in fact be justified for the conventional observations as well: there is fairly good coverage of surface observations 8 times a day, and several upper-air stations make radiosonde and radiowind observations 4 times a day. With a 3-hour step in the analysis-forecasting cycle instead of the more usual 12 hours, all observations could be treated as synoptic without difficulty: no observation would be more than 90 minutes off the analysis time, and even during strongly transient motion the observations would fall within a horizontal mesh of 500 km × 500 km.
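
The 3-hour cycle argument can be made concrete with a small sketch: any observation time is mapped to its nearest analysis hour, so no datum is more than 90 minutes off time. The function name and the example timestamp are illustrative, not from the text.

```python
from datetime import datetime, timedelta

def assign_to_cycle(obs_time, cycle_hours=3):
    """Map an asynoptic observation time to the nearest analysis hour.

    With a 3-hour analysis-forecasting cycle, no observation is more
    than 90 minutes from its assigned analysis time, so all data can
    be treated as if they were synoptic.
    """
    seconds = cycle_hours * 3600
    midnight = datetime(obs_time.year, obs_time.month, obs_time.day)
    offset = (obs_time - midnight).total_seconds()
    nearest = round(offset / seconds) * seconds
    return midnight + timedelta(seconds=nearest)

# Example: a satellite sounding at 07:42 joins the 09:00 analysis,
# only 78 minutes off time.
t = datetime(2024, 1, 1, 7, 42)
print(assign_to_cycle(t))  # 2024-01-01 09:00:00
```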

Relevance: 30.00%

Abstract:

Global communication requirements and load imbalance of some parallel data mining algorithms are the major obstacles to exploiting the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in parallel data mining algorithms and, in particular, in the k-means algorithm for cluster analysis. In the straightforward parallel formulation of the k-means algorithm, data and computation loads are uniformly distributed over the processing nodes. This approach has excellent load balancing characteristics that may suggest it could scale up to large and extreme-scale parallel computing systems. However, at each iteration step the algorithm requires a global reduction operation, which hinders the scalability of the approach. This work studies a different parallel formulation of the algorithm in which the requirement of global communication is removed, while maintaining the same deterministic nature as the centralised algorithm. The proposed approach exploits a non-uniform data distribution which can either be found in real-world distributed applications or be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error, which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.
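
The abstract does not spell out the partitioning procedure; the sketch below shows one plausible reading of inducing a non-uniform, spatially contiguous distribution with a multi-dimensional binary search tree (median splits along alternating axes), so that each node's block interacts mainly with nearby centroids.

```python
import numpy as np

def kd_partition(points, n_parts, depth=0):
    """Recursively split points at the median along alternating axes.

    The resulting blocks are spatially contiguous, so in a parallel
    k-means each block mostly interacts with centroids inside or near
    its own region rather than with every centroid globally.
    """
    if n_parts == 1:
        return [points]
    axis = depth % points.shape[1]
    points = points[np.argsort(points[:, axis])]
    mid = len(points) // 2
    half = n_parts // 2
    return (kd_partition(points[:mid], half, depth + 1)
            + kd_partition(points[mid:], n_parts - half, depth + 1))

rng = np.random.default_rng(1)
data = rng.normal(size=(10_000, 2))
blocks = kd_partition(data, n_parts=8)
print([len(b) for b in blocks])   # near-equal loads over contiguous regions
```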

Relevance: 30.00%

Abstract:

A Bayesian analysis is presented for an instrumental variable model that allows for heteroscedasticity in both the structural equation and the instrument equation. Specifically, the approach for dealing with heteroscedastic errors in Geweke (1993) is extended to the Bayesian instrumental variable estimator outlined in Rossi et al. (2005). Heteroscedasticity is treated by modelling the variance of each error with a hierarchical prior that is Gamma distributed. The computation is carried out by a Markov chain Monte Carlo sampling algorithm with an augmented draw for the heteroscedastic case. An example using real data illustrates the approach and shows that ignoring heteroscedasticity in the instrument equation, when it exists, may lead to biased estimates.
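
The augmented draw can be sketched along the lines of Geweke's (1993) scale-mixture formulation, which the abstract cites: each error has variance sigma^2 * lam_i, and the lam_i are drawn from their full conditional. The function below is an illustrative single Gibbs step, not the authors' complete sampler.

```python
import numpy as np

rng = np.random.default_rng(2)

def draw_variance_scales(resid, sigma2, nu):
    """Gibbs step for per-observation variance scales (Geweke-1993 style).

    With e_i ~ N(0, sigma2 * lam_i) and the hierarchical prior
    nu / lam_i ~ chi^2(nu), the full conditional is
    (nu + e_i^2 / sigma2) / lam_i ~ chi^2(nu + 1).
    """
    shape = (nu + 1.0) / 2.0
    rate = (nu + resid ** 2 / sigma2) / 2.0
    precision = rng.gamma(shape, 1.0 / rate)     # draws of 1 / lam_i
    return 1.0 / precision

# One illustrative draw: heavy-tailed residuals get large variance scales.
resid = np.array([0.1, -0.2, 3.0])               # third point is an outlier
lam = draw_variance_scales(resid, sigma2=0.25, nu=5.0)
print(lam)                                       # lam[2] tends to be largest
```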

Relevance: 30.00%

Abstract:

Exascale systems are the next frontier in high-performance computing and are expected to deliver performance on the order of 10^18 operations per second using massive multicore processors. Very large- and extreme-scale parallel systems pose critical algorithmic challenges, especially related to concurrency, locality and the need to avoid global communication patterns. This work investigates a novel protocol for dynamic group communication that can be used to remove the global communication requirement and to reduce the communication cost in parallel formulations of iterative data mining algorithms. The protocol is used to provide a communication-efficient parallel formulation of the k-means algorithm for cluster analysis. The approach is based on a collective communication operation for dynamic groups of processes and exploits non-uniform data distributions. Non-uniform data distributions can either be found in real-world distributed applications or be induced by means of multidimensional binary search trees. The analysis of the proposed dynamic group communication protocol has shown that it does not introduce significant communication overhead. The parallel clustering algorithm has also been extended to accommodate an approximation error, which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested on a parallel computing system with 64 processors and in simulations with 1024 processing elements.
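
The protocol itself is not specified in the abstract, but the approximation-error idea can be illustrated in isolation: a node suppresses a centroid update whenever the centroid has moved less than a tolerance eps, so peers keep cached copies and fewer messages cross the network. Everything below is a toy single-node simulation under that assumption.

```python
import numpy as np

rng = np.random.default_rng(3)

def kmeans_local_step(points, centroids):
    """One k-means step on a node's local block: assign and re-average."""
    d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    new = centroids.copy()
    for k in range(len(centroids)):
        if np.any(labels == k):
            new[k] = points[labels == k].mean(axis=0)
    return new

def run(points, centroids, eps, iters=20):
    """Count the centroid updates a node would actually send when
    movements smaller than eps are suppressed."""
    sent = 0
    for _ in range(iters):
        new = kmeans_local_step(points, centroids)
        moved = np.linalg.norm(new - centroids, axis=1) > eps
        sent += int(moved.sum())        # only these rows cross the network
        centroids[moved] = new[moved]   # peers keep cached copies otherwise
    return sent

pts = rng.normal(size=(2_000, 2))
init = rng.normal(size=(4, 2))
print("messages, eps=0   :", run(pts, init.copy(), eps=0.0))
print("messages, eps=0.01:", run(pts, init.copy(), eps=0.01))
```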

Relevance: 30.00%

Abstract:

The susceptibility of a catchment to flooding is affected by its soil moisture prior to an extreme rainfall event. While soil moisture is routinely observed by satellite instruments, results from previous work on the assimilation of remotely sensed soil moisture into hydrologic models have been mixed. This may have been due in part to the low spatial resolution of the observations used. In this study, the remote sensing aspects of a project attempting to improve flow predictions from a distributed hydrologic model by assimilating soil moisture measurements are described. Advanced Synthetic Aperture Radar (ASAR) Wide Swath data were used to measure soil moisture because, unlike low-resolution microwave data, they have sufficient resolution to detect soil moisture variations due to local topography, which may help to account for the spatial heterogeneity of hydrological processes. Surface soil moisture content (SSMC) was measured over the catchments of the Severn and Avon rivers in the south-west UK. To reduce the influence of vegetation, measurements were made only over homogeneous pixels of improved grassland determined from a land cover map. Radar backscatter was corrected for terrain variations and normalized to a common incidence angle, and SSMC was calculated using change detection. To search for evidence of a topographic signal, the mean SSMC from improved grassland pixels on low slopes near rivers was compared to that on higher slopes. When the mean SSMC on the low slopes was 30–90%, the higher slopes were slightly drier than the low slopes; the effect was reversed for lower SSMC values and was more pronounced during a drying event. These findings add to the scant information in the literature on the use of high-resolution SAR soil moisture measurements to improve hydrologic models.
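
The change-detection step is commonly implemented by scaling each pixel's normalized backscatter between its driest and wettest observed values; the sketch below shows this generic form, which may differ in detail from the paper's processing chain.

```python
import numpy as np

def ssmc_change_detection(sigma0_db, sigma0_dry_db, sigma0_wet_db):
    """Relative surface soil moisture content (%) by change detection.

    Scales the terrain-corrected, incidence-angle-normalized backscatter
    linearly between the driest and wettest values observed for the pixel.
    """
    m = (sigma0_db - sigma0_dry_db) / (sigma0_wet_db - sigma0_dry_db)
    return 100.0 * np.clip(m, 0.0, 1.0)

# Per-pixel dry/wet references taken from a backscatter time series (dB):
series = np.array([-14.2, -12.8, -9.5, -11.1, -10.2])
print(ssmc_change_detection(series, series.min(), series.max()))
```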

Relevance: 30.00%

Abstract:

In this paper, we develop an energy-efficient resource-allocation scheme with proportional fairness for downlink multiuser orthogonal frequency-division multiplexing (OFDM) systems with distributed antennas. Our aim is to maximize energy efficiency (EE) under constraints on the overall transmit power of each remote access unit (RAU), proportional fairness of the data rates, and the bit error rates (BERs). Because of the nonconvex nature of the optimization problem, obtaining the optimal solution is extremely computationally complex. Therefore, we develop a low-complexity suboptimal algorithm that separates subcarrier allocation and power allocation. For the low-complexity algorithm, we first allocate subcarriers by assuming an equal power distribution. Then, by exploiting the properties of fractional programming, we transform the nonconvex optimization problem in fractional form into an equivalent optimization problem in subtractive form, which admits a tractable solution. Next, an optimal energy-efficient power-allocation algorithm is developed to maximize EE while maintaining proportional fairness. Through computer simulation, we demonstrate the effectiveness of the proposed low-complexity algorithm and illustrate the fundamental trade-off between energy- and spectral-efficient transmission designs.
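
The fractional-to-subtractive transformation referred to above is typically realised with Dinkelbach's method: the ratio rate/power is maximised by repeatedly solving a subtractive problem and updating the ratio parameter q. The sketch below applies it to a toy single-link EE problem, not the paper's multi-RAU formulation.

```python
import numpy as np

def dinkelbach(rate, power, p_grid, tol=1e-9, max_iter=50):
    """Maximise rate(p)/power(p) via fractional programming: repeatedly
    solve the subtractive problem max_p rate(p) - q * power(p) and update
    q until the optimum of the subtractive form reaches zero.
    """
    q = 0.0
    for _ in range(max_iter):
        f = rate(p_grid) - q * power(p_grid)
        p_star = p_grid[np.argmax(f)]
        if f.max() < tol:
            break
        q = rate(p_star) / power(p_star)
    return p_star, q

# Toy single-link energy efficiency with circuit power overhead.
g, p_circuit = 4.0, 0.5
rate = lambda p: np.log2(1.0 + g * p)            # bit/s/Hz
power = lambda p: p_circuit + p                  # W
p_opt, ee = dinkelbach(rate, power, np.linspace(1e-6, 10.0, 100_001))
print(f"optimal power {p_opt:.3f} W, energy efficiency {ee:.3f} bit/s/Hz/W")
```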

Relevance: 30.00%

Abstract:

An important application of Big Data analytics is the real-time analysis of streaming data. Streaming data impose unique challenges on data mining algorithms, such as concept drift, the need to analyse the data on the fly due to unbounded streams, and the need for scalable algorithms to cope with a potentially high throughput of data. Fast real-time classification algorithms that adapt to concept drift do exist; however, most approaches are not naturally parallel and are thus limited in their scalability. This paper presents work on the Micro-Cluster Nearest Neighbour (MC-NN) classifier. MC-NN builds on an adaptive statistical data summary composed of Micro-Clusters. It is very fast and adaptive to concept drift whilst maintaining the parallel properties of the base KNN classifier, and it is competitive with existing data stream classifiers in terms of accuracy and speed.
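
A micro-cluster summary can be sketched as a linear sum plus a count per labelled cluster, which gives O(1) centroids and incremental absorption of new instances; prediction takes the label of the nearest centroid. The sketch omits MC-NN's error counters, splitting and concept-drift handling, so it illustrates the data structure rather than the full algorithm.

```python
import numpy as np

class MicroCluster:
    """Compact statistical summary of a labelled region of the stream.

    Stores only the linear sum of instances and a count, so the centroid
    is available in O(1) and new instances are absorbed incrementally --
    the property that keeps nearest-micro-cluster classification fast
    on unbounded streams.
    """
    def __init__(self, x, label):
        self.ls = np.asarray(x, dtype=float)   # linear sum of instances
        self.n = 1
        self.label = label

    @property
    def centroid(self):
        return self.ls / self.n

    def absorb(self, x):
        self.ls += x
        self.n += 1

def classify(clusters, x):
    """Predict with the label of the nearest micro-cluster centroid."""
    dists = [np.linalg.norm(mc.centroid - x) for mc in clusters]
    return clusters[int(np.argmin(dists))].label

mcs = [MicroCluster([0.0, 0.0], "a"), MicroCluster([5.0, 5.0], "b")]
mcs[0].absorb(np.array([0.2, -0.1]))
print(classify(mcs, np.array([4.0, 4.5])))   # -> 'b'
```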

Relevance: 30.00%

Abstract:

We present one of the first studies of the use of Distributed Temperature Sensing (DTS) along fibre-optic cables to purposely monitor spatial and temporal variations in ground surface temperature (GST) and soil temperature, and to provide an estimate of the heat flux at the base of the canopy layer and in the soil. Our field site was a groundwater-fed wet meadow in the Netherlands covered by a canopy layer (0–0.5 m thick) consisting of grass and sedges. At this site, we ran a single cable across the surface in parallel 40 m sections spaced 2 m apart, creating a 40 × 40 m monitoring field for GST. We also buried a short length (≈10 m) of cable at a depth of 0.1 ± 0.02 m to measure soil temperature. We monitored the temperature along the entire cable continuously over a two-day period and captured the diurnal course of GST and how it was affected by rainfall and canopy structure. The diurnal GST range observed by the DTS system varied between 20.94 and 35.08 °C; precipitation events acted to suppress the range of GST. The spatial distribution of GST correlated with canopy vegetation height during both day and night. Using estimates of thermal inertia, combined with a harmonic analysis of GST and soil temperature, substrate and soil heat fluxes were determined. Our observations demonstrate that the use of DTS shows great promise for better characterising area-average substrate and soil heat fluxes, their spatiotemporal variability, and how this variability is affected by canopy structure. The DTS system provides a much richer data set than could be obtained from point temperature sensors. Furthermore, substrate heat fluxes derived from GST measurements may provide improved closure of the land surface energy balance in micrometeorological field studies, enhancing our understanding of how hydrometeorological processes interact with near-surface heat fluxes.
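
The harmonic analysis referred to above, under the standard homogeneous half-space assumption, propagates each surface-temperature harmonic into a heat flux term scaled by the thermal inertia; the sketch below fits the harmonics by simple Fourier projection over whole days. All numbers are synthetic, and the paper's own processing may differ in detail.

```python
import numpy as np

def heat_flux_harmonic(t, temps, thermal_inertia, n_harmonics=2):
    """Substrate/soil heat flux from a surface-temperature series.

    Each temperature harmonic of amplitude A_n and phase phi_n contributes
    P * A_n * sqrt(n*w) * sin(n*w*t + phi_n + pi/4), where P is the thermal
    inertia and w the diurnal angular frequency (homogeneous half-space).
    """
    w = 2.0 * np.pi / 86400.0                    # diurnal frequency (rad/s)
    flux = np.zeros_like(t)
    for n in range(1, n_harmonics + 1):
        s = np.sin(n * w * t)
        c = np.cos(n * w * t)
        a = 2.0 * np.mean(temps * s)             # Fourier coefficients over
        b = 2.0 * np.mean(temps * c)             # an integer number of days
        amp, phase = np.hypot(a, b), np.arctan2(b, a)
        flux += thermal_inertia * amp * np.sqrt(n * w) * np.sin(
            n * w * t + phase + np.pi / 4.0)
    return flux

# Synthetic two-day GST record sampled every 10 minutes:
t = np.arange(0, 2 * 86400, 600, dtype=float)
gst = 15.0 + 7.0 * np.sin(2 * np.pi * t / 86400.0 - 2.0)
print(heat_flux_harmonic(t, gst, thermal_inertia=800.0).max())  # W m^-2
```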