986 resultados para Unified Parallel C
Resumo:
In this paper we introduce a new algorithm, based on the successful work of Fathi and Alexandrov, on hybrid Monte Carlo algorithms for matrix inversion and solving systems of linear algebraic equations. This algorithm consists of two parts, approximate inversion by Monte Carlo and iterative refinement using a deterministic method. Here we present a parallel hybrid Monte Carlo algorithm, which uses Monte Carlo to generate an approximate inverse and that improves the accuracy of the inverse with an iterative refinement. The new algorithm is applied efficiently to sparse non-singular matrices. When we are solving a system of linear algebraic equations, Bx = b, the inverse matrix is used to compute the solution vector x = B(-1)b. We present results that show the efficiency of the parallel hybrid Monte Carlo algorithm in the case of sparse matrices.
Resumo:
A unified approach is proposed for data modelling that includes supervised regression and classification applications as well as unsupervised probability density function estimation. The orthogonal-least-squares regression based on the leave-one-out test criteria is formulated within this unified data-modelling framework to construct sparse kernel models that generalise well. Examples from regression, classification and density estimation applications are used to illustrate the effectiveness of this generic data-modelling approach for constructing parsimonious kernel models with excellent generalisation capability. (C) 2008 Elsevier B.V. All rights reserved.
Resumo:
We propose a unified data modeling approach that is equally applicable to supervised regression and classification applications, as well as to unsupervised probability density function estimation. A particle swarm optimization (PSO) aided orthogonal forward regression (OFR) algorithm based on leave-one-out (LOO) criteria is developed to construct parsimonious radial basis function (RBF) networks with tunable nodes. Each stage of the construction process determines the center vector and diagonal covariance matrix of one RBF node by minimizing the LOO statistics. For regression applications, the LOO criterion is chosen to be the LOO mean square error, while the LOO misclassification rate is adopted in two-class classification applications. By adopting the Parzen window estimate as the desired response, the unsupervised density estimation problem is transformed into a constrained regression problem. This PSO aided OFR algorithm for tunable-node RBF networks is capable of constructing very parsimonious RBF models that generalize well, and our analysis and experimental results demonstrate that the algorithm is computationally even simpler than the efficient regularization assisted orthogonal least square algorithm based on LOO criteria for selecting fixed-node RBF models. Another significant advantage of the proposed learning procedure is that it does not have learning hyperparameters that have to be tuned using costly cross validation. The effectiveness of the proposed PSO aided OFR construction procedure is illustrated using several examples taken from regression and classification, as well as density estimation applications.
Resumo:
A connection between a fuzzy neural network model with the mixture of experts network (MEN) modelling approach is established. Based on this linkage, two new neuro-fuzzy MEN construction algorithms are proposed to overcome the curse of dimensionality that is inherent in the majority of associative memory networks and/or other rule based systems. The first construction algorithm employs a function selection manager module in an MEN system. The second construction algorithm is based on a new parallel learning algorithm in which each model rule is trained independently, for which the parameter convergence property of the new learning method is established. As with the first approach, an expert selection criterion is utilised in this algorithm. These two construction methods are equivalent in their effectiveness in overcoming the curse of dimensionality by reducing the dimensionality of the regression vector, but the latter has the additional computational advantage of parallel processing. The proposed algorithms are analysed for effectiveness followed by numerical examples to illustrate their efficacy for some difficult data based modelling problems.
Resumo:
We describe the HadGEM2 family of climate configurations of the Met Office Unified Model, MetUM. The concept of a model "family" comprises a range of specific model configurations incorporating different levels of complexity but with a common physical framework. The HadGEM2 family of configurations includes atmosphere and ocean components, with and without a vertical extension to include a well-resolved stratosphere, and an Earth-System (ES) component which includes dynamic vegetation, ocean biology and atmospheric chemistry. The HadGEM2 physical model includes improvements designed to address specific systematic errors encountered in the previous climate configuration, HadGEM1, namely Northern Hemisphere continental temperature biases and tropical sea surface temperature biases and poor variability. Targeting these biases was crucial in order that the ES configuration could represent important biogeochemical climate feedbacks. Detailed descriptions and evaluations of particular HadGEM2 family members are included in a number of other publications, and the discussion here is limited to a summary of the overall performance using a set of model metrics which compare the way in which the various configurations simulate present-day climate and its variability.
Resumo:
The real-time parallel computation of histograms using an array of pipelined cells is proposed and prototyped in this paper with application to consumer imaging products. The array operates in two modes: histogram computation and histogram reading. The proposed parallel computation method does not use any memory blocks. The resulting histogram bins can be stored into an external memory block in a pipelined fashion for subsequent reading or streaming of the results. The array of cells can be tuned to accommodate the required data path width in a VLSI image processing engine as present in many imaging consumer devices. Synthesis of the architectures presented in this paper in FPGA are shown to compute the real-time histogram of images streamed at over 36 megapixels at 30 frames/s by processing in parallel 1, 2 or 4 pixels per clock cycle.
Resumo:
Currently, most operational forecasting models use latitude-longitude grids, whose convergence of meridians towards the poles limits parallel scaling. Quasi-uniform grids might avoid this limitation. Thuburn et al, JCP, 2009 and Ringler et al, JCP, 2010 have developed a method for arbitrarily-structured, orthogonal C-grids (TRiSK), which has many of the desirable properties of the C-grid on latitude-longitude grids but which works on a variety of quasi-uniform grids. Here, five quasi-uniform, orthogonal grids of the sphere are investigated using TRiSK to solve the shallow-water equations. We demonstrate some of the advantages and disadvantages of the hexagonal and triangular icosahedra, a Voronoi-ised cubed sphere, a Voronoi-ised skipped latitude-longitude grid and a grid of kites in comparison to a full latitude-longitude grid. We will show that the hexagonal-icosahedron gives the most accurate results (for least computational cost). All of the grids suffer from spurious computational modes; this is especially true of the kite grid, despite it having exactly twice as many velocity degrees of freedom as height degrees of freedom. However, the computational modes are easiest to control on the hexagonal icosahedron since they consist of vorticity oscillations on the dual grid which can be controlled using a diffusive advection scheme for potential vorticity.
Resumo:
Both the (5,3) counter and (2,2,3) counter multiplication techniques are investigated for the efficiency of their operation speed and the viability of the architectures when implemented in a fast bipolar ECL technology. The implementation of the counters in series-gated ECL and threshold logic are contrasted for speed, noise immunity and complexity, and are critically compared with the fastest practical design of a full-adder. A novel circuit technique to overcome the problems of needing high fan-in input weights in threshold circuits through the use of negative weighted inputs is presented. The authors conclude that a (2,2,3) counter based array multiplier implemented in series-gated ECL should enable a significant increase in speed over conventional full adder based array multipliers.
Resumo:
The authors compare various array multiplier architectures based on (p,q) counter circuits. The tradeoff in multiplier design is always between adding complexity and increasing speed. It is shown that by using a (2,2,3) counter cell it is possible to gain a significant increase in speed over a conventional full-adder, carry-save array based approach. The increase in complexity should be easily accommodated using modern emitter-coupled-logic processes.
Resumo:
With many operational centers moving toward order 1-km-gridlength models for routine weather forecasting, this paper presents a systematic investigation of the properties of high-resolution versions of the Met Office Unified Model for short-range forecasting of convective rainfall events. The authors describe a suite of configurations of the Met Office Unified Model running with grid lengths of 12, 4, and 1 km and analyze results from these models for a number of convective cases from the summers of 2003, 2004, and 2005. The analysis includes subjective evaluation of the rainfall fields and comparisons of rainfall amounts, initiation, cell statistics, and a scale-selective verification technique. It is shown that the 4- and 1-km-gridlength models often give more realistic-looking precipitation fields because convection is represented explicitly rather than parameterized. However, the 4-km model representation suffers from large convective cells and delayed initiation because the grid length is too long to correctly reproduce the convection explicitly. These problems are not as evident in the 1-km model, although it does suffer from too numerous small cells in some situations. Both the 4- and 1-km models suffer from poor representation at the start of the forecast in the period when the high-resolution detail is spinning up from the lower-resolution (12 km) starting data used. A scale-selective precipitation verification technique implies that for later times in the forecasts (after the spinup period) the 1-km model performs better than the 12- and 4-km models for lower rainfall thresholds. For higher thresholds the 4-km model scores almost as well as the 1-km model, and both do better than the 12-km model.
Resumo:
The formulation and performance of the Met Office visibility analysis and prediction system are described. The visibility diagnostic within the limited-area Unified Model is a function of humidity and a prognostic aerosol content. The aerosol model includes advection, industrial and general urban sources, plus boundary-layer mixing and removal by rain. The assimilation is a 3-dimensional variational scheme in which the visibility observation operator is a very nonlinear function of humidity, aerosol and temperature. A quality control scheme for visibility data is included. Visibility observations can give rise to humidity increments of significant magnitude compared with the direct impact of humidity observations. We present the results of sensitivity studies which show the contribution of different components of the system to improved skill in visibility forecasts. Visibility assimilation is most important within the first 6-12 hours of the forecast and for visibilities below 1 km, while modelling of aerosol sources and advection is important for slightly higher visibilities (1-5 km) and is still significant at longer forecast times
Resumo:
This work proposes a unified neurofuzzy modelling scheme. To begin with, the initial fuzzy base construction method is based on fuzzy clustering utilising a Gaussian mixture model (GMM) combined with the analysis of covariance (ANOVA) decomposition in order to obtain more compact univariate and bivariate membership functions over the subspaces of the input features. The mean and covariance of the Gaussian membership functions are found by the expectation maximisation (EM) algorithm with the merit of revealing the underlying density distribution of system inputs. The resultant set of membership functions forms the basis of the generalised fuzzy model (GFM) inference engine. The model structure and parameters of this neurofuzzy model are identified via the supervised subspace orthogonal least square (OLS) learning. Finally, instead of providing deterministic class label as model output by convention, a logistic regression model is applied to present the classifier’s output, in which the sigmoid type of logistic transfer function scales the outputs of the neurofuzzy model to the class probability. Experimental validation results are presented to demonstrate the effectiveness of the proposed neurofuzzy modelling scheme.
Resumo:
A parallel formulation of an algorithm for the histogram computation of n data items using an on-the-fly data decomposition and a novel quantum-like representation (QR) is developed. The QR transformation separates multiple data read operations from multiple bin update operations thereby making it easier to bind data items into their corresponding histogram bins. Under this model the steps required to compute the histogram is n/s + t steps, where s is a speedup factor and t is associated with pipeline latency. Here, we show that an overall speedup factor, s, is available for up to an eightfold acceleration. Our evaluation also shows that each one of these cells requires less area/time complexity compared to similar proposals found in the literature.