22 results for Computationally efficient

in CentAUR: Central Archive University of Reading - UK


Relevance:

100.00%

Abstract:

A multiple factor parametrization is described to permit the efficient calculation of collision efficiency (E) between electrically charged aerosol particles and neutral cloud droplets in numerical models of cloud and climate. The four-parameter representation summarizes the results obtained from a detailed microphysical model of E, which accounts for the different forces acting on the aerosol in the path of falling cloud droplets. The parametrization is valid for aerosol particle radii of 0.4 to 10 μm, aerosol particle densities of 1 to 2.0 g cm^-3, aerosol particle charges from neutral to 100 elementary charges and drop radii from 18.55 to 142 μm. The parametrization yields values of E well within an order of magnitude of the detailed model's values, from a dataset of 3978 E values. Of these values 95% have modelled to parametrized ratios between 0.5 and 1.5 for aerosol particle sizes ranging between 0.4 and 2.0 μm, and about 96% in the second size range. This parametrization speeds up the calculation of E by a factor of ~10^3 compared with the original microphysical model, permitting the inclusion of electric charge effects in numerical cloud and climate models.

Relevance:

100.00%

Abstract:

Top Down Induction of Decision Trees (TDIDT) is the most commonly used method of constructing a model from a dataset in the form of classification rules to classify previously unseen data. Alternative algorithms have been developed, such as the Prism algorithm. Prism constructs modular rules that are qualitatively better than the rules induced by TDIDT. However, with the increasing size of databases, many existing rule learning algorithms have proved to be computationally expensive on large datasets. To tackle the problem of scalability, parallel classification rule induction algorithms have been introduced. As TDIDT is the most popular classifier, even though there are strongly competitive alternative algorithms, most parallel approaches to inducing classification rules are based on TDIDT. In this paper we describe work on a distributed classifier that induces classification rules in a parallel manner based on Prism.

Relevance:

100.00%

Abstract:

Induction of classification rules is one of the most important technologies in data mining. Most of the work in this field has concentrated on the Top Down Induction of Decision Trees (TDIDT) approach. However, alternative approaches have been developed such as the Prism algorithm for inducing modular rules. Prism often produces qualitatively better rules than TDIDT but suffers from higher computational requirements. We investigate approaches that have been developed to minimize the computational requirements of TDIDT, in order to find analogous approaches that could reduce the computational requirements of Prism.

Relevance:

100.00%

Abstract:

In order to gain knowledge from large databases, scalable data mining technologies are needed. Data are captured on a large scale and thus databases are growing at a fast pace. This leads to the utilisation of parallel computing technologies in order to cope with large amounts of data. In the area of classification rule induction, parallelisation has focused on the divide and conquer approach, also known as the Top Down Induction of Decision Trees (TDIDT). An alternative approach to classification rule induction is separate and conquer, which has only recently become a focus of parallelisation. This work introduces and empirically evaluates a framework for the parallel induction of classification rules generated by members of the Prism family of algorithms. All members of the Prism family of algorithms follow the separate and conquer approach.
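
The separate and conquer strategy mentioned in this abstract can be illustrated with a minimal, serial sketch of a basic Prism-style learner for categorical attributes. It is not the parallel framework the paper evaluates; the function name `prism`, the list-of-dicts data layout and the toy dataset are assumptions made purely for illustration.

```python
def prism(instances, target):
    """Induce modular rules with a basic Prism-style separate-and-conquer loop.

    instances: list of dicts mapping attribute name -> categorical value.
    target: name of the class attribute.
    Returns a list of (conditions, class_label) pairs; `conditions` is a dict
    of attribute -> value tests that must all hold for the rule to fire.
    """
    rules = []
    classes = sorted({row[target] for row in instances})
    attributes = [a for a in instances[0] if a != target]

    for label in classes:
        remaining = list(instances)  # every class starts from the full data
        while any(row[target] == label for row in remaining):
            covered = remaining
            conditions = {}
            # "Conquer": specialise the rule until it covers only `label`
            # instances or no unused attribute is left.
            while any(row[target] != label for row in covered):
                best = None  # (precision, positives, attribute, value)
                for attr in attributes:
                    if attr in conditions:
                        continue
                    for value in sorted({row[attr] for row in covered}):
                        subset = [r for r in covered if r[attr] == value]
                        pos = sum(r[target] == label for r in subset)
                        if pos == 0:
                            continue
                        cand = (pos / len(subset), pos, attr, value)
                        if best is None or cand[:2] > best[:2]:
                            best = cand
                if best is None:
                    break  # attributes exhausted; accept an impure rule
                _, _, attr, value = best
                conditions[attr] = value
                covered = [r for r in covered if r[attr] == value]
            rules.append((conditions, label))
            # "Separate": remove the instances the new rule covers.
            remaining = [r for r in remaining if r not in covered]
    return rules


# Hypothetical toy dataset, used only to exercise the sketch.
data = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
rules = prism(data, "play")
```

Each rule is induced independently of the others, which is what makes the rule set modular, whereas a decision tree forces every rule to share the attribute tested at the root.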

Relevance:

100.00%

Abstract:

Classifiers generally tend to overfit if there is noise in the training data or there are missing values. Ensemble learning methods are often used to improve a classifier's classification accuracy. Most ensemble learning approaches aim to improve the classification accuracy of decision trees. However, alternative classifiers to decision trees exist. The recently developed Random Prism ensemble learner for classification aims to improve an alternative classification rule induction approach, the Prism family of algorithms, which addresses some of the limitations of decision trees. However, Random Prism suffers, like any ensemble learner, from a high computational overhead due to replication of the data and the induction of multiple base classifiers. Hence even modest-sized datasets may pose a computational challenge for ensemble learners such as Random Prism. Parallelism is often used to scale up algorithms to deal with large datasets. This paper investigates parallelisation for Random Prism, implements a prototype and evaluates it empirically using a Hadoop computing cluster.

Relevance:

100.00%

Abstract:

Advances in hardware and software technologies allow streaming data to be captured. The area of Data Stream Mining (DSM) is concerned with the analysis of these vast amounts of data as they are generated in real-time. Data stream classification is one of the most important DSM techniques, making it possible to classify previously unseen data instances. Unlike traditional classifiers for static data, data stream classifiers need to adapt to concept changes (concept drift) in the stream in real-time in order to reflect the most recent concept in the data as accurately as possible. A recent addition to the data stream classifier toolbox is eRules, which induces and updates a set of expressive rules that can easily be interpreted by humans. However, like most rule-based data stream classifiers, eRules exhibits poor computational performance when confronted with continuous attributes. In this work, we propose an approach to deal with continuous data effectively and accurately in rule-based classifiers by using the Gaussian distribution as a heuristic for building rule terms on continuous attributes. We show, using eRules as an example, that incorporating our method for continuous attributes indeed speeds up the real-time rule induction process while maintaining a similar level of accuracy compared with the original eRules classifier. We term this new version of eRules, with our approach, G-eRules.
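
The abstract does not say how the Gaussian distribution is turned into a rule term, so the following is only one plausible reading and not the G-eRules method itself. It fits a Gaussian to the values of a continuous attribute observed for the target class and uses an interval around the mean as the term. The function names, the `width` parameter and the toy data are all assumptions.

```python
import statistics


def gaussian_rule_term(values, labels, target, width=1.0):
    """Return (low, high) bounds for a rule term 'low <= x <= high'.

    The bounds are mean +/- width * standard deviation of the attribute
    values that belong to class `target` (an assumed heuristic).
    """
    in_class = [v for v, c in zip(values, labels) if c == target]
    mu = statistics.fmean(in_class)
    sigma = statistics.pstdev(in_class)
    return mu - width * sigma, mu + width * sigma


def term_precision(values, labels, target, low, high):
    """Fraction of instances covered by the term that belong to `target`."""
    covered = [c for v, c in zip(values, labels) if low <= v <= high]
    if not covered:
        return 0.0
    return sum(c == target for c in covered) / len(covered)


# Hypothetical attribute values and class labels.
values = [1.0, 1.2, 0.8, 1.1, 5.0, 5.3, 4.7]
labels = ["a", "a", "a", "a", "b", "b", "b"]
low, high = gaussian_rule_term(values, labels, "a")
precision = term_precision(values, labels, "a", low, high)
```

Under this reading the saving comes from summarising each class by a mean and a standard deviation, which needs one pass over the data, instead of evaluating every candidate split point of the attribute.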

Relevance:

100.00%

Abstract:

Advances in hardware technologies allow data to be captured and processed in real-time, and the resulting high throughput data streams require novel data mining approaches. The research area of Data Stream Mining (DSM) is developing data mining algorithms that allow us to analyse these continuous streams of data in real-time. The creation and real-time adaptation of classification models from data streams is one of the most challenging DSM tasks. Current classifiers for streaming data address this problem by using incremental learning algorithms. However, even though these algorithms are fast, they are challenged by high velocity data streams, where data instances arrive at a fast rate. This is problematic if an application requires little or no delay between changes in the patterns of the stream and absorption of these patterns by the classifier. The scalability of traditional data mining algorithms for static (non-streaming) datasets to Big Data has been addressed through the development of parallel classifiers. However, there is very little work on the parallelisation of data stream classification techniques. In this paper we investigate K-Nearest Neighbours (KNN) as the basis for a real-time adaptive and parallel methodology for scalable data stream classification tasks.
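
As a rough illustration of why KNN lends itself to adaptive stream classification, the sketch below keeps only the most recent labelled instances in a sliding window, so older concepts are forgotten automatically. It is a generic, serial sketch under assumed names (`SlidingWindowKNN`, `learn`, `predict`) and does not reproduce the parallel methodology investigated in the paper.

```python
from collections import Counter, deque
import math


class SlidingWindowKNN:
    """KNN classifier over the most recent `window_size` labelled instances."""

    def __init__(self, k=3, window_size=100):
        self.k = k
        # A bounded deque discards the oldest instance once it is full,
        # which is how this sketch forgets outdated concepts.
        self.window = deque(maxlen=window_size)

    def learn(self, features, label):
        """Absorb one labelled instance from the stream."""
        self.window.append((tuple(features), label))

    def predict(self, features):
        """Majority vote among the k nearest instances in the window."""
        if not self.window:
            return None
        nearest = sorted(
            self.window, key=lambda item: math.dist(item[0], features)
        )[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


clf = SlidingWindowKNN(k=3, window_size=4)
for x in [(0.0,), (0.1,), (0.2,), (0.3,)]:
    clf.learn(x, "a")
before = clf.predict((0.1,))

# Simulated concept drift: the same region is now labelled "b".
for x in [(0.0,), (0.1,), (0.2,), (0.3,)]:
    clf.learn(x, "b")
after = clf.predict((0.1,))
```

Because there is no model to rebuild, adapting to a new concept costs one append per instance. The expensive step is the distance computation in `predict`, which is independent per stored instance and is therefore the natural candidate for parallelisation.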

Relevance:

70.00%

Abstract:

This paper investigates the challenge of representing structural differences in river channel cross-section geometry for regional to global scale river hydraulic models and the effect this can have on simulations of wave dynamics. Classically, channel geometry is defined using data, yet at larger scales the necessary information and model structures do not exist to take this approach. We therefore propose a fundamentally different approach where the structural uncertainty in channel geometry is represented using a simple parameterization, which could then be estimated through calibration or data assimilation. This paper first outlines the development of a computationally efficient numerical scheme to represent generalised channel shapes using a single parameter, which is then validated using a simple straight channel test case and shown to predict wetted perimeter to within 2% for the channels tested. An application to the River Severn, UK is also presented, along with an analysis of model sensitivity to channel shape, depth and friction. The channel shape parameter was shown to improve model simulations of river level, particularly for more physically plausible channel roughness and depth parameter ranges. Calibrating channel Manning’s coefficient in a rectangular channel provided similar water level simulation accuracy in terms of Nash-Sutcliffe efficiency to a model where friction and shape or depth were calibrated. However, the calibrated Manning coefficient in the rectangular channel model was ~2/3 greater than the likely physically realistic value for this reach and this erroneously slowed wave propagation times through the reach by several hours. Therefore, for large scale models applied in data sparse areas, calibrating channel depth and/or shape may be preferable to assuming a rectangular geometry and calibrating friction alone.
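
For readers unfamiliar with the Nash-Sutcliffe efficiency used above to score water level simulations, it is defined as NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2). A value of 1 indicates a perfect match and 0 indicates a simulation no better than the mean of the observations. The helper below is a minimal sketch; the function name and the sample water levels are assumptions.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency of a simulated series against observations."""
    if not observed or len(observed) != len(simulated):
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Sum of squared model errors.
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Sum of squared deviations of the observations from their mean.
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("observations are constant; NSE is undefined")
    return 1.0 - residual / variance


# Hypothetical water levels in metres.
levels = [1.0, 2.0, 3.0, 4.0]
perfect = nash_sutcliffe(levels, levels)
mean_only = nash_sutcliffe(levels, [2.5, 2.5, 2.5, 2.5])
```

Because the score only measures agreement in water level, two models with different friction and geometry can reach similar values, which is consistent with the abstract's point that a rectangular channel can score well with a physically unrealistic Manning coefficient.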

Relevance:

60.00%

Abstract:

In this paper new robust nonlinear model construction algorithms for a large class of linear-in-the-parameters models are introduced to enhance model robustness, including three algorithms that combine A-optimality, D-optimality or the PRESS (Predicted REsidual Sum of Squares) statistic, respectively, with the regularised orthogonal least squares algorithm. A common characteristic of these algorithms is that the inherent computational efficiency associated with the orthogonalisation scheme in orthogonal least squares or regularised orthogonal least squares has been extended such that the new algorithms are computationally efficient. A numerical example is included to demonstrate the effectiveness of the algorithms. Copyright (C) 2003 IFAC.

Relevance:

60.00%

Abstract:

An orthogonal forward selection (OFS) algorithm based on the leave-one-out (LOO) criterion is proposed for the construction of radial basis function (RBF) networks with tunable nodes. This OFS-LOO algorithm is computationally efficient and is capable of identifying parsimonious RBF networks that generalise well. Moreover, the proposed algorithm is fully automatic and the user does not need to specify a termination criterion for the construction process.

Relevance:

60.00%

Abstract:

An orthogonal forward selection (OFS) algorithm based on leave-one-out (LOO) criteria is proposed for the construction of radial basis function (RBF) networks with tunable nodes. Each stage of the construction process determines an RBF node, namely, its center vector and diagonal covariance matrix, by minimizing the LOO statistics. For regression application, the LOO criterion is chosen to be the LOO mean-square error, while the LOO misclassification rate is adopted in two-class classification application. This OFS-LOO algorithm is computationally efficient, and it is capable of constructing parsimonious RBF networks that generalize well. Moreover, the proposed algorithm is fully automatic, and the user does not need to specify a termination criterion for the construction process. The effectiveness of the proposed RBF network construction procedure is demonstrated using examples taken from both regression and classification applications.
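
The reason a leave-one-out criterion can be cheap for linear-in-the-parameters models is that the LOO residuals follow from a single fit. The sketch below demonstrates this with the standard leverage identity for a straight-line model, where the LOO residual equals the ordinary residual divided by (1 - h_ii), and checks it against brute-force refitting. This is a simpler stand-in for the orthogonalisation-based recursion used in the paper; all function names and the data are assumptions.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b * x_bar, b


def loo_residuals_fast(xs, ys):
    """LOO residuals from one fit, using residual / (1 - leverage)."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a, b = fit_line(xs, ys)
    out = []
    for x, y in zip(xs, ys):
        leverage = 1.0 / n + (x - x_bar) ** 2 / sxx  # hat-matrix diagonal
        out.append((y - (a + b * x)) / (1.0 - leverage))
    return out


def loo_residuals_brute(xs, ys):
    """Same quantity obtained by refitting n times, omitting one point each."""
    out = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        out.append(ys[i] - (a + b * xs[i]))
    return out


# Hypothetical data.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 2.3, 2.9, 4.2]
fast = loo_residuals_fast(xs, ys)
brute = loo_residuals_brute(xs, ys)
loo_mse = sum(r * r for r in fast) / len(fast)
```

The sum of the squared LOO residuals is the PRESS statistic, and dividing it by the number of samples gives the LOO mean-square error.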

Relevance:

60.00%

Abstract:

We propose a simple yet computationally efficient construction algorithm for two-class kernel classifiers. In order to optimise the classifier's generalisation capability, an orthogonal forward selection procedure is used to select kernels one by one by minimising the leave-one-out (LOO) misclassification rate directly. It is shown that the computation of the LOO misclassification rate is very efficient owing to orthogonalisation. Examples are used to demonstrate that the proposed algorithm is a viable alternative for constructing sparse two-class kernel classifiers in terms of performance and computational efficiency.

Relevance:

60.00%

Abstract:

We propose a simple and computationally efficient construction algorithm for two-class linear-in-the-parameters classifiers. In order to optimize model generalization, an orthogonal forward selection (OFS) procedure is used for minimizing the leave-one-out (LOO) misclassification rate directly. An analytic formula and a set of forward recursive updating formulae for the LOO misclassification rate are developed and applied in the proposed algorithm. Numerical examples are used to demonstrate that the proposed algorithm is an excellent alternative approach for constructing sparse two-class classifiers in terms of performance and computational efficiency.

Relevance:

60.00%

Abstract:

In this correspondence new robust nonlinear model construction algorithms for a large class of linear-in-the-parameters models are introduced to enhance model robustness via combined parameter regularization and new robust structural selective criteria. In parallel to parameter regularization, we use two classes of robust model selection criteria based on either experimental design criteria that optimize model adequacy, or the predicted residual sums of squares (PRESS) statistic that optimizes model generalization capability. Three robust identification algorithms are introduced: A-optimality and D-optimality, each combined with the regularized orthogonal least squares algorithm, and the PRESS statistic combined with the regularized orthogonal least squares algorithm. A common characteristic of these algorithms is that the inherent computational efficiency associated with the orthogonalization scheme in orthogonal least squares or regularized orthogonal least squares has been extended such that the new algorithms are computationally efficient. Numerical examples are included to demonstrate the effectiveness of the algorithms.

Relevance:

60.00%

Abstract:

Two-dimensional flood inundation modelling is a widely used tool to aid flood risk management. In urban areas, the model spatial resolution required to represent flows through a typical street network often results in an impractical computational cost at the city scale. This paper presents the calibration and evaluation of a recently developed formulation of the LISFLOOD-FP model, which is more computationally efficient at these resolutions. Aerial photography was available for model evaluation on 3 days from 24 to 31 July. The new formulation was benchmarked against the original version of the model at 20 and 40 m resolutions, demonstrating equally accurate simulation, given the evaluation data, but with computation 67 times faster. The July event was then simulated at the 2 m resolution of the available airborne LiDAR DEM. This resulted in more accurate simulation of the floodplain drying dynamics compared with the coarse resolution models, although maximum inundation levels were simulated equally well at all resolutions tested.