110 results for ensemble classifiers
in CentAUR: Central Archive University of Reading - UK
Abstract:
Ensemble learning techniques generate multiple classifiers, so-called base classifiers, whose combined classification results are used to increase the overall classification accuracy. In most ensemble classifiers the base classifiers are based on the Top Down Induction of Decision Trees (TDIDT) approach. However, an alternative approach for the induction of rule-based classifiers is the Prism family of algorithms. Prism algorithms produce modular classification rules that do not necessarily fit into a decision tree structure. Prism classification rulesets achieve a comparable and sometimes higher classification accuracy compared with decision tree classifiers if the data is noisy and large. Yet Prism still suffers from overfitting on noisy and large datasets. In practice ensemble techniques tend to reduce overfitting; however, no ensemble learner exists for modular classification rule inducers such as the Prism family of algorithms. This article describes the first development of an ensemble learner based on the Prism family of algorithms in order to enhance Prism’s classification accuracy by reducing overfitting.
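To make the idea of combining base classifiers concrete, here is a minimal sketch of a bagged ensemble with majority voting. It is not the Prism-based learner described in the abstract: the base learner `train_one_rule` is a hypothetical stand-in that learns one rule per value of a single feature, and every function name is an assumption made for illustration.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) examples with replacement (the bagging step)."""
    return [rng.choice(rows) for _ in rows]

def train_one_rule(rows, feature):
    """Hypothetical base learner, not Prism: map each value of one
    feature to the majority class observed for that value."""
    by_value = {}
    for features, label in rows:
        by_value.setdefault(features[feature], Counter())[label] += 1
    # Fall back to the overall majority class for unseen feature values.
    default = Counter(label for _, label in rows).most_common(1)[0][0]
    rules = {value: counts.most_common(1)[0][0]
             for value, counts in by_value.items()}
    return lambda features: rules.get(features[feature], default)

def train_ensemble(rows, n_classifiers, seed=0):
    """Induce each base classifier on its own bootstrap sample,
    using a randomly chosen feature."""
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    ensemble = []
    for _ in range(n_classifiers):
        sample = bootstrap_sample(rows, rng)
        feature = rng.randrange(n_features)
        ensemble.append(train_one_rule(sample, feature))
    return ensemble

def predict(ensemble, features):
    """Combine the base classifiers' outputs by majority vote."""
    votes = Counter(classifier(features) for classifier in ensemble)
    return votes.most_common(1)[0][0]
```

Each base classifier sees a different resampling of the training data, so individual errors caused by noise are less likely to be shared by the majority, which is the usual intuition for why ensembles reduce overfitting.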
Abstract:
Generally, classifiers tend to overfit if there is noise in the training data or there are missing values. Ensemble learning methods are often used to improve a classifier's classification accuracy. Most ensemble learning approaches aim to improve the classification accuracy of decision trees. However, alternative classifiers to decision trees exist. The recently developed Random Prism ensemble learner for classification aims to improve an alternative classification rule induction approach, the Prism family of algorithms, which addresses some of the limitations of decision trees. However, Random Prism, like any ensemble learner, suffers from a high computational overhead due to replication of the data and the induction of multiple base classifiers. Hence even modest-sized datasets may pose a computational challenge to ensemble learners such as Random Prism. Parallelism is often used to scale up algorithms to deal with large datasets. This paper investigates parallelisation for Random Prism, implements a prototype and evaluates it empirically using a Hadoop computing cluster.
Abstract:
The induction of classification rules from previously unseen examples is one of the most important data mining tasks in science as well as in commercial applications. In order to reduce the influence of noise in the data, ensemble learners are often applied. However, most ensemble learners are based on decision tree classifiers, which are affected by noise. The Random Prism classifier has recently been proposed as an alternative to the popular Random Forests classifier, which is based on decision trees. Random Prism is based on the Prism family of algorithms, which is more robust to noise. However, like most ensemble classification approaches, Random Prism does not scale well on large training data. This paper presents a thorough discussion of Random Prism and a recently proposed parallel version of it called Parallel Random Prism. Parallel Random Prism is based on the MapReduce programming paradigm. The paper provides, for the first time, a novel theoretical analysis of the proposed technique and an in-depth experimental study, which show that Parallel Random Prism scales well on a large number of training examples, a large number of data features and a large number of processors. The expressiveness of the decision rules that our technique produces makes it a natural choice for Big Data applications where informed decision making increases the user’s trust in the system.
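The abstract does not say how the ensemble induction is split into map and reduce phases, so the following is only one plausible decomposition, sketched under stated assumptions: each map task builds one bootstrap sample and induces one base classifier, and the reduce step merges the results into a single ensemble. The "classifier" is a stand-in that predicts the sample's majority label, a thread pool stands in for cluster nodes, and the names `map_task`, `reduce_task` and `build_ensemble` are hypothetical.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def map_task(args):
    """Map phase: build one bootstrap sample and induce one base
    classifier from it. The classifier is a stand-in (not Prism)
    that is represented by the sample's majority label."""
    rows, seed = args
    rng = random.Random(seed)
    sample = [rng.choice(rows) for _ in rows]
    majority = Counter(label for _, label in sample).most_common(1)[0][0]
    return [majority]

def reduce_task(left, right):
    """Reduce phase: merge partial lists of base classifiers."""
    return left + right

def build_ensemble(rows, n_classifiers, workers=4):
    """Run the independent map tasks concurrently, then reduce.
    Threads stand in for cluster nodes in this sketch."""
    tasks = [(rows, seed) for seed in range(n_classifiers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(map_task, tasks))
    return reduce(reduce_task, partial, [])
```

Because the map tasks share no state, they can in principle be scheduled on as many workers as there are base classifiers, which is the property that makes ensemble induction a natural fit for this paradigm.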
Abstract:
The prediction of extratropical cyclones by the European Centre for Medium Range Weather Forecasts (ECMWF) and the National Centers for Environmental Prediction (NCEP) Ensemble Prediction Systems (EPS) has been investigated using an objective feature tracking methodology to identify and track the cyclones along the forecast trajectories. Overall the results show that the ECMWF EPS has a slightly higher level of skill than the NCEP EPS in the northern hemisphere (NH). However, in the southern hemisphere (SH), NCEP has higher predictive skill than ECMWF for the intensity of the cyclones. The results from both EPS indicate a higher level of predictive skill for the position of extratropical cyclones than their intensity and show that there is a larger spread in intensity than position. Further analysis shows that the predicted propagation speed of cyclones is generally too slow for the ECMWF EPS and shows a slight bias for the intensity of the cyclones to be overpredicted. This is also true for the NCEP EPS in the SH. For the NCEP EPS in the NH the intensity of the cyclones is underpredicted. There is a small bias in both EPS for the cyclones to be displaced towards the poles. For each ensemble forecast of each cyclone, the predictive skill of the ensemble member that best predicts the cyclone's position and intensity was computed. The results are very encouraging, showing that the predictive skill of the best ensemble member is significantly higher than that of the control forecast in terms of both the position and intensity of the cyclones. The prediction of cyclones before they are identified as 850 hPa vorticity centers in the analysis cycle was also considered. It is shown that an indication of extratropical cyclones can be given by at least 1 ensemble member 7 days before they are identified in the analysis.
Further analysis of the ECMWF EPS shows that the ensemble mean has a higher level of skill than the control forecast, particularly for the intensity of the cyclones, from day 3 of the forecast. There is a higher level of skill in the NH than the SH and the spread in the SH is correspondingly larger. The difference between the ensemble mean error and the spread is very small for the position of the cyclones, but the spread of the ensemble is smaller than the ensemble mean error for the intensity of the cyclones in both hemispheres. Results also show that the ECMWF control forecast has ½ to 1 day more skill than the perturbed members, for both the position and intensity of the cyclones, throughout the forecast.
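The comparison between ensemble mean error and ensemble spread can be illustrated with a small sketch for a single scalar quantity such as cyclone intensity. The definitions are assumptions, since the abstract does not give them: the error is taken as the absolute difference between the ensemble mean and the analysed value, and the spread as the population standard deviation of the members. The numbers in the usage example are made up.

```python
from statistics import mean, pstdev

def ensemble_mean_error(members, analysed):
    """Absolute error of the ensemble mean against the analysed value."""
    return abs(mean(members) - analysed)

def ensemble_spread(members):
    """Spread of the members, taken here as the population standard
    deviation (an assumption; other definitions are possible)."""
    return pstdev(members)

def is_underdispersive(members, analysed):
    """True when the spread is smaller than the ensemble mean error."""
    return ensemble_spread(members) < ensemble_mean_error(members, analysed)

# Made-up intensities for five ensemble members and one analysed value.
members = [8.0, 9.0, 10.0, 11.0, 12.0]
analysed = 7.0
error = ensemble_mean_error(members, analysed)   # 3.0
spread = ensemble_spread(members)                # about 1.41
```

In this example the spread is smaller than the error, so the ensemble would be called underdispersive for that quantity: the members cluster more tightly than the actual error of their mean justifies.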
Abstract:
The prediction of extratropical cyclones by the European Centre for Medium Range Weather Forecasts (ECMWF) and the National Centers for Environmental Prediction (NCEP) Ensemble Prediction Systems (EPS) is investigated using a storm-tracking forecast verification methodology. The cyclones are identified and tracked along the forecast trajectories so that statistics can be generated to determine the rate at which the position and intensity of the forecasted cyclones diverge from the corresponding analysed cyclones with forecast time. Overall the ECMWF EPS has a slightly higher level of performance than the NCEP EPS. However, in the southern hemisphere the NCEP EPS has a slightly higher level of skill for the intensity of the storms. The results from both EPS indicate a higher level of predictive skill for the position of extratropical cyclones than their intensity and show that there is a larger spread in intensity than position. The results also illustrate several benefits an EPS can offer over a deterministic forecast.
Abstract:
Aerosols from anthropogenic and natural sources have been recognized as having an important impact on the climate system. However, the small size of aerosol particles (ranging from 0.01 to more than 10 μm in diameter) and their influence on solar and terrestrial radiation make them difficult to represent within the coarse resolution of general circulation models (GCMs), such that small-scale processes, for example, sulfate formation and conversion, need parameterizing. It is the parameterization of emissions, conversion, and deposition and the radiative effects of aerosol particles that causes uncertainty in their representation within GCMs. The aim of this study was to perturb aspects of a sulfur cycle scheme used within a GCM to represent the climatological impacts of sulfate aerosol derived from natural and anthropogenic sulfur sources. It was found that perturbing volcanic SO2 emissions and the scavenging rate of SO2 by precipitation had the largest influence on the sulfate burden. When these parameters were perturbed the sulfate burden ranged from 0.73 to 1.17 TgS for 2050 sulfur emissions (A2 Special Report on Emissions Scenarios (SRES)), comparable with the range in sulfate burden across all the Intergovernmental Panel on Climate Change SRESs. Thus, the results here suggest that the range in sulfate burden due to model uncertainty is comparable with scenario uncertainty. Despite the large range in sulfate burden there was little influence on the climate sensitivity, which had a range of less than 0.5 K across the ensemble. We hypothesize that this small effect was partly associated with high sulfate loadings in the control phase of the experiment.
Abstract:
A regional study of the prediction of extratropical cyclones by the European Centre for Medium-Range Weather Forecasts (ECMWF) Ensemble Prediction System (EPS) has been performed. An objective feature-tracking method has been used to identify and track the cyclones along the forecast trajectories. Forecast error statistics have then been produced for the position, intensity, and propagation speed of the storms. In previous work, data limitations meant it was only possible to present the diagnostics for the entire Northern Hemisphere (NH) or Southern Hemisphere. A larger data sample has allowed the diagnostics to be computed separately for smaller regions around the globe and has made it possible to explore the regional differences in the prediction of storms by the EPS. Results show that in the NH there is a larger ensemble mean error in the position of storms over the Atlantic Ocean. Further analysis revealed that this is mainly due to errors in the prediction of storm propagation speed rather than in direction. Forecast storms propagate too slowly in all regions, but the bias is about 2 times as large in the NH Atlantic region. The results show that storm intensity is generally overpredicted over the ocean and underpredicted over the land and that the absolute error in intensity is larger over the ocean than over the land. In the NH, large errors occur in the prediction of the intensity of storms that originate as tropical cyclones but then move into the extratropics. The ensemble is underdispersive for the intensity of cyclones (i.e., the spread is smaller than the mean error) in all regions. The spatial patterns of the ensemble mean error and ensemble spread are very different for the intensity of cyclones. Spatial distributions of the ensemble mean error suggest that large errors occur during the growth phase of storm development, but this is not indicated by the spatial distributions of the ensemble spread. In the NH there are further differences. 
First, the large errors in the prediction of the intensity of cyclones that originate in the tropics are not indicated by the spread. Second, the ensemble mean error is larger over the Pacific Ocean than over the Atlantic, whereas the opposite is true for the spread. The usefulness of a storm-tracking approach to both weather forecasters and developers of forecast systems is also discussed.