84 results for Random Rooted Labeled Trees
in CentAUR: Central Archive University of Reading - UK
Abstract:
This paper investigates the impact of policies to promote the adoption of LEED-certified buildings across Core-Based Statistical Areas (CBSAs) in the United States. Drawing upon a unique database that combines data from a large number of sources, and using a number of regression procedures, the determinants of the proportion of LEED-certified space are modeled for more than 170 CBSAs in the US. LEED-certified space still accounts for a relatively small proportion of commercial stock in all markets: the average proportion is less than 1%. There is no conclusive evidence of a positive impact of policy intervention on the levels of LEED-certified space. However, after accounting for the bias introduced by the non-random assignment of policies, we find preliminary evidence of a positive impact of city-level green building incentives. Market size and indicators of economic vitality are also significantly positively associated with the proportion of LEED-certified space.
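As a rough illustration of the kind of regression procedure the abstract describes (not the authors' actual specification), the sketch below fits a fractional logit to a proportion outcome bounded in [0, 1]. All variable names (leed_share, market_size, econ_vitality, policy_incentive) and the synthetic data are hypothetical stand-ins for the paper's database.

```python
# Illustrative sketch only: fractional logit regression of the share of
# LEED-certified space on market covariates. Data are synthetic.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 170  # roughly the number of CBSAs modeled in the paper

df = pd.DataFrame({
    "market_size": rng.lognormal(10, 1, n),     # commercial stock proxy (hypothetical)
    "econ_vitality": rng.normal(0, 1, n),       # economic vitality index (hypothetical)
    "policy_incentive": rng.integers(0, 2, n),  # city-level incentive dummy (hypothetical)
})
# Synthetic outcome: proportions well under 1%, as reported in the paper
lin = -5.5 + 0.2 * np.log(df["market_size"]) + 0.3 * df["econ_vitality"]
df["leed_share"] = 1 / (1 + np.exp(-(lin + rng.normal(0, 0.3, n))))

X = sm.add_constant(df[["market_size", "econ_vitality", "policy_incentive"]])
X["market_size"] = np.log(X["market_size"])  # elasticity-style specification

# Fractional logit: a GLM with a binomial family handles a [0, 1] outcome
model = sm.GLM(df["leed_share"], X, family=sm.families.Binomial())
print(model.fit().summary())
```

Correcting for the non-random assignment of policies, as the abstract mentions, would additionally require a selection-on-observables adjustment (e.g. propensity-score weighting), which this sketch omits.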
Abstract:
Ensemble learning techniques generate multiple classifiers, so-called base classifiers, whose combined classification results are used to increase the overall classification accuracy. In most ensemble classifiers the base classifiers are based on the Top Down Induction of Decision Trees (TDIDT) approach. However, an alternative approach for the induction of rule-based classifiers is the Prism family of algorithms. Prism algorithms produce modular classification rules that do not necessarily fit into a decision tree structure. Prism classification rulesets achieve a classification accuracy comparable to, and on noisy and large data sometimes higher than, that of decision tree classifiers. Yet Prism still suffers from overfitting on noisy and large datasets. In practice, ensemble techniques tend to reduce overfitting; however, no ensemble learner exists for modular classification rule inducers such as the Prism family of algorithms. This article describes the first development of an ensemble learner based on the Prism family of algorithms, intended to enhance Prism’s classification accuracy by reducing overfitting.
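For readers unfamiliar with the bagging-style scheme such ensemble learners build on, the following minimal sketch trains base classifiers on bootstrap samples and combines them by majority vote. Since no Prism implementation ships with scikit-learn, a decision tree stands in as the base learner here; the article's contribution is precisely to replace it with a Prism rule inducer.

```python
# Minimal bagging sketch: bootstrap sampling plus majority voting.
# The base learner is a stand-in, not the article's Prism classifier.
import numpy as np
from scipy import stats
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, base, n_estimators=50, seed=0):
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(X), len(X))      # bootstrap sample with replacement
        models.append(clone(base).fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    votes = np.stack([m.predict(X) for m in models])   # shape (n_models, n_samples)
    return stats.mode(votes, axis=0, keepdims=False).mode  # majority vote per sample

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
models = bagging_fit(X, y, DecisionTreeClassifier(max_depth=5))
y_pred = bagging_predict(models, X)
```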
Abstract:
Generally, classifiers tend to overfit if there is noise in the training data or there are missing values. Ensemble learning methods are often used to improve a classifier's classification accuracy. Most ensemble learning approaches aim to improve the classification accuracy of decision trees. However, alternative classifiers to decision trees exist. The recently developed Random Prism ensemble learner for classification aims to improve an alternative classification rule induction approach, the Prism family of algorithms, which addresses some of the limitations of decision trees. However, Random Prism suffers, like any ensemble learner, from a high computational overhead due to replication of the data and the induction of multiple base classifiers. Hence even modest-sized datasets may impose a computational challenge on ensemble learners such as Random Prism. Parallelism is often used to scale up algorithms to deal with large datasets. This paper investigates parallelisation for Random Prism, implements a prototype and evaluates it empirically using a Hadoop computing cluster.
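The parallelisation idea exploits the fact that inducing the base classifiers is embarrassingly parallel: each worker can build its share of the ensemble independently. The paper does this on a Hadoop cluster; the sketch below uses Python multiprocessing as a stand-in, with a decision tree as a placeholder for the Prism base-classifier induction.

```python
# Rough sketch of data-parallel ensemble induction (not the paper's
# Hadoop implementation): each worker trains base classifiers on its
# own bootstrap replicates, independently of the others.
import numpy as np
from multiprocessing import Pool
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier  # stand-in for a Prism learner

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

def train_base(seed):
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(X), len(X))        # bootstrap replicate of the data
    return DecisionTreeClassifier(random_state=seed).fit(X[idx], y[idx])

if __name__ == "__main__":
    with Pool(processes=4) as pool:              # workers play the role of mappers
        ensemble = pool.map(train_base, range(100))
    # combining the base classifiers' votes corresponds to the reduce phase
```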
Abstract:
Ensemble learning can be used to increase the overall classification accuracy of a classifier by generating multiple base classifiers and combining their classification results. A frequently used family of base classifiers for ensemble learning is decision trees. However, alternative approaches can potentially be used, such as the Prism family of algorithms, which also induces classification rules. Compared with decision trees, Prism algorithms generate modular classification rules that cannot necessarily be represented in the form of a decision tree. Prism algorithms produce a classification accuracy similar to that of decision trees, and in some cases, for example if there is noise in the training and test data, they can outperform decision trees by achieving a higher classification accuracy. However, Prism still tends to overfit on noisy data; hence, ensemble learners have been adopted in this work to reduce the overfitting. This paper describes the development of an ensemble learner using a member of the Prism family as the base classifier to reduce the overfitting of Prism algorithms on noisy datasets. The developed ensemble classifier is compared with a stand-alone Prism classifier in terms of classification accuracy and resistance to noise.
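The noise-resistance comparison the abstract describes can be illustrated by flipping a growing fraction of training labels and comparing a stand-alone classifier against its bagged ensemble on clean test data. The sketch below does this with decision trees standing in for the Prism base classifier, which has no scikit-learn implementation; the experimental setup is a plausible reconstruction, not the paper's own.

```python
# Sketch of a noise-resistance experiment: inject label noise into the
# training set and compare a single classifier with a bagged ensemble.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rng = np.random.default_rng(0)

for noise in (0.0, 0.1, 0.2, 0.3):
    y_noisy = y_tr.copy()
    flip = rng.random(len(y_tr)) < noise          # flip a fraction of the labels
    y_noisy[flip] = 1 - y_noisy[flip]
    single = DecisionTreeClassifier().fit(X_tr, y_noisy)
    bagged = BaggingClassifier(DecisionTreeClassifier(),
                               n_estimators=50).fit(X_tr, y_noisy)
    print(f"noise={noise:.1f}  "
          f"single={accuracy_score(y_te, single.predict(X_te)):.3f}  "
          f"ensemble={accuracy_score(y_te, bagged.predict(X_te)):.3f}")
```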
Abstract:
The induction of classification rules from previously unseen examples is one of the most important data mining tasks in science as well as in commercial applications. In order to reduce the influence of noise in the data, ensemble learners are often applied. However, most ensemble learners are based on decision tree classifiers, which are affected by noise. The Random Prism classifier has recently been proposed as an alternative to the popular Random Forests classifier, which is based on decision trees. Random Prism is based on the Prism family of algorithms, which is more robust to noise. However, like most ensemble classification approaches, Random Prism does not scale well on large training data. This paper presents a thorough discussion of Random Prism and a recently proposed parallel version of it called Parallel Random Prism. Parallel Random Prism is based on the MapReduce programming paradigm. The paper provides, for the first time, a theoretical analysis of the proposed technique and an in-depth experimental study showing that Parallel Random Prism scales well with the number of training examples, the number of data features and the number of processors. The expressiveness of the decision rules that our technique produces makes it a natural choice for Big Data applications where informed decision making increases the user’s trust in the system.
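Scalability claims of this kind are conventionally summarised by speedup and efficiency over the single-processor runtime. The snippet below shows those two standard measures; the timings are made up for illustration, and the paper reports its own measurements.

```python
# Illustrative only: standard scalability measures for a parallel
# learner such as Parallel Random Prism. Timings below are invented.
timings = {1: 800.0, 2: 415.0, 4: 220.0, 8: 125.0}  # seconds per run

for p, t in timings.items():
    speedup = timings[1] / t    # S(p) = T(1) / T(p)
    efficiency = speedup / p    # E(p) = S(p) / p, ideally close to 1
    print(f"{p} processors: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```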
Abstract:
Long-term monitoring of forest soils as part of a pan-European network to detect environmental change depends on an accurate determination of the mean of the soil properties at each monitoring event. Forest soil, however, is known to be very variable spatially. A study was undertaken to explore and quantify this variability at three forest monitoring plots in Britain. Detailed soil sampling was carried out, and the data from the chemical analyses were analysed by classical statistics and geostatistics. An analysis of variance showed that there were no consistent effects from the sample sites in relation to the position of the trees. The variogram analysis showed that there was spatial dependence at each site for several variables, and some varied in an apparently periodic way. An optimal sampling analysis based on the multivariate variogram for each site suggested that a bulked sample from 36 cores would reduce the error to an acceptable level. Future sampling should be designed so that it neither targets nor avoids trees and disturbed ground; this is best achieved by using a stratified random sampling design.
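The empirical semivariogram at the heart of such a geostatistical analysis is half the mean squared difference between sample pairs, binned by separation distance. The sketch below computes it from synthetic core locations and values, which stand in for the study's soil measurements.

```python
# Minimal empirical semivariogram: gamma(h) = mean of 0.5 * (z_i - z_j)^2
# over all pairs whose separation falls in lag bin h. Data are synthetic.
import numpy as np

rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, (200, 2))   # core locations in metres (hypothetical)
values = rng.normal(5.0, 1.0, 200)       # e.g. soil pH at each core (hypothetical)

# pairwise separation distances and squared value differences
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
sq = (values[:, None] - values[None, :]) ** 2
iu = np.triu_indices(len(values), k=1)   # count each pair once
d, sq = d[iu], sq[iu]

bins = np.arange(0, 60, 5)               # lag bins in metres
for lo, hi in zip(bins[:-1], bins[1:]):
    m = (d >= lo) & (d < hi)
    if m.any():
        print(f"lag {lo:>2}-{hi:<2} m: gamma = {0.5 * sq[m].mean():.3f}")
```

A fitted variogram model then feeds the optimal sampling analysis: the spatial dependence it quantifies determines how many bulked cores are needed to bring the estimation error of the plot mean below a target level.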
Abstract:
The effects of presubmergence and green manuring on the various processes involved in [N-15]-urea transformations were studied in a growth chamber after [N-15]-urea application to floodwater. Presubmergence for 14 days increased urea hydrolysis rates and floodwater pH, resulting in higher NH3 volatilization compared with no presubmergence. Presubmergence also increased nitrification and subsequent denitrification, but lower N assimilation by floodwater algae caused higher gaseous losses. Addition of green manure maintained a higher NH4+-N concentration in floodwater, mainly because of lower nitrification rates, but resulted in the highest NH3 volatilization losses. Although green manure did not affect the KCl-extractable NH4+-N from the applied fertilizer, it maintained a higher NH4+-N content owing to its decomposition and the increased mineralization of organic N. After 32 days, about 36.9% (T-1), 23.9% (T-2), and 36.4% (T-3) of the applied urea N had been incorporated into the pool of soil organic N. It was evident that presubmergence affected the recovery of the applied urea N.
Abstract:
Cloud cover is conventionally estimated from satellite images as the observed fraction of cloudy pixels. Active instruments such as radar and Lidar observe in narrow transects that sample only a small percentage of the area over which the cloud fraction is estimated. As a consequence, the fraction estimate has an associated sampling uncertainty, which usually remains unspecified. This paper extends a Bayesian method of cloud fraction estimation that also provides an analytical estimate of the sampling error. This method is applied to test the sensitivity of this error to sampling characteristics such as the number of observed transects and the variability of the underlying cloud field. The dependence of the uncertainty on these characteristics is investigated using synthetic data simulated to have properties closely resembling observations of the spaceborne Lidar NASA-LITE mission. Results suggest that the variance of the cloud fraction estimate is greatest for medium cloud cover and least when conditions are mostly cloudy or clear. However, there is a bias in the estimation, which is greatest around 25% and 75% cloud cover. The sampling uncertainty is also affected by the mean lengths of clouds and of clear intervals; shorter lengths decrease the uncertainty, primarily because there are more cloud observations in a transect of a given length. The uncertainty also falls with an increasing number of transects. Therefore, a sampling strategy aimed at minimizing the uncertainty in transect-derived cloud fraction will have to take into account the cloud and clear-sky length distributions as well as the cloud fraction of the observed field. These conclusions have implications for the design of future satellite missions. This paper describes the first integrated methodology for the analytical assessment of sampling uncertainty in cloud fraction observations from forthcoming spaceborne radar and Lidar missions such as NASA's Calipso and CloudSat.
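A simplified Bayesian cloud-fraction estimate from a single transect can be sketched as follows: with a uniform Beta(1, 1) prior over the cloud fraction and n pixels of which k are cloudy, the posterior is Beta(1 + k, 1 + n - k). This naive version assumes independent pixels; the paper's method additionally accounts for the along-track correlation of the cloud field (the cloud and clear-interval length distributions), which inflates the sampling variance relative to this sketch.

```python
# Simplified Beta-Binomial sketch of Bayesian cloud-fraction estimation
# from one transect, assuming independent pixels. Counts are invented.
from scipy import stats

n, k = 500, 180                      # transect pixels, cloudy pixels (hypothetical)
posterior = stats.beta(1 + k, 1 + n - k)

mean = posterior.mean()              # cloud-fraction estimate
std = posterior.std()                # analytical sampling uncertainty
lo, hi = posterior.interval(0.95)    # 95% credible interval
print(f"cloud fraction = {mean:.3f} +/- {std:.3f}  (95% CI {lo:.3f}-{hi:.3f})")
```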