86 results for Trace Rule


Relevance: 20.00% 20.00%

Abstract:

In a world where massive amounts of data are recorded on a large scale, we need data mining technologies to gain knowledge from the data in a reasonable time. The Top Down Induction of Decision Trees (TDIDT) algorithm is a very widely used technology to predict the classification of newly recorded data. However, alternative technologies have been derived that often produce better rules but do not scale well on large datasets. One such alternative to TDIDT is the PrismTCS algorithm. PrismTCS performs particularly well on noisy data but does not scale well on large datasets. In this paper we introduce Prism and investigate its scaling behaviour. We describe how we improved the scalability of the serial version of Prism and investigate its limitations. We then describe our work to overcome these limitations by developing a framework to parallelise algorithms of the Prism family and similar algorithms. We also present the scale up results of a first prototype implementation.

Relevance: 20.00% 20.00%

Abstract:

In a world where data is captured on a large scale, the major challenge for data mining algorithms is to be able to scale up to large datasets. There are two main approaches to inducing classification rules: one is the divide and conquer approach, also known as the top down induction of decision trees; the other is the separate and conquer approach. A considerable amount of work has been done on scaling up the divide and conquer approach. However, very little work has been conducted on scaling up the separate and conquer approach. In this work we describe a parallel framework that allows the parallelisation of a certain family of separate and conquer algorithms, the Prism family. Parallelisation helps the Prism family of algorithms to harvest additional computer resources in a network of computers in order to make the induction of classification rules scale better on large datasets. Our framework also incorporates a pre-pruning facility for parallel Prism algorithms.
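The separate and conquer idea mentioned in this abstract can be illustrated with a minimal serial sketch. It assumes categorical attributes and a clash-free toy dataset; the function name `induce_rules` and the weather-style data are hypothetical, and the sketch is not the parallel framework or the pre-pruning facility the abstract describes.

```python
from collections import Counter


def induce_rules(rows, target):
    """Prism-style separate-and-conquer induction (illustrative sketch).

    rows: list of dicts with categorical values; target: class attribute name.
    Returns a list of (conditions, label) pairs, where conditions is a dict.
    """
    rules = []
    attributes = [a for a in rows[0] if a != target]
    for label in sorted({r[target] for r in rows}):
        remaining = list(rows)  # each class starts from the full training set
        while any(r[target] == label for r in remaining):
            covered, conditions = remaining, {}
            # Conquer: add terms until the rule covers only `label` instances
            while (any(r[target] != label for r in covered)
                   and len(conditions) < len(attributes)):
                best, best_key = None, None
                for attr in attributes:
                    if attr in conditions:
                        continue
                    totals = Counter(r[attr] for r in covered)
                    hits = Counter(r[attr] for r in covered if r[target] == label)
                    for value, n_hit in hits.items():
                        # Prefer the highest p(label | attr=value), then coverage
                        key = (n_hit / totals[value], n_hit)
                        if best_key is None or key > best_key:
                            best, best_key = (attr, value), key
                attr, value = best
                conditions[attr] = value
                covered = [r for r in covered if r[attr] == value]
            rules.append((conditions, label))
            # Separate: remove the instances covered by the new rule
            remaining = [r for r in remaining
                         if not all(r[a] == v for a, v in conditions.items())]
    return rules


# Hypothetical toy data, made up for this example
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]
rules = induce_rules(rows, "play")
for conditions, label in rules:
    print(conditions, "->", label)
```

Each rule is built independently of the others, which is why such rules are called modular: no attribute is forced into every rule the way the root attribute of a decision tree is.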

Relevance: 20.00% 20.00%

Abstract:

Top Down Induction of Decision Trees (TDIDT) is the most commonly used method of constructing a model from a dataset in the form of classification rules to classify previously unseen data. Alternative algorithms have been developed, such as the Prism algorithm. Prism constructs modular rules that are qualitatively better than rules induced by TDIDT. However, along with the increasing size of databases, many existing rule learning algorithms have proved to be computationally expensive on large datasets. To tackle the problem of scalability, parallel classification rule induction algorithms have been introduced. As TDIDT is the most popular classifier, even though there are strongly competitive alternative algorithms, most parallel approaches to inducing classification rules are based on TDIDT. In this paper we describe work on a distributed classifier that induces classification rules in a parallel manner based on Prism.

Relevance: 20.00% 20.00%

Abstract:

Induction of classification rules is one of the most important technologies in data mining. Most of the work in this field has concentrated on the Top Down Induction of Decision Trees (TDIDT) approach. However, alternative approaches have been developed such as the Prism algorithm for inducing modular rules. Prism often produces qualitatively better rules than TDIDT but suffers from higher computational requirements. We investigate approaches that have been developed to minimize the computational requirements of TDIDT, in order to find analogous approaches that could reduce the computational requirements of Prism.

Relevance: 20.00% 20.00%

Abstract:

The fast increase in the size and number of databases demands data mining approaches that are scalable to large amounts of data. This has led to the exploration of parallel computing technologies in order to perform data mining tasks concurrently using several processors. Parallelization seems to be a natural and cost-effective way to scale up data mining technologies. One of the most important of these data mining technologies is the classification of newly recorded data. This paper surveys advances in parallelization in the field of classification rule induction.

Relevance: 20.00% 20.00%

Abstract:

Advances in hardware and software in the past decade make it possible to capture, record and process fast data streams at a large scale. The research area of data stream mining has emerged as a consequence of these advances in order to cope with the real time analysis of potentially large and changing data streams. Examples of data streams include Google searches, credit card transactions, telemetric data and data of continuous chemical production processes. In some cases the data can be processed in batches by traditional data mining approaches. However, some applications require the data to be analysed in real time as soon as it is captured. This is the case, for example, when the data stream is infinite, fast changing, or simply too large to be stored. One of the most important data mining techniques on data streams is classification. This involves training the classifier on the data stream in real time and adapting it to concept drifts. Most data stream classifiers are based on decision trees. However, it is well known in the data mining community that there is no single optimal algorithm. An algorithm may work well on one or several datasets but badly on others. This paper introduces eRules, a new rule based adaptive classifier for data streams, based on an evolving set of Rules. eRules induces a set of rules that is constantly evaluated and adapted to changes in the data stream by adding new and removing old rules. It differs from the more popular decision tree based classifiers in that it tends to leave data instances unclassified rather than force a classification that could be wrong. The ongoing development of eRules aims to improve its accuracy further through dynamic parameter setting, which will also address the problem of changing feature domain values.

Relevance: 20.00% 20.00%

Abstract:

Analysis of observed ozone profiles in Northern Hemisphere low and middle latitudes reveals the seasonal persistence of ozone anomalies in both the lower and upper stratosphere. Principal component analysis is used to detect that above 16 hPa the persistence is strongest in the latitude band 15–45°N, while below 16 hPa the strongest persistence is found over 45–60°N. In both cases, ozone anomalies persist through the entire year from November to October. The persistence of ozone anomalies in the lower stratosphere is presumably related to the wintertime ozone buildup with subsequent photochemical relaxation through summer, as previously found for total ozone. The persistence in the upper stratosphere is more surprising, given the short lifetime of Ox at these altitudes. It is hypothesized that this “seasonal memory” in the upper stratospheric ozone anomalies arises from the seasonal persistence of transport-induced wintertime NOy anomalies, which then perturb the ozone chemistry throughout the rest of the year. This hypothesis is confirmed by analysis of observations of NO2, NOx, and various long-lived trace gases in the upper stratosphere, which are found to exhibit the same seasonal persistence. Previous studies have attributed much of the year-to-year variability in wintertime extratropical upper stratospheric ozone to the Quasi-Biennial Oscillation (QBO) through transport-induced NOy (and hence NO2) anomalies but have not identified any statistical connection between the QBO and summertime ozone variability. Our results imply that through this “seasonal memory,” the QBO has an asynchronous effect on ozone in the low to midlatitude upper stratosphere during summer and early autumn.

Relevance: 20.00% 20.00%

Abstract:

Simulations of the stratosphere from thirteen coupled chemistry-climate models (CCMs) are evaluated to provide guidance for the interpretation of ozone predictions made by the same CCMs. The focus of the evaluation is on how well the fields and processes that are important for determining the ozone distribution are represented in the simulations of the recent past. The core period of the evaluation is from 1980 to 1999 but long-term trends are compared for an extended period (1960–2004). Comparisons of polar high-latitude temperatures show that most CCMs have only small biases in the Northern Hemisphere in winter and spring, but still have cold biases in the Southern Hemisphere spring below 10 hPa. Most CCMs display the correct stratospheric response of polar temperatures to wave forcing in the Northern, but not in the Southern Hemisphere. Global long-term stratospheric temperature trends are in reasonable agreement with satellite and radiosonde observations. Comparisons of simulations of methane, mean age of air, and propagation of the annual cycle in water vapor show a wide spread in the results, indicating differences in transport. However, for around half the models there is reasonable agreement with observations. In these models the mean age of air and the water vapor tape recorder signal are generally better than reported in previous model intercomparisons. Comparisons of the water vapor and inorganic chlorine (Cly) fields also show a large intermodel spread. Differences in tropical water vapor mixing ratios in the lower stratosphere are primarily related to biases in the simulated tropical tropopause temperatures and not transport. The spread in Cly, which is largest in the polar lower stratosphere, appears to be primarily related to transport differences. In general the amplitude and phase of the annual cycle in total ozone is well simulated apart from the southern high latitudes. 
Most CCMs show reasonable agreement with observed total ozone trends and variability on a global scale, but a greater spread in the ozone trends in polar regions in spring, especially in the Arctic. In conclusion, despite the wide range of skills in representing different processes assessed here, there is sufficient agreement between the majority of the CCMs and the observations that some confidence can be placed in their predictions.

Relevance: 20.00% 20.00%

Abstract:

In this paper we study convergence of the L^2-projection onto the space of polynomials up to degree p on a simplex in R^d, d >= 2. Optimal error estimates are established in the case of Sobolev regularity and illustrated on several numerical examples. The proof is based on the collapsed coordinate transform and the expansion into various polynomial bases involving Jacobi polynomials and their antiderivatives. The results of the present paper generalize corresponding estimates for cubes in R^d from [P. Houston, C. Schwab, E. Süli, Discontinuous hp-finite element methods for advection-diffusion-reaction problems. SIAM J. Numer. Anal. 39 (2002), no. 6, 2133-2163].
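For orientation, the projection studied in this abstract can be written out using its standard definition. The symbols T (the simplex), \Pi_p (the projection) and \mathcal{P}_p(T) (polynomials of total degree at most p on T) are notation assumed here, not taken from the abstract, and the paper's error estimates themselves are not reproduced.

```latex
% Definition: \Pi_p u is the unique polynomial whose error is L^2-orthogonal
% to every polynomial of degree at most p on the simplex T.
\[
  \Pi_p u \in \mathcal{P}_p(T), \qquad
  (u - \Pi_p u,\, v)_{L^2(T)} = 0 \quad \text{for all } v \in \mathcal{P}_p(T).
\]
% Consequence: \Pi_p u is the best L^2 approximation of u in \mathcal{P}_p(T).
\[
  \| u - \Pi_p u \|_{L^2(T)} = \min_{v \in \mathcal{P}_p(T)} \| u - v \|_{L^2(T)}.
\]
```

The second line follows from the first by the Pythagorean identity in L^2(T), so convergence of the projection as p grows is a statement about how well polynomials can approximate u on T.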

Relevance: 20.00% 20.00%

Abstract:

During SPURT (Spurenstofftransport in der Tropopausenregion, trace gas transport in the tropopause region) we performed measurements of a wide range of trace gases with different lifetimes and sink/source characteristics in the northern hemispheric upper troposphere (UT) and lowermost stratosphere (LMS). A large number of in-situ instruments were deployed on board a Learjet 35A, flying at altitudes up to 13.7 km, at times reaching to nearly 380 K potential temperature. Eight measurement campaigns (consisting of a total of 36 flights), distributed over all seasons and typically covering latitudes between 35° N and 75° N in the European longitude sector (10° W–20° E), were performed. Here we present an overview of the project, describing the instrumentation, the encountered meteorological situations during the campaigns and the data set available from SPURT. Measurements were obtained for N2O, CH4, CO, CO2, CFC12, H2, SF6, NO, NOy, O3 and H2O. We illustrate the strength of this new data set by showing mean distributions of the mixing ratios of selected trace gases, using a potential temperature-equivalent latitude coordinate system. The observations reveal that the LMS is most stratospheric in character during spring, with the highest mixing ratios of O3 and NOy and the lowest mixing ratios of N2O and SF6. The lowest mixing ratios of NOy and O3 are observed during autumn, together with the highest mixing ratios of N2O and SF6 indicating a strong tropospheric influence. For H2O, however, the maximum concentrations in the LMS are found during summer, suggesting unique (temperature- and convection-controlled) conditions for this molecule during transport across the tropopause. The SPURT data set is presently the most accurate and complete data set for many trace species in the LMS, and its main value is the simultaneous measurement of a suite of trace gases having different lifetimes and physical-chemical histories. 
It is thus very well suited for studies of atmospheric transport, for model validation, and for investigations of seasonal changes in the UT/LMS, as demonstrated in accompanying and elsewhere published studies.

Relevance: 20.00% 20.00%

Abstract:

Trace element contamination is one of the main problems linked to the quality of compost, especially when it is produced from urban wastes, which can lead to high levels of some potentially toxic elements such as Cu, Pb or Zn. In this work, the distribution and bioavailability of five elements (Cu, Zn, Pb, Cr and Ni) were studied in five Spanish composts obtained from different feedstocks (municipal solid waste, garden trimmings, sewage sludge and mixed manure). The five composts showed high total concentrations of these elements, which in some cases limited their commercialization due to legal imperatives. First, a physical fractionation of the composts was performed, and the five elements were determined in each size fraction. Their availability was assessed by several methods of extraction (water, CaCl2–DTPA, the PBET extract, the TCLP extract, and sodium pyrophosphate), and their chemical distribution was assessed using the BCR sequential extraction procedure. The results showed that the finer fractions were enriched with the elements studied, and that Cu, Pb and Zn were the most potentially problematic ones, due to both their high total concentrations and availability. The partition into the BCR fractions was different for each element, but the differences between composts were small. Pb was evenly distributed among the four fractions defined in the BCR (soluble, oxidizable, reducible and residual); Cu was mainly found in the oxidizable fraction, linked to organic matter, and Zn was mainly associated with the reducible fraction (iron oxides), while Ni and Cr were present almost exclusively in the residual fraction. It was not possible to establish a univocal relation between trace element availability and BCR fractionation. 
Given the differences in the availability and distribution of these elements, which were not always related to their total concentrations, we think that legal limits should consider availability, in order to achieve a more realistic assessment of the risks linked to compost use.

Relevance: 20.00% 20.00%

Abstract:

A process-based fire regime model (SPITFIRE) has been developed, coupled with ecosystem dynamics in the LPJ Dynamic Global Vegetation Model, and used to explore fire regimes and the current impact of fire on the terrestrial carbon cycle and associated emissions of trace atmospheric constituents. The model estimates an average release of 2.24 Pg C yr−1 as CO2 from biomass burning during the 1980s and 1990s. Comparison with observed active fire counts shows that the model reproduces where fire occurs and can mimic broad geographic patterns in the peak fire season, although the predicted peak is 1–2 months late in some regions. Modelled fire season length is generally overestimated by about one month, but shows a realistic pattern of differences among biomes. Comparisons with remotely sensed burnt-area products indicate that the model reproduces broad geographic patterns of annual fractional burnt area over most regions, including the boreal forest, although interannual variability in the boreal zone is underestimated.

Relevance: 20.00% 20.00%

Abstract:

This paper considers the use of Association Rule Mining (ARM) and our proposed Transaction based Rule Change Mining (TRCM) to identify the rule types present in tweets’ hashtags over a specific consecutive period of time and their linkage to real life occurrences. Our novel algorithm was termed TRCM-RTI in reference to Rule Type Identification. We created Time Frame Windows (TFWs) to detect evolvement statuses and calculate the lifespan of hashtags in online tweets. We link RTI to real life events by monitoring and recording rule evolvement patterns in TFWs on the Twitter network.
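As a rough illustration of mining association rules over hashtags in two consecutive time windows, the sketch below computes the standard ARM measures of support and confidence for single-hashtag rules and compares the two rule sets. The function name `mine_rules`, the thresholds, the hashtags and the "appeared"/"disappeared" comparison are all assumptions made for this example; it does not reproduce the rule type definitions of TRCM-RTI.

```python
from itertools import permutations


def mine_rules(tweets, min_support=0.3, min_confidence=0.6):
    """Return {(antecedent, consequent): (support, confidence)}.

    tweets: list of hashtag collections, one per tweet.
    Only rules of the form "one hashtag => one hashtag" are considered.
    """
    n = len(tweets)
    tag_sets = [set(t) for t in tweets]
    tags = sorted(set().union(*tag_sets))
    rules = {}
    for a, b in permutations(tags, 2):
        with_a = sum(1 for s in tag_sets if a in s)
        with_both = sum(1 for s in tag_sets if a in s and b in s)
        support = with_both / n          # fraction of tweets containing a and b
        confidence = with_both / with_a  # fraction of a-tweets that also contain b
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = (support, confidence)
    return rules


# Two hypothetical time frame windows with made-up hashtags
window_1 = [{"#election", "#vote"}, {"#election", "#vote"}, {"#election"}, {"#weather"}]
window_2 = [{"#weather", "#storm"}, {"#weather", "#storm"}, {"#election", "#vote"}, {"#storm"}]

rules_1 = mine_rules(window_1)
rules_2 = mine_rules(window_2)

# Rules present only in the later window, and rules that dropped out of it
appeared = set(rules_2) - set(rules_1)
disappeared = set(rules_1) - set(rules_2)
print("appeared:", sorted(appeared))
print("disappeared:", sorted(disappeared))
```

Tracking which rules enter and leave the rule set from one window to the next is the simplest form of rule change detection; a fuller treatment would also compare the antecedents and consequents of rules that persist.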