53 resultados para modular crystallizer
Resumo:
Induction of classification rules is one of the most important technologies in data mining. Most of the work in this field has concentrated on the Top Down Induction of Decision Trees (TDIDT) approach. However, alternative approaches have been developed such as the Prism algorithm for inducing modular rules. Prism often produces qualitatively better rules than TDIDT but suffers from higher computational requirements. We investigate approaches that have been developed to minimize the computational requirements of TDIDT, in order to find analogous approaches that could reduce the computational requirements of Prism.
Resumo:
Inducing rules from very large datasets is one of the most challenging areas in data mining. Several approaches exist to scaling up classification rule induction to large datasets, namely data reduction and the parallelisation of classification rule induction algorithms. In the area of parallelisation of classification rule induction algorithms most of the work has been concentrated on the Top Down Induction of Decision Trees (TDIDT), also known as the ‘divide and conquer’ approach. However powerful alternative algorithms exist that induce modular rules. Most of these alternative algorithms follow the ‘separate and conquer’ approach of inducing rules, but very little work has been done to make the ‘separate and conquer’ approach scale better on large training data. This paper examines the potential of the recently developed blackboard based J-PMCRI methodology for parallelising modular classification rule induction algorithms that follow the ‘separate and conquer’ approach. A concrete implementation of the methodology is evaluated empirically on very large datasets.
Resumo:
The Prism family of algorithms induces modular classification rules which, in contrast to decision tree induction algorithms, do not necessarily fit together into a decision tree structure. Classifiers induced by Prism algorithms achieve a comparable accuracy compared with decision trees and in some cases even outperform decision trees. Both kinds of algorithms tend to overfit on large and noisy datasets and this has led to the development of pruning methods. Pruning methods use various metrics to truncate decision trees or to eliminate whole rules or single rule terms from a Prism rule set. For decision trees many pre-pruning and postpruning methods exist, however for Prism algorithms only one pre-pruning method has been developed, J-pruning. Recent work with Prism algorithms examined J-pruning in the context of very large datasets and found that the current method does not use its full potential. This paper revisits the J-pruning method for the Prism family of algorithms and develops a new pruning method Jmax-pruning, discusses it in theoretical terms and evaluates it empirically.
Resumo:
The Prism family of algorithms induces modular classification rules in contrast to the Top Down Induction of Decision Trees (TDIDT) approach which induces classification rules in the intermediate form of a tree structure. Both approaches achieve a comparable classification accuracy. However in some cases Prism outperforms TDIDT. For both approaches pre-pruning facilities have been developed in order to prevent the induced classifiers from overfitting on noisy datasets, by cutting rule terms or whole rules or by truncating decision trees according to certain metrics. There have been many pre-pruning mechanisms developed for the TDIDT approach, but for the Prism family the only existing pre-pruning facility is J-pruning. J-pruning not only works on Prism algorithms but also on TDIDT. Although it has been shown that J-pruning produces good results, this work points out that J-pruning does not use its full potential. The original J-pruning facility is examined and the use of a new pre-pruning facility, called Jmax-pruning, is proposed and evaluated empirically. A possible pre-pruning facility for TDIDT based on Jmax-pruning is also discussed.
Resumo:
Advances in hardware and software in the past decade allow to capture, record and process fast data streams at a large scale. The research area of data stream mining has emerged as a consequence from these advances in order to cope with the real time analysis of potentially large and changing data streams. Examples of data streams include Google searches, credit card transactions, telemetric data and data of continuous chemical production processes. In some cases the data can be processed in batches by traditional data mining approaches. However, in some applications it is required to analyse the data in real time as soon as it is being captured. Such cases are for example if the data stream is infinite, fast changing, or simply too large in size to be stored. One of the most important data mining techniques on data streams is classification. This involves training the classifier on the data stream in real time and adapting it to concept drifts. Most data stream classifiers are based on decision trees. However, it is well known in the data mining community that there is no single optimal algorithm. An algorithm may work well on one or several datasets but badly on others. This paper introduces eRules, a new rule based adaptive classifier for data streams, based on an evolving set of Rules. eRules induces a set of rules that is constantly evaluated and adapted to changes in the data stream by adding new and removing old rules. It is different from the more popular decision tree based classifiers as it tends to leave data instances rather unclassified than forcing a classification that could be wrong. The ongoing development of eRules aims to improve its accuracy further through dynamic parameter setting which will also address the problem of changing feature domain values.
Resumo:
This study investigates the child (L1) acquisition of properties at the interfaces of morpho-syntax, syntax-semantics and syntax-pragmatics, by focusing on inflected infinitives in European Portuguese (EP). Three child groups were tested, 6–7-year-olds, 9–10-year-olds and 11–12-year-olds, as well as an adult control group. The data demonstrate that children as young as 6 have knowledge of the morpho-syntactic properties of inflected infinitives, although they seem at first glance to show partially insufficient knowledge of their syntax–semantic interface properties (i.e. non-obligatory control properties), differently from children aged 9 and older, who show clearer evidence of knowledge of both types of properties. However, in general, both morpho-syntactic and syntax–semantics interface properties are also accessible to 6–7-year-old children, although these children give preference to a range of interpretations partially different from the adults; in certain cases, they may not appeal to certain pragmatic inferences that permit additional interpretations to adults and older children. Crucially, our data demonstrate that EP children master the two types of properties of inflected infinitives years before Brazilian Portuguese children do (Pires and Rothman, 2009a and Pires and Rothman, 2009b), reasons for and implications of which we discuss in detail.
Resumo:
The use of three orthogonally tagged phosphine reagents to assist chemical work-up via phase-switch scavenging in conjunction with a modular flow reactor is described. These techniques (acidic, basic and Click chemistry) are used to prepare various amides and tri-substituted guanidines from in situ generated iminophosphoranes.
Resumo:
The cycloaddition of acetylenes with azides to give the corresponding 1,4-disubstituted 1,2,3-triazoles is reported using immobilised reagents and scavengers in pre-packed glass tubes in a modular flow reactor.
Resumo:
The use of a mesofluidic flow reactor is described for performing Curtius rearrangement reactions of carboxylic acids in the presence of diphenylphosphoryl azide and trapping of the intermediate isocyanates with various nucleophiles.
Resumo:
A scalable method for the preparation of 4,5-disubstituted thiazoles and imidazoles as distinct regioisomeric products using a modular flow microreactor has been devised. The process makes use of microfluidic reaction chips and packed immobilized-reagent columns to effect bifurcation of the reaction pathway.
Resumo:
The Environmental Data Abstraction Library provides a modular data management library for bringing new and diverse datatypes together for visualisation within numerous software packages, including the ncWMS viewing service, which already has very wide international uptake. The structure of EDAL is presented along with examples of its use to compare satellite, model and in situ data types within the same visualisation framework. We emphasize the value of this capability for cross calibration of datasets and evaluation of model products against observations, including preparation for data assimilation.
Resumo:
This study investigates the child (L1) acquisition of properties at the interfaces of morphosyntax, syntax-semantics and syntax-pragmatics, by focusing on inflected infinitives in European Portuguese (EP). Three child groups were tested, 6–7-year-olds, 9–10-year-olds and 11–12-year-olds, as well as an adult control group. The data demonstrate that children as young as 6 have knowledge of the morpho-syntactic properties of inflected infinitives, although they seem at first glance to show partially insufficient knowledge of their syntax–semantic interface properties (i.e. non-obligatory control properties), differently from children aged 9 and older, who show clearer evidence of knowledge of both types of properties. However, in general, both morpho-syntactic and syntax–semantics interface properties are also accessible to 6–7-year-old children, although these children give preference to a range of interpretations partially different from the adults; in certain cases, they may not appeal to certain pragmatic inferences that permit additional interpretations to adults and older children. Crucially, our data demonstrate that EP children master the two types of properties of inflected infinitives years before Brazilian Portuguese children do (Pires and Rothman, 2009a,b), reasons for and implications of which we discuss in detail.
Resumo:
This paper presents the model SCOPE (Soil Canopy Observation, Photochemistry and Energy fluxes), which is a vertical (1-D) integrated radiative transfer and energy balance model. The model links visible to thermal infrared radiance spectra (0.4 to 50 μm) as observed above the canopy to the fluxes of water, heat and carbon dioxide, as a function of vegetation structure, and the vertical profiles of temperature. Output of the model is the spectrum of outgoing radiation in the viewing direction and the turbulent heat fluxes, photosynthesis and chlorophyll fluorescence. A special routine is dedicated to the calculation of photosynthesis rate and chlorophyll fluorescence at the leaf level as a function of net radiation and leaf temperature. The fluorescence contributions from individual leaves are integrated over the canopy layer to calculate top-of-canopy fluorescence. The calculation of radiative transfer and the energy balance is fully integrated, allowing for feedback between leaf temperatures, leaf chlorophyll fluorescence and radiative fluxes. Leaf temperatures are calculated on the basis of energy balance closure. Model simulations were evaluated against observations reported in the literature and against data collected during field campaigns. These evaluations showed that SCOPE is able to reproduce realistic radiance spectra, directional radiance and energy balance fluxes. The model may be applied for the design of algorithms for the retrieval of evapotranspiration from optical and thermal earth observation data, for validation of existing methods to monitor vegetation functioning, to help interpret canopy fluorescence measurements, and to study the relationships between synoptic observations with diurnally integrated quantities. The model has been implemented in Matlab and has a modular design, thus allowing for great flexibility and scalability.
Resumo:
The Phosphorus Indicators Tool provides a catchment-scale estimation of diffuse phosphorus (P) loss from agricultural land to surface waters using the most appropriate indicators of P loss. The Tool provides a framework that may be applied across the UK to estimate P loss, which is sensitive not only to land use and management but also to environmental factors such as climate, soil type and topography. The model complexity incorporated in the P Indicators Tool has been adapted to the level of detail in the available data and the need to reflect the impact of changes in agriculture. Currently, the Tool runs on an annual timestep and at a 1 km(2) grid scale. We demonstrate that the P Indicators Tool works in principle and that its modular structure provides a means of accounting for P loss from one layer to the next, and ultimately to receiving waters. Trial runs of the Tool suggest that modelled P delivery to water approximates measured water quality records. The transparency of the structure of the P Indicators Tool means that identification of poorly performing coefficients is possible, and further refinements of the Tool can be made to ensure it is better calibrated and subsequently validated against empirical data, as it becomes available.
Resumo:
We present a general Multi-Agent System framework for distributed data mining based on a Peer-to-Peer model. Agent protocols are implemented through message-based asynchronous communication. The framework adopts a dynamic load balancing policy that is particularly suitable for irregular search algorithms. A modular design allows a separation of the general-purpose system protocols and software components from the specific data mining algorithm. The experimental evaluation has been carried out on a parallel frequent subgraph mining algorithm, which has shown good scalability performances.