61 results for Modular Addition
Abstract:
Inducing rules from very large datasets is one of the most challenging areas in data mining. Several approaches exist to scaling up classification rule induction to large datasets, namely data reduction and the parallelisation of classification rule induction algorithms. In the area of parallelisation of classification rule induction algorithms, most of the work has concentrated on the Top Down Induction of Decision Trees (TDIDT), also known as the ‘divide and conquer’ approach. However, powerful alternative algorithms exist that induce modular rules. Most of these alternative algorithms follow the ‘separate and conquer’ approach of inducing rules, but very little work has been done to make the ‘separate and conquer’ approach scale better on large training data. This paper examines the potential of the recently developed blackboard-based J-PMCRI methodology for parallelising modular classification rule induction algorithms that follow the ‘separate and conquer’ approach. A concrete implementation of the methodology is evaluated empirically on very large datasets.
Abstract:
The Prism family of algorithms induces modular classification rules which, in contrast to decision tree induction algorithms, do not necessarily fit together into a decision tree structure. Classifiers induced by Prism algorithms achieve an accuracy comparable to that of decision trees and in some cases even outperform decision trees. Both kinds of algorithms tend to overfit on large and noisy datasets, and this has led to the development of pruning methods. Pruning methods use various metrics to truncate decision trees or to eliminate whole rules or single rule terms from a Prism rule set. For decision trees many pre-pruning and post-pruning methods exist; however, for Prism algorithms only one pre-pruning method has been developed, J-pruning. Recent work with Prism algorithms examined J-pruning in the context of very large datasets and found that the current method does not use its full potential. This paper revisits the J-pruning method for the Prism family of algorithms, develops a new pruning method, Jmax-pruning, discusses it in theoretical terms and evaluates it empirically.
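The ‘separate and conquer’ idea behind the Prism family can be illustrated with a minimal sketch. It assumes categorical attributes and no pruning, and follows the general scheme of the original Prism algorithm rather than any specific implementation from the papers listed here. The function name `prism`, the tie-breaking rule and the toy dataset are illustrative assumptions.

```python
def prism(rows, target):
    """Induce modular rules as (terms, class) pairs from a list of dicts.

    Assumption: all attributes are categorical; no pruning is applied.
    """
    rules = []
    attributes = [a for a in rows[0] if a != target]
    for cls in sorted({r[target] for r in rows}):
        remaining = list(rows)  # the full training set is restored per class
        while any(r[target] == cls for r in remaining):
            covered, terms = remaining, {}
            # Specialise the rule until it covers only the target class
            # (or no unused attribute is left, e.g. for clashing instances).
            while (any(r[target] != cls for r in covered)
                   and len(terms) < len(attributes)):
                best, best_key = None, (-1.0, -1)
                for a in attributes:
                    if a in terms:
                        continue
                    for v in sorted({r[a] for r in covered}):
                        subset = [r for r in covered if r[a] == v]
                        pos = sum(r[target] == cls for r in subset)
                        # Highest p(class | a = v); ties broken by coverage.
                        key = (pos / len(subset), pos)
                        if key > best_key:
                            best_key, best = key, (a, v)
                a, v = best
                terms[a] = v
                covered = [r for r in covered if r[a] == v]
            rules.append((terms, cls))
            # 'Separate': drop every instance the finished rule covers.
            remaining = [r for r in remaining
                         if not all(r[a] == v for a, v in terms.items())]
    return rules


# Hypothetical toy data, for illustration only.
toy = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
toy_rules = prism(toy, "play")
```

Each rule is built independently of the others, which is why the resulting rule set need not fit together into a single tree. The pruning methods discussed in these abstracts would intervene inside the specialisation loop.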
Abstract:
The Prism family of algorithms induces modular classification rules, in contrast to the Top Down Induction of Decision Trees (TDIDT) approach, which induces classification rules in the intermediate form of a tree structure. Both approaches achieve a comparable classification accuracy. However, in some cases Prism outperforms TDIDT. For both approaches pre-pruning facilities have been developed in order to prevent the induced classifiers from overfitting on noisy datasets, by cutting rule terms or whole rules or by truncating decision trees according to certain metrics. Many pre-pruning mechanisms have been developed for the TDIDT approach, but for the Prism family the only existing pre-pruning facility is J-pruning. J-pruning works not only on Prism algorithms but also on TDIDT. Although it has been shown that J-pruning produces good results, this work points out that J-pruning does not use its full potential. The original J-pruning facility is examined and the use of a new pre-pruning facility, called Jmax-pruning, is proposed and evaluated empirically. A possible pre-pruning facility for TDIDT based on Jmax-pruning is also discussed.
Abstract:
Advances in hardware and software in the past decade make it possible to capture, record and process fast data streams at a large scale. The research area of data stream mining has emerged as a consequence of these advances in order to cope with the real-time analysis of potentially large and changing data streams. Examples of data streams include Google searches, credit card transactions, telemetric data and data of continuous chemical production processes. In some cases the data can be processed in batches by traditional data mining approaches. However, some applications require the data to be analysed in real time as soon as it is captured. This is the case, for example, if the data stream is infinite, fast changing, or simply too large to be stored. One of the most important data mining techniques on data streams is classification. This involves training the classifier on the data stream in real time and adapting it to concept drifts. Most data stream classifiers are based on decision trees. However, it is well known in the data mining community that there is no single optimal algorithm. An algorithm may work well on one or several datasets but badly on others. This paper introduces eRules, a new rule-based adaptive classifier for data streams, based on an evolving set of Rules. eRules induces a set of rules that is constantly evaluated and adapted to changes in the data stream by adding new and removing old rules. It differs from the more popular decision tree based classifiers in that it tends to leave data instances unclassified rather than forcing a classification that could be wrong. The ongoing development of eRules aims to improve its accuracy further through dynamic parameter setting, which will also address the problem of changing feature domain values.
Abstract:
This study investigates the child (L1) acquisition of properties at the interfaces of morpho-syntax, syntax-semantics and syntax-pragmatics, by focusing on inflected infinitives in European Portuguese (EP). Three child groups were tested, 6–7-year-olds, 9–10-year-olds and 11–12-year-olds, as well as an adult control group. The data demonstrate that children as young as 6 have knowledge of the morpho-syntactic properties of inflected infinitives, although they seem at first glance to show partially insufficient knowledge of their syntax–semantic interface properties (i.e. non-obligatory control properties), differently from children aged 9 and older, who show clearer evidence of knowledge of both types of properties. However, in general, both morpho-syntactic and syntax–semantics interface properties are also accessible to 6–7-year-old children, although these children give preference to a range of interpretations partially different from the adults; in certain cases, they may not appeal to certain pragmatic inferences that permit additional interpretations to adults and older children. Crucially, our data demonstrate that EP children master the two types of properties of inflected infinitives years before Brazilian Portuguese children do (Pires and Rothman, 2009a and Pires and Rothman, 2009b), reasons for and implications of which we discuss in detail.
Abstract:
Prism is a modular classification rule generation method based on the ‘separate and conquer’ approach that is an alternative to the rule induction approach using decision trees, also known as ‘divide and conquer’. Prism often achieves a similar level of classification accuracy compared with decision trees, but tends to produce a more compact, noise-tolerant set of classification rules. As with other classification rule generation methods, a principal problem arising with Prism is that of overfitting due to over-specialised rules. In addition, over-specialised rules increase the associated computational complexity. These problems can be solved by pruning methods. For the Prism method, two pruning algorithms have been introduced recently for reducing overfitting of classification rules: J-pruning and Jmax-pruning. Both algorithms are based on the J-measure, an information-theoretic means for quantifying the theoretical information content of a rule. Jmax-pruning attempts to exploit the J-measure to its full potential because J-pruning does not actually achieve this and may even lead to underfitting. A series of experiments has shown that Jmax-pruning may outperform J-pruning in reducing overfitting. However, Jmax-pruning is computationally relatively expensive and may also lead to underfitting. This paper reviews the Prism method and the two existing pruning algorithms above. It also proposes a novel pruning algorithm called Jmid-pruning. The latter is based on the J-measure and reduces overfitting to a similar level as the other two algorithms, but is better at avoiding underfitting and unnecessary computational effort. The authors conduct an experimental study on the performance of the Jmid-pruning algorithm in terms of classification accuracy and computational efficiency. The algorithm is also evaluated comparatively with the J-pruning and Jmax-pruning algorithms.
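As a rough illustration of the J-measure mentioned above, the sketch below computes it for a single rule "IF Y = y THEN X = x" using the standard Smyth and Goodman definition, J = p(y) · [p(x|y) log2(p(x|y)/p(x)) + (1 − p(x|y)) log2((1 − p(x|y))/(1 − p(x)))]. The function name `j_measure` and its argument names are illustrative assumptions, and the abstract itself does not state how the pruning algorithms apply the value.

```python
import math


def j_measure(p_y: float, p_x: float, p_x_given_y: float) -> float:
    """Average information content (in bits) of a rule IF Y = y THEN X = x.

    p_y          probability that the rule antecedent fires
    p_x          prior probability of the consequent class
    p_x_given_y  probability of the class among covered instances
    """
    if not 0.0 < p_x < 1.0:
        raise ValueError("the class prior must lie strictly between 0 and 1")

    def term(posterior: float, prior: float) -> float:
        # By convention 0 * log(0) is taken to be 0.
        return 0.0 if posterior == 0.0 else posterior * math.log2(posterior / prior)

    cross_entropy = term(p_x_given_y, p_x) + term(1.0 - p_x_given_y, 1.0 - p_x)
    return p_y * cross_entropy


# A rule that fires on half the data and is always right about a 50/50 class.
perfect_rule = j_measure(p_y=0.5, p_x=0.5, p_x_given_y=1.0)
# A rule whose covered instances have the same class mix as the prior.
uninformative_rule = j_measure(p_y=0.3, p_x=0.4, p_x_given_y=0.4)
```

A rule that leaves the class distribution unchanged carries no information, so its J-value is zero. A rule gains value by both firing often and shifting the class probability away from the prior.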
The Joint UK Land Environment Simulator (JULES), model description – part 1: energy and water fluxes
Abstract:
This manuscript describes the energy and water components of a new community land surface model called the Joint UK Land Environment Simulator (JULES). It has been developed from the Met Office Surface Exchange Scheme (MOSES). It can be used as a standalone land surface model driven by observed forcing data, or coupled to an atmospheric global circulation model. The JULES model has been coupled to the Met Office Unified Model (UM) and as such provides a unique opportunity for the research community to contribute their research to improve both world-leading operational weather forecasting and climate change prediction systems. In addition, JULES and its forerunner MOSES have been the basis for a number of very high-profile papers concerning the land surface and climate over the last decade. JULES has a modular structure aligned to physical processes, providing the basis for a flexible modelling platform.
Abstract:
Grazing systems represent a substantial percentage of the global anthropogenic flux of nitrous oxide (N2O) as a result of nitrogen addition to the soil. The pool of available carbon that is added to the soil from livestock excreta also provides substrate for the production of carbon dioxide (CO2) and methane (CH4) by soil microorganisms. A study into the production and emission of CO2, CH4 and N2O from cattle urine amended pasture was carried out on the Somerset Levels and Moors, UK over a three-month period. Urine-amended plots (50 g N m−2) were compared to control plots to which only water (12 mg N m−2) was applied. CO2 emission peaked at 5200 mg CO2 m−2 d−1 directly after application. CH4 flux decreased to −2000 μg CH4 m−2 d−1 two days after application; however, net CH4 flux was positive from urine treated plots and negative from control plots. N2O emission peaked at 88 mg N2O m−2 d−1 12 days after application. Subsurface CH4 and N2O concentrations were higher in the urine treated plots than the controls. There was no effect of treatment on subsurface CO2 concentrations. Subsurface N2O peaked at 500 ppm 12 days after and 1200 ppm 56 days after application. Subsurface NO3− concentration peaked at approximately 300 mg N kg dry soil−1 12 days after application. Results indicate that denitrification is the key driver for N2O release in peatlands and that this production is strongly related to rainfall events and water-table movement. N2O production at depth continued long after emissions were detected at the surface. Further understanding of the interaction between subsurface gas concentrations, surface emissions and soil hydrological conditions is required to successfully predict greenhouse gas production and emission.
Abstract:
The use of three orthogonally tagged phosphine reagents to assist chemical work-up via phase-switch scavenging in conjunction with a modular flow reactor is described. These techniques (acidic, basic and Click chemistry) are used to prepare various amides and tri-substituted guanidines from in situ generated iminophosphoranes.
Abstract:
The cycloaddition of acetylenes with azides to give the corresponding 1,4-disubstituted 1,2,3-triazoles is reported using immobilised reagents and scavengers in pre-packed glass tubes in a modular flow reactor.
Abstract:
The use of a mesofluidic flow reactor is described for performing Curtius rearrangement reactions of carboxylic acids in the presence of diphenylphosphoryl azide and trapping of the intermediate isocyanates with various nucleophiles.
Abstract:
A scalable method for the preparation of 4,5-disubstituted thiazoles and imidazoles as distinct regioisomeric products using a modular flow microreactor has been devised. The process makes use of microfluidic reaction chips and packed immobilized-reagent columns to effect bifurcation of the reaction pathway.
Abstract:
The Environmental Data Abstraction Library provides a modular data management library for bringing new and diverse datatypes together for visualisation within numerous software packages, including the ncWMS viewing service, which already has very wide international uptake. The structure of EDAL is presented along with examples of its use to compare satellite, model and in situ data types within the same visualisation framework. We emphasize the value of this capability for cross calibration of datasets and evaluation of model products against observations, including preparation for data assimilation.
Abstract:
In vitro, the addition of lipids to a carbohydrate food has been found to increase the digestibility of starch. In contrast, in vivo studies have shown that the addition of fat to a food can reduce the glycaemic response (GR). The aim of this study was to assess whether delayed gastric emptying (GE) causes reduced GR with the addition of lipids to a carbohydrate food and whether a relationship between GR and in vitro digestion of starch exists for high-fat foods. Ten healthy volunteers were tested on five occasions after consuming pancakes containing 50 g of available carbohydrate and 202 kcal of sunflower oil, olive oil, butter, medium chain triglyceride (MCT) oil or a control containing no oil. GR was measured using finger-prick blood samples, satiety using visual analogue scales and GE using the 13C octanoic acid breath test. There was a significant difference in GR between the different pancake breakfasts (p = 0.05). The highest GR was observed following the control pancakes and the lowest following the olive oil pancakes. There were significant differences in GE half time, lag phase and ascension time (p < 0.05) between the different pancakes, with the control pancakes having the shortest GE time and the MCT pancakes the longest. There was a significant difference in the satiety parameters fullness (p = 0.003) and prospective consumption (p = 0.050), with satiety being lowest following the control pancakes. There was a significant inverse correlation between the GR and all satiety parameters. A significant inverse correlation (p = 0.009) was also observed between the digestibility of starch in vitro and GR in vivo. The paper indicates that the digestibility of starch in vitro does not predict the GR for high-fat foods.
Abstract:
This study investigates the child (L1) acquisition of properties at the interfaces of morphosyntax, syntax-semantics and syntax-pragmatics, by focusing on inflected infinitives in European Portuguese (EP). Three child groups were tested, 6–7-year-olds, 9–10-year-olds and 11–12-year-olds, as well as an adult control group. The data demonstrate that children as young as 6 have knowledge of the morpho-syntactic properties of inflected infinitives, although they seem at first glance to show partially insufficient knowledge of their syntax–semantic interface properties (i.e. non-obligatory control properties), differently from children aged 9 and older, who show clearer evidence of knowledge of both types of properties. However, in general, both morpho-syntactic and syntax–semantics interface properties are also accessible to 6–7-year-old children, although these children give preference to a range of interpretations partially different from the adults; in certain cases, they may not appeal to certain pragmatic inferences that permit additional interpretations to adults and older children. Crucially, our data demonstrate that EP children master the two types of properties of inflected infinitives years before Brazilian Portuguese children do (Pires and Rothman, 2009a,b), reasons for and implications of which we discuss in detail.