35 results for modular belt
Abstract:
In a world where massive amounts of data are recorded on a large scale, we need data mining technologies to gain knowledge from the data in a reasonable time. The Top Down Induction of Decision Trees (TDIDT) algorithm is a very widely used technology to predict the classification of newly recorded data. However, alternative technologies have been derived that often produce better rules but do not scale well on large datasets. One such alternative to TDIDT is the PrismTCS algorithm. PrismTCS performs particularly well on noisy data but does not scale well on large datasets. In this paper we introduce Prism and investigate its scaling behaviour. We describe how we improved the scalability of the serial version of Prism and investigate its limitations. We then describe our work to overcome these limitations by developing a framework to parallelise algorithms of the Prism family and similar algorithms. We also present the scale-up results of a first prototype implementation.
Abstract:
The Distributed Rule Induction (DRI) project at the University of Portsmouth is concerned with distributed data mining algorithms for automatically generating rules of all kinds. In this paper we present a system architecture and its implementation for inducing modular classification rules in parallel in a local area network using a distributed blackboard system. We present initial results of a prototype implementation based on the Prism algorithm.
Abstract:
Induction of classification rules is one of the most important technologies in data mining. Most of the work in this field has concentrated on the Top Down Induction of Decision Trees (TDIDT) approach. However, alternative approaches have been developed such as the Prism algorithm for inducing modular rules. Prism often produces qualitatively better rules than TDIDT but suffers from higher computational requirements. We investigate approaches that have been developed to minimize the computational requirements of TDIDT, in order to find analogous approaches that could reduce the computational requirements of Prism.
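The abstract above names Prism without describing it. As a rough illustration of how modular rules can be induced, here is a minimal sketch of a basic Prism-style loop. It assumes categorical attributes and no pruning, and the function name `prism` and the toy `weather` data are hypothetical. It is not the authors' implementation.

```python
def prism(instances, target):
    """Induce modular rules with a basic Prism-style loop.

    instances: list of dicts mapping attribute name -> categorical value.
    target: name of the class attribute.
    Returns a list of (conditions, class_label) pairs; conditions is a dict.
    """
    attributes = [a for a in instances[0] if a != target]
    rules = []
    for label in sorted({row[target] for row in instances}):
        remaining = list(instances)  # every class starts from the full training set
        while any(row[target] == label for row in remaining):
            covered, conditions = remaining, {}
            # Specialise the rule until it covers only `label`
            # or every attribute has been used.
            while (any(r[target] != label for r in covered)
                   and len(conditions) < len(attributes)):
                best = None
                for attr in attributes:
                    if attr in conditions:
                        continue
                    for value in sorted({r[attr] for r in covered}):
                        subset = [r for r in covered if r[attr] == value]
                        positives = sum(r[target] == label for r in subset)
                        # Highest p(label | attr = value); ties go to larger coverage.
                        score = (positives / len(subset), positives)
                        if best is None or score > best[0]:
                            best = (score, attr, value)
                _, attr, value = best
                conditions[attr] = value
                covered = [r for r in covered if r[attr] == value]
            rules.append((dict(conditions), label))
            # Remove the instances matched by the new rule before inducing the next one.
            remaining = [r for r in remaining
                         if not all(r[a] == v for a, v in conditions.items())]
    return rules


# Hypothetical toy data, for illustration only.
weather = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
rules = prism(weather, "play")
```

Each rule is induced independently of the others, so the resulting rule set need not fit into a single tree.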
Abstract:
Inducing rules from very large datasets is one of the most challenging areas in data mining. Several approaches exist to scaling up classification rule induction to large datasets, namely data reduction and the parallelisation of classification rule induction algorithms. In the area of parallelisation of classification rule induction algorithms, most of the work has concentrated on the Top Down Induction of Decision Trees (TDIDT), also known as the ‘divide and conquer’ approach. However, powerful alternative algorithms exist that induce modular rules. Most of these alternative algorithms follow the ‘separate and conquer’ approach of inducing rules, but very little work has been done to make the ‘separate and conquer’ approach scale better on large training data. This paper examines the potential of the recently developed blackboard-based J-PMCRI methodology for parallelising modular classification rule induction algorithms that follow the ‘separate and conquer’ approach. A concrete implementation of the methodology is evaluated empirically on very large datasets.
Abstract:
The Prism family of algorithms induces modular classification rules which, in contrast to decision tree induction algorithms, do not necessarily fit together into a decision tree structure. Classifiers induced by Prism algorithms achieve accuracy comparable to that of decision trees and in some cases even outperform decision trees. Both kinds of algorithms tend to overfit on large and noisy datasets, and this has led to the development of pruning methods. Pruning methods use various metrics to truncate decision trees or to eliminate whole rules or single rule terms from a Prism rule set. For decision trees many pre-pruning and post-pruning methods exist; however, for Prism algorithms only one pre-pruning method has been developed, J-pruning. Recent work with Prism algorithms examined J-pruning in the context of very large datasets and found that the current method does not use its full potential. This paper revisits the J-pruning method for the Prism family of algorithms, develops a new pruning method, Jmax-pruning, discusses it in theoretical terms and evaluates it empirically.
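The abstract does not state the metric behind J-pruning. The sketch below assumes the standard J-measure of Smyth and Goodman for a rule "IF x THEN y", together with its usual upper bound. The function names and the example probabilities are hypothetical, and the pruning procedure itself is not shown.

```python
import math


def j_measure(p_x, p_y, p_y_given_x):
    """J-measure, in bits, of a rule IF x THEN y.

    p_x: probability that the rule's antecedent is satisfied.
    p_y: prior probability of the class y (must satisfy 0 < p_y < 1).
    p_y_given_x: probability of y among instances matching the antecedent.
    """
    def term(posterior, prior):
        # By convention 0 * log(0) is taken as 0.
        return 0.0 if posterior == 0 else posterior * math.log2(posterior / prior)

    return p_x * (term(p_y_given_x, p_y) + term(1 - p_y_given_x, 1 - p_y))


def j_max(p_x, p_y, p_y_given_x):
    """Upper bound on the J-measure reachable by further specialising the rule."""
    return p_x * max(
        p_y_given_x * math.log2(1 / p_y),
        (1 - p_y_given_x) * math.log2(1 / (1 - p_y)),
    )


# Hypothetical rule: covers 40% of the data, class prior 0.5, rule accuracy 0.9.
j = j_measure(0.4, 0.5, 0.9)
bound = j_max(0.4, 0.5, 0.9)
```

A rule term whose addition lowers the J-value is a natural candidate for pruning, which is the intuition this family of methods builds on.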
Abstract:
The Prism family of algorithms induces modular classification rules, in contrast to the Top Down Induction of Decision Trees (TDIDT) approach, which induces classification rules in the intermediate form of a tree structure. Both approaches achieve a comparable classification accuracy. However, in some cases Prism outperforms TDIDT. For both approaches pre-pruning facilities have been developed in order to prevent the induced classifiers from overfitting on noisy datasets, by cutting rule terms or whole rules or by truncating decision trees according to certain metrics. Many pre-pruning mechanisms have been developed for the TDIDT approach, but for the Prism family the only existing pre-pruning facility is J-pruning. J-pruning works not only on Prism algorithms but also on TDIDT. Although it has been shown that J-pruning produces good results, this work points out that J-pruning does not use its full potential. The original J-pruning facility is examined and the use of a new pre-pruning facility, called Jmax-pruning, is proposed and evaluated empirically. A possible pre-pruning facility for TDIDT based on Jmax-pruning is also discussed.
Abstract:
Advances in hardware and software in the past decade allow the capture, recording and processing of fast data streams at a large scale. The research area of data stream mining has emerged as a consequence of these advances in order to cope with the real-time analysis of potentially large and changing data streams. Examples of data streams include Google searches, credit card transactions, telemetric data and data of continuous chemical production processes. In some cases the data can be processed in batches by traditional data mining approaches. However, some applications require the data to be analysed in real time as soon as it is captured, for example when the data stream is infinite, fast changing, or simply too large to be stored. One of the most important data mining techniques on data streams is classification. This involves training the classifier on the data stream in real time and adapting it to concept drifts. Most data stream classifiers are based on decision trees. However, it is well known in the data mining community that there is no single optimal algorithm. An algorithm may work well on one or several datasets but badly on others. This paper introduces eRules, a new rule-based adaptive classifier for data streams, based on an evolving set of Rules. eRules induces a set of rules that is constantly evaluated and adapted to changes in the data stream by adding new and removing old rules. It differs from the more popular decision tree based classifiers in that it tends to leave data instances unclassified rather than forcing a classification that could be wrong. The ongoing development of eRules aims to improve its accuracy further through dynamic parameter setting, which will also address the problem of changing feature domain values.
Abstract:
This study investigates the child (L1) acquisition of properties at the interfaces of morpho-syntax, syntax-semantics and syntax-pragmatics, by focusing on inflected infinitives in European Portuguese (EP). Three child groups were tested, 6–7-year-olds, 9–10-year-olds and 11–12-year-olds, as well as an adult control group. The data demonstrate that children as young as 6 have knowledge of the morpho-syntactic properties of inflected infinitives, although they seem at first glance to show partially insufficient knowledge of their syntax–semantics interface properties (i.e. non-obligatory control properties), unlike children aged 9 and older, who show clearer evidence of knowledge of both types of properties. However, in general, both morpho-syntactic and syntax–semantics interface properties are also accessible to 6–7-year-old children, although these children give preference to a range of interpretations partially different from that of the adults; in certain cases, they may not appeal to certain pragmatic inferences that permit additional interpretations to adults and older children. Crucially, our data demonstrate that EP children master the two types of properties of inflected infinitives years before Brazilian Portuguese children do (Pires and Rothman, 2009a, 2009b); we discuss the reasons for and implications of this in detail.
Abstract:
The warm conveyor belt (WCB) of an extratropical cyclone generally splits into two branches. One branch (WCB1) turns anticyclonically into the downstream upper-level tropospheric ridge, while the second branch (WCB2) wraps cyclonically around the cyclone centre. Here, the WCB split in a typical North Atlantic cold-season cyclone is analysed using two numerical models: the Met Office Unified Model and the COSMO model. The WCB flow is defined using off-line trajectory analysis. The two models represent the WCB split consistently. The split occurs early in the evolution of the WCB with WCB1 experiencing maximum ascent at lower latitudes and with higher moisture content than WCB2. WCB1 ascends abruptly along the cold front where the resolved ascent rates are greatest and there is also line convection. In contrast, WCB2 remains at lower levels for longer before undergoing saturated large-scale ascent over the system's warm front. The greater moisture in WCB1 inflow results in greater net potential temperature change from latent heat release, which determines the final isentropic level of each branch. WCB1 also exhibits lower outflow potential vorticity values than WCB2. Complementary diagnostics in the two models are utilised to study the influence of individual diabatic processes on the WCB. Total diabatic heating rates along the WCB branches are comparable in the two models with microphysical processes in the large-scale cloud schemes being the major contributor to this heating. However, the different convective parameterisation schemes used by the models cause significantly different contributions to the total heating. These results have implications for studies on the influence of the WCB outflow in Rossby wave evolution and breaking. 
Key aspects are the net potential temperature change and the isentropic level of the outflow which together will influence the relative mass going into each WCB branch and the associated negative PV anomalies at the tropopause-level flow.
Abstract:
Strong winds equatorwards and rearwards of a cyclone core have often been associated with two phenomena, the cold conveyor belt (CCB) jet and sting jets. Here, detailed observations of the mesoscale structure in this region of an intense cyclone are analysed. The in-situ and dropsonde observations were obtained during two research flights through the cyclone during the DIAMET (DIAbatic influences on Mesoscale structures in ExTratropical storms) field campaign. A numerical weather prediction model is used to link the strong wind regions with three types of “air streams”, or coherent ensembles of trajectories: two types are identified with the CCB, hooking around the cyclone center, while the third is identified with a sting jet, descending from the cloud head to the west of the cyclone. Chemical tracer observations show for the first time that the CCB and sting jet air streams are distinct air masses even when the associated low-level wind maxima are not spatially distinct. In the model, the CCB experiences slow latent heating through weak resolved ascent and convection, while the sting jet experiences weak cooling associated with microphysics during its subsaturated descent. Diagnosis of mesoscale instabilities in the model shows that the CCB passes through largely stable regions, while the sting jet spends relatively long periods in locations characterized by conditional symmetric instability (CSI). The relation of CSI to the observed mesoscale structure of the bent-back front and its possible role in the cloud banding is discussed.
Abstract:
We analyse the widely used international/Zürich sunspot number record, $R$, with a view to quantifying a suspected calibration discontinuity around 1945 (which has been termed the “Waldmeier discontinuity” [Svalgaard, 2011]). We compare $R$ against the composite sunspot group data from the Royal Greenwich Observatory (RGO) network and the Solar Optical Observing Network (SOON), using both the number of sunspot groups, $N_G$, and the total area of the sunspots, $A_G$. In addition, we compare $R$ with the recently developed interdiurnal variability geomagnetic indices IDV and IDV(1d). In all four cases, linearity of the relationship with $R$ is not assumed and care is taken to ensure that the relationship of each with $R$ is the same before and after the putative calibration change. It is shown that the probability that a correction is not needed is of order $10^{-8}$ and that $R$ is indeed too low before 1945. The optimum correction to $R$ for values before 1945 is found to be 11.6%, 11.7%, 10.3% and 7.9% using $A_G$, $N_G$, IDV, and IDV(1d), respectively. The optimum value obtained by combining the sunspot group data is 11.6% with an uncertainty range 8.1-14.8% at the 2σ level. The geomagnetic indices provide an independent yet less stringent test but do give values that fall within the 2σ uncertainty band, with optimum values that are slightly lower than those from the sunspot group data. The probability of the correction needed being as large as 20%, as advocated by Svalgaard [2011], is shown to be $1.6 \times 10^{-5}$.
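As a worked illustration of the arithmetic, the sketch below applies the 11.6% figure as a multiplicative increase to values before 1945. The multiplicative form, the treatment of 1945 itself as uncorrected, and the function name are assumptions for illustration, not details given in the abstract.

```python
def correct_sunspot_number(year, r, correction=0.116, break_year=1945):
    """Scale up R for years before the assumed discontinuity.

    correction: fractional increase (0.116 is the 11.6% optimum quoted above).
    break_year: first year left uncorrected (assumption: 1945 itself is unchanged).
    """
    return r * (1.0 + correction) if year < break_year else r


# Hypothetical values: R = 100 in 1940 is raised by 11.6%; R = 100 in 1950 is unchanged.
before = correct_sunspot_number(1940, 100.0)
after = correct_sunspot_number(1950, 100.0)
```

Passing `correction=0.081` or `correction=0.148` would give the ends of the quoted 2σ uncertainty range under the same assumptions.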
Abstract:
We investigate the relationship between interdiurnal variation geomagnetic activity indices, IDV and IDV(1d), corrected sunspot number, $R_C$, and the group sunspot number $R_G$. $R_C$ uses corrections for both the “Waldmeier discontinuity”, as derived in Paper 1 [Lockwood et al., 2014c], and the “Wolf discontinuity” revealed by Leussu et al. [2013]. We show that the simple correlation of the geomagnetic indices with $R_C^n$ or $R_G^n$ masks a considerable solar cycle variation. Using IDV(1d) or IDV to predict or evaluate the sunspot numbers, the errors are almost halved by allowing for the fact that the relationship varies over the solar cycle. The results indicate that differences between $R_C$ and $R_G$ have a variety of causes and are highly unlikely to be attributable to errors in $R_G$ alone, as has recently been assumed. Because it is not known if $R_C$ or $R_G$ is a better predictor of open flux emergence before 1874, a simple sunspot number composite is suggested which, like $R_G$, enables modelling of the open solar flux for 1610 onwards in Paper 3, but maintains the characteristics of $R_C$.
Abstract:
From the variation of near-Earth interplanetary conditions, reconstructed for the mid-19th century to the present day using historic geomagnetic activity observations, Lockwood and Owens [2014] have suggested that Earth remains within a broadened streamer belt during solar cycles when the Open Solar Flux (OSF) is low. From this they propose that the Earth was immersed in almost constant slow solar wind during the Maunder minimum (c. 1650-1710). In this paper, we extend continuity modelling of the OSF to predict the streamer belt width using both group sunspot numbers and corrected international sunspot numbers to quantify the emergence rate of new OSF. The results support the idea that the solar wind at Earth was persistently slow during the Maunder minimum because the streamer belt was broad.
Abstract:
The use of three orthogonally tagged phosphine reagents to assist chemical work-up via phase-switch scavenging in conjunction with a modular flow reactor is described. These techniques (acidic, basic and Click chemistry) are used to prepare various amides and tri-substituted guanidines from in situ generated iminophosphoranes.
Abstract:
The cycloaddition of acetylenes with azides to give the corresponding 1,4-disubstituted 1,2,3-triazoles is reported using immobilised reagents and scavengers in pre-packed glass tubes in a modular flow reactor.