27 results for Prism Yearbooks
Abstract:
In a world where massive amounts of data are recorded on a large scale, we need data mining technologies to gain knowledge from the data in a reasonable time. The Top Down Induction of Decision Trees (TDIDT) algorithm is a very widely used technology to predict the classification of newly recorded data. However, alternative technologies have been derived that often produce better rules but do not scale well on large datasets. Such an alternative to TDIDT is the PrismTCS algorithm. PrismTCS performs particularly well on noisy data but does not scale well on large datasets. In this paper we introduce Prism and investigate its scaling behaviour. We describe how we improved the scalability of the serial version of Prism and investigate its limitations. We then describe our work to overcome these limitations by developing a framework to parallelise algorithms of the Prism family and similar algorithms. We also present the scale-up results of a first prototype implementation.
Abstract:
The Distributed Rule Induction (DRI) project at the University of Portsmouth is concerned with distributed data mining algorithms for automatically generating rules of all kinds. In this paper we present a system architecture and its implementation for inducing modular classification rules in parallel in a local area network using a distributed blackboard system. We present initial results of a prototype implementation based on the Prism algorithm.
Abstract:
In a world where data is captured on a large scale, the major challenge for data mining algorithms is to be able to scale up to large datasets. There are two main approaches to inducing classification rules: one is the divide and conquer approach, also known as the top down induction of decision trees; the other is the separate and conquer approach. A considerable amount of work has been done on scaling up the divide and conquer approach. However, very little work has been conducted on scaling up the separate and conquer approach. In this work we describe a parallel framework that allows the parallelisation of a certain family of separate and conquer algorithms, the Prism family. Parallelisation helps the Prism family of algorithms to harvest additional computer resources in a network of computers in order to make the induction of classification rules scale better on large datasets. Our framework also incorporates a pre-pruning facility for parallel Prism algorithms.
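To illustrate the separate and conquer approach named in this abstract, the following is a minimal Python sketch of a basic, serial Prism-style rule inducer. It is not taken from the paper: the function name `prism`, the dict-based row format, the toy `weather` data and the tie-breaking rule are assumptions made for illustration, and the pre-pruning and parallelisation that the abstract describes are left out.

```python
def prism(rows, attributes, target):
    """Induce modular rules as (conditions, class) pairs, one class at a time."""
    rules = []
    for cls in sorted({r[target] for r in rows}):
        remaining = list(rows)  # every class starts again from the full training set
        while any(r[target] == cls for r in remaining):
            covered, conditions = remaining, {}
            # Conquer: add rule terms until the rule covers only `cls`
            # or no unused attribute is left.
            while (any(r[target] != cls for r in covered)
                   and len(conditions) < len(attributes)):
                best = None
                for a in attributes:
                    if a in conditions:
                        continue
                    for v in sorted({r[a] for r in covered}):
                        subset = [r for r in covered if r[a] == v]
                        pos = sum(r[target] == cls for r in subset)
                        # Highest p(cls | a = v); ties go to the larger coverage.
                        score = (pos / len(subset), pos)
                        if best is None or score > best[0]:
                            best = (score, a, v)
                _, a, v = best
                conditions[a] = v
                covered = [r for r in covered if r[a] == v]
            rules.append((conditions, cls))
            # Separate: drop every instance the finished rule covers.
            remaining = [r for r in remaining
                         if not all(r[a] == v for a, v in conditions.items())]
    return rules


# Hypothetical toy data, used only to exercise the sketch.
weather = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
]
rules = prism(weather, ["outlook", "windy"], "play")
```

On this toy data each class is described by a single one-term rule on `windy`, and the two rules stand alone rather than forming branches of one tree.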
Abstract:
Induction of classification rules is one of the most important technologies in data mining. Most of the work in this field has concentrated on the Top Down Induction of Decision Trees (TDIDT) approach. However, alternative approaches have been developed such as the Prism algorithm for inducing modular rules. Prism often produces qualitatively better rules than TDIDT but suffers from higher computational requirements. We investigate approaches that have been developed to minimize the computational requirements of TDIDT, in order to find analogous approaches that could reduce the computational requirements of Prism.
Abstract:
The Prism family of algorithms induces modular classification rules which, in contrast to decision tree induction algorithms, do not necessarily fit together into a decision tree structure. Classifiers induced by Prism algorithms achieve accuracy comparable to that of decision trees and in some cases even outperform decision trees. Both kinds of algorithms tend to overfit on large and noisy datasets, and this has led to the development of pruning methods. Pruning methods use various metrics to truncate decision trees or to eliminate whole rules or single rule terms from a Prism rule set. For decision trees many pre-pruning and post-pruning methods exist; however, for Prism algorithms only one pre-pruning method has been developed, J-pruning. Recent work with Prism algorithms examined J-pruning in the context of very large datasets and found that the current method does not use its full potential. This paper revisits the J-pruning method for the Prism family of algorithms and develops a new pruning method, Jmax-pruning, discusses it in theoretical terms and evaluates it empirically.
Abstract:
The Prism family of algorithms induces modular classification rules, in contrast to the Top Down Induction of Decision Trees (TDIDT) approach, which induces classification rules in the intermediate form of a tree structure. Both approaches achieve a comparable classification accuracy. However, in some cases Prism outperforms TDIDT. For both approaches pre-pruning facilities have been developed in order to prevent the induced classifiers from overfitting on noisy datasets, by cutting rule terms or whole rules or by truncating decision trees according to certain metrics. There have been many pre-pruning mechanisms developed for the TDIDT approach, but for the Prism family the only existing pre-pruning facility is J-pruning. J-pruning not only works on Prism algorithms but also on TDIDT. Although it has been shown that J-pruning produces good results, this work points out that J-pruning does not use its full potential. The original J-pruning facility is examined and the use of a new pre-pruning facility, called Jmax-pruning, is proposed and evaluated empirically. A possible pre-pruning facility for TDIDT based on Jmax-pruning is also discussed.
Abstract:
In order to gain knowledge from large databases, scalable data mining technologies are needed. Data are captured on a large scale and thus databases are increasing at a fast pace. This leads to the utilisation of parallel computing technologies in order to cope with large amounts of data. In the area of classification rule induction, parallelisation of classification rules has focused on the divide and conquer approach, also known as the Top Down Induction of Decision Trees (TDIDT). An alternative approach to classification rule induction is separate and conquer which has only recently been in the focus of parallelisation. This work introduces and evaluates empirically a framework for the parallel induction of classification rules, generated by members of the Prism family of algorithms. All members of the Prism family of algorithms follow the separate and conquer approach.
Abstract:
Construction professional services (CPSs), such as architecture, engineering, and consultancy, are not only high value-added profit centers in their own right but also have a knock-on effect on other businesses, such as construction and the export of materials and machinery. Arguably, competition in the international construction market has shifted to these knowledge-intensive CPS areas. Yet CPSs represent a research frontier that has received scant attention. This research aims to enrich the body of knowledge on CPSs by examining strengths, weaknesses, opportunities, and threats (SWOT) of Chinese CPSs (CCPSs) in the international context. It does so by triangulating theories with quantitative and qualitative data gleaned from yearbooks, annual reports, interviews, seminars, and interactions with managers in major CCPS companies. It is found that CCPSs present both strengths and weaknesses in talents, administration systems, and development strategies in dealing with the external opportunities and threats brought about by globalization and market evolution. Low price, which has helped the Chinese construction business to succeed in the international market, is also a major CCPS strength. An opportunity for CCPSs is the relatively strong delivery capability possessed by Chinese contractors; by partnering with them CCPSs can better establish themselves in the international arena. This is probably the first ever comprehensive study on the performance of CCPSs in the international marketplace. The research is conducted at an opportune time, particularly when the world is witnessing the burgeoning force of Chinese businesses in many areas including manufacturing, construction, and, potentially, professional services. It adds new insights to the knowledge body of CPSs and provides valuable references to other countries faced with the challenge of developing CPS business efficiently in the international market.
Abstract:
In contrast to their bustling construction counterparts, Chinese construction professional services (CPS) such as architecture, engineering, and consultancy seem still to be stagnant in the international market. CPS are not only high value-added profit centers in their own right, but also have a knock-on effect on subsequent businesses such as construction and the export of materials and machinery. Arguably, competition in the international construction market has shifted to knowledge-intensive CPS. Yet CPS represent a research area that has been paid scant attention. This research aims to add to the body of knowledge of CPS by examining strengths, weaknesses, opportunities, and threats (SWOT) of Chinese CPS (CCPS) in the international context. It does so by triangulating theories with quantitative and qualitative data gleaned from yearbooks, annual reports, interviews, seminars, and interactions with managers in major CCPS companies. It is found that CCPS present both strengths and weaknesses in talents, administration systems, and development strategies in dealing with the external opportunities and threats brought about by globalization and market evolution. Low price, which has helped the Chinese construction business to succeed in the international market, is also a major CCPS strength. An opportunity for CCPS is the relatively strong delivery capability possessed by Chinese contractors. By partnering with them CCPS can better edge into the international arena. This is probably the first ever comprehensive study investigating the performance of CCPS in the international market. The research is also timely, particularly when the world is witnessing the burgeoning force of Chinese businesses in many areas including manufacturing, construction, and, potentially, professional services.
Abstract:
Prism is a modular classification rule generation method based on the ‘separate and conquer’ approach, an alternative to the rule induction approach using decision trees, also known as ‘divide and conquer’. Prism often achieves a similar level of classification accuracy compared with decision trees, but tends to produce a more compact, noise-tolerant set of classification rules. As with other classification rule generation methods, a principal problem arising with Prism is that of overfitting due to over-specialised rules. In addition, over-specialised rules increase the associated computational complexity. These problems can be solved by pruning methods. For the Prism method, two pruning algorithms have been introduced recently for reducing overfitting of classification rules - J-pruning and Jmax-pruning. Both algorithms are based on the J-measure, an information-theoretic means for quantifying the theoretical information content of a rule. Jmax-pruning attempts to exploit the J-measure to its full potential because J-pruning does not actually achieve this and may even lead to underfitting. A series of experiments have proved that Jmax-pruning may outperform J-pruning in reducing overfitting. However, Jmax-pruning is computationally relatively expensive and may also lead to underfitting. This paper reviews the Prism method and the two existing pruning algorithms above. It also proposes a novel pruning algorithm called Jmid-pruning. The latter is based on the J-measure and it reduces overfitting to a similar level as the other two algorithms but is better in avoiding underfitting and unnecessary computational effort. The authors conduct an experimental study on the performance of the Jmid-pruning algorithm in terms of classification accuracy and computational efficiency. The algorithm is also evaluated comparatively with the J-pruning and Jmax-pruning algorithms.
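As a rough illustration of the J-measure mentioned in this abstract, here is a minimal Python sketch. It assumes the usual definition for a rule "IF Y = y THEN X = x": the probability that the rule fires, p(y), multiplied by the cross-entropy between the posterior p(x|y) and the prior p(x), in bits. The function name `j_measure` and its parameter names are hypothetical, the abstract itself does not give the formula, and the sketch assumes 0 < p(x) < 1.

```python
import math


def j_measure(p_y, p_x, p_x_given_y):
    """Information content (in bits) of the rule IF Y = y THEN X = x.

    p_y          probability that the rule antecedent fires
    p_x          prior probability of the consequent (assumed 0 < p_x < 1)
    p_x_given_y  probability of the consequent when the antecedent fires
    """
    def term(posterior, prior):
        # By convention 0 * log(0) is taken to be 0.
        return 0.0 if posterior == 0 else posterior * math.log2(posterior / prior)

    j = term(p_x_given_y, p_x) + term(1 - p_x_given_y, 1 - p_x)
    return p_y * j


# A rule that fires on half the data and turns a 50/50 prior into certainty.
informative = j_measure(p_y=0.5, p_x=0.5, p_x_given_y=1.0)
# A rule whose posterior equals the prior carries no information.
uninformative = j_measure(p_y=0.5, p_x=0.5, p_x_given_y=0.5)
```

Under this definition a rule term is worth keeping only if it raises the value, which is the quantity the pruning methods named in the abstract track while a rule is being specialised.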
Abstract:
Arches, streamers, polar lights, merry dancers… just a few of many names used to describe the aurora borealis in historical documents in the UK. We have compiled a new catalogue of 20591 independent reports of auroral sightings from the British Isles and Ireland for 1700–1975 using observatory yearbooks, the diaries of amateur observers, newspaper reports and the scientific literature. Our aim is to provide an independent data series that can aid understanding of long-term solar variability, alongside cosmogenic isotope data and historic records of geomagnetic activity and sunspots.
Abstract:
Dual-polarisation radar measurements provide valuable information about the shapes and orientations of atmospheric ice particles. For quantitative interpretation of these data in the Rayleigh regime, common practice is to approximate the true ice crystal shape with that of a spheroid. Calculations using the discrete dipole approximation for a wide range of crystal aspect ratios demonstrate that approximating hexagonal plates as spheroids leads to significant errors in the predicted differential reflectivity, by as much as 1.5 dB. An empirical modification of the shape factors in Gans's spheroid theory was made using the numerical data. The resulting simple expressions, like Gans's theory, can be applied to crystals in any desired orientation, illuminated by an arbitrarily polarised wave, but are much more accurate for hexagonal particles. Calculations of the scattering from more complex branched and dendritic crystals indicate that these may be accurately modelled using the new expression, but with a reduced permittivity dependent on the volume of ice relative to an enclosing hexagonal prism.