906 resultados para display rules
Resumo:
The Distributed Rule Induction (DRI) project at the University of Portsmouth is concerned with distributed data mining algorithms for automatically generating rules of all kinds. In this paper we present a system architecture and its implementation for inducing modular classification rules in parallel in a local area network using a distributed blackboard system. We present initial results of a prototype implementation based on the Prism algorithm.
Resumo:
Inducing rules from very large datasets is one of the most challenging areas in data mining. Several approaches exist to scaling up classification rule induction to large datasets, namely data reduction and the parallelisation of classification rule induction algorithms. In the area of parallelisation of classification rule induction algorithms most of the work has been concentrated on the Top Down Induction of Decision Trees (TDIDT), also known as the ‘divide and conquer’ approach. However powerful alternative algorithms exist that induce modular rules. Most of these alternative algorithms follow the ‘separate and conquer’ approach of inducing rules, but very little work has been done to make the ‘separate and conquer’ approach scale better on large training data. This paper examines the potential of the recently developed blackboard based J-PMCRI methodology for parallelising modular classification rule induction algorithms that follow the ‘separate and conquer’ approach. A concrete implementation of the methodology is evaluated empirically on very large datasets.
Resumo:
The Prism family of algorithms induces modular classification rules which, in contrast to decision tree induction algorithms, do not necessarily fit together into a decision tree structure. Classifiers induced by Prism algorithms achieve a comparable accuracy compared with decision trees and in some cases even outperform decision trees. Both kinds of algorithms tend to overfit on large and noisy datasets and this has led to the development of pruning methods. Pruning methods use various metrics to truncate decision trees or to eliminate whole rules or single rule terms from a Prism rule set. For decision trees many pre-pruning and postpruning methods exist, however for Prism algorithms only one pre-pruning method has been developed, J-pruning. Recent work with Prism algorithms examined J-pruning in the context of very large datasets and found that the current method does not use its full potential. This paper revisits the J-pruning method for the Prism family of algorithms and develops a new pruning method Jmax-pruning, discusses it in theoretical terms and evaluates it empirically.
Resumo:
The Prism family of algorithms induces modular classification rules in contrast to the Top Down Induction of Decision Trees (TDIDT) approach which induces classification rules in the intermediate form of a tree structure. Both approaches achieve a comparable classification accuracy. However in some cases Prism outperforms TDIDT. For both approaches pre-pruning facilities have been developed in order to prevent the induced classifiers from overfitting on noisy datasets, by cutting rule terms or whole rules or by truncating decision trees according to certain metrics. There have been many pre-pruning mechanisms developed for the TDIDT approach, but for the Prism family the only existing pre-pruning facility is J-pruning. J-pruning not only works on Prism algorithms but also on TDIDT. Although it has been shown that J-pruning produces good results, this work points out that J-pruning does not use its full potential. The original J-pruning facility is examined and the use of a new pre-pruning facility, called Jmax-pruning, is proposed and evaluated empirically. A possible pre-pruning facility for TDIDT based on Jmax-pruning is also discussed.
Resumo:
In order to gain knowledge from large databases, scalable data mining technologies are needed. Data are captured on a large scale and thus databases are increasing at a fast pace. This leads to the utilisation of parallel computing technologies in order to cope with large amounts of data. In the area of classification rule induction, parallelisation of classification rules has focused on the divide and conquer approach, also known as the Top Down Induction of Decision Trees (TDIDT). An alternative approach to classification rule induction is separate and conquer which has only recently been in the focus of parallelisation. This work introduces and evaluates empirically a framework for the parallel induction of classification rules, generated by members of the Prism family of algorithms. All members of the Prism family of algorithms follow the separate and conquer approach.
Resumo:
In addition to the expression of recombinant proteins, baculoviruses have been developed as a platform for the display of complex eukaryotic proteins on the surface of virus particles or infected insect cells. Surface display has been used extensively for antigen presentation and targeted gene delivery but is also a candidate for the display of protein libraries for molecular screening. However, although baculovirus gene libraries can be efficiently expressed and displayed on the surface of insect cells, target gene selection is inefficient probably due to super-infection which gives rise to cells expressing more than one protein. In this report baculovirus superinfection of Sf9 cells has been investigated by the use of two recombinant multiple nucleopolyhedrovirus carrying green or red fluorescent proteins under the control of both early and late promoters (vAcBacGFP and vAcBacDsRed). The reporter gene expression was detected 8 hours after the infection of vAcBacGFP and cells in early and late phases of infection could be distinguished by the fluorescence intensity of the expressed protein. Simultaneous infection with vAcBacGFP and vAcBacDsRed viruses each at 0.5 MOI resulted in 80% of infected cells coexpressing the two fluorescent proteins at 48 hours post infection (hpi), and subsequent infection with the two viruses resulted in similar co-infection rate. Most Sf9 cells were re-infectable within the first several hours post infection, but the reinfection rate then decreased to a very low level by 16 hpi. Our data demonstrate that Sf9 cells were easily super-infectable during baculovirus infection, and super-infection could occur simultaneously at the time of the primary infection or subsequently during secondary infection by progeny viruses. The efficiency of super-infection may explain the difficulties of baculovirus display library screening but would benefit the production of complex proteins requiring co-expression of multiple polypeptides.
Resumo:
There are several scoring rules that one can choose from in order to score probabilistic forecasting models or estimate model parameters. Whilst it is generally agreed that proper scoring rules are preferable, there is no clear criterion for preferring one proper scoring rule above another. This manuscript compares and contrasts some commonly used proper scoring rules and provides guidance on scoring rule selection. In particular, it is shown that the logarithmic scoring rule prefers erring with more uncertainty, the spherical scoring rule prefers erring with lower uncertainty, whereas the other scoring rules are indifferent to either option.
Resumo:
Infrared polarization and intensity imagery provide complementary and discriminative information in image understanding and interpretation. In this paper, a novel fusion method is proposed by effectively merging the information with various combination rules. It makes use of both low-frequency and highfrequency images components from support value transform (SVT), and applies fuzzy logic in the combination process. Images (both infrared polarization and intensity images) to be fused are firstly decomposed into low-frequency component images and support value image sequences by the SVT. Then the low-frequency component images are combined using a fuzzy combination rule blending three sub-combination methods of (1) region feature maximum, (2) region feature weighting average, and (3) pixel value maximum; and the support value image sequences are merged using a fuzzy combination rule fusing two sub-combination methods of (1) pixel energy maximum and (2) region feature weighting. With the variables of two newly defined features, i.e. the low-frequency difference feature for low-frequency component images and the support-value difference feature for support value image sequences, trapezoidal membership functions are proposed and developed in tuning the fuzzy fusion process. Finally the fused image is obtained by inverse SVT operations. Experimental results of visual inspection and quantitative evaluation both indicate the superiority of the proposed method to its counterparts in image fusion of infrared polarization and intensity images.
Resumo:
Native-like use of preterit and imperfect morphology in all contexts by English learners of L2 Spanish is the exception rather than the rule, even for successful learners. Nevertheless, recent research has demonstrated that advanced English learners of L2 Spanish attain a native-like morphosyntactic competence for the preterit/imperfect contrast, as evidenced by their native-like knowledge of associated semantic entailments (Goodin-Mayeda and Rothman 2007, Montrul and Slabakova 2003, Slabakova and Montrul 2003, Rothman and Iverson 2007). In addition to an L2 disassociation of morphology and syntax (e.g., Bruhn de Garavito 2003, Lardiere 1998, 2000, 2005, Prévost and White 1999, 2000, Schwartz 2003), I hypothesize that a system of learned pedagogical rules contributes to target-deviant L2 performance in this domain through the most advanced stages of L2 acquisition via its competition with the generative system. I call this hypothesis the Competing Systems Hypothesis. To test its predictions, I compare and contrast the use of the preterit and imperfect in two production tasks by native, tutored (classroom), and naturalistic learners of L2 Spanish.
Resumo:
Forgetting immediate physical reality and having awareness of one�s location in the simulated world is critical to enjoyment and performance in virtual environments be it an interactive 3D game such as Quake or an online virtual 3d community space such as Second Life. Answer to the question "where am I?" at two levels, whether the locus is in the immediate real world as opposed to the virtual world and whether one is aware of the spatial co-ordinates of that locus, hold the key to any virtual 3D experience. While 3D environments, especially virtual environments and their impact on spatial comprehension has been studied in disciplines such as architecture, it is difficult to determine the relative contributions of specific attributes such as screen size or stereoscopy towards spatial comprehension since most of them treat the technology as monolith (box-centered). Using a variable-centered approach put forth by Nass and Mason (1990) which breaks down the technology into its component variables and their corresponding values as its theoretical basis, this paper looks at the contributions of five variables (Stereoscopy, screen size, field of view, level of realism and level of detail) common to most virtual environments on spatial comprehension and presence. The variable centered approach can be daunting as the increase in the number of variables can exponentially increase the number of conditions and resources required. We overcome this drawback posed by adoption of such a theoretical approach by the use of a fractional factorial design for the experiment. This study has completed the first wave of data collection and starting the next phase in January 2007 and expected to complete by February 2007. Theoretical and practical implications of the study are discussed.
Resumo:
The experience of learning and using a second language (L2) has been shown to affect the grey matter (GM) structure of the brain. Importantly, GM density in several cortical and subcortical areas has been shown to be related to performance in L2 tasks. Here we show that bilingualism can lead to increased GM volume in the cerebellum, a structure that has been related to the processing of grammatical rules. Additionally, the cerebellar GM volume of highly proficient L2 speakers is correlated to their performance in a task tapping on grammatical processing in a L2, demonstrating the importance of the cerebellum for the establishment and use of grammatical rules in a L2.