17 results for knowledge classification
in CentAUR: Central Archive, University of Reading - UK
Abstract:
Site-specific management requires accurate knowledge of the spatial variation in a range of soil properties within fields. This involves considerable sampling effort, which is costly. Ancillary data, such as crop yield, elevation and apparent electrical conductivity (ECa) of the soil, can provide insight into the spatial variation of some soil properties. A multivariate classification with spatial constraint imposed by the variogram was used to classify data from two arable crop fields. The yield data comprised 5 years of crop yield, and the ancillary data 3 years of yield data, elevation and ECa. Information on soil chemical and physical properties was provided by intensive surveys of the soil. Multivariate variograms computed from these data were used to constrain sites spatially within classes to increase their contiguity. The constrained classifications resulted in coherent classes, and those based on the ancillary data were similar to those from the soil properties. The ancillary data seemed to identify areas in the field where the soil is reasonably homogeneous. The results of targeted sampling showed that these classes could be used as a basis for management and to guide future sampling of the soil.
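The variogram-constrained multivariate classification described above is specialised, but its end product, grouping sites into spatially coherent management classes from ancillary variables, can be illustrated with a much simpler stand-in. The sketch below (Python; variable names, values and weights are illustrative, and this is not the authors' method) clusters standardised yield, elevation and ECa values, appending scaled coordinates so that the resulting classes tend to be contiguous.

```python
# Minimal sketch (not the variogram-constrained method of the paper): plain
# k-means on standardised ancillary variables, with spatial coordinates
# appended as extra features to encourage spatially contiguous classes.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500
# hypothetical ancillary survey: easting/northing, yield, elevation, ECa
coords = rng.uniform(0, 100, size=(n, 2))
ancillary = np.column_stack([
    rng.normal(8, 1.5, n),    # crop yield (t/ha)
    rng.normal(50, 5, n),     # elevation (m)
    rng.normal(30, 10, n),    # ECa (mS/m)
])

spatial_weight = 0.5  # larger -> more contiguous classes (illustrative)
X = np.column_stack([
    StandardScaler().fit_transform(ancillary),
    spatial_weight * StandardScaler().fit_transform(coords),
])
classes = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(classes))  # number of sites per candidate management class
```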
Abstract:
Two experiments examined the claim for distinct implicit and explicit learning modes in the artificial grammar-learning task (Reber, 1967, 1989). Subjects initially attempted to memorize strings of letters generated by a finite-state grammar and then classified new grammatical and nongrammatical strings. Experiment 1 showed that subjects' assessment of isolated parts of strings was sufficient to account for their classification performance but that the rules elicited in free report were not sufficient. Experiment 2 showed that performing a concurrent random number generation task under different priorities interfered with free report and classification performance equally. Furthermore, giving different groups of subjects incidental or intentional learning instructions did not affect classification or free report.
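For readers unfamiliar with the artificial grammar paradigm, the mechanics of the task can be sketched with a toy finite-state grammar (illustrative only; not the grammar Reber used): training strings are generated by random walks over a transition graph, and new strings count as grammatical if they trace a path from the start state to an end state.

```python
# Toy artificial-grammar sketch: generate memorisation strings from a small
# finite-state grammar and test grammaticality of new strings.
import random

# state -> list of (letter, next_state); "END" terminates a string
GRAMMAR = {
    0: [("T", 1), ("P", 2)],
    1: [("S", 1), ("X", 2)],
    2: [("V", 3), ("T", 1)],
    3: [("V", "END"), ("S", "END")],
}

def generate(max_len=10):
    """Random walk over the grammar; may be cut off before reaching END."""
    state, letters = 0, []
    while state != "END" and len(letters) < max_len:
        letter, state = random.choice(GRAMMAR[state])
        letters.append(letter)
    return "".join(letters)

def grammatical(string):
    """True if the string traces some path from the start state to END."""
    states = {0}
    for ch in string:
        states = {nxt for st in states if st != "END"
                  for letter, nxt in GRAMMAR[st] if letter == ch}
        if not states:
            return False
    return "END" in states

# memorisation set: keep only strings the grammar actually completed
training = [s for s in (generate() for _ in range(100)) if grammatical(s)][:20]
print(training[:5], grammatical("TXVV"), grammatical("XXXX"))
```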
Abstract:
Purpose - The purpose of this paper is to provide a quantitative multicriteria decision-making approach to knowledge management in construction entrepreneurship education by means of an analytic knowledge network process (KANP). Design/methodology/approach - The KANP approach in the study integrates a standard industrial classification with the analytic network process (ANP). For construction entrepreneurship education, a decision-making model named KANP.CEEM is built to apply the KANP method to the evaluation of teaching cases, in support of the case method that is widely adopted in entrepreneurship education at business schools. Findings - The study finds that there are eight clusters and 178 nodes in the KANP.CEEM model, and experimental research on the evaluation of teaching cases shows that the KANP method is effective for knowledge management in entrepreneurship education. Research limitations/implications - As experimental research, this paper ignores the concordance between the selected standard classification and others, which may limit the usefulness of the KANP.CEEM model elsewhere. Practical implications - As the KANP.CEEM model is built on standard classification codes and the embedded ANP, it is expected to have wide potential for evaluating knowledge-based teaching materials for any educational purpose related to the construction industry, and can be used by both faculty and students. Originality/value - This paper fulfils a knowledge management need and offers a practical tool for academics starting out on the development of knowledge-based teaching cases and other teaching materials, and for students working through case studies and other learning materials.
Abstract:
The artificial grammar (AG) learning literature (see, e.g., Mathews et al., 1989; Reber, 1967) has relied heavily on a single measure of implicitly acquired knowledge. Recent work comparing this measure (string classification) with a more indirect measure in which participants make liking ratings of novel stimuli (e.g., Manza & Bornstein, 1995; Newell & Bright, 2001) has shown that string classification (which we argue can be thought of as an explicit, rather than an implicit, measure of memory) gives rise to more explicit knowledge of the grammatical structure in learning strings and is more resilient to changes in surface features and processing between encoding and retrieval. We report data from two experiments that extend these findings. In Experiment 1, we showed that a divided attention manipulation (at retrieval) interfered with explicit retrieval of AG knowledge but did not interfere with implicit retrieval. In Experiment 2, we showed that forcing participants to respond within a very tight deadline resulted in the same asymmetric interference pattern between the tasks. In both experiments, we also showed that the type of information being retrieved influenced whether interference was observed. The results are discussed in terms of the relatively automatic nature of implicit retrieval and also with respect to the differences between analytic and nonanalytic processing (Whittlesea & Price, 2001).
Abstract:
This work analyzes the use of linear discriminant models, multi-layer perceptron neural networks and wavelet networks for corporate financial distress prediction. Although simple and easy to interpret, linear models require statistical assumptions that may be unrealistic. Neural networks are able to discriminate patterns that are not linearly separable, but the large number of parameters involved in a neural model often causes generalization problems. Wavelet networks are classification models that implement nonlinear discriminant surfaces as the superposition of dilated and translated versions of a single "mother wavelet" function. In this paper, an algorithm is proposed to select dilation and translation parameters that yield a wavelet network classifier with good parsimony characteristics. The models are compared in a case study involving failed and continuing British firms in the period 1997-2000. Problems associated with over-parameterized neural networks are illustrated and the Optimal Brain Damage pruning technique is employed to obtain a parsimonious neural model. The results, supported by a re-sampling study, show that both neural and wavelet networks may be a valid alternative to classical linear discriminant models.
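As a rough illustration of the model class discussed above (not the dilation/translation selection algorithm proposed in the paper), the sketch below builds a one-dimensional wavelet-network-style discriminant: a superposition of dilated and translated "Mexican hat" wavelets on a fixed grid, with output weights fitted by least squares on toy data.

```python
# Minimal wavelet-network sketch: discriminant score as a superposition of
# dilated and translated Mexican-hat wavelets; weights fitted by least squares.
import numpy as np

def mexican_hat(u):
    return (1.0 - u**2) * np.exp(-0.5 * u**2)

def design_matrix(x, translations, dilations):
    # one column per wavelet unit psi((x - t) / s), for 1-D input x
    return np.column_stack([mexican_hat((x - t) / s)
                            for t, s in zip(translations, dilations)])

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, 200)                 # a single financial ratio (toy)
y = (np.abs(x) < 1).astype(float)           # 1 = "failed", 0 = "continuing" (toy labels)

translations = np.linspace(-3, 3, 7)        # fixed grid; the paper selects these
dilations = np.full(7, 1.0)
Phi = design_matrix(x, translations, dilations)
w, *_ = np.linalg.lstsq(Phi, y - 0.5, rcond=None)

scores = Phi @ w
pred = (scores > 0).astype(float)
print("training accuracy:", (pred == y).mean())
```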
Abstract:
The SPE taxonomy of evolving software systems, first proposed by Lehman in 1980, is re-examined in this work. The primary concepts of software evolution are related to generic theories of evolution, particularly Dawkins' concept of a replicator, to the hermeneutic tradition in philosophy and to Kuhn's concept of paradigm. These concepts provide the foundations that are needed for understanding the phenomenon of software evolution and for refining the definitions of the SPE categories. In particular, this work argues that a software system should be defined as of type P if its controlling stakeholders have made a strategic decision that the system must comply with a single paradigm in its representation of domain knowledge. The proposed refinement of SPE is expected to provide a more productive basis for developing testable hypotheses and models about possible differences in the evolution of E- and P-type systems than is provided by the original scheme.
Abstract:
This paper is concerned with the selection of inputs for classification models based on ratios of measured quantities. For this purpose, all possible ratios are built from the quantities involved and variable selection techniques are used to choose a convenient subset of ratios. In this context, two selection techniques are proposed: one based on a pre-selection procedure and another based on a genetic algorithm. In an example involving the financial distress prediction of companies, the models obtained from ratios selected by the proposed techniques compare favorably to a model using ratios usually found in the financial distress literature.
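A minimal sketch of the ratio-construction step on synthetic data (the pre-selection procedure and genetic algorithm in the paper are more elaborate; the quantity names and the correlation-based ranking below are illustrative): all ordered pairwise ratios are built from the measured quantities and ranked by absolute correlation with the class label.

```python
# Build all pairwise ratios of measured quantities and rank them by absolute
# correlation with the label, as a crude pre-selection of candidate inputs.
import numpy as np
from itertools import permutations

rng = np.random.default_rng(2)
quantities = {"assets": rng.lognormal(3, 1, 300),
              "liabilities": rng.lognormal(2.5, 1, 300),
              "earnings": rng.normal(5, 2, 300)}
y = (quantities["liabilities"] > quantities["assets"]).astype(float)  # toy label

ratios = {f"{a}/{b}": quantities[a] / quantities[b]
          for a, b in permutations(quantities, 2)}

ranked = sorted(ratios,
                key=lambda name: abs(np.corrcoef(ratios[name], y)[0, 1]),
                reverse=True)
print("top candidate ratios:", ranked[:3])
```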
Abstract:
In a world where massive amounts of data are recorded on a large scale, we need data mining technologies to gain knowledge from the data in a reasonable time. The Top Down Induction of Decision Trees (TDIDT) algorithm is a very widely used technology for predicting the classification of newly recorded data. However, alternative technologies have been derived that often produce better rules but do not scale well to large datasets. One such alternative to TDIDT is the PrismTCS algorithm. PrismTCS performs particularly well on noisy data but does not scale well to large datasets. In this paper we introduce Prism and investigate its scaling behaviour. We describe how we improved the scalability of the serial version of Prism and investigate its limitations. We then describe our work to overcome these limitations by developing a framework to parallelise algorithms of the Prism family and similar algorithms. We also present the scale-up results of a first prototype implementation.
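For orientation, the sketch below implements the basic separate-and-conquer loop that Prism-family algorithms share (serial, categorical attributes only, no pruning, and none of the PrismTCS ordering or the scalability improvements discussed in the paper); the data layout is illustrative.

```python
# Minimal sketch of Prism-style separate-and-conquer rule induction.
def induce_prism_rules(instances, classes):
    """instances: list of (dict attribute->value, class_label)."""
    rules = []
    for target in classes:
        remaining = list(instances)
        while any(c == target for _, c in remaining):
            covered, rule = remaining, {}
            # specialise the rule until it covers only the target class
            while any(c != target for _, c in covered):
                best, best_prob = None, -1.0
                for attrs, _ in covered:
                    for a, v in attrs.items():
                        if a in rule:
                            continue
                        subset = [(x, c) for x, c in covered if x.get(a) == v]
                        prob = sum(c == target for _, c in subset) / len(subset)
                        if prob > best_prob:
                            best, best_prob = (a, v), prob
                if best is None:
                    break  # no attribute left to specialise on
                rule[best[0]] = best[1]
                covered = [(x, c) for x, c in covered if x.get(best[0]) == best[1]]
            rules.append((rule, target))
            # separate: remove the target-class instances covered by the rule
            remaining = [(x, c) for x, c in remaining
                         if not (c == target and
                                 all(x.get(a) == v for a, v in rule.items()))]
    return rules

data = [({"outlook": "sunny", "windy": "no"}, "play"),
        ({"outlook": "sunny", "windy": "yes"}, "stay"),
        ({"outlook": "rain", "windy": "no"}, "stay")]
for rule, label in induce_prism_rules(data, ["play", "stay"]):
    print(rule, "->", label)
```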
Abstract:
The Prism family of algorithms induces modular classification rules, in contrast to the Top Down Induction of Decision Trees (TDIDT) approach, which induces classification rules in the intermediate form of a tree structure. Both approaches achieve comparable classification accuracy; however, in some cases Prism outperforms TDIDT. For both approaches, pre-pruning facilities have been developed to prevent the induced classifiers from overfitting on noisy datasets, by cutting rule terms or whole rules, or by truncating decision trees according to certain metrics. Many pre-pruning mechanisms have been developed for the TDIDT approach, but for the Prism family the only existing pre-pruning facility is J-pruning. J-pruning works not only on Prism algorithms but also on TDIDT. Although it has been shown that J-pruning produces good results, this work points out that J-pruning does not exploit its full potential. The original J-pruning facility is examined and the use of a new pre-pruning facility, called Jmax-pruning, is proposed and evaluated empirically. A possible pre-pruning facility for TDIDT based on Jmax-pruning is also discussed.
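J-pruning (and the Jmax-pruning proposed here) monitor the information-theoretic J-measure of Smyth and Goodman while a rule is being specialised. A minimal sketch of that calculation follows, with probabilities that would in practice be estimated from the training instances covered by the rule.

```python
# Sketch of the J-measure monitored during rule specialisation.
import math

def j_measure(p_cond, p_class_given_cond, p_class):
    """J = p(cond) * j(class; cond), Smyth & Goodman's information content."""
    def term(p, q):
        return 0.0 if p == 0 else p * math.log2(p / q)
    j = (term(p_class_given_cond, p_class)
         + term(1 - p_class_given_cond, 1 - p_class))
    return p_cond * j

# e.g. a rule covering 20% of the data with 90% accuracy for a class
# whose prior probability is 50%:
print(round(j_measure(0.2, 0.9, 0.5), 4))
```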
Abstract:
The fast increase in the size and number of databases demands data mining approaches that are scalable to large amounts of data. This has led to the exploration of parallel computing technologies in order to perform data mining tasks concurrently using several processors. Parallelization seems to be a natural and cost-effective way to scale up data mining technologies. One of the most important of these data mining technologies is the classification of newly recorded data. This paper surveys advances in parallelization in the field of classification rule induction.
Abstract:
Pocket Data Mining (PDM) describes the full process of analysing data streams in mobile ad hoc distributed environments. Advances in mobile devices like smart phones and tablet computers have made it possible for a wide range of applications to run in such an environment. In this paper, we propose the adoption of data stream classification techniques for PDM. A thorough experimental study shows that running heterogeneous (different) or homogeneous (similar) data stream classification techniques over vertically partitioned data (data partitioned according to the feature space) yields performance comparable to that of batch and centralised learning techniques.
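A minimal sketch of the vertical-partitioning idea on synthetic batch data (the feature split and the use of Gaussian naive Bayes below are stand-ins; PDM itself runs streaming classifiers on mobile devices): each simulated device trains on its own subset of the features, and the predictions are combined by majority vote.

```python
# Vertically partitioned classification: one learner per feature subset,
# combined by majority vote over the held-out samples.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 6))
y = (X[:, 0] + X[:, 3] + 0.5 * rng.normal(size=400) > 0).astype(int)

partitions = [[0, 1], [2, 3], [4, 5]]        # feature subsets per "device"
models = [GaussianNB().fit(X[:300, cols], y[:300]) for cols in partitions]

votes = np.array([m.predict(X[300:, cols]) for m, cols in zip(models, partitions)])
majority = (votes.mean(axis=0) > 0.5).astype(int)
print("ensemble accuracy:", (majority == y[300:]).mean())
```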
Abstract:
In order to gain knowledge from large databases, scalable data mining technologies are needed. Data are captured on a large scale and thus databases are growing at a fast pace. This leads to the utilisation of parallel computing technologies to cope with large amounts of data. In the area of classification rule induction, parallelisation has focused on the divide-and-conquer approach, also known as the Top Down Induction of Decision Trees (TDIDT). An alternative approach to classification rule induction is separate and conquer, which has only recently become a focus of parallelisation. This work introduces and evaluates empirically a framework for the parallel induction of classification rules generated by members of the Prism family of algorithms. All members of the Prism family follow the separate-and-conquer approach.
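The sketch below shows one simple way the term-evaluation step of a Prism-style learner can be spread over worker processes (illustrative only; not the framework evaluated in the paper): candidate attribute-value terms are scored concurrently and the best term is selected from the pooled results.

```python
# Parallel scoring of candidate rule terms with a process pool.
from multiprocessing import Pool

DATA = [({"outlook": "sunny", "windy": "no"}, "play"),
        ({"outlook": "sunny", "windy": "yes"}, "stay"),
        ({"outlook": "rain", "windy": "no"}, "stay")]
TARGET = "play"

def score_term(term):
    """Probability of the target class among instances covered by the term."""
    attribute, value = term
    covered = [c for attrs, c in DATA if attrs.get(attribute) == value]
    return term, sum(c == TARGET for c in covered) / len(covered)

if __name__ == "__main__":
    candidates = sorted({(a, v) for attrs, _ in DATA for a, v in attrs.items()})
    with Pool(processes=2) as pool:
        scored = pool.map(score_term, candidates)
    best_term, best_prob = max(scored, key=lambda t: t[1])
    print("best term:", best_term, "covering probability:", best_prob)
```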
Abstract:
In this paper, various types of fault detection methods for fuel cells are compared, for example those that use a model-based approach, a data-driven approach, or a combination of the two. The potential advantages and drawbacks of each method are discussed and comparisons between methods are made. In particular, classification algorithms are investigated, which separate a data set into classes or clusters based on some prior knowledge or measure of similarity. Specifically, the application of classification methods to vectors of currents reconstructed by magnetic tomography, or directly to vectors of magnetic field measurements, is explored. Bases are simulated using the finite integration technique (FIT) and regularization techniques are employed to overcome ill-posedness. Fisher's linear discriminant is used to illustrate these concepts. Numerical experiments show that the ill-posedness of the magnetic tomography problem is also part of the classification problem on magnetic field measurements. This is independent of the particular working mode of the cell but is influenced by the type of faulty behavior studied. The numerical results demonstrate this ill-posedness through the exponential decay of the singular values for three examples of fault classes.
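Since Fisher's linear discriminant is the illustrative method above, a minimal two-class sketch on synthetic "magnetic field measurement" vectors may help (no tomographic reconstruction or regularization step is included; class means, sizes and dimensions are made up).

```python
# Two-class Fisher linear discriminant on toy measurement vectors.
import numpy as np

rng = np.random.default_rng(4)
healthy = rng.normal(0.0, 1.0, size=(60, 10))   # field vectors, normal operation
faulty = rng.normal(0.6, 1.0, size=(60, 10))    # field vectors, one fault class

m1, m2 = healthy.mean(axis=0), faulty.mean(axis=0)
Sw = np.cov(healthy, rowvar=False) + np.cov(faulty, rowvar=False)  # within-class scatter
w = np.linalg.solve(Sw, m1 - m2)                # Fisher direction
threshold = w @ (m1 + m2) / 2.0                 # midpoint of projected class means

test = rng.normal(0.6, 1.0, size=(20, 10))      # new measurements from the fault class
pred_faulty = test @ w < threshold
print("detected faulty:", int(pred_faulty.sum()), "of", len(test))
```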
Abstract:
Background: Since their inception, Twitter and related microblogging systems have provided a rich source of information for researchers and have attracted interest in their affordances and use. Since 2009, PubMed has included 123 journal articles on medicine and Twitter, but no overview exists of how the field uses Twitter in research. // Objective: This paper aims to identify published work relating to Twitter indexed by PubMed, and then to classify it. This classification provides a framework in which future researchers can position their work, and an understanding of the current reach of research using Twitter in medical disciplines. Limiting the study to papers indexed by PubMed ensures the work provides a reproducible benchmark. // Methods: Papers indexed by PubMed on Twitter and related topics were identified and reviewed. The papers were then qualitatively classified based on their titles and abstracts to determine their focus. The work that was Twitter-focussed was studied in detail to determine what data, if any, it was based on, and from this a categorization of the data set size used in the studies was developed. Using open-coded content analysis, additional important categories were identified relating to the primary methodology, domain and aspect. // Results: As of 2012, PubMed comprises more than 21 million citations from the biomedical literature, and from these a corpus of 134 potentially Twitter-related papers was identified, eleven of which were subsequently found not to be relevant. There were no papers prior to 2009 relating to microblogging, a term first used in 2006. Of the remaining 123 papers which mentioned Twitter, thirty were focussed on Twitter (the others referring to it tangentially). The early Twitter-focussed papers introduced the topic and highlighted its potential without carrying out any form of data analysis. The majority of published papers used analytic techniques to sort through thousands, if not millions, of individual tweets, often depending on automated tools to do so. Our analysis demonstrates that researchers are starting to use knowledge discovery methods and data mining techniques to understand vast quantities of tweets: the study of Twitter is becoming quantitative research. // Conclusions: This work is, to the best of our knowledge, the first overview study of medically related research based on Twitter and related microblogging. We have used five dimensions to categorise published medically related research on Twitter. This classification provides a framework within which researchers studying the development and use of Twitter within medically related research, and those undertaking comparative studies of research relating to Twitter in medicine and beyond, can position and ground their work.
Abstract:
In The Conduct of Inquiry in International Relations, Patrick Jackson situates methodologies in International Relations in relation to their underlying philosophical assumptions. One of his aims is to map International Relations debates in a way that ‘capture[s] current controversies’ (p. 40). This ambition is overstated: whilst Jackson’s typology is useful as a clarificatory tool, (re)classifying existing scholarship in International Relations is more problematic. One problem with Jackson’s approach is that he tends to run together the philosophical assumptions which decisively differentiate his methodologies (by stipulating a distinctive warrant for knowledge claims) and the explanatory strategies that are employed to generate such knowledge claims, suggesting that the latter are entailed by the former. In fact, the explanatory strategies which Jackson associates with each methodology reflect conventional practice in International Relations just as much as they reflect philosophical assumptions. This makes it more difficult to identify each methodology at work than Jackson implies. I illustrate this point through a critical analysis of Jackson’s controversial reclassification of Waltz as an analyticist, showing that whilst Jackson’s typology helps to expose inconsistencies in Waltz’s approach, it does not fully support the proposed reclassification. The conventional aspect of methodologies in International Relations also raises questions about the limits of Jackson’s ‘engaged pluralism’.