791 results for Rule-Based Classification
Abstract:
Objective: This paper presents a detailed study of fractal-based methods for texture characterization of mammographic mass lesions and architectural distortion. The purpose of this study is to explore the use of fractal and lacunarity analysis for the characterization and classification of both tumor lesions and normal breast parenchyma in mammography. Materials and methods: We conducted comparative evaluations of five popular fractal dimension estimation methods for characterizing the texture of mass lesions and architectural distortion. We applied the concept of lacunarity to describe the spatial distribution of pixel intensities in mammographic images. These methods were tested on a set of 57 breast masses and 60 regions of normal breast parenchyma (dataset1), and on another set of 19 architectural distortions and 41 regions of normal breast parenchyma (dataset2). Support vector machines (SVM) were used as the pattern classification method for tumor classification. Results: Experimental results showed that, for all five methods, the fractal dimension of regions of interest (ROIs) depicting mass lesions and architectural distortion was statistically significantly lower than that of normal breast parenchyma. Receiver operating characteristic (ROC) analysis showed that, among the five methods, the fractional Brownian motion (FBM) method generated the highest area under the ROC curve for both datasets (Az = 0.839 for dataset1 and 0.828 for dataset2). Lacunarity analysis showed that ROIs depicting mass lesions and architectural distortion had higher lacunarity than ROIs depicting normal breast parenchyma. For both datasets, the combination of FBM fractal dimension and lacunarity yielded higher Az values (0.903 and 0.875, respectively) than either feature alone. The application of the SVM improved the performance of the fractal-based features in differentiating tumor lesions from normal breast parenchyma, yielding higher Az values. Conclusion: The FBM texture model is the most appropriate model for characterizing mammographic images, because its self-affinity assumption is a better approximation. Lacunarity is an effective counterpart to the fractal dimension for texture feature extraction in mammographic images. The classification results obtained in this work suggest that the SVM is an effective method with great potential for classification in mammographic image analysis.
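The abstract does not say how lacunarity is computed. A minimal sketch, assuming the common gliding-box definition Λ(r) = E[M²] / E[M]², where M is the summed pixel intensity inside an r × r box slid over every position of the ROI; the function name and the list-of-rows input format are illustrative assumptions, not the authors' implementation.

```python
def gliding_box_lacunarity(image, r):
    """Gliding-box lacunarity of a 2D grayscale image for box size r.

    image: list of equal-length rows of non-negative pixel intensities.
    Returns E[M^2] / E[M]^2, where M is the summed intensity of an r x r box.
    """
    rows, cols = len(image), len(image[0])
    if not 1 <= r <= min(rows, cols):
        raise ValueError("box size must fit inside the image")
    masses = []
    # Slide the box one pixel at a time over every valid position.
    for i in range(rows - r + 1):
        for j in range(cols - r + 1):
            masses.append(sum(image[y][x]
                              for y in range(i, i + r)
                              for x in range(j, j + r)))
    n = len(masses)
    mean = sum(masses) / n
    if mean == 0:
        raise ValueError("image has zero total intensity")
    second_moment = sum(m * m for m in masses) / n
    return second_moment / (mean * mean)
```

A perfectly uniform patch gives Λ = 1, while a sparse patch such as [[1, 0], [0, 0]] gives Λ = 4 at r = 1: larger values indicate a more heterogeneous, "gappier" intensity distribution.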
Abstract:
We introduce a classification-based approach to finding occluding texture boundaries. The classifier is composed of a set of weak learners that operate on discriminative image-intensity features, which are defined on small patches and are fast to compute. A database designed to simulate digitized occluding contours of textured objects in natural images is used to train the weak learners. The trained classifier score is then used to obtain a probabilistic model for the presence of texture transitions, which can readily be used for line-search texture boundary detection in the direction normal to an initial boundary estimate. The method is fast and therefore suitable for real-time and interactive applications. It works as a robust estimator that requires a ribbon-like search region and can handle complex texture structures without requiring a large number of observations. We demonstrate results both in interactive 2D delineation and in fast 3D tracking, and we compare the method's performance with that of other existing line-search boundary detection methods.
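The abstract does not specify the weak learners or how the classifier score becomes a probability. The sketch below assumes decision stumps combined AdaBoost-style, maps the weighted score F to a probability with the logistic link P = 1 / (1 + exp(-2F)) from the statistical view of boosting, and then line-searches a profile of feature vectors sampled along the boundary normal. `stump`, `boundary_probability` and `line_search` are hypothetical names.

```python
import math


def stump(feature_index, threshold, polarity):
    """Hypothetical weak learner: a decision stump returning +1 or -1."""
    def h(features):
        return polarity if features[feature_index] > threshold else -polarity
    return h


def boundary_probability(features, learners, weights):
    """Map the boosted score F = sum(alpha_k * h_k) to a probability.

    Uses the logistic link P = 1 / (1 + exp(-2F)).
    """
    score = sum(a * h(features) for h, a in zip(learners, weights))
    return 1.0 / (1.0 + math.exp(-2.0 * score))


def line_search(profile, learners, weights):
    """Return (index, probability) of the most likely texture transition.

    profile: feature vectors sampled at successive positions along the
    normal to the initial boundary estimate.
    """
    probs = [boundary_probability(f, learners, weights) for f in profile]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]
```

With two stumps weighted 1.0 and 0.5, a position where both vote +1 gets P = 1 / (1 + e⁻³) ≈ 0.95 and is selected as the boundary location.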
Abstract:
This paper reviews the ways in which quality can be assessed in standing waters, a subject that has hitherto attracted little attention but whose assessment is now a legal requirement in Europe. It describes a scheme for assessing and monitoring water and ecological quality in standing waters larger than about 1 ha in England & Wales, although it is generally relevant to north-west Europe. Thirteen hydrological, chemical and biological variables are used to characterise the standing water body in any current sampling: lake volume, maximum depth, conductivity, Secchi disc transparency, pH, total alkalinity, calcium ion concentration, total N concentration, winter total oxidised inorganic nitrogen (effectively nitrate) concentration, total P concentration, potential maximum chlorophyll a concentration, a score based on the nature of the submerged and emergent plant community, and the presence or absence of a fish community. Inter alia, these variables are key indicators of the state of eutrophication, acidification, salinisation and infilling of a water body.
Abstract:
In a world where massive amounts of data are recorded on a large scale, we need data mining technologies to gain knowledge from the data in a reasonable time. The Top Down Induction of Decision Trees (TDIDT) algorithm is a very widely used technology for predicting the classification of newly recorded data. However, alternative technologies have been derived that often produce better rules but do not scale well on large datasets. One such alternative to TDIDT is the PrismTCS algorithm, which performs particularly well on noisy data but does not scale well on large datasets. In this paper we introduce Prism and investigate its scaling behaviour. We describe how we improved the scalability of the serial version of Prism and investigate its limitations. We then describe our work to overcome these limitations by developing a framework to parallelise algorithms of the Prism family and similar algorithms. We also present the scale-up results of a first prototype implementation.
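For readers unfamiliar with the Prism family, here is a minimal sketch of the basic separate-and-conquer loop, assuming categorical attributes and instances stored as Python dicts. It follows the classic Prism idea (for each class, grow a rule by repeatedly adding the attribute-value term with the highest p(target | term), then remove the target-class instances it covers); the target-class ordering of PrismTCS and the authors' scalability improvements are not shown.

```python
def prism(instances, attributes, class_key="class"):
    """Basic Prism (separate-and-conquer) rule induction for categorical data.

    instances: list of dicts mapping attribute names and class_key to values.
    Returns a list of (class_value, {attribute: value}) rules.
    """
    rules = []
    for target in sorted({inst[class_key] for inst in instances}):
        remaining = list(instances)
        # Keep inducing rules until every target-class instance is covered.
        while any(inst[class_key] == target for inst in remaining):
            covered, terms = remaining, {}
            # Conquer: specialise until the rule is pure or attributes run out.
            while (any(inst[class_key] != target for inst in covered)
                   and len(terms) < len(attributes)):
                best, best_key = None, None
                for attr in attributes:
                    if attr in terms:
                        continue
                    for value in sorted({inst[attr] for inst in covered}):
                        subset = [inst for inst in covered if inst[attr] == value]
                        hits = sum(inst[class_key] == target for inst in subset)
                        # Maximise p(target | term); ties go to larger coverage.
                        key = (hits / len(subset), hits)
                        if best_key is None or key > best_key:
                            best, best_key = (attr, value), key
                attr, value = best
                terms[attr] = value
                covered = [inst for inst in covered if inst[attr] == value]
            rules.append((target, terms))
            # Separate: drop the target-class instances this rule covers.
            remaining = [inst for inst in remaining
                         if not (inst[class_key] == target
                                 and all(inst[a] == v for a, v in terms.items()))]
    return rules
```

Each candidate term is evaluated by scanning the covered instances, so cost grows with both the number of attribute values and the data size. This is the computational burden that the scalability work above targets.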
Abstract:
The Distributed Rule Induction (DRI) project at the University of Portsmouth is concerned with distributed data mining algorithms for automatically generating rules of all kinds. In this paper we present a system architecture and its implementation for inducing modular classification rules in parallel on a local area network using a distributed blackboard system. We present initial results of a prototype implementation based on the Prism algorithm.
Abstract:
Induction of classification rules is one of the most important technologies in data mining. Most of the work in this field has concentrated on the Top Down Induction of Decision Trees (TDIDT) approach. However, alternative approaches have been developed such as the Prism algorithm for inducing modular rules. Prism often produces qualitatively better rules than TDIDT but suffers from higher computational requirements. We investigate approaches that have been developed to minimize the computational requirements of TDIDT, in order to find analogous approaches that could reduce the computational requirements of Prism.
Abstract:
Inducing rules from very large datasets is one of the most challenging areas in data mining. Several approaches exist for scaling up classification rule induction to large datasets, namely data reduction and the parallelisation of classification rule induction algorithms. In the area of parallelisation of classification rule induction algorithms, most of the work has concentrated on the Top Down Induction of Decision Trees (TDIDT), also known as the ‘divide and conquer’ approach. However, powerful alternative algorithms exist that induce modular rules. Most of these alternative algorithms follow the ‘separate and conquer’ approach of inducing rules, but very little work has been done to make the ‘separate and conquer’ approach scale better on large training data. This paper examines the potential of the recently developed blackboard-based J-PMCRI methodology for parallelising modular classification rule induction algorithms that follow the ‘separate and conquer’ approach. A concrete implementation of the methodology is evaluated empirically on very large datasets.
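The abstract does not describe J-PMCRI's internals. A minimal sketch, assuming the training data is partitioned vertically (each worker holds some attribute columns), that each worker posts its locally best rule term to a shared blackboard, and that a moderator picks the global best for one specialisation step of a Prism rule. Threads in one process stand in for networked machines; `Blackboard`, `local_best_term` and `select_term` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Blackboard:
    """Shared store where workers post candidate rule terms."""

    def __init__(self):
        self._lock = threading.Lock()
        self._candidates = []

    def post(self, candidate):
        with self._lock:
            self._candidates.append(candidate)

    def best(self):
        # Moderator: highest (probability, coverage); ties between posted
        # candidates go to the alphabetically first attribute.
        with self._lock:
            ordered = sorted(self._candidates, key=lambda c: c[2])
        return max(ordered, key=lambda c: (c[0], c[1]))


def local_best_term(columns, labels, covered_ids, target, board):
    """Worker: evaluate only the attribute columns held by its partition."""
    best = None
    for attr, column in columns.items():
        for value in sorted({column[i] for i in covered_ids}):
            ids = [i for i in covered_ids if column[i] == value]
            hits = sum(labels[i] == target for i in ids)
            candidate = (hits / len(ids), hits, attr, value)
            if best is None or candidate[:2] > best[:2]:
                best = candidate
    if best is not None:
        board.post(best)


def select_term(partitions, labels, covered_ids, target):
    """One specialisation step: evaluate partitions in parallel, pick the best."""
    board = Blackboard()
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        futures = [pool.submit(local_best_term, cols, labels, covered_ids, target, board)
                   for cols in partitions]
        for f in futures:
            f.result()  # re-raise any worker exception
    return board.best()
```

Only the class labels and the ids of covered instances need to be shared, so each worker can keep its attribute columns local. That is what makes attribute partitioning attractive for data that does not fit on one machine.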
Abstract:
The Prism family of algorithms induces modular classification rules, in contrast to the Top Down Induction of Decision Trees (TDIDT) approach, which induces classification rules in the intermediate form of a tree structure. Both approaches achieve comparable classification accuracy. However, in some cases Prism outperforms TDIDT. For both approaches, pre-pruning facilities have been developed to prevent the induced classifiers from overfitting noisy datasets, by cutting rule terms or whole rules, or by truncating decision trees, according to certain metrics. Many pre-pruning mechanisms have been developed for the TDIDT approach, but for the Prism family the only existing pre-pruning facility is J-pruning. J-pruning works not only on Prism algorithms but also on TDIDT. Although J-pruning has been shown to produce good results, this work points out that it does not use its full potential. The original J-pruning facility is examined, and a new pre-pruning facility, called Jmax-pruning, is proposed and evaluated empirically. A possible pre-pruning facility for TDIDT based on Jmax-pruning is also discussed.
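Both pruning facilities rely on the J-measure of a rule "IF x THEN y", J = p(x) · [p(y|x) log₂(p(y|x)/p(y)) + (1 − p(y|x)) log₂((1 − p(y|x))/(1 − p(y)))]. The sketch below computes it and contrasts two truncation policies under stated assumptions: J-pruning is modelled as stopping at the first term that lowers the J-value, and Jmax-pruning as inducing the full rule and cutting it back to the prefix with the highest J-value. This is our reading of the idea, not the paper's exact procedure, and the function names are hypothetical.

```python
import math


def j_measure(p_x, p_y, p_y_given_x):
    """J-measure of the rule 'IF x THEN y' in bits (Smyth and Goodman)."""
    if not 0 < p_y < 1:
        raise ValueError("p_y must lie strictly between 0 and 1")

    def part(a, b):
        # Convention: 0 * log2(0 / b) = 0.
        return 0.0 if a == 0 else a * math.log2(a / b)

    return p_x * (part(p_y_given_x, p_y) + part(1 - p_y_given_x, 1 - p_y))


def j_prune_truncate(terms, j_values):
    """Stop at the first term whose addition lowers the J-value.

    j_values[k] is the J-value of the rule made of terms[:k + 1].
    """
    k = 0
    while k + 1 < len(j_values) and j_values[k + 1] >= j_values[k]:
        k += 1
    return terms[:k + 1]


def jmax_truncate(terms, j_values):
    """Keep the prefix of the fully induced rule with the highest J-value."""
    best_k = max(range(len(j_values)), key=j_values.__getitem__)
    return terms[:best_k + 1]
```

If the J-values of successive prefixes are [0.1, 0.05, 0.3], the greedy policy keeps only the first term, while the Jmax policy keeps all three. This illustrates how stopping at the first dip can miss a later, higher J-value.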
Abstract:
The rapid increase in the size and number of databases demands data mining approaches that scale to large amounts of data. This has led to the exploration of parallel computing technologies that perform data mining tasks concurrently on several processors. Parallelization seems to be a natural and cost-effective way to scale up data mining technologies. One of the most important of these data mining technologies is the classification of newly recorded data. This paper surveys advances in parallelization in the field of classification rule induction.
Abstract:
In order to gain knowledge from large databases, scalable data mining technologies are needed. Data are captured on a large scale, and thus databases are growing at a fast pace. This leads to the use of parallel computing technologies to cope with large amounts of data. In the area of classification rule induction, parallelisation has focused on the divide and conquer approach, also known as the Top Down Induction of Decision Trees (TDIDT). An alternative approach to classification rule induction is separate and conquer, which has only recently become a focus of parallelisation. This work introduces and empirically evaluates a framework for the parallel induction of classification rules generated by members of the Prism family of algorithms. All members of the Prism family follow the separate and conquer approach.
Abstract:
The bewildering complexity of cortical microcircuits at the single-cell level gives rise to surprisingly robust emergent activity patterns at the level of laminar and columnar local field potentials (LFPs) in response to targeted local stimuli. Here we report the results of our multivariate data-analytic approach, based on simultaneous multi-site recordings using micro-electrode-array chips, for investigating the microcircuitry of the rat somatosensory (barrel) cortex. We find high repeatability of stimulus-induced responses and typical spatial distributions of LFP responses to stimuli in supragranular, granular and infragranular layers, with the infragranular responses forming a particularly distinct class. Population spikes appear to travel at about 33 cm/s from granular to infragranular layers. Responses within barrel-related columns have profiles different from those in neighbouring columns to the left or, interchangeably, to the right. Variations between slices occur but can be minimized by strictly following controlled experimental protocols. Cluster analysis on normalized recordings indicates specific spatial distributions of time series that reflect the location of sources and sinks independently of the stimulus layer. Although the precise correspondences between single-cell activity and LFPs are still far from clear, a sophisticated neuroinformatics approach combined with multi-site LFP recordings in the standardized slice preparation is suitable for comparing normal conditions with genetically or pharmacologically altered situations based on real cortical microcircuitry.
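The abstract does not name the clustering method. A minimal sketch, assuming each channel's time series is z-score normalized and the normalized traces are grouped with plain k-means. For z-scored series of length n (population standard deviation), the squared Euclidean distance equals 2n(1 − r), so this clusters by waveform shape (correlation) rather than amplitude. `zscore`, `kmeans` and `cluster_recordings` are hypothetical helpers.

```python
import math


def zscore(series):
    """Normalise one recording to zero mean and unit (population) variance."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    if std == 0:
        raise ValueError("constant series cannot be normalised")
    return [(v - mean) / std for v in series]


def kmeans(vectors, k, iterations=100):
    """Plain k-means with deterministic initialisation from the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(len(centroids)),
                          key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_recordings(recordings, k):
    """Cluster channel time series by shape after z-score normalisation."""
    return kmeans([zscore(r) for r in recordings], k)
```

Because of the normalization, a rising trace and the same trace scaled by two land in the same cluster, which matches the goal of comparing response profiles independently of amplitude.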
Abstract:
We propose a new class of neurofuzzy construction algorithms that aim to maximize generalization capability, specifically for imbalanced data classification problems, based on leave-one-out (LOO) cross-validation. The algorithms work in two stages: first, an initial rule base is constructed by estimating a Gaussian mixture model with analysis of variance decomposition from the input data; second, joint weighted least squares parameter estimation and rule selection are carried out using the orthogonal forward subspace selection (OFSS) procedure. We show how different LOO-based rule selection criteria can be incorporated into OFSS, and advocate maximizing either the leave-one-out area under the receiver operating characteristic curve or the leave-one-out F-measure if the data sets exhibit an imbalanced class distribution. Extensive comparative simulations illustrate the effectiveness of the proposed algorithms.
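LOO criteria are practical for linear-in-the-parameters models because the LOO residual can be obtained without refitting: e₋ᵢ = eᵢ / (1 − hᵢᵢ), where hᵢᵢ is the i-th diagonal element of the hat matrix Φ(ΦᵀΦ)⁻¹Φᵀ. The sketch below assumes ordinary (not weighted) least squares, ±1 class labels and sign-based classification, and does not reproduce OFSS or the Gaussian mixture stage; all function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def loo_predictions(phi, y):
    """LOO outputs of a least-squares model via e_(-i) = e_i / (1 - h_ii)."""
    p = len(phi[0])
    gram = [[sum(row[a] * row[b] for row in phi) for b in range(p)] for a in range(p)]
    theta = solve(gram, [sum(row[a] * t for row, t in zip(phi, y)) for a in range(p)])
    preds = []
    for row, t in zip(phi, y):
        h = sum(u * v for u, v in zip(row, solve(gram, row)))  # leverage h_ii
        e = t - sum(u * v for u, v in zip(row, theta))          # ordinary residual
        preds.append(t - e / (1.0 - h))                          # LOO prediction
    return preds


def f_measure(y_true, scores):
    """F-measure for the positive class (+1), classifying by score sign."""
    tp = sum(1 for t, s in zip(y_true, scores) if t > 0 and s > 0)
    fp = sum(1 for t, s in zip(y_true, scores) if t <= 0 and s > 0)
    fn = sum(1 for t, s in zip(y_true, scores) if t > 0 and s <= 0)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)
```

Calling f_measure(y, loo_predictions(phi, y)) scores a candidate rule set with a single fit. A forward selection procedure can repeat this for every candidate rule cheaply, which is why LOO-based selection criteria are affordable. Points with leverage hᵢᵢ = 1 make the identity undefined.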
Abstract:
Scene classification based on latent Dirichlet allocation (LDA) follows the more general modeling approach known as bag of visual words, in which constructing a visual vocabulary is a crucial quantization step for successful classification. A framework is developed with the following new aspects: Gaussian mixture clustering for the quantization process; an integrated visual vocabulary (IVV), built as the union of all centroids obtained from quantizing each class separately; and features including the edge orientation histogram, CIELab color moments and the gray-level co-occurrence matrix (GLCM). The experiments are conducted on IKONOS images with six semantic classes (tree, grassland, residential, commercial/industrial, road, and water). The results show that using an IVV increases the overall accuracy (OA) by 11 to 12% when applied to the selected features and by 6% when applied to all features. The selected features, CIELab color moments and GLCM, provide a better OA than CIELab color moments or GLCM used individually. The latter increases the OA by only ∼2 to 3%. Moreover, the results show that LDA outperforms C4.5 and naive Bayes tree in OA by ∼20%. © 2014 Society of Photo-Optical Instrumentation Engineers (SPIE) [DOI: 10.1117/1.JRS.8.083690]
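The integrated visual vocabulary can be sketched as follows: quantize each class's feature vectors separately, take the union of all resulting centroids as the vocabulary, and describe an image by its histogram of nearest visual words, which is the input an LDA model would consume. As a simplifying assumption, plain k-means stands in for the paper's Gaussian mixture clustering. Feature extraction and LDA itself are not shown, and all names are hypothetical.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])))


def kmeans_centroids(points, k, iterations=50):
    """Plain k-means; initial centroids are the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Empty clusters keep their previous centroid.
        new = [[sum(col) / len(g) for col in zip(*g)] if g else c
               for g, c in zip(groups, centroids)]
        if new == centroids:
            break
        centroids = new
    return centroids


def integrated_vocabulary(features_by_class, words_per_class):
    """Union of the centroids obtained by quantising each class separately."""
    vocab = []
    for cls in sorted(features_by_class):
        vocab.extend(kmeans_centroids(features_by_class[cls], words_per_class))
    return vocab


def bag_of_words(features, vocab):
    """Histogram of visual-word occurrences for one image."""
    hist = [0] * len(vocab)
    for f in features:
        hist[nearest(f, vocab)] += 1
    return hist
```

Quantizing per class guarantees that every class contributes its own words to the vocabulary, so a class with few training samples is not absorbed into the centroids of larger classes. This is one plausible reason the IVV raises accuracy.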
Abstract:
Various complex oscillatory processes are involved in generating the motor command. The temporal dynamics of these processes were studied for movement detection from the single-trial electroencephalogram (EEG). Autocorrelation analysis was performed on the EEG signals to find robust markers for movement detection. The evolution of the autocorrelation function was characterised via its relaxation time, obtained by exponential curve fitting. The decay constant of the exponential curve was observed to increase during movement, indicating that the autocorrelation function decays more slowly during motor execution. Significant differences were observed between movement and no-movement tasks. Additionally, a linear discriminant analysis (LDA) classifier was used to identify movement trials, with a peak accuracy of 74%.
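The relaxation-time idea can be sketched as: compute the normalized autocorrelation r(k), then fit r(k) ≈ exp(−k·Δt/τ) to obtain the time constant τ. The fitting details below are assumptions, not the authors' procedure: a least-squares line through the origin on log r(k), using only the initial run of positive autocorrelation values. The function names are hypothetical.

```python
import math


def autocorrelation(signal, max_lag):
    """Normalised (biased) autocorrelation r(k) for lags 0..max_lag."""
    n = len(signal)
    mean = sum(signal) / n
    x = [v - mean for v in signal]
    var = sum(v * v for v in x)
    if var == 0:
        raise ValueError("constant signal")
    return [sum(x[i] * x[i + k] for i in range(n - k)) / var
            for k in range(max_lag + 1)]


def relaxation_time(acf, dt=1.0):
    """Fit r(k) ~ exp(-k * dt / tau); return tau in the units of dt."""
    pts = []
    for k, r in enumerate(acf):
        if r <= 0:
            break  # log undefined; use only the initial positive run
        pts.append((k * dt, math.log(r)))
    if len(pts) < 2:
        raise ValueError("need at least two positive lags")
    # Regression through the origin, since log r(0) = 0.
    slope = sum(t * y for t, y in pts) / sum(t * t for t, _ in pts)
    if slope >= 0:
        raise ValueError("autocorrelation does not decay")
    return -1.0 / slope
```

A larger τ means a slower decay of the autocorrelation function, which is the direction of change the abstract reports during movement. Per-trial τ values could then serve as features for the LDA classifier.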
Abstract:
This paper discusses ECG classification after parametrizing the ECG waveforms in the wavelet domain. The aim of the work is to develop an accurate classification algorithm that can be used to diagnose cardiac beat abnormalities detected using a mobile platform such as a smartphone. Continuous-time recurrent neural network classifiers are considered for this task. Records from the European ST-T Database are decomposed in the wavelet domain using discrete wavelet transform (DWT) filter banks, and the resulting DWT coefficients are filtered and used as inputs for training the neural network classifier. Advantages of the proposed methodology are the reduced memory requirement for the signals, which is relevant to mobile applications, and the improved generalization ability of the neural network owing to the more parsimonious representation of the signal at its inputs.
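The abstract specifies neither the wavelet nor the filtering rule. A minimal sketch, assuming the orthonormal Haar wavelet for the DWT filter bank and "filtering" as keeping only the largest-magnitude coefficients. This shows how a beat can be reduced to a compact input vector for a classifier; the recurrent network itself is not shown, and the function names are hypothetical.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_dwt(signal, levels):
    """Multi-level decomposition: returns [cA_L, cD_L, ..., cD_1]."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return [approx] + details[::-1]


def keep_largest(coeffs, m):
    """Zero all but the m largest-magnitude coefficients (ties may keep more)."""
    flat = sorted((abs(c) for band in coeffs for c in band), reverse=True)
    if m <= 0 or not flat:
        return [[0.0] * len(band) for band in coeffs]
    threshold = flat[min(m, len(flat)) - 1]
    return [[c if abs(c) >= threshold else 0.0 for c in band] for band in coeffs]
```

Because the Haar transform is orthonormal, the coefficients preserve the signal's energy. Dropping small coefficients therefore discards little energy while shrinking what must be stored, which matches the memory argument for mobile use.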