979 resultados para Text classification
Resumo:
Information was collated on the seed storage behaviour of 67 tree species native to the Amazon rainforest of Brazil; 38 appeared to show orthodox, 23 recalcitrant and six intermediate seed storage behaviour. A double-criteria key based on thousand-seed weight and seed moisture content at shedding to estimate likely seed storage behaviour, developed previously, showed good agreement with the above classifications. The key can aid seed storage behaviour identification considerably.
Resumo:
This paper discusses ECG classification after parametrizing the ECG waveforms in the wavelet domain. The aim of the work is to develop an accurate classification algorithm that can be used to diagnose cardiac beat abnormalities detected using a mobile platform such as smart-phones. Continuous time recurrent neural network classifiers are considered for this task. Records from the European ST-T Database are decomposed in the wavelet domain using discrete wavelet transform (DWT) filter banks and the resulting DWT coefficients are filtered and used as inputs for training the neural network classifier. Advantages of the proposed methodology are the reduced memory requirement for the signals which is of relevance to mobile applications as well as an improvement in the ability of the neural network in its generalization ability due to the more parsimonious representation of the signal to its inputs.
Resumo:
This paper discusses ECG signal classification after parametrizing the ECG waveforms in the wavelet domain. Signal decomposition using perfect reconstruction quadrature mirror filter banks can provide a very parsimonious representation of ECG signals. In the current work, the filter parameters are adjusted by a numerical optimization algorithm in order to minimize a cost function associated to the filter cut-off sharpness. The goal consists of achieving a better compromise between frequency selectivity and time resolution at each decomposition level than standard orthogonal filter banks such as those of the Daubechies and Coiflet families. Our aim is to optimally decompose the signals in the wavelet domain so that they can be subsequently used as inputs for training to a neural network classifier.
Resumo:
Advances in hardware and software technologies allow to capture streaming data. The area of Data Stream Mining (DSM) is concerned with the analysis of these vast amounts of data as it is generated in real-time. Data stream classification is one of the most important DSM techniques allowing to classify previously unseen data instances. Different to traditional classifiers for static data, data stream classifiers need to adapt to concept changes (concept drift) in the stream in real-time in order to reflect the most recent concept in the data as accurately as possible. A recent addition to the data stream classifier toolbox is eRules which induces and updates a set of expressive rules that can easily be interpreted by humans. However, like most rule-based data stream classifiers, eRules exhibits a poor computational performance when confronted with continuous attributes. In this work, we propose an approach to deal with continuous data effectively and accurately in rule-based classifiers by using the Gaussian distribution as heuristic for building rule terms on continuous attributes. We show on the example of eRules that incorporating our method for continuous attributes indeed speeds up the real-time rule induction process while maintaining a similar level of accuracy compared with the original eRules classifier. We termed this new version of eRules with our approach G-eRules.
Resumo:
Advances in hardware technologies allow to capture and process data in real-time and the resulting high throughput data streams require novel data mining approaches. The research area of Data Stream Mining (DSM) is developing data mining algorithms that allow us to analyse these continuous streams of data in real-time. The creation and real-time adaption of classification models from data streams is one of the most challenging DSM tasks. Current classifiers for streaming data address this problem by using incremental learning algorithms. However, even so these algorithms are fast, they are challenged by high velocity data streams, where data instances are incoming at a fast rate. This is problematic if the applications desire that there is no or only a very little delay between changes in the patterns of the stream and absorption of these patterns by the classifier. Problems of scalability to Big Data of traditional data mining algorithms for static (non streaming) datasets have been addressed through the development of parallel classifiers. However, there is very little work on the parallelisation of data stream classification techniques. In this paper we investigate K-Nearest Neighbours (KNN) as the basis for a real-time adaptive and parallel methodology for scalable data stream classification tasks.
Resumo:
Involuntary musical imagery (INMI) is the subject of much recent research interest. INMI covers a number of experience types such as musical obsessions and musical hallucinations. One type of experience has been called earworms, for which the literature provides a number of definitions. In this paper we consider the origins of the term earworm in the German language literature and compare that usage with the English language literature. We consider the published literature on earworms and conclude that there is merit in distinguishing between earworms and other types of types of involuntary musical imagery described in the scientific literature: e.g. musical hallucinations, musical obsessions. We also describe other experiences that can be considered under the term INMI. The aim of future research could be to ascertain similarities and differences between types of INMI with a view to refining the classification scheme proposed here.
Resumo:
The objective of this article is to study the problem of pedestrian classification across different light spectrum domains (visible and far-infrared (FIR)) and modalities (intensity, depth and motion). In recent years, there has been a number of approaches for classifying and detecting pedestrians in both FIR and visible images, but the methods are difficult to compare, because either the datasets are not publicly available or they do not offer a comparison between the two domains. Our two primary contributions are the following: (1) we propose a public dataset, named RIFIR , containing both FIR and visible images collected in an urban environment from a moving vehicle during daytime; and (2) we compare the state-of-the-art features in a multi-modality setup: intensity, depth and flow, in far-infrared over visible domains. The experiments show that features families, intensity self-similarity (ISS), local binary patterns (LBP), local gradient patterns (LGP) and histogram of oriented gradients (HOG), computed from FIR and visible domains are highly complementary, but their relative performance varies across different modalities. In our experiments, the FIR domain has proven superior to the visible one for the task of pedestrian classification, but the overall best results are obtained by a multi-domain multi-modality multi-feature fusion.
Resumo:
An important application of Big Data Analytics is the real-time analysis of streaming data. Streaming data imposes unique challenges to data mining algorithms, such as concept drifts, the need to analyse the data on the fly due to unbounded data streams and scalable algorithms due to potentially high throughput of data. Real-time classification algorithms that are adaptive to concept drifts and fast exist, however, most approaches are not naturally parallel and are thus limited in their scalability. This paper presents work on the Micro-Cluster Nearest Neighbour (MC-NN) classifier. MC-NN is based on an adaptive statistical data summary based on Micro-Clusters. MC-NN is very fast and adaptive to concept drift whilst maintaining the parallel properties of the base KNN classifier. Also MC-NN is competitive compared with existing data stream classifiers in terms of accuracy and speed.
Resumo:
This work investigates the problem of feature selection in neuroimaging features from structural MRI brain images for the classification of subjects as healthy controls, suffering from Mild Cognitive Impairment or Alzheimer’s Disease. A Genetic Algorithm wrapper method for feature selection is adopted in conjunction with a Support Vector Machine classifier. In very large feature sets, feature selection is found to be redundant as the accuracy is often worsened when compared to an Support Vector Machine with no feature selection. However, when just the hippocampal subfields are used, feature selection shows a significant improvement of the classification accuracy. Three-class Support Vector Machines and two-class Support Vector Machines combined with weighted voting are also compared with the former and found more useful. The highest accuracy achieved at classifying the test data was 65.5% using a genetic algorithm for feature selection with a three-class Support Vector Machine classifier.
Resumo:
The personalised conditioning system (PCS) is widely studied. Potentially, it is able to reduce energy consumption while securing occupants’ thermal comfort requirements. It has been suggested that automatic optimised operation schemes for PCS should be introduced to avoid energy wastage and discomfort caused by inappropriate operation. In certain automatic operation schemes, personalised thermal sensation models are applied as key components to help in setting targets for PCS operation. In this research, a novel personal thermal sensation modelling method based on the C-Support Vector Classification (C-SVC) algorithm has been developed for PCS control. The personal thermal sensation modelling has been regarded as a classification problem. During the modelling process, the method ‘learns’ an occupant’s thermal preferences from his/her feedback, environmental parameters and personal physiological and behavioural factors. The modelling method has been verified by comparing the actual thermal sensation vote (TSV) with the modelled one based on 20 individual cases. Furthermore, the accuracy of each individual thermal sensation model has been compared with the outcomes of the PMV model. The results indicate that the modelling method presented in this paper is an effective tool to model personal thermal sensations and could be integrated within the PCS for optimised system operation and control.
Resumo:
Sea-ice concentrations in the Laptev Sea simulated by the coupled North Atlantic-Arctic Ocean-Sea-Ice Model and Finite Element Sea-Ice Ocean Model are evaluated using sea-ice concentrations from Advanced Microwave Scanning Radiometer-Earth Observing System satellite data and a polynya classification method for winter 2007/08. While developed to simulate largescale sea-ice conditions, both models are analysed here in terms of polynya simulation. The main modification of both models in this study is the implementation of a landfast-ice mask. Simulated sea-ice fields from different model runs are compared with emphasis placed on the impact of this prescribed landfast-ice mask. We demonstrate that sea-ice models are not able to simulate flaw polynyas realistically when used without fast-ice description. Our investigations indicate that without landfast ice and with coarse horizontal resolution the models overestimate the fraction of open water in the polynya. This is not because a realistic polynya appears but due to a larger-scale reduction of ice concentrations and smoothed ice-concentration fields. After implementation of a landfast-ice mask, the polynya location is realistically simulated but the total open-water area is still overestimated in most cases. The study shows that the fast-ice parameterization is essential for model improvements. However, further improvements are necessary in order to progress from the simulation of large-scale features in the Arctic towards a more detailed simulation of smaller-scaled features (here polynyas) in an Arctic shelf sea.
Resumo:
This project is based on Artificial Intelligence (A.I) and Digital Image processing (I.P) for automatic condition monitoring of sleepers in the railway track. Rail inspection is a very important task in railway maintenance for traffic safety issues and in preventing dangerous situations. Monitoring railway track infrastructure is an important aspect in which the periodical inspection of rail rolling plane is required.Up to the present days the inspection of the railroad is operated manually by trained personnel. A human operator walks along the railway track searching for sleeper anomalies. This monitoring way is not more acceptable for its slowness and subjectivity. Hence, it is desired to automate such intuitive human skills for the development of more robust and reliable testing methods. Images of wooden sleepers have been used as data for my project. The aim of this project is to present a vision based technique for inspecting railway sleepers (wooden planks under the railway track) by automatic interpretation of Non Destructive Test (NDT) data using A.I. techniques in determining the results of inspection.
Resumo:
Condition monitoring of wooden railway sleepers applications are generallycarried out by visual inspection and if necessary some impact acoustic examination iscarried out intuitively by skilled personnel. In this work, a pattern recognition solutionhas been proposed to automate the process for the achievement of robust results. Thestudy presents a comparison of several pattern recognition techniques together withvarious nonstationary feature extraction techniques for classification of impactacoustic emissions. Pattern classifiers such as multilayer perceptron, learning cectorquantization and gaussian mixture models, are combined with nonstationary featureextraction techniques such as Short Time Fourier Transform, Continuous WaveletTransform, Discrete Wavelet Transform and Wigner-Ville Distribution. Due to thepresence of several different feature extraction and classification technqies, datafusion has been investigated. Data fusion in the current case has mainly beeninvestigated on two levels, feature level and classifier level respectively. Fusion at thefeature level demonstrated best results with an overall accuracy of 82% whencompared to the human operator.
Resumo:
Parkinson's disease (PD) is a degenerative illness whose cardinal symptoms include rigidity, tremor, and slowness of movement. In addition to its widely recognized effects PD can have a profound effect on speech and voice.The speech symptoms most commonly demonstrated by patients with PD are reduced vocal loudness, monopitch, disruptions of voice quality, and abnormally fast rate of speech. This cluster of speech symptoms is often termed Hypokinetic Dysarthria.The disease can be difficult to diagnose accurately, especially in its early stages, due to this reason, automatic techniques based on Artificial Intelligence should increase the diagnosing accuracy and to help the doctors make better decisions. The aim of the thesis work is to predict the PD based on the audio files collected from various patients.Audio files are preprocessed in order to attain the features.The preprocessed data contains 23 attributes and 195 instances. On an average there are six voice recordings per person, By using data compression technique such as Discrete Cosine Transform (DCT) number of instances can be minimized, after data compression, attribute selection is done using several WEKA build in methods such as ChiSquared, GainRatio, Infogain after identifying the important attributes, we evaluate attributes one by one by using stepwise regression.Based on the selected attributes we process in WEKA by using cost sensitive classifier with various algorithms like MultiPass LVQ, Logistic Model Tree(LMT), K-Star.The classified results shows on an average 80%.By using this features 95% approximate classification of PD is acheived.This shows that using the audio dataset, PD could be predicted with a higher level of accuracy.
Resumo:
The aim of this thesis is to investigate computerized voice assessment methods to classify between the normal and Dysarthric speech signals. In this proposed system, computerized assessment methods equipped with signal processing and artificial intelligence techniques have been introduced. The sentences used for the measurement of inter-stress intervals (ISI) were read by each subject. These sentences were computed for comparisons between normal and impaired voice. Band pass filter has been used for the preprocessing of speech samples. Speech segmentation is performed using signal energy and spectral centroid to separate voiced and unvoiced areas in speech signal. Acoustic features are extracted from the LPC model and speech segments from each audio signal to find the anomalies. The speech features which have been assessed for classification are Energy Entropy, Zero crossing rate (ZCR), Spectral-Centroid, Mean Fundamental-Frequency (Meanf0), Jitter (RAP), Jitter (PPQ), and Shimmer (APQ). Naïve Bayes (NB) has been used for speech classification. For speech test-1 and test-2, 72% and 80% accuracies of classification between healthy and impaired speech samples have been achieved respectively using the NB. For speech test-3, 64% correct classification is achieved using the NB. The results direct the possibility of speech impairment classification in PD patients based on the clinical rating scale.