979 resultados para classification task


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Funded by BBSRC funded grant, BB/H019731/1.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The XML Document Mining track was launched for exploring two main ideas: (1) identifying key problems and new challenges of the emerging field of mining semi-structured documents, and (2) studying and assessing the potential of Machine Learning (ML) techniques for dealing with generic ML tasks in the structured domain, i.e., classification and clustering of semi-structured documents. This track has run for six editions during INEX 2005, 2006, 2007, 2008, 2009 and 2010. The first five editions have been summarized in previous editions and we focus here on the 2010 edition. INEX 2010 included two tasks in the XML Mining track: (1) unsupervised clustering task and (2) semi-supervised classification task where documents are organized in a graph. The clustering task requires the participants to group the documents into clusters without any knowledge of category labels using an unsupervised learning algorithm. On the other hand, the classification task requires the participants to label the documents in the dataset into known categories using a supervised learning algorithm and a training set. This report gives the details of clustering and classification tasks.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Purpose – The work presented in this paper aims to provide an approach to classifying web logs by personal properties of users. Design/methodology/approach – The authors describe an iterative system that begins with a small set of manually labeled terms, which are used to label queries from the log. A set of background knowledge related to these labeled queries is acquired by combining web search results on these queries. This background set is used to obtain many terms that are related to the classification task. The system then ranks each of the related terms, choosing those that most fit the personal properties of the users. These terms are then used to begin the next iteration. Findings – The authors identify the difficulties of classifying web logs, by approaching this problem from a machine learning perspective. By applying the approach developed, the authors are able to show that many queries in a large query log can be classified. Research limitations/implications – Testing results in this type of classification work is difficult, as the true personal properties of web users are unknown. Evaluation of the classification results in terms of the comparison of classified queries to well known age-related sites is a direction that is currently being exploring. Practical implications – This research is background work that can be incorporated in search engines or other web-based applications, to help marketing companies and advertisers. Originality/value – This research enhances the current state of knowledge in short-text classification and query log learning. Classification schemes, Computer networks, Information retrieval, Man-machine systems, User interfaces

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Determination of sequence similarity is a central issue in computational biology, a problem addressed primarily through BLAST, an alignment based heuristic which has underpinned much of the analysis and annotation of the genomic era. Despite their success, alignment-based approaches scale poorly with increasing data set size, and are not robust under structural sequence rearrangements. Successive waves of innovation in sequencing technologies – so-called Next Generation Sequencing (NGS) approaches – have led to an explosion in data availability, challenging existing methods and motivating novel approaches to sequence representation and similarity scoring, including adaptation of existing methods from other domains such as information retrieval. In this work, we investigate locality-sensitive hashing of sequences through binary document signatures, applying the method to a bacterial protein classification task. Here, the goal is to predict the gene family to which a given query protein belongs. Experiments carried out on a pair of small but biologically realistic datasets (the full protein repertoires of families of Chlamydia and Staphylococcus aureus genomes respectively) show that a measure of similarity obtained by locality sensitive hashing gives highly accurate results while offering a number of avenues which will lead to substantial performance improvements over BLAST..

Relevância:

70.00% 70.00%

Publicador:

Resumo:

To classify each stage for a progressing disease such as Alzheimer’s disease is a key issue for the disease prevention and treatment. In this study, we derived structural brain networks from diffusion-weighted MRI using whole-brain tractography since there is growing interest in relating connectivity measures to clinical, cognitive, and genetic data. Relatively little work has usedmachine learning to make inferences about variations in brain networks in the progression of the Alzheimer’s disease. Here we developed a framework to utilize generalized low rank approximations of matrices (GLRAM) and modified linear discrimination analysis for unsupervised feature learning and classification of connectivity matrices. We apply the methods to brain networks derived from DWI scans of 41 people with Alzheimer’s disease, 73 people with EMCI, 38 people with LMCI, 47 elderly healthy controls and 221 young healthy controls. Our results show that this new framework can significantly improve classification accuracy when combining multiple datasets; this suggests the value of using data beyond the classification task at hand to model variations in brain connectivity.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Many visual datasets are traditionally used to analyze the performance of different learning techniques. The evaluation is usually done within each dataset, therefore it is questionable if such results are a reliable indicator of true generalization ability. We propose here an algorithm to exploit the existing data resources when learning on a new multiclass problem. Our main idea is to identify an image representation that decomposes orthogonally into two subspaces: a part specific to each dataset, and a part generic to, and therefore shared between, all the considered source sets. This allows us to use the generic representation as un-biased reference knowledge for a novel classification task. By casting the method in the multi-view setting, we also make it possible to use different features for different databases. We call the algorithm MUST, Multitask Unaligned Shared knowledge Transfer. Through extensive experiments on five public datasets, we show that MUST consistently improves the cross-datasets generalization performance. © 2013 Springer-Verlag.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Binary image classifiction is a problem that has received much attention in recent years. In this paper we evaluate a selection of popular techniques in an effort to find a feature set/ classifier combination which generalizes well to full resolution image data. We then apply that system to images at one-half through one-sixteenth resolution, and consider the corresponding error rates. In addition, we further observe generalization performance as it depends on the number of training images, and lastly, compare the system's best error rates to that of a human performing an identical classification task given teh same set of test images.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The nonlinear, noisy and outlier characteristics of electroencephalography (EEG) signals inspire the employment of fuzzy logic due to its power to handle uncertainty. This paper introduces an approach to classify motor imagery EEG signals using an interval type-2 fuzzy logic system (IT2FLS) in a combination with wavelet transformation. Wavelet coefficients are ranked based on the statistics of the receiver operating characteristic curve criterion. The most informative coefficients serve as inputs to the IT2FLS for the classification task. Two benchmark datasets, named Ia and Ib, downloaded from the brain-computer interface (BCI) competition II, are employed for the experiments. Classification performance is evaluated using accuracy, sensitivity, specificity and F-measure. Widely-used classifiers, including feedforward neural network, support vector machine, k-nearest neighbours, AdaBoost and adaptive neuro-fuzzy inference system, are also implemented for comparisons. The wavelet-IT2FLS method considerably dominates the comparable classifiers on both datasets, and outperforms the best performance on the Ia and Ib datasets reported in the BCI competition II by 1.40% and 2.27% respectively. The proposed approach yields great accuracy and requires low computational cost, which can be applied to a real-time BCI system for motor imagery data analysis.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This paper presents two approaches of Artificial Immune System for Pattern Recognition (CLONALG and Parallel AIRS2) to classify automatically the well drilling operation stages. The classification is carried out through the analysis of some mud-logging parameters. In order to validate the performance of AIS techniques, the results were compared with others classification methods: neural network, support vector machine and lazy learning.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

An important tool for the heart disease diagnosis is the analysis of electrocardiogram (ECG) signals, since the non-invasive nature and simplicity of the ECG exam. According to the application, ECG data analysis consists of steps such as preprocessing, segmentation, feature extraction and classification aiming to detect cardiac arrhythmias (i.e.; cardiac rhythm abnormalities). Aiming to made a fast and accurate cardiac arrhythmia signal classification process, we apply and analyze a recent and robust supervised graph-based pattern recognition technique, the optimum-path forest (OPF) classifier. To the best of our knowledge, it is the first time that OPF classifier is used to the ECG heartbeat signal classification task. We then compare the performance (in terms of training and testing time, accuracy, specificity, and sensitivity) of the OPF classifier to the ones of other three well-known expert system classifiers, i.e.; support vector machine (SVM), Bayesian and multilayer artificial neural network (MLP), using features extracted from six main approaches considered in literature for ECG arrhythmia analysis. In our experiments, we use the MIT-BIH Arrhythmia Database and the evaluation protocol recommended by The Association for the Advancement of Medical Instrumentation. A discussion on the obtained results shows that OPF classifier presents a robust performance, i.e.; there is no need for parameter setup, as well as a high accuracy at an extremely low computational cost. Moreover, in average, the OPF classifier yielded greater performance than the MLP and SVM classifiers in terms of classification time and accuracy, and to produce quite similar performance to the Bayesian classifier, showing to be a promising technique for ECG signal analysis. © 2012 Elsevier Ltd. All rights reserved.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Hierarchical multi-label classification is a complex classification task where the classes involved in the problem are hierarchically structured and each example may simultaneously belong to more than one class in each hierarchical level. In this paper, we extend our previous works, where we investigated a new local-based classification method that incrementally trains a multi-layer perceptron for each level of the classification hierarchy. Predictions made by a neural network in a given level are used as inputs to the neural network responsible for the prediction in the next level. We compare the proposed method with one state-of-the-art decision-tree induction method and two decision-tree induction methods, using several hierarchical multi-label classification datasets. We perform a thorough experimental analysis, showing that our method obtains competitive results to a robust global method regarding both precision and recall evaluation measures.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Unconscious perception is commonly described as a phenomenon that is not under intentional control and relies on automatic processes. We challenge this view by arguing that some automatic processes may indeed be under intentional control, which is implemented in task-sets that define how the task is to be performed. In consequence, those prime attributes that are relevant to the task will be most effective. To investigate this hypothesis, we used a paradigm which has been shown to yield reliable short-lived priming in tasks based on semantic classification of words. This type of study uses fast, well practised classification responses, whereby responses to targets are much less accurate if prime and target belong to a different category than if they belong to the same category. In three experiments, we investigated whether the intention to classify the same words with respect to different semantic categories had a differential effect on priming. The results suggest that this was indeed the case: Priming varied with the task in all experiments. However, although participants reported not seeing the primes, they were able to classify the primes better than chance using the classification task they had used before with the targets. When a lexical task was used for discrimination in experiment 4, masked primes could however not be discriminated. Also, priming was as pronounced when the primes were visible as when they were invisible. The pattern of results suggests that participants had intentional control on prime processing, even if they reported not seeing the primes.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Fast Classification (FC) networks were inspired by a biologically plausible mechanism for short term memory where learning occurs instantaneously. Both weights and the topology for an FC network are mapped directly from the training samples by using a prescriptive training scheme. Only two presentations of the training data are required to train an FC network. Compared with iterative learning algorithms such as Back-propagation (which may require many hundreds of presentations of the training data), the training of FC networks is extremely fast and learning convergence is always guaranteed. Thus FC networks may be suitable for applications where real-time classification is needed. In this paper, the FC networks are applied for the real-time extraction of gene expressions for Chlamydia microarray data. Both the classification performance and learning time of the FC networks are compared with the Multi-Layer Proceptron (MLP) networks and support-vector-machines (SVM) in the same classification task. The FC networks are shown to have extremely fast learning time and comparable classification accuracy.