809 resultados para Machine learning classification


Relevância:

90.00% 90.00%

Publicador:

Resumo:

Learning from mistakes has proven to be an effective way of learning in the interactive document classifications. In this paper we propose an approach to effectively learning from mistakes in the email filtering process. Our system has employed both SVM and Winnow machine learning algorithms to learn from misclassified email documents and refine the email filtering process accordingly. Our experiments have shown that the training of an email filter becomes much effective and faster

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper illustrates the prediction of opponent behaviour in a competitive, highly dynamic, multi-agent and partially observableenvironment, namely RoboCup small size league robot soccer. The performance is illustrated in the context of the highly successful robot soccer team, the RoboRoos. The project is broken into three tasks; classification of behaviours, modelling and prediction of behaviours and integration of the predictions into the existing planning system. A probabilistic approach is taken to dealing with the uncertainty in the observations and with representing the uncertainty in the prediction of the behaviours. Results are shown for a classification system using a Naïve Bayesian Network that determines the opponent’s current behaviour. These results are compared to an expert designed fuzzy behaviour classification system. The paper illustrates how the modelling system will use the information from behaviour classification to produce probability distributions that model the manner with which the opponents perform their behaviours. These probability distributions are show to match well with the existing multi-agent planning system (MAPS) that forms the core of the RoboRoos system.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The blood types determination is essential to perform safe blood transfusions. In emergency situations isadministrated the “universal donor” blood type. However, sometimes, this blood type can cause incom-patibilities in the transfusion receptor. A mechatronic prototype was developed to solve this problem.The prototype was built to meet specific goals, incorporating all the necessary components. The obtainedsolution is close to the final system that will be produced later, at industrial scale, as a medical device.The prototype is a portable and low cost device, and can be used in remote locations. A computer appli-cation, previously developed is used to operate with the developed mechatronic prototype, and obtainautomatically test results. It allows image acquisition, processing and analysis, based on Computer Visionalgorithms, Machine Learning algorithms and deterministic algorithms. The Machine Learning algorithmsenable the classification of occurrence, or alack of agglutination in the mixture (blood/reagents), and amore reliable and a safer methodology as test data are stored in a database. The work developed allowsthe administration of a compatible blood type in emergency situations, avoiding the discontinuity of the“universal donor” blood type stocks, and reducing the occurrence of human errors in the transfusion practice.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Automatic ontology building is a vital issue in many fields where they are currently built manually. This paper presents a user-centred methodology for ontology construction based on the use of Machine Learning and Natural Language Processing. In our approach, the user selects a corpus of texts and sketches a preliminary ontology (or selects an existing one) for a domain with a preliminary vocabulary associated to the elements in the ontology (lexicalisations). Examples of sentences involving such lexicalisation (e.g. ISA relation) in the corpus are automatically retrieved by the system. Retrieved examples are validated by the user and used by an adaptive Information Extraction system to generate patterns that discover other lexicalisations of the same objects in the ontology, possibly identifying new concepts or relations. New instances are added to the existing ontology or used to tune it. This process is repeated until a satisfactory ontology is obtained. The methodology largely automates the ontology construction process and the output is an ontology with an associated trained leaner to be used for further ontology modifications.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

An overview of neural networks, covering multilayer perceptrons, radial basis functions, constructive algorithms, Kohonen and K-means unupervised algorithms, RAMnets, first and second order training methods, and Bayesian regularisation methods.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The Vapnik-Chervonenkis (VC) dimension is a combinatorial measure of a certain class of machine learning problems, which may be used to obtain upper and lower bounds on the number of training examples needed to learn to prescribed levels of accuracy. Most of the known bounds apply to the Probably Approximately Correct (PAC) framework, which is the framework within which we work in this paper. For a learning problem with some known VC dimension, much is known about the order of growth of the sample-size requirement of the problem, as a function of the PAC parameters. The exact value of sample-size requirement is however less well-known, and depends heavily on the particular learning algorithm being used. This is a major obstacle to the practical application of the VC dimension. Hence it is important to know exactly how the sample-size requirement depends on VC dimension, and with that in mind, we describe a general algorithm for learning problems having VC dimension 1. Its sample-size requirement is minimal (as a function of the PAC parameters), and turns out to be the same for all non-trivial learning problems having VC dimension 1. While the method used cannot be naively generalised to higher VC dimension, it suggests that optimal algorithm-dependent bounds may improve substantially on current upper bounds.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

A theoretical model is presented which describes selection in a genetic algorithm (GA) under a stochastic fitness measure and correctly accounts for finite population effects. Although this model describes a number of selection schemes, we only consider Boltzmann selection in detail here as results for this form of selection are particularly transparent when fitness is corrupted by additive Gaussian noise. Finite population effects are shown to be of fundamental importance in this case, as the noise has no effect in the infinite population limit. In the limit of weak selection we show how the effects of any Gaussian noise can be removed by increasing the population size appropriately. The theory is tested on two closely related problems: the one-max problem corrupted by Gaussian noise and generalization in a perceptron with binary weights. The averaged dynamics can be accurately modelled for both problems using a formalism which describes the dynamics of the GA using methods from statistical mechanics. The second problem is a simple example of a learning problem and by considering this problem we show how the accurate characterization of noise in the fitness evaluation may be relevant in machine learning. The training error (negative fitness) is the number of misclassified training examples in a batch and can be considered as a noisy version of the generalization error if an independent batch is used for each evaluation. The noise is due to the finite batch size and in the limit of large problem size and weak selection we show how the effect of this noise can be removed by increasing the population size. This allows the optimal batch size to be determined, which minimizes computation time as well as the total number of training examples required.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

In this paper we introduce and illustrate non-trivial upper and lower bounds on the learning curves for one-dimensional Gaussian Processes. The analysis is carried out emphasising the effects induced on the bounds by the smoothness of the random process described by the Modified Bessel and the Squared Exponential covariance functions. We present an explanation of the early, linearly-decreasing behavior of the learning curves and the bounds as well as a study of the asymptotic behavior of the curves. The effects of the noise level and the lengthscale on the tightness of the bounds are also discussed.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Objective: Recently, much research has been proposed using nature inspired algorithms to perform complex machine learning tasks. Ant colony optimization (ACO) is one such algorithm based on swarm intelligence and is derived from a model inspired by the collective foraging behavior of ants. Taking advantage of the ACO in traits such as self-organization and robustness, this paper investigates ant-based algorithms for gene expression data clustering and associative classification. Methods and material: An ant-based clustering (Ant-C) and an ant-based association rule mining (Ant-ARM) algorithms are proposed for gene expression data analysis. The proposed algorithms make use of the natural behavior of ants such as cooperation and adaptation to allow for a flexible robust search for a good candidate solution. Results: Ant-C has been tested on the three datasets selected from the Stanford Genomic Resource Database and achieved relatively high accuracy compared to other classical clustering methods. Ant-ARM has been tested on the acute lymphoblastic leukemia (ALL)/acute myeloid leukemia (AML) dataset and generated about 30 classification rules with high accuracy. Conclusions: Ant-C can generate optimal number of clusters without incorporating any other algorithms such as K-means or agglomerative hierarchical clustering. For associative classification, while a few of the well-known algorithms such as Apriori, FP-growth and Magnum Opus are unable to mine any association rules from the ALL/AML dataset within a reasonable period of time, Ant-ARM is able to extract associative classification rules.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Sentiment analysis or opinion mining aims to use automated tools to detect subjective information such as opinions, attitudes, and feelings expressed in text. This paper proposes a novel probabilistic modeling framework based on Latent Dirichlet Allocation (LDA), called joint sentiment/topic model (JST), which detects sentiment and topic simultaneously from text. Unlike other machine learning approaches to sentiment classification which often require labeled corpora for classifier training, the proposed JST model is fully unsupervised. The model has been evaluated on the movie review dataset to classify the review sentiment polarity and minimum prior information have also been explored to further improve the sentiment classification accuracy. Preliminary experiments have shown promising results achieved by JST.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This work explores the creation of ambiguous images, i.e., images that may induce multistable perception, by evolutionary means. Ambiguous images are created using a general purpose approach, composed of an expression-based evolutionary engine and a set of object detectors, which are trained in advance using Machine Learning techniques. Images are evolved using Genetic Programming and object detectors are used to classify them. The information gathered during classification is used to assign fitness. In a first stage, the system is used to evolve images that resemble a single object. In a second stage, the discovery of ambiguous images is promoted by combining pairs of object detectors. The analysis of the results highlights the ability of the system to evolve ambiguous images and the differences between computational and human ambiguous images.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Solving many scientific problems requires effective regression and/or classification models for large high-dimensional datasets. Experts from these problem domains (e.g. biologists, chemists, financial analysts) have insights into the domain which can be helpful in developing powerful models but they need a modelling framework that helps them to use these insights. Data visualisation is an effective technique for presenting data and requiring feedback from the experts. A single global regression model can rarely capture the full behavioural variability of a huge multi-dimensional dataset. Instead, local regression models, each focused on a separate area of input space, often work better since the behaviour of different areas may vary. Classical local models such as Mixture of Experts segment the input space automatically, which is not always effective and it also lacks involvement of the domain experts to guide a meaningful segmentation of the input space. In this paper we addresses this issue by allowing domain experts to interactively segment the input space using data visualisation. The segmentation output obtained is then further used to develop effective local regression models.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper addresses the task of learning classifiers from streams of labelled data. In this case we can face the problem that the underlying concepts can change over time. The paper studies two mechanisms developed for dealing with changing concepts. Both are based on the time window idea. The first one forgets gradually, by assigning to the examples weight that gradually decreases over time. The second one uses a statistical test to detect changes in concept and then optimizes the size of the time window, aiming to maximise the classification accuracy on the new examples. Both methods are general in nature and can be used with any learning algorithm. The objectives of the conducted experiments were to compare the mechanisms and explore whether they can be combined to achieve a synergetic e ect. Results from experiments with three basic learning algorithms (kNN, ID3 and NBC) using four datasets are reported and discussed.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The problem of recognition on finite set of events is considered. The generalization ability of classifiers for this problem is studied within the Bayesian approach. The method for non-uniform prior distribution specification on recognition tasks is suggested. It takes into account the assumed degree of intersection between classes. The results of the analysis are applied for pruning of classification trees.