779 resultados para Supervised machine learning
Resumo:
Background and aims: Machine learning techniques for the text mining of cancer-related clinical documents have not been sufficiently explored. Here some techniques are presented for the pre-processing of free-text breast cancer pathology reports, with the aim of facilitating the extraction of information relevant to cancer staging.
Materials and methods: The first technique was implemented using the freely available software RapidMiner to classify the reports according to their general layout: ‘semi-structured’ and ‘unstructured’. The second technique was developed using the open source language engineering framework GATE and aimed at the prediction of chunks of the report text containing information pertaining to the cancer morphology, the tumour size, its hormone receptor status and the number of positive nodes. The classifiers were trained and tested respectively on sets of 635 and 163 manually classified or annotated reports, from the Northern Ireland Cancer Registry.
Results: The best result of 99.4% accuracy – which included only one semi-structured report predicted as unstructured – was produced by the layout classifier with the k nearest algorithm, using the binary term occurrence word vector type with stopword filter and pruning. For chunk recognition, the best results were found using the PAUM algorithm with the same parameters for all cases, except for the prediction of chunks containing cancer morphology. For semi-structured reports the performance ranged from 0.97 to 0.94 and from 0.92 to 0.83 in precision and recall, while for unstructured reports performance ranged from 0.91 to 0.64 and from 0.68 to 0.41 in precision and recall. Poor results were found when the classifier was trained on semi-structured reports but tested on unstructured.
Conclusions: These results show that it is possible and beneficial to predict the layout of reports and that the accuracy of prediction of which segments of a report may contain certain information is sensitive to the report layout and the type of information sought.
Resumo:
Gun related violence is a complex issue and accounts for a large proportion of violent incidents. In the research reported in this paper, we set out to investigate the pro-gun and anti-gun sentiments expressed on a social media platform, namely Twitter, in response to the 2012 Sandy Hook Elementary School shooting in Connecticut, USA. Machine learning techniques are applied to classify a data corpus of over 700,000 tweets. The sentiments are captured using a public sentiment score that considers the volume of tweets as well as population. A web-based interactive tool is developed to visualise the sentiments and is available at this http://www.gunsontwitter.com. The key findings from this research are: (i) There are elevated rates of both pro-gun and anti-gun sentiments on the day of the shooting. Surprisingly, the pro-gun sentiment remains high for a number of days following the event but the anti-gun sentiment quickly falls to pre-event levels. (ii) There is a different public response from each state, with the highest pro-gun sentiment not coming from those with highest gun ownership levels but rather from California, Texas and New York.
Resumo:
Thesis (Master's)--University of Washington, 2016-08
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-08
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-08
Resumo:
Programa de doctorado: Tecnología Industrial. La fecha de publicación es la fecha de lectura.
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-08
Resumo:
Computational intelligent support for decision making is becoming increasingly popular and essential among medical professionals. Also, with the modern medical devices being capable to communicate with ICT, created models can easily find practical translation into software. Machine learning solutions for medicine range from the robust but opaque paradigms of support vector machines and neural networks to the also performant, yet more comprehensible, decision trees and rule-based models. So how can such different techniques be combined such that the professional obtains the whole spectrum of their particular advantages? The presented approaches have been conceived for various medical problems, while permanently bearing in mind the balance between good accuracy and understandable interpretation of the decision in order to truly establish a trustworthy ‘artificial’ second opinion for the medical expert.
Resumo:
Evolutionary algorithms alone cannot solve optimization problems very efficiently since there are many random (not very rational) decisions in these algorithms. Combination of evolutionary algorithms and other techniques have been proven to be an efficient optimization methodology. In this talk, I will explain the basic ideas of our three algorithms along this line (1): Orthogonal genetic algorithm which treats crossover/mutation as an experimental design problem, (2) Multiobjective evolutionary algorithm based on decomposition (MOEA/D) which uses decomposition techniques from traditional mathematical programming in multiobjective optimization evolutionary algorithm, and (3) Regular model based multiobjective estimation of distribution algorithms (RM-MEDA) which uses the regular property and machine learning methods for improving multiobjective evolutionary algorithms.
Resumo:
In this thesis, a machine learning approach was used to develop a predictive model for residual methanol concentration in industrial formalin produced at the Akzo Nobel factory in Kristinehamn, Sweden. The MATLABTM computational environment supplemented with the Statistics and Machine LearningTM toolbox from the MathWorks were used to test various machine learning algorithms on the formalin production data from Akzo Nobel. As a result, the Gaussian Process Regression algorithm was found to provide the best results and was used to create the predictive model. The model was compiled to a stand-alone application with a graphical user interface using the MATLAB CompilerTM.
Resumo:
Developers strive to create innovative Artificial Intelligence (AI) behaviour in their games as a key selling point. Machine Learning is an area of AI that looks at how applications and agents can be programmed to learn their own behaviour without the need to manually design and implement each aspect of it. Machine learning methods have been utilised infrequently within games and are usually trained to learn offline before the game is released to the players. In order to investigate new ways AI could be applied innovatively to games it is wise to explore how machine learning methods could be utilised in real-time as the game is played, so as to allow AI agents to learn directly from the player or their environment. Two machine learning methods were implemented into a simple 2D Fighter test game to allow the agents to fully showcase their learned behaviour as the game is played. The methods chosen were: Q-Learning and an NGram based system. It was found that N-Grams and QLearning could significantly benefit game developers as they facilitate fast, realistic learning at run-time.
Resumo:
The purpose of this work in progress study was to test the concept of recognising plants using images acquired by image sensors in a controlled noise-free environment. The presence of vegetation on railway trackbeds and embankments presents potential problems. Woody plants (e.g. Scots pine, Norway spruce and birch) often establish themselves on railway trackbeds. This may cause problems because legal herbicides are not effective in controlling them; this is particularly the case for conifers. Thus, if maintenance administrators knew the spatial position of plants along the railway system, it may be feasible to mechanically harvest them. Primary data were collected outdoors comprising around 700 leaves and conifer seedlings from 11 species. These were then photographed in a laboratory environment. In order to classify the species in the acquired image set, a machine learning approach known as Bag-of-Features (BoF) was chosen. Irrespective of the chosen type of feature extraction and classifier, the ability to classify a previously unseen plant correctly was greater than 85%. The maintenance planning of vegetation control could be improved if plants were recognised and localised. It may be feasible to mechanically harvest them (in particular, woody plants). In addition, listed endangered species growing on the trackbeds can be avoided. Both cases are likely to reduce the amount of herbicides, which often is in the interest of public opinion. Bearing in mind that natural objects like plants are often more heterogeneous within their own class rather than outside it, the results do indeed present a stable classification performance, which is a sound prerequisite in order to later take the next step to include a natural background. Where relevant, species can also be listed under the Endangered Species Act.
Resumo:
Interactions in mobile devices normally happen in an explicit manner, which means that they are initiated by the users. Yet, users are typically unaware that they also interact implicitly with their devices. For instance, our hand pose changes naturally when we type text messages. Whilst the touchscreen captures finger touches, hand movements during this interaction however are unused. If this implicit hand movement is observed, it can be used as additional information to support or to enhance the users’ text entry experience. This thesis investigates how implicit sensing can be used to improve existing, standard interaction technique qualities. In particular, this thesis looks into enhancing front-of-device interaction through back-of-device and hand movement implicit sensing. We propose the investigation through machine learning techniques. We look into problems on how sensor data via implicit sensing can be used to predict a certain aspect of an interaction. For instance, one of the questions that this thesis attempts to answer is whether hand movement during a touch targeting task correlates with the touch position. This is a complex relationship to understand but can be best explained through machine learning. Using machine learning as a tool, such correlation can be measured, quantified, understood and used to make predictions on future touch position. Furthermore, this thesis also evaluates the predictive power of the sensor data. We show this through a number of studies. In Chapter 5 we show that probabilistic modelling of sensor inputs and recorded touch locations can be used to predict the general area of future touches on touchscreen. In Chapter 7, using SVM classifiers, we show that data from implicit sensing from general mobile interactions is user-specific. This can be used to identify users implicitly. In Chapter 6, we also show that touch interaction errors can be detected from sensor data. In our experiment, we show that there are sufficient distinguishable patterns between normal interaction signals and signals that are strongly correlated with interaction error. In all studies, we show that performance gain can be achieved by combining sensor inputs.
Resumo:
Log-linear and maximum-margin models are two commonly-used methods in supervised machine learning, and are frequently used in structured prediction problems. Efficient learning of parameters in these models is therefore an important problem, and becomes a key factor when learning from very large data sets. This paper describes exponentiated gradient (EG) algorithms for training such models, where EG updates are applied to the convex dual of either the log-linear or max-margin objective function; the dual in both the log-linear and max-margin cases corresponds to minimizing a convex function with simplex constraints. We study both batch and online variants of the algorithm, and provide rates of convergence for both cases. In the max-margin case, O(1/ε) EG updates are required to reach a given accuracy ε in the dual; in contrast, for log-linear models only O(log(1/ε)) updates are required. For both the max-margin and log-linear cases, our bounds suggest that the online EG algorithm requires a factor of n less computation to reach a desired accuracy than the batch EG algorithm, where n is the number of training examples. Our experiments confirm that the online algorithms are much faster than the batch algorithms in practice. We describe how the EG updates factor in a convenient way for structured prediction problems, allowing the algorithms to be efficiently applied to problems such as sequence learning or natural language parsing. We perform extensive evaluation of the algorithms, comparing them to L-BFGS and stochastic gradient descent for log-linear models, and to SVM-Struct for max-margin models. The algorithms are applied to a multi-class problem as well as to a more complex large-scale parsing task. In all these settings, the EG algorithms presented here outperform the other methods.