23 resultados para machine learning algorithms


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Spiking neural networks are usually limited in their applications due to their complex mathematical models and the lack of intuitive learning algorithms. In this paper, a simpler, novel neural network derived from a leaky integrate and fire neuron model, the ‘cavalcade’ neuron, is presented. A simulation for the neural network has been developed and two basic learning algorithms implemented within the environment. These algorithms successfully learn some basic temporal and instantaneous problems. Inspiration for neural network structures from these experiments are then taken and applied to process sensor information so as to successfully control a mobile robot.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Top Down Induction of Decision Trees (TDIDT) is the most commonly used method of constructing a model from a dataset in the form of classification rules to classify previously unseen data. Alternative algorithms have been developed such as the Prism algorithm. Prism constructs modular rules which produce qualitatively better rules than rules induced by TDIDT. However, along with the increasing size of databases, many existing rule learning algorithms have proved to be computational expensive on large datasets. To tackle the problem of scalability, parallel classification rule induction algorithms have been introduced. As TDIDT is the most popular classifier, even though there are strongly competitive alternative algorithms, most parallel approaches to inducing classification rules are based on TDIDT. In this paper we describe work on a distributed classifier that induces classification rules in a parallel manner based on Prism.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Our digital universe is rapidly expanding,more and more daily activities are digitally recorded, data arrives in streams, it needs to be analyzed in real time and may evolve over time. In the last decade many adaptive learning algorithms and prediction systems, which can automatically update themselves with the new incoming data, have been developed. The majority of those algorithms focus on improving the predictive performance and assume that model update is always desired as soon as possible and as frequently as possible. In this study we consider potential model update as an investment decision, which, as in the financial markets, should be taken only if a certain return on investment is expected. We introduce and motivate a new research problem for data streams ? cost-sensitive adaptation. We propose a reference framework for analyzing adaptation strategies in terms of costs and benefits. Our framework allows to characterize and decompose the costs of model updates, and to asses and interpret the gains in performance due to model adaptation for a given learning algorithm on a given prediction task. Our proof-of-concept experiment demonstrates how the framework can aid in analyzing and managing adaptation decisions in the chemical industry.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Advances in hardware technologies allow to capture and process data in real-time and the resulting high throughput data streams require novel data mining approaches. The research area of Data Stream Mining (DSM) is developing data mining algorithms that allow us to analyse these continuous streams of data in real-time. The creation and real-time adaption of classification models from data streams is one of the most challenging DSM tasks. Current classifiers for streaming data address this problem by using incremental learning algorithms. However, even so these algorithms are fast, they are challenged by high velocity data streams, where data instances are incoming at a fast rate. This is problematic if the applications desire that there is no or only a very little delay between changes in the patterns of the stream and absorption of these patterns by the classifier. Problems of scalability to Big Data of traditional data mining algorithms for static (non streaming) datasets have been addressed through the development of parallel classifiers. However, there is very little work on the parallelisation of data stream classification techniques. In this paper we investigate K-Nearest Neighbours (KNN) as the basis for a real-time adaptive and parallel methodology for scalable data stream classification tasks.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Twitter has become a dependable microblogging tool for real time information dissemination and newsworthy events broadcast. Its users sometimes break news on the network faster than traditional newsagents due to their presence at ongoing real life events at most times. Different topic detection methods are currently used to match Twitter posts to real life news of mainstream media. In this paper, we analyse tweets relating to the English FA Cup finals 2012 by applying our novel method named TRCM to extract association rules present in hash tag keywords of tweets in different time-slots. Our system identify evolving hash tag keywords with strong association rules in each time-slot. We then map the identified hash tag keywords to event highlights of the game as reported in the ground truth of the main stream media. The performance effectiveness measure of our experiments show that our method perform well as a Topic Detection and Tracking approach.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Social network has gained remarkable attention in the last decade. Accessing social network sites such as Twitter, Facebook LinkedIn and Google+ through the internet and the web 2.0 technologies has become more affordable. People are becoming more interested in and relying on social network for information, news and opinion of other users on diverse subject matters. The heavy reliance on social network sites causes them to generate massive data characterised by three computational issues namely; size, noise and dynamism. These issues often make social network data very complex to analyse manually, resulting in the pertinent use of computational means of analysing them. Data mining provides a wide range of techniques for detecting useful knowledge from massive datasets like trends, patterns and rules [44]. Data mining techniques are used for information retrieval, statistical modelling and machine learning. These techniques employ data pre-processing, data analysis, and data interpretation processes in the course of data analysis. This survey discusses different data mining techniques used in mining diverse aspects of the social network over decades going from the historical techniques to the up-to-date models, including our novel technique named TRCM. All the techniques covered in this survey are listed in the Table.1 including the tools employed as well as names of their authors.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper describes the methodology of providing multiprobability predictions for proteomic mass spectrometry data. The methodology is based on a newly developed machine learning framework called Venn machines. Is allows to output a valid probability interval. The methodology is designed for mass spectrometry data. For demonstrative purposes, we applied this methodology to MALDI-TOF data sets in order to predict the diagnosis of heart disease and early diagnoses of ovarian cancer and breast cancer. The experiments showed that probability intervals are narrow, that is, the output of the multiprobability predictor is similar to a single probability distribution. In addition, probability intervals produced for heart disease and ovarian cancer data were more accurate than the output of corresponding probability predictor. When Venn machines were forced to make point predictions, the accuracy of such predictions is for the most data better than the accuracy of the underlying algorithm that outputs single probability distribution of a label. Application of this methodology to MALDI-TOF data sets empirically demonstrates the validity. The accuracy of the proposed method on ovarian cancer data rises from 66.7 % 11 months in advance of the moment of diagnosis to up to 90.2 % at the moment of diagnosis. The same approach has been applied to heart disease data without time dependency, although the achieved accuracy was not as high (up to 69.9 %). The methodology allowed us to confirm mass spectrometry peaks previously identified as carrying statistically significant information for discrimination between controls and cases.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Background Major Depressive Disorder (MDD) is among the most prevalent and disabling medical conditions worldwide. Identification of clinical and biological markers (“biomarkers”) of treatment response could personalize clinical decisions and lead to better outcomes. This paper describes the aims, design, and methods of a discovery study of biomarkers in antidepressant treatment response, conducted by the Canadian Biomarker Integration Network in Depression (CAN-BIND). The CAN-BIND research program investigates and identifies biomarkers that help to predict outcomes in patients with MDD treated with antidepressant medication. The primary objective of this initial study (known as CAN-BIND-1) is to identify individual and integrated neuroimaging, electrophysiological, molecular, and clinical predictors of response to sequential antidepressant monotherapy and adjunctive therapy in MDD. Methods CAN-BIND-1 is a multisite initiative involving 6 academic health centres working collaboratively with other universities and research centres. In the 16-week protocol, patients with MDD are treated with a first-line antidepressant (escitalopram 10–20 mg/d) that, if clinically warranted after eight weeks, is augmented with an evidence-based, add-on medication (aripiprazole 2–10 mg/d). Comprehensive datasets are obtained using clinical rating scales; behavioural, dimensional, and functioning/quality of life measures; neurocognitive testing; genomic, genetic, and proteomic profiling from blood samples; combined structural and functional magnetic resonance imaging; and electroencephalography. De-identified data from all sites are aggregated within a secure neuroinformatics platform for data integration, management, storage, and analyses. Statistical analyses will include multivariate and machine-learning techniques to identify predictors, moderators, and mediators of treatment response. Discussion From June 2013 to February 2015, a cohort of 134 participants (85 outpatients with MDD and 49 healthy participants) has been evaluated at baseline. The clinical characteristics of this cohort are similar to other studies of MDD. Recruitment at all sites is ongoing to a target sample of 290 participants. CAN-BIND will identify biomarkers of treatment response in MDD through extensive clinical, molecular, and imaging assessments, in order to improve treatment practice and clinical outcomes. It will also create an innovative, robust platform and database for future research.