75 resultados para Artificial intelligence (AI)


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Top Down Induction of Decision Trees (TDIDT) is the most commonly used method of constructing a model from a dataset in the form of classification rules to classify previously unseen data. Alternative algorithms have been developed such as the Prism algorithm. Prism constructs modular rules which produce qualitatively better rules than rules induced by TDIDT. However, along with the increasing size of databases, many existing rule learning algorithms have proved to be computational expensive on large datasets. To tackle the problem of scalability, parallel classification rule induction algorithms have been introduced. As TDIDT is the most popular classifier, even though there are strongly competitive alternative algorithms, most parallel approaches to inducing classification rules are based on TDIDT. In this paper we describe work on a distributed classifier that induces classification rules in a parallel manner based on Prism.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Inducing rules from very large datasets is one of the most challenging areas in data mining. Several approaches exist to scaling up classification rule induction to large datasets, namely data reduction and the parallelisation of classification rule induction algorithms. In the area of parallelisation of classification rule induction algorithms most of the work has been concentrated on the Top Down Induction of Decision Trees (TDIDT), also known as the ‘divide and conquer’ approach. However powerful alternative algorithms exist that induce modular rules. Most of these alternative algorithms follow the ‘separate and conquer’ approach of inducing rules, but very little work has been done to make the ‘separate and conquer’ approach scale better on large training data. This paper examines the potential of the recently developed blackboard based J-PMCRI methodology for parallelising modular classification rule induction algorithms that follow the ‘separate and conquer’ approach. A concrete implementation of the methodology is evaluated empirically on very large datasets.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The Twitter network has been labelled the most commonly used microblogging application around today. With about 500 million estimated registered users as of June, 2012, Twitter has become a credible medium of sentiment/opinion expression. It is also a notable medium for information dissemination; including breaking news on diverse issues since it was launched in 2007. Many organisations, individuals and even government bodies follow activities on the network in order to obtain knowledge on how their audience reacts to tweets that affect them. We can use postings on Twitter (known as tweets) to analyse patterns associated with events by detecting the dynamics of the tweets. A common way of labelling a tweet is by including a number of hashtags that describe its contents. Association Rule Mining can find the likelihood of co-occurrence of hashtags. In this paper, we propose the use of temporal Association Rule Mining to detect rule dynamics, and consequently dynamics of tweets. We coined our methodology Transaction-based Rule Change Mining (TRCM). A number of patterns are identifiable in these rule dynamics including, new rules, emerging rules, unexpected rules and ?dead' rules. Also the linkage between the different types of rule dynamics is investigated experimentally in this paper.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Planning is one of the key problems for autonomous vehicles operating in road scenarios. Present planning algorithms operate with the assumption that traffic is organised in predefined speed lanes, which makes it impossible to allow autonomous vehicles in countries with unorganised traffic. Unorganised traffic is though capable of higher traffic bandwidths when constituting vehicles vary in their speed capabilities and sizes. Diverse vehicles in an unorganised exhibit unique driving behaviours which are analysed in this paper by a simulation study. The aim of the work reported here is to create a planning algorithm for mixed traffic consisting of both autonomous and non-autonomous vehicles without any inter-vehicle communication. The awareness (e.g. vision) of every vehicle is restricted to nearby vehicles only and a straight infinite road is assumed for decision making regarding navigation in the presence of multiple vehicles. Exhibited behaviours include obstacle avoidance, overtaking, giving way for vehicles to overtake from behind, vehicle following, adjusting the lateral lane position and so on. A conflict of plans is a major issue which will almost certainly arise in the absence of inter-vehicle communication. Hence each vehicle needs to continuously track other vehicles and rectify plans whenever a collision seems likely. Further it is observed here that driver aggression plays a vital role in overall traffic dynamics, hence this has also been factored in accordingly. This work is hence a step forward towards achieving autonomous vehicles in unorganised traffic, while similar effort would be required for planning problems such as intersections, mergers, diversions and other modules like localisation.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Ensemble learning can be used to increase the overall classification accuracy of a classifier by generating multiple base classifiers and combining their classification results. A frequently used family of base classifiers for ensemble learning are decision trees. However, alternative approaches can potentially be used, such as the Prism family of algorithms that also induces classification rules. Compared with decision trees, Prism algorithms generate modular classification rules that cannot necessarily be represented in the form of a decision tree. Prism algorithms produce a similar classification accuracy compared with decision trees. However, in some cases, for example, if there is noise in the training and test data, Prism algorithms can outperform decision trees by achieving a higher classification accuracy. However, Prism still tends to overfit on noisy data; hence, ensemble learners have been adopted in this work to reduce the overfitting. This paper describes the development of an ensemble learner using a member of the Prism family as the base classifier to reduce the overfitting of Prism algorithms on noisy datasets. The developed ensemble classifier is compared with a stand-alone Prism classifier in terms of classification accuracy and resistance to noise.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

One of the most challenging tasks in financial management for large governmental and industrial organizations is Planning and Budgeting (P&B). The processes involved with P&B are cost and time intensive, especially when dealing with uncertainties and budget adjustments during the planning horizon. This work builds on our previous research in which we proposed and evaluated a fuzzy approach that allows optimizing the budget interactively beyond the initial planning stage. In this research we propose an extension that handles financial stress (i.e. drastic budget cuts) occurred during the budget period. This is done by introducing fuzzy stress parameters which are used to re-distribute the budget in order to minimize the negative impact of the financial stress. The benefits and possible issues of this approach are analyzed critically using a real world case study from the Nuremberg Institute of Technology (NIT). Additionally, ongoing and future research directions are presented.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper considers the use of Association Rule Mining (ARM) and our proposed Transaction based Rule Change Mining (TRCM) to identify the rule types present in tweet’s hashtags over a specific consecutive period of time and their linkage to real life occurrences. Our novel algorithm was termed TRCM-RTI in reference to Rule Type Identification. We created Time Frame Windows (TFWs) to detect evolvement statuses and calculate the lifespan of hashtags in online tweets. We link RTI to real life events by monitoring and recording rule evolvement patterns in TFWs on the Twitter network.

Relevância:

80.00% 80.00%

Publicador:

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper considers variations of a neuron pool selection method known as Affordable Neural Network (AfNN). A saliency measure, based on the second derivative of the objective function is proposed to assess the ability of a trained AfNN to provide neuronal redundancy. The discrepancies between the various affordability variants are explained by correlating unique sub group selections with relevant saliency variations. Overall this study shows that the method in which neurons are selected from a pool is more relevant to how salient individual neurons are, than how often a particular neuron is used during training. The findings herein are relevant to not only providing an analogy to brain function but, also, in optimizing the way a neural network using the affordability method is trained.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Paraconsistent logics are non-classical logics which allow non-trivial and consistent reasoning about inconsistent axioms. They have been pro- posed as a formal basis for handling inconsistent data, as commonly arise in human enterprises, and as methods for fuzzy reasoning, with applica- tions in Artificial Intelligence and the control of complex systems. Formalisations of paraconsistent logics usually require heroic mathe- matical efforts to provide a consistent axiomatisation of an inconsistent system. Here we use transreal arithmetic, which is known to be consis- tent, to arithmetise a paraconsistent logic. This is theoretically simple and should lead to efficient computer implementations. We introduce the metalogical principle of monotonicity which is a very simple way of making logics paraconsistent. Our logic has dialetheaic truth values which are both False and True. It allows contradictory propositions, allows variable contradictions, but blocks literal contradictions. Thus literal reasoning, in this logic, forms an on-the- y, syntactic partition of the propositions into internally consistent sets. We show how the set of all paraconsistent, possible worlds can be represented in a transreal space. During the development of our logic we discuss how other paraconsistent logics could be arithmetised in transreal arithmetic.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Advances in hardware and software technologies allow to capture streaming data. The area of Data Stream Mining (DSM) is concerned with the analysis of these vast amounts of data as it is generated in real-time. Data stream classification is one of the most important DSM techniques allowing to classify previously unseen data instances. Different to traditional classifiers for static data, data stream classifiers need to adapt to concept changes (concept drift) in the stream in real-time in order to reflect the most recent concept in the data as accurately as possible. A recent addition to the data stream classifier toolbox is eRules which induces and updates a set of expressive rules that can easily be interpreted by humans. However, like most rule-based data stream classifiers, eRules exhibits a poor computational performance when confronted with continuous attributes. In this work, we propose an approach to deal with continuous data effectively and accurately in rule-based classifiers by using the Gaussian distribution as heuristic for building rule terms on continuous attributes. We show on the example of eRules that incorporating our method for continuous attributes indeed speeds up the real-time rule induction process while maintaining a similar level of accuracy compared with the original eRules classifier. We termed this new version of eRules with our approach G-eRules.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Advances in hardware technologies allow to capture and process data in real-time and the resulting high throughput data streams require novel data mining approaches. The research area of Data Stream Mining (DSM) is developing data mining algorithms that allow us to analyse these continuous streams of data in real-time. The creation and real-time adaption of classification models from data streams is one of the most challenging DSM tasks. Current classifiers for streaming data address this problem by using incremental learning algorithms. However, even so these algorithms are fast, they are challenged by high velocity data streams, where data instances are incoming at a fast rate. This is problematic if the applications desire that there is no or only a very little delay between changes in the patterns of the stream and absorption of these patterns by the classifier. Problems of scalability to Big Data of traditional data mining algorithms for static (non streaming) datasets have been addressed through the development of parallel classifiers. However, there is very little work on the parallelisation of data stream classification techniques. In this paper we investigate K-Nearest Neighbours (KNN) as the basis for a real-time adaptive and parallel methodology for scalable data stream classification tasks.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Parkinson is a neurodegenerative disease, in which tremor is the main symptom. This paper investigates the use of different classification methods to identify tremors experienced by Parkinsonian patients.Some previous research has focussed tremor analysis on external body signals (e.g., electromyography, accelerometer signals, etc.). Our advantage is that we have access to sub-cortical data, which facilitates the applicability of the obtained results into real medical devices since we are dealing with brain signals directly. Local field potentials (LFP) were recorded in the subthalamic nucleus of 7 Parkinsonian patients through the implanted electrodes of a deep brain stimulation (DBS) device prior to its internalization. Measured LFP signals were preprocessed by means of splinting, down sampling, filtering, normalization and rec-tification. Then, feature extraction was conducted through a multi-level decomposition via a wavelettrans form. Finally, artificial intelligence techniques were applied to feature selection, clustering of tremor types, and tremor detection.The key contribution of this paper is to present initial results which indicate, to a high degree of certainty, that there appear to be two distinct subgroups of patients within the group-1 of patients according to the Consensus Statement of the Movement Disorder Society on Tremor. Such results may well lead to different resultant treatments for the patients involved, depending on how their tremor has been classified. Moreover, we propose a new approach for demand driven stimulation, in which tremor detection is also based on the subtype of tremor the patient has. Applying this knowledge to the tremor detection problem, it can be concluded that the results improve when patient clustering is applied prior to detection.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper describes the methodology of providing multiprobability predictions for proteomic mass spectrometry data. The methodology is based on a newly developed machine learning framework called Venn machines. Is allows to output a valid probability interval. The methodology is designed for mass spectrometry data. For demonstrative purposes, we applied this methodology to MALDI-TOF data sets in order to predict the diagnosis of heart disease and early diagnoses of ovarian cancer and breast cancer. The experiments showed that probability intervals are narrow, that is, the output of the multiprobability predictor is similar to a single probability distribution. In addition, probability intervals produced for heart disease and ovarian cancer data were more accurate than the output of corresponding probability predictor. When Venn machines were forced to make point predictions, the accuracy of such predictions is for the most data better than the accuracy of the underlying algorithm that outputs single probability distribution of a label. Application of this methodology to MALDI-TOF data sets empirically demonstrates the validity. The accuracy of the proposed method on ovarian cancer data rises from 66.7 % 11 months in advance of the moment of diagnosis to up to 90.2 % at the moment of diagnosis. The same approach has been applied to heart disease data without time dependency, although the achieved accuracy was not as high (up to 69.9 %). The methodology allowed us to confirm mass spectrometry peaks previously identified as carrying statistically significant information for discrimination between controls and cases.