44 resultados para machine learning algorithms

em QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Efficient identification and follow-up of astronomical transients is hindered by the need for humans to manually select promising candidates from data streams that contain many false positives. These artefacts arise in the difference images that are produced by most major ground-based time-domain surveys with large format CCD cameras. This dependence on humans to reject bogus detections is unsustainable for next generation all-sky surveys and significant effort is now being invested to solve the problem computationally. In this paper, we explore a simple machine learning approach to real-bogus classification by constructing a training set from the image data of similar to 32 000 real astrophysical transients and bogus detections from the Pan-STARRS1 Medium Deep Survey. We derive our feature representation from the pixel intensity values of a 20 x 20 pixel stamp around the centre of the candidates. This differs from previous work in that it works directly on the pixels rather than catalogued domain knowledge for feature design or selection. Three machine learning algorithms are trained (artificial neural networks, support vector machines and random forests) and their performances are tested on a held-out subset of 25 per cent of the training data. We find the best results from the random forest classifier and demonstrate that by accepting a false positive rate of 1 per cent, the classifier initially suggests a missed detection rate of around 10 per cent. However, we also find that a combination of bright star variability, nuclear transients and uncertainty in human labelling means that our best estimate of the missed detection rate is approximately 6 per cent.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Computionally efficient sequential learning algorithms are developed for direct-link resource-allocating networks (DRANs). These are achieved by decomposing existing recursive training algorithms on a layer by layer and neuron by neuron basis. This allows network weights to be updated in an efficient parallel manner and facilitates the implementation of minimal update extensions that yield a significant reduction in computation load per iteration compared to existing sequential learning methods employed in resource-allocation network (RAN) and minimal RAN (MRAN) approaches. The new algorithms, which also incorporate a pruning strategy to control network growth, are evaluated on three different system identification benchmark problems and shown to outperform existing methods both in terms of training error convergence and computational efficiency. (c) 2005 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The concentration of organic acids in anaerobic digesters is one of the most critical parameters for monitoring and advanced control of anaerobic digestion processes. Thus, a reliable online-measurement system is absolutely necessary. A novel approach to obtaining these measurements indirectly and online using UV/vis spectroscopic probes, in conjunction with powerful pattern recognition methods, is presented in this paper. An UV/vis spectroscopic probe from S::CAN is used in combination with a custom-built dilution system to monitor the absorption of fully fermented sludge at a spectrum from 200 to 750 nm. Advanced pattern recognition methods are then used to map the non-linear relationship between measured absorption spectra to laboratory measurements of organic acid concentrations. Linear discriminant analysis, generalized discriminant analysis (GerDA), support vector machines (SVM), relevance vector machines, random forest and neural networks are investigated for this purpose and their performance compared. To validate the approach, online measurements have been taken at a full-scale 1.3-MW industrial biogas plant. Results show that whereas some of the methods considered do not yield satisfactory results, accurate prediction of organic acid concentration ranges can be obtained with both GerDA and SVM-based classifiers, with classification rates in excess of 87% achieved on test data.

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Mobile malware has continued to grow at an alarming rate despite on-going mitigation efforts. This has been much more prevalent on Android due to being an open platform that is rapidly overtaking other competing platforms in the mobile smart devices market. Recently, a new generation of Android malware families has emerged with advanced evasion capabilities which make them much more difficult to detect using conventional methods. This paper proposes and investigates a parallel machine learning based classification approach for early detection of Android malware. Using real malware samples and benign applications, a composite classification model is developed from parallel combination of heterogeneous classifiers. The empirical evaluation of the model under different combination schemes demonstrates its efficacy and potential to improve detection accuracy. More importantly, by utilizing several classifiers with diverse characteristics, their strengths can be harnessed not only for enhanced Android malware detection but also quicker white box analysis by means of the more interpretable constituent classifiers.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper a multiple classifier machine learning methodology for Predictive Maintenance (PdM) is presented. PdM is a prominent strategy for dealing with maintenance issues given the increasing need to minimize downtime and associated costs. One of the challenges with PdM is generating so called ’health factors’ or quantitative indicators of the status of a system associated with a given maintenance issue, and determining their relationship to operating costs and failure risk. The proposed PdM methodology allows dynamical decision rules to be adopted for maintenance management and can be used with high-dimensional and censored data problems. This is achieved by training multiple classification modules with different prediction horizons to provide different performance trade-offs in terms of frequency of unexpected breaks and unexploited lifetime and then employing this information in an operating cost based maintenance decision system to minimise expected costs. The effectiveness of the methodology is demonstrated using a simulated example and a benchmark semiconductor manufacturing maintenance problem.