731 resultados para internet traffic classification machine learning apache spark hadoop big data word2vec
Resumo:
This thesis presents new methods for classification and thematic grouping of billions of web pages, at scales previously not achievable. This process is also known as document clustering, where similar documents are automatically associated with clusters that represent various distinct topic. These automatically discovered topics are in turn used to improve search engine performance by only searching the topics that are deemed relevant to particular user queries.
Resumo:
Background Designing novel proteins with site-directed recombination has enormous prospects. By locating effective recombination sites for swapping sequence parts, the probability that hybrid sequences have the desired properties is increased dramatically. The prohibitive requirements for applying current tools led us to investigate machine learning to assist in finding useful recombination sites from amino acid sequence alone. Results We present STAR, Site Targeted Amino acid Recombination predictor, which produces a score indicating the structural disruption caused by recombination, for each position in an amino acid sequence. Example predictions contrasted with those of alternative tools, illustrate STAR'S utility to assist in determining useful recombination sites. Overall, the correlation coefficient between the output of the experimentally validated protein design algorithm SCHEMA and the prediction of STAR is very high (0.89). Conclusion STAR allows the user to explore useful recombination sites in amino acid sequences with unknown structure and unknown evolutionary origin. The predictor service is available from http://pprowler.itee.uq.edu.au/star.
Resumo:
Impaired driver alertness increases the likelihood of drivers’ making mistakes and reacting too late to unexpected events while driving. This is particularly a concern on monotonous roads, where a driver’s attention can decrease rapidly. While effective countermeasures do not currently exist, the development of in-vehicle sensors opens avenues for monitoring driving behavior in real-time. The aim of this study is to predict drivers’ level of alertness through surrogate measures collected from in-vehicle sensors. Electroencephalographic activity is used as a reference to evaluate alertness. Based on a sample of 25 drivers, data was collected in a driving simulator instrumented with an eye tracking system, a heart rate monitor and an electrodermal activity device. Various classification models were tested from linear regressions to Bayesians and data mining techniques. Results indicated that Neural Networks were the most efficient model in detecting lapses in alertness. Findings also show that reduced alertness can be predicted up to 5 minutes in advance with 90% accuracy, using surrogate measures such as time to line crossing, blink frequency and skin conductance level. Such a method could be used to warn drivers of their alertness level through the development of an in-vehicle device monitoring, in real-time, drivers' behavior on highways.
Resumo:
Active learning approaches reduce the annotation cost required by traditional supervised approaches to reach the same effectiveness by actively selecting informative instances during the learning phase. However, effectiveness and robustness of the learnt models are influenced by a number of factors. In this paper we investigate the factors that affect the effectiveness, more specifically in terms of stability and robustness, of active learning models built using conditional random fields (CRFs) for information extraction applications. Stability, defined as a small variation of performance when small variation of the training data or a small variation of the parameters occur, is a major issue for machine learning models, but even more so in the active learning framework which aims to minimise the amount of training data required. The factors we investigate are a) the choice of incremental vs. standard active learning, b) the feature set used as a representation of the text (i.e., morphological features, syntactic features, or semantic features) and c) Gaussian prior variance as one of the important CRFs parameters. Our empirical findings show that incremental learning and the Gaussian prior variance lead to more stable and robust models across iterations. Our study also demonstrates that orthographical, morphological and contextual features as a group of basic features play an important role in learning effective models across all iterations.
Resumo:
We present a Connected Learning Analytics (CLA) toolkit, which enables data to be extracted from social media and imported into a Learning Record Store (LRS), as defined by the new xAPI standard. Core to the toolkit is the notion of learner access to their own data. A number of implementational issues are discussed, and an ontology of xAPI verb/object/activity statements as they might be unified across 7 different social media and online environments is introduced. After considering some of the analytics that learners might be interested in discovering about their own processes (the delivery of which is prioritised for the toolkit) we propose a set of learning activities that could be easily implemented, and their data tracked by anyone using the toolkit and a LRS.
Resumo:
Reflective writing is an important learning task to help foster reflective practice, but even when assessed it is rarely analysed or critically reviewed due to its subjective and affective nature. We propose a process for capturing subjective and affective analytics based on the identification and recontextualisation of anomalous features within reflective text. We evaluate 2 human supervised trials of the process, and so demonstrate the potential for an automated Anomaly Recontextualisation process for Learning Analytics.
Resumo:
This thesis develops a novel approach to robot control that learns to account for a robot's dynamic complexities while executing various control tasks using inspiration from biological sensorimotor control and machine learning. A robot that can learn its own control system can account for complex situations and adapt to changes in control conditions to maximise its performance and reliability in the real world. This research has developed two novel learning methods, with the aim of solving issues with learning control of non-rigid robots that incorporate additional dynamic complexities. The new learning control system was evaluated on a real three degree-of-freedom elastic joint robot arm with a number of experiments: initially validating the learning method and testing its ability to generalise to new tasks, then evaluating the system during a learning control task requiring continuous online model adaptation.
Resumo:
In this paper we propose a novel approach to multi-action recognition that performs joint segmentation and classification. This approach models each action using a Gaussian mixture using robust low-dimensional action features. Segmentation is achieved by performing classification on overlapping temporal windows, which are then merged to produce the final result. This approach is considerably less complicated than previous methods which use dynamic programming or computationally expensive hidden Markov models (HMMs). Initial experiments on a stitched version of the KTH dataset show that the proposed approach achieves an accuracy of 78.3%, outperforming a recent HMM-based approach which obtained 71.2%.
Resumo:
Rolling-element bearing failures are the most frequent problems in rotating machinery, which can be catastrophic and cause major downtime. Hence, providing advance failure warning and precise fault detection in such components are pivotal and cost-effective. The vast majority of past research has focused on signal processing and spectral analysis for fault diagnostics in rotating components. In this study, a data mining approach using a machine learning technique called anomaly detection (AD) is presented. This method employs classification techniques to discriminate between defect examples. Two features, kurtosis and Non-Gaussianity Score (NGS), are extracted to develop anomaly detection algorithms. The performance of the developed algorithms was examined through real data from a test to failure bearing. Finally, the application of anomaly detection is compared with one of the popular methods called Support Vector Machine (SVM) to investigate the sensitivity and accuracy of this approach and its ability to detect the anomalies in early stages.
Resumo:
Aerial surveys conducted using manned or unmanned aircraft with customized camera payloads can generate a large number of images. Manual review of these images to extract data is prohibitive in terms of time and financial resources, thus providing strong incentive to automate this process using computer vision systems. There are potential applications for these automated systems in areas such as surveillance and monitoring, precision agriculture, law enforcement, asset inspection, and wildlife assessment. In this paper, we present an efficient machine learning system for automating the detection of marine species in aerial imagery. The effectiveness of our approach can be credited to the combination of a well-suited region proposal method and the use of Deep Convolutional Neural Networks (DCNNs). In comparison to previous algorithms designed for the same purpose, we have been able to dramatically improve recall to more than 80% and improve precision to 27% by using DCNNs as the core approach.
Resumo:
The quality of species distribution models (SDMs) relies to a large degree on the quality of the input data, from bioclimatic indices to environmental and habitat descriptors (Austin, 2002). Recent reviews of SDM techniques, have sought to optimize predictive performance e.g. Elith et al., 2006. In general SDMs employ one of three approaches to variable selection. The simplest approach relies on the expert to select the variables, as in environmental niche models Nix, 1986 or a generalized linear model without variable selection (Miller and Franklin, 2002). A second approach explicitly incorporates variable selection into model fitting, which allows examination of particular combinations of variables. Examples include generalized linear or additive models with variable selection (Hastie et al. 2002); or classification trees with complexity or model based pruning (Breiman et al., 1984, Zeileis, 2008). A third approach uses model averaging, to summarize the overall contribution of a variable, without considering particular combinations. Examples include neural networks, boosted or bagged regression trees and Maximum Entropy as compared in Elith et al. 2006. Typically, users of SDMs will either consider a small number of variable sets, via the first approach, or else supply all of the candidate variables (often numbering more than a hundred) to the second or third approaches. Bayesian SDMs exist, with several methods for eliciting and encoding priors on model parameters (see review in Low Choy et al. 2010). However few methods have been published for informative variable selection; one example is Bayesian trees (O’Leary 2008). Here we report an elicitation protocol that helps makes explicit a priori expert judgements on the quality of candidate variables. This protocol can be flexibly applied to any of the three approaches to variable selection, described above, Bayesian or otherwise. We demonstrate how this information can be obtained then used to guide variable selection in classical or machine learning SDMs, or to define priors within Bayesian SDMs.
Resumo:
This paper introduces a machine learning based system for controlling a robotic manipulator with visual perception only. The capability to autonomously learn robot controllers solely from raw-pixel images and without any prior knowledge of configuration is shown for the first time. We build upon the success of recent deep reinforcement learning and develop a system for learning target reaching with a three-joint robot manipulator using external visual observation. A Deep Q Network (DQN) was demonstrated to perform target reaching after training in simulation. Transferring the network to real hardware and real observation in a naive approach failed, but experiments show that the network works when replacing camera images with synthetic images.
Resumo:
Data-driven approaches such as Gaussian Process (GP) regression have been used extensively in recent robotics literature to achieve estimation by learning from experience. To ensure satisfactory performance, in most cases, multiple learning inputs are required. Intuitively, adding new inputs can often contribute to better estimation accuracy, however, it may come at the cost of a new sensor, larger training dataset and/or more complex learning, some- times for limited benefits. Therefore, it is crucial to have a systematic procedure to determine the actual impact each input has on the estimation performance. To address this issue, in this paper we propose to analyse the impact of each input on the estimate using a variance-based sensitivity analysis method. We propose an approach built on Analysis of Variance (ANOVA) decomposition, which can characterise how the prediction changes as one or more of the input changes, and also quantify the prediction uncertainty as attributed from each of the inputs in the framework of dependent inputs. We apply the proposed approach to a terrain-traversability estimation method we proposed in prior work, which is based on multi-task GP regression, and we validate this implementation experimentally using a rover on a Mars-analogue terrain.
Resumo:
This PhD research has proposed new machine learning techniques to improve human action recognition based on local features. Several novel video representation and classification techniques have been proposed to increase the performance with lower computational complexity. The major contributions are the construction of new feature representation techniques, based on advanced machine learning techniques such as multiple instance dictionary learning, Latent Dirichlet Allocation (LDA) and Sparse coding. A Binary-tree based classification technique was also proposed to deal with large amounts of action categories. These techniques are not only improving the classification accuracy with constrained computational resources but are also robust to challenging environmental conditions. These developed techniques can be easily extended to a wide range of video applications to provide near real-time performance.