68 resultados para Gaussian processes for machine learning
Resumo:
The operations and processes that the human brain employs to achieve fast visual categorization remain a matter of debate. A first issue concerns the timing and place of rapid visual categorization and to what extent it can be performed with an early feed-forward pass of information through the visual system. A second issue involves the categorization of stimuli that do not reach visual awareness. There is disagreement over the degree to which these stimuli activate the same early mechanisms as stimuli that are consciously perceived. We employed continuous flash suppression (CFS), EEG recordings, and machine learning techniques to study visual categorization of seen and unseen stimuli. Our classifiers were able to predict from the EEG recordings the category of stimuli on seen trials but not on unseen trials. Rapid categorization of conscious images could be detected around 100?ms on the occipital electrodes, consistent with a fast, feed-forward mechanism of target detection. For the invisible stimuli, however, CFS eliminated all traces of early processing. Our results support the idea of a fast mechanism of categorization and suggest that this early categorization process plays an important role in later, more subtle categorizations, and perceptual processes.
Resumo:
Smart management of maintenances has become fundamental in manufacturing environments in order to decrease downtime and costs associated with failures. Predictive Maintenance (PdM) systems based on Machine Learning (ML) techniques have the possibility with low added costs of drastically decrease failures-related expenses; given the increase of availability of data and capabilities of ML tools, PdM systems are becoming really popular, especially in semiconductor manufacturing. A PdM module based on Classification methods is presented here for the prediction of integral type faults that are related to machine usage and stress of equipment parts. The module has been applied to an important class of semiconductor processes, ion-implantation, for the prediction of ion-source tungsten filament breaks. The PdM has been tested on a real production dataset. © 2013 IEEE.
Resumo:
The problem of learning from imbalanced data is of critical importance in a large number of application domains and can be a bottleneck in the performance of various conventional learning methods that assume the data distribution to be balanced. The class imbalance problem corresponds to dealing with the situation where one class massively outnumbers the other. The imbalance between majority and minority would lead machine learning to be biased and produce unreliable outcomes if the imbalanced data is used directly. There has been increasing interest in this research area and a number of algorithms have been developed. However, independent evaluation of the algorithms is limited. This paper aims at evaluating the performance of five representative data sampling methods namely SMOTE, ADASYN, BorderlineSMOTE, SMOTETomek and RUSBoost that deal with class imbalance problems. A comparative study is conducted and the performance of each method is critically analysed in terms of assessment metrics. © 2013 Springer-Verlag.
Resumo:
This paper addresses the problem of learning Bayesian network structures from data based on score functions that are decomposable. It describes properties that strongly reduce the time and memory costs of many known methods without losing global optimality guarantees. These properties are derived for different score criteria such as Minimum Description Length (or Bayesian Information Criterion), Akaike Information Criterion and Bayesian Dirichlet Criterion. Then a branch-and-bound algorithm is presented that integrates structural constraints with data in a way to guarantee global optimality. As an example, structural constraints are used to map the problem of structure learning in Dynamic Bayesian networks into a corresponding augmented Bayesian network. Finally, we show empirically the benefits of using the properties with state-of-the-art methods and with the new algorithm, which is able to handle larger data sets than before.
Resumo:
With over 50 billion downloads and more than 1.3 million apps in Google’s official market, Android has continued to gain popularity amongst smartphone users worldwide. At the same time there has been a rise in malware targeting the platform, with more recent strains employing highly sophisticated detection avoidance techniques. As traditional signature based methods become less potent in detecting unknown malware, alternatives are needed for timely zero-day discovery. Thus this paper proposes an approach that utilizes ensemble learning for Android malware detection. It combines advantages of static analysis with the efficiency and performance of ensemble machine learning to improve Android malware detection accuracy. The machine learning models are built using a large repository of malware samples and benign apps from a leading antivirus vendor. Experimental results and analysis presented shows that the proposed method which uses a large feature space to leverage the power of ensemble learning is capable of 97.3 % to 99% detection accuracy with very low false positive rates.
Resumo:
The need for companies to consider the environmental impact of their operations and supply chains has been highlighted in the literature. However, few studies appear to consider how companies’ progress from proactive environmental strategies implemented at the internal, operations level to proactive environmental strategies implemented at the supply chain level. This study assesses the implementation process through the lens of the natural resource-based view and dynamic capabilities perspective. First, the link between the internal strategy ‘pollution prevention’ and the supply chain strategy ‘process stewardship’ is assessed. Second, the mediating influence of the internal support processes ‘integration’ and ‘learning’ on the implementation process is considered. Data collected from a sample of 1200 UK-based food manufacturing companies is analysed using multiple regression analysis. The findings suggest that the progression to environmental efforts at the supply chain level begins with internally-based environmental efforts and that the integration of these efforts and experience gained from them are important supporting factors in this progression.
Resumo:
Abstract
Publicly available, outdoor webcams continuously view the world and share images. These cameras include traffic cams, campus cams, ski-resort cams, etc. The Archive of Many Outdoor Scenes (AMOS) is a project aiming to geolocate, annotate, archive, and visualize these cameras and images to serve as a resource for a wide variety of scientific applications. The AMOS dataset has archived over 750 million images of outdoor environments from 27,000 webcams since 2006. Our goal is to utilize the AMOS image dataset and crowdsourcing to develop reliable and valid tools to improve physical activity assessment via online, outdoor webcam capture of global physical activity patterns and urban built environment characteristics.
This project’s grand scale-up of capturing physical activity patterns and built environments is a methodological step forward in advancing a real-time, non-labor intensive assessment using webcams, crowdsourcing, and eventually machine learning. The combined use of webcams capturing outdoor scenes every 30 min and crowdsources providing the labor of annotating the scenes allows for accelerated public health surveillance related to physical activity across numerous built environments. The ultimate goal of this public health and computer vision collaboration is to develop machine learning algorithms that will automatically identify and calculate physical activity patterns.
Resumo:
The identification of non-linear systems using only observed finite datasets has become a mature research area over the last two decades. A class of linear-in-the-parameter models with universal approximation capabilities have been intensively studied and widely used due to the availability of many linear-learning algorithms and their inherent convergence conditions. This article presents a systematic overview of basic research on model selection approaches for linear-in-the-parameter models. One of the fundamental problems in non-linear system identification is to find the minimal model with the best model generalisation performance from observational data only. The important concepts in achieving good model generalisation used in various non-linear system-identification algorithms are first reviewed, including Bayesian parameter regularisation and models selective criteria based on the cross validation and experimental design. A significant advance in machine learning has been the development of the support vector machine as a means for identifying kernel models based on the structural risk minimisation principle. The developments on the convex optimisation-based model construction algorithms including the support vector regression algorithms are outlined. Input selection algorithms and on-line system identification algorithms are also included in this review. Finally, some industrial applications of non-linear models are discussed.
Resumo:
The identification and classification of network traffic and protocols is a vital step in many quality of service and security systems. Traffic classification strategies must evolve, alongside the protocols utilising the Internet, to overcome the use of ephemeral or masquerading port numbers and transport layer encryption. This research expands the concept of using machine learning on the initial statistics of flow of packets to determine its underlying protocol. Recognising the need for efficient training/retraining of a classifier and the requirement for fast classification, the authors investigate a new application of k-means clustering referred to as 'two-way' classification. The 'two-way' classification uniquely analyses a bidirectional flow as two unidirectional flows and is shown, through experiments on real network traffic, to improve classification accuracy by as much as 18% when measured against similar proposals. It achieves this accuracy while generating fewer clusters, that is, fewer comparisons are needed to classify a flow. A 'two-way' classification offers a new way to improve accuracy and efficiency of machine learning statistical classifiers while still maintaining the fast training times associated with the k-means.
Resumo:
We present novel topological mappings between graphs, trees and generalized trees that means between structured objects with different properties. The two major contributions of this paper are, first, to clarify the relation between graphs, trees and generalized trees, a graph class recently introduced. Second, these transformations provide a unique opportunity to transform structured objects into a representation that might be beneficial for a processing, e.g., by machine learning techniques for graph classification. (c) 2006 Elsevier Inc. All rights reserved.
Resumo:
The Audio/Visual Emotion Challenge and Workshop (AVEC 2011) is the first competition event aimed at comparison of multimedia processing and machine learning methods for automatic audio, visual and audiovisual emotion analysis, with all participants competing under strictly the same conditions. This paper first describes the challenge participation conditions. Next follows the data used – the SEMAINE corpus – and its partitioning into train, development, and test partitions for the challenge with labelling in four dimensions, namely activity, expectation, power, and valence. Further, audio and video baseline features are introduced as well as baseline results that use these features for the three sub-challenges of audio, video, and audiovisual emotion recognition.
Resumo:
The relationships among organisms and their surroundings can be of immense complexity. To describe and understand an ecosystem as a tangled bank, multiple ways of interaction and their effects have to be considered, such as predation, competition, mutualism and facilitation. Understanding the resulting interaction networks is a challenge in changing environments, e.g. to predict knock-on effects of invasive species and to understand how climate change impacts biodiversity. The elucidation of complex ecological systems with their interactions will benefit enormously from the development of new machine learning tools that aim to infer the structure of interaction networks from field data. In the present study, we propose a novel Bayesian regression and multiple changepoint model (BRAM) for reconstructing species interaction networks from observed species distributions. The model has been devised to allow robust inference in the presence of spatial autocorrelation and distributional heterogeneity. We have evaluated the model on simulated data that combines a trophic niche model with a stochastic population model on a 2-dimensional lattice, and we have compared the performance of our model with L1-penalized sparse regression (LASSO) and non-linear Bayesian networks with the BDe scoring scheme. In addition, we have applied our method to plant ground coverage data from the western shore of the Outer Hebrides with the objective to infer the ecological interactions. (C) 2012 Elsevier B.V. All rights reserved.