807 results for Machine learning experiments
Abstract:
We present novel topological mappings between graphs, trees and generalized trees, that is, between structured objects with different properties. This paper makes two major contributions. First, it clarifies the relation between graphs, trees and generalized trees, a recently introduced graph class. Second, these transformations provide a unique opportunity to transform structured objects into a representation that might be beneficial for processing, e.g., by machine learning techniques for graph classification. (c) 2006 Elsevier Inc. All rights reserved.
Abstract:
Wind power generation differs from conventional thermal generation due to the stochastic nature of wind. Thus, wind power forecasting plays a key role in dealing with the challenges of balancing supply and demand in any electricity system, given the uncertainty associated with the wind farm power output. Accurate wind power forecasting reduces the need for additional balancing energy and reserve power to integrate wind power. Wind power forecasting tools enable better dispatch, scheduling and unit commitment of thermal generators, hydro plant and energy storage plant, and more competitive market trading as wind power ramps up and down on the grid. This paper presents an in-depth review of the current methods and advances in wind power forecasting and prediction. Firstly, numerical wind prediction methods from global to local scales, ensemble forecasting, upscaling and downscaling processes are discussed. Next, statistical and machine learning approaches are detailed. Then the techniques used for benchmarking and uncertainty analysis of forecasts are reviewed, and the performance of various approaches over different forecast time horizons is examined. Finally, current research activities, challenges and potential future developments are appraised. (C) 2011 Elsevier Ltd. All rights reserved.
Abstract:
The Audio/Visual Emotion Challenge and Workshop (AVEC 2011) is the first competition event aimed at comparison of multimedia processing and machine learning methods for automatic audio, visual and audiovisual emotion analysis, with all participants competing under strictly the same conditions. This paper first describes the challenge participation conditions. Next follows the data used – the SEMAINE corpus – and its partitioning into train, development, and test partitions for the challenge with labelling in four dimensions, namely activity, expectation, power, and valence. Further, audio and video baseline features are introduced as well as baseline results that use these features for the three sub-challenges of audio, video, and audiovisual emotion recognition.
Abstract:
The relationships among organisms and their surroundings can be of immense complexity. To describe and understand an ecosystem as a tangled bank, multiple ways of interaction and their effects have to be considered, such as predation, competition, mutualism and facilitation. Understanding the resulting interaction networks is a challenge in changing environments, e.g. to predict knock-on effects of invasive species and to understand how climate change impacts biodiversity. The elucidation of complex ecological systems with their interactions will benefit enormously from the development of new machine learning tools that aim to infer the structure of interaction networks from field data. In the present study, we propose a novel Bayesian regression and multiple changepoint model (BRAM) for reconstructing species interaction networks from observed species distributions. The model has been devised to allow robust inference in the presence of spatial autocorrelation and distributional heterogeneity. We have evaluated the model on simulated data that combines a trophic niche model with a stochastic population model on a 2-dimensional lattice, and we have compared the performance of our model with L1-penalized sparse regression (LASSO) and non-linear Bayesian networks with the BDe scoring scheme. In addition, we have applied our method to plant ground coverage data from the western shore of the Outer Hebrides with the objective of inferring the ecological interactions. (C) 2012 Elsevier B.V. All rights reserved.
Abstract:
The optimization of full-scale biogas plant operation is of great importance to make biomass a competitive source of renewable energy. The implementation of innovative control and optimization algorithms, such as Nonlinear Model Predictive Control, requires an online estimation of operating states of biogas plants. This state estimation allows for optimal control and operating decisions according to the actual state of a plant. In this paper such a state estimator is developed using a calibrated simulation model of a full-scale biogas plant, which is based on the Anaerobic Digestion Model No.1. The use of advanced pattern recognition methods shows that model states can be predicted from basic online measurements such as biogas production, CH4 and CO2 content in the biogas, pH value and substrate feed volume of known substrates. The machine learning methods used are trained and evaluated using synthetic data created with the biogas plant model simulating over a wide range of possible plant operating regions. Results show that the operating state vector of the modelled anaerobic digestion process can be predicted with an overall accuracy of about 90%. This facilitates the application of state-based optimization and control algorithms on full-scale biogas plants and therefore fosters the production of eco-friendly energy from biomass.
Abstract:
Mobile malware has been growing in scale and complexity, spurred by the unabated uptake of smartphones worldwide. Android is fast becoming the most popular mobile platform, resulting in a sharp increase in malware targeting the platform. Additionally, Android malware is evolving rapidly to evade detection by traditional signature-based scanning. Despite the detection measures currently in place, timely discovery of new malware is still a critical issue. This calls for novel approaches to mitigate the growing threat of zero-day Android malware. Hence, the authors develop and analyse proactive machine-learning approaches based on Bayesian classification aimed at uncovering unknown Android malware via static analysis. The study, which is based on a large malware sample set covering the majority of the existing families, demonstrates detection capabilities with high accuracy. Empirical results and comparative analysis are presented, offering useful insight towards the development of effective static-analytic Bayesian classification-based solutions for detecting unknown Android malware.
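As a rough illustration of Bayesian classification over static features, the sketch below implements a Bernoulli naive Bayes classifier on binary feature vectors. The abstract does not say which Bayesian classifier or which features the authors use, so the naive Bayes model, the class name `BernoulliNaiveBayes`, and the idea that each column flags a statically extracted property (for example a requested permission) are all assumptions made for illustration. The tiny training matrix is made-up toy data, not data from the study.

```python
import numpy as np


class BernoulliNaiveBayes:
    """Naive Bayes for binary feature vectors (hypothetical helper, not the authors' code)."""

    def fit(self, X, y, alpha=1.0):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # Log prior of each class, estimated from class frequencies.
        self.log_prior_ = np.log(np.array([np.mean(y == c) for c in self.classes_]))
        # Per-class probability that each feature is present, with Laplace smoothing.
        probs = np.array([
            (X[y == c].sum(axis=0) + alpha) / ((y == c).sum() + 2.0 * alpha)
            for c in self.classes_
        ])
        self.log_p_ = np.log(probs)        # log P(feature = 1 | class)
        self.log_q_ = np.log1p(-probs)     # log P(feature = 0 | class)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Sum the log-likelihood of present and absent features, plus the prior.
        scores = X @ self.log_p_.T + (1.0 - X) @ self.log_q_.T + self.log_prior_
        return self.classes_[np.argmax(scores, axis=1)]


# Toy data (illustrative only): rows are apps, columns are binary static features,
# label 1 = malicious, label 0 = benign.
X_train = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 1], [0, 1, 1], [0, 0, 0]]
y_train = [1, 1, 1, 0, 0, 0]
model = BernoulliNaiveBayes().fit(X_train, y_train)
predictions = model.predict([[1, 1, 0], [0, 0, 1]])
```

Working in log space avoids numerical underflow when the number of features is large, which is the usual situation when many static properties are extracted per app.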
Abstract:
The operations and processes that the human brain employs to achieve fast visual categorization remain a matter of debate. A first issue concerns the timing and place of rapid visual categorization and to what extent it can be performed with an early feed-forward pass of information through the visual system. A second issue involves the categorization of stimuli that do not reach visual awareness. There is disagreement over the degree to which these stimuli activate the same early mechanisms as stimuli that are consciously perceived. We employed continuous flash suppression (CFS), EEG recordings, and machine learning techniques to study visual categorization of seen and unseen stimuli. Our classifiers were able to predict from the EEG recordings the category of stimuli on seen trials but not on unseen trials. Rapid categorization of conscious images could be detected around 100 ms on the occipital electrodes, consistent with a fast, feed-forward mechanism of target detection. For the invisible stimuli, however, CFS eliminated all traces of early processing. Our results support the idea of a fast mechanism of categorization and suggest that this early categorization process plays an important role in later, more subtle categorizations, and perceptual processes.
Abstract:
Both embodied and symbolic accounts of conceptual organization would predict partial sharing and partial differentiation between the neural activations seen for concepts activated via different stimulus modalities. But cross-participant and cross-session variability in BOLD activity patterns makes analyses of such patterns with MVPA methods challenging. Here, we examine the effect of cross-modal and individual variation on the machine learning analysis of fMRI data recorded during a word property generation task. We present the same set of living and non-living concepts (land-mammals, or work tools) to a cohort of Japanese participants in two sessions: the first using auditory presentation of spoken words; the second using visual presentation of words written in Japanese characters. Classification accuracies confirmed that these semantic categories could be detected in single trials, with within-session predictive accuracies of 80-90%. However, cross-session prediction (learning from auditory-task data to classify data from the written-word task, or vice versa) suffered from a performance penalty, achieving 65-75% (still individually significant at p ≪ 0.05). We carried out several follow-on analyses to investigate the reason for this shortfall, concluding that distributional differences in neither time nor space alone could account for it. Rather, combined spatio-temporal patterns of activity need to be identified for successful cross-session learning, and this suggests that feature selection strategies could be modified to take advantage of this.
Abstract:
Smart maintenance management has become fundamental in manufacturing environments in order to decrease downtime and the costs associated with failures. Predictive Maintenance (PdM) systems based on Machine Learning (ML) techniques can, at low added cost, drastically decrease failure-related expenses; given the increasing availability of data and the growing capabilities of ML tools, PdM systems are becoming increasingly popular, especially in semiconductor manufacturing. A PdM module based on classification methods is presented here for the prediction of integral-type faults, which are related to machine usage and stress of equipment parts. The module has been applied to an important class of semiconductor processes, ion implantation, for the prediction of ion-source tungsten filament breaks. The PdM module has been tested on a real production dataset. © 2013 IEEE.
Abstract:
In a Bayesian learning setting, the posterior distribution of a predictive model arises from a trade-off between its prior distribution and the conditional likelihood of observed data. Such distribution functions usually rely on additional hyperparameters which need to be tuned in order to achieve optimum predictive performance; this operation can be efficiently performed in an Empirical Bayes fashion by maximizing the posterior marginal likelihood of the observed data. Since the score function of this optimization problem is in general characterized by the presence of local optima, it is necessary to resort to global optimization strategies, which require a large number of function evaluations. Given that each evaluation is usually computationally intensive and scales poorly with the dataset size, the maximum number of observations that can be treated simultaneously is quite limited. In this paper, we consider the case of hyperparameter tuning in Gaussian process regression. A straightforward implementation of the posterior log-likelihood for this model requires O(N^3) operations for every iteration of the optimization procedure, where N is the number of examples in the input dataset. We derive a novel set of identities that allow, after an initial overhead of O(N^3), the evaluation of the score function, as well as the Jacobian and Hessian matrices, in O(N) operations. We show how the proposed identities, which follow from the eigendecomposition of the kernel matrix, yield a reduction of several orders of magnitude in the computation time for the hyperparameter optimization problem. Notably, the proposed solution provides computational advantages even with respect to state-of-the-art approximations that rely on sparse kernel matrices.
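The abstract does not state the identities themselves, so the following is only a sketch of the general idea under a simplifying assumption: the tuned hyperparameters are a signal scale and a noise variance, so that the covariance is scale * K + noise * I with a fixed kernel matrix K. In that case one eigendecomposition of K (the O(N^3) overhead) lets every later evaluation of the log marginal likelihood run in O(N). The function and class names (`rbf_kernel`, `naive_log_marginal`, `EigenLogMarginal`) are hypothetical, and the Jacobian and Hessian mentioned in the abstract are not covered here.

```python
import numpy as np


def rbf_kernel(X, length_scale=1.0):
    """Squared-exponential kernel matrix with a fixed length scale (illustrative choice)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / length_scale**2)


def naive_log_marginal(K, y, scale, noise):
    """Direct evaluation via a Cholesky factorization: O(N^3) per call."""
    n = len(y)
    L = np.linalg.cholesky(scale * K + noise * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2.0 * np.pi)


class EigenLogMarginal:
    """O(N^3) setup once, then O(N) per evaluation for (scale, noise) hyperparameters."""

    def __init__(self, K, y):
        eigvals, eigvecs = np.linalg.eigh(K)         # K = U diag(s) U^T
        self.s = np.maximum(eigvals, 0.0)            # clip tiny negative round-off
        self.y_rot_sq = (eigvecs.T @ y) ** 2         # squared targets in the eigenbasis
        self.n = len(y)

    def __call__(self, scale, noise):
        lam = scale * self.s + noise                 # eigenvalues of scale*K + noise*I
        return (-0.5 * np.sum(self.y_rot_sq / lam)
                - 0.5 * np.sum(np.log(lam))
                - 0.5 * self.n * np.log(2.0 * np.pi))


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = rng.normal(size=30)
K = rbf_kernel(X)
fast = EigenLogMarginal(K, y)
fast_value = fast(1.5, 0.1)
slow_value = naive_log_marginal(K, y, 1.5, 0.1)
```

The shortcut works because scaling K and adding a multiple of the identity leave the eigenvectors unchanged. Hyperparameters that alter the eigenvectors, such as the kernel length scale, would need a fresh decomposition in this simplified version.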
Abstract:
The momentum term has long been used in machine learning algorithms, especially back-propagation, to improve their speed of convergence. In this paper, we derive an expression to prove the O(1/k^2) convergence rate of the online gradient method, with momentum-type updates, when the individual gradients are constrained by a growth condition. We then apply this type of update to video background modelling by using it in the update equations of the Region-based Mixture of Gaussians algorithm. Extensive evaluations are performed on both simulated data and challenging real-world scenarios with dynamic backgrounds, to show that these regularised updates help the mixtures converge faster than the conventional approach and consequently improve the algorithm's performance.
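To make the idea of a momentum-type update in an online gradient method concrete, here is a minimal sketch using the classical heavy-ball form on a least-squares problem. The exact update rule analysed in the paper is not given in the abstract, so the heavy-ball form, the function name `online_gradient_momentum`, and the step-size and momentum values are assumptions. The toy problem is noise-free, so every per-sample gradient vanishes at the solution, which is one simple setting in which a growth condition on the individual gradients holds. The sketch does not demonstrate the O(1/k^2) rate.

```python
import numpy as np


def online_gradient_momentum(A, b, lr=0.02, momentum=0.5, epochs=200, seed=0):
    """Minimise 0.5 * sum_i (a_i . w - b_i)^2 one sample at a time with heavy-ball momentum."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    velocity = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            grad = (A[i] @ w - b[i]) * A[i]               # gradient of the i-th term only
            velocity = momentum * velocity - lr * grad    # blend previous step with new gradient
            w = w + velocity
    return w


rng = np.random.default_rng(1)
A = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
b = A @ w_true                                            # consistent system, no noise
w_est = online_gradient_momentum(A, b)
```

Setting `momentum=0.0` recovers the plain online gradient method, which makes it easy to compare the two update rules on the same data.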
Abstract:
Real-world graphs or networks tend to exhibit a well-known set of properties, such as heavy-tailed degree distributions, clustering and community formation. Much effort has been directed into creating realistic and tractable models for unlabelled graphs, which has yielded insights into graph structure and evolution. Recently, attention has moved to creating models for labelled graphs: many real-world graphs are labelled with both discrete and numeric attributes. In this paper, we present Agwan (Attribute Graphs: Weighted and Numeric), a generative model for random graphs with discrete labels and weighted edges. The model is easily generalised to edges labelled with an arbitrary number of numeric attributes. We include algorithms for fitting the parameters of the Agwan model to real-world graphs and for generating random graphs from the model. Using real-world directed and undirected graphs as input, we compare our approach to state-of-the-art random labelled graph generators and draw conclusions about the contribution of discrete vertex labels and edge weights to graph structure.