38 resultados para machine learning, decision tree, concept drift, ensemble learning, classication, random forest
Resumo:
In this study, we introduce an original distance definition for graphs, called the Markov-inverse-F measure (MiF). This measure enables the integration of classical graph theory indices with new knowledge pertaining to structural feature extraction from semantic networks. MiF improves the conventional Jaccard and/or Simpson indices, and reconciles both the geodesic information (random walk) and co-occurrence adjustment (degree balance and distribution). We measure the effectiveness of graph-based coefficients through the application of linguistic graph information for a neural activity recorded during conceptual processing in the human brain. Specifically, the MiF distance is computed between each of the nouns used in a previous neural experiment and each of the in-between words in a subgraph derived from the Edinburgh Word Association Thesaurus of English. From the MiF-based information matrix, a machine learning model can accurately obtain a scalar parameter that specifies the degree to which each voxel in (the MRI image of) the brain is activated by each word or each principal component of the intermediate semantic features. Furthermore, correlating the voxel information with the MiF-based principal components, a new computational neurolinguistics model with a network connectivity paradigm is created. This allows two dimensions of context space to be incorporated with both semantic and neural distributional representations.
Resumo:
Abstract
Publicly available, outdoor webcams continuously view the world and share images. These cameras include traffic cams, campus cams, ski-resort cams, etc. The Archive of Many Outdoor Scenes (AMOS) is a project aiming to geolocate, annotate, archive, and visualize these cameras and images to serve as a resource for a wide variety of scientific applications. The AMOS dataset has archived over 750 million images of outdoor environments from 27,000 webcams since 2006. Our goal is to utilize the AMOS image dataset and crowdsourcing to develop reliable and valid tools to improve physical activity assessment via online, outdoor webcam capture of global physical activity patterns and urban built environment characteristics.
This project’s grand scale-up of capturing physical activity patterns and built environments is a methodological step forward in advancing a real-time, non-labor intensive assessment using webcams, crowdsourcing, and eventually machine learning. The combined use of webcams capturing outdoor scenes every 30 min and crowdsources providing the labor of annotating the scenes allows for accelerated public health surveillance related to physical activity across numerous built environments. The ultimate goal of this public health and computer vision collaboration is to develop machine learning algorithms that will automatically identify and calculate physical activity patterns.
Resumo:
In this paper, an automatic Smart Irrigation Decision Support System, SIDSS, is proposed to manage irrigation in agriculture. Our system estimates the weekly irrigations needs of a plantation, on the basis of both soil measurements and climatic variables gathered by several autonomous nodes deployed in field. This enables a closed loop control scheme to adapt the decision support system to local perturbations and estimation errors. Two machine learning techniques, PLSR and ANFIS, are proposed as reasoning engine of our SIDSS. Our approach is validated on three commercial plantations of citrus trees located in the South-East of Spain. Performance is tested against decisions taken by a human expert.
Resumo:
The identification and classification of network traffic and protocols is a vital step in many quality of service and security systems. Traffic classification strategies must evolve, alongside the protocols utilising the Internet, to overcome the use of ephemeral or masquerading port numbers and transport layer encryption. This research expands the concept of using machine learning on the initial statistics of flow of packets to determine its underlying protocol. Recognising the need for efficient training/retraining of a classifier and the requirement for fast classification, the authors investigate a new application of k-means clustering referred to as 'two-way' classification. The 'two-way' classification uniquely analyses a bidirectional flow as two unidirectional flows and is shown, through experiments on real network traffic, to improve classification accuracy by as much as 18% when measured against similar proposals. It achieves this accuracy while generating fewer clusters, that is, fewer comparisons are needed to classify a flow. A 'two-way' classification offers a new way to improve accuracy and efficiency of machine learning statistical classifiers while still maintaining the fast training times associated with the k-means.
Resumo:
Wind power generation differs from conventional thermal generation due to the stochastic nature of wind. Thus wind power forecasting plays a key role in dealing with the challenges of balancing supply and demand in any electricity system, given the uncertainty associated with the wind farm power output. Accurate wind power forecasting reduces the need for additional balancing energy and reserve power to integrate wind power. Wind power forecasting tools enable better dispatch, scheduling and unit commitment of thermal generators, hydro plant and energy storage plant and more competitive market trading as wind power ramps up and down on the grid. This paper presents an in-depth review of the current methods and advances in wind power forecasting and prediction. Firstly, numerical wind prediction methods from global to local scales, ensemble forecasting, upscaling and downscaling processes are discussed. Next the statistical and machine learning approach methods are detailed. Then the techniques used for benchmarking and uncertainty analysis of forecasts are overviewed, and the performance of various approaches over different forecast time horizons is examined. Finally, current research activities, challenges and potential future developments are appraised. (C) 2011 Elsevier Ltd. All rights reserved.
Resumo:
This work examines the conformational ensemble involved in β-hairpin folding by means of advanced molecular dynamics simulations and dimensionality reduction. A fully atomistic description of the protein and the surrounding solvent molecules is used, and this complex energy landscape is sampled by means of parallel tempering metadynamics simulations. The ensemble of configurations explored is analyzed using the recently proposed sketch-map algorithm. Further simulations allow us to probe how mutations affect the structures adopted by this protein. We find that many of the configurations adopted by a mutant are the same as those adopted by the wild-type protein. Furthermore, certain mutations destabilize secondary-structure-containing configurations by preventing the formation of hydrogen bonds or by promoting the formation of new intramolecular contacts. Our analysis demonstrates that machine-learning techniques can be used to study the energy landscapes of complex molecules and that the visualizations that are generated in this way provide a natural basis for examining how the stabilities of particular configurations of the molecule are affected by factors such as temperature or structural mutations.
Resumo:
One of the most popular techniques of generating classifier ensembles is known as stacking which is based on a meta-learning approach. In this paper, we introduce an alternative method to stacking which is based on cluster analysis. Similar to stacking, instances from a validation set are initially classified by all base classifiers. The output of each classifier is subsequently considered as a new attribute of the instance. Following this, a validation set is divided into clusters according to the new attributes and a small subset of the original attributes of the instances. For each cluster, we find its centroid and calculate its class label. The collection of centroids is considered as a meta-classifier. Experimental results show that the new method outperformed all benchmark methods, namely Majority Voting, Stacking J48, Stacking LR, AdaBoost J48, and Random Forest, in 12 out of 22 data sets. The proposed method has two advantageous properties: it is very robust to relatively small training sets and it can be applied in semi-supervised learning problems. We provide a theoretical investigation regarding the proposed method. This demonstrates that for the method to be successful, the base classifiers applied in the ensemble should have greater than 50% accuracy levels.
Resumo:
The identification of non-linear systems using only observed finite datasets has become a mature research area over the last two decades. A class of linear-in-the-parameter models with universal approximation capabilities have been intensively studied and widely used due to the availability of many linear-learning algorithms and their inherent convergence conditions. This article presents a systematic overview of basic research on model selection approaches for linear-in-the-parameter models. One of the fundamental problems in non-linear system identification is to find the minimal model with the best model generalisation performance from observational data only. The important concepts in achieving good model generalisation used in various non-linear system-identification algorithms are first reviewed, including Bayesian parameter regularisation and models selective criteria based on the cross validation and experimental design. A significant advance in machine learning has been the development of the support vector machine as a means for identifying kernel models based on the structural risk minimisation principle. The developments on the convex optimisation-based model construction algorithms including the support vector regression algorithms are outlined. Input selection algorithms and on-line system identification algorithms are also included in this review. Finally, some industrial applications of non-linear models are discussed.