904 results for Learning algorithm
Abstract:
Reinforcement learning (RL) is well suited to robot learning, as it can learn in unknown environments and in real time. The main difficulties in adapting classic RL algorithms to robotic systems are the generalization problem and the correct observation of the Markovian state. This paper attempts to solve the generalization problem by proposing the semi-online neural-Q_learning algorithm (SONQL). The algorithm uses the classic Q_learning technique with two modifications. First, a neural network (NN) approximates the Q_function, allowing the use of continuous states and actions. Second, a database of the most representative learning samples accelerates and stabilizes the convergence. The term semi-online refers to the fact that the algorithm uses current as well as past learning samples. Nevertheless, the algorithm is able to learn in real time while the robot is interacting with the environment. The paper shows simulated results with the "mountain-car" benchmark and real results with an underwater robot in a target-following behavior.
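As an illustration of the second modification described in the abstract above, the following minimal sketch replays stored learning samples alongside the current one. It is not the SONQL implementation: a small table stands in for the neural network, the database simply keeps the newest samples rather than the most representative ones, and the chain environment, the function names (`q_update`, `step`, `train`) and all parameter values are assumptions made for this example.

```python
import random

def q_update(q, sample, alpha=0.5, gamma=0.9):
    """One Q-learning update from a (state, action, reward, next_state, done) sample."""
    s, a, r, s2, done = sample
    target = r if done else r + gamma * max(q[s2])
    q[s][a] += alpha * (target - q[s][a])

def step(state, action, n_states):
    """Hypothetical chain world: action 1 moves right, action 0 moves left.
    Reaching the right end gives reward 1 and ends the episode."""
    nxt = min(max(state + (1 if action == 1 else -1), 0), n_states - 1)
    done = nxt == n_states - 1
    return nxt, (1.0 if done else 0.0), done

def train(n_states=5, episodes=200, replay=4, db_size=50, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # table standing in for the NN
    database = []                               # recent learning samples
    for _ in range(episodes):
        s, done, steps = 0, False, 0
        while not done and steps < 100:
            # Explore with probability epsilon, or when both actions look equal.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s2, r, done = step(s, a, n_states)
            sample = (s, a, r, s2, done)
            q_update(q, sample)                 # learn from the current sample
            database.append(sample)
            if len(database) > db_size:
                database.pop(0)                 # keep only the newest samples
            for old in rng.sample(database, min(replay, len(database))):
                q_update(q, old)                # also learn from past samples
            s, steps = s2, steps + 1
    return q
```

Calling `train()` returns the learned table; in this toy setting the value of moving right should end up above the value of moving left in every non-terminal state.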
Abstract:
This paper presents a hybrid behavior-based scheme using reinforcement learning for high-level control of autonomous underwater vehicles (AUVs). Two main features of the presented approach are hybrid behavior coordination and semi on-line neural-Q_learning (SONQL). Hybrid behavior coordination takes advantage of the robustness and modularity of the competitive approach as well as the efficient trajectories of the cooperative approach. SONQL, a new continuous approach of the Q_learning algorithm with a multilayer neural network, is used to learn the behavior state/action mapping online. Experimental results show the feasibility of the presented approach for AUVs.
Abstract:
This paper proposes a high-level reinforcement learning (RL) control system for solving the action selection problem of an autonomous robot. Although the dominant approach when using RL has been to apply value-function-based algorithms, the system detailed here is characterized by the use of direct policy search methods. Rather than approximating a value function, these methodologies approximate a policy using an independent function approximator with its own parameters, trying to maximize the future expected reward. The policy-based algorithm presented in this paper is used for learning the internal state/action mapping of a behavior. In this preliminary work, we demonstrate its feasibility with simulated experiments using the underwater robot GARBI in a target-reaching task.
Predicting sense of community and participation by applying machine learning to open government data
Abstract:
Community capacity is used to monitor socio-economic development. It is composed of a number of dimensions, which can be measured to understand the possible issues in the implementation of a policy or the outcome of a project targeting a community. Measuring community capacity dimensions is usually expensive and time consuming, requiring locally organised surveys. Therefore, we investigate a technique to estimate them by applying the Random Forests algorithm on secondary open government data. This research focuses on the prediction of measures for two dimensions: sense of community and participation. The most important variables for this prediction were determined. The variables included in the datasets used to train the predictive models complied with two criteria: nationwide availability; sufficiently fine-grained geographic breakdown, i.e. neighbourhood level. The models explained 77% of the sense of community measures and 63% of participation. Due to the low geographic detail of the outcome measures available, further research is required to apply the predictive models to a neighbourhood level. The variables that were found to be more determinant for prediction were only partially in agreement with the factors that, according to the social science literature consulted, are the most influential for sense of community and participation. This finding should be further investigated from a social science perspective, in order to be understood in depth.
Abstract:
In this paper, we employ techniques from artificial intelligence such as reinforcement learning and agent-based modeling as building blocks of a computational model for an economy based on conventions. First we model the interaction among firms in the private sector. These firms behave in an information environment based on conventions, meaning that a firm is likely to behave as its neighbors do if it observes that their actions lead to a good payoff. On the other hand, we propose the use of reinforcement learning as a computational model for the role of the government in the economy, as the agent that determines the fiscal policy and whose objective is to maximize the growth of the economy. We present the implementation of a simulator of the proposed model based on SWARM that employs the SARSA(λ) algorithm combined with a multilayer perceptron as the function approximator for the action value function.
Abstract:
This paper represents the first step in ongoing work on designing an unsupervised method based on a genetic algorithm for intrusion detection. Its main role in a broader system is to give notice of unusual traffic and in that way provide the possibility of detecting unknown attacks. Most of the machine-learning techniques deployed for intrusion detection are supervised, as these techniques are generally more accurate, but this implies the need to label the data for training and testing, which is time-consuming and error-prone. Hence, our goal is to devise an anomaly detector that is unsupervised but at the same time robust and accurate. Genetic algorithms are robust and able to avoid getting stuck in local optima, unlike other clustering techniques. The model is verified on the KDD99 benchmark dataset, generating a solution competitive with state-of-the-art solutions, which demonstrates the strong potential of the proposed method.
Abstract:
The present work presents a new method for activity extraction and reporting from video based on the aggregation of fuzzy relations. Trajectory clustering is first employed mainly to discover the points of entry and exit of mobiles appearing in the scene. In a second step, proximity relations between resulting clusters of detected mobiles and contextual elements from the scene are modeled employing fuzzy relations. These can then be aggregated employing typical soft-computing algebra. A clustering algorithm based on the transitive closure calculation of the fuzzy relations allows building the structure of the scene and characterises the ongoing different activities of the scene. Discovered activity zones can be reported as activity maps with different granularities thanks to the analysis of the transitive closure matrix. Taking advantage of the soft relation properties, activity zones and related activities can be labeled in a more human-like language. We present results obtained on real videos corresponding to apron monitoring in the Toulouse airport in France.
Abstract:
A neural network enhanced proportional, integral and derivative (PID) controller is presented that combines the attributes of neural network learning with a generalized minimum-variance self-tuning control (STC) strategy. The neuro PID controller is structured with plant model identification and PID parameter tuning. The plants to be controlled are approximated by an equivalent model composed of a simple linear submodel to approximate plant dynamics around operating points, plus an error agent to accommodate the errors induced by linear submodel inaccuracy due to non-linearities and other complexities. A generalized recursive least-squares algorithm is used to identify the linear submodel, and a layered neural network is used to detect the error agent in which the weights are updated on the basis of the error between the plant output and the output from the linear submodel. The procedure for controller design is based on the equivalent model, and therefore the error agent is naturally functioned within the control law. In this way the controller can deal not only with a wide range of linear dynamic plants but also with those complex plants characterized by severe non-linearity, uncertainties and non-minimum phase behaviours. Two simulation studies are provided to demonstrate the effectiveness of the controller design procedure.
Abstract:
In this paper a new nonlinear digital baseband predistorter design is introduced based on direct learning, together with a new Wiener system modeling approach for the high power amplifiers (HPA) based on the B-spline neural network. The contribution is twofold. Firstly, by assuming that the nonlinearity in the HPA is mainly dependent on the input signal amplitude the complex valued nonlinear static function is represented by two real valued B-spline neural networks, one for the amplitude distortion and another for the phase shift. The Gauss-Newton algorithm is applied for the parameter estimation, in which the De Boor recursion is employed to calculate both the B-spline curve and the first order derivatives. Secondly, we derive the predistorter algorithm calculating the inverse of the complex valued nonlinear static function according to B-spline neural network based Wiener models. The inverse of the amplitude and phase shift distortion are then computed and compensated using the identified phase shift model. Numerical examples have been employed to demonstrate the efficacy of the proposed approaches.
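For readers unfamiliar with the De Boor recursion mentioned in the abstract above, here is a minimal sketch of how it evaluates a B-spline curve at one point. It covers only curve evaluation, not the derivative recursion or the predistorter itself; the function names (`find_span`, `de_boor`, `bspline_value`) and the example knot vectors are assumptions chosen for illustration, and `x` is assumed to lie inside the spline's domain.

```python
def find_span(x, knots, degree):
    """Return k such that knots[k] <= x < knots[k + 1], restricted to valid spans."""
    last = len(knots) - degree - 2  # index of the last valid span
    for k in range(degree, last + 1):
        if knots[k] <= x < knots[k + 1]:
            return k
    return last  # x at the right end of the domain

def de_boor(k, x, knots, coeffs, degree):
    """Evaluate the B-spline at x using the De Boor recursion on span k."""
    # Start from the degree + 1 coefficients that influence span k.
    d = [coeffs[j + k - degree] for j in range(degree + 1)]
    for r in range(1, degree + 1):
        # Go right to left so d[j - 1] still holds the previous level's value.
        for j in range(degree, r - 1, -1):
            left = knots[j + k - degree]
            right = knots[j + 1 + k - r]
            alpha = (x - left) / (right - left)
            d[j] = (1.0 - alpha) * d[j - 1] + alpha * d[j]
    return d[degree]

def bspline_value(x, knots, coeffs, degree):
    """Convenience wrapper: locate the knot span, then run the recursion."""
    return de_boor(find_span(x, knots, degree), x, knots, coeffs, degree)
```

With degree 1 the recursion reduces to linear interpolation between coefficients, which makes a convenient sanity check before moving to higher degrees.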
Abstract:
The problem of a manipulator operating in a noisy workspace and required to move from an initial fixed position P0 to a final position Pf is considered. However, Pf is corrupted by noise, giving rise to P̂f, which may be obtained by sensors. The use of learning automata is proposed to tackle this problem. An automaton is placed at each joint of the manipulator, which moves according to the action chosen by the automaton (forward, backward, stationary) at each instant. The simultaneous reward or penalty of the automata avoids any inverse kinematics computations that would be necessary if the distance of each joint from the final position had to be calculated. Three variable-structure learning algorithms are used, i.e., the discretized linear reward-penalty (DLR-P), the linear reward-penalty (LR-P) and a nonlinear scheme. Each algorithm is separately tested with two (forward, backward) and three (forward, backward, stationary) actions.
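To make the linear reward-penalty idea in the abstract above concrete, the sketch below shows the textbook update of a single variable-structure automaton with three actions. It is not the paper's manipulator setup: the reward probabilities passed to `run_automaton` are a hypothetical stand-in for the environment, the function names and the step sizes `a` and `b` are assumptions, and the discretized and nonlinear variants are not shown.

```python
import random

def lrp_update(probs, chosen, rewarded, a=0.01, b=0.01):
    """Return updated action probabilities under a linear reward-penalty scheme."""
    r = len(probs)
    new = []
    for j, p in enumerate(probs):
        if rewarded:
            # Reward: move probability mass toward the chosen action.
            new.append(p + a * (1.0 - p) if j == chosen else (1.0 - a) * p)
        else:
            # Penalty: shrink the chosen action and share the mass among the others.
            new.append((1.0 - b) * p if j == chosen else b / (r - 1) + (1.0 - b) * p)
    return new

def run_automaton(reward_prob, steps=5000, seed=0):
    """Simulate one automaton; reward_prob[i] is the assumed chance action i is rewarded."""
    rng = random.Random(seed)
    probs = [1.0 / len(reward_prob)] * len(reward_prob)
    for _ in range(steps):
        action = rng.choices(range(len(probs)), weights=probs)[0]
        rewarded = rng.random() < reward_prob[action]
        probs = lrp_update(probs, action, rewarded)
    return probs
```

For example, `run_automaton([0.8, 0.2, 0.2])` treats the actions as (forward, backward, stationary) and rewards "forward" most often, so its probability should come to dominate while the vector continues to sum to one.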
Abstract:
This contribution introduces a new digital predistorter to compensate serious distortions caused by memory high power amplifiers (HPAs) which exhibit output saturation characteristics. The proposed design is based on direct learning using a data-driven B-spline Wiener system modeling approach. The nonlinear HPA with memory is first identified based on the B-spline neural network model using the Gauss-Newton algorithm, which incorporates the efficient De Boor algorithm with both B-spline curve and first derivative recursions. The estimated Wiener HPA model is then used to design the Hammerstein predistorter. In particular, the inverse of the amplitude distortion of the HPA's static nonlinearity can be calculated effectively using the Newton-Raphson formula based on the inverse of De Boor algorithm. A major advantage of this approach is that both the Wiener HPA identification and the Hammerstein predistorter inverse can be achieved very efficiently and accurately. Simulation results obtained are presented to demonstrate the effectiveness of this novel digital predistorter design.
Abstract:
This paper presents a novel approach to the automatic classification of very large data sets composed of terahertz pulse transient signals, highlighting their potential use in biochemical, biomedical, pharmaceutical and security applications. Two different types of THz spectra are considered in the classification process. Firstly, a binary classification study of poly-A and poly-C ribonucleic acid samples is performed. This is then contrasted with a difficult multi-class classification problem of spectra from six different powder samples that, although they have fairly indistinguishable features in the optical spectrum, possess a few discernible spectral features in the terahertz part of the spectrum. Classification is performed using a complex-valued extreme learning machine algorithm that takes into account features in both the amplitude and the phase of the recorded spectra. Classification speed and accuracy are contrasted with those achieved using a support vector machine classifier. The study systematically compares the classifier performance achieved after adopting different Gaussian kernels when separating amplitude and phase signatures. The two signatures are presented as feature vectors for both training and testing purposes. The study confirms the utility of complex-valued extreme learning machine algorithms for classification of the very large data sets generated with current terahertz imaging spectrometers. The classifier can take into consideration heterogeneous layers within an object, as would be required within a tomographic setting, and is sufficiently robust to detect patterns hidden inside noisy terahertz data sets. The proposed study opens up the opportunity for the establishment of complex-valued extreme learning machine algorithms as new chemometric tools that will assist the wider proliferation of terahertz sensing technology for chemical sensing, quality control, security screening and clinical diagnosis. Furthermore, the proposed algorithm should also be very useful in other applications requiring the classification of very large datasets.
Abstract:
Traditional dictionary learning algorithms find a sparse representation of high-dimensional data by transforming samples into a one-dimensional (1D) vector. This 1D model loses the inherent spatial structure of the data. An alternative solution is to employ tensor decomposition for dictionary learning on the data in their original structural form (a tensor) by learning multiple dictionaries along each mode and the corresponding sparse representation with respect to the Kronecker product of these dictionaries. To learn tensor dictionaries along each mode, all the existing methods update each dictionary iteratively in an alternating manner. Although atoms from each mode dictionary jointly contribute to the sparsity of the tensor, existing works ignore atom correlations between different mode dictionaries by treating each mode dictionary independently. In this paper, we propose a joint multiple dictionary learning method for tensor sparse coding, which explores atom correlations for sparse representation and updates multiple atoms from each mode dictionary simultaneously. In this algorithm, the Frequent-Pattern Tree (FP-tree) mining algorithm is employed to exploit frequent atom patterns in the sparse representation. Inspired by the idea of K-SVD, we develop a new dictionary update method that jointly updates elements in each pattern. Experimental results demonstrate that our method outperforms other tensor-based dictionary learning algorithms.
Abstract:
Model trees are a particular case of decision trees employed to solve regression problems. They have the advantage of presenting an interpretable output, helping the end-user to get more confidence in the prediction and providing the basis for the end-user to have new insight about the data, confirming or rejecting hypotheses previously formed. Moreover, model trees present an acceptable level of predictive performance in comparison to most techniques used for solving regression problems. Since generating the optimal model tree is an NP-Complete problem, traditional model tree induction algorithms make use of a greedy top-down divide-and-conquer strategy, which may not converge to the global optimal solution. In this paper, we propose a novel algorithm based on the use of the evolutionary algorithms paradigm as an alternate heuristic to generate model trees in order to improve the convergence to globally near-optimal solutions. We call our new approach evolutionary model tree induction (E-Motion). We test its predictive performance using public UCI data sets, and we compare the results to traditional greedy regression/model trees induction algorithms, as well as to other evolutionary approaches. Results show that our method presents a good trade-off between predictive performance and model comprehensibility, which may be crucial in many machine learning applications. (C) 2010 Elsevier Inc. All rights reserved.
Abstract:
We study opinion dynamics in a population of interacting adaptive agents voting on a set of issues represented by vectors. We consider agents who can classify issues into one of two categories and can arrive at their opinions using an adaptive algorithm. Adaptation comes from learning and the information for the learning process comes from interacting with other neighboring agents and trying to change the internal state in order to concur with their opinions. The change in the internal state is driven by the information contained in the issue and in the opinion of the other agent. We present results in a simple yet rich context where each agent uses a Boolean perceptron to state their opinion. If the update occurs with information asynchronously exchanged among pairs of agents, then the typical case, if the number of issues is kept small, is the evolution into a society torn by the emergence of factions with extreme opposite beliefs. This occurs even when seeking consensus with agents with opposite opinions. If the number of issues is large, the dynamics becomes trapped, the society does not evolve into factions and a distribution of moderate opinions is observed. The synchronous case is technically simpler and is studied by formulating the problem in terms of differential equations that describe the evolution of order parameters that measure the consensus between pairs of agents. We show that for a large number of issues and unidirectional information flow, global consensus is a fixed point; however, the approach to this consensus is glassy for large societies.