885 results for Q-learning algorithm
Abstract:
This paper proposes a high-level reinforcement learning (RL) control system for solving the action selection problem of an autonomous robot. Although the dominant approach in RL has been to apply value-function-based algorithms, the system detailed here uses direct policy search methods. Rather than approximating a value function, these methods approximate a policy using an independent function approximator with its own parameters, and try to maximize the expected future reward. The policy-based algorithm presented in this paper is used to learn the internal state/action mapping of a behavior. In this preliminary work, we demonstrate its feasibility with simulated experiments using the underwater robot GARBI in a target-reaching task.
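The sketch below makes the contrast with value-function methods concrete. It shows direct policy search in its simplest form: random-perturbation hill climbing over the parameters of a linear policy, with no value function anywhere. The 1-D point robot, the reward (negative distance to the target), and all parameter values are hypothetical. This gradient-free search is a stand-in and is not the policy-based algorithm used with GARBI.

```python
import random


def rollout(theta, start=0.0, target=5.0, steps=30, max_speed=1.0):
    """Deterministic episode for a hypothetical 1-D point robot.

    The policy maps the position error to a velocity through its own
    parameters theta = (gain, bias). The return is the negative sum of
    distances to the target, so higher is better.
    """
    gain, bias = theta
    pos, total = start, 0.0
    for _ in range(steps):
        error = target - pos
        velocity = max(-max_speed, min(max_speed, gain * error + bias))
        pos += velocity
        total -= abs(target - pos)
    return total


def hill_climb(iterations=200, sigma=0.5, seed=0):
    """Direct policy search: perturb the policy parameters and keep any
    perturbation that increases the return. No value function is learned."""
    rng = random.Random(seed)
    theta = (0.0, 0.0)  # initial policy: the robot does not move
    best = rollout(theta)
    for _ in range(iterations):
        candidate = tuple(p + rng.gauss(0.0, sigma) for p in theta)
        value = rollout(candidate)
        if value > best:
            theta, best = candidate, value
    return theta, best


theta, best = hill_climb()
print(rollout((0.0, 0.0)), best)
```

Only parameter changes that raise the return are kept, so the best return never decreases. A policy-gradient method would instead follow an estimate of the gradient of the expected return with respect to the same parameters.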
Abstract:
Q?Web is a software project in the field known as Technology Enhanced Learning (TEL). This area of knowledge concerns the use of information and communication technologies (ICT) to support any learning activity.
Abstract:
We propose and validate a multivariate classification algorithm for characterizing changes in human intracranial electroencephalographic (iEEG) data after learning motor sequences. The algorithm is based on a Hidden Markov Model (HMM) that captures spatio-temporal properties of the iEEG at the level of single trials. Continuous iEEG was acquired during two sessions (one before and one after a night of sleep) in two patients with depth electrodes implanted in several brain areas. The patients performed a visuomotor sequence task (serial reaction time task, SRTT) using the fingers of their non-dominant hand. Our results show that the decoding algorithm correctly classified single iEEG trials from the trained sequence as belonging to either the initial training phase (day 1, before sleep) or a later consolidated phase (day 2, after sleep). It failed to do so for trials from a control condition (pseudo-random sequence). Accurate single-trial classification was achieved by exploiting the distributed pattern of neural activity. However, across all contacts, the hippocampus contributed most to classification accuracy in both patients, as did one fronto-striatal contact in one patient. Together, these human intracranial findings demonstrate that a multivariate decoding approach can detect learning-related changes in single-trial iEEG. The approach allows an unbiased identification of the brain sites that contribute to a behavioral effect (or experimental condition) in a single subject. It could therefore usefully be applied to assess the neural correlates of other complex cognitive functions in patients implanted with multiple electrodes.
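The decoding principle can be sketched as likelihood-based classification: fit one HMM per condition, then assign a single trial to the condition whose model explains it best. The sketch assumes discrete (symbolized) observations and hand-set, hypothetical two-state models. It computes log-likelihoods with the scaled forward algorithm. It does not reproduce the paper's modelling of continuous multichannel iEEG or how its HMMs were trained.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete-emission HMM.

    start[i]    : P(state_0 = i)
    trans[i][j] : P(state_{t+1} = j | state_t = i)
    emit[i][o]  : P(observation o | state i)
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)          # P(o_t | o_0 .. o_{t-1})
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
    return log_lik


def classify(obs, models):
    """Assign the trial to the label whose HMM gives the highest log-likelihood."""
    return max(models, key=lambda label: forward_log_likelihood(obs, *models[label]))


# Hypothetical two-state models sharing emissions but differing in dynamics.
EMIT = [[0.9, 0.1], [0.1, 0.9]]
MODELS = {
    "class_A": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], EMIT),  # persistent states
    "class_B": ([0.5, 0.5], [[0.1, 0.9], [0.9, 0.1]], EMIT),  # rapidly switching states
}
print(classify([0, 0, 0, 0, 1, 1, 1, 1], MODELS))
```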
Abstract:
In this paper we study the relevance of multiple kernel learning (MKL) for the automatic selection of time series inputs. MKL has recently attracted considerable attention in the machine learning community because of its flexibility in modelling complex patterns and performing feature selection. In general, MKL constructs the kernel as a weighted linear combination of basis kernels, exploiting different sources of information. The analysis uses SimpleMKL, an efficient algorithm that optimizes the MKL weights by wrapping a Support Vector Regression model. MKL thus performs feature selection by discarding inputs/kernels with low or zero weights. The proposed approach is tested on simulated linear and nonlinear time series (AutoRegressive, Hénon and Lorenz series).
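For time series inputs, the kernel construction described above can be sketched as one Gaussian basis kernel per lagged value. The kernels are combined with non-negative weights that sum to one, the constraint SimpleMKL imposes. The bandwidth and the weights here are hypothetical. SimpleMKL would learn the weights by wrapping an SVR solver, which this sketch does not attempt.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Basis kernel on one scalar input (here: one lagged value of the series)."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def combined_kernel(x, z, weights, bandwidth=1.0):
    """K(x, z) = sum_m d_m * k_m(x_m, z_m): one basis kernel per input m.

    As in SimpleMKL, the weights d_m must be non-negative and sum to one.
    """
    if any(d < 0 for d in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("kernel weights must be non-negative and sum to 1")
    return sum(d * gaussian_kernel(xm, zm, bandwidth)
               for d, xm, zm in zip(weights, x, z))


def lagged_inputs(series, n_lags):
    """Embed a series: row for time t is (y[t-1], ..., y[t-n_lags]); target is y[t]."""
    X = [[series[t - k] for k in range(1, n_lags + 1)]
         for t in range(n_lags, len(series))]
    return X, series[n_lags:]


def selected_inputs(weights, tol=1e-6):
    """Feature selection: keep the lags whose kernel weight is not (near) zero."""
    return [m + 1 for m, d in enumerate(weights) if d > tol]  # 1-based lag numbers


weights = [0.5, 0.5, 0.0]  # hypothetical learned weights for lags 1, 2, 3
print(selected_inputs(weights))
```

A lag with zero weight drops out of the combined kernel entirely. This is how MKL turns weight learning into input selection.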
Abstract:
In recent years, work on the potential of type-2 fuzzy sets for managing high levels of uncertainty, whether in the subjective knowledge of experts or in numerical information, has focused on control and pattern classification systems. One of the main challenges in designing a type-2 fuzzy logic system is estimating the parameters of the type-2 fuzzy membership function (T2MF) and its Footprint of Uncertainty (FOU) from imperfect and noisy datasets. This paper presents an automatic approach for learning and tuning Gaussian interval type-2 membership functions (IT2MFs), with application to multi-dimensional pattern classification problems. The T2MFs and their FOUs are tuned to the uncertainties in the training dataset by combining a genetic algorithm (GA) with cross-validation. In our GA-based approach, the chromosome has fewer genes than in other GA methods, and chromosome initialization is more precise. The approach is applied to an interval type-2 fuzzy logic system (IT2FLS) for nodule classification in a lung Computer Aided Detection (CAD) system. The designed IT2FLS is compared with its type-1 fuzzy logic system (T1FLS) counterpart. The results demonstrate that the IT2FLS outperforms the T1FLS by more than 30% in classification accuracy.
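A common way to build a Gaussian IT2MF is a Gaussian with fixed standard deviation whose mean is only known to lie in an interval [m1, m2]. Its upper and lower membership functions bound the FOU. The sketch below assumes this uncertain-mean parameterization; the paper may instead use, for example, an uncertain standard deviation. It omits the GA and cross-validation tuning of m1, m2 and sigma.

```python
import math


def gaussian(x, mean, sigma):
    """Type-1 Gaussian membership function with unit height."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)


def it2_gaussian_uncertain_mean(x, m1, m2, sigma):
    """Return (lower, upper) membership of x for a Gaussian IT2MF whose mean
    lies somewhere in [m1, m2]; the band between the two is the FOU."""
    if m1 > m2:
        raise ValueError("require m1 <= m2")
    # Upper MF: the closest admissible Gaussian, with a flat top on [m1, m2].
    if x < m1:
        upper = gaussian(x, m1, sigma)
    elif x > m2:
        upper = gaussian(x, m2, sigma)
    else:
        upper = 1.0
    # Lower MF: the farthest admissible Gaussian (switches at the midpoint).
    if x <= (m1 + m2) / 2.0:
        lower = gaussian(x, m2, sigma)
    else:
        lower = gaussian(x, m1, sigma)
    return lower, upper


print(it2_gaussian_uncertain_mean(-1.0, 0.0, 1.0, 1.0))
```

Setting m1 = m2 collapses the FOU so that lower and upper coincide, recovering an ordinary type-1 Gaussian.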
Abstract:
In this paper we present a novel approach to assigning roles to robots in a team of heterogeneous physical robots. Team members compete for these roles and receive rewards for them. The rewards are used to determine each agent's preferences and which agents are better adapted to the environment, and both aspects are included in the decision-making process. Agent interactions are modelled using the concept of an ecosystem in which each robot is a species, resulting in emergent behaviour of the whole set of agents. One of the most important features of this approach is its high adaptability: unlike some other learning techniques, it does not need to restart the whole exploitation process when the environment changes. All this is illustrated by experiments run on a simulator. In addition, the algorithm was applied to several teams of robots in order to analyse the impact of heterogeneity on these systems.
Abstract:
The research considers the problem of spatial data classification using two machine learning algorithms: probabilistic neural networks (PNN) and support vector machines (SVM). A simple k-nearest neighbor algorithm is used as a benchmark model. PNN is a neural network reformulation of well-known nonparametric principles of probability density modeling. It uses a kernel density estimator together with Bayes-optimal or maximum a posteriori decision rules. PNN is well suited to problems where not only predictions but also quantification of accuracy and integration of prior information are necessary. An important property of PNNs is that they can easily be used in decision support systems dealing with automatic classification problems. The support vector machine implements the principles of statistical learning theory for classification tasks. SVMs have recently been applied successfully to various environmental topics: classification of soil types and hydro-geological units, optimization of monitoring networks, and susceptibility mapping of natural hazards. In the present paper, both simulated and real data case studies (low and high dimensional) are considered. Particular attention is paid to how the applied algorithms detect and learn spatial patterns.
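The PNN decision rule described above amounts to a Parzen (Gaussian-kernel) density estimate per class combined with the maximum a posteriori rule. Below is a minimal sketch. It assumes an isotropic Gaussian kernel with a single hypothetical smoothing width sigma, and class priors taken from class frequencies unless supplied. The normalized scores are the posterior probabilities that make a PNN useful for quantifying confidence.

```python
import math


def class_scores(x, training, sigma=1.0, priors=None):
    """prior(c) * Parzen estimate of p(x | c) for each class c.

    training maps each class label to its list of training points (tuples).
    The Gaussian normalizing constant is the same for every class (same sigma
    and dimension), so it is dropped; it does not change the decision.
    """
    n_total = sum(len(points) for points in training.values())
    scores = {}
    for label, points in training.items():
        prior = priors[label] if priors else len(points) / n_total
        density = sum(
            math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)) / (2.0 * sigma ** 2))
            for p in points
        ) / len(points)
        scores[label] = prior * density
    return scores


def pnn_classify(x, training, sigma=1.0, priors=None):
    """Maximum a posteriori decision: the class with the largest score."""
    scores = class_scores(x, training, sigma, priors)
    return max(scores, key=scores.get)


def posterior(x, training, sigma=1.0, priors=None):
    """Scores normalized to posterior probabilities (a confidence estimate).

    Raises ZeroDivisionError if x is so far from every training point that
    all kernel values underflow to zero.
    """
    scores = class_scores(x, training, sigma, priors)
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


# Hypothetical 2-D training data for two classes.
TRAINING = {"A": [(0.0, 0.0), (0.0, 1.0)], "B": [(5.0, 5.0), (5.0, 6.0)]}
print(pnn_classify((0.2, 0.3), TRAINING), posterior((2.5, 3.0), TRAINING))
```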
Abstract:
This paper presents multiple kernel learning (MKL) regression as an exploratory spatial data analysis and modelling tool. The MKL approach is introduced as an extension of support vector regression, where MKL uses dedicated kernels to divide a given task into sub-problems and treat them separately in an effective way. It provides better interpretability to non-linear robust kernel regression at the cost of a more complex numerical optimization. In particular, we investigate the use of MKL as a tool that allows us to avoid using ad-hoc topographic indices as covariables in statistical models in complex terrains. Instead, MKL learns these relationships from the data in a non-parametric fashion. A study on data simulated from real terrain features confirms the ability of MKL to enhance the interpretability of data-driven models and to aid feature selection without degrading predictive performance. We also examine the stability of the MKL algorithm with respect to the number of training data samples and to the presence of noise. The results of a real case study are also presented, in which MKL exploits a large set of terrain features computed at multiple spatial scales to predict mean wind speed in an Alpine region.
Abstract:
Context: Typing ovarian tumors (OT) is a competency expected of pathologists, with significant clinical implications. However, OT come in numerous types, some of them rare, so some departments offer few opportunities for practice. Aim: Our aim was to design a tool that pathologists can use to train in typing less common OT. Method and Results: Representative slides of 20 less common OT were scanned (Nano Zoomer Digital Hamamatsu®). The diagnostic algorithm proposed by Young and Scully (Young RH and Scully RE, Seminars in Diagnostic Pathology 2001, 18: 161-235) was applied to each case, including: recognition of morphological pattern(s); shortlisting of differential diagnoses; and proposal of relevant immunohistochemical markers. The next steps of this project will be: evaluation of the tool in several post-graduate training centers in Europe and Québec; improvement of its design based on the evaluation results; and dissemination to a wider audience. Discussion: In clinical medicine, working through many cases is recognized as essential for a novice to become an expert. This project relies on virtual slide technology to provide pathologists with a learning tool aimed at increasing their skills in OT typing. After due evaluation, this model might be extended to other uncommon tumors.
Abstract:
Simulators have traditionally been used extensively in robotics to develop robotic systems without the need to build expensive hardware. However, simulators can also be used as a “memory” for a robot, allowing the robot to try out actions in simulation before executing them for real. The key obstacle to this approach is uncertain knowledge about the environment. The goal of this Master’s Thesis was to develop a method for updating the simulation model based on actual measurements, so that the planned task succeeds. OpenRAVE was chosen as the experimental simulation environment for the planning, trial and update stages. The Steepest Descent algorithm, combined with a Golden Section search procedure, forms the principal part of the optimization process. In the experiments, the properties of the proposed method were examined, such as its sensitivity to different parameters (including the gradient and the error function). The limitations of the approach were established by analyzing its regions of convergence.
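The optimizer named above can be sketched generically. Steepest descent picks the direction -grad E(x), and a golden-section search picks the step length along it. The quadratic error function, the search bracket [0, max_step] and the tolerances are assumptions for illustration. The thesis's error function for updating the OpenRAVE model is not reproduced.

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # 1/phi, about 0.618


def golden_section(f, a, b, tol=1e-8):
    """Minimize a unimodal 1-D function f on [a, b] by golden-section search."""
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:  # minimum lies in [a, d]; old c becomes the new d
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:  # minimum lies in [c, b]; old d becomes the new c
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def steepest_descent(f, grad, x0, max_step=1.0, tol=1e-6, max_iter=500):
    """Step along -grad(x), with the step length chosen by golden-section search."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        if math.sqrt(sum(gi * gi for gi in g)) < tol:
            break  # gradient is (numerically) zero: stationary point reached
        step = golden_section(
            lambda s, x=x, g=g: f([xi - s * gi for xi, gi in zip(x, g)]), 0.0, max_step
        )
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def error(x):
    """Hypothetical error function with its minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2


def error_grad(x):
    return [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]


print(steepest_descent(error, error_grad, [0.0, 0.0]))
```

Golden-section search needs only function values and assumes the error is unimodal along each search line. If that assumption fails, the line search can return a poor step, which is one way convergence regions become limited.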
Abstract:
Machine learning provides tools for the automated construction of predictive models in data-intensive areas of engineering and science. The family of regularized kernel methods has in recent years become one of the mainstream approaches to machine learning, owing to a number of advantages these methods share. The approach provides theoretically well-founded solutions to the problems of under- and overfitting, allows learning from structured data, and has been empirically demonstrated to yield high predictive performance across a wide range of application domains. Historically, the problems of classification and regression have received most of the attention in the field. In this thesis we focus on another type of learning problem, that of learning to rank. In learning to rank, the aim is to learn, from a set of past observations, a ranking function that can order new objects according to how well they match some underlying criterion of goodness. An important special case of this setting is the bipartite ranking problem, which corresponds to maximizing the area under the ROC curve (AUC) in binary classification. Ranking applications appear in a wide variety of settings; examples encountered in this thesis include document retrieval in web search, recommender systems, information extraction and automated parsing of natural language. We consider the pairwise approach to learning to rank, where ranking models are learned by minimizing the expected probability of ranking any two randomly drawn test examples incorrectly. Developing computationally efficient kernel methods based on this approach has proven challenging in the past. Moreover, it is not clear which techniques for estimating the predictive performance of learned models are the most reliable in the ranking setting, or how these techniques can be implemented efficiently. The contributions of this thesis are as follows.
First, we develop RankRLS, a computationally efficient kernel method for learning to rank based on minimizing a regularized pairwise least-squares loss. In addition to training methods, we introduce a variety of algorithms for tasks such as model selection, multi-output learning, and cross-validation, based on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm, one of the most well-established methods for learning to rank. Third, we study the combination of the empirical kernel map and reduced set approximation, which allows large-scale training of kernel machines using linear solvers, and propose computationally efficient solutions to cross-validation when using this approach. Next, through an extensive simulation study, we explore the problem of reliable cross-validation when AUC is used as the performance criterion. We demonstrate that the proposed leave-pair-out cross-validation approach yields more reliable performance estimates than commonly used alternatives. Finally, we present a case study on applying machine learning to information extraction from biomedical literature, which combines several of the approaches considered in the thesis. The thesis is divided into two parts: Part I provides the background for the research work and summarizes the most central results, and Part II consists of the five original research articles that are the main contribution of this thesis.
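The pairwise least-squares idea can be made concrete with a linear model f(x) = w·x. Summing ((y_i - y_j) - (f(x_i) - f(x_j)))^2 over all pairs (i, j) equals (y - Xw)^T L (y - Xw), where L = nI - 11^T. The regularized minimizer therefore solves (X^T L X + λI) w = X^T L y. The sketch below implements this all-pairs linear variant in plain Python and evaluates it with the pairwise AUC. It rests on these assumptions (all pairs, linear kernel, a hypothetical λ and toy data). It is not the thesis's RankRLS implementation, which also covers kernels and fast cross-validation shortcuts.

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def rankrls_linear(X, y, lam=1.0):
    """Minimize sum over pairs ((y_i - y_j) - (f_i - f_j))^2 + lam * ||w||^2."""
    n, d = len(X), len(X[0])

    def apply_L(v):  # L = n I - 1 1^T, so (L v)_i = n * v_i - sum(v)
        s = sum(v)
        return [n * vi - s for vi in v]

    columns = [[row[k] for row in X] for k in range(d)]
    L_cols = [apply_L(col) for col in columns]  # columns of L X
    A = [[sum(columns[a][i] * L_cols[b][i] for i in range(n)) + (lam if a == b else 0.0)
          for b in range(d)] for a in range(d)]
    Ly = apply_L(y)
    rhs = [sum(columns[a][i] * Ly[i] for i in range(n)) for a in range(d)]
    return solve(A, rhs)


def auc(scores, labels):
    """Bipartite ranking AUC: share of (positive, negative) pairs ordered
    correctly, with ties counted as one half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    total = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return total / (len(pos) * len(neg))


# Hypothetical toy data whose utility is y = 2 * x[0].
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
y = [0.0, 2.0, 4.0, 6.0]
w = rankrls_linear(X, y, lam=1e-9)
scores = [sum(wi * xi for wi, xi in zip(w, row)) for row in X]
print(w, auc(scores, [0, 0, 1, 1]))
```

The matrix L annihilates constant vectors, so any constant offset in y or in f cancels out. This matches the fact that only score differences matter for ranking.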
Abstract:
We studied the effects of chronic intoxication with the heavy metals lead (Pb2+) and zinc (Zn2+) on memory formation in mice. Animals were intoxicated through drinking water during the pre- and postnatal periods and then tested in the step-through inhibitory avoidance memory task. Chronic postnatal intoxication with Pb2+ did not change the step-through latency values recorded during the 4 weeks of the test (ANOVA, P>0.05). In contrast, mice intoxicated during the prenatal period showed significantly reduced latency values compared to the control group (day 1: q = 4.62, P<0.05; day 7: q = 4.42, P<0.05; day 14: q = 5.65, P<0.05; day 21: q = 3.96, P<0.05; and day 28: q = 6.09, P<0.05). Chronic postnatal intoxication with Zn2+ did not alter performance in a memory retention test performed 24 h after training. However, we noticed a gradual decrease in latency at subsequent 4-week intervals (F = 3.07, P<0.05), an effect that was not observed in the control or in the Pb2+-treated groups. These results suggest an impairment of memory formation by Pb2+ when the animals are exposed during the critical period of neurogenesis, while Zn2+ appears to facilitate learning extinction.
Abstract:
The loss of brain volume has been used as a marker of tissue destruction and can serve as an index of the progression of neurodegenerative diseases such as multiple sclerosis. In the present study, we tested a new tissue segmentation method based on pixel-intensity thresholding. It uses generalized Tsallis entropy to determine a statistical segmentation parameter for each class of brain tissue. We compared the performance of this method over a range of q parameters and found a different optimal q parameter for white matter, gray matter, and cerebrospinal fluid. Our results support the conclusion that generalized Tsallis entropy can capture the differences in structural correlations and scale-invariant similarities present in each tissue class, yielding the intensity limits that separate these classes. To test this method, we used it to analyze brain magnetic resonance images of 43 patients and 10 healthy controls matched for gender and age. The values found for the entropic q index were 0.2 for cerebrospinal fluid, 0.1 for white matter and 1.5 for gray matter. With this algorithm, we detected an annual loss of 0.98% in the patients, in agreement with literature data. Thus, we conclude that Tsallis entropy offers advantages for the automatic segmentation of tissue classes, which had not been demonstrated previously.
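The thresholding step can be sketched with the standard two-class Tsallis-entropy criterion. For each candidate gray level t, compute the Tsallis entropies S_A and S_B of the normalized histograms below and above t, then maximize S_A + S_B + (1 - q) S_A S_B. The abstract does not give the exact formulation, so this criterion, the toy histogram, and the single two-class split are assumptions. The paper tunes a separate q per tissue class.

```python
import math


def tsallis(probs, q):
    """Tsallis entropy S_q = (1 - sum p^q) / (q - 1); Shannon entropy when q == 1."""
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)


def tsallis_threshold(hist, q):
    """Gray level t maximizing the Tsallis criterion for splitting a histogram
    into a lower class [0, t] and an upper class [t+1, L-1]."""
    total = float(sum(hist))
    p = [h / total for h in hist]
    best_t, best_s = None, -math.inf
    for t in range(len(p) - 1):
        pa, pb = sum(p[: t + 1]), sum(p[t + 1 :])
        if pa == 0.0 or pb == 0.0:
            continue  # one class would be empty
        sa = tsallis([x / pa for x in p[: t + 1]], q)
        sb = tsallis([x / pb for x in p[t + 1 :]], q)
        # Pseudo-additive combination of the two class entropies.
        s = sa + sb + (1.0 - q) * sa * sb
        if s > best_s:
            best_t, best_s = t, s
    return best_t


# Hypothetical bimodal intensity histogram (11 gray levels).
hist = [0, 10, 20, 10, 0, 0, 0, 10, 20, 10, 0]
print(tsallis_threshold(hist, 0.5))  # prints 3: the first level of the empty valley
```

With q = 1 the combination term vanishes and the criterion reduces to the Shannon-entropy (Kapur) rule. Varying q changes how strongly the criterion weights rare versus frequent intensities, which is why a class-specific q can matter.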
Abstract:
People who suffer traumatic brain injury (TBI) often experience cognitive deficits in spatial reference and working memory. The possible roles of cyclooxygenase-1 (COX-1) in learning and memory impairment in mice with TBI are poorly understood. Adult mice subjected to TBI were treated with the COX-1-selective inhibitor SC560. Performance in the open field and on the beam walk was then used to assess motor and behavioral function 1, 3, 7, 14, and 21 days after injury. Acquisition of spatial learning and memory retention were assessed using the Morris water maze on day 15 post-TBI. Expression of COX-1, prostaglandin E2 (PGE2), interleukin (IL)-6, brain-derived neurotrophic factor (BDNF), platelet-derived growth factor BB (PDGF-BB), synapsin-I, and synaptophysin was measured in TBI mice. Administration of SC560 improved performance in beam walk tasks as well as spatial learning and memory after TBI. SC560 also reduced expression of the inflammatory markers IL-6 and PGE2. It also reversed the expression changes of COX-1, BDNF, PDGF-BB, synapsin-I, and synaptophysin in TBI mice. These findings suggest that COX-1 might play an important role in cognitive deficits after TBI and that selective COX-1 inhibition should be further investigated as a potential therapeutic approach for TBI.