14 results for reinforcement learning, cryptography, machine learning, deep learning, Deep Q-Learning (DQN), AES
in CentAUR: Central Archive University of Reading - UK
Abstract:
Over many years, researchers at the University of Reading have developed simple mobile robots that explore an environment they perceive through simple ultrasonic sensors. Information from these sensors has allowed the robots to learn the simple task of moving around while avoiding dynamic obstacles using a static set of fuzzy automata, the choice of which has been criticised for its arbitrary nature. This paper considers how a dynamic set of automata can overcome this criticism. In addition, a new reinforcement learning function is outlined which is scalable to different numbers and types of sensors. The innovations compare successfully with earlier work.
Abstract:
We examined the maturation of decision-making from early adolescence to mid-adulthood using fMRI of a variant of the Iowa gambling task. We have previously shown that performance in this task relies on sensitivity to accumulating negative outcomes in ventromedial PFC and dorsolateral PFC. Here, we further formalize outcome evaluation (as driven by prediction errors [PE], using a reinforcement learning model) and examine its development. Task performance improved significantly during adolescence, stabilizing in adulthood. Performance relied on greater impact of negative compared with positive PEs, the relative impact of which matured from adolescence into adulthood. Adolescents also showed increased exploratory behavior, expressed as a propensity to shift responding between options independently of outcome quality, whereas adults showed no systematic shifting patterns. The correlation between PE representation and improved performance strengthened with age for activation in ventral and dorsal PFC, ventral striatum, and temporal and parietal cortices. There was a medial-lateral distinction in the prefrontal substrates of effective PE utilization between adults and adolescents: Increased utilization of negative PEs, a hallmark of successful performance in the task, was associated with increased activation in ventromedial PFC in adults, but decreased activation in ventrolateral PFC and striatum in adolescents. These results suggest that adults and adolescents engage qualitatively distinct neural and psychological processes during decision-making, the development of which is not exclusively dependent on reward-processing maturation.
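The abstract does not give the equations of its reinforcement learning model. As a rough illustration only, the idea that negative prediction errors can carry more weight than positive ones is often expressed with a delta rule that uses two learning rates. The function name `update_value` and the parameter values `alpha_pos` and `alpha_neg` below are assumptions made for this sketch, not values from the study.

```python
def update_value(value, outcome, alpha_pos=0.1, alpha_neg=0.3):
    """One delta-rule update with separate learning rates.

    alpha_pos and alpha_neg are hypothetical values; the abstract
    does not report the parameters of the fitted model.
    """
    # Prediction error: how much better or worse the outcome was than expected.
    pe = outcome - value
    # Negative prediction errors are weighted more heavily in this sketch.
    alpha = alpha_pos if pe >= 0 else alpha_neg
    return value + alpha * pe, pe


# A win and a loss of the same size move the estimate by different amounts.
after_win, pe_win = update_value(0.0, 1.0)
after_loss, pe_loss = update_value(0.0, -1.0)
print(after_win, after_loss)
```

With these assumed rates, a loss shifts the value estimate three times as far as a win of equal size, which is one simple way to model greater sensitivity to negative outcomes.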
Abstract:
Contrary to the widespread belief that people are positively motivated by reward incentives, some studies have shown that performance-based extrinsic reward can actually undermine a person's intrinsic motivation to engage in a task. This “undermining effect” has timely practical implications, given the burgeoning of performance-based incentive systems in contemporary society. It also presents a theoretical challenge for economic and reinforcement learning theories, which tend to assume that monetary incentives monotonically increase motivation. Despite the practical and theoretical importance of this provocative phenomenon, however, little is known about its neural basis. Herein we induced the behavioral undermining effect using a newly developed task, and we tracked its neural correlates using functional MRI. Our results show that performance-based monetary reward indeed undermines intrinsic motivation, as assessed by the number of voluntary engagements in the task. We found that activity in the anterior striatum and the prefrontal areas decreased along with this behavioral undermining effect. These findings suggest that the corticobasal ganglia valuation system underlies the undermining effect through the integration of extrinsic reward value and intrinsic task value.
Abstract:
Perirhinal cortex in monkeys has been thought to be involved in visual associative learning. The authors examined rats' ability to make associations between visual stimuli in a visual secondary reinforcement task. Rats learned 2-choice visual discriminations for secondary visual reinforcement. They showed significant learning of discriminations before any primary reinforcement. Following bilateral perirhinal cortex lesions, rats continued to learn visual discriminations for visual secondary reinforcement at the same rate as before surgery. Thus, this study does not support a critical role of perirhinal cortex in learning for visual secondary reinforcement. Contrasting this result with other positive results, the authors suggest that the role of perirhinal cortex is in "within-object" associations and that it plays a much lesser role in stimulus-stimulus associations between objects.
Abstract:
E-learning systems are becoming increasingly important in higher education. However, neither the state of the art nor the state of practice of e-learning applications achieves the level of interactivity that current learning theories advocate. In this paper, the possibility of enhancing e-learning systems to achieve deep learning has been studied by replicating an experiment in which students had to learn basic software engineering principles. One group learned these principles using a static approach, while the other group learned the same principles using a system-dynamics-based approach, which provided interactivity and feedback. The results show that, quantitatively, the latter group achieved a better understanding of the principles; furthermore, qualitatively, they enjoyed the learning experience.
Abstract:
The foundation construction process is a key element of successful construction engineering. Among the many deep excavation methods, the diaphragm wall method is used more frequently in Taiwan than anywhere else in the world. Traditionally, the sequencing of diaphragm wall unit construction activities is established phase by phase using heuristics. However, this approach creates conflicts between the final phase of engineering construction and unit construction, and it affects the planned construction time. To avoid this situation, we apply management science to diaphragm wall unit construction and formulate the sequencing as a multi-objective combinatorial optimization problem. Because the mathematical model of the problem is multi-objective and subject to combinatorial explosion (the problem is NP-complete), we propose using a 2-type Self-Learning Neural Network (SLNN) to solve the sequencing problem for N = 12, 24 and 36 diaphragm wall units. To assess the reliability of the results, the SLNN is compared with a random search method. The SLNN is found to be superior to random search in both solution quality and solving efficiency.
Abstract:
Background. In separate studies and research from different perspectives, five factors are found to be among those related to higher quality outcomes of student learning (academic achievement). Those factors are higher self-efficacy, deeper approaches to learning, higher quality teaching, students’ perceptions that their workload is appropriate, and greater learning motivation. University learning improvement strategies have been built on these research results. Aim. To investigate how students’ evoked prior experience, perceptions of their learning environment, and their approaches to learning collectively contribute to academic achievement. This is the first study to investigate motivation and self-efficacy in the same educational context as conceptions of learning, approaches to learning and perceptions of the learning environment. Sample. Undergraduate students (773) from the full range of disciplines were part of a group of over 2,300 students who volunteered to complete a survey of their learning experience. On completing their degrees 6 and 18 months later, their academic achievement was matched with their learning experience survey data. Method. A 77-item questionnaire was used to gather students’ self-report of their evoked prior experience (self-efficacy, learning motivation, and conceptions of learning), perceptions of learning context (teaching quality and appropriate workload), and approaches to learning (deep and surface). Academic achievement was measured using the English honours degree classification system. Analyses were conducted using correlational and multi-variable (structural equation modelling) methods. Results. The results from the correlation methods confirmed those found in numerous earlier studies. The results from the multi-variable analyses indicated that surface approach to learning was the strongest predictor of academic achievement, with self-efficacy and motivation also found to be directly related. 
In contrast to the correlation results, a deep approach to learning was not related to academic achievement, and teaching quality and conceptions of learning were only indirectly related to achievement. Conclusions. Research aimed at understanding how students experience their learning environment and how that experience relates to the quality of their learning needs to be conducted using a wider range of variables and more sophisticated analytical methods. In this study of one context, some of the relations found in earlier bivariate studies, and on which learning intervention strategies have been built, are not confirmed when more holistic teaching–learning contexts are analysed using multi-variable methods.
Abstract:
This paper presents a novel approach to the automatic classification of very large data sets composed of terahertz pulse transient signals, highlighting their potential use in biochemical, biomedical, pharmaceutical and security applications. Two different types of THz spectra are considered in the classification process. Firstly, a binary classification study of poly-A and poly-C ribonucleic acid samples is performed. This is then contrasted with a difficult multi-class classification problem of spectra from six different powder samples that, although they have fairly indistinguishable features in the optical spectrum, possess a few discernible spectral features in the terahertz part of the spectrum. Classification is performed using a complex-valued extreme learning machine algorithm that takes into account features in both the amplitude and the phase of the recorded spectra. Classification speed and accuracy are contrasted with those achieved using a support vector machine classifier. The study systematically compares the classifier performance achieved after adopting different Gaussian kernels when separating amplitude and phase signatures. The two signatures are presented as feature vectors for both training and testing purposes. The study confirms the utility of complex-valued extreme learning machine algorithms for classification of the very large data sets generated with current terahertz imaging spectrometers. The classifier can take into consideration heterogeneous layers within an object, as would be required within a tomographic setting, and is sufficiently robust to detect patterns hidden inside noisy terahertz data sets. The proposed study opens up the opportunity for the establishment of complex-valued extreme learning machine algorithms as new chemometric tools that will assist the wider proliferation of terahertz sensing technology for chemical sensing, quality control, security screening and clinical diagnosis.
Furthermore, the proposed algorithm should also be very useful in other applications requiring the classification of very large datasets.
Abstract:
We extend extreme learning machine (ELM) classifiers to complex Reproducing Kernel Hilbert Spaces (RKHS) where the input/output variables as well as the optimization variables are complex-valued. A new family of classifiers, called complex-valued ELM (CELM), suitable for complex-valued multiple-input–multiple-output processing is introduced. In the proposed method, the associated Lagrangian is computed using induced RKHS kernels, adopting a Wirtinger calculus approach formulated as a constrained optimization problem similarly to the conventional ELM classifier formulation. When training the CELM, the Karush–Kuhn–Tucker (KKT) theorem is used to solve the dual optimization problem, which consists of simultaneously satisfying the smallest training error and the smallest norm of output weights criteria. The proposed formulation also addresses aspects of quaternary classification within a Clifford algebra context. For 2D complex-valued inputs, user-defined complex-coupled hyper-planes divide the classifier input space into four partitions. For 3D complex-valued inputs, the formulation generates three pairs of complex-coupled hyper-planes through orthogonal projections. The six hyper-planes then divide the 3D space into eight partitions. It is shown that the CELM problem formulation is equivalent to solving six real-valued ELM tasks, which are induced by projecting the chosen complex kernel across the different user-defined coordinate planes. A classification example of powdered samples on the basis of their terahertz spectral signatures is used to demonstrate the advantages of the CELM classifiers compared to their SVM counterparts. The proposed classifiers retain the advantages of their ELM counterparts, in that they can perform multiclass classification with lower computational complexity than SVM classifiers. Furthermore, because of their ability to perform classification tasks fast, the proposed formulations are of interest to real-time applications.
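For orientation, the conventional real-valued ELM formulation that the abstract refers to is commonly written as the constrained problem below. The abstract itself gives no equations, so every symbol here is an assumption introduced for illustration: \beta for the output weights, h(x_i) for the hidden-layer feature row of sample x_i, t_i for its target, \xi_i for the training error, and C for the regularization constant.

```latex
\min_{\boldsymbol{\beta},\,\boldsymbol{\xi}_i}\;
  \frac{1}{2}\,\lVert \boldsymbol{\beta} \rVert^{2}
  + \frac{C}{2} \sum_{i=1}^{N} \lVert \boldsymbol{\xi}_i \rVert^{2}
\quad \text{subject to} \quad
  \mathbf{h}(\mathbf{x}_i)\,\boldsymbol{\beta}
  = \mathbf{t}_i^{\mathsf{T}} - \boldsymbol{\xi}_i^{\mathsf{T}},
\qquad i = 1, \dots, N .
```

Applying the KKT conditions to this real-valued problem yields the closed-form solution \beta = H^T (I/C + H H^T)^{-1} T, where H stacks the rows h(x_i) and T stacks the targets. The CELM described in the abstract generalizes this setting to complex-valued variables through Wirtinger calculus; its exact form is not recoverable from the abstract.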
Abstract:
This paper presents an enhanced hypothesis verification strategy for 3D object recognition. A new learning methodology is presented which integrates the traditional dichotomic object-centred and appearance-based representations in computer vision giving improved hypothesis verification under iconic matching. The "appearance" of a 3D object is learnt using an eigenspace representation obtained as it is tracked through a scene. The feature representation implicitly models the background and the objects observed enabling the segmentation of the objects from the background. The method is shown to enhance model-based tracking, particularly in the presence of clutter and occlusion, and to provide a basis for identification. The unified approach is discussed in the context of the traffic surveillance domain. The approach is demonstrated on real-world image sequences and compared to previous (edge-based) iconic evaluation techniques.
Abstract:
The authors consider the problem of a robot manipulator operating in a noisy workspace. The manipulator is required to move from an initial position P(i) to a final position P(f). P(i) is assumed to be completely defined. However, P(f) is obtained by a sensing operation and is assumed to be fixed but unknown. The authors' approach to this problem involves the use of three learning algorithms: the discretized linear reward-penalty (DLR-P) automaton, the linear reward-penalty (LR-P) automaton and a nonlinear reinforcement scheme. An automaton is placed at each joint of the robot and, acting as a decision maker, plans the trajectory based on noisy measurements of P(f).
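The linear reward-penalty scheme named above can be sketched with the textbook update for a learning automaton with a finite set of actions. This is a minimal illustration, not the authors' implementation: the function names, the learning rate and the two-action environment with fixed reward probabilities are all hypothetical, and the robot-joint setting is not modelled.

```python
import random


def lrp_update(probs, action, rewarded, rate=0.1):
    """Textbook linear reward-penalty update (equal reward and penalty rates)."""
    r = len(probs)
    new_probs = []
    for j, p in enumerate(probs):
        if rewarded:
            # Reward: move probability mass towards the chosen action.
            new_probs.append(p + rate * (1 - p) if j == action else (1 - rate) * p)
        else:
            # Penalty: shrink the chosen action, spread the mass over the others.
            new_probs.append((1 - rate) * p if j == action
                             else rate / (r - 1) + (1 - rate) * p)
    return new_probs


def run_automaton(reward_probs, steps=2000, rate=0.05, seed=0):
    """Run one automaton in a hypothetical stationary random environment."""
    rng = random.Random(seed)
    probs = [1.0 / len(reward_probs)] * len(reward_probs)
    for _ in range(steps):
        action = rng.choices(range(len(probs)), weights=probs)[0]
        rewarded = rng.random() < reward_probs[action]
        probs = lrp_update(probs, action, rewarded, rate)
    return probs


# Assumed environment: action 0 is rewarded far more often than action 1.
print(run_automaton([0.9, 0.1]))
```

Both branches of the update keep the action probabilities summing to one. Because penalties always push mass back to the other actions, this scheme keeps exploring rather than locking onto a single action, which can be useful when measurements are noisy.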
Abstract:
The ability to change an established stimulus–behavior association based on feedback is critical for adaptive social behaviors. This ability has been examined in reversal learning tasks, where participants first learn a stimulus–response association (e.g., select a particular object to get a reward) and then need to alter their response when reinforcement contingencies change. Although substantial evidence demonstrates that the OFC is a critical region for reversal learning, previous studies have not distinguished reversal learning for emotional associations from neutral associations. The current study examined whether OFC plays similar roles in emotional versus neutral reversal learning. The OFC showed greater activity during reversals of stimulus–outcome associations for negative outcomes than for neutral outcomes. Similar OFC activity was also observed during reversals involving positive outcomes. Furthermore, OFC activity is more inversely correlated with amygdala activity during negative reversals than during neutral reversals. Overall, our results indicate that the OFC is more activated by emotional than neutral reversal learning and that OFC's interactions with the amygdala are greater for negative than neutral reversal learning.