957 results for Expected learning
Abstract:
This paper investigates a method of automatic pronunciation scoring for use in computer-assisted language learning (CALL) systems. The method utilizes a likelihood-based 'Goodness of Pronunciation' (GOP) measure, which is extended to include individual thresholds for each phone based on both averaged native confidence scores and on rejection statistics provided by human judges. Further improvements are obtained by incorporating models of the subject's native language and by augmenting the recognition networks to include expected pronunciation errors. The various GOP measures are assessed using a specially recorded database of non-native speakers which has been annotated to mark phone-level pronunciation errors. Since pronunciation assessment is highly subjective, a set of four performance measures has been designed, each of them measuring different aspects of how well computer-derived phone-level scores agree with human scores. These performance measures are used to cross-validate the reference annotations and to assess the basic GOP algorithm and its refinements. The experimental results suggest that a likelihood-based pronunciation scoring metric can achieve usable performance, especially after applying the various enhancements.
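The abstract above does not state the GOP formula, so the following is only a minimal sketch of one commonly cited likelihood-ratio form: the duration-normalised gap between the log-likelihood of the expected phone and that of the best-scoring phone, compared against a per-phone threshold. The function names, the threshold table and the fallback threshold are hypothetical and chosen purely for illustration.

```python
def gop_score(log_lik_target, log_lik_all_phones, num_frames):
    """Duration-normalised log-likelihood ratio for one phone segment.

    log_lik_target:     acoustic log-likelihood of the expected phone
    log_lik_all_phones: log-likelihoods of every candidate phone
                        (assumed to include the expected phone itself)
    num_frames:         segment length in frames, used for normalisation
    """
    best = max(log_lik_all_phones)
    return abs(log_lik_target - best) / num_frames


def is_rejected(phone, score, thresholds, default_threshold=1.0):
    """Flag a phone as mispronounced when its score exceeds its own threshold.

    thresholds maps phone labels to individual thresholds; the default
    value is an arbitrary placeholder, not a figure from the paper.
    """
    return score > thresholds.get(phone, default_threshold)


# Hypothetical numbers: the expected phone "th" scores worse than a competitor.
score = gop_score(-70.0, [-50.0, -70.0], num_frames=10)
print(score, is_rejected("th", score, {"th": 1.5}))
```

A score of zero means the expected phone was also the most likely one; larger values indicate that some other phone fits the audio better.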
Abstract:
This article presents a novel algorithm for learning parameters in statistical dialogue systems which are modeled as Partially Observable Markov Decision Processes (POMDPs). The three main components of a POMDP dialogue manager are a dialogue model representing dialogue state information; a policy that selects the system's responses based on the inferred state; and a reward function that specifies the desired behavior of the system. Ideally both the model parameters and the policy would be designed to maximize the cumulative reward. However, while there are many techniques available for learning the optimal policy, no good ways of learning the optimal model parameters that scale to real-world dialogue systems have been found yet. The presented algorithm, called the Natural Actor and Belief Critic (NABC), is a policy gradient method that offers a solution to this problem. Based on observed rewards, the algorithm estimates the natural gradient of the expected cumulative reward. The resulting gradient is then used to adapt both the prior distribution of the dialogue model parameters and the policy parameters. In addition, the article presents a variant of the NABC algorithm, called the Natural Belief Critic (NBC), which assumes that the policy is fixed and only the model parameters need to be estimated. The algorithms are evaluated on a spoken dialogue system in the tourist information domain. The experiments show that model parameters estimated to maximize the expected cumulative reward result in significantly improved performance compared to the baseline hand-crafted model parameters. The algorithms are also compared to optimization techniques using plain gradients and state-of-the-art random search algorithms. In all cases, the algorithms based on the natural gradient work significantly better. © 2011 ACM.
Abstract:
Reinforcement techniques have been successfully used to maximise the expected cumulative reward of statistical dialogue systems. Typically, reinforcement learning is used to estimate the parameters of a dialogue policy which selects the system's responses based on the inferred dialogue state. However, the inference of the dialogue state itself depends on a dialogue model which describes the expected behaviour of a user when interacting with the system. Ideally the parameters of this dialogue model should be also optimised to maximise the expected cumulative reward. This article presents two novel reinforcement algorithms for learning the parameters of a dialogue model. First, the Natural Belief Critic algorithm is designed to optimise the model parameters while the policy is kept fixed. This algorithm is suitable, for example, in systems using a handcrafted policy, perhaps prescribed by other design considerations. Second, the Natural Actor and Belief Critic algorithm jointly optimises both the model and the policy parameters. The algorithms are evaluated on a statistical dialogue system modelled as a Partially Observable Markov Decision Process in a tourist information domain. The evaluation is performed with a user simulator and with real users. The experiments indicate that model parameters estimated to maximise the expected reward function provide improved performance compared to the baseline handcrafted parameters. © 2011 Elsevier Ltd. All rights reserved.
Abstract:
Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity.
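For orientation, the continuous-time TD error of Doya (2000) that the abstract builds on is usually written as δ(t) = r(t) − V(t)/τ + dV/dt, where τ is the discounting time constant. The sketch below is a plain scalar discretisation of that expression, not the spiking-neuron implementation described in the abstract; the function name and the finite-difference step are assumptions made for illustration.

```python
def td_error(reward, value_now, value_prev, dt, tau):
    """Discretised continuous-time TD error.

    reward:     instantaneous reward r(t)
    value_now:  critic's value estimate V(t)
    value_prev: critic's value estimate V(t - dt)
    dt:         time step used for the finite-difference derivative
    tau:        discounting time constant
    """
    value_rate = (value_now - value_prev) / dt  # backward-difference estimate of dV/dt
    return reward - value_now / tau + value_rate


# With a constant value estimate, the error is zero when the reward equals V / tau.
print(td_error(reward=1.0, value_now=2.0, value_prev=2.0, dt=0.1, tau=2.0))
```

A positive error signals that things turned out better than the critic predicted, which is the quantity the model delivers as a neuromodulatory signal to both critic and actor.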
Abstract:
Computer simulation experiments were performed to examine the effectiveness of OR- and comparative-reinforcement learning algorithms. In the simulation, human rewards were given as +1 and -1. Two models of human instruction, each determining which reward is given at every step of an instruction, were used. The results suggest that human instruction may include both model-A and model-B characteristics, and that the comparative-reinforcement learning algorithm can be expected to be more effective for learning from human instructions.
Abstract:
We present a unifying framework in which "object-independent" modes of variation are learned from continuous-time data such as video sequences. These modes of variation can be used as "generators" to produce a manifold of images of a new object from a single example of that object. We develop the framework in the context of a well-known example: analyzing the modes of spatial deformations of a scene under camera movement. Our method learns a close approximation to the standard affine deformations that are expected from the geometry of the situation, and does so in a completely unsupervised (i.e. ignorant of the geometry of the situation) fashion. We stress that it is learning a "parameterization", not just the parameter values, of the data. We then demonstrate how we have used the same framework to derive a novel data-driven model of joint color change in images due to common lighting variations. The model is superior to previous models of color change in describing non-linear color changes due to lighting.
Abstract:
This thesis examines the problem of an autonomous agent learning a causal world model of its environment. Previous approaches to learning causal world models have concentrated on environments that are too "easy" (deterministic finite state machines) or too "hard" (containing much hidden state). We describe a new domain --- environments with manifest causal structure --- for learning. In such environments the agent has an abundance of perceptions of its environment. Specifically, it perceives almost all the relevant information it needs to understand the environment. Many environments of interest have manifest causal structure and we show that an agent can learn the manifest aspects of these environments quickly using straightforward learning techniques. We present a new algorithm to learn a rule-based causal world model from observations in the environment. The learning algorithm includes (1) a low level rule-learning algorithm that converges on a good set of specific rules, (2) a concept learning algorithm that learns concepts by finding completely correlated perceptions, and (3) an algorithm that learns general rules. In addition this thesis examines the problem of finding a good expert from a sequence of experts. Each expert has an "error rate"; we wish to find an expert with a low error rate. However, each expert's error rate and the distribution of error rates are unknown. A new expert-finding algorithm is presented and an upper bound on the expected error rate of the expert is derived.
Abstract:
Recent electrophysiological data inspired the claim that dopaminergic neurons adapt their mismatch sensitivities to reflect variances of expected rewards. This contradicts reward prediction error theory and most basal ganglia models. Application of learning principles points to a testable alternative interpretation of the same data that is compatible with existing theory.
Abstract:
The percentage of subjects recalling each unit in a list or prose passage is considered as a dependent measure. When the same units are recalled in different tasks, processing is assumed to be the same; when different units are recalled, processing is assumed to be different. Two collections of memory tasks are presented, one for lists and one for prose. The relations found in these two collections are supported by an extensive reanalysis of the existing prose memory literature. The same set of words was learned by 13 different groups of subjects under 13 different conditions. Included were intentional free-recall tasks, incidental free recall following lexical decision, and incidental free recall following ratings of orthographic distinctiveness and emotionality. Although the nine free-recall tasks varied widely with regard to the amount of recall, the relative probability of recall for the words was very similar among the tasks. Imagery encoding and recognition produced relative probabilities of recall that were different from each other and from the free-recall tasks. Similar results were obtained with a prose passage. A story was learned by 13 different groups of subjects under 13 different conditions. Eight free-recall tasks, which varied with respect to incidental or intentional learning, retention interval, and the age of the subjects, produced similar relative probabilities of recall, whereas recognition and prompted recall produced relative probabilities of recall that were different from each other and from the free-recall tasks. A review of the prose literature was undertaken to test the generality of these results. Analysis of variance is the most common statistical procedure in this literature. If the relative probability of recall of units varied across conditions, a units by condition interaction would be expected. For the 12 studies that manipulated retention interval, an average of 21% of the variance was accounted for by the main effect of retention interval, 17% by the main effect of units, and only 2% by the retention interval by units interaction. Similarly, for the 12 studies that varied the age of the subjects, 6% of the variance was accounted for by the main effect of age, 32% by the main effect of units, and only 1% by the interaction of age by units. (ABSTRACT TRUNCATED AT 400 WORDS)
Abstract:
This paper tells the story of how a set of university lectures developed over the last six years. The idea is to show how (1) content, (2) communication and (3) assessment have evolved in steps named “generations of web learning”. The reader is offered a stepwise description of both the didactic foundations of university lectures and their practical implementation on a widely available web platform. The relative weight of directive elements has gradually decreased through the “three generations”, whereas self-responsibility and self-guided learning have gained in importance. (1) Content was initially presented and expected to be learned, but in later phases was expected to be constructed, for example in case studies. (2) Communication initially meant delivering assignments to the lecturer, but later meant forming teams, exchanging standpoints and reviewing each other's work. (3) Assessment initially consisted of marks set and added up by the lecturer, but was later enriched by peer review, mutual grading and voting procedures. How much “added value” can the web provide for teaching, training and learning? Six years of experience suggest: mainly insofar as new (collaborative and self-directed) didactic scenarios are implemented. (DIPF/Orig.)
Abstract:
Few research studies examine the prevalence or mental health needs of people with a Learning Disability (LD) detained in police custody. This paper describes the population of detainees with an LD who presented to an inner city inter-agency police liaison service during a three-year period. Two forensically trained Community Mental Health Nurses (CMHNs) screened all custody record forms (n=9014) for evidence of a mental health problem or LD. The CMHNs interviewed positively screened detainees (n=1089) using a battery of measures designed to assess mental health status, risk-related behaviour and alcohol or drug abuse. Almost one-in-ten of those interviewed (95/1089) were judged to have a possible or definite LD. Fifty-two per cent were cases on the General Health Questionnaire (GHQ) whilst 61% attained 'above threshold' Brief Psychiatric Rating Scale (BPRS) scores. The majority (63%) had a history of causing harm to others while 56 per cent had a history of self-harm. More than half (56%) regularly consumed harmful levels of alcohol while one-in-four (27%) reported abusing drugs. Higher than expected numbers of detainees have a learning disability and most have complex mental health needs. A police liaison service offers a way of identifying people with LD and connecting them with appropriate health and social care agencies.
Abstract:
The effect of additivity pretraining on blocking has been taken as evidence for a reasoning account of human and animal causal learning. If inferential reasoning underpins this effect, then developmental differences in the magnitude of this effect in children would be expected. Experiment 1 examined cue competition effects in children's (4- to 5-year-olds and 6- to 7-year-olds) causal learning using a new paradigm analogous to the food allergy task used in studies of human adult causal learning. Blocking was stronger in the older than the younger children, and additivity pretraining only affected blocking in the older group. Unovershadowing was not affected by age or by pretraining. In Experiment 2, levels of blocking were found to be correlated with the ability to answer questions that required children to reason about additivity. Our results support an inferential reasoning explanation of cue competition effects. © 2012 APA, all rights reserved.
Abstract:
A questionnaire was developed to investigate pharmacists' attitudes to distance learning (DL) as a vehicle for continuing education (CE). It was included in each part of a two-part DL course on Health Screening. Part One was mailed to all community pharmacists in England (16,400) and returns were received from 1487. The questionnaire in Part Two was returned by 436 pharmacists. Attitude statements were scored using a five-point Likert scale. The mean response to all attitude statements was positive. Participants were significantly more satisfied than non-participants with DL in general and with the DL course studied (P ≤ 0.05). Over 80 percent of respondents completing the course found DL to be enjoyable and more suitable than other CE methods. More females and fewer males than expected (based on registration statistics) requested (P ≤ 0.001) and completed (P ≤ 0.001) the course. Pharmacists of all ages participated, although those recently qualified showed greater interest.
Abstract:
This article examines the relationship between the learning organisation and the implementation of curriculum innovation within schools. It also compares the extent of innovative activity undertaken by schools in the public and the private sectors. A learning organisation is characterised by long-term goals, participatory decision-making processes, collaboration with external stakeholders, effective mechanisms for the internal communication of knowledge and information, and the use of rewards for its members. These characteristics are expected to promote curriculum innovation, once a number of control factors have been taken into account. The article reports on a study carried out in 197 Greek public and private primary schools in the 1999-2000 school year. Structured interviews with school principals were used as a method of data collection. According to the statistical results, the most important determinants of the innovative activity of a school are the extent of its collaboration with other organisations (i.e. openness to society), and the implementation of development programmes for teachers and parents (i.e. communication of knowledge and information). Contrary to expectations, the existence of long-term goals, the extent of shared decision-making, and the use of teacher rewards had no impact on curriculum innovation. The study also suggests that the private sector, as such, has an additional positive effect on the implementation of curriculum innovation, once a number of human, financial, material, and management resources have been controlled for. The study concludes by making recommendations for future research that would shed more light on unexpected outcomes and would help explore the causal link between variables in the research model.
Abstract:
In this paper a multiple classifier machine learning methodology for Predictive Maintenance (PdM) is presented. PdM is a prominent strategy for dealing with maintenance issues given the increasing need to minimize downtime and associated costs. One of the challenges with PdM is generating so-called 'health factors', or quantitative indicators of the status of a system associated with a given maintenance issue, and determining their relationship to operating costs and failure risk. The proposed PdM methodology allows dynamical decision rules to be adopted for maintenance management and can be used with high-dimensional and censored data problems. This is achieved by training multiple classification modules with different prediction horizons to provide different performance trade-offs in terms of frequency of unexpected breaks and unexploited lifetime, and then employing this information in an operating-cost-based maintenance decision system to minimise expected costs. The effectiveness of the methodology is demonstrated using a simulated example and a benchmark semiconductor manufacturing maintenance problem.
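The cost-based decision step mentioned in the abstract can be pictured roughly as follows. This is a sketch under simplifying assumptions, not the paper's method: each prediction horizon is summarised by a hypothetical unexpected-break rate and a hypothetical amount of unexploited lifetime, the expected cost is taken to be a linear combination of the two, and the horizon with the lowest expected cost is selected. All names and numbers are illustrative.

```python
def expected_cost(tradeoff, cost_per_break, cost_per_unused_unit):
    """Linear expected operating cost for one classifier's trade-off.

    tradeoff is a dict with two hypothetical fields:
      "break_rate":  fraction of maintenance cycles ending in an unexpected break
      "unused_life": average unexploited lifetime per cycle
    """
    return (tradeoff["break_rate"] * cost_per_break
            + tradeoff["unused_life"] * cost_per_unused_unit)


def pick_horizon(tradeoffs, cost_per_break, cost_per_unused_unit):
    """Return the prediction horizon whose classifier minimises expected cost."""
    return min(
        tradeoffs,
        key=lambda h: expected_cost(tradeoffs[h], cost_per_break, cost_per_unused_unit),
    )


# Illustrative trade-offs: a short horizon wastes little lifetime but misses more breaks.
tradeoffs = {
    1: {"break_rate": 0.30, "unused_life": 1.0},
    5: {"break_rate": 0.05, "unused_life": 6.0},
}
print(pick_horizon(tradeoffs, cost_per_break=100.0, cost_per_unused_unit=1.0))
print(pick_horizon(tradeoffs, cost_per_break=10.0, cost_per_unused_unit=5.0))
```

When breaks are expensive relative to wasted lifetime, the longer, more conservative horizon wins; when wasted lifetime dominates, the shorter horizon is preferred, which is the kind of dynamic decision rule the abstract refers to.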