72 results for Reinforcement Learning, resource-constrained devices, iOS devices, on-device machine learning
Abstract:
We present a model for plasticity induction in reinforcement learning which is based on a cascade of synaptic memory traces. In the cascade of these so-called eligibility traces, presynaptic input is first correlated with postsynaptic events, next with the behavioral decisions and finally with the external reinforcement. A population of leaky integrate and fire neurons endowed with this plasticity scheme is studied by simulation on different tasks. For operant conditioning with delayed reinforcement, learning succeeds even when the delay is so large that the delivered reward reflects the appropriateness, not of the immediately preceding response, but of a decision made earlier on in the stimulus-decision sequence. So the proposed model does not rely on the temporal contiguity between decision and pertinent reward and thus provides a viable means of addressing the temporal credit assignment problem. In the same task, learning speeds up with increasing population size, showing that the plasticity cascade simultaneously addresses the spatial problem of assigning credit to the different population neurons. Simulations on other tasks such as sequential decision making serve to highlight the robustness of the proposed scheme and, further, contrast its performance to that of temporal difference based approaches to reinforcement learning.
Abstract:
In learning from trial and error, animals need to relate behavioral decisions to environmental reinforcement even though it may be difficult to assign credit to a particular decision when outcomes are uncertain or subject to delays. When considering the biophysical basis of learning, the credit-assignment problem is compounded because the behavioral decisions themselves result from the spatio-temporal aggregation of many synaptic releases. We present a model of plasticity induction for reinforcement learning in a population of leaky integrate and fire neurons which is based on a cascade of synaptic memory traces. Each synaptic cascade correlates presynaptic input first with postsynaptic events, next with the behavioral decisions and finally with external reinforcement. For operant conditioning, learning succeeds even when reinforcement is delivered with a delay so large that temporal contiguity between decision and pertinent reward is lost due to intervening decisions which are themselves subject to delayed reinforcement. This shows that the model provides a viable mechanism for temporal credit assignment. Further, learning speeds up with increasing population size, so the plasticity cascade simultaneously addresses the spatial problem of assigning credit to synapses in different population neurons. Simulations on other tasks, such as sequential decision making, serve to contrast the performance of the proposed scheme to that of temporal difference-based learning. We argue that, due to their comparative robustness, synaptic plasticity cascades are attractive basic models of reinforcement learning in the brain.
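The three-stage cascade described in this abstract can be illustrated with a toy sketch. It is not the authors' model: the discrete time step, the leaky-trace form, the function name `cascade_update`, the parameters `tau1`, `tau2` and `lr`, and all constants are assumptions chosen only to show how a delayed reward can still reach an earlier pre/post correlation.

```python
def cascade_update(pre_post, decision, reward, tau1=5.0, tau2=20.0, lr=0.1):
    """Toy three-stage trace cascade for a single synapse (illustrative only).

    pre_post: per-step pre/post correlation signal
    decision: per-step decision feedback signal
    reward:   per-step external reinforcement
    Returns the accumulated weight change.
    """
    e1 = 0.0  # stage 1: leaky memory of pre/post correlation
    e2 = 0.0  # stage 2: leaky memory of stage 1 gated by the decision
    dw = 0.0  # stage 3: weight change, gated by reinforcement
    for c, d, r in zip(pre_post, decision, reward):
        e1 += -e1 / tau1 + c
        e2 += -e2 / tau2 + e1 * d
        dw += lr * r * e2
    return dw


# Hypothetical episode: correlation at step 0, decision at step 1,
# reward delivered only at step 10.
steps = 11
pre_post = [1.0] + [0.0] * (steps - 1)
decision = [0.0, 1.0] + [0.0] * (steps - 2)
delayed_reward = [0.0] * (steps - 1) + [1.0]
dw_rewarded = cascade_update(pre_post, decision, delayed_reward)
dw_unrewarded = cascade_update(pre_post, decision, [0.0] * steps)
```

In this sketch the second trace decays slowly enough that the reward at step 10 still finds a nonzero trace, so the weight change is positive; without reward it stays at zero.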
Abstract:
Given the complex structure of the brain, how can synaptic plasticity explain the learning and forgetting of associations when these are continuously changing? We address this question by studying different reinforcement learning rules in a multilayer network in order to reproduce monkey behavior in a visuomotor association task. Our model can only reproduce the learning performance of the monkey if the synaptic modifications depend on the pre- and postsynaptic activity, and if the intrinsic level of stochasticity is low. This favored learning rule is based on reward-modulated Hebbian synaptic plasticity and shows the interesting feature that the learning performance does not substantially degrade when adding layers to the network, even for a complex problem.
Resource-allocation capabilities of commercial project management software. An experimental analysis
Abstract:
When project managers determine schedules for resource-constrained projects, they commonly use commercial project management software packages. Which resource-allocation methods are implemented in these packages is proprietary information. The resource-allocation problem is in general computationally difficult to solve to optimality. Hence, the question arises if and how various project management software packages differ in quality with respect to their resource-allocation capabilities. None of the few existing papers on this subject uses a sizeable data set and recent versions of common software packages. We experimentally analyze the resource-allocation capabilities of Acos Plus.1, AdeptTracker Professional, CS Project Professional, Microsoft Office Project 2007, Primavera P6, Sciforma PS8, and Turbo Project Professional. Our analysis is based on 1560 instances of the precedence- and resource-constrained project scheduling problem RCPSP. The experiment shows that using the resource-allocation feature of these packages may lead to a project duration increase of almost 115% above the best known feasible schedule. The increase gets larger with increasing resource scarcity and with increasing number of activities. We investigate the impact of different complexity scenarios and priority rules on the project duration obtained by the software packages. We provide a decision table to support managers in selecting a software package and a priority rule.
Abstract:
Recent modeling of spike-timing-dependent plasticity indicates that plasticity involves as a third factor a local dendritic potential, besides pre- and postsynaptic firing times. We present a simple compartmental neuron model together with a non-Hebbian, biologically plausible learning rule for dendritic synapses where plasticity is modulated by these three factors. In functional terms, the rule seeks to minimize discrepancies between somatic firings and a local dendritic potential. Such prediction errors can arise in our model from stochastic fluctuations as well as from synaptic input, which directly targets the soma. Depending on the nature of this direct input, our plasticity rule subserves supervised or unsupervised learning. When a reward signal modulates the learning rate, reinforcement learning results. Hence a single plasticity rule supports diverse learning paradigms.
Abstract:
Various software packages for project management include a procedure for resource-constrained scheduling. In several packages, the user can influence this procedure by selecting a priority rule. However, the resource-allocation methods that are implemented in the procedures are proprietary information; therefore, the question of how the priority-rule selection impacts the performance of the procedures arises. We experimentally evaluate the resource-allocation methods of eight recent software packages using the 600 instances of the PSPLIB J120 test set. The results of our analysis indicate that applying the default rule tends to outperform a randomly selected rule, whereas applying two randomly selected rules tends to outperform the default rule. Applying a small set of more than two rules further improves the project durations considerably. However, a large number of rules must be applied to obtain the best possible project durations.
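The idea of applying several priority rules and keeping the shortest schedule can be sketched as follows. Because the packages' methods are proprietary, this uses a textbook serial schedule generation scheme as a stand-in; the single-resource restriction, the function names `serial_sgs` and `best_of_rules`, and the three-activity instance are all assumptions for illustration. The sketch also assumes an acyclic precedence graph and that no activity demands more than the capacity.

```python
def serial_sgs(durations, preds, demand, capacity, priority):
    """Serial schedule generation for one renewable resource.

    priority maps activity -> number; the eligible activity with the
    smallest value is scheduled next. Returns the project duration.
    """
    start = {}
    usage = {}  # time period -> resource units already in use
    unscheduled = set(durations)
    while unscheduled:
        # An activity is eligible once all its predecessors are scheduled.
        eligible = [a for a in unscheduled if all(p in start for p in preds[a])]
        a = min(eligible, key=lambda x: (priority[x], x))
        # Earliest precedence-feasible start, then shift right until
        # the resource is free for the whole duration.
        t = max((start[p] + durations[p] for p in preds[a]), default=0)
        while any(usage.get(u, 0) + demand[a] > capacity
                  for u in range(t, t + durations[a])):
            t += 1
        for u in range(t, t + durations[a]):
            usage[u] = usage.get(u, 0) + demand[a]
        start[a] = t
        unscheduled.remove(a)
    return max(start[a] + durations[a] for a in start)


def best_of_rules(durations, preds, demand, capacity, rules):
    """Apply every priority rule and keep the shortest project duration."""
    return min(serial_sgs(durations, preds, demand, capacity, r) for r in rules)


# Hypothetical instance: C must follow A; capacity is 2 units.
durations = {"A": 2, "B": 3, "C": 2}
preds = {"A": [], "B": [], "C": ["A"]}
demand = {"A": 2, "B": 1, "C": 1}
shortest_first = {a: d for a, d in durations.items()}
longest_first = {a: -d for a, d in durations.items()}
spt = serial_sgs(durations, preds, demand, 2, shortest_first)
lpt = serial_sgs(durations, preds, demand, 2, longest_first)
best = best_of_rules(durations, preds, demand, 2, [shortest_first, longest_first])
```

On this instance the two rules give different project durations, so taking the minimum over both can only help, which mirrors the multi-rule finding reported in the abstract.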
Abstract:
Disturbances in reward processing have been implicated in bulimia nervosa (BN). Abnormalities in processing reward-related stimuli might be linked to dysfunctions of the catecholaminergic neurotransmitter system, but findings have been inconclusive. A powerful way to investigate the relationship between catecholaminergic function and behavior is to examine behavioral changes in response to experimental catecholamine depletion (CD). The purpose of this study was to uncover putative catecholaminergic dysfunction in remitted subjects with BN who performed a reinforcement-learning task after CD. CD was achieved by oral alpha-methyl-para-tyrosine (AMPT) in 19 unmedicated female subjects with remitted BN (rBN) and 28 demographically matched healthy female controls (HC). Sham depletion used identical capsules containing diphenhydramine. The study design consisted of a randomized, double-blind, placebo-controlled crossover, single-site experimental trial. The main outcome measure was reward learning in a probabilistic reward task analyzed using signal-detection theory. Secondary outcome measures included self-report assessments, including the Eating Disorder Examination-Questionnaire. Relative to healthy controls, rBN subjects were characterized by blunted reward learning in the AMPT but not in the placebo condition. Highlighting the specificity of these findings, groups did not differ in their ability to perceptually distinguish between stimuli. Increased CD-induced anhedonic (but not eating disorder) symptoms were associated with a reduced response bias toward a more frequently rewarded stimulus. In conclusion, under CD, rBN subjects showed reduced reward learning compared with healthy control subjects. These deficits uncover disturbance of the central reward processing systems in rBN related to altered brain catecholamine levels, which might reflect a trait-like deficit increasing vulnerability to BN.
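The abstract does not give the signal-detection formulas, so the following is only a sketch of one commonly used formulation for probabilistic reward tasks: response bias (log b) and discriminability (log d) computed from correct and incorrect counts for the more frequently rewarded ("rich") and less frequently rewarded ("lean") stimulus. Whether the study used exactly this formulation, the 0.5 cell correction, the function names, and the counts below are all assumptions.

```python
import math


def response_bias(rich_correct, rich_incorrect, lean_correct, lean_incorrect,
                  correction=0.5):
    """log b: tendency to choose the more frequently rewarded stimulus."""
    rc, ri = rich_correct + correction, rich_incorrect + correction
    lc, li = lean_correct + correction, lean_incorrect + correction
    return 0.5 * math.log10((rc * li) / (ri * lc))


def discriminability(rich_correct, rich_incorrect, lean_correct, lean_incorrect,
                     correction=0.5):
    """log d: ability to tell the two stimuli apart."""
    rc, ri = rich_correct + correction, rich_incorrect + correction
    lc, li = lean_correct + correction, lean_incorrect + correction
    return 0.5 * math.log10((rc * lc) / (ri * li))


# Hypothetical counts, not data from the study.
unbiased = response_bias(80, 20, 80, 20)
biased = response_bias(90, 10, 60, 40)
log_d = discriminability(80, 20, 80, 20)
```

Separating the two measures is what allows a group difference in reward learning (bias) to be reported alongside no difference in perceptual discrimination.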
Abstract:
A novel adaptive approach for glucose control in individuals with type 1 diabetes under sensor-augmented pump therapy is proposed. The controller is based on Actor-Critic (AC) learning and is inspired by the principles of reinforcement learning and optimal control theory. The main characteristics of the proposed controller are (i) simultaneous adjustment of both the insulin basal rate and the bolus dose, (ii) initialization based on clinical procedures, and (iii) real-time personalization. The effectiveness of the proposed algorithm in terms of glycemic control has been investigated in silico in adults, adolescents and children under open-loop and closed-loop approaches, using announced meals with uncertainties in the order of ±25% in the estimation of carbohydrates. The results show that glucose regulation is efficient in all three groups of patients, even with uncertainties in the level of carbohydrates in the meal. The percentages in the A+B zones of the Control Variability Grid Analysis (CVGA) were 100% for adults, and 93% for both adolescents and children. The AC-based controller seems to be a promising approach for the automatic adjustment of insulin infusion in order to improve glycemic control. After optimization of the algorithm, the controller will be tested in a clinical trial.
Abstract:
Humans and animals face decision tasks in an uncertain multi-agent environment where an agent's strategy may change in time due to the co-adaptation of others' strategies. The neuronal substrate and the computational algorithms underlying such adaptive decision making, however, are largely unknown. We propose a population coding model of spiking neurons with a policy gradient procedure that successfully acquires optimal strategies for classical game-theoretical tasks. The suggested population reinforcement learning reproduces data from human behavioral experiments for the blackjack and the inspector game. It performs optimally according to a pure (deterministic) and mixed (stochastic) Nash equilibrium, respectively. In contrast, temporal-difference (TD) learning, covariance learning, and basic reinforcement learning fail to perform optimally for the stochastic strategy. Spike-based population reinforcement learning, shown to follow the stochastic reward gradient, is therefore a viable candidate to explain automated decision learning of a Nash equilibrium in two-player games.
Abstract:
OBJECTIVES In resource-constrained settings, tuberculosis (TB) is a common opportunistic infection and cause of death in HIV-infected persons. TB may be present at the start of antiretroviral therapy (ART), but it is often under-diagnosed. We describe approaches to TB diagnosis and screening of TB in ART programs in low- and middle-income countries. METHODS AND FINDINGS We surveyed ART programs treating HIV-infected adults in sub-Saharan Africa, Asia and Latin America in 2012 using online questionnaires to collect program-level and patient-level data. Forty-seven sites from 26 countries participated. Patient-level data were collected on 987 adult TB patients from 40 sites (median age 34.7 years; 54% female). Sputum smear microscopy and chest radiograph were available in 47 (100%) sites, TB culture in 44 (94%), and Xpert MTB/RIF in 23 (49%). Xpert MTB/RIF was rarely available in Central Africa and South America. In sites with access to these diagnostics, microscopy was used in 745 (76%) patients diagnosed with TB, culture in 220 (24%), and chest X-ray in 688 (70%) patients. When free of charge, culture was done in 27% of patients, compared to 21% when there was a fee (p = 0.033). Corresponding percentages for Xpert MTB/RIF were 26% and 15% of patients (p = 0.001). Screening practices for active disease before starting ART included symptom screening (46 sites, 98%), chest X-ray (38, 81%), sputum microscopy (37, 79%), culture (16, 34%), and Xpert MTB/RIF (5, 11%). CONCLUSIONS Mycobacterial culture was infrequently used despite its availability at most sites, while Xpert MTB/RIF was not generally available. Use of available diagnostics was higher when offered free of charge.
Abstract:
Background: WHO's 2013 revisions to its Consolidated Guidelines on antiretroviral drugs recommend routine viral load monitoring, rather than clinical or immunological monitoring, as the preferred monitoring approach on the basis of clinical evidence. However, HIV programmes in resource-limited settings require guidance on the most cost-effective use of resources in view of other competing priorities such as expansion of antiretroviral therapy coverage. We assessed the cost-effectiveness of alternative patient monitoring strategies. Methods: We evaluated a range of monitoring strategies, including clinical, CD4 cell count, and viral load monitoring, alone and together, at different frequencies and with different criteria for switching to second-line therapies. We used three independently constructed and validated models simultaneously. We estimated costs on the basis of resource use projected in the models and associated unit costs; we quantified impact as disability-adjusted life years (DALYs) averted. We compared alternatives using incremental cost-effectiveness analysis. Findings: All models show that clinical monitoring delivers significant benefit compared with a hypothetical baseline scenario with no monitoring or switching. Regular CD4 cell count monitoring confers a benefit over clinical monitoring alone, at an incremental cost that makes it affordable in more settings than viral load monitoring, which is currently more expensive. Viral load monitoring without CD4 cell count every 6 to 12 months provides the greatest reductions in morbidity and mortality, but incurs a high cost per DALY averted, resulting in lost opportunities to generate health gains if implemented instead of increasing antiretroviral therapy coverage or expanding antiretroviral therapy eligibility.
Interpretation: The priority for HIV programmes should be to expand antiretroviral therapy coverage, firstly at CD4 cell count lower than 350 cells per μL, and then at a CD4 cell count lower than 500 cells per μL, using lower-cost clinical or CD4 monitoring. At current costs, viral load monitoring should be considered only after high antiretroviral therapy coverage has been achieved. Point-of-care technologies and other factors reducing costs might make viral load monitoring more affordable in future. Funding: Bill & Melinda Gates Foundation, WHO.
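The incremental cost-effectiveness comparison named in the Methods can be sketched with the standard ratio of extra cost to extra DALYs averted. The function name `icer` and every number below are made up for illustration; none of them come from the study or its models.

```python
def icer(cost_new, cost_old, dalys_averted_new, dalys_averted_old):
    """Incremental cost per additional DALY averted when moving from the
    old strategy to the new one."""
    extra_dalys = dalys_averted_new - dalys_averted_old
    if extra_dalys == 0:
        raise ValueError("strategies avert the same number of DALYs")
    return (cost_new - cost_old) / extra_dalys


# Hypothetical strategies: the new one costs 500 more and averts 2 more DALYs.
cost_per_daly = icer(cost_new=1500.0, cost_old=1000.0,
                     dalys_averted_new=12.0, dalys_averted_old=10.0)
```

A strategy with the largest health gain can still be a poor use of a fixed budget if its ratio is much higher than that of an alternative, which is the trade-off the abstract describes for viral load monitoring.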
Abstract:
Population coding is widely regarded as a key mechanism for achieving reliable behavioral decisions. We previously introduced reinforcement learning for population-based decision making by spiking neurons. Here we generalize population reinforcement learning to spike-based plasticity rules that take account of the postsynaptic neural code. We consider spike/no-spike, spike count and spike latency codes. The multi-valued and continuous-valued features in the postsynaptic code allow for a generalization of binary decision making to multi-valued decision making and continuous-valued action selection. We show that code-specific learning rules speed up learning both for the discrete classification and the continuous regression tasks. The suggested learning rules also speed up with increasing population size as opposed to standard reinforcement learning rules. Continuous action selection is further shown to explain realistic learning speeds in the Morris water maze. Finally, we introduce the concept of action perturbation as opposed to the classical weight- or node-perturbation as an exploration mechanism underlying reinforcement learning. Exploration in the action space greatly increases the speed of learning as compared to exploration in the neuron or weight space.
Abstract:
Car manufacturers increasingly offer delivery programs for the factory pick-up of new cars. Such a program consists of a broad range of event-marketing activities. In this paper we investigate the problem of scheduling the delivery program activities of one day such that the sum of the customers’ waiting times is minimized. We show how to model this problem as a resource-constrained project scheduling problem with nonregular objective function, and we present a relaxation-based beam-search solution heuristic. The relaxations are solved by exploiting a duality relationship between temporal scheduling and min-cost network flow problems. This approach has been developed in cooperation with a German automaker. The performance of the heuristic has been evaluated based on practical and randomly generated test instances.
Abstract:
Dynamic systems, especially in real-life applications, are often determined by inter-/intra-variability, uncertainties and time-varying components. Physiological systems are probably the most representative example in which population variability, vital signal measurement noise and uncertain dynamics render their explicit representation and optimization a rather difficult task. Systems characterized by such challenges often require the use of adaptive algorithmic solutions able to perform an iterative structural and/or parametrical update process towards optimized behavior. Adaptive optimization presents the advantages of (i) individualization through learning of basic system characteristics, (ii) ability to follow time-varying dynamics and (iii) low computational cost. In this chapter, the use of online adaptive algorithms is investigated in two basic research areas related to diabetes management: (i) real-time glucose regulation and (ii) real-time prediction of hypo-/hyperglycemia. The applicability of these methods is illustrated through the design and development of an adaptive glucose control algorithm based on reinforcement learning and optimal control and an adaptive, personalized early-warning system for the recognition and alarm generation against hypo- and hyperglycemic events.
Abstract:
OBJECTIVE Approximately 85% of cervical cancer cases and deaths occur in resource-constrained countries where best practices for prevention, particularly for women with HIV infection, still need to be developed. The aim of this study was to assess cervical cancer prevention capacity in select HIV clinics located in resource-constrained countries. MATERIALS AND METHODS A cross-sectional survey of sub-Saharan African sites of 4 National Institutes of Health-funded HIV/AIDS networks was conducted. Sites were surveyed on the availability of cervical cancer screening and treatment among women with HIV infection and without HIV infection. Descriptive statistics and χ² or Fisher exact test were used as appropriate. RESULTS Fifty-one (65%) of 78 sites responded. Access to cervical cancer screening was reported by 49 sites (96%). Of these sites, 39 (80%) performed screening on-site. Central African sites were less likely to have screening on-site (p = .02) versus other areas. Visual inspection with acetic acid and Pap testing were the most commonly available on-site screening methods at 31 (79%) and 26 (67%) sites, respectively. High-risk HPV testing was available at 29% of sites with visual inspection with acetic acid and 50% of sites with Pap testing. Cryotherapy and radical hysterectomy were the most commonly available on-site treatment methods for premalignant and malignant lesions at 29 (74%) and 18 (46%) sites, respectively. CONCLUSIONS Despite limited resources, most sites surveyed had the capacity to perform cervical cancer screening and treatment. The existing infrastructure of HIV clinical and research sites may provide the ideal framework for scale-up of cervical cancer prevention in resource-constrained countries with a high burden of cervical dysplasia.