53 resultados para science learning


Relevância:

30.00% 30.00%

Publicador:

Resumo:

We develop a simulation based algorithm for finite horizon Markov decision processes with finite state and finite action space. Illustrative numerical experiments with the proposed algorithm are shown for problems in flow control of communication networks and capacity switching in semiconductor fabrication.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, we use reinforcement learning (RL) as a tool to study price dynamics in an electronic retail market consisting of two competing sellers, and price sensitive and lead time sensitive customers. Sellers, offering identical products, compete on price to satisfy stochastically arriving demands (customers), and follow standard inventory control and replenishment policies to manage their inventories. In such a generalized setting, RL techniques have not previously been applied. We consider two representative cases: 1) no information case, were none of the sellers has any information about customer queue levels, inventory levels, or prices at the competitors; and 2) partial information case, where every seller has information about the customer queue levels and inventory levels of the competitors. Sellers employ automated pricing agents, or pricebots, which use RL-based pricing algorithms to reset the prices at random intervals based on factors such as number of back orders, inventory levels, and replenishment lead times, with the objective of maximizing discounted cumulative profit. In the no information case, we show that a seller who uses Q-learning outperforms a seller who uses derivative following (DF). In the partial information case, we model the problem as a Markovian game and use actor-critic based RL to learn dynamic prices. We believe our approach to solving these problems is a new and promising way of setting dynamic prices in multiseller environments with stochastic demands, price sensitive customers, and inventory replenishments.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper we present a novel macroblock mode decision algorithm to speedup H.264/SVC Intra frame encoding. We replace the complex mode-decision calculations by a classifier which has been trained specifically to minimize the reduction in RD performance. This results in a significant speedup in encoding. The results show that machine learning has a great potential and can reduce the complexity substantially with negligible impact on quality. The results show that the proposed method reduces encoding time to about 70% in base layer and up to 50% in enhancement layer of the reference implementation with a negligible loss in quality.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Indian logic has a long history. It somewhat covers the domains of two of the six schools (darsanas) of Indian philosophy, namely, Nyaya and Vaisesika. The generally accepted definition of Indian logic over the ages is the science which ascertains valid knowledge either by means of six senses or by means of the five members of the syllogism. In other words, perception and inference constitute the subject matter of logic. The science of logic evolved in India through three ages: the ancient, the medieval and the modern, spanning almost thirty centuries. Advances in Computer Science, in particular, in Artificial Intelligence have got researchers in these areas interested in the basic problems of language, logic and cognition in the past three decades. In the 1980s, Artificial Intelligence has evolved into knowledge-based and intelligent system design, and the knowledge base and inference engine have become standard subsystems of an intelligent system. One of the important issues in the design of such systems is knowledge acquisition from humans who are experts in a branch of learning (such as medicine or law) and transferring that knowledge to a computing system. The second important issue in such systems is the validation of the knowledge base of the system i.e. ensuring that the knowledge is complete and consistent. It is in this context that comparative study of Indian logic with recent theories of logic, language and knowledge engineering will help the computer scientist understand the deeper implications of the terms and concepts he is currently using and attempting to develop.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper gives a compact, self-contained tutorial survey of reinforcement learning, a tool that is increasingly finding application in the development of intelligent dynamic systems. Research on reinforcement learning during the past decade has led to the development of a variety of useful algorithms. This paper surveys the literature and presents the algorithms in a cohesive framework.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper(1) presents novel algorithms and applications for a particular class of mixed-norm regularization based Multiple Kernel Learning (MKL) formulations. The formulations assume that the given kernels are grouped and employ l(1) norm regularization for promoting sparsity within RKHS norms of each group and l(s), s >= 2 norm regularization for promoting non-sparse combinations across groups. Various sparsity levels in combining the kernels can be achieved by varying the grouping of kernels-hence we name the formulations as Variable Sparsity Kernel Learning (VSKL) formulations. While previous attempts have a non-convex formulation, here we present a convex formulation which admits efficient Mirror-Descent (MD) based solving techniques. The proposed MD based algorithm optimizes over product of simplices and has a computational complexity of O (m(2)n(tot) log n(max)/epsilon(2)) where m is no. training data points, n(max), n(tot) are the maximum no. kernels in any group, total no. kernels respectively and epsilon is the error in approximating the objective. A detailed proof of convergence of the algorithm is also presented. Experimental results show that the VSKL formulations are well-suited for multi-modal learning tasks like object categorization. Results also show that the MD based algorithm outperforms state-of-the-art MKL solvers in terms of computational efficiency.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper analyses the behaviour of a general class of learning automata algorithms for feedforward connectionist systems in an associative reinforcement learning environment. The type of connectionist system considered is also fairly general. The associative reinforcement learning task is first posed as a constrained maximization problem. The algorithm is approximated hy an ordinary differential equation using weak convergence techniques. The equilibrium points of the ordinary differential equation are then compared with the solutions to the constrained maximization problem to show that the algorithm does behave as desired.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We propose, for the first time, a reinforcement learning (RL) algorithm with function approximation for traffic signal control. Our algorithm incorporates state-action features and is easily implementable in high-dimensional settings. Prior work, e. g., the work of Abdulhai et al., on the application of RL to traffic signal control requires full-state representations and cannot be implemented, even in moderate-sized road networks, because the computational complexity exponentially grows in the numbers of lanes and junctions. We tackle this problem of the curse of dimensionality by effectively using feature-based state representations that use a broad characterization of the level of congestion as low, medium, or high. One advantage of our algorithm is that, unlike prior work based on RL, it does not require precise information on queue lengths and elapsed times at each lane but instead works with the aforementioned described features. The number of features that our algorithm requires is linear to the number of signaled lanes, thereby leading to several orders of magnitude reduction in the computational complexity. We perform implementations of our algorithm on various settings and show performance comparisons with other algorithms in the literature, including the works of Abdulhai et al. and Cools et al., as well as the fixed-timing and the longest queue algorithms. For comparison, we also develop an RL algorithm that uses full-state representation and incorporates prioritization of traffic, unlike the work of Abdulhai et al. We observe that our algorithm outperforms all the other algorithms on all the road network settings that we consider.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper formulates the automatic generation control (AGC) problem as a stochastic multistage decision problem. A strategy for solving this new AGC problem formulation is presented by using a reinforcement learning (RL) approach This method of obtaining an AGC controller does not depend on any knowledge of the system model and more importantly it admits considerable flexibility in defining the control objective. Two specific RL based AGC algorithms are presented. The first algorithm uses the traditional control objective of limiting area control error (ACE) excursions, where as, in the second algorithm, the controller can restore the load-generation balance by only monitoring deviation in tie line flows and system frequency and it does not need to know or estimate the composite ACE signal as is done by all current approaches. The effectiveness and versatility of the approaches has been demonstrated using a two area AGC model. (C) 2002 Elsevier Science B.V. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper we consider the problem of learning an n × n kernel matrix from m(1) similarity matrices under general convex loss. Past research have extensively studied the m = 1 case and have derived several algorithms which require sophisticated techniques like ACCP, SOCP, etc. The existing algorithms do not apply if one uses arbitrary losses and often can not handle m > 1 case. We present several provably convergent iterative algorithms, where each iteration requires either an SVM or a Multiple Kernel Learning (MKL) solver for m > 1 case. One of the major contributions of the paper is to extend the well knownMirror Descent(MD) framework to handle Cartesian product of psd matrices. This novel extension leads to an algorithm, called EMKL, which solves the problem in O(m2 log n 2) iterations; in each iteration one solves an MKL involving m kernels and m eigen-decomposition of n × n matrices. By suitably defining a restriction on the objective function, a faster version of EMKL is proposed, called REKL,which avoids the eigen-decomposition. An alternative to both EMKL and REKL is also suggested which requires only an SVMsolver. Experimental results on real world protein data set involving several similarity matrices illustrate the efficacy of the proposed algorithms.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Learning to rank from relevance judgment is an active research area. Itemwise score regression, pairwise preference satisfaction, and listwise structured learning are the major techniques in use. Listwise structured learning has been applied recently to optimize important non-decomposable ranking criteria like AUC (area under ROC curve) and MAP(mean average precision). We propose new, almost-lineartime algorithms to optimize for two other criteria widely used to evaluate search systems: MRR (mean reciprocal rank) and NDCG (normalized discounted cumulative gain)in the max-margin structured learning framework. We also demonstrate that, for different ranking criteria, one may need to use different feature maps. Search applications should not be optimized in favor of a single criterion, because they need to cater to a variety of queries. E.g., MRR is best for navigational queries, while NDCG is best for informational queries. A key contribution of this paper is to fold multiple ranking loss functions into a multi-criteria max-margin optimization.The result is a single, robust ranking model that is close to the best accuracy of learners trained on individual criteria. In fact, experiments over the popular LETOR and TREC data sets show that, contrary to conventional wisdom, a test criterion is often not best served by training with the same individual criterion.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We propose a randomized algorithm for large scale SVM learning which solves the problem by iterating over random subsets of the data. Crucial to the algorithm for scalability is the size of the subsets chosen. In the context of text classification we show that, by using ideas from random projections, a sample size of O(log n) can be used to obtain a solution which is close to the optimal with a high probability. Experiments done on synthetic and real life data sets demonstrate that the algorithm scales up SVM learners, without loss in accuracy. 1

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We propose two variants of the Q-learning algorithm that (both) use two timescales. One of these updates Q-values of all feasible state-action pairs at each instant while the other updates Q-values of states with actions chosen according to the ‘current ’ randomized policy updates. A sketch of convergence of the algorithms is shown. Finally, numerical experiments using the proposed algorithms for routing on different network topologies are presented and performance comparisons with the regular Q-learning algorithm are shown.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this article, we consider the single-machine scheduling problem with past-sequence-dependent (p-s-d) setup times and a learning effect. The setup times are proportional to the length of jobs that are already scheduled; i.e. p-s-d setup times. The learning effect reduces the actual processing time of a job because the workers are involved in doing the same job or activity repeatedly. Hence, the processing time of a job depends on its position in the sequence. In this study, we consider the total absolute difference in completion times (TADC) as the objective function. This problem is denoted as 1/LE, (Spsd)/TADC in Kuo and Yang (2007) ('Single Machine Scheduling with Past-sequence-dependent Setup Times and Learning Effects', Information Processing Letters, 102, 22-26). There are two parameters a and b denoting constant learning index and normalising index, respectively. A parametric analysis of b on the 1/LE, (Spsd)/TADC problem for a given value of a is applied in this study. In addition, a computational algorithm is also developed to obtain the number of optimal sequences and the range of b in which each of the sequences is optimal, for a given value of a. We derive two bounds b* for the normalising constant b and a* for the learning index a. We also show that, when a < a* or b > b*, the optimal sequence is obtained by arranging the longest job in the first position and the rest of the jobs in short processing time order.