78 results for time dependant cost function

Relevance:

40.00%

Abstract:

We develop in this article the first actor-critic reinforcement learning algorithm with function approximation for a problem of control under multiple inequality constraints. We consider the infinite horizon discounted cost framework in which both the objective and the constraint functions are suitable expected policy-dependent discounted sums of certain sample path functions. We apply the Lagrange multiplier method to handle the inequality constraints. Our algorithm makes use of multi-timescale stochastic approximation and incorporates a temporal difference (TD) critic and an actor that makes a gradient search in the space of policy parameters using efficient simultaneous perturbation stochastic approximation (SPSA) gradient estimates. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal policy. (C) 2010 Elsevier B.V. All rights reserved.
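As a rough illustration of the SPSA idea mentioned in this abstract, the sketch below estimates a gradient from only two function evaluations per sample by perturbing all coordinates at once with random ±1 signs. It is a generic two-sided SPSA estimator, not the authors' specific variant; the function names, the perturbation size `delta`, and the averaging helper are assumptions made for the example.

```python
import random


def spsa_gradient(f, theta, delta=0.1, rng=None):
    """One two-sided SPSA estimate of the gradient of f at theta.

    All coordinates are perturbed simultaneously, so the cost is two
    evaluations of f regardless of the dimension of theta.
    """
    rng = rng or random.Random()
    # Rademacher (+1/-1) perturbation direction, one sign per coordinate.
    signs = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + delta * s for t, s in zip(theta, signs)]
    minus = [t - delta * s for t, s in zip(theta, signs)]
    diff = f(plus) - f(minus)
    # Each coordinate divides the same difference by its own perturbation.
    return [diff / (2.0 * delta * s) for s in signs]


def averaged_spsa_gradient(f, theta, delta=0.1, samples=4000, seed=0):
    """Average many SPSA estimates to reduce the estimator's variance."""
    rng = random.Random(seed)
    total = [0.0] * len(theta)
    for _ in range(samples):
        estimate = spsa_gradient(f, theta, delta, rng)
        total = [a + b for a, b in zip(total, estimate)]
    return [a / samples for a in total]


def quadratic(x):
    """Toy objective (hypothetical): sum of squares, gradient 2*x."""
    return sum(v * v for v in x)
```

A single estimate is noisy in more than one dimension; in a stochastic approximation scheme the averaging happens implicitly through small step sizes rather than through an explicit loop as in `averaged_spsa_gradient`.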

Relevance:

40.00%

Abstract:

An optimal control law for a general nonlinear system can be obtained by solving the Hamilton-Jacobi-Bellman (HJB) equation. However, it is difficult to obtain an analytical solution of this equation even for a moderately complex system. In this paper, we propose a continuous-time single-network adaptive critic scheme for nonlinear control-affine systems in which the optimal cost-to-go function is approximated using a parametric positive semi-definite function. Unlike earlier approaches, a continuous-time weight update law is derived from the HJB equation. The stability of the system is analysed during the evolution of weights using Lyapunov theory. The effectiveness of the scheme is demonstrated through simulation examples.
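For orientation, the HJB equation for a control-affine system can be written out under a common set of assumptions that the abstract itself does not state: dynamics $\dot{x} = f(x) + g(x)u$, an infinite-horizon cost with state penalty $Q(x) \ge 0$, and a quadratic control penalty with $R$ symmetric positive definite. The symbols $f$, $g$, $Q$, $R$ and $V^*$ are illustrative choices, not taken from the paper.

```latex
% Assumed cost functional (illustrative):
%   J = \int_0^\infty \big( Q(x) + u^\top R\, u \big)\, dt
\begin{align}
  0 &= \min_{u} \Big[ Q(x) + u^\top R\, u
        + \nabla V^*(x)^\top \big( f(x) + g(x)\,u \big) \Big], \\
  u^*(x) &= -\tfrac{1}{2}\, R^{-1} g(x)^\top \nabla V^*(x), \\
  0 &= Q(x) + \nabla V^*(x)^\top f(x)
        - \tfrac{1}{4}\, \nabla V^*(x)^\top g(x)\, R^{-1} g(x)^\top \nabla V^*(x).
\end{align}
```

The third line follows by substituting the minimizing control into the first. An adaptive critic of the kind described would replace $V^*$ with a parametric approximation and drive the residual of that last equation toward zero.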

Relevance:

40.00%

Abstract:

We propose for the first time two reinforcement learning algorithms with function approximation for average cost adaptive control of traffic lights. One of these algorithms is a version of Q-learning with function approximation while the other is a policy gradient actor-critic algorithm that incorporates multi-timescale stochastic approximation. We show performance comparisons on various network settings of these algorithms with a range of fixed timing algorithms, as well as a Q-learning algorithm with full state representation that we also implement. We observe that whereas (as expected) on a two-junction corridor, the full state representation algorithm shows the best results, this algorithm is not implementable on larger road networks. The algorithm PG-AC-TLC that we propose is seen to show the best overall performance.

Relevance:

40.00%

Abstract:

We present a novel multi-timescale Q-learning algorithm for average cost control in a Markov decision process subject to multiple inequality constraints. We formulate a relaxed version of this problem through the Lagrange multiplier method. Our algorithm is different from Q-learning in that it updates two parameters - a Q-value parameter and a policy parameter. The Q-value parameter is updated on a slower time scale as compared to the policy parameter. Whereas Q-learning with function approximation can diverge in some cases, our algorithm is seen to be convergent as a result of the aforementioned timescale separation. We show the results of experiments on a problem of constrained routing in a multistage queueing network. Our algorithm is seen to exhibit good performance and the various inequality constraints are seen to be satisfied upon convergence of the algorithm.
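The Lagrange multiplier relaxation mentioned in this abstract can be sketched in a few lines. The example assumes constraints of the form "average constraint cost i must not exceed bound i" and uses a projected ascent step on the multipliers; the function names, argument layout, and fixed step size are hypothetical and are not taken from the paper, whose updates run on coupled timescales alongside the Q-value and policy updates.

```python
def lagrangian_cost(cost, constraint_costs, bounds, multipliers):
    """Relaxed single-stage cost: cost + sum_i lambda_i * (g_i - alpha_i)."""
    penalty = sum(
        lam * (g - alpha)
        for lam, g, alpha in zip(multipliers, constraint_costs, bounds)
    )
    return cost + penalty


def update_multipliers(multipliers, constraint_costs, bounds, step):
    """Projected ascent on the multipliers.

    A multiplier grows while its constraint is violated (g_i > alpha_i),
    shrinks otherwise, and is clipped at zero so it stays non-negative.
    """
    return [
        max(0.0, lam + step * (g - alpha))
        for lam, g, alpha in zip(multipliers, constraint_costs, bounds)
    ]
```

For instance, with multipliers `[0.0, 1.0]`, observed constraint costs `[2.0, 0.5]`, bounds `[1.0, 1.0]` and step `0.5`, the first multiplier rises to `0.5` because its constraint is violated, while the second falls to `0.75` because its constraint is satisfied with slack.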

Relevance:

40.00%

Abstract:

The pressure dependence of the ³⁵Cl nuclear quadrupole resonance (NQR) frequency and the temperature and pressure variation of the spin-lattice relaxation time (T₁) were investigated in 3,4-dichlorophenol. T₁ was measured in the temperature range 77-300 K. Furthermore, the NQR frequency and T₁ for these compounds were measured as a function of pressure up to 5 kbar at 300 K. The temperature dependence of the average torsional lifetimes of the molecules and the transition probabilities W₁ and W₂ for the Δm = ±1 and Δm = ±2 transitions were also obtained. A nonlinear variation of NQR frequency with pressure was observed, and the pressure coefficients were found to be positive. A thermodynamic analysis of the data was carried out to determine the constant-volume temperature coefficients of the NQR frequency. An attempt is made to compare the torsional frequencies evaluated from NQR data with those obtained from IR spectra. On selecting the appropriate mode from the IR spectra, good agreement with the torsional frequency obtained from NQR data is observed. This approach is a good illustration of how data from IR studies supplement NQR studies of compounds in the solid state.

Relevance:

40.00%

Abstract:

An exciting application of crowdsourcing is to use social networks in complex task execution. In this paper, we address the problem of a planner who needs to incentivize agents within a network in order to seek their help in executing an atomic task as well as in recruiting other agents to execute the task. We study this mechanism design problem under two natural resource optimization settings: (1) cost-critical tasks, where the planner's goal is to minimize the total cost, and (2) time-critical tasks, where the goal is to minimize the total time elapsed before the task is executed. We identify a set of desirable properties that should ideally be satisfied by a crowdsourcing mechanism. In particular, sybil-proofness and collapse-proofness are two complementary properties in our desiderata. We prove that no mechanism can satisfy all the desirable properties simultaneously. This leads us naturally to explore approximate versions of the critical properties. We focus our attention on approximate sybil-proofness, and our exploration leads to a parametrized family of payment mechanisms which satisfy collapse-proofness. We characterize the approximate versions of the desirable properties in the cost-critical and time-critical domains.

Relevance:

40.00%

Abstract:

We study risk-sensitive control of continuous-time Markov chains taking values in a discrete state space. We study both finite and infinite horizon problems. In the finite horizon problem, we characterize the value function via the Hamilton-Jacobi-Bellman equation and obtain an optimal Markov control. We do the same for the infinite horizon discounted cost case. In the infinite horizon average cost case, we establish the existence of an optimal stationary control under a certain Lyapunov condition. We also develop a policy iteration algorithm for finding an optimal control.

Relevance:

40.00%

Abstract:

A self-consistent mode coupling theory (MCT) with microscopic inputs of equilibrium pair correlation functions is developed to analyze electrolyte dynamics. We apply the theory to calculate the concentration dependence of (i) time-dependent ion diffusion, (ii) the intermediate scattering function of the constituent ions, and (iii) ion solvation dynamics in electrolyte solution. Brownian dynamics with implicit water molecules and the molecular dynamics method with explicit water are used to check the theoretical predictions. The time dependence of the ionic self-diffusion coefficient and the corresponding intermediate scattering function evaluated from our MCT approach show quantitative agreement with early experimental and present Brownian dynamics simulation results. With increasing concentration, the dispersion of electrolyte friction is found to occur at increasingly higher frequency, due to the faster relaxation of the ion atmosphere. The wave number dependence of the intermediate scattering function, F(k, t), exhibits markedly different relaxation dynamics at different length scales. At small wave numbers, we find the emergence of a step-like relaxation, indicating the presence of both fast and slow time scales in the system. Such behavior allows an intriguing analogy with the temperature-dependent relaxation dynamics of supercooled liquids. We find that the solvation dynamics of a tagged ion exhibits a power law decay at long times; the decay can also be fitted to a stretched exponential form. The emergence of the power law in solvation dynamics has been tested by carrying out long Brownian dynamics simulations with varying ionic concentrations. The solvation time correlation and ion-ion intermediate scattering function indeed exhibit highly interesting, non-trivial dynamical behavior at intermediate to longer times that requires further experimental and theoretical study. (c) 2015 AIP Publishing LLC.

Relevance:

30.00%

Abstract:

Quinones and their radical ion intermediates have been much studied by vibrational spectroscopy to understand their structure-function relationships in various biological processes. In this paper, we present a comprehensive analysis of vibrational spectra in the structure-sensitive region of both the naphthoquinone (NQ) and 2-methyl-1,4-naphthoquinone (MQ, menaquinone) radical anions using time-resolved resonance Raman and ab initio studies. Specific vibrational mode assignments have been made to all the vibrational frequencies recorded in the experiment. It is observed that the carbonyl and C-C stretching frequencies show considerable coupling in NQ and MQ radical anions. Further, the asymmetric substitution present in MQ with respect to NQ shows important signatures in the radical anion spectrum. It is concluded that assignments of vibrational frequencies of asymmetrically substituted quinones must take into consideration the influence of asymmetry on structure and reactivity.

Relevance:

30.00%

Abstract:

Although LH is essential for survival and function of the corpus luteum (CL) in higher primates, luteolysis occurs during nonfertile cycles without a discernible decrease in circulating LH levels. Using genome-wide expression analysis, several experiments were performed to examine the processes of luteolysis and rescue of luteal function in monkeys. Induced luteolysis with GnRH receptor antagonist (Cetrorelix) resulted in differential regulation of 3949 genes, whereas replacement with exogenous LH (Cetrorelix plus LH) led to regulation of 4434 genes (1563 down-regulation and 2871 up-regulation). A model system for prostaglandin (PG) F2α-induced luteolysis in the monkey was standardized and demonstrated that PGF2α regulated expression of 2290 genes in the CL. Analysis of the LH-regulated luteal transcriptome revealed that 120 genes were regulated in an antagonistic fashion by PGF2α. Based on the microarray data, 25 genes were selected for validation by real-time RT-PCR analysis, and expression of these genes was also examined in the CL throughout the luteal phase and from monkeys treated with human chorionic gonadotropin (hCG) to mimic early pregnancy. The results indicated changes in expression of genes favorable to PGF2α action during the late to very late luteal phase, and expressions of many of these genes were regulated in an opposite manner by exogenous hCG treatment. Collectively, the findings suggest that curtailment of expression of downstream LH-target genes possibly through PGF2α action on the CL is among the mechanisms underlying cross talk between the luteotropic and luteolytic signaling pathways that result in the cessation of luteal function, but hCG is likely to abrogate the PGF2α-responsive gene expression changes resulting in luteal rescue crucial for the maintenance of early pregnancy. (Endocrinology 150: 1473-1484, 2009)

Relevance:

30.00%

Abstract:

Texture evolution in a low-cost beta titanium alloy was studied for different modes of rolling and heat treatments. The alloy was cold rolled by unidirectional rolling (UDR) and multi-step cross rolling (MSCR). The cold rolled material was either aged directly or recrystallized and then aged. The evolution of texture in the alpha and beta phases was studied. The rolling texture of the beta phase, which is characterized by the gamma fiber, is stronger for MSCR than for UDR, while the trend is reversed on recrystallization. The mode of rolling affects the alpha transformation texture on aging, with smaller alpha lath size and stronger alpha texture in UDR than in MSCR. The defect structure in the beta phase influences the evolution of alpha texture on aging. A stronger defect structure in the beta phase leads to variant selection, with the rolled samples showing fewer variants than the recrystallized samples.

Relevance:

30.00%

Abstract:

Distributed space-time coding for wireless relay networks in which the source, the destination and the relays have multiple antennas has been studied by Jing and Hassibi. In this set-up, the transmit and the receive signals at different antennas of the same relay are processed and designed independently, even though the antennas are colocated. In this paper, a wireless relay network with a single antenna at the source and the destination and two antennas at each of the R relays is considered. A new class of distributed space-time block codes called Co-ordinate Interleaved Distributed Space-Time Codes (CIDSTC) is introduced where, in the first phase, the source transmits a T-length complex vector to all the relays; and in the second phase, at each relay, the in-phase and quadrature component vectors of the received complex vectors at the two antennas are interleaved and processed before forwarding them to the destination. Compared to the scheme proposed by Jing-Hassibi, for T >= 4R, while providing the same asymptotic diversity order of 2R, the CIDSTC scheme is shown to provide asymptotic coding gain at the cost of a negligible increase in the processing complexity at the relays. However, for moderate and large values of P, the CIDSTC scheme is shown to provide more diversity than the scheme proposed by Jing-Hassibi. CIDSTCs are shown to be fully diverse provided the information symbols take values from an appropriate multidimensional signal set.

Relevance:

30.00%

Abstract:

The growth of the nanocrystalline tribolayer produced in oxygen-free high-conductivity copper after sliding against 440C stainless steel was studied. Tests were conducted on a pin-on-disk tribometer at sliding velocities of 0.05 and 1.0 m/s and sliding times of 0.1 to 10,000 s. Subsurface deformation and the growth of the tribolayer as a function of time were studied with the use of transmission electron microscopy and ion-induced secondary electron microscopy. A continuous nanocrystalline tribolayer was produced after as little as 10 s of sliding at both sliding velocities. The tribolayer produced by sliding at 0.05 m/s continued to grow at sliding times up to 10,000 s and developed texture. Dynamic recrystallization of the tribolayer at a sliding velocity of 1.0 m/s inhibited the growth of a continuous nanocrystalline tribolayer.

Relevance:

30.00%

Abstract:

A systematic study was undertaken on the combustion and thermal decomposition of pelletized ammonium perchlorate (AP) to investigate the effects of pelletizing pressure and dwell time. At constant pressure, increasing the dwell time raises the burning rate up to a maximum, after which the rate decreases. The dwell time required for the pellets to reach the maximum burning rate is a function of pressure. The maximum burning rate is the same for all the pressures used and is also unaffected by increasing, to the range 90-250 μ, the particle size of AP used. In order to explain the occurrence of a maximum in burning rate, pellets were examined for their thermal sensitivities, physical nature and the changes occurring during pelletization with dwell time and pressure. The variations are explained in terms of increasing density and the formation of defects such as dislocations, which increases the number of reactive sites, followed by their partial annihilation at longer dwell times due to flow of material during pelletization.

Relevance:

30.00%

Abstract:

A recent theoretical model developed by Imparato et al. Phys of the experimentally measured heat and work effects produced by the thermal fluctuations of single micron-sized polystyrene beads in stationary and moving optical traps has proved to be quite successful in rationalizing the observed experimental data. The model, based on the overdamped Brownian dynamics of a particle in a harmonic potential that moves at a constant speed under a time-dependent force, is used to obtain an approximate expression for the distribution of the heat dissipated by the particle at long times. In this paper, we generalize the above model to consider particle dynamics in the presence of colored noise, without passing to the overdamped limit, as a way of modeling experimental situations in which the fluctuations of the medium exhibit long-lived temporal correlations, of the kind characteristic of polymeric solutions, for instance, or of similar viscoelastic fluids. Although we have not been able to find an expression for the heat distribution itself, we do obtain exact expressions for its mean and variance, both for the static and for the moving trap cases. These moments are valid for arbitrary times and they also hold in the inertial regime, but they reduce exactly to the results of Imparato et al. in appropriate limits. DOI: 10.1103/PhysRevE.80.011118