TY - GEN
T1 - Policy iteration approximate dynamic programming using Volterra series based actor
AU - Guo, Wentao
AU - Si, Jennie
AU - Liu, Feng
AU - Mei, Shengwei
N1 - Publisher Copyright:
© 2014 IEEE.
PY - 2014/9/3
Y1 - 2014/9/3
N2 - There is an extensive literature on value function approximation for approximate dynamic programming (ADP). Multilayer perceptrons (MLPs) and radial basis functions (RBFs), among others, are typical approximators for value functions in ADP. Similar approaches have been taken for policy approximation. In this paper, we propose a new Volterra series based structure for actor approximation in ADP. The Volterra approximator is linear in parameters with global optima attainable. Given the proposed approximator structures, we further develop a policy iteration framework under which a gradient descent training algorithm for obtaining the optimal Volterra kernels can be derived. Associated with this ADP design, we provide a sufficient condition based on actor approximation error to guarantee convergence of the value function iterations. A finite bound of the final convergent value function is also given. Finally, by using a simulation example we illustrate the effectiveness of the proposed Volterra actor for optimal control of a nonlinear system.
AB - There is an extensive literature on value function approximation for approximate dynamic programming (ADP). Multilayer perceptrons (MLPs) and radial basis functions (RBFs), among others, are typical approximators for value functions in ADP. Similar approaches have been taken for policy approximation. In this paper, we propose a new Volterra series based structure for actor approximation in ADP. The Volterra approximator is linear in parameters with global optima attainable. Given the proposed approximator structures, we further develop a policy iteration framework under which a gradient descent training algorithm for obtaining the optimal Volterra kernels can be derived. Associated with this ADP design, we provide a sufficient condition based on actor approximation error to guarantee convergence of the value function iterations. A finite bound of the final convergent value function is also given. Finally, by using a simulation example we illustrate the effectiveness of the proposed Volterra actor for optimal control of a nonlinear system.
UR - http://www.scopus.com/inward/record.url?scp=84908466723&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84908466723&partnerID=8YFLogxK
U2 - 10.1109/IJCNN.2014.6889865
DO - 10.1109/IJCNN.2014.6889865
M3 - Conference contribution
AN - SCOPUS:84908466723
T3 - Proceedings of the International Joint Conference on Neural Networks
SP - 249
EP - 255
BT - Proceedings of the International Joint Conference on Neural Networks
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2014 International Joint Conference on Neural Networks, IJCNN 2014
Y2 - 6 July 2014 through 11 July 2014
ER -