TY - GEN
T1 - Performance analysis of direct heuristic dynamic programming using control-theoretic measures
AU - Yang, Lei
AU - Si, Jennie
AU - Tsakalis, Konstantinos
AU - Rodriguez, Armando
PY - 2007/12/1
Y1 - 2007/12/1
N2 - Approximate dynamic programming (ADP) has been widely studied from several important perspectives: algorithm development, learning efficiency measured by success or failure statistics, convergence rate, and learning error bounds. Given that many learning benchmarks used in ADP or reinforcement learning studies are control problems, it is important and necessary to examine the learning controllers from a control-theoretic perspective. This paper makes use of direct heuristic dynamic programming (direct HDP) and several benchmark examples to introduce a unique analytical framework that can be extended to other learning control paradigms and other complex control problems. Sensitivity analysis and linear quadratic regulator (LQR) design are used in the paper for two purposes: to gauge direct HDP performance characteristics and to provide guidance toward designing better learning controllers. This gauge, however, does not limit direct HDP to being effective only as a linear controller. To this end, applications of direct HDP to nonlinear control problems beyond sensitivity analysis and the confines of LQR have been developed and compared with LQR design for command following and internal system parameter changes.
AB - Approximate dynamic programming (ADP) has been widely studied from several important perspectives: algorithm development, learning efficiency measured by success or failure statistics, convergence rate, and learning error bounds. Given that many learning benchmarks used in ADP or reinforcement learning studies are control problems, it is important and necessary to examine the learning controllers from a control-theoretic perspective. This paper makes use of direct heuristic dynamic programming (direct HDP) and several benchmark examples to introduce a unique analytical framework that can be extended to other learning control paradigms and other complex control problems. Sensitivity analysis and linear quadratic regulator (LQR) design are used in the paper for two purposes: to gauge direct HDP performance characteristics and to provide guidance toward designing better learning controllers. This gauge, however, does not limit direct HDP to being effective only as a linear controller. To this end, applications of direct HDP to nonlinear control problems beyond sensitivity analysis and the confines of LQR have been developed and compared with LQR design for command following and internal system parameter changes.
UR - http://www.scopus.com/inward/record.url?scp=51749117375&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=51749117375&partnerID=8YFLogxK
U2 - 10.1109/IJCNN.2007.4371352
DO - 10.1109/IJCNN.2007.4371352
M3 - Conference contribution
AN - SCOPUS:51749117375
SN - 142441380X
SN - 9781424413805
T3 - IEEE International Conference on Neural Networks - Conference Proceedings
SP - 2504
EP - 2509
BT - The 2007 International Joint Conference on Neural Networks, IJCNN 2007 Conference Proceedings
T2 - 2007 International Joint Conference on Neural Networks, IJCNN 2007
Y2 - 12 August 2007 through 17 August 2007
ER -