TY - GEN
T1 - Active exploratory Q-learning for large problems
AU - Wu, Xianghai
AU - Kofman, Jonathan
AU - Tizhoosh, Hamid R.
PY - 2007
Y1 - 2007
AB - Although reinforcement learning (RL) emerged more than a decade ago, it is still under extensive investigation in its application to large problems, where the states and actions are multi-dimensional and continuous and give rise to the so-called curse of dimensionality. Conventional RL methods are not efficient enough in huge state-action spaces, while approaches based on value-function generalization require a very large number of good training examples. This paper presents an active exploratory approach to address the challenge of RL in large problems. The core principle of this approach is that the agent does not rush to the next state. Instead, it first attempts a number of actions at the current state and then selects the action that returns the greatest immediate reward. The state resulting from performing this action is taken as the next state. Four active exploration algorithms for finding good actions are proposed: random-based search, opposition-based random search, search by cyclical adjustment, and opposition-based cyclical adjustment of each action dimension. The efficiency of these algorithms is evaluated in a visual-servoing experiment with a 6-axis robot.
UR - http://www.scopus.com/inward/record.url?scp=40949138163&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=40949138163&partnerID=8YFLogxK
U2 - 10.1109/ICSMC.2007.4414257
DO - 10.1109/ICSMC.2007.4414257
M3 - Conference contribution
AN - SCOPUS:40949138163
SN - 1424409918
SN - 9781424409914
T3 - Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics
SP - 4040
EP - 4045
BT - 2007 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2007
T2 - 2007 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2007
Y2 - 7 October 2007 through 10 October 2007
ER -
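
A minimal illustrative sketch, in Python, of the active exploration idea described in the abstract above: at the current state the agent evaluates several candidate actions (and, in the opposition-based variant, the "opposite" of each candidate, reflected across the midpoint of the action range) and commits only to the action with the greatest immediate reward. All names and parameters here (reward_fn, n_samples, etc.) are assumptions for illustration, not the authors' implementation.

import numpy as np

def opposite(action, low, high):
    # Opposition-based learning (Tizhoosh): reflect each action dimension
    # across the midpoint of its range, x_opp = low + high - x.
    return low + high - action

def select_action(state, reward_fn, low, high, n_samples=10,
                  use_opposition=True, rng=None):
    # Try n_samples random candidate actions at the current state, optionally
    # paired with their opposites, and return the candidate action with the
    # greatest immediate reward along with that reward.
    rng = rng if rng is not None else np.random.default_rng()
    best_a, best_r = None, -np.inf
    for _ in range(n_samples):
        a = rng.uniform(low, high)
        candidates = (a, opposite(a, low, high)) if use_opposition else (a,)
        for cand in candidates:
            r = reward_fn(state, cand)  # immediate reward of trying cand at state
            if r > best_r:
                best_a, best_r = cand, r
    return best_a, best_r

With a hypothetical reward function such as reward_fn = lambda s, a: -np.linalg.norm(s + a), the sketch would prefer actions that move the state toward the origin; the state reached by the selected action would then serve as the next state, as in the paper's core principle.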