Tradeoff between exploration and exploitation of OQ(λ) with non-Markovian update in dynamic environments

Maryam Shokri, Hamid R. Tizhoosh, Mohamed S. Kamel

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper investigates the tradeoff between exploration and exploitation of opposition-based Q(λ) with non-Markovian update (NOQ(λ)) in a dynamic environment. In previous work, the authors applied NOQ(λ) to the deterministic GridWorld problem. In this paper, we implement the NOQ(λ) algorithm for a simple elevator control problem to test the behavior of the algorithm in a nondeterministic, dynamic environment. We also extend the NOQ(λ) algorithm by introducing an opposition weight to achieve a better tradeoff between exploration and exploitation for the NOQ(λ) technique. The value of the opposition weight increases as the number of steps increases; hence, it has a stronger positive effect on the Q-value updates for opposite actions as learning progresses. The performance of the NOQ(λ) method is compared with the Q(λ) technique. The experiments indicate that NOQ(λ) performs better than Q(λ).
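The abstract describes an opposition weight that grows with the step count, so that updates to opposite actions gain influence as learning progresses. A minimal sketch of such an update rule is given below; all names (`opposition_weight`, `opposite_update`, the `opposite_of` mapping, and the use of a negated reward for the opposite action) are illustrative assumptions, not the authors' actual implementation.

```python
# Hedged sketch: a Q-learning update augmented with an opposition-weighted
# update for the opposite action, loosely following the abstract's idea
# that the opposition weight increases with the number of steps.

def opposition_weight(step, max_steps):
    """Weight in [0, 1] that increases with the number of steps taken."""
    return min(1.0, step / max_steps)

def q_update(q, state, action, reward, next_state, alpha, gamma):
    """Standard one-step Q-learning update."""
    best_next = max(q[next_state].values())
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])

def opposite_update(q, state, action, reward, next_state,
                    alpha, gamma, step, max_steps, opposite_of):
    """Update the taken action, then apply a weighted update to its opposite.

    The opposite action is assumed here to receive the negated reward,
    scaled by the growing opposition weight.
    """
    q_update(q, state, action, reward, next_state, alpha, gamma)
    w = opposition_weight(step, max_steps)
    opp_action = opposite_of[action]
    best_next = max(q[next_state].values())
    q[state][opp_action] += w * alpha * (-reward + gamma * best_next
                                         - q[state][opp_action])

# Usage on a toy two-state, two-action table:
q = {0: {'left': 0.0, 'right': 0.0}, 1: {'left': 0.0, 'right': 0.0}}
opposite_of = {'left': 'right', 'right': 'left'}
opposite_update(q, 0, 'left', 1.0, 1, alpha=0.5, gamma=0.9,
                step=50, max_steps=100, opposite_of=opposite_of)
```

As the step count approaches `max_steps`, the weight approaches 1 and the opposite action's Q-value is updated nearly as strongly as the taken action's, which matches the abstract's claim that opposition has more effect later in learning.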

Original language: English (US)
Title of host publication: 2008 International Joint Conference on Neural Networks, IJCNN 2008
Pages: 2915-2921
Number of pages: 7
State: Published - 2008
Event: 2008 International Joint Conference on Neural Networks, IJCNN 2008 - Hong Kong, China
Duration: Jun 1 2008 – Jun 8 2008

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks

Conference

Conference: 2008 International Joint Conference on Neural Networks, IJCNN 2008
Country/Territory: China
City: Hong Kong
Period: 6/1/08 – 6/8/08

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence

