Tradeoff between exploration and exploitation of OQ(λ) with non-Markovian update in dynamic environments

Maryam Shokri, Hamid R. Tizhoosh, Mohamed S. Kamel

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper investigates the tradeoff between exploration and exploitation of opposition-based Q(λ) with non-Markovian update (NOQ(λ)) in a dynamic environment. In previous work, the authors applied NOQ(λ) to the deterministic GridWorld problem. In this paper, we implement the NOQ(λ) algorithm for a simple elevator control problem to test the behavior of the algorithm in a nondeterministic, dynamic environment. We also extend the NOQ(λ) algorithm by introducing an opposition weight to achieve a better tradeoff between exploration and exploitation for the NOQ(λ) technique. The value of the opposition weight increases as the number of steps increases; hence, it has a stronger positive effect on the Q-value updates for opposite actions as learning progresses. The performance of the NOQ(λ) method is compared with the Q(λ) technique. The experiments indicate that NOQ(λ) performs better than Q(λ).
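The abstract describes an opposition weight that grows with the step count, so that updates to opposite actions gain influence as learning progresses. A minimal sketch of such an update rule is given below; all names (`opposition_weight`, `opposite_update`, the `opposite_of` mapping, and the use of a negated reward for the opposite action) are illustrative assumptions, not the authors' actual implementation.

```python
# Hedged sketch: a Q-learning update augmented with an opposition-weighted
# update for the opposite action, loosely following the abstract's idea
# that the opposition weight increases with the number of steps.

def opposition_weight(step, max_steps):
    """Weight in [0, 1] that increases with the number of steps taken."""
    return min(1.0, step / max_steps)

def q_update(q, state, action, reward, next_state, alpha, gamma):
    """Standard one-step Q-learning update."""
    best_next = max(q[next_state].values())
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])

def opposite_update(q, state, action, reward, next_state,
                    alpha, gamma, step, max_steps, opposite_of):
    """Update the taken action, then apply a weighted update to its opposite.

    The opposite action is assumed here to receive the negated reward,
    scaled by the growing opposition weight.
    """
    q_update(q, state, action, reward, next_state, alpha, gamma)
    w = opposition_weight(step, max_steps)
    opp_action = opposite_of[action]
    best_next = max(q[next_state].values())
    q[state][opp_action] += w * alpha * (-reward + gamma * best_next
                                         - q[state][opp_action])

# Usage on a toy two-state, two-action table:
q = {0: {'left': 0.0, 'right': 0.0}, 1: {'left': 0.0, 'right': 0.0}}
opposite_of = {'left': 'right', 'right': 'left'}
opposite_update(q, 0, 'left', 1.0, 1, alpha=0.5, gamma=0.9,
                step=50, max_steps=100, opposite_of=opposite_of)
```

As the step count approaches `max_steps`, the weight approaches 1 and the opposite action's Q-value is updated nearly as strongly as the taken action's, which matches the abstract's claim that opposition has more effect later in learning.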

Original language: English (US)
Title of host publication: 2008 International Joint Conference on Neural Networks, IJCNN 2008
Pages: 2915-2921
Number of pages: 7
State: Published - 2008
Event: 2008 International Joint Conference on Neural Networks, IJCNN 2008 - Hong Kong, China
Duration: Jun 1 2008 – Jun 8 2008

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks

Conference

Conference: 2008 International Joint Conference on Neural Networks, IJCNN 2008
Country/Territory: China
City: Hong Kong
Period: 6/1/08 – 6/8/08

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence

