- Abstract:
- This paper sets out the concept of consistent exploration of observation-action pairs. We present a new temporal difference algorithm, CEQ(lambda), based on this concept, and demonstrate, using a randomly generated set of partially observable Markov decision processes (POMDPs), that it outperforms SARSA(lambda). This result should generalise to any POMDP where satisficing policies that map observations to actions exist. We also set out reasons for preferring CEQ(lambda) over an alternative Monte-Carlo style algorithm, MCESP, when working in the robotics domain.
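For context, the SARSA(lambda) baseline the abstract compares against can be sketched as a tabular temporal-difference update with eligibility traces, applied to observation-action pairs (in a POMDP the learner sees observations, not underlying states). The toy corridor environment, parameter values, and function names below are illustrative assumptions, not taken from the paper.

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, obs, actions, epsilon, rng):
    """Behaviour policy over observations; ties broken at random."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(Q[(obs, a)] for a in actions)
    return rng.choice([a for a in actions if Q[(obs, a)] == best])

def sarsa_lambda(step, reset, actions, episodes=500, alpha=0.1,
                 gamma=0.95, lam=0.9, epsilon=0.1, seed=0):
    """Tabular SARSA(lambda) with accumulating eligibility traces,
    keyed on (observation, action) pairs."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        e = defaultdict(float)          # eligibility traces, reset per episode
        obs = reset()
        act = epsilon_greedy(Q, obs, actions, epsilon, rng)
        done = False
        while not done:
            obs2, reward, done = step(obs, act)
            act2 = (None if done
                    else epsilon_greedy(Q, obs2, actions, epsilon, rng))
            # TD error: bootstrap from the *next* chosen action (on-policy)
            target = reward + (0.0 if done else gamma * Q[(obs2, act2)])
            delta = target - Q[(obs, act)]
            e[(obs, act)] += 1.0        # accumulating trace
            for key in list(e):
                Q[key] += alpha * delta * e[key]
                e[key] *= gamma * lam   # decay all traces
            obs, act = obs2, act2
    return Q

# Hypothetical 4-cell corridor: observations 0..2, goal at 3,
# actions 0 = left, 1 = right, reward 1.0 on reaching the goal.
def reset():
    return 0

def step(obs, act):
    nxt = max(0, obs - 1) if act == 0 else obs + 1
    if nxt == 3:
        return nxt, 1.0, True
    return nxt, 0.0, False

Q = sarsa_lambda(step, reset, actions=[0, 1])
greedy_at_start = max([0, 1], key=lambda a: Q[(0, a)])
```

CEQ(lambda), as described in the paper, modifies how exploration interacts with updates like the one above so that exploratory actions are taken consistently for a given observation; the sketch here shows only the standard SARSA(lambda) machinery it builds on.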
- Link to paper: http://homepages.inf.ed.ac.uk/pcrook/publications/alag2007-paper.pdf
- BibTeX:

  @InProceedings{EDI-INF-RR-1259,
    author    = {Paul Crook and Gillian Hayes},
    title     = {Consistent exploration improves convergence of reinforcement learning on POMDPs},
    booktitle = {AAMAS 2007 Workshop on Adaptive and Learning Agents (ALAg-07)},
    year      = {2007},
    month     = {May},
    url       = {http://homepages.inf.ed.ac.uk/pcrook/publications/alag2007-paper.pdf},
  }