Informatics Report Series


Report   

EDI-INF-RR-1259


Title: Consistent exploration improves convergence of reinforcement learning on POMDPs
Authors: Paul Crook ; Gillian Hayes
Date: May 2007
Publication Title: AAMAS 2007 Workshop on Adaptive and Learning Agents (ALAg-07)
Publication Type: Conference Paper
Publication Status: Published
Abstract:
This paper sets out the concept of consistent exploration of observation-action pairs. We present a new temporal difference algorithm, CEQ(lambda), based on this concept and demonstrate, using a randomly generated set of partially observable Markov decision processes (POMDPs), that it outperforms SARSA(lambda). This result should generalise to any POMDP where satisficing policies that map observations to actions exist. We also set out reasons for preferring CEQ(lambda) over an alternative Monte-Carlo style algorithm, MCESP, when working in the robotics domain.
Bibtex format
@InProceedings{EDI-INF-RR-1259,
author = { Paul Crook and Gillian Hayes },
title = {Consistent exploration improves convergence of reinforcement learning on POMDPs},
booktitle = {AAMAS 2007 Workshop on Adaptive and Learning Agents (ALAg-07)},
year = 2007,
month = {May},
url = {http://homepages.inf.ed.ac.uk/pcrook/publications/alag2007-paper.pdf},
}



Unless explicitly stated otherwise, all material is copyright The University of Edinburgh