A new Q-learning algorithm based on the Metropolis criterion

Author:
Publication year: 2004
Language: English
Pages: 2140-2143
Journal/Series: IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)
Volume: 34
Issue: 5
Document type: Article
Publisher: Institute of Electrical and Electronics Engineers, Inc.

Abstract

The balance between exploration and exploitation is one of the key problems of action selection in Q-learning. Pure exploitation causes the agent to settle quickly on locally optimal policies, whereas excessive exploration degrades the performance of the Q-learning algorithm even though it may accelerate learning and help avoid locally optimal policies. In this paper, finding the optimal policy in Q-learning is described as a search for the optimal solution of a combinatorial optimization problem. The Metropolis criterion of the simulated annealing algorithm is introduced to balance exploration and exploitation in Q-learning, and a modified Q-learning algorithm based on this criterion, SA-Q-learning, is presented. Experiments show that SA-Q-learning converges more quickly than Q-learning or Boltzmann exploration, and that the search does not suffer from performance degradation due to excessive exploration.
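
The abstract only names the mechanism, so a short sketch may help. The Python fragment below illustrates Metropolis-criterion action selection inside an otherwise standard tabular Q-learning loop: a candidate action is drawn at random and accepted over the greedy action with probability exp((Q_r - Q_p)/t), and the temperature t is lowered on an assumed geometric schedule. The function names, the cooling rate lam, and the minimal env interface with reset()/step() are all illustrative assumptions, not taken from the paper.

import math
import random
from collections import defaultdict

def metropolis_action(Q, state, actions, temperature):
    # Metropolis criterion: draw a random candidate action and accept it
    # over the greedy action with probability exp((Q_r - Q_p) / t).
    a_greedy = max(actions, key=lambda a: Q[(state, a)])
    a_random = random.choice(actions)
    delta = Q[(state, a_random)] - Q[(state, a_greedy)]
    if delta >= 0 or random.random() < math.exp(delta / temperature):
        return a_random   # exploratory move accepted
    return a_greedy       # fall back to exploitation

def sa_q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.95,
                  t0=1.0, lam=0.99, t_min=1e-3):
    # Tabular Q-learning with Metropolis exploration and geometric cooling.
    # env is an assumed minimal interface: reset() -> state,
    # step(action) -> (next_state, reward, done).
    Q = defaultdict(float)            # Q[(state, action)], defaults to 0
    temperature = t0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = metropolis_action(Q, state, actions, temperature)
            next_state, reward, done = env.step(action)
            best_next = max(Q[(next_state, a)] for a in actions)
            # standard Q-learning update
            Q[(state, action)] += alpha * (reward + gamma * best_next
                                           - Q[(state, action)])
            state = next_state
        # cool down: high temperature means near-uniform exploration,
        # low temperature means near-greedy exploitation
        temperature = max(t_min, lam * temperature)
    return Q

Because the acceptance probability shrinks as the temperature falls, the rule explores heavily early in training and approaches greedy exploitation as learning progresses, which is the balance the abstract argues for.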

Keywords

  • Mathematics and Statistics
  • reinforcement learning
  • Q-learning
  • Metropolis criterion
  • exploitation
  • exploration

Other

  • Published: Yes
  • ISSN: 1083-4419
