A new Q-learning algorithm based on the Metropolis criterion

Authors

Summary, in English

The balance between exploration and exploitation is one of the key problems of action selection in Q-learning. Pure exploitation causes the agent to settle on locally optimal policies quickly, whereas excessive exploration degrades the performance of the Q-learning algorithm even though it may accelerate the learning process and help avoid locally optimal policies. In this paper, finding the optimal policy in Q-learning is described as a search for the optimal solution in combinatorial optimization. The Metropolis criterion of the simulated annealing algorithm is introduced to balance exploration and exploitation in Q-learning, and the modified Q-learning algorithm based on this criterion, SA-Q-learning, is presented. Experiments show that SA-Q-learning converges more quickly than Q-learning or Boltzmann exploration, and that its search does not suffer from performance degradation due to excessive exploration.
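
The acceptance rule at the heart of this approach can be illustrated with a short sketch. The snippet below is not the paper's pseudocode; it is a minimal tabular Python sketch assuming a hypothetical environment with `reset()` and `step()` methods and illustrative hyper-parameters (learning rate, discount factor, geometric cooling schedule). It shows how a Metropolis-style criterion can decide between a randomly proposed action and the current greedy action, with the acceptance probability controlled by a temperature that is annealed over time.

```python
import math
import random
from collections import defaultdict

def metropolis_action(Q, state, actions, temperature):
    """Metropolis-style action selection (sketch): propose a random action
    and accept it over the greedy one with probability
    exp((Q[s, a_proposal] - Q[s, a_greedy]) / temperature)."""
    greedy = max(actions, key=lambda a: Q[(state, a)])
    proposal = random.choice(actions)
    delta = Q[(state, proposal)] - Q[(state, greedy)]
    if delta >= 0 or random.random() < math.exp(delta / temperature):
        return proposal          # exploratory move accepted
    return greedy                # otherwise exploit the current estimate

def sa_q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.95,
                  t0=1.0, cooling=0.99):
    """Tabular Q-learning with Metropolis action selection and a
    geometric cooling schedule (all hyper-parameters are illustrative).
    `env` is assumed to expose reset() -> state and
    step(action) -> (next_state, reward, done)."""
    Q = defaultdict(float)
    t = t0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = metropolis_action(Q, state, actions, t)
            next_state, reward, done = env.step(action)
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next
                                           - Q[(state, action)])
            state = next_state
        t = max(t * cooling, 1e-3)   # anneal the temperature toward exploitation
    return Q
```

As the temperature decreases, the probability of accepting an action with a lower estimated value shrinks, so the agent moves gradually from exploration toward exploitation instead of committing to either extreme.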

Department(s)

  • Computer Science

Publishing year

2004

Language

English

Pages

2140-2143

Publication/Journal/Series

IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics

Volume

34

Issue

5

Document type

Journal article

Publisher

IEEE - Institute of Electrical and Electronics Engineers Inc.

Subject

  • Computer Science

Keywords

  • reinforcement learning
  • Q-learning
  • Metropolis criterion
  • exploitation
  • exploration

Status

Published

ISBN/ISSN/Other

  • ISSN: 1083-4419