MAIA: The role of innate behaviors when picking flowers in Minecraft with Q-learning

MAIA: Instinkters roll vid blomplockning i Minecraft med Q-inlärning


Summary, in English

Recent advances in reinforcement learning research have achieved human-level performance in playing video games (Mnih et al., 2015). This inspired me to understand the methods of reinforcement learning (RL) and to investigate whether those methods have any basis in neurobiology and animal learning theories. The current study shows how RL is based on theories of animal conditioning and that there is solid evidence for neurobiological correlates of RL algorithms, primarily in the basal ganglia complex. This motivated a simple perceptron-based model of the basal ganglia called Q-tron, which uses the Q-learning algorithm. Additionally, I wanted to explore the hypothesis that adding an innate behavior to a Q-learning agent would increase performance. Thus, four different agents were tasked with picking red flowers in the video game Minecraft, where performance was measured as the number of actions needed to pick a flower. A "pure" Q-learner called PQ used only the Q-tron model. MAIA (Minecraft Artificial Intelligence Agent) used the Q-tron model together with an innate behavior that caused it to try picking when it saw red. Two mechanisms of the innate behavior were tested, creating MAIA1 and MAIA2, respectively. The fourth agent, called random walker (RW), chose actions at random and served as a baseline performance measure. The results show that both MAIA versions perform better than PQ, and that MAIA1 performs comparably to RW. MAIA1 and MAIA2 also differ in performance, which I argue shows the importance of investigating the precise mechanisms underlying innate behaviors in animals in order to understand learning in general.
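To make the combination of Q-learning and an innate behavior concrete, the following is a minimal sketch, not the thesis implementation. It assumes one linear (perceptron-style) unit per action, epsilon-greedy action selection, and a single hypothetical innate mechanism in which seeing red forces a pick attempt. The class name `LinearQLearner`, the action set, the feature vectors, and all parameter values are illustrative assumptions; the abstract does not specify the Q-tron architecture or the two mechanisms behind MAIA1 and MAIA2.

```python
import random

# Hypothetical action set; the actual actions used in the thesis are not given here.
ACTIONS = ["forward", "turn_left", "turn_right", "pick"]


class LinearQLearner:
    """One linear unit per action, trained with the standard Q-learning update."""

    def __init__(self, n_features, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.alpha = alpha      # learning rate (assumed value)
        self.gamma = gamma      # discount factor (assumed value)
        self.epsilon = epsilon  # exploration rate (assumed value)
        self.rng = random.Random(seed)
        self.weights = {a: [0.0] * n_features for a in ACTIONS}

    def q(self, features, action):
        """Estimated value of taking `action` given the feature vector."""
        return sum(w * x for w, x in zip(self.weights[action], features))

    def choose(self, features, sees_red=False, innate=False):
        """Pick an action; the innate rule (an assumption) overrides learning."""
        if innate and sees_red:
            return "pick"
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q(features, a))

    def update(self, features, action, reward, next_features, done):
        """Apply one Q-learning step and return the prediction error."""
        if done:
            target = reward
        else:
            best_next = max(self.q(next_features, a) for a in ACTIONS)
            target = reward + self.gamma * best_next
        delta = target - self.q(features, action)  # reward prediction error
        self.weights[action] = [
            w + self.alpha * delta * x
            for w, x in zip(self.weights[action], features)
        ]
        return delta
```

In this sketch, an agent created with `innate=False` in every call to `choose` plays the role of a pure Q-learner, while passing `innate=True` adds the reflex. The prediction error returned by `update` corresponds to the quantity listed under the keyword "prediction error".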


  • Science General


  • reinforcement learning
  • q-learning
  • minecraft
  • innate behavior
  • artificial intelligence
  • conditioning
  • neuroscience
  • dopamine
  • prediction error
  • cognition
  • learning
  • neural networks
  • cognitive science


  • Christian Balkenius (professor)