AlphaZero to Alpha Hero : A pre-study on Additional Tree Sampling within Self-Play Reinforcement Learning
In self-play reinforcement learning an agent plays games against itself and with the help of hindsight and retrospection improves its policy over time. Using this premise, AlphaZero famously managed to become the strongest known Go, Shogi, and Chess entity by training a deep neural network from data...
Main Authors: | , |
---|---|
Format: | Others |
Language: | English |
Published: |
KTH, Skolan för elektroteknik och datavetenskap (EECS)
2019
|
Subjects: | |
Online Access: | http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-259200 |