On Strength Analyses of Computer Programs for Stochastic Games with Perfect Information

Bibliographic Details
Main Authors: Hsueh, Chu-Hsuan, 薛筑軒
Other Authors: Wu, I-Chen
Format: Others
Language: en_US
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/ku48z7
Description
Summary: Doctoral dissertation === National Chiao Tung University === Institute of Computer Science and Engineering === 107 === The field of computer games is important to research in artificial intelligence. Depending on the roles that elements of chance play, games can be classified as deterministic vs. stochastic and as perfect information vs. imperfect information. Since many real-world problems involve uncertainty, stochastic games and imperfect-information games are worth studying. This thesis targets stochastic games with perfect information, since games in this category are easier to model than imperfect-information games. It focuses mainly on two such games: Chinese dark chess (CDC) and a reduced, solved variant, 2×4 CDC.

The thesis first enhances a game-playing program for CDC based on Monte-Carlo tree search (MCTS) with several existing techniques that incorporate additional knowledge. The knowledge is manually designed and is combined through four techniques: early playout terminations, implicit minimax backups, quality-based rewards, and progressive bias. Combining all four yields a win rate of 84.75% (±1.90%) against the original program.

In addition, the thesis investigates three strength-analysis metrics on 2×4 CDC: win rates when playing against other players, prediction rates on expert actions, and mean squared errors against position values. Experiments show that win rates are indeed good indicators of program strength; the other two metrics are also good indicators, though not as good as win rates. Another analysis on 2×4 CDC applies the AlphaZero algorithm, a reinforcement learning algorithm that achieved superhuman levels of play in chess, shogi, and Go. Experiments show that the algorithm can learn the theoretical values and optimal plays even in stochastic games.

Finally, the thesis studies two more stochastic games with perfect information: EinStein Würfelt Nicht! (EWN) and 2048-like games.
Another reinforcement learning algorithm, temporal difference learning, is applied to EWN and to 2048-like games. For EWN, a program combining three techniques that exploit the learned knowledge (progressive bias, prior knowledge, and epsilon-greedy playouts) achieves a win rate of 62.25% (±2.12%) against the original program. For 2048-like games, a multistage variant of temporal difference learning improves the learned knowledge.
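The core of the temporal difference learning mentioned above is a one-step value update that moves the estimate of a state toward the observed reward plus the estimate of the successor state. In 2048-like games this update is usually applied to afterstate values approximated by n-tuple networks, and the multistage variant maintains separate value functions per game stage, but a tabular TD(0) sketch conveys the rule. The state labels, reward, and learning rate below are illustrative assumptions, not the thesis's setup.

```python
from collections import defaultdict

def td0_update(values, state, reward, next_state, alpha=0.1):
    """One TD(0) step: move V(state) toward reward + V(next_state).

    `values` maps hashable states to estimated values; a multistage
    variant would use a separate table per game stage.
    """
    target = reward + values[next_state]
    values[state] += alpha * (target - values[state])
    return values[state]

values = defaultdict(float)  # unseen states default to value 0.0
v = td0_update(values, state="s0", reward=4.0, next_state="s1")
# values["s0"] is now 0.1 * (4.0 + 0.0) = 0.4
```

Repeated over many self-play episodes, these small corrections propagate reward information backward through the state space, which is the learned knowledge the EWN and 2048 programs then exploit.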