Applying Multi-Agent Reinforcement Learning to an Adaptive Beer Game Inventory Model


Bibliographic Details
Main Authors: Ying-Li Kuo, 郭映里
Other Authors: none
Format: Others
Language: zh-TW
Published: 2007
Online Access: http://ndltd.ncl.edu.tw/handle/14641652759565558885
Description
Summary: Master's Thesis === National Kaohsiung First University of Science and Technology === Graduate Institute of Logistics Management === Academic Year 95 === Kimbrough, Wu & Zhong (2002) modeled a supply chain following the structure of the MIT beer game with artificial agents, to investigate whether those agents could learn the optimal lot-for-lot ordering strategy via a genetic algorithm (GA). They compared the results with those obtained from MBA and undergraduate students who played the MIT beer game, and found that their artificial agents outperformed the student subjects in making supply chain decisions. However, in their design the GA embedded in each agent was allowed to learn over the entire 35 time periods in order to find the optimal strategy, which undermines the basis of the comparison: human subjects can learn only from past experience, whereas the GA-based agents could in effect learn from future events. We therefore consider the agent model and the conclusions in Kimbrough et al. (2002) inappropriate for comparison with human subjects. Their work does show, however, that GA-based computer agents can learn from past experience to find the best supply chain strategy for past events, though not for future ones. Since reinforcement learning learns from past experience and stores that knowledge in neural networks in order to respond to future events, we believe this kind of learning imitates human learning behavior more closely. In this study, we embed reinforcement learning in artificial agents operating under the supply chain structure of the MIT beer game, to examine whether such agents can learn an appropriate supply chain strategy. Two different reinforcement learning structures are tested against three distinct demand patterns, with actions chosen by the Boltzmann selection algorithm. We do not compare our results with those of human subjects, since human experiments involve too much uncertainty; instead, we design a reasonably good base model to serve as the point of comparison.
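The Boltzmann selection algorithm named in the abstract is a standard softmax exploration rule: an agent picks an order quantity with probability proportional to exp(Q/T), where Q is the learned value of each action and T is a temperature controlling exploration. The thesis itself is not available here, so the sketch below is only a generic illustration of that rule; the function name, the action values, and the temperature are illustrative assumptions, not the author's implementation.

```python
import math
import random

def boltzmann_select(q_values, temperature=1.0):
    """Pick an action index with probability proportional to exp(Q / T).

    High temperature -> near-uniform exploration; low temperature ->
    near-greedy choice of the highest-valued action.
    """
    # Subtract the max Q for numerical stability before exponentiating.
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Sample one action from the resulting Boltzmann distribution.
    r = random.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off
```

At a low temperature the agent almost always picks the action with the highest estimated value, while raising the temperature lets it keep exploring alternative order quantities early in training.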
Results show that both reinforcement learning models outperform the base model and can also respond to complex demand scenarios; however, reinforcement learning still cannot match the optimal lot-for-lot policy. This is to be expected, since the lot-for-lot policy implicitly relies on information sharing among supply chain members, while reinforcement learning here performs only local, myopic learning. We can therefore conclude from our study that learning intelligence cannot outperform information visibility in supply chain decision making.
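The lot-for-lot benchmark the abstract refers to is the "pass orders along" rule: in the MIT beer game, each echelon simply orders exactly the quantity it was asked for that period. The sketch below is a minimal illustration of that policy (the function name is our own); it makes visible why the abstract ties lot-for-lot to information sharing, since the rule only works when each member sees and trusts its incoming demand signal rather than distorting it.

```python
def lot_for_lot_order(demand_received):
    """Lot-for-lot policy: order exactly the quantity demanded this period.

    Optimal for the MIT beer game structure, but it presumes the demand
    signal is passed undistorted down the chain -- i.e. information
    visibility, which a locally learning agent does not have.
    """
    return demand_received

# Under lot-for-lot, the order stream simply mirrors the demand stream,
# so no bullwhip amplification is introduced at this echelon.
demands = [4, 4, 8, 8, 4]
orders = [lot_for_lot_order(d) for d in demands]
# orders == demands
```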