DESEM: Depthwise Separable Convolution-Based Multimodal Deep Learning for In-Game Action Anticipation

In real-time strategy (RTS) games, to defeat their opponents, players need to choose and implement the correct sequential actions. Because RTS games like StarCraft II are real-time, players have a very limited time to choose how to develop their strategy. In addition, players can only partially obse...

Bibliographic Details
Main Authors: Bae, J. (Author), Baek, I. (Author), Jeong, J. (Author), Kim, C. (Author), Kim, S.B. (Author), Lee, Y.J. (Author), Park, K. (Author), Shim, S.H. (Author)
Format: Article
Language: English
Published: Institute of Electrical and Electronics Engineers Inc. 2023
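Subjects: action anticipation; Artificial intelligence; Convolutional neural networks; Deep learning; depthwise separable convolution; Feature extraction; Forecasting; game artificial intelligence; Games; multimodal deep learning; Videos; weighted loss function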
Online Access: View Fulltext in Publisher
View in Scopus
LEADER 02774nam a2200385Ia 4500
001 10.1109-ACCESS.2023.3271282
008 230529s2023 CNT 000 0 und d
020 |a 21693536 (ISSN) 
245 1 0 |a DESEM: Depthwise Separable Convolution-Based Multimodal Deep Learning for In-Game Action Anticipation 
260 0 |b Institute of Electrical and Electronics Engineers Inc.  |c 2023 
300 |a 1 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1109/ACCESS.2023.3271282 
856 |z View in Scopus  |u https://www.scopus.com/inward/record.uri?eid=2-s2.0-85159707862&doi=10.1109%2fACCESS.2023.3271282&partnerID=40&md5=c426add37de8b858196481e58dbd8762 
520 3 |a In real-time strategy (RTS) games, to defeat their opponents, players need to choose and implement the correct sequential actions. Because RTS games like StarCraft II are real-time, players have a very limited time to choose how to develop their strategy. In addition, players can only partially observe the parts of the map that they have explored. Therefore, unlike Chess or Go, players do not know what their opponents are doing. For these reasons, applying generally used artificial intelligence models to forecast sequential actions in RTS games is a challenge. To address this, we propose depthwise separable convolution-based multimodal deep learning (DESEM) for forecasting sequential actions in the game StarCraft II. DESEM performs multimodal learning using high-dimensional frames and action labels simultaneously as inputs. We use a depthwise separable convolution as the backbone network for extracting features from high-dimensional frames. In addition, we propose a weighted loss function to resolve class imbalances. We use 1,978 StarCraft II replays where the Terrans win in a Terran vs. Protoss game. The experimental results show that the proposed depthwise separable convolution is superior to the conventional convolution. Furthermore, we demonstrate that multimodal learning and the weighted loss function contribute significantly to improving forecasting performance.
650 0 4 |a action anticipation 
650 0 4 |a Artificial intelligence 
650 0 4 |a Convolutional neural networks 
650 0 4 |a Deep learning 
650 0 4 |a depthwise separable convolution 
650 0 4 |a Feature extraction 
650 0 4 |a Forecasting 
650 0 4 |a game artificial intelligence 
650 0 4 |a Games 
650 0 4 |a multimodal deep learning 
650 0 4 |a Videos 
650 0 4 |a weighted loss function 
700 1 0 |a Bae, J.  |e author 
700 1 0 |a Baek, I.  |e author 
700 1 0 |a Jeong, J.  |e author 
700 1 0 |a Kim, C.  |e author 
700 1 0 |a Kim, S.B.  |e author 
700 1 0 |a Lee, Y.J.  |e author 
700 1 0 |a Park, K.  |e author 
700 1 0 |a Shim, S.H.  |e author 
773 |t IEEE Access