| Abstract: | Synthetic data has emerged as a significant alternative to more costly and time-consuming data collection methods, particularly for training facial expression recognition (FER) and generation models. EmoStyle is a state-of-the-art method for editing facial expressions in the latent space of StyleGAN2 using a continuous valence–arousal (VA) representation of emotions. Although the model produces high-quality images and preserves identity well, its accuracy in reproducing facial expressions across the VA space has not yet been systematically examined. Prior work on expression manipulation has mainly evaluated perceptual quality, diversity, identity preservation, or classification accuracy; to the best of our knowledge, no study to date has systematically evaluated the accuracy of generated expressions across the VA space. To address this gap, the present study systematically evaluates EmoStyle's ability to generate facial expressions across the full VA space at four levels of emotional intensity. Among our findings is a consistent weakness in the VA direction range of 242–329°, where EmoStyle fails to produce distinct expressions. Building on these findings, we outline recommendations for improving the generation pipeline and release an open-source EmoStyle-based toolkit that integrates fixes to the original EmoStyle repository, an API wrapper, and our experiment scripts. Together, these contributions provide both new insights into the model's capabilities and practical resources for further research.
|