| Summary: | Rapid advances in deep forgery technology in recent years have produced highly deceptive face video content that poses significant security risks, making the detection of such fakes both urgent and challenging. To improve the accuracy of deepfake face detection models and strengthen their resistance to adversarial attacks, this manuscript introduces a method for detecting forged faces and defending against adversarial attacks based on multi-feature decision fusion. The approach allows rapid detection of fake faces while countering adversarial attacks. First, an improved IMTCCN network was employed to precisely extract facial features, complemented by a diffusion model for noise reduction and artifact removal. Next, the FG-TEFusionNet (Facial-geometry and Texture enhancement fusion-Net) model was developed for deepfake face detection and assessment. This model comprises two key modules: one that extracts temporal features between video frames and one that extracts spatial features within frames. In the temporal module, a facial geometry landmark calibration module based on the LRNet baseline framework ensured an accurate representation of facial geometry, and a SENet attention mechanism was integrated into the dual-stream RNN to enhance the model’s ability to extract inter-frame information and to derive preliminary assessment results from inter-frame relationships. In the spatial module, a Gram image texture feature module was designed and integrated into EfficientNet and the attention maps of WSDAN (Weakly Supervised Data Augmentation Network). This module extracts deep-level feature information from the texture structure of image frames, addressing the limitations of purely geometric features. The final decisions of the two modules were combined by a voting method to complete the deepfake face detection process. 
Finally, the model’s robustness was validated by generating adversarial samples with the I-FGSM algorithm, and its performance was further optimized through adversarial training. Extensive experiments on four subsets of FaceForensics++ and on the Celeb-DF dataset demonstrated the superior performance and effectiveness of the proposed method.
|
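
The summary mentions generating adversarial samples with I-FGSM (iterative fast gradient sign method). As a rough illustration of that step only, the sketch below applies I-FGSM to a toy logistic classifier in pure Python. The classifier, the function names, and the values of `eps`, `alpha`, and `steps` are all assumptions made for this example; they are not taken from the manuscript, whose detector is a deep network.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(x, w, b):
    """Probability assigned to class 1 by a toy logistic model (hypothetical stand-in)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def loss_grad_wrt_input(x, y, w, b):
    """Gradient of the binary cross-entropy loss with respect to the input x.

    For a logistic model, dL/dx = (p - y) * w, where p is the predicted probability.
    """
    p = score(x, w, b)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def i_fgsm(x, y, w, b, eps=0.1, alpha=0.02, steps=10):
    """Iterative FGSM: repeated small signed-gradient steps that increase the loss.

    After every step the sample is projected back into the L-infinity ball of
    radius eps around the original input and clipped to the valid range [0, 1].
    """
    x_adv = list(x)
    for _ in range(steps):
        grad = loss_grad_wrt_input(x_adv, y, w, b)
        # Step in the direction that increases the loss.
        x_adv = [xa + alpha * sign(g) for xa, g in zip(x_adv, grad)]
        # Keep the perturbation within eps of the original input.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
        # Keep values in the valid (normalized pixel) range.
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
    return x_adv

# Hypothetical input, label, and weights, chosen only for illustration.
x0, y0 = [0.5, 0.5], 1
w0, b0 = [2.0, -1.0], 0.0
x_adv = i_fgsm(x0, y0, w0, b0)
print(score(x0, w0, b0), score(x_adv, w0, b0))
```

In adversarial training, samples produced this way would be added to the training data with their true labels so that the model learns to classify them correctly. With a deep network, the input gradient would come from automatic differentiation rather than the closed form used here.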