Deep Learning Models for Deepfake Detection: A Comparative Analysis among CNN, GAN, and Transformer Architectures
DOI: https://doi.org/10.51473/rcmos.v1i1.2022.1866

Keywords: Deepfake; Deep Learning; CNN; GAN; Transformer; Generalization; Digital Forensics.

Abstract
The rapid advancement of synthetic media manipulation technologies, commonly known as deepfakes, poses an increasing threat to information trustworthiness and digital security. These forged videos, primarily created by Generative Adversarial Networks (GANs), make the distinction between real and fake content increasingly difficult, necessitating sophisticated Deep Learning-based countermeasures. This paper presents a rigorous comparative analysis of three fundamental architectures in Computer Vision for deepfake detection: Convolutional Neural Networks (CNNs), Generative Adversarial Networks (in their role as detectors or in hybrid models that exploit their signatures), and Transformers (particularly Vision Transformers - ViTs). The evaluation focuses on metrics critical for real-world application scenarios, including classification accuracy, processing time (inference latency), and the essential generalization capability to unseen forgery techniques and datasets (cross-dataset evaluation). The results from the bibliographic and theoretical analysis indicate that while CNNs (such as XceptionNet) maintain relevance due to their efficiency and ability to capture local artifacts, Transformer-based architectures demonstrate a superior capability to model global dependencies and, consequently, exhibit better generalization against the constantly evolving deepfake methodologies.
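To make the evaluation criteria named above concrete, the following Python sketch shows a generic loop that reports classification accuracy and mean inference latency for a binary real/fake detector; running it on a dataset produced by a forgery method unseen during training yields the cross-dataset generalization figure the abstract refers to. This snippet is not taken from the paper: the evaluate_detector helper, the suggested timm backbone names, and the loader are illustrative assumptions only.

```python
# Illustrative sketch (not the paper's code): evaluate any binary real/fake
# classifier on accuracy and mean per-batch inference latency.
import time

import torch
from torch import nn
from torch.utils.data import DataLoader


@torch.no_grad()
def evaluate_detector(model: nn.Module, loader: DataLoader, device: str = "cpu"):
    """Return (accuracy, mean latency per batch in milliseconds).

    Evaluating on a loader built from a dataset whose forgery method was not
    seen during training gives a cross-dataset generalization estimate.
    """
    model.eval().to(device)
    correct, total, latencies = 0, 0, []
    for images, labels in loader:  # labels: 0 = real, 1 = fake
        images, labels = images.to(device), labels.to(device)
        start = time.perf_counter()
        logits = model(images)  # expected shape: (batch, 2)
        latencies.append((time.perf_counter() - start) * 1000.0)
        correct += (logits.argmax(dim=1) == labels).sum().item()
        total += labels.numel()
    return correct / total, sum(latencies) / len(latencies)


# Hypothetical usage: compare a CNN and a ViT backbone (e.g., created with
# timm.create_model("xception", num_classes=2) and
# timm.create_model("vit_base_patch16_224", num_classes=2)), each scored on a
# held-out, cross-dataset loader.
# acc, ms = evaluate_detector(cnn_model, cross_dataset_loader)
```

The same loop applies unchanged to either architecture, which keeps the CNN-versus-Transformer comparison limited to the backbone rather than the measurement procedure.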
License
Copyright (c) 2022 Matheus de Oliveira Pereira Paula (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.