Name: Maowen Tang Master’s Thesis Defense, Wednesday, April 22, 2026 @ 11:00 am Central Time
Start: 2026-04-22T11:00:00-05:00
End: 2026-04-22T12:00:00-05:00

April 22 @ 11:00 am - 12:00 pm

COMMITTEE CHAIR: Dr. Yonghui Wang

TITLE: STRUCTURED REPRESENTATION LEARNING FOR GENERALIZABLE DEEPFAKE VIDEO DETECTION

ABSTRACT: Deepfake video detection has become an important problem in multimedia forensics as modern generative models produce increasingly realistic facial manipulations. Although many existing detectors achieve strong performance on the datasets on which they are trained, their performance often degrades substantially on unseen manipulation methods and evaluation conditions. A major reason for this limitation is the premature collapse of spatial structure: many vision-transformer-based detectors aggregate patch tokens into a single global representation, thereby suppressing the localized and temporally uneven forensic cues that characterize manipulated video. This thesis presents the Spatio-Temporal Slot Aggregation Network (ST-SAN), a video-level deepfake detection framework designed to preserve structured forensic evidence before final classification. ST-SAN extracts patch tokens from three intermediate layers of a frozen DINOv2 backbone and aligns them through a lightweight bottleneck projection. A K-slot soft aggregation module then forms multiple learned slot summaries for each frame, allowing the model to retain several localized views of manipulation evidence instead of collapsing all patch information into one vector. These slot features are further integrated through adaptive frame weighting and slot weighting so that frames and slot summaries with stronger forensic content contribute more to the final decision. Training is stabilized by structural regularization terms that encourage locality, orthogonality, diversity across slot summaries, and weak coverage. Experiments show that ST-SAN achieves 0.960 AUC on FaceForensics++ under in-domain evaluation. Under cross-domain evaluation, it achieves 0.917 AUC on Celeb-DF v2, 0.872 AUC on DeepFakeDetection, and 0.890 AUC on the DeepFake Detection Challenge Preview dataset, indicating competitive cross-domain performance on the reported benchmarks.
Ablation results show that parallel soft slot aggregation is an important architectural component, while adaptive weighting and structural regularization help stabilize the full model. These findings indicate that preserving multiple localized forensic summaries prior to video-level classification is a promising strategy for improving robustness under cross-domain evaluation in deepfake video detection.
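
As a rough illustration of the idea described in the abstract, the sketch below shows one way K-slot soft aggregation followed by softmax-based weighting could look. It is not the thesis implementation: the function names (`soft_slot_aggregate`, `weighted_pool`), the dot-product scoring, the softmax over patch tokens, and the toy values are all assumptions made for this example, and the frozen DINOv2 backbone, bottleneck projection, and regularization terms are omitted.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def soft_slot_aggregate(tokens, slot_queries):
    # Assumption: each slot scores every patch token with a dot product,
    # normalizes the scores over tokens, and returns the weighted mean.
    # The result is K summary vectors instead of one global vector.
    dim = len(tokens[0])
    summaries = []
    for query in slot_queries:
        weights = softmax([dot(query, t) for t in tokens])
        summaries.append(
            [sum(w * t[d] for w, t in zip(weights, tokens)) for d in range(dim)]
        )
    return summaries

def weighted_pool(vectors, scores):
    # Assumption: adaptive slot or frame weighting is modeled as a softmax
    # over per-vector scores; higher-scoring vectors contribute more.
    weights = softmax(scores)
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]

# Toy frame: four 2-D patch tokens and K = 2 hypothetical slot queries.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
slot_queries = [[5.0, -5.0], [-5.0, 5.0]]

slots = soft_slot_aggregate(tokens, slot_queries)  # two slot summaries
frame_feature = weighted_pool(slots, [0.0, 0.0])   # equal slot scores
```

In this toy setup each slot concentrates on a different subset of patches, so the two summaries differ even though they are computed from the same tokens. The same `weighted_pool` helper could be applied a second time across per-frame features to mimic frame weighting.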

Keywords: Deepfake detection, video forensics, representation learning, vision transformer, slot aggregation, adaptive weighting.

Room Location: S. R. Collins Building, Room 111L
