June 26, 2026

COMMITTEE CHAIR: Dr. Lijun Qian

TITLE: DESIGN AND EVALUATION OF DETECTORS FOR AI-GENERATED IMAGES

ABSTRACT: The rapid improvement of generative image models has made synthetic images increasingly difficult to distinguish from authentic photographs, creating new challenges for media authentication and forensic analysis. Detectors that perform well on a single benchmark routinely fail when the test images come from generators or sources that the training set did not cover. This thesis studied which detection approaches survive that distribution shift and tested three families of methods: pixel-space classifiers built on CLIP and Xception, diffusion-reconstruction analysis using a pretrained encoder–decoder pipeline, and latent-space classifiers trained directly on Stable Diffusion VAE encodings. The detectors were trained on a progressively expanded corpus assembled from FFHQ, DF40, the Tobecwb Stable Diffusion face dataset, OpenDiffusionAI human-image collections, and outputs from six diffusion generators: SDXL, Realistic Vision, Dreamlike Photo-real, Counterfeit, SDXL Turbo, and DALL·E 3. A separate source-diverse corpus combining Defactify, Midjourney-ImageNet, FFHQ, and the Tobecwb dataset was used for the reconstruction and latent-space studies, so that the diffusion-derived signals could be tested across a wider range of generators than the main training set covered. Each approach was submitted to the NIST GenAI Image Discriminator Challenge, where the test set is held out by a third party and the evaluation reports calibration, equal error rate, and constrained operating-point metrics in addition to AUC. The results show that CLIP-based pixel-space discriminators give the strongest standalone leaderboard performance in both the Round 1 and Round 3 submissions, indicating that contrastively pretrained representations transfer well to detection.
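The challenge metrics named above can be illustrated with a short sketch. The Python below is a hypothetical illustration, not the NIST challenge's scoring code: it assumes label 1 means synthetic and that each detector outputs a probability that an image is synthetic, and the function names auc, eer, and brier and the six example scores are invented for illustration. Constrained operating-point metrics are not shown.

```python
from typing import Sequence

# Assumed convention: label 1 = synthetic, 0 = real; scores are the
# detector's probability that an image is synthetic.


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC: probability that a random synthetic image outscores a random
    real one, counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def eer(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Equal error rate: try each observed score as the decision threshold and
    return the mean of FPR and FNR where the two are closest."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best_gap, best_rate = float("inf"), 1.0
    for t in sorted(set(scores)):
        # An image is called synthetic when its score is >= t.
        fpr = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t) / n_neg
        fnr = sum(1 for y, s in zip(labels, scores) if y == 1 and s < t) / n_pos
        if abs(fpr - fnr) < best_gap:
            best_gap, best_rate = abs(fpr - fnr), (fpr + fnr) / 2
    return best_rate


def brier(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Brier score: mean squared gap between predicted probability and label.
    Lower is better; confident mistakes are penalized most heavily."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


# Made-up example: three real images followed by three synthetic ones.
labels = [0, 0, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.6, 0.5, 0.8, 0.9]
print(f"AUC={auc(labels, scores):.4f} EER={eer(labels, scores):.4f} "
      f"Brier={brier(labels, scores):.4f}")
# prints AUC=0.8889 EER=0.3333 Brier=0.1183
```

AUC and EER depend only on how the scores rank images, while the Brier score also depends on the probability values themselves, which is why it is the metric that reflects calibration.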
Diffusion-reconstruction error does not separate real from synthetic images in practice: the LPIPS and PSNR distributions overlap, leaving no usable decision threshold, which argues against one of the more elegant hypotheses in the literature, namely that a generative model can act as its own discriminator. The latent-space classifier underperforms on its own, but when fused with CLIP and Xception under weighted averaging it is the component that lifts a mediocre ensemble to the strongest one. The strongest submission, a weighted ensemble of CLIP, Xception, and the latent classifier, achieves the lowest Brier score in the experimental program and the smallest gap between in-distribution and blind-evaluation performance. The same three components combined by majority voting fall one AUC bucket on the leaderboard and produce one of the largest transfer gaps observed, which makes the choice of fusion rule a finding in its own right rather than an implementation detail. Together, the results provide practical guidance for building image-discriminator systems that transfer across evolving generative models, and they offer a reproducible methodology for evaluating such systems under conditions that resemble real deployment.
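The contrast between the two fusion rules can be sketched in a few lines. This is a hypothetical illustration, not the thesis's ensemble code: the detector scores, labels, and weights (0.5, 0.3, 0.2) are made up, and weighted_average and majority_vote are assumed names. It shows one plausible mechanism by which majority voting can hurt a probability-based metric such as the Brier score: thresholding each detector before voting discards its confidence, so every fused error costs the maximum penalty.

```python
from typing import Sequence


def weighted_average(probs: Sequence[Sequence[float]], weights: Sequence[float]) -> list[float]:
    """Fuse detectors by a normalized weighted mean of their probabilities.
    probs[k][i] is detector k's probability that image i is synthetic."""
    total = sum(weights)
    return [sum(w * p[i] for w, p in zip(weights, probs)) / total
            for i in range(len(probs[0]))]


def majority_vote(probs: Sequence[Sequence[float]], threshold: float = 0.5) -> list[float]:
    """Threshold each detector, then output 1.0 when more than half vote
    'synthetic', else 0.0. Per-detector confidence is discarded."""
    k = len(probs)
    return [1.0 if 2 * sum(p[i] >= threshold for p in probs) > k else 0.0
            for i in range(len(probs[0]))]


def brier(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared error between fused probability and 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


# Hypothetical scores for two real images followed by two synthetic ones.
labels = [0, 0, 1, 1]
clip = [0.2, 0.55, 0.9, 0.45]
xcep = [0.3, 0.6, 0.7, 0.4]
latent = [0.1, 0.3, 0.6, 0.8]
weights = [0.5, 0.3, 0.2]  # illustrative, not tuned

detectors = [clip, xcep, latent]
print(f"weighted Brier={brier(labels, weighted_average(detectors, weights)):.4f}")
print(f"majority Brier={brier(labels, majority_vote(detectors)):.4f}")
# prints weighted Brier=0.1507, then majority Brier=0.5000
```

In this toy case the vote outputs 1.0 for one real image and 0.0 for one synthetic image, each costing the maximum Brier penalty of 1, while the weighted mean keeps those borderline cases near 0.5 and still lets the latent classifier's confident score on the last image pull the fused output toward the correct side.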
Keywords: Deep learning, computer vision, AI-generated images, deepfake, classifier
Room Location: Electrical Engineering Conference Room 315D
