
Generative Adversarial Networks (GANs)

Notes on GANs, their minimax formulation, training instability, architectural evolution, and why diffusion models overtook them.

start: 2026.04.16, 17:39 · end: 2026.04.16, 17:39
status: In Progress

Status Indicator

The status indicator reflects the current state of the work:

- Abandoned: Work that has been discontinued
- Notes: Initial collections of thoughts and references
- Draft: Early structured version with a central thesis
- In Progress: Well-developed work actively being refined
- Finished: Completed work with no planned major changes

This helps readers understand the maturity and completeness of the content.

· certainty: likely

Confidence Rating

The confidence tag expresses how well-supported the content is, or how likely its overall ideas are right. This uses a scale from "impossible" to "certain", based on the Kesselman List of Estimative Words:

1. "certain"
2. "highly likely"
3. "likely"
4. "possible"
5. "unlikely"
6. "highly unlikely"
7. "remote"
8. "impossible"

Even ideas that seem unlikely may be worth exploring if their potential impact is significant enough.

· importance: 6/10

Importance Rating

The importance rating distinguishes between trivial topics and those which might change your life. Using a scale from 0-10, content is ranked based on its potential impact on:

- the reader
- the intended audience
- the world at large

For example, topics about fundamental research or transformative technologies would rank 9-10, while personal reflections or minor experiments might rank 0-1.


GANs are a class of generative model. Ian Goodfellow and colleagues introduced them in 2014 with a simple idea: train two neural networks in opposition. A generator synthesizes data (images, audio, text) from noise. A discriminator learns to tell real samples from synthetic ones. Each network improves by exploiting the other's weaknesses. The generator wins when the discriminator can no longer tell real from fake. Goodfellow formalized this as a minimax game, with the Nash equilibrium, where the generator perfectly models the data distribution, as the theoretical optimum.
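Written out, the value function from the original paper is:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\!\left[\log\bigl(1 - D(G(z))\bigr)\right]
```

The discriminator $D$ maximizes $V$ (correctly classifying real and fake samples); the generator $G$ minimizes it. At the theoretical optimum, $p_G = p_{\mathrm{data}}$ and $D$ outputs $1/2$ everywhere.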

The intellectual genealogy is worth marking. GANs drew on adversarial examples (Szegedy et al., 2013) and game-theoretic learning, with a nod to the broader tradition of latent variable models including VAEs (Kingma & Welling, 2013, which preceded GANs by only a few months). What's genuinely new about GANs is that they replace an explicit likelihood function with an implicit one. You train a model without ever writing down $p(x)$. That made them far more flexible for high-dimensional perceptual data, and far harder to train.

The training dynamics are the central theoretical problem. The minimax objective is non-convex. Gradients can vanish when the discriminator dominates, and the generator can collapse to producing a narrow subset of outputs (mode collapse). The original formulation had no guarantee of convergence. Much of the subsequent literature, WGAN (Arjovsky et al., 2017), spectral normalization, progressive growing, etc., is best read as attempts to stabilize what is fundamentally a fragile adversarial equilibrium.
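The vanishing-gradient failure mode can be shown with arithmetic alone. The sketch below (pure Python; the function names are my own) differentiates the original saturating generator loss $\log(1 - D(G(z)))$ and the non-saturating alternative $-\log D(G(z))$ with respect to the discriminator's logit, at a point where the discriminator confidently rejects a fake:

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# s is the discriminator's logit on a generated sample, D(G(z)) = sigmoid(s).

def grad_saturating(s):
    # d/ds log(1 - sigmoid(s)) = -sigmoid(s)
    return -sigmoid(s)

def grad_nonsaturating(s):
    # d/ds -log(sigmoid(s)) = sigmoid(s) - 1
    return sigmoid(s) - 1.0

# Early in training the discriminator easily rejects fakes: s is very negative.
s = -8.0
print(grad_saturating(s))     # ~ -0.000335: almost no learning signal
print(grad_nonsaturating(s))  # ~ -0.9997: a strong gradient survives
```

This is exactly the "log trick" recommended in the 2014 paper: same fixed point, very different gradients where the generator is losing badly.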

Architecturally the field moved fast. Fully connected networks gave way to convolutional ones (DCGAN, 2015), then conditional generation (cGAN, Pix2Pix, CycleGAN), then high-resolution synthesis (ProGAN, StyleGAN, BigGAN), and eventually hybrid approaches incorporating attention and transformers (VQGAN, GigaGAN). The image quality gains between 2016 and 2021 were among the most dramatic in deep learning's history. Early GAN faces are rough artifacts. StyleGAN2 faces are indistinguishable from photographs to the untrained eye.

A few deep tensions run through all of this. Evaluation is unsolved: metrics like FID (Fréchet Inception Distance) and Inception Score are imperfect proxies for the three things you actually want a generative model to do well, namely perceptual quality, diversity, and generalization rather than memorization of the training set. The Nash equilibrium framing may also be misleading in practice. The game is non-convex, and what training actually achieves may have little formal relationship to the minimax optimum. GANs have also largely ceded ground to diffusion models (DDPM, 2020; Stable Diffusion, 2022) on image synthesis benchmarks, a development that raises genuinely interesting questions about why an implicit likelihood model lost to an explicit one.
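FID itself is simple once the feature statistics are in hand: it is the Fréchet distance between Gaussian fits to real and generated features, $\|\mu_1 - \mu_2\|^2 + \mathrm{Tr}(\Sigma_1 + \Sigma_2 - 2(\Sigma_1\Sigma_2)^{1/2})$. In one dimension this collapses to a closed form, which the toy helper below (my own illustration, not a benchmark implementation, which would use Inception features and full covariances) computes:

```python
import statistics

def frechet_1d(xs, ys):
    """Frechet distance between Gaussian fits to two 1-D samples.

    Scalar special case of FID: (mu1 - mu2)^2 + (sigma1 - sigma2)^2.
    """
    m1, s1 = statistics.fmean(xs), statistics.pstdev(xs)
    m2, s2 = statistics.fmean(ys), statistics.pstdev(ys)
    return (m1 - m2) ** 2 + (s1 - s2) ** 2

real = [0.0, 1.0, 2.0, 3.0]
fake_good = [0.0, 1.0, 2.0, 3.0]     # identical statistics -> distance 0
fake_bad = [10.0, 10.0, 10.0, 10.0]  # mode-collapsed: zero variance, wrong mean

print(frechet_1d(real, fake_good))   # 0.0
print(frechet_1d(real, fake_bad))    # large: penalizes both mean shift and lost diversity
```

Note what the variance term does and does not capture: a mode-collapsed generator is penalized for lost diversity, but a generator that memorizes the training set matches the statistics perfectly, which is one reason FID is an imperfect proxy.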

Adjacent intellectual territory worth engaging with: game theory (Nash equilibria, mechanism design), information geometry (the GAN objective can be reframed as minimizing certain divergences between distributions), optimal transport (WGAN's Wasserstein distance formulation), and the ethics of synthetic media. Deepfakes came directly out of GAN capabilities and remain a live policy problem.
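The optimal transport connection is concrete enough to compute. In one dimension the Wasserstein-1 distance between equal-sized empirical samples has a closed form: match sorted order statistics and average the gaps. A toy helper (my own, for intuition only):

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-sized 1-D empirical samples.

    In 1-D the optimal transport plan matches sorted order statistics,
    so W1 is the mean absolute difference after sorting.
    """
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

p = [0.0, 1.0, 2.0]
q = [5.0, 6.0, 7.0]          # same shape, shifted by 5
print(wasserstein_1d(p, q))  # 5.0
```

The WGAN argument falls out of this: for two distributions with disjoint supports, the Jensen-Shannon divergence is saturated at a constant, while W1 still varies smoothly with the shift, so it provides a usable training gradient.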

See Also

Notes

To Read

Primary Sources

Papers

Tutorials & Surveys

