AlphaFold 1: The Distogram Revolution at CASP13

Series: protein-structure-prediction

Author: Ersi Ni  |  Date: Feb 15, 2019  |  Log Entry: PhD Journal #5  |  Focus: DeepMind AlphaFold 1 Architecture Analysis

The rumors were not only true—they were understated. I am back from the CASP13 conference in Mexico, and the computational biology community is reeling. DeepMind's team A7D has completely dominated the Free Modeling category, scoring a median GDT_TS of 58.9, while the runner-up computational groups hovered in the high 40s.

This represents the first major rupture in the historical 30-40 GDT plateau, and it was achieved using deep neural networks. Today, I want to unpack the technical details of DeepMind's first major entry—AlphaFold 1—explaining the distogram revolution and the hybrid architecture of neural prediction and physical optimization.

Bypassing the Binary: The Distogram Breakthrough

For years, deep learning groups in our field (like RaptorX-Contact) treated structural prediction as a binary image classification task. They trained networks to output a contact map: a 2D grid of 1s and 0s indicating whether residue i and residue j were closer than 8Å.

But a binary threshold is a blunt biophysical instrument. A distance of 8.1Å is chemically very similar to 7.9Å, yet a binary map treats them as completely opposite. Furthermore, it throws away all precise geometric shape information.
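To make the point concrete, here is a minimal sketch in plain Python. The `to_contact` helper and the three sample distances are my own illustration, not taken from any published pipeline; only the 8Å cutoff comes from the contact-map convention described above.

```python
# Contact-map convention: a residue pair is "in contact" below 8 Å.
CONTACT_THRESHOLD = 8.0  # Å

def to_contact(distance):
    """Collapse a real-valued distance (Å) into a binary contact label."""
    return 1 if distance < CONTACT_THRESHOLD else 0

# Hypothetical distances: two that straddle the cutoff, one far apart.
distances = [7.9, 8.1, 21.0]
contacts = [to_contact(d) for d in distances]

print(contacts)  # [1, 0, 0]
```

The pair at 8.1Å ends up with the same label as the pair at 21Å, while its near-twin at 7.9Å gets the opposite one. That is exactly the information a binary map discards.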

AlphaFold 1’s first major breakthrough was replacing the binary contact map with a Distogram (a distance histogram):

  1. Instead of a single binary classifier, the model outputs a probability distribution over 64 discrete, equal-width distance bins spanning 2Å to 22Å (each about 0.31Å wide: 2.0-2.3Å, 2.3-2.6Å, etc.).
  2. The network is a massive 2D Residual Network (ResNet) with 220 blocks that processes sequence-level features and co-evolutionary matrices to output this (L, L, 64) distogram tensor.
  3. In parallel, the model predicts probability distributions over the backbone dihedral torsion angles (φ, ψ), which dictate how the amino acid chain twists in 3D space.

By predicting a full probability distribution over distance for every residue pair, rather than a single yes/no contact, the model captures graded spatial information (including its own uncertainty) that reflects the shapes of helices, sheets, and loops.
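As a rough sketch of what one entry of the distogram looks like, the snippet below builds the 64 bins between 2Å and 22Å, turns a vector of logits into bin probabilities with a softmax, and reads off an expected distance. The helper names, the softmax readout, and the Gaussian-shaped logits peaked near 7.9Å are all assumptions made for illustration; they are not the network's actual output head.

```python
import math

NUM_BINS = 64
D_MIN, D_MAX = 2.0, 22.0                 # Å, range of the distogram
BIN_WIDTH = (D_MAX - D_MIN) / NUM_BINS   # 0.3125 Å per bin

def bin_centers():
    """Midpoint (Å) of each of the 64 distance bins."""
    return [D_MIN + (k + 0.5) * BIN_WIDTH for k in range(NUM_BINS)]

def distance_to_bin(d):
    """Index of the bin containing distance d, clamped to the valid range."""
    k = int((d - D_MIN) / BIN_WIDTH)
    return min(max(k, 0), NUM_BINS - 1)

def softmax(logits):
    """Numerically stable softmax over one residue pair's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_distance(probs):
    """Mean distance implied by a distribution over the bins."""
    return sum(p * c for p, c in zip(probs, bin_centers()))

# Hypothetical logits for one residue pair, peaked near 7.9 Å.
logits = [-((c - 7.9) ** 2) / (2 * 0.5 ** 2) for c in bin_centers()]
probs = softmax(logits)

print(distance_to_bin(7.9), round(expected_distance(probs), 2))
```

A full prediction is just this repeated for every (i, j) pair, giving the (L, L, 64) tensor. Unlike a contact bit, the distribution also says how sure the model is: a sharp peak versus a broad smear over many bins.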

The Two-Stage Folding Pipeline

While AlphaFold 1 represents a massive leap forward in deep learning, it is not yet a fully "end-to-end" model. It operates in two distinct stages:

Stage 1: Neural Prediction
Uses a 220-block 2D ResNet to process 1D sequence features and 2D co-evolution features derived from multiple sequence alignments, outputting an (L × L × 64) distogram and (L × 128) backbone dihedral torsion angle distributions.
Stage 2: L-BFGS Folding
Compiles the distance and angle probability outputs into a smooth, differentiable potential energy function, then applies the classical L-BFGS quasi-Newton minimization algorithm to iteratively drive a randomly initialized chain toward low-energy, stable 3D coordinates.
[Figure: AlphaFold 1 two-stage pipeline. Stage 1, neural predictor: sequence + MSA profiles → 220-block 2D ResNet → (L × L × 64) distogram energy potential. Stage 2, gradient optimization: smooth parameterized potential → L-BFGS minimization → physical 3D coordinates.]
Fig 5: The AlphaFold 1 hybrid prediction and folding scheme. A 2D ResNet predicts per-pair distance distributions (distograms), which are converted into a smooth potential and minimized with L-BFGS.

This hybrid approach is elegant. Deep learning handles the hard part of the search problem, predicting the shape of the rugged energy funnel, while classical gradient-based optimization does the work of actually placing the atoms.
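Here is a toy sketch of the second stage, under several loud assumptions of my own. Each pair's predicted distance distribution is approximated as a Gaussian with a hypothetical mean and spread, so its negative log-likelihood becomes a smooth quadratic in the distance. The optimizer moves Cartesian coordinates directly. And plain gradient descent with backtracking stands in for L-BFGS, purely to keep the example dependency-free. It is meant to show the idea of "distributions in, potential out, gradients down", not to reproduce the real pipeline.

```python
import math

# Hypothetical predicted mean distances (Å) for a 4-residue toy chain,
# taken from a square with 3.8 Å sides.
SIDE = 3.8
DIAG = SIDE * math.sqrt(2.0)
MU = {(0, 1): SIDE, (1, 2): SIDE, (2, 3): SIDE, (0, 3): SIDE,
      (0, 2): DIAG, (1, 3): DIAG}
SIGMA = 0.5  # assumed spread (Å) of every predicted distance distribution

def distance(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def potential(coords):
    """Gaussian negative log-likelihood of all pair distances (up to a constant)."""
    return sum((distance(coords[i], coords[j]) - mu) ** 2 / (2 * SIGMA ** 2)
               for (i, j), mu in MU.items())

def gradient(coords):
    """Analytic gradient of the potential with respect to every coordinate."""
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    for (i, j), mu in MU.items():
        d = distance(coords[i], coords[j])
        scale = (d - mu) / (SIGMA ** 2 * max(d, 1e-9))
        for k in range(3):
            g = scale * (coords[i][k] - coords[j][k])
            grad[i][k] += g
            grad[j][k] -= g
    return grad

def minimize(coords, steps=500, step_size=0.1):
    """Gradient descent with backtracking; a simple stand-in for L-BFGS."""
    energy = potential(coords)
    for _ in range(steps):
        grad = gradient(coords)
        t = step_size
        while t > 1e-12:
            trial = [[x - t * g for x, g in zip(c, gc)]
                     for c, gc in zip(coords, grad)]
            trial_energy = potential(trial)
            if trial_energy < energy:  # accept only steps that lower the energy
                coords, energy = trial, trial_energy
                break
            t *= 0.5
        else:
            break  # no improving step found: treat as converged
    return coords, energy

# Start from a nearly straight, compressed chain.
start = [[0.0, 0.0, 0.0], [1.0, 0.1, 0.2], [2.0, -0.1, 0.1], [3.0, 0.2, -0.2]]
initial_energy = potential(start)
folded, final_energy = minimize(start)
print(initial_energy, final_energy)
```

Everything the optimizer knows about the target shape comes from the predicted distances baked into `potential`, which is the division of labor described above. Swapping the hand-rolled loop for a real quasi-Newton routine would change how fast the minimum is reached, not what is being minimized.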

Reflections in the Lab

For those of us working on deep learning in structural biology, AlphaFold 1 is a watershed moment. It shows that our sequence alignments contain a great deal of untapped spatial information that deep networks are well suited to extract.

But it also highlights the current limits of the field. The L-BFGS physical folding stage is slow, and compiling 2D probability distributions into physical potentials introduces hand-crafted approximations. It feels like an intermediate step—a brilliant bridge between classical biophysics and pure deep learning.

A Bold Thought for the Next Milestone

What if we could bypass L-BFGS entirely? What if a neural network could map sequences directly to 3D spatial coordinates in a single, end-to-end differentiable step, where the structural gradients flow directly from the 3D shapes back to the weights? Some researchers are already starting to explore this (like Mohammed AlQuraishi's work on Recurrent Geometric Networks). The next two years are going to be wild.

Thank you for following this PhD journal series (2017 - 2020). Compilation complete.

