AlphaFold 1: The Distogram Revolution at CASP13
Feb 15, 2019 · Series: protein-structure-prediction
The rumors were not only true—they were understated. I am back from the CASP13 conference in Mexico, and the computational biology community is reeling. DeepMind's team A7D has completely dominated the Free Modeling category, scoring a median GDT_TS of 58.9, while the runner-up computational groups hovered in the high 40s.
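For readers new to the metric: GDT_TS averages, over the distance cutoffs 1, 2, 4 and 8Å, the percentage of residues whose Cα atoms lie within that cutoff of their position in the experimental structure. The sketch below is a simplified version that assumes the model has already been superposed onto the reference and takes precomputed per-residue deviations as input (the `gdt_ts` helper is hypothetical); the official LGA-based score instead searches over many superpositions to maximize each fraction.

```python
import numpy as np

def gdt_ts(deviations):
    """Simplified GDT_TS from per-residue C-alpha deviations in Angstroms.

    Assumes a single, fixed superposition; the official score searches
    over many superpositions to maximize each fraction.
    """
    d = np.asarray(deviations, dtype=float)
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    # Fraction of residues within each cutoff, averaged and scaled to 0-100
    fractions = [np.mean(d <= c) for c in cutoffs]
    return 100.0 * float(np.mean(fractions))

print(gdt_ts([0.5, 1.5, 3.0, 10.0]))  # 56.25
```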
This is the first major break from the long-standing 30-40 GDT_TS plateau in free modeling, and it was achieved with deep neural networks. Today I want to unpack the technical details of DeepMind's first CASP entry, AlphaFold 1: the distogram revolution and the hybrid architecture that pairs neural prediction with physical optimization.
Bypassing the Binary: The Distogram Breakthrough
For years, deep learning groups in our field (such as RaptorX-Contact) framed structure prediction as a binary image classification task. They trained networks to output a contact map: an L × L grid of 1s and 0s (probabilities, at prediction time) indicating whether residues i and j are closer than 8Å, usually measured between Cβ atoms.
But a binary threshold is a blunt biophysical instrument. A pair at 8.1Å is physically almost indistinguishable from one at 7.9Å, yet the contact map assigns them opposite labels. It also discards the actual distance: a pair at 3.8Å and a pair at 7.9Å receive the same label, and all non-contacts look alike whether they sit 9Å or 30Å apart.
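A minimal sketch of how a contact map is derived from coordinates makes the thresholding problem concrete. The `contact_map` helper and the toy coordinates are hypothetical; real pipelines typically compute Cβ-Cβ distances from a solved structure.

```python
import numpy as np

def contact_map(coords, threshold=8.0):
    """Binary contact map and distance matrix from (L, 3) coordinates in Angstroms."""
    x = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances, shape (L, L)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return (dist < threshold).astype(np.int8), dist

# Hypothetical toy residues: residue 0 is 7.9 A from residue 1 and 8.1 A from residue 2
coords = [[0.0, 0.0, 0.0], [7.9, 0.0, 0.0], [-8.1, 0.0, 0.0]]
contacts, dist = contact_map(coords)
print(contacts[0, 1:])  # [1 0]: nearly identical distances, opposite labels
```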
AlphaFold 1’s first major breakthrough was replacing the binary contact map with a Distogram (a distance histogram):
- Instead of a single binary classifier, the model outputs, for every residue pair, a probability distribution over 64 discrete distance bins (between Cβ atoms) spanning 2Å to 22Å, each 0.3125Å wide (2.0-2.3125Å, 2.3125-2.625Å, and so on).
- The network is a deep 2D Residual Network (ResNet) with 220 blocks that processes sequence-level features and co-evolutionary matrices to output this (L, L, 64) distogram tensor, where L is the sequence length.
- Simultaneously, the model predicts probability distributions over the backbone torsion angles (φ, ψ), which dictate how the amino acid chain twists in 3D space.
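The binning can be sketched in a few lines of NumPy. The bin layout (64 equal-width bins over 2-22Å) follows the description above; the helper names and the choice to clip out-of-range distances into the first and last bins are assumptions for illustration, not AlphaFold's code.

```python
import numpy as np

N_BINS, D_MIN, D_MAX = 64, 2.0, 22.0
BIN_WIDTH = (D_MAX - D_MIN) / N_BINS  # 0.3125 A

def distance_to_bin(dist):
    """Map distances in Angstroms to bin indices 0..63.

    Out-of-range distances are clipped into the edge bins (an assumption).
    """
    idx = np.floor((np.asarray(dist, dtype=float) - D_MIN) / BIN_WIDTH).astype(int)
    return np.clip(idx, 0, N_BINS - 1)

def one_hot_distogram(dist):
    """Turn an (L, L) distance matrix into an (L, L, 64) one-hot training target."""
    return np.eye(N_BINS)[distance_to_bin(dist)]

dist = np.array([[0.0, 7.9], [7.9, 0.0]])
print(one_hot_distogram(dist).shape)  # (2, 2, 64)
print(distance_to_bin([7.9, 8.1]))    # [18 19]
```

Under this scheme 7.9Å and 8.1Å land in neighboring bins rather than in opposite classes, and the target still records roughly how far apart each pair actually is.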
By predicting a full probability distribution for every residue pair, the model captures not only the most likely distance but also how confident it is, and these graded distance estimates encode the geometry of helices, sheets, and loops far more faithfully than a binary contact map can.
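One way to read such a histogram, sketched below, is to summarize each pair's distribution by its mean and standard deviation, using bin midpoints as representative distances. The `summarize` helper and the toy probabilities are hypothetical; AlphaFold 1 itself does not collapse the distogram this way but turns the full distribution into a potential.

```python
import numpy as np

N_BINS, D_MIN, D_MAX = 64, 2.0, 22.0
# Representative distance for each bin: its midpoint
CENTERS = D_MIN + (np.arange(N_BINS) + 0.5) * (D_MAX - D_MIN) / N_BINS

def summarize(probs):
    """Mean and standard deviation of distance per pair from (L, L, 64) bin probabilities."""
    mean = (probs * CENTERS).sum(axis=-1)
    var = (probs * (CENTERS - mean[..., None]) ** 2).sum(axis=-1)
    return mean, np.sqrt(var)

# Hypothetical single pair with its mass split evenly between bins 18 and 19
probs = np.zeros((1, 1, N_BINS))
probs[0, 0, [18, 19]] = 0.5
mean, std = summarize(probs)
print(mean[0, 0], std[0, 0])  # 7.9375 0.15625
```

A sharp, confident prediction yields a small standard deviation, while a pair the network is unsure about spreads its mass and the deviation grows; that uncertainty is exactly what a binary contact map cannot express.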
The Two-Stage Folding Pipeline
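While AlphaFold 1 represents a massive leap forward in deep learning, it is not yet a fully "end-to-end" model. It operates in two distinct stages:
- Prediction: the neural network reads features derived from the sequence and its multiple sequence alignment and outputs the distogram and the torsion-angle distributions for the whole protein.
- Optimization: these distributions are converted into a smooth, protein-specific potential (the negative log-likelihood of a candidate structure's distances and torsions, plus a steric term that penalizes clashes), which is then minimized by gradient descent with L-BFGS over the backbone torsion angles to produce the final 3D coordinates.
This hybrid approach is remarkably elegant. Deep learning handles the hard biophysical part, predicting the shape of the energy landscape so that the rugged folding funnel becomes a smooth, protein-specific potential, and classical gradient-based optimization then descends that potential to actually place the atoms.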
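To make the second stage concrete, here is a deliberately simplified sketch: it turns each pair's distogram into a smooth potential by cubic-spline interpolating -log p over the bin centers, sums these over all pairs, and minimizes the total with SciPy's L-BFGS-B. Several pieces are assumptions or simplifications: the distogram is synthetic (a Gaussian bump around each true distance stands in for network output), all helper names are hypothetical, the starting point is a perturbed copy of the true structure, and the optimizer moves Cα coordinates directly, whereas AlphaFold 1 optimized backbone torsion angles and added torsion and steric terms.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.optimize import minimize

N_BINS, D_MIN, D_MAX = 64, 2.0, 22.0
CENTERS = D_MIN + (np.arange(N_BINS) + 0.5) * (D_MAX - D_MIN) / N_BINS

def pairwise(x):
    """(L, L) Euclidean distance matrix for (L, 3) coordinates."""
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)

def gaussian_distogram(coords, sigma=1.0):
    """Synthetic stand-in for network output: a Gaussian bump around each true distance."""
    logits = -((CENTERS - pairwise(coords)[..., None]) ** 2) / (2 * sigma**2)
    p = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return p / p.sum(axis=-1, keepdims=True)

def build_potentials(probs, eps=1e-6):
    """One smooth -log p(d) cubic spline per residue pair i < j."""
    L = probs.shape[0]
    pairs = [(i, j) for i in range(L) for j in range(i + 1, L)]
    splines = [CubicSpline(CENTERS, -np.log(probs[i, j] + eps)) for i, j in pairs]
    return pairs, splines

def energy(flat, pairs, splines):
    """Total negative log-likelihood of the current distances under the distogram."""
    x = flat.reshape(-1, 3)
    total = 0.0
    for (i, j), spline in zip(pairs, splines):
        d = np.linalg.norm(x[i] - x[j])
        # Clamp to the bin-center range so the spline is never extrapolated
        total += float(spline(np.clip(d, CENTERS[0], CENTERS[-1])))
    return total

# Hypothetical 8-residue helix-like C-alpha trace used as the "true" structure
n = np.arange(8)
theta = np.deg2rad(100.0) * n
true = np.stack([2.3 * np.cos(theta), 2.3 * np.sin(theta), 1.5 * n], axis=1)

pairs, splines = build_potentials(gaussian_distogram(true))
rng = np.random.default_rng(0)
start = true + rng.normal(scale=0.5, size=true.shape)  # perturbed starting point
result = minimize(energy, start.ravel(), args=(pairs, splines), method="L-BFGS-B")

folded = result.x.reshape(-1, 3)
iu = np.triu_indices(len(true), k=1)
err = np.abs(pairwise(folded) - pairwise(true))[iu]
print(f"max distance error after L-BFGS: {err.max():.3f} A")
```

Because this toy potential depends only on inter-residue distances, it cannot tell a structure from its mirror image; optimizing in torsion space with fixed backbone geometry, as AlphaFold 1 did, helps break that symmetry.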
Reflections in the Lab
For those of us working on deep learning in structural biology, AlphaFold 1 is a watershed moment. It shows that our multiple sequence alignments contain far more spatial information than binary contact maps were extracting, and that deep networks can recover much of it as inter-residue distances.
But it also highlights the current limits of the field. The L-BFGS structure-optimization stage is slow, and converting 2D probability distributions into physical potentials requires hand-crafted approximations. It feels like an intermediate step: a brilliant bridge between classical biophysics and pure deep learning.
A Bold Thought for the Next Milestone
What if we could bypass L-BFGS entirely? What if a neural network could map sequences directly to 3D spatial coordinates in a single, end-to-end differentiable step, where the structural gradients flow directly from the 3D shapes back to the weights? Some researchers are already starting to explore this (like Mohammed AlQuraishi's work on Recurrent Geometric Networks). The next two years are going to be wild.
Thank you for following this PhD journal series (2017 - 2020).
This is a post in the protein-structure-prediction series.
Other posts in this series:
- Dec 08, 2020 - The CASP14 Watershed: AlphaFold 2 and the Dawn of End-to-End Attention
- Feb 15, 2019 - AlphaFold 1: The Distogram Revolution at CASP13
- Nov 10, 2018 - CASP: The Olympic Arena of Double-Blind Structural Biology
- Jul 15, 2018 - Evolution's Mathematical Whispers: Co-Evolution and the DCA Puzzle
- Feb 20, 2018 - From Pixels to Peptides: Predicting Secondary Structure with Bi-LSTMs and ResCNNs
- Oct 12, 2017 - Anfinsen's Dogma and Levinthal's Paradox: The Biophysical Riddle of Protein Folding