Anfinsen's Dogma and Levinthal's Paradox: The Biophysical Riddle of Protein Folding
Oct 12, 2017 | Series: protein-structure-prediction
Hello world. Last month, I officially unpacked my bags, registered my student ID card, and sat down at my desk to begin my PhD in computational structural biology. Coming straight out of a Computer Science background, the first few weeks have felt like being thrown headfirst into a swirling ocean of physical chemistry, thermodynamic formulas, and cellular biochemistry. My desk is currently buried under printouts of old biophysics papers.
But as I have spent the last few weeks wrestling with the basics, I’ve realized something incredibly exciting: the protein folding problem is fundamentally a computational search and optimization challenge disguised as biochemistry.
To kick off this blog series—which will document my research, my struggles, and the progress of the field—I want to lay down the absolute biophysical foundations. Before we talk about neural networks, convolutions, or sequence alignments, we have to understand the two central pillars of the field: Anfinsen’s Dogma and Levinthal’s Paradox.
The Master Key: Anfinsen’s Dogma
In the early 1960s, a biochemist named Christian Anfinsen conducted a series of elegant experiments at the National Institutes of Health that would earn him the 1972 Nobel Prize in Chemistry.
Anfinsen worked with a small, sturdy protein called Ribonuclease A (RNase A), an enzyme that degrades RNA. He placed RNase A in a solution of urea (which disrupts the weak non-covalent hydrogen bonds holding the protein’s shape together) and beta-mercaptoethanol (which breaks the strong covalent disulfide cross-links). This process is called denaturation or "unfolding." The once-tidy, beautifully folded, functional enzyme was reduced to a floppy, inactive, random-coil string of amino acids.
Here was the breakthrough question: If you remove these harsh chemicals, will the floppy string find its way back to its active shape?
As Anfinsen slowly dialled back the chemicals, allowing the protein to breathe in a physiological environment, it refolded perfectly on its own. It regained its exact original 3D shape and its biological activity.
Anfinsen's Dogma (Thermodynamic Hypothesis)
"For a small, globular protein in its native physiological environment, the three-dimensional active structure is completely determined by its primary amino acid sequence. Furthermore, this 'native state' corresponds to a unique, stable, and kinetically accessible global minimum of free energy."
To a computer scientist, this is a mind-blowing realization. It means that the 3D structure of a protein is pre-programmed directly into the 1D text string of its amino acids (e.g., M-K-V-L-L...). There is no external blueprint, no master builder, no assembly line. The laws of physics acting on the chemical groups of the amino acids are the sole orchestrators of the shape.
In algorithmic terms, if we can mathematically formulate the energy function of these atomic interactions, we should be able to write an optimization algorithm to "search" for the global minimum and predict the folded structure.
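To make "energy function plus search" concrete, here is a minimal sketch using the 2D hydrophobic-polar (HP) lattice model, a deliberately crude stand-in for real atomic physics. The assumptions are all mine: each residue is either H (hydrophobic) or P (polar), the chain is a self-avoiding walk on a square grid, and the energy is -1 for every pair of H residues that sit next to each other on the grid without being neighbours in the chain. The function names and the example sequence are made up for illustration.

```python
from itertools import product

# The four moves a chain can make from one lattice site to the next.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def place(directions):
    """Turn a series of moves into lattice coordinates, or None if the chain crosses itself."""
    coords = [(0, 0)]
    for d in directions:
        dx, dy = MOVES[d]
        x, y = coords[-1]
        nxt = (x + dx, y + dy)
        if nxt in coords:
            return None
        coords.append(nxt)
    return coords

def energy(sequence, coords):
    """-1 for each H-H pair that touches on the lattice but is not consecutive in the chain."""
    e = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):
            if sequence[i] == "H" and sequence[j] == "H":
                (xi, yi), (xj, yj) = coords[i], coords[j]
                if abs(xi - xj) + abs(yi - yj) == 1:
                    e -= 1
    return e

def fold_exhaustively(sequence):
    """Try every move string and return (lowest energy, coordinates of one best fold)."""
    best = None
    for directions in product(MOVES, repeat=len(sequence) - 1):
        coords = place(directions)
        if coords is None:
            continue  # chain would pass through itself
        e = energy(sequence, coords)
        if best is None or e < best[0]:
            best = (e, coords)
    return best

if __name__ == "__main__":
    # Hypothetical 8-residue sequence, small enough to enumerate in well under a second.
    print(fold_exhaustively("HPPHHPPH"))
```

The loop visits 4^(n-1) move strings for a chain of n residues, so this brute-force approach stops being practical after a couple of dozen residues, even in a model this simple.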
The Impossible Search: Levinthal’s Paradox
Enter Cyrus Levinthal. In 1969, Levinthal sat down to calculate the mathematical search space of this folding process. He asked a simple question: How long would it take a protein to find this global free-energy minimum if it did so by randomly searching through its possible shapes?
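A back-of-the-envelope version of that calculation can be sketched in a few lines. The inputs are assumptions chosen for illustration (a common textbook choice, not necessarily the exact numbers Levinthal used): a chain of 100 residues, 3 possible conformations per residue, and one conformation sampled per picosecond.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def random_search_years(n_residues, states_per_residue, samples_per_second):
    """Years needed to visit every conformation once at a fixed sampling rate."""
    conformations = states_per_residue ** n_residues
    return conformations / samples_per_second / SECONDS_PER_YEAR

# Assumed inputs: 100 residues, 3 states each, 10^12 samples per second.
years = random_search_years(100, 3, 1e12)
print(f"{years:.1e} years")  # prints "1.6e+28 years"
```

The exponent is what matters here: adding a single residue multiplies the search time by 3, so no realistic speed-up of the sampling rate rescues a blind search.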
For even a small protein, the answer comes out at roughly 1.6 × 10^28 years. To put that number in perspective: it is more than a quintillion (10^18) times longer than the age of our universe (1.37 × 10^10 years). If a single small protein folded by randomly checking shapes, it would never fold in the lifetime of the galaxy. Yet, in our cells, thousands of different proteins fold into their active shapes in microseconds to milliseconds.
Levinthal’s Paradox
"While the folded state of a protein represents its global thermodynamic energy minimum, the protein cannot possibly find this state via a random, unbiased search of its conformational space. Therefore, protein folding must be directed through a fast, biased physical pathway."
Resolving the Paradox: The Folding Funnel
How does nature bypass Levinthal’s impossible search space? Biophysicists resolved this in the 1990s through the Energy Landscape Theory of protein folding. Instead of a flat golf course with a single microscopic hole (where a random search is futile), the energy landscape of a protein is shaped like a rugged funnel.
As the unfolded amino acid chain begins to fold, it doesn't wait to check every structure. Instead, local regions immediately interact. Hydrophobic (water-fearing) amino acids rapidly collapse inward to shield themselves from water, while hydrophilic (water-loving) residues face outward. This "hydrophobic collapse" dramatically restricts the conformational space.
As the protein rolls down the funnel, the landscape guides it. The search space shrinks exponentially at every step. Local secondary structures (like α-helices and β-sheets) stabilize, acting as physical guide rails that funnel the protein directly toward its native global minimum.
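The golf course versus funnel picture can be mimicked with a toy search problem. This is a sketch under heavy assumptions of my own: a "conformation" is just a list of bits, the native state is all zeros, and the energy is the number of non-native positions, which gives a perfectly smooth funnel with none of the ruggedness of a real landscape. The function names are hypothetical.

```python
import random

def golf_course_search(n_bits, rng):
    """Unbiased search: draw whole conformations at random until the native one appears."""
    native = [0] * n_bits
    tries = 0
    while True:
        tries += 1
        if [rng.randint(0, 1) for _ in range(n_bits)] == native:
            return tries

def funnel_search(n_bits, rng):
    """Biased search: pick one position at random and keep the flip only if energy drops."""
    state = [rng.randint(0, 1) for _ in range(n_bits)]
    steps = 0
    while sum(state) > 0:  # energy = number of non-native positions
        steps += 1
        i = rng.randrange(n_bits)
        if state[i] == 1:  # flipping this position lowers the energy, so accept
            state[i] = 0
        # flipping a native position would raise the energy, so the move is rejected
    return steps

if __name__ == "__main__":
    rng = random.Random(2017)
    trials = 20
    golf = sum(golf_course_search(10, rng) for _ in range(trials)) / trials
    funnel = sum(funnel_search(10, rng) for _ in range(trials)) / trials
    print(f"golf course: {golf:.0f} tries on average, funnel: {funnel:.0f} steps on average")
```

The unbiased search needs about 2^n tries on average, while the biased one needs only a small multiple of n steps, because every accepted move is kept rather than thrown away.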
The Computer Science Challenge
For a computer science graduate entering this field in late 2017, this is where the puzzle gets incredibly juicy.
Anfinsen showed that the solution exists and is encoded in the sequence text. Levinthal showed that finding it by blind, unbiased random search is computationally impossible. Nature shows that a highly biased, cooperative search works in milliseconds.
If we want to predict a protein's structure from its sequence, we have two primary routes:
- De Novo / Ab Initio Physics Simulation: We run large molecular dynamics simulations (using packages such as CHARMM or AMBER) that evaluate classical force-field approximations of the forces on every atom, trying to trace the rugged funnel step by step. This is computationally agonizing and struggles to scale past tiny proteins, even on supercomputers.
- Knowledge-Based Modeling: We look at the proteins nature has already folded (documented in the Protein Data Bank) and use their structural statistical properties to skip the physical simulation entirely, learning the mapping from sequence directly to structure.
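To give a feel for why the first route is so expensive, here is a minimal sketch of the inner loop of a molecular dynamics run: compute the force, advance positions and velocities by one small time step, repeat. It is not how CHARMM or AMBER are implemented. I am assuming a single one-dimensional harmonic "bond" in made-up reduced units, integrated with the velocity Verlet scheme, and all function names and parameters are hypothetical.

```python
def harmonic_force(x, k=1.0, x0=1.0):
    """Force on a 1D bond of length x with rest length x0 (Hooke's law)."""
    return -k * (x - x0)

def total_energy(x, v, k=1.0, x0=1.0, mass=1.0):
    """Kinetic plus potential energy of the bond."""
    return 0.5 * mass * v * v + 0.5 * k * (x - x0) ** 2

def velocity_verlet(x, v, dt, n_steps, mass=1.0):
    """Advance the bond length step by step; returns the list of (x, v) states."""
    trajectory = [(x, v)]
    f = harmonic_force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass  # half-step velocity update
        x = x + dt * v_half               # full-step position update
        f = harmonic_force(x)             # force at the new position
        v = v_half + 0.5 * dt * f / mass  # second half-step velocity update
        trajectory.append((x, v))
    return trajectory

if __name__ == "__main__":
    traj = velocity_verlet(x=1.5, v=0.0, dt=0.01, n_steps=1000)
    print("energy at start:", total_energy(*traj[0]))
    print("energy at end:  ", total_energy(*traj[-1]))
```

Real simulations typically use time steps on the order of a femtosecond, so covering a folding event that lasts microseconds to milliseconds means roughly 10^9 to 10^12 such steps, each one evaluating forces across every atom in the system.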
My PhD is starting right as deep learning is beginning to assert itself in this second category. If we can train deep models to understand the "grammar" of amino acid sequences and how they relate to spatial geometry, we might just bypass Levinthal's paradox entirely with software.
PhD Research Goals (Autumn 2017)
- Acquire structural data from the Protein Data Bank (PDB) to extract clean mapping datasets.
- Explore encoding models for sequence features. Currently reviewing PSI-BLAST sequence profiles.
- Target Milestone 1: Design a neural network architecture to predict 3-state Secondary Structure (Helix, Sheet, Coil) from sequence alone.
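As a first step toward that milestone, here is a minimal sketch of how a sequence could be turned into per-residue input features and how a 3-state prediction could be scored. Everything here is an assumption on my part rather than a finished design: a plain one-hot encoding over the 20 standard amino acids, a sliding window of neighbours padded with zeros at the chain ends, the labels H, E and C for helix, sheet and coil, and per-residue accuracy (usually called Q3) as the metric.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot(residue):
    """20-dimensional indicator vector; padding and unknown residues stay all-zero."""
    vec = [0] * len(AMINO_ACIDS)
    if residue in AA_INDEX:
        vec[AA_INDEX[residue]] = 1
    return vec

def windowed_features(sequence, half_window=2):
    """One feature vector per residue: its own one-hot plus those of its neighbours."""
    padded = "-" * half_window + sequence + "-" * half_window
    features = []
    for i in range(len(sequence)):
        window = padded[i:i + 2 * half_window + 1]
        features.append([bit for res in window for bit in one_hot(res)])
    return features

def q3_accuracy(predicted, observed):
    """Fraction of residues whose 3-state label (H, E or C) is predicted correctly."""
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)

if __name__ == "__main__":
    feats = windowed_features("MKVLL", half_window=2)
    print(len(feats), "residues,", len(feats[0]), "features each")
```

A window of 5 residues gives 5 × 20 = 100 input features per position. Swapping the one-hot vectors for alignment-derived profile columns would keep the same windowing logic.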
Next Log Entry: Secondary Structure Prediction with Bi-LSTMs and ResCNNs.