CASP: The Olympic Arena of Double-Blind Structural Biology

Author: Ersi Ni | Date: Nov 10, 2018 | Log Entry: PhD Journal #4 | Focus: Independent Blind Evaluation Metrics

The mood in our lab this week is electric, mixed with a healthy dose of anxiety. In less than a month, the structural biology community will gather in Riviera Maya, Mexico, for the thirteenth Critical Assessment of Structure Prediction (CASP13) conference. For the uninitiated, CASP is the ultimate double-blind Olympics of our field.

As a researcher, I find CASP the most fascinating scientific benchmark I have ever encountered. Today, I want to write about how CASP works, why it is the gold standard for avoiding machine learning "self-deception," and the mathematical metrics we use to evaluate whether a predicted protein matches reality.

The Ultimate Truth: How CASP Prevents Cheating

In machine learning, it is incredibly easy to lie to yourself. You design a model, train it on a dataset, test it on a held-out split, and celebrate a high accuracy score. But in biology, hidden biases, data leakage (such as training on proteins structurally similar to your test set), and subtle overfitting are constant hazards.

CASP was founded in 1994 by John Moult and Krzysztof Fidelis to solve this exact problem. It is designed as a true double-blind experiment:

The Targets: Organizers identify experimental structural biologists who have mapped out a protein's 3D structure but have not yet published it.
The Challenge: Organizers release the 1D amino acid sequence to the computational community.
The Blind Prediction: Computational groups have a fixed window (roughly three weeks for human expert groups, about three days for automated servers) to run their algorithms and submit predicted 3D coordinates back to the organizers.
The Evaluation: Independent assessors, who see only anonymized group identifiers, compare the submitted predictions against the secretly held experimental structures.

There is no opportunity to tune against the answers, cheat, or tweak parameters post-hoc. Your model either understands the biophysical principles of folding, or it crashes and burns on the grand stage.

Measuring Similarity: GDT_TS and TM-Score

How do you mathematically compare two folded proteins? You can't just superimpose them and calculate the standard Root-Mean-Square Deviation (RMSD) over all atoms. Because RMSD squares every deviation, a few badly placed residues dominate the result: if a single flexible loop is bent out of place, the RMSD can be large even when the rest of the protein is aligned almost perfectly.
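To make the outlier problem concrete, here is a minimal sketch. It assumes the two structures are already superimposed, so all we have is a list of per-residue Cα deviations in Å. The 100-residue protein and its deviation values are invented for illustration, and `rmsd` is a hypothetical helper, not part of any CASP tooling.

```python
import math


def rmsd(deviations):
    """Root-mean-square of per-residue deviations (in Angstrom)."""
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


# Toy protein: 95 well-placed core residues and a 5-residue loop
# that has swung 20 Angstrom away from its true position.
core = [0.5] * 95
loop = [20.0] * 5

print(f"core only:   {rmsd(core):.2f} A")         # 0.50
print(f"core + loop: {rmsd(core + loop):.2f} A")  # 4.50
```

Five misplaced residues out of a hundred move the RMSD from 0.5 Å to about 4.5 Å, even though 95% of the model is essentially correct.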

To resolve this, the community relies on two highly robust metrics:

GDT_TS (Global Distance Test)

Measures what percentage of the predicted alpha-carbon (Cα) atoms lie within a given distance cutoff of their positions in the experimental structure after superposition, averaged over four cutoffs (1Å, 2Å, 4Å, 8Å). Scores range from 0 to 100, and a score above 90 is generally considered competitive with experimental accuracy.
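As a rough sketch, GDT_TS can be written as the average, over the four cutoffs, of the percentage of Cα atoms that fall within each cutoff. This simplified version assumes a single fixed superposition and takes per-residue deviations as input. The real GDT procedure searches over many superpositions to maximize the count at each cutoff, so actual scores can be higher. The name `gdt_ts` and the toy deviations are illustrative assumptions.

```python
CUTOFFS = (1.0, 2.0, 4.0, 8.0)  # Angstrom


def gdt_ts(deviations, cutoffs=CUTOFFS):
    """Simplified GDT_TS from per-residue C-alpha deviations.

    Assumes one fixed superposition; the full algorithm optimizes
    the superposition separately for each cutoff.
    """
    n = len(deviations)
    fractions = [sum(1 for d in deviations if d <= c) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(fractions)


# Toy model: 95 residues within 0.5 A, a 5-residue loop off by 20 A.
deviations = [0.5] * 95 + [20.0] * 5
print(f"GDT_TS = {gdt_ts(deviations):.1f}")  # 95.0
```

The misplaced loop costs exactly the 5% of residues it covers, instead of dominating the score.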

TM-Score (Template Modeling)

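A metric designed to be independent of protein length: each residue's deviation is measured against a distance scale that grows with the length of the target, so scores are comparable across proteins of different sizes. Scores range from 0 to 1, and a TM-Score above 0.5 generally indicates that the two structures share the same global fold topology.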
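For reference, the standard TM-Score formula is TM = (1/L) Σ 1 / (1 + (d_i / d0)²), where L is the target length, d_i is the deviation of aligned residue i, and d0 = 1.24 · (L − 15)^(1/3) − 1.8 Å is the length-dependent distance scale. The sketch below evaluates that formula for one fixed superposition. The real score is the maximum over superpositions. The 0.5 Å floor on d0 is a guard for very short chains and is an assumption of this sketch, as are the function names and toy data.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 in Angstrom."""
    if target_length <= 15:
        return 0.5  # guard for very short chains (assumption of this sketch)
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(deviations, target_length):
    """Simplified TM-Score for one fixed superposition.

    `deviations` holds C-alpha distances for the aligned residues;
    residues of the target with no aligned partner contribute zero.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in deviations)
    return total / target_length


# Same toy model as before: 95 good residues, 5 far-off loop residues.
deviations = [0.5] * 95 + [20.0] * 5
print(f"d0 = {tm_d0(100):.2f} A")
print(f"TM-Score = {tm_score(deviations, 100):.2f}")
```

For this toy model the score comes out at roughly 0.93, well above the 0.5 fold-level threshold, because each misplaced residue contributes almost nothing and cannot drag the well-placed residues down.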

The Historical 30-40 GDT Plateau

To understand why everyone in our lab is pacing around their desks, we have to look at the history of CASP on Free Modeling (FM) targets. Free Modeling targets are the hardest class of proteins—they have absolutely zero known structural templates in the public databases. Computational models must predict their shape ab initio (from scratch).

For over twenty years, from CASP1 in 1994 to CASP12 in 2016, the median GDT_TS score for Free Modeling targets was stuck on a depressing plateau of 30 to 40. Every two years, groups would present minor, incremental improvements, but the physical forces were simply too complex, and the search space too vast, for classical energy minimization to crack.

Fig 4 — The CASP double-blind evaluation pipeline. Blind target releases ensure computational algorithms represent generalizable physical knowledge rather than statistical overfitting.

The CASP13 Rumor Mill

This year, however, things feel different. DeepMind, the Google-affiliated AI lab that conquered Go with AlphaGo and chess with AlphaZero, entered CASP13 under the team name A7D.

Whispers have been circulating through the computational biology departments. Word on the street is that A7D’s predictions have completely shattered the historical Free Modeling plateau. The rumors suggest they are hitting median GDT scores well past 50, pushing toward 60, using deep neural networks that predict continuous distance maps rather than binary contact matrices.

Lab Whispers (November 2018)

If these rumors are true, we are about to witness the first major tectonic shift in structural biology in decades: proof that deep learning can extract spatial geometry directly from the evolutionary signal in sequence databases. I'm packing my bags for Mexico, and my next post will be a deep-dive analysis of the CASP13 results once the embargo is lifted.

Next Log Entry: AlphaFold 1: The Distogram Revolution at CASP13.

This is a post in the protein-structure-prediction series.
Other posts in this series:

Dec 08, 2020 - The CASP14 Watershed: AlphaFold 2 and the Dawn of End-to-End Attention
Feb 15, 2019 - AlphaFold 1: The Distogram Revolution at CASP13
Nov 10, 2018 - CASP: The Olympic Arena of Double-Blind Structural Biology
Jul 15, 2018 - Evolution's Mathematical Whispers: Co-Evolution and the DCA Puzzle
Feb 20, 2018 - From Pixels to Peptides: Predicting Secondary Structure with Bi-LSTMs and ResCNNs
Oct 12, 2017 - Anfinsen's Dogma and Levinthal's Paradox: The Biophysical Riddle of Protein Folding
