Proteins are among the most important molecules of life: almost every biological function, from birth to death, is regulated by them in some way. Each protein is a chain of smaller building blocks called amino acids, and the sequence of that chain contains all the information needed to turn it into a folded, functional 3D structure.
The steps a protein takes to go from a linear chain to its final folded form are too numerous to count and too hard to follow, leaving the question of how every protein folds, the famous protein-folding problem, unanswered. “If you want to understand the molecular basis of how cells work, how organisms work, how life works, you need to understand how proteins get their shape,” Frank Uhlmann, a biochemist at the Francis Crick Institute in London, said.
Answers ex machina
Things changed when Google DeepMind’s protein-structure prediction software AlphaFold burst onto the scene in 2020. They changed more drastically in 2021 with the highly improved AlphaFold 2. AlphaFold uses machine learning and artificial intelligence (AI) to accurately predict a protein’s structure from its amino acid sequence, seemingly solving the protein-folding problem without learning any of the deeper physical principles that drive this biological process.
“If the protein folding problem was set to us by God to teach us how to learn molecular interactions from first principles, we cheated,” Derek Lowe, author of the Science column “In the Pipeline” and long-time pharmaceutical researcher, told The Hindu. “We haven’t learned a tremendous amount more about that. We have figured out how they usually do it, even if we don’t know why.”
“It’s startling how it works as well as it does.”
Now, in a Nature paper published in May 2024, scientists at DeepMind led by John Jumper introduced AlphaFold 3, building on its predecessors with even more transformative capabilities. AlphaFold 3 can predict protein-protein interactions as well as the structures of other molecules like DNA and RNA, along with the interactions of proteins with all these other compounds.
Democratising research
“AlphaFold 2 predicted the structure of proteins with revolutionary levels of accuracy,” Josh Abramson, a research engineer at DeepMind and lead author of the new paper, told The Hindu in an email.
“AlphaFold 3 is even more accurate for proteins, but can also predict the structure of DNA, RNA, and all the other molecular components that make up biology. The interaction of all these biomolecules is what makes up the processes of life, so it is important to be able to predict the structure of these interactions.”
Apart from being able to give us a lot more insight into biological processes, the new AlphaFold is also more usable by scientists who aren’t experts in machine learning. Dr. Uhlmann, who has been using AlphaFold 3 to study how proteins and DNA interact in chromosomes, said, “You don’t need to know anything about coding, now literally everybody can do it. All you need is a Google account, you can upload protein sequences in the DeepMind server, and 10 minutes later you get your results. That completely democratises structure prediction research.”
From noise to signal
The original AlphaFold was trained on the thousands of sequences and protein structures present in the Protein Data Bank, a giant repository where scientists submit experimentally determined protein structures. “It is completely ignoring all the fundamental physics and thermodynamics, it’s modelling based on learning what real structures tend to look like, taking advantage of tendencies of protein structures that are too subtle for humans to realise,” Dr. Lowe said.
Unlike its predecessors, AlphaFold 3 uses a diffusion model, the same kind of model that image-generating software uses. During training, noise is added to known protein structures and the model learns to remove it. Once trained, the model can work its way back from a noisy structure to a realistic protein structure. This architecture also helps AlphaFold 3 handle a much larger input dataset.
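To make the noise-and-denoise idea concrete, here is a minimal sketch in Python. It is not AlphaFold 3’s code or architecture. The two tiny “structures”, the noise levels, and every function name are assumptions invented for illustration, and the trained neural network is swapped for a simple weighted average over the known structures.

```python
import math
import random

# Hypothetical toy "training set": two chains of three 2D points, stored as
# flat lists [x1, y1, x2, y2, x3, y3]. Real structures are far larger.
TRAINING_STRUCTURES = [
    [0.0, 0.0, 1.0, 0.0, 2.0, 0.0],  # a straight chain
    [0.0, 0.0, 1.0, 0.0, 1.0, 1.0],  # a bent chain
]


def add_noise(structure, sigma, rng):
    """Forward step: blur a known structure with Gaussian noise of scale sigma."""
    return [c + rng.gauss(0.0, sigma) for c in structure]


def denoise(noisy, sigma):
    """Guess the clean structure behind a noisy one.

    Stand-in for a trained network: a weighted average of the known
    structures, where closer structures get exponentially more weight.
    """
    log_w = [
        -sum((a - b) ** 2 for a, b in zip(noisy, s)) / (2 * sigma ** 2)
        for s in TRAINING_STRUCTURES
    ]
    top = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    return [
        sum(wi * s[k] for wi, s in zip(w, TRAINING_STRUCTURES)) / total
        for k in range(len(noisy))
    ]


def generate(rng, sigmas=(4.0, 2.0, 1.0, 0.5, 0.25, 0.0)):
    """Reverse process: start from pure noise and step down the noise levels."""
    x = [rng.gauss(0.0, sigmas[0]) for _ in TRAINING_STRUCTURES[0]]
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        guess = denoise(x, sigma)
        # Move part of the way from the current noisy point towards the guess.
        direction = [(xi - gi) / sigma for xi, gi in zip(x, guess)]
        x = [xi + (sigma_next - sigma) * d for xi, d in zip(x, direction)]
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    noisy = add_noise(TRAINING_STRUCTURES[0], 0.05, rng)
    print("lightly noised:", [round(c, 3) for c in noisy])
    print("denoised:      ", [round(c, 3) for c in denoise(noisy, 0.05)])
    print("generated:     ", [round(c, 3) for c in generate(rng)])
```

In this toy, a generated sample is always some blend of the two known chains. It usually lands close to one of them, but nothing forces it to match either exactly.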
A reliability problem
AlphaFold 3’s accuracy at predicting protein-protein interactions is also very high, but it is less reliable when it comes to interactions between small molecules and proteins. Proteins use a language of 20 amino acids whereas small molecule ligands “have a much larger vocabulary”, according to Dr. Lowe.
Greater variation in the dataset and the use of diffusion techniques can lead the model to come up with answers that look plausible but aren’t real. Adding more training data can help reduce this problem, but cannot get rid of it entirely.
Nevertheless, AlphaFold 3 predicts protein structures and interactions better than other models right now. Academics and companies can potentially use it to find drug candidates that can bind to proteins and help cure diseases. In fact, DeepMind’s spin-off company Isomorphic Labs is using AlphaFold 3 for this very purpose: drug discovery. However, this option isn’t open to everyone yet.
A peek under the hood
Additionally, even though scientists are free to use the AlphaFold server to upload their protein sequences, many researchers are irked at not being able to access the model’s full code. This means they can’t play around with its nuts and bolts and modify it for specific use-cases.
An important implication of this lack of access is that it’s currently impossible to use AlphaFold 3 to find structures of proteins bound to drug candidates. Researchers expressed their disappointment in an open letter that more than 600 people have signed to date. According to the text, the restriction “does not align with the principles of scientific progress, which rely on the ability of the community to evaluate, use, and build upon existing work.” Different groups have also begun a race to crack the model’s code and make open-source versions.
Responding to the backlash, DeepMind scientists have reversed their initial stance of not releasing the whole code and now say they will do so in six months.
The journey begins
For now, we need to wait and watch how DeepMind decides to let eager scientists look under the hood and examine AlphaFold 3 more closely, to appreciate its full power. But until then, the model remains one of the best AI-based protein structure prediction models out there, now with the ability to predict interactions with other kinds of biological structures as well.
At the same time, both Dr. Lowe and Dr. Uhlmann wanted to be clear that even if AlphaFold 3 makes very good predictions, it shouldn’t be treated as an “infallible oracle”. Instead, it offers a good starting point from which scientists can obtain some answers, which they can then build on with further experiments and expert analysis.
“It’s a prediction, you can’t take it for granted,” Dr. Uhlmann said. “It’s not solving your question, but it’s a new and exciting discovery tool that helps you build and test new hypotheses.”
Rohini Subrahmanyam is a freelance journalist with a PhD in biology from the National Centre for Biological Sciences, Bengaluru.