This piece focuses on a specific application of AI: the use of AI by Demis Hassabis (the man whose company, DeepMind, built AlphaGo, the AI that in 2016 beat the Go world champion Lee Sedol of Korea) to predict the structure of proteins. Reading the piece helps us understand how technology, and specifically AI, will be able to crack open the toughest problems faced by scientists, problems which have long acted as optimisation constraints for the world around us.
So why is predicting the structure of proteins such a big deal and why is DeepMind interested in cracking this problem? “The three-dimensional structure of proteins determines how they behave and interact in the body. But a large number of important proteins have structures that biologists still don’t know. Using AI to accurately predict them would offer an invaluable tool to help understand diseases, from cancer to covid. Proteins are a primary target for many drugs and a key ingredient in new therapeutics. Quickly unlocking their structures would fast-track the development of new therapies and vaccines.
In 2020 DeepMind, which is owned by Alphabet, revealed AlphaFold2, an AI that could predict the shape of proteins down to the nearest atom. “It’s the most complex thing we’ve ever done,” says Hassabis.
AlphaFold’s success is part of a bigger story, too, signaling a change of direction for the AI lab. The company’s focus is shifting from games to science, where it hopes to have a bigger real-world impact.”
The author then helps us understand why AI is well suited for cracking a problem of this nature: “Nearly everything your body does, it does with proteins: they digest food, contract muscles, fire neurons, detect light, power immune responses, and much more. Understanding what individual proteins do is therefore crucial for understanding how bodies work, what happens when they don’t, and how to fix them.
A protein is made up of a ribbon of amino acids, which chemical forces fold up into a knot of complex twists and twirls. The resulting 3D shape determines what it does. For example, hemoglobin, a protein that ferries oxygen around the body and gives blood its red color, is shaped like a little pouch, which lets it pick up oxygen molecules in the lungs. The structure of SARS-CoV-2’s spike protein lets the virus hook onto your cells.
The catch is that it’s hard to figure out a protein’s structure—and thus its function—from the ribbon of amino acids. An unfolded ribbon can take 10^300 possible forms, a number on the order of all the possible moves in a game of Go.
Predicting this structure in a lab, using techniques such as x-ray crystallography, is painstaking work. Entire PhDs have been spent working out the folds of a single protein. The long-running CASP (Critical Assessment of Structure Prediction) competition was set up in 1994 to speed things up by pitting computerized prediction methods against each other every two years. But no technique ever came close to matching the accuracy of lab work. By 2016, progress had been flatlining for a decade.”
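To get a feel for why brute force is hopeless at the scale quoted above, here is a rough back-of-the-envelope sketch in Python. The chain length (300 amino acids), the number of shapes each amino acid can take (10) and the checking speed (10^18 shapes per second) are all illustrative assumptions of ours, chosen only so that the total matches the 10^300 figure in the article; they are not numbers from DeepMind or from the piece.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def log10_conformations(residues: int, states_per_residue: int) -> float:
    """Return log10 of states_per_residue ** residues.

    Working in log10 keeps the numbers printable as powers of ten.
    """
    return residues * math.log10(states_per_residue)


def log10_years_to_enumerate(log10_states: float, log10_rate_per_second: float) -> float:
    """Return log10 of the years needed to check every state once."""
    return log10_states - log10_rate_per_second - math.log10(SECONDS_PER_YEAR)


# Hypothetical protein: 300 amino acids, 10 possible shapes each (assumption).
log_states = log10_conformations(residues=300, states_per_residue=10)

# Hypothetical computer checking 10^18 shapes per second (assumption).
log_years = log10_years_to_enumerate(log_states, log10_rate_per_second=18.0)

print(f"Search space: roughly 10^{log_states:.0f} shapes")
print(f"Time to try them all: roughly 10^{log_years:.1f} years")
```

Even with a very generous checking speed, the exponent barely moves, which is why the problem calls for something closer to learned intuition than exhaustive search.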
So how much progress has DeepMind made in understanding the structure of proteins? You might want to read what follows carefully to understand why AI will end up being much more than Siri, Alexa and credit risk scoring: “Watching AlphaGo play in Seoul, Hassabis says, he’d been reminded of an online game called FoldIt, which a team led by David Baker, a leading protein researcher at the University of Washington, released in 2008. FoldIt asked players to explore protein structures, represented as 3D images on their screens, by folding them up in different ways. With many people playing, the researchers behind the game hoped, some data about the probable shapes of certain proteins might emerge. It worked, and FoldIt players even contributed to a handful of new discoveries.
Hassabis played that game when he was a postdoc at MIT in his 20s. He was struck by the way basic human intuition could lead to real breakthroughs, whether making a move in Go or finding a new configuration in FoldIt.
“I was thinking about what we had actually done with AlphaGo,” says Hassabis. “We’d mimicked the intuition of incredible Go masters. I thought, if we can mimic the pinnacle of intuition in Go, then why couldn’t we map that across to proteins?”
The two problems weren’t so different, in a way. Like Go, protein folding is a problem with such vast combinatorial complexity that brute-force computational methods are no match. Another thing Go and protein folding have in common is the availability of lots of data about how the problem could be solved. AlphaGo used an endless history of its own past games; AlphaFold used existing protein structures from the Protein Data Bank, an international database of solved structures that biologists have been adding to for decades.
AlphaFold2 uses attention networks, a standard deep-learning technique that lets an AI focus on specific parts of its input data. This tech underpins language models like GPT-3, where it directs the neural network to relevant words in a sentence. Similarly, AlphaFold2 is directed to relevant amino acids in a sequence, such as pairs that might sit together in a folded structure. “They wiped the floor with the CASP competition by bringing together all these things biologists have been pushing toward for decades and then just acing the AI,” says Stevens.
Over the past year, AlphaFold2 has started having an impact. DeepMind has published a detailed description of how the system works and released the source code. It has also set up a public database with the European Bioinformatics Institute that it is filling with new protein structures as the AI predicts them. The database currently has around 800,000 entries, and DeepMind says it will add more than 100 million—nearly every protein known to science—in the next year.
A lot of researchers still don’t fully grasp what DeepMind has done, says Charlotte Deane, chief scientist at Exscientia, an AI drug discovery company based in the UK, and head of the protein informatics lab at the University of Oxford. Deane was also one of the reviewers of the paper that DeepMind published on AlphaFold in the scientific journal Nature last year. “It’s changed the questions you can ask,” she says.”
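For readers curious about what "attention" means in the passage quoted above, here is a minimal sketch of the general idea (scaled dot-product attention) in plain Python. It is not AlphaFold2's architecture: the two-number "embeddings" for each amino-acid letter and the four-letter sequence are made up for illustration, and a real network would also learn separate query, key and value transformations, which this sketch omits.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    Returns (outputs, weights): weights[i][j] says how strongly
    position i attends to position j.
    """
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        w = softmax(scores)
        weights.append(w)
        # Each output is the weighted average of the value vectors.
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values))
             for c in range(len(values[0]))]
        )
    return outputs, weights


# Hypothetical toy embeddings for three amino-acid letters (assumption).
embeddings = {"A": [1.0, 0.0], "C": [0.0, 1.0], "G": [1.0, 1.0]}
sequence = "ACGC"  # made-up sequence, for illustration only
vectors = [embeddings[residue] for residue in sequence]

# Self-attention: every position looks at every other position.
outputs, weights = attention(vectors, vectors, vectors)

for residue, row in zip(sequence, weights):
    print(residue, [round(x, 3) for x in row])
```

Each printed row shows how much one position in the sequence "looks at" every other position; positions with similar embeddings receive larger weights, which is the sense in which attention directs a network to the relevant parts of its input.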
If you want to read our other published material, please visit https://marcellus.in/blog/
Note: The above material is neither investment research, nor financial advice. Marcellus does not seek payment for or business from this publication in any shape or form. The information provided is intended for educational purposes only. Marcellus Investment Managers is regulated by the Securities and Exchange Board of India (SEBI) and is also an FME (Non-Retail) with the International Financial Services Centres Authority (IFSCA) as a provider of Portfolio Management Services. Additionally, Marcellus is also registered with US Securities and Exchange Commission (“US SEC”) as an Investment Advisor.