
November. As with many other AI products, AlphaFold is based on freely available data that was largely generated by academic institutions. Unlike the data used for training, the use of the trained models is now limited to noncommercial applications, which poses the risk of a potential monopolization of the new methods. In numerous projects, however, the scientific community is already working to make equivalent methods available on an open-source basis as well.

SNI INSight: What is the resolution of the predicted 3D structures?

Janani Durairaj: The resolution of protein structures is generally measured in units of angstroms (10^-10 m). These are of an order of magnitude that corresponds to the range of interatomic distances. Experimental methods of structural determination such as X-ray crystallography achieve accuracies of up to 1 angstrom (Å). In monomeric proteins (proteins with only one amino acid chain), AlphaFold can also achieve this level of accuracy and therefore delivers results within the margin of error of experimental methods.

Timm Maier: We should bear in mind that most proteins consist of multiple amino acid chains and that, in many cases, an “average” accuracy is not very meaningful. For example, individual mutations in a reactive center of an enzyme can change the substrate specificity completely. As an average across all amino acids, however, this only changes the overall structure by a fraction of an angstrom.

SNI INSight: You’ve mentioned huge amounts of data. What sort of scale are we talking about?

Janani Durairaj: If we consider all possible 3D conformations that a protein can adopt, we’re talking about an astronomical number of possible conformations. For example, even with some underestimations, a protein with 100 amino acids could theoretically result in over 3^198 different conformations. It is therefore impossible to enumerate these conformations as an algorithmic solution for predicting protein structures.

Seen from another perspective, the “sequence-structure gap” means that although sequencing projects operating around the world have provided millions of protein sequences from wide-ranging organisms, researchers have only determined hundreds of thousands of empirical protein structures. This is because each structure can take years of complex studies to solve. By learning from these sequences and structures, AlphaFold allows high-accuracy predictions for many millions of sequences, bringing us much closer to closing the gap with plausible hypotheses.

SNI INSight: How long does analysis with AlphaFold take?

Janani Durairaj: The DeepMind team worked with the European Bioinformatics Institute of the European Molecular Biology Laboratory (EMBL-EBI) to set up the AlphaFold database, which contains predictions for over 200 million protein sequences. Users receive the predicted structure within seconds of entering the name of the protein. The time taken to predict the structure of a specific sequence not in this database depends on the length of the protein and the available computing resources. It varies between minutes and hours, significantly less than the months and years needed for experimental structural elucidation.

One very useful community resource is ColabFold, an interactive web-based platform where users enter a protein sequence and receive a structure predicted by AlphaFold with no need for an installation process or computing resources.

SNI INSight: How reliable are AlphaFold’s predictions?

Janani Durairaj: With every structural prediction, AlphaFold provides a quality estimate that helps us classify the reliability of the prediction.

There are two key reasons for lower quality predictions. On the one hand, proteins are highly flexible and dynamic. The static structure we normally see therefore reflects only a single point in time or one possible conformation of the protein. Some regions of proteins are in constant motion and are described as being intrinsically disordered, so attempts to assign a fixed set of 3D coordinates to these regions do not make sense and would not produce interpretable results.

On the other hand, there are many proteins for which only a few coevolutionary signals exist. Since

Janani Durairaj, who is setting up her own research group as an Ambizione fellow, has worked intensively on the creation of the “Protein Universe Atlas”. This is a web service that allows users to navigate through a universe of millions of known proteins.

SNI INSight December 2024 6
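The conformation count quoted in the answer on data scale, 3 to the power of 198 for a protein with 100 amino acids, can be reproduced with a short calculation. The sketch below is only an illustration under stated assumptions: it uses the common back-of-the-envelope model in which each of the 99 links between neighboring amino acids contributes two rotatable backbone angles with three stable states each. The function names and the sampling rate of 10^13 conformations per second are hypothetical and do not come from the interview.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def conformation_count(n_residues: int, states_per_angle: int = 3) -> int:
    """Estimate the number of backbone conformations of a protein chain.

    Assumed model (not from the interview): every link between two
    neighboring residues has two rotatable angles, and each angle can
    take `states_per_angle` stable states.
    """
    if n_residues < 2:
        raise ValueError("a chain needs at least two residues")
    n_angles = 2 * (n_residues - 1)  # 100 residues -> 99 links -> 198 angles
    return states_per_angle ** n_angles


def years_to_enumerate(count: int, rate_per_second: float) -> float:
    """Years needed to visit every conformation at a fixed sampling rate."""
    return count / rate_per_second / SECONDS_PER_YEAR


count = conformation_count(100)  # equals 3 ** 198
# Hypothetical sampling rate, chosen only for illustration.
years = years_to_enumerate(count, rate_per_second=1e13)

print(f"conformations: {count:.3e}")
print(f"years to enumerate: {years:.3e}")
```

Under these assumptions the count is roughly 3 × 10^94, and even at the assumed sampling rate a full enumeration would take vastly longer than the age of the universe, which is why brute-force search is not a workable prediction strategy.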
