AlphaFold 3 in Molecular Biology: Predicting Protein-Ligand Interactions and Viral Glycoproteins
1. Introduction
The accurate prediction of three-dimensional protein structures has been a central challenge in structural biology for decades. The advent of deep learning-based approaches, particularly the AlphaFold series developed by DeepMind, has fundamentally altered the landscape of protein structure prediction. AlphaFold 3 represents the most recent iteration of this architecture, extending its predictive capabilities beyond static protein folds to encompass protein-ligand complexes, protein-nucleic acid interactions, and post-translational modifications. This advancement holds particular significance for veterinary virology and molecular diagnostics, where the structural characterization of viral glycoproteins and their interactions with host receptors or small molecule inhibitors is critical for understanding pathogenesis and developing therapeutic interventions.
This review examines the underlying deep learning architecture of AlphaFold 3, evaluates its reported accuracy in predicting protein-ligand complexes, and explores its specific applications in modeling viral fusion proteins for drug discovery. The emphasis is on pathogens of veterinary and zoonotic relevance and on the biophysical mechanisms governing their interactions with hosts and ligands.
2. Deep Learning Architecture of AlphaFold 3
2.1 Evolution from AlphaFold 2
AlphaFold 3 builds upon the foundational principles established by its predecessor while introducing several architectural modifications that enhance its capacity to model complex molecular assemblies. The core innovation of AlphaFold 2 was the integration of the Evoformer, which jointly processes a multiple sequence alignment (MSA) and a pairwise residue representation, with a structure module built on invariant point attention. This approach enabled the direct prediction of atomic coordinates from sequence information, with structural templates as an optional rather than required input.
AlphaFold 3 extends this paradigm by incorporating a unified architecture that processes both protein and non-protein components within a single neural network framework. The model employs a diffusion-based generative approach for structure prediction, which differs fundamentally from the regression-based methods used in earlier iterations. The diffusion module iteratively refines a noisy initial set of atomic positions toward a final structural model, guided by a denoising network trained on experimentally determined structures.
2.2 The Pairformer and Diffusion Module
The architecture of AlphaFold 3 can be decomposed into two primary components: the Pairformer module and the diffusion module. The Pairformer module is an evolution of the Evoformer architecture from AlphaFold 2, which processed an MSA representation together with a pairwise representation of amino acid residues. In AlphaFold 3, MSA processing is reduced and the Pairformer operates on single and pairwise representations of tokens, where a token corresponds to a standard residue, a nucleotide, or an individual heavy atom of a ligand or modified residue. Because ligands, nucleic acids, and other non-protein entities share the same pairwise representation as the protein, the model can capture inter-molecular interactions directly during inference.
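As a rough illustration of how input size scales in a pairwise representation of this kind, the sketch below counts tokens under the tokenization described for AlphaFold 3 (one token per standard residue or nucleotide, one per ligand heavy atom) and estimates the memory of a dense pair tensor. The function names, the 128-channel width, and the example complex are illustrative assumptions rather than values taken from the studies reviewed here.

```python
def count_tokens(n_residues, n_nucleotides, n_ligand_heavy_atoms):
    """One token per standard residue or nucleotide, one per ligand heavy atom."""
    return n_residues + n_nucleotides + n_ligand_heavy_atoms


def pair_tensor_megabytes(n_tokens, channels=128, bytes_per_value=4):
    """Memory of a dense (n_tokens, n_tokens, channels) float32 tensor, in MB.

    The channel width is an assumed value used only for illustration.
    """
    return n_tokens * n_tokens * channels * bytes_per_value / 1e6


# Hypothetical complex: a 500-residue glycoprotein ectodomain plus a 30-atom ligand.
n_tokens = count_tokens(n_residues=500, n_nucleotides=0, n_ligand_heavy_atoms=30)
pair_mb = pair_tensor_megabytes(n_tokens)
print(n_tokens, round(pair_mb, 1))
```

The quadratic growth of the pair tensor with token count is the practical reason large glycoprotein trimers with many ligands become expensive to model.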
The diffusion module operates directly on atomic coordinates in continuous space. During training, Gaussian noise of varying magnitude is added to the experimental coordinates and the network learns to recover the original structure. At inference, sampling starts from random coordinates, which are iteratively denoised over a decreasing noise schedule to produce a final structure. This approach is conceptually similar to the methods used in image generation models but is adapted for the three-dimensional coordinate space of molecular structures. The denoising process is conditioned on the output of the Pairformer module, which provides the necessary contextual information about the molecular system.
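The iterative denoising idea can be sketched with a toy sampler. The example below is not the AlphaFold 3 diffusion module: the trained network is replaced by a hypothetical oracle that simply returns a known reference structure, and the noise schedule, atom count, and step count are arbitrary assumptions. It shows only the control flow, in which coordinates start as pure noise and are moved toward the denoised estimate as the noise level decreases.

```python
import numpy as np


def oracle_denoiser(noisy_coords, sigma, reference):
    """Hypothetical stand-in for the trained network.

    For a single known structure, the ideal denoiser returns that structure
    regardless of the noise level.
    """
    return reference


def sample(reference, sigmas, rng):
    """Deterministic Euler sampler over a decreasing noise schedule."""
    # Start from Gaussian noise scaled to the largest noise level.
    x = rng.normal(scale=sigmas[0], size=reference.shape)
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = oracle_denoiser(x, s_cur, reference)
        # Step along the direction from the denoised estimate to x,
        # shrinking the remaining noise from s_cur to s_next.
        x = x + (s_next - s_cur) * (x - denoised) / s_cur
    return x


def rmsd(a, b):
    """Root-mean-square deviation between two (n_atoms, 3) arrays."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))


rng = np.random.default_rng(0)
reference = rng.normal(size=(20, 3))  # 20 hypothetical atoms
# Geometric schedule from 10.0 down to 0.01, followed by a final step to zero.
sigmas = np.concatenate([np.geomspace(10.0, 0.01, 30), [0.0]])
final = sample(reference, sigmas, rng)
print(rmsd(final, reference))
```

With a real network the denoised estimate changes at every step and depends on the conditioning from the trunk, so different random seeds can yield different final structures.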
2.3 Training Data and Representation
AlphaFold 3 was trained on a large corpus of experimentally determined structures from the Protein Data Bank (PDB), including protein-ligand complexes, protein-nucleic acid complexes, and protein-protein interfaces. The training data includes structures determined by X-ray crystallography, cryo-electron microscopy (cryo-EM), and nuclear magnetic resonance (NMR) spectroscopy. The model learns to predict the positions of all heavy (non-hydrogen) atoms in the system and outputs confidence metrics, including predicted local distance difference test (pLDDT) scores and predicted aligned error (PAE) matrices.
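pLDDT is the model's own estimate of the lDDT score that a prediction would obtain against the true structure. To make the underlying quantity concrete, the sketch below computes a simplified per-residue lDDT on C-alpha coordinates only, using the standard 15 Å inclusion radius and the 0.5, 1, 2, and 4 Å tolerance thresholds. The function name and the toy coordinates are assumptions for illustration; the full metric is defined over all atoms and includes stereochemical checks omitted here.

```python
import numpy as np


def lddt_ca(pred, ref, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified per-residue lDDT from (n_residues, 3) C-alpha coordinates."""
    d_ref = np.linalg.norm(ref[:, None] - ref[None, :], axis=-1)
    d_pred = np.linalg.norm(pred[:, None] - pred[None, :], axis=-1)
    # Score only pairs that are close in the reference, excluding self-pairs.
    mask = (d_ref < cutoff) & ~np.eye(len(ref), dtype=bool)
    diff = np.abs(d_ref - d_pred)
    # Fraction of the four thresholds under which each distance is preserved.
    preserved = np.mean([diff < t for t in thresholds], axis=0)
    return (preserved * mask).sum(axis=1) / np.maximum(mask.sum(axis=1), 1)


# Toy reference: 10 residues on a line, 3.8 Å apart.
ref = np.column_stack([3.8 * np.arange(10), np.zeros(10), np.zeros(10)])
pred = ref.copy()
pred[5, 1] += 10.0  # displace one residue by 10 Å
scores = lddt_ca(pred, ref)
print(np.round(scores, 2))
```

The displaced residue receives a low score while distant residues are unaffected, which is the local behavior that makes per-residue pLDDT useful for flagging unreliable loops.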
A critical feature of the training process is the inclusion of ligand molecules as explicit entities within the input representation. Ligands are represented using a combination of graph-based features and atomic property vectors. This representation allows the model to learn the geometric and chemical constraints governing ligand binding without requiring precomputed docking poses or binding site annotations.
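In practice, a ligand is supplied alongside the protein as a separate entity in the job definition. The sketch below builds such a job as JSON, following the input layout documented for the open-source AlphaFold 3 release as far as the author of this example recalls it; the field names should be checked against the current documentation before use. The job name, the ten-residue sequence, and the ethanol SMILES string are placeholders, and `build_job` is a hypothetical helper.

```python
import json


def build_job(name, protein_sequence, ligand_smiles, seed=1):
    """Assemble a protein-plus-ligand job description (hypothetical helper)."""
    return {
        "name": name,
        "modelSeeds": [seed],
        "sequences": [
            # Each entity gets its own chain identifier.
            {"protein": {"id": "A", "sequence": protein_sequence}},
            {"ligand": {"id": "L", "smiles": ligand_smiles}},
        ],
        "dialect": "alphafold3",
        "version": 1,
    }


# Placeholder inputs: not a real glycoprotein or inhibitor.
job = build_job("toy_glycoprotein_ligand", "MKTAYIAKQR", "CCO")
job_json = json.dumps(job, indent=2)
print(job_json)
```

No binding site annotation or docking pose appears anywhere in the input, which reflects the point made above: the ligand is described only by its chemistry.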
3. Accuracy in Predicting Protein-Ligand Complexes
3.1 Benchmarking Against Experimental Structures
The accuracy of AlphaFold 3 in predicting protein-ligand complexes has been evaluated in several independent benchmarking studies. Krokidis et al. [1] provided a comprehensive overview of the performance of AlphaFold 3 across multiple structural categories, including ligand-bound proteins. Their analysis demonstrated that AlphaFold 3 achieves significantly higher accuracy than AlphaFold 2 for complexes involving small molecule ligands, particularly when the ligand is bound in a well-defined binding pocket.
He et al. [2] conducted a focused assessment of AlphaFold 3 accuracy in predicting ligand-bound G protein-coupled receptors (GPCRs), a class of membrane proteins that are difficult to model because of their conformational flexibility and the diversity of their ligand binding modes. The study compared AlphaFold 3 predictions to experimentally determined structures of GPCR-ligand complexes. The authors reported that AlphaFold 3 successfully recapitulated the binding poses of orthosteric ligands in a majority of cases, with root-mean-square deviation (RMSD) values for ligand heavy atoms below 2.0 Å in the top-ranked predictions. However, accuracy was lower for allosteric modulators and for ligands that induce significant conformational rearrangements in the receptor.
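The pose-accuracy criterion used in such benchmarks can be sketched as follows. The example assumes the predicted and experimental complexes have already been superimposed on the receptor, and that ligand atoms are listed in the same order in both arrays; the function names and coordinates are illustrative, not taken from the cited study.

```python
import numpy as np


def ligand_rmsd(pred, ref):
    """Heavy-atom RMSD between two (n_atoms, 3) ligand coordinate arrays."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    if pred.shape != ref.shape:
        raise ValueError("pred and ref must have the same shape")
    return float(np.sqrt(np.mean(np.sum((pred - ref) ** 2, axis=1))))


def pose_is_successful(pred, ref, threshold=2.0):
    """Apply the common 2.0 Å success threshold."""
    return ligand_rmsd(pred, ref) < threshold


# Toy ligand with three heavy atoms; the prediction is shifted 1 Å along x.
ref_pose = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0]])
pred_pose = ref_pose + np.array([1.0, 0.0, 0.0])
print(ligand_rmsd(pred_pose, ref_pose), pose_is_successful(pred_pose, ref_pose))
```

For ligands with symmetric groups, a symmetry-corrected RMSD that considers equivalent atom orderings is needed; this sketch does not handle that case.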
3.2 Identification of Missing Interactions
A critical limitation of earlier computational approaches for protein-ligand interaction prediction was the inability to identify binding interactions that are not present in the training data or that involve non-canonical binding modes. Escobedo et al. [3] addressed this issue by systematically comparing AlphaFold predictions with experimentally determined structures of protein-ligand complexes. Their study reported that the predictions can reveal a substantial number of protein-ligand interactions that are missing from the experimental record. Solvent-mediated interactions, such as water-bridged hydrogen bonds, are often critical for binding affinity but are difficult to capture with traditional docking methods that treat water molecules implicitly.
One caveat applies to water networks. AlphaFold 3 does not output explicit water molecules, because waters are removed from the structures used for training. Water positions within a predicted binding site must therefore be added by downstream methods such as explicit-solvent molecular dynamics. The resulting positions can then be compared with crystallographic water networks, used to refine docking scores, or used to identify potential sites for ligand modification.
3.3 Implications for Docking-Based Screening
The integration of AlphaFold predictions into molecular docking workflows has been explored as a strategy to improve the accuracy of virtual screening campaigns. Wong et al. [4] benchmarked AlphaFold-enabled molecular docking predictions for antibiotic discovery, comparing docking against AlphaFold-predicted structures with docking against experimentally determined structures. Because the study was published in 2022, it necessarily used AlphaFold 2 models rather than AlphaFold 3. The study found that docking against the predicted structures performed comparably to docking against experimental structures for a set of known antibiotic targets. This result suggests that AlphaFold predictions can serve as surrogates for experimental structures in cases where crystallographic data are unavailable or where the target protein is difficult to crystallize, although the benchmark itself does not cover AlphaFold 3 models.
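Virtual screening benchmarks of this kind are often summarized with an enrichment factor, the hit rate among the top-ranked compounds divided by the hit rate in the whole library. The sketch below computes it for a toy library; the scores, activity labels, and function name are invented for illustration, and lower scores are assumed to be better, as with docking energies.

```python
def enrichment_factor(scores, is_active, top_fraction=0.1):
    """Hit rate in the top-ranked fraction divided by the overall hit rate.

    Assumes lower scores indicate better predicted binding.
    """
    n = len(scores)
    n_top = max(1, int(round(n * top_fraction)))
    hits_total = sum(is_active)
    if hits_total == 0:
        raise ValueError("library contains no actives")
    # Rank compounds from best (most negative) to worst score.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0])
    hits_top = sum(active for _, active in ranked[:n_top])
    return (hits_top / n_top) / (hits_total / n)


# Toy library of 10 compounds with 2 actives, both ranked at the top.
toy_scores = [-9.1, -8.7, -7.0, -6.5, -6.1, -5.9, -5.5, -5.0, -4.8, -4.2]
toy_actives = [True, True, False, False, False, False, False, False, False, False]
ef_20 = enrichment_factor(toy_scores, toy_actives, top_fraction=0.2)
print(ef_20)
```

A value of 1.0 corresponds to random ranking, so comparing the enrichment factor obtained with a predicted structure to that obtained with an experimental structure of the same target gives a direct measure of how well the prediction substitutes for the experiment.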
The utility of AlphaFold 3 for docking-based screening is particularly relevant for veterinary applications, where many viral and bacterial targets lack high-resolution structural data. The ability to generate accurate models of these targets from sequence information alone enables the rapid initiation of structure-based drug discovery programs.
4. Applications in Modeling Viral Glycoproteins
4.1 Viral Fusion Proteins as Therapeutic Targets
Viral glycoproteins, particularly those involved in membrane fusion and host cell entry, represent a major class of targets for antiviral drug development. These proteins undergo large-scale conformational rearrangements during the fusion process, transitioning from a metastable prefusion state to a stable postfusion state. The structural characterization of these states is essential for understanding the mechanism of action of fusion inhibitors and for designing compounds that stabilize the prefusion conformation.
AlphaFold 3 has been applied to the modeling of viral glycoproteins from several pathogens of veterinary and zoonotic relevance. The ability of the model to predict the structures of these proteins in complex with small molecule ligands or host receptor fragments provides a powerful tool for rational drug design.
4.2 Measles Virus Fusion Protein Stabilization
The measles virus (MeV) fusion protein (F protein) is a class I viral fusion protein that mediates membrane fusion between the viral envelope and the host cell. The prefusion state of the F protein is metastable and can be triggered to undergo a conformational change by the attachment glycoprotein. Small molecule stabilizers of the prefusion F protein have been identified as potential antiviral agents.
Abbou et al. [5] used long-timescale molecular dynamics simulations to investigate the structural basis for the stabilization of the measles virus prefusion F protein by a specific compound, cannabichromevarin. The study employed AlphaFold 3 to generate initial structural models of the F protein in complex with the compound, which were then refined using molecular dynamics simulations. The simulations revealed that the compound binds at the interface between the F protein protomers, stabilizing the prefusion conformation by restricting the movement of the fusion peptide. This structural insight provides a basis for the design of more potent stabilizers of the prefusion state.
4.3 Lassa Virus Glycoprotein-Mediated Membrane Fusion
Lassa virus (LASV) is an arenavirus that causes severe hemorrhagic fever in humans and can infect non-human primates. The viral glycoprotein complex (GPC) mediates membrane fusion and is a target for antiviral intervention. Close et al. [6] conducted an in silico prioritization and cheminformatics study to identify structurally diverse small molecule inhibitors of Lassa virus glycoprotein-mediated membrane fusion. The study used AlphaFold 3 to model the structure of the LASV GPC in its prefusion state and then performed docking simulations to identify compounds that bind to the fusion interface.
The study identified several compounds that inhibited LASV glycoprotein-mediated fusion in cell-based assays. The structural models generated by AlphaFold 3 were used to rationalize the structure-activity relationships of the identified compounds, demonstrating the utility of the approach for hit-to-lead optimization.
4.4 Rift Valley Fever Virus Glycoprotein and Host Receptor Interactions
Rift Valley fever virus (RVFV) is a bunyavirus that causes severe disease in livestock and can be transmitted to humans. The viral glycoprotein interacts with host cell receptors to facilitate entry. Fatma et al. [7] investigated the biochemical basis for the interaction between the RVFV glycoprotein and the host receptor low-density lipoprotein receptor-related protein 1 (LRP1). The study used AlphaFold 3 to model the structure of the RVFV glycoprotein in complex with LRP1, providing structural insights into the binding interface.
The AlphaFold 3 model revealed that the RVFV glycoprotein binds to LRP1 through a conserved domain within the glycoprotein ectodomain. This interaction is critical for viral entry, and the structural model was used to identify potential sites for therapeutic intervention. The study highlights the utility of AlphaFold 3 for modeling host-pathogen interactions at the molecular level.
4.5 Ebola Virus Glycoprotein Entry Inhibitors
Ebola virus (EBOV) is a filovirus that causes severe hemorrhagic fever. The viral glycoprotein (GP) mediates cell entry and is the primary target for neutralizing antibodies and entry inhibitors. Ait Lahcen et al. [8] combined computational and experimental approaches to identify natural product-based inhibitors of EBOV GP-mediated entry. The study used AlphaFold 3 to model the structure of the EBOV GP in its prefusion conformation and then performed docking simulations to screen a library of natural products.
The computational approach identified several compounds that bound to the EBOV GP with high predicted affinity. These compounds were subsequently validated in cell-based entry assays, demonstrating that the AlphaFold 3-guided screening approach can identify active compounds. The structural models provided insights into the binding modes of the identified inhibitors, which can be used to guide further optimization.
4.6 Nipah Virus Attachment Glycoprotein
Nipah virus (NiV) is a paramyxovirus that causes severe respiratory and neurological disease in humans and can infect a range of animal species, including pigs and horses. The viral attachment glycoprotein (G protein) mediates binding to host cell receptors. de Oliveira et al. [9] used molecular docking and dynamics simulations to investigate the interaction of natural limonoids with the NiV attachment glycoprotein. The study employed AlphaFold 3 to generate a structural model of the NiV G protein, which was then used for docking studies.
The simulations revealed that the limonoids bind to a conserved pocket on the NiV G protein, potentially interfering with receptor binding. The structural models provided by AlphaFold 3 were essential for identifying the binding site and for rationalizing the observed binding affinities.
5. Workflow for AlphaFold 3 in Viral Glycoprotein Modeling
The application of AlphaFold 3 to viral glycoprotein modeling typically follows a structured workflow. The following diagram illustrates the key steps in this workflow.
flowchart TD
    A[Input: Viral Glycoprotein Sequence] --> B[MSA Generation]
    B --> C[Pairformer Processing]
    C --> D[Diffusion Module Initialization]
    D --> E[Iterative Denoising]
    E --> F[Structure Output]
    F --> G[Confidence Assessment]
    G --> H{Structure Quality}
    H -->|High pLDDT| I[Ligand Docking]
    H -->|Low pLDDT| J[Loop Modeling or Refinement]
    I --> K[Docking Pose Generation]
    K --> L[Binding Affinity Prediction]
    L --> M[Experimental Validation]
    M --> N[Lead Optimization]
The workflow begins with the viral glycoprotein sequence. A multiple sequence alignment is generated to identify homologous sequences and capture evolutionary information. The Pairformer module processes the single and pairwise representations, and the diffusion module, conditioned on that output, starts from random atomic coordinates and iteratively denoises them into a structure. Confidence metrics are then assessed. Structures with high pLDDT scores proceed to ligand docking, while low-confidence structures may require loop modeling or other refinement first. Docking poses are generated and scored for predicted binding affinity, the top-ranked predictions are validated experimentally, and validated hits proceed to lead optimization.
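The confidence-based branch in this workflow can be sketched as a simple triage function. The pLDDT threshold of 70, the choice to average over binding-pocket residues only, and the route labels are all assumptions made for illustration; appropriate cutoffs depend on the target and should be chosen per project.

```python
import numpy as np


def triage(plddt_per_residue, pocket_residues, threshold=70.0):
    """Route a model to docking or refinement from its pocket pLDDT.

    plddt_per_residue: per-residue pLDDT values on a 0-100 scale.
    pocket_residues: zero-based indices of residues lining the putative pocket.
    """
    scores = np.asarray(plddt_per_residue, dtype=float)
    pocket_mean = float(scores[list(pocket_residues)].mean())
    route = "ligand_docking" if pocket_mean >= threshold else "refinement"
    return route, pocket_mean


# Hypothetical six-residue model with one poorly predicted loop (indices 2-3).
plddt = [92.0, 88.0, 45.0, 50.0, 90.0, 85.0]
confident_pocket = triage(plddt, pocket_residues=[0, 1, 4])
uncertain_pocket = triage(plddt, pocket_residues=[2, 3])
print(confident_pocket, uncertain_pocket)
```

Restricting the check to the pocket avoids discarding a model whose only low-confidence regions are flexible termini far from the site of interest.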
6. Limitations and Considerations
6.1 Conformational Flexibility
A significant limitation of AlphaFold 3 is its treatment of protein conformational flexibility. The model is trained on static structures and does not explicitly account for the dynamic nature of protein-ligand interactions. This limitation is particularly relevant for viral glycoproteins, which undergo large conformational changes during the fusion process. The model may predict the most stable conformation of the protein, which may not correspond to the biologically relevant state for a given interaction.
6.2 Ligand Representation
The representation of ligands in AlphaFold 3 is based on a fixed set of chemical features. The model may not accurately predict the binding poses of highly flexible ligands or ligands that adopt multiple binding modes. The accuracy of the predictions is also dependent on the quality of the training data for the specific ligand class.
6.3 Validation Requirements
All predictions generated by AlphaFold 3 require experimental validation. The model can produce high-confidence predictions that are structurally plausible, but these predictions may not always correspond to the biologically relevant binding mode. Experimental validation using techniques such as X-ray crystallography, cryo-EM, or binding assays is essential for confirming the predicted interactions.
7. Conclusion
AlphaFold 3 represents a significant advancement in the field of structural biology, with the capacity to predict protein-ligand complexes and viral glycoprotein structures, often with high accuracy. The architecture of the model, based on the Pairformer and diffusion modules, enables the simultaneous processing of protein and non-protein components. The application of AlphaFold 3 to veterinary and zoonotic virology has supported the structural characterization of viral fusion proteins and the identification of candidate small molecule inhibitors. The integration of AlphaFold 3 predictions into molecular docking workflows offers a route to structure-based virtual screening for targets that lack experimental structures, provided that the predictions are validated experimentally. Continued development of the model and its application to veterinary pathogens will further advance the field of structural virology and molecular diagnostics.
References
[1] Krokidis MG, Koumadorakis DE, Lazaros K et al. AlphaFold3: An Overview of Applications and Performance Insights. Int J Mol Sci. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40332289/
[2] He XH, Li JR, Shen SY et al. AlphaFold3 versus experimental structures: assessment of the accuracy in ligand-bound G protein-coupled receptors. Acta Pharmacol Sin. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/39643640/
[3] Escobedo N, Saldaño T, Mac Donagh J et al. Revealing Missing Protein-Ligand Interactions Using AlphaFold Predictions. J Mol Biol. 2024. URL: https://pubmed.ncbi.nlm.nih.gov/39510344/
[4] Wong F, Krishnan A, Zheng EJ et al. Benchmarking AlphaFold-enabled molecular docking predictions for antibiotic discovery. Mol Syst Biol. 2022. URL: https://pubmed.ncbi.nlm.nih.gov/36065847/
[5] Abbou H, Gaouzi Z, Zegrari R et al. Identification of cannabichromevarin as a potent stabilizer of the measles virus prefusion F protein: structural insights from long-timescale molecular dynamics. Sci Rep. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42286054/
[6] Close B, La Rosa B, Ong C et al. In silico prioritization and cheminformatics identify structurally diverse small-molecule inhibitors of Lassa virus glycoprotein-mediated membrane fusion. SLAS Discov. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42285356/
[7] Fatma F, Price DA, Rush RE et al. Biochemical basis for LRP1 interaction with Rift Valley fever virus glycoprotein and its role in viral entry. J Mol Biol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42264132/
[8] Ait Lahcen N, Yang L, Chen W et al. Natural product-based Ebola virus entry inhibitors targeting the viral glycoprotein: A combined computational and experimental study. Antiviral Res. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42263866/
[9] de Oliveira VM, Marinho MM, Dos Santos HS et al. Molecular Docking and Dynamics Simulations of Natural Limonoids Interacting with the Nipah Virus Attachment Glycoprotein. Curr Drug Targets. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42261155/
[10] Amorim AMB, Marques-Pereira C, Almeida T et al. ViralBindPredict: empowering viral protein-ligand binding sites through deep learning and protein sequence-derived insights. Gigascience. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41578956/