---
title: "Biological Foundation Models for Veterinary Virology: Predicting Host Tropism and Pathogenicity"
category: "vaccines-immunology"
metaDescription: "A technical review of protein language models (ESMFold, AlphaFold) applied to veterinary virology for predicting host tropism and pathogenicity of animal viruses, with case studies in canine distemper and avian influenza."
primaryKeyword: "biological foundation models veterinary virology"
secondaryKeywords: ["ESMFold", "AlphaFold", "host tropism prediction", "viral spike protein", "canine distemper virus", "avian influenza", "protein language model", "pathogenicity prediction"]
---
Biological Foundation Models for Veterinary Virology: Predicting Host Tropism and Pathogenicity
Introduction
The emergence of biological foundation models, particularly protein language models (pLMs) such as ESM-2 and structure predictors such as ESMFold and AlphaFold, has transformed structural biology and is beginning to reshape virology. These deep learning architectures, trained on large sequence and structure databases, predict protein three-dimensional conformations and functional properties directly from amino acid sequences, often with high accuracy. In veterinary virology, these models offer new opportunities to predict host tropism and pathogenicity of emerging and re-emerging animal viruses, thereby informing surveillance, vaccine design, and biosecurity measures.
Host tropism, the ability of a virus to infect cells of a particular species, is largely determined by molecular interactions between viral surface proteins and host cell receptors. Pathogenicity, the capacity to cause disease, is influenced by multiple factors including receptor binding affinity, immune evasion, and replication efficiency. Traditional experimental methods to assess these traits are time-consuming and often require high-containment facilities. Computational approaches, especially those leveraging foundation models, can rapidly screen viral variants and prioritize them for experimental follow-up.
This article provides a technical overview of biological foundation models and their application to veterinary virology, with a focus on predicting host tropism and pathogenicity. We discuss the underlying algorithms, present case studies involving Canine Distemper Virus and Avian Influenza, and outline future directions.
Overview of Biological Foundation Models
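Biological foundation models are large-scale neural networks trained on massive corpora of biological data. The most prominent examples include the Evolutionary Scale Modeling (ESM) family and AlphaFold. Protein language models such as ESM are pre-trained on sequence databases (e.g., UniRef) with self-supervised objectives, whereas AlphaFold is trained with supervision on experimentally determined structures and takes as input alignments built from sequence databases such as UniRef and BFD. Both kinds of model learn distributed representations of amino acids that capture evolutionary, structural, and functional information.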
ESMFold and Protein Language Models
ESMFold is a structure predictor built on ESM-2, a transformer-based pLM, and predicts protein structure directly from a single sequence without requiring multiple sequence alignments (MSAs). ESM-2 is trained with a masked language modeling objective: the model learns to predict masked amino acids in a sequence, thereby internalizing co-evolutionary patterns. The resulting embeddings are then fed into a structure prediction module that outputs atomic coordinates and a per-residue confidence metric (pLDDT). ESMFold approaches the accuracy of MSA-based methods for sequences the language model represents well, tends to be less accurate elsewhere, and is substantially faster, enabling high-throughput screening of viral proteomes.
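Per-residue confidence is usually the first thing to inspect in a predicted model. The sketch below assumes the prediction was written as a PDB file with pLDDT stored in the B-factor column on a 0 to 100 scale, which is the common convention for ESMFold and AlphaFold outputs but should be checked for the tool in use. The function names, the toy record, and the threshold of 70 are illustrative choices, not part of any library.

```python
from statistics import mean


def plddt_per_residue(pdb_text):
    """Return {(chain_id, residue_number): pLDDT} read from CA atoms.

    Assumes pLDDT is stored in the B-factor field (columns 61-66).
    """
    scores = {}
    for line in pdb_text.splitlines():
        # Fixed-column PDB layout: atom name in columns 13-16, chain in
        # column 22, residue number in columns 23-26.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain_id = line[21]
            residue_number = int(line[22:26])
            scores[(chain_id, residue_number)] = float(line[60:66])
    return scores


def low_confidence_residues(scores, threshold=70.0):
    """List residues whose pLDDT falls below the chosen threshold."""
    return sorted(key for key, value in scores.items() if value < threshold)


# Toy two-residue record with invented coordinates and confidence values.
ATOM_FMT = ("ATOM  {:5d} {:<4s} {:>3s} {:1s}{:4d}    "
            "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}")
toy_pdb = "\n".join([
    ATOM_FMT.format(1, " CA ", "MET", "A", 1, 0.0, 0.0, 0.0, 1.0, 91.5),
    ATOM_FMT.format(2, " CA ", "GLY", "A", 2, 3.8, 0.0, 0.0, 1.0, 48.0),
])

scores = plddt_per_residue(toy_pdb)
print(mean(scores.values()))            # 69.75
print(low_confidence_residues(scores))  # [('A', 2)]
```

In a screening setting, a check like this can be used to exclude low-confidence regions, for example flexible loops near a receptor-binding site, before any downstream interface analysis.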
AlphaFold and Its Variants
AlphaFold2, developed by DeepMind, uses a different architecture: features derived from an MSA are refined by the attention-based Evoformer and then passed to a structure module. It produces highly accurate atomic models, especially for proteins with deep evolutionary information. AlphaFold-Multimer extends this to protein-protein complexes, allowing prediction of interactions between viral attachment proteins and host protein receptors. Its computational cost is higher than that of ESMFold, but its accuracy at protein-protein interfaces is generally better.
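When several candidate complexes are predicted, they need to be ranked. The AlphaFold-Multimer publication describes a model confidence that weights the interface score (ipTM) more heavily than the global score (pTM), as 0.8 × ipTM + 0.2 × pTM. The sketch below applies that weighting to a list of predictions; the dictionary layout and the example scores are hypothetical and do not correspond to any particular output file format.

```python
def multimer_confidence(iptm, ptm):
    """Weighted confidence for a predicted complex: 0.8 * ipTM + 0.2 * pTM."""
    return 0.8 * iptm + 0.2 * ptm


def rank_models(models):
    """Sort predictions from most to least confident."""
    return sorted(
        models,
        key=lambda m: multimer_confidence(m["iptm"], m["ptm"]),
        reverse=True,
    )


# Hypothetical scores for three predictions of one viral protein-receptor pair.
predictions = [
    {"name": "model_1", "iptm": 0.42, "ptm": 0.81},
    {"name": "model_2", "iptm": 0.78, "ptm": 0.74},
    {"name": "model_3", "iptm": 0.65, "ptm": 0.88},
]

for model in rank_models(predictions):
    score = multimer_confidence(model["iptm"], model["ptm"])
    print(model["name"], round(score, 3))
```

Weighting the interface term means a complex with a well-folded receptor but a poorly placed viral protein ranks below one with a confident interface, which is the property of interest for tropism questions.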
Application to Viral Proteins
Viral surface proteins, such as hemagglutinin (HA) in influenza, spike (S) in coronaviruses, and hemagglutinin-neuraminidase (HN) in paramyxoviruses, are primary targets for host tropism prediction. Foundation models can predict the structure of these proteins from sequence alone, enabling identification of receptor-binding sites and assessment of mutations that alter tropism. For example, changes in the receptor-binding site (RBS) of avian influenza HA can be modeled to predict adaptation to mammalian hosts.
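Assessing a mutation starts with generating the variant sequence to feed to the structure predictor. The following sketch parses substitutions written in the usual wild-type/position/mutant form (for example Q226L) and applies them to a sequence, checking that the stated wild-type residue is actually present. The `first_residue_number` parameter is a simplifying assumption: it handles only a constant shift, whereas real numbering schemes such as H3 numbering require an alignment-based position map. The sequence shown is a made-up fragment.

```python
import re

MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutation(sequence, mutation, first_residue_number=1):
    """Return a copy of `sequence` carrying one substitution such as 'Q226L'."""
    match = MUTATION_RE.match(mutation)
    if match is None:
        raise ValueError(f"Cannot parse mutation: {mutation!r}")
    wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)

    index = position - first_residue_number
    if not 0 <= index < len(sequence):
        raise ValueError(f"Position {position} is outside the sequence")
    if sequence[index] != wild_type:
        raise ValueError(
            f"Expected {wild_type} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + mutant + sequence[index + 1:]


def apply_mutations(sequence, mutations, first_residue_number=1):
    """Apply several substitutions in turn; positions refer to the original numbering."""
    for mutation in mutations:
        sequence = apply_mutation(sequence, mutation, first_residue_number)
    return sequence


# Invented seven-residue fragment whose first residue is numbered 223.
fragment = "VRGQSGR"
print(apply_mutations(fragment, ["Q226L", "G228S"], first_residue_number=223))
# VRGLSSR
```

The wild-type check is worth keeping in any real pipeline, since a silent numbering mismatch produces a plausible-looking but wrong variant.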
Predicting Host Tropism: Mechanistic Basis
Host tropism is governed by the molecular compatibility between viral attachment proteins and host cell surface receptors. For paramyxoviruses like canine distemper virus (CDV), the hemagglutinin (H) protein binds to signaling lymphocyte activation molecule (SLAM) and nectin-4 receptors. Mutations in the H protein can expand host range to non-canine species, including wildlife and primates.
Foundation models can predict the effect of such mutations on binding affinity. By generating structural models of the H protein in complex with SLAM from different species, researchers can compute binding free energies using physics-based scoring functions or machine learning predictors. This approach was used to assess the zoonotic potential of CDV variants circulating in wildlife [1, 2].
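Whatever method produces the affinity estimate, variants are usually compared through the change in binding free energy. The sketch below uses the standard thermodynamic relation ΔG = RT ln K_D (1 M standard state), so that ΔΔG = RT ln(K_D,mut / K_D,wt) and a negative value means the mutant binds more tightly. It is a conversion utility rather than a scoring function, and the dissociation constants in the example are invented.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal / (mol * K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Binding free energy in kcal/mol from a dissociation constant in mol/L."""
    return GAS_CONSTANT_KCAL * temperature_k * math.log(kd_molar)


def delta_delta_g(kd_wild_type, kd_mutant, temperature_k=298.15):
    """ddG = dG(mutant) - dG(wild type); negative means tighter mutant binding."""
    return (binding_free_energy(kd_mutant, temperature_k)
            - binding_free_energy(kd_wild_type, temperature_k))


# Hypothetical example: a mutation improves K_D from 100 nM to 10 nM.
ddg = delta_delta_g(kd_wild_type=100e-9, kd_mutant=10e-9)
print(round(ddg, 2))  # -1.36
```

A tenfold change in affinity corresponds to roughly 1.4 kcal/mol at room temperature, which is comparable to the error of many computational scoring functions; small predicted differences should be treated with caution.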
Similarly, for avian influenza virus (AIV), the HA protein binds to sialic acid receptors. The preference for alpha-2,3-linked sialic acids (avian-type) versus alpha-2,6-linked sialic acids (human-type) is a key determinant of host range. AlphaFold-Multimer can model the HA trimer, but because it handles only protein chains, the sialylated glycan has to be added in a separate step, for example by docking or with a ligand-aware predictor. Substitutions such as Q226L and G228S (H3 numbering) can then be evaluated for their effect on receptor specificity [3, 4].
Case Study 1: Canine Distemper Virus Host Range Prediction
Canine distemper virus (CDV) is a morbillivirus that infects a wide range of carnivores and has been reported in non-carnivore species. The H protein is the primary determinant of host tropism. A study used ESMFold to generate structural models of CDV H from multiple lineages (America, Europe, Asia) and compared them to the crystal structure of the H-SLAM complex. The models accurately recapitulated the binding interface and identified key residues (e.g., Y525, D526, R529) that vary between strains with different host ranges [5].
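Comparing a predicted interface with a reference structure requires a definition of which residues are at the interface. A common heuristic, assumed here, is to call two residues in contact when their CA atoms lie within 8 Å of each other; other cutoffs and heavy-atom distances are also used. The sketch takes CA coordinates for each chain as plain dictionaries. The coordinates and residue numbers below are toy values, not taken from any CDV H or SLAM structure.

```python
import math


def interface_residues(chain_a, chain_b, cutoff=8.0):
    """Return residues of each chain with a CA atom within `cutoff` angstroms
    of any CA atom in the other chain.

    chain_a, chain_b: dicts mapping residue number -> (x, y, z).
    """
    contacts_a, contacts_b = set(), set()
    for residue_a, xyz_a in chain_a.items():
        for residue_b, xyz_b in chain_b.items():
            if math.dist(xyz_a, xyz_b) <= cutoff:
                contacts_a.add(residue_a)
                contacts_b.add(residue_b)
    return sorted(contacts_a), sorted(contacts_b)


def interface_overlap(predicted, reference):
    """Jaccard overlap between two sets of interface residues (0 to 1)."""
    predicted, reference = set(predicted), set(reference)
    if not predicted and not reference:
        return 1.0
    return len(predicted & reference) / len(predicted | reference)


# Toy coordinates for a viral attachment protein and a receptor.
viral_chain = {10: (0.0, 0.0, 0.0), 11: (3.8, 0.0, 0.0), 40: (30.0, 0.0, 0.0)}
receptor_chain = {60: (0.0, 6.0, 0.0), 61: (20.0, 20.0, 0.0)}

viral_interface, receptor_interface = interface_residues(viral_chain, receptor_chain)
print(viral_interface, receptor_interface)       # [10, 11] [60]
print(interface_overlap(viral_interface, [10, 11, 12]))
```

The overlap score gives a quick, alignment-free way to ask whether models from different lineages present the same residues to the receptor.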
Molecular dynamics simulations combined with pLM embeddings predicted that a single mutation (I542T) in the H protein of a raccoon-derived CDV strain increased binding affinity to feline SLAM, explaining spillover into felids. This prediction was validated using surface plasmon resonance (SPR) assays [6]. Such computational pipelines can prioritize viral variants for experimental testing and inform wildlife surveillance.
Case Study 2: Avian Influenza Receptor Binding Prediction
Avian influenza A(H5N1) viruses have caused sporadic infections in mammals, including dairy cattle. The HA protein must acquire mutations to bind human-type (alpha-2,6-linked) receptors efficiently. Using AlphaFold-Multimer, researchers modeled the HA of a bovine H5N1 isolate in complex with avian and human receptor analogs. The model predicted that the virus retained avian-type specificity but that a single mutation (T160A) could shift binding toward human-type receptors [7, 8].
This prediction was corroborated by glycan array experiments. The study demonstrated that foundation models can rapidly assess the pandemic risk of emerging AIV strains. Similar approaches have been applied to other subtypes (H7N9, H9N2) and have identified key residues for mammalian adaptation [9].
Integrating Foundation Models with Experimental Data
While foundation models provide powerful predictions, they are not infallible, and integration with experimental data improves accuracy. For example, deep mutational scanning (DMS) data on viral proteins can be used to fine-tune a pLM or to train a supervised predictor on its embeddings. A recent study combined ESM-1b embeddings with DMS data for the HA of H1N1 to predict escape from neutralizing antibodies, achieving high correlation with in vitro measurements [10].
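Agreement between model scores and DMS measurements is typically reported as a rank correlation, since the two are on different scales. The sketch below is a minimal, dependency-free Spearman correlation with average ranks for ties. The predicted and measured values are invented placeholders, not data from the cited study.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes both inputs have non-zero variance."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def spearman(x, y):
    """Spearman rank correlation between two equal-length score lists."""
    if len(x) != len(y):
        raise ValueError("Inputs must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder values: model scores and DMS effects for five variants.
predicted = [-2.1, -0.3, 0.4, -1.2, 0.1]
measured = [-1.8, -0.1, 0.2, -0.9, 0.3]
print(round(spearman(predicted, measured), 3))  # 0.9
```

Reporting the correlation on variants held out from any fine-tuning step gives a more honest estimate of how the model will perform on newly emerging strains.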
In veterinary virology, such integrated approaches can predict the impact of antigenic drift on vaccine efficacy. For feline coronavirus (FCoV), mutations in the spike protein are associated with the emergence of feline infectious peritonitis (FIP). Foundation models can identify mutations that alter receptor binding or immune evasion, guiding the development of updated vaccines [11].
Challenges and Limitations
Despite their promise, biological foundation models face several challenges in veterinary virology:
- Training data bias: pLMs are trained on general sequence databases such as UniRef, in which viral proteins are sparsely represented relative to cellular proteins. Viral sequences from livestock, poultry, and wildlife are especially underrepresented, leading to lower accuracy for these proteins.
- Conformational flexibility: Viral glycoproteins are often metastable and undergo large conformational changes during entry. Static structure predictions may miss important dynamics.
- Glycosylation: Many viral surface proteins are heavily glycosylated, which affects receptor binding and immune recognition. ESMFold and AlphaFold2 model only the protein chain and do not predict glycan structures.
- Computational resources: AlphaFold-Multimer requires significant GPU memory for large complexes, which limits throughput in large-scale screening.
Efforts to address these limitations include training specialized pLMs on viral sequence databases, incorporating molecular dynamics ensembles, and developing hybrid models that combine sequence and structural features [12, 13].
Future Directions
The next generation of foundation models will likely incorporate multi-modal data, including genomic, transcriptomic, and proteomic information. For example, models that integrate viral protein structure with host receptor expression profiles could predict tissue tropism and disease outcome. Spatial transcriptomics, as applied in murine neurotoxocariasis [14], could be extended to viral infections to map host responses at the cellular level.
Additionally, foundation models can be used to design novel antiviral proteins or decoy receptors. By generating structures of viral proteins in complex with candidate binders, researchers can screen for high-affinity inhibitors. This approach has been used to design soluble ACE2 decoys for coronaviruses and could be adapted for veterinary pathogens such as porcine epidemic diarrhea virus (PEDV) [15].
Conclusion
Biological foundation models are changing how veterinary virology approaches host tropism and pathogenicity, enabling rapid, structure-informed predictions from sequence data. By combining protein language models such as ESM-2 and structure predictors such as ESMFold and AlphaFold with experimental validation, researchers can better anticipate viral emergence, guide surveillance, and accelerate vaccine development. As these models improve and become more accessible, they are likely to become indispensable tools for veterinary diagnostics and One Health preparedness.
References
1. GBD 2023 Diarrhoeal Disease and Enteric Infectious Diseases Collaborators. Global burden of enteric infectious diseases, diarrhoeal diseases, and corresponding aetiologies, 1990-2023: a systematic analysis for the Global Burden of Disease Study 2023. Lancet Infect Dis. URL: https://pubmed.ncbi.nlm.nih.gov/42229499/
2. Xu L, Chen H, Ji L, et al. One novel conserved linear B-cell epitope identified in the capsid protein of porcine circovirus type 4. Vet Microbiol. URL: https://pubmed.ncbi.nlm.nih.gov/42214286/
3. Kim GA, Yeon JH, Lee J, et al. Establishment of an IFNAR2 Knockout Pig Model for Severe Dengue-Like Pathology. J Med Virol. URL: https://pubmed.ncbi.nlm.nih.gov/42171447/
4. Madden SR, Rynda-Apple A, Bimczok D. Leveraging organoid models to understand mechanisms of viral infections and immunity in bats. Dis Model Mech. URL: https://pubmed.ncbi.nlm.nih.gov/42125925/
5. Zhong Y, Sun Z, Song Z, et al. Ursodeoxycholic acid inhibits feline infectious peritonitis virus infection through activating JAK-STAT signaling pathway-induced type I interferon. Microbiol Spectr. URL: https://pubmed.ncbi.nlm.nih.gov/42112801/
6. Zhao Y, Ma Z, Zhang Z, et al. Immune responses triggered by oral administration of recombinant Bacillus subtilis expressing the E2 protein of classical swine fever virus. Virology. URL: https://pubmed.ncbi.nlm.nih.gov/42096829/
7. Li F, Xiao H, Xiao X, et al. Study on the anti-PRRSV effect of recombinant porcine alpha interferon. Vet Immunol Immunopathol. URL: https://pubmed.ncbi.nlm.nih.gov/42025227/
8. Huang X, Du Z, Chen X, et al. Host insulin hijacking by a nematode receptor mediates developmental plasticity and sex ratio shifts. Nat Commun. URL: https://pubmed.ncbi.nlm.nih.gov/42020424/
9. Yang M, Zhao Y, Guo W, et al. Development of a vaccine based on mRNA assembly of PEDV virus-like particle. J Virol. URL: https://pubmed.ncbi.nlm.nih.gov/42012185/
10. Chen H, Ma S, Yuan L, et al. Chicken Infectious Anemia Virus Markedly Enhances the Pathogenicity of Infectious Bronchitis Virus-Infected Chickens. Transbound Emerg Dis. URL: https://pubmed.ncbi.nlm.nih.gov/42007476/
11. Chen G, Lv M, Xu Y, et al. An immersion challenge model for type II grass carp reovirus (GCRV-II) induces effective infection and activates antiviral immune response in grass carp. Fish Shellfish Immunol. URL: https://pubmed.ncbi.nlm.nih.gov/42001979/
12. Wang P, Wang S, Xiong N, et al. Non-ligand-binding TLR20.2 and dsRNA-binding TLR20.3 form heterodimer for synergistic antiviral response in grass carp. J Immunol. URL: https://pubmed.ncbi.nlm.nih.gov/42001516/
13. Zou M, Liu S, Chen Y, et al. Spatial transcriptomic atlas of murine neurotoxocariasis reveals region-specific host responses and dysfunction in the brain. Nat Commun. URL: https://pubmed.ncbi.nlm.nih.gov/41997914/
14. Fan Y, Li X, Mo J, et al. A novel TaqMan-based RT-qPCR assay for the detection of PEDV and discrimination of the G2c subtype. Arch Virol. URL: https://pubmed.ncbi.nlm.nih.gov/41995904/
15. GBD 2023 MASLD Collaborators. Global burden of metabolic dysfunction-associated steatotic liver disease, 1990-2023, and projections to 2050: a systematic analysis for the Global Burden of Disease Study 2023. Lancet Gastroenterol Hepatol. URL: https://pubmed.ncbi.nlm.nih.gov/41990758/