Deep Learning for Predicting Viral Host-Range Transitions and Zoonotic Potential
Introduction
The emergence of viral pathogens from animal reservoirs into new host species represents a fundamental challenge in veterinary virology and public health preparedness [1, 2]. Predicting which viral strains possess the capacity for host-range transitions, particularly from wildlife or domestic animal reservoirs to other species, is a critical objective for preemptive surveillance and risk assessment [3, 4]. Traditional experimental approaches, including in vivo challenge studies and in vitro receptor binding assays, are resource-intensive and cannot be scaled to the vast diversity of circulating viruses [5, 6]. Deep learning methodologies have emerged as powerful computational tools capable of extracting predictive features from viral genomic sequences, protein structures, and host interaction data [7, 8]. These models can identify subtle molecular signatures associated with host adaptation and zoonotic potential that may be imperceptible to conventional phylogenetic or statistical analyses [9].
This review examines the biological principles underlying host-range restriction, the computational architectures employed for predictive modeling, and the integration of structural biology data into deep learning frameworks for assessing zoonotic risk.
Biological Determinants of Host-Range Restriction
Receptor Binding and Entry Mechanisms
The initial barrier to cross-species transmission is often the interaction between viral surface glycoproteins and host cell receptors [3, 10]. For influenza A viruses, the hemagglutinin (HA) protein mediates binding to sialic acid receptors, with avian strains preferentially binding alpha-2,3-linked sialic acids and human-adapted strains preferentially binding alpha-2,6-linked sialic acids [3, 11]. Single amino acid substitutions in the receptor binding site can shift this preference, enabling avian viruses to bind human-type receptors [3, 12]. Deep mutational scanning combined with deep learning has enabled systematic mapping of these permissive mutations [13].
For henipaviruses such as Nipah virus, the attachment glycoprotein (G) binds to ephrin-B2 and ephrin-B3 receptors, which are highly conserved across mammalian species [14, 15, 16]. This conservation explains the broad host range observed for these viruses, which can infect multiple mammalian orders [14, 16]. Similarly, filoviruses utilize NPC1 as an intracellular receptor, with structural compatibility across diverse mammalian hosts [17, 18].
Intrinsic Host Factors and Restriction Elements
Beyond receptor availability, intracellular host factors impose significant barriers to cross-species transmission [6, 10]. The human BTN3A3 protein has been identified as a potent restriction factor for avian influenza A viruses, and evasion of this restriction is associated with increased zoonotic potential [10]. Pyroptosis pathways, mediated by gasdermin family proteins, exhibit species-specific activation patterns that influence host susceptibility to zoonotic pathogens [6].
The innate immune system represents a major selective pressure driving viral adaptation during host-range transitions [1, 19]. Comparative proteomic analyses have revealed distinct patterns of virus-host interaction across different animal species, with implications for predicting which viruses may successfully establish infection in new hosts [1, 17].
Deep Learning Architectures for Host Prediction
Sequence-Based Classification Models
Deep neural networks have been applied to classify viral sequences according to their host of origin, with the underlying assumption that sequences from viruses capable of infecting multiple hosts will exhibit ambiguous classification patterns [9]. Convolutional neural networks (CNNs) and bidirectional long short-term memory (LSTM) networks have been employed to process nucleotide and amino acid sequences [8, 9].
A landmark study by Hatibi et al. trained deep learning models on 848,630 unique influenza A virus sequences classified into avian, human, and swine host categories [9]. Models using mRNA sequences as input achieved higher prediction accuracy than those using amino acid sequences, suggesting that codon usage patterns contain host-specific information beyond the encoded protein sequence [9]. UMAP visualization of the latent space revealed that viral sequences clustered according to host of origin, with pandemic zoonotic strains localizing at the margins between host clusters [9]. Critically, host prediction for pandemic zoonotic sequences exhibited low prediction accuracy, supporting the hypothesis that ambiguously classified sequences bear features associated with cross-species infectivity [9].
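The idea of ambiguous classification can be made concrete with a small sketch. The code below is not the scoring used in [9]; it assumes a classifier that emits one raw score (logit) per host class and uses normalized Shannon entropy as one possible ambiguity measure. The function names and the logit values are hypothetical.

```python
import math

HOSTS = ("avian", "human", "swine")


def softmax(logits):
    """Convert raw scores to probabilities; subtract the max for numerical stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ambiguity(probs):
    """Normalized Shannon entropy in [0, 1]: 0 is fully confident, 1 is uniform."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(probs))


# Hypothetical logits for two sequences (illustrative values, not real model output).
confident = softmax([6.0, 0.5, 0.2])  # clearly assigned to the first host class
ambiguous = softmax([1.1, 1.0, 0.9])  # nearly equal support for all three hosts

for name, probs in (("confident", confident), ("ambiguous", ambiguous)):
    best_host = HOSTS[probs.index(max(probs))]
    print(f"{name}: top host={best_host}, ambiguity={ambiguity(probs):.3f}")
```

Under this assumption, sequences with ambiguity near 1 would be the ones sitting between host clusters and would be candidates for closer inspection.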
Protein-lncRNA Interaction Networks
Viral-host protein-lncRNA interactions (VHPLIs) represent an underexplored dimension of host-range determination [8]. Zhang et al. developed CBIL-VHPLI, a deep learning framework combining CNN and bidirectional LSTM modules with transfer learning, to predict these interactions [8]. The model achieved an accuracy of approximately 0.9 on external validation datasets, with fine-tuning on viral protein-human lncRNA datasets improving accuracy to 0.946 [8]. Case studies demonstrated 91.6% reproducibility with RIP-Seq experimental results, and the model successfully predicted interactions between human lncRNA PIK3CD-AS2 and the nonstructural protein 1 (NS1) of H5N1 influenza virus, validated by RNA pull-down experiments [8].
Antigenic Evolution Mapping
Deep learning approaches have also been applied to predict antigenic evolution, which is closely linked to host-range transitions [13]. Yang et al. developed models to predict hemagglutination inhibition titers from HA sequences, enabling mapping of antigenic drift that may facilitate immune evasion in new host species [13].
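To show what a model that predicts hemagglutination inhibition (HI) titers can be used for, the sketch below converts titers into antigenic distances using the common antigenic-cartography convention that each twofold drop relative to the homologous titer counts as one antigenic unit. This is a general convention, not necessarily the procedure of Yang et al.; the function name and titer values are hypothetical.

```python
import math


def antigenic_distance(homologous_titer, heterologous_titer):
    """Log2 fold-drop in HI titer relative to the homologous (same-strain) titer."""
    if homologous_titer <= 0 or heterologous_titer <= 0:
        raise ValueError("titers must be positive")
    return math.log2(homologous_titer / heterologous_titer)


# Hypothetical HI titers for an antiserum raised against strain A,
# measured against strains A, B, and C (illustrative values only).
titers = {"A": 1280, "B": 320, "C": 40}

distances = {
    strain: antigenic_distance(titers["A"], titer)
    for strain, titer in titers.items()
}
print(distances)
```

Predicted titers could be passed through the same conversion, so that sequences drifting away from a reference antiserum show up as growing distances.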
Structural Biology Integration
Receptor Complex Modeling
Three-dimensional structural comparison of receptor complexes across species provides mechanistic insight into host-range determinants [3, 18]. Cryo-electron microscopy (cryo-EM) structures of viral glycoproteins in complex with host receptors from different species enable identification of key contact residues that govern binding specificity [18]. For influenza viruses, structural analysis of HA in complex with avian and mammalian sialic acid receptors reveals the conformational changes required for host adaptation [3].
Jin et al. demonstrated that a double mutation in the hemagglutinin of clade 2.3.4.4b H5Ny viruses enhanced binding to both human-type and sialyl Lewis X (SLe(X)) receptors, providing a structural basis for increased zoonotic risk [3]. These findings underscore the importance of integrating structural data into predictive models.
Deep Learning for Structure Prediction
AlphaFold and related deep learning methods have revolutionized the prediction of viral protein structures, enabling structural analysis of viruses for which experimental structures are unavailable [20]. Structure-based drug design approaches have been applied to identify potential antiviral compounds targeting conserved viral proteins across multiple host species [20]. The integration of predicted structures into host-range prediction models represents an active area of development.
Workflow for Zoonotic Risk Assessment
The following diagram illustrates a computational workflow for predicting viral host-range transitions using deep learning:
flowchart TD
A[Viral Sequence Collection] --> B[Feature Extraction]
B --> C{Deep Learning Model}
C --> D[Host Classification]
C --> E[Receptor Binding Prediction]
C --> F[Restriction Factor Evasion]
D --> G[Ambiguity Analysis]
E --> H[Structural Modeling]
F --> I[Host Factor Interaction]
G --> J[Zoonotic Risk Score]
H --> J
I --> J
J --> K[Surveillance Prioritization]
J --> L[Experimental Validation]
The workflow begins with viral sequence collection from public databases, followed by feature extraction including codon usage, amino acid composition, and structural features [8, 9]. Deep learning models then perform host classification, receptor binding prediction, and assessment of restriction factor evasion [9, 10]. Ambiguity in host classification, combined with structural modeling of receptor complexes and host factor interactions, generates a composite zoonotic risk score [9]. High-risk viruses are prioritized for surveillance and experimental validation.
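The final aggregation step of the workflow can be sketched as follows. The source does not specify how the three branches are combined, so the weighted mean, the weights, the 0.5 threshold, and all names below are assumptions chosen only for illustration; each component score is assumed to be already scaled to [0, 1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentScores:
    host_ambiguity: float       # from host classification (branch D -> G)
    receptor_binding: float     # from receptor binding prediction (branch E -> H)
    restriction_evasion: float  # from restriction factor evasion (branch F -> I)


# Hypothetical weights; in practice these would need calibration against known spillovers.
WEIGHTS = {"host_ambiguity": 0.4, "receptor_binding": 0.4, "restriction_evasion": 0.2}


def risk_score(scores, weights=WEIGHTS):
    """Weighted mean of the component scores, each required to lie in [0, 1]."""
    for name in weights:
        value = getattr(scores, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    weighted = sum(w * getattr(scores, name) for name, w in weights.items())
    return weighted / sum(weights.values())


def prioritize(candidates, threshold=0.5):
    """Return (name, score) pairs at or above the threshold, highest risk first."""
    scored = [(name, risk_score(s)) for name, s in candidates.items()]
    flagged = [pair for pair in scored if pair[1] >= threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Illustrative isolates with made-up component scores.
candidates = {
    "isolate_1": ComponentScores(0.9, 0.8, 0.5),
    "isolate_2": ComponentScores(0.1, 0.2, 0.1),
    "isolate_3": ComponentScores(0.6, 0.5, 0.7),
}
ranked = prioritize(candidates)
print(ranked)
```

A linear combination is the simplest choice and keeps each branch's contribution easy to inspect; a learned or nonlinear combination would be an equally valid design.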
Applications to Specific Viral Families
Influenza A Viruses
Influenza A viruses represent the most extensively studied system for deep learning-based host-range prediction [1, 3, 9, 10, 11, 12, 13]. The segmented genome and high mutation rate generate extensive sequence diversity, providing rich training data for computational models [9]. Key features associated with zoonotic potential include receptor binding site mutations, glycosylation patterns, and polymerase complex adaptations [3, 12].
Reported exposure of domestic cats with outdoor access in the Netherlands to highly pathogenic avian influenza H5 clade 2.3.4.4 viruses illustrates the importance of monitoring host-range expansion in companion animals [21]. Deep learning models trained on HA sequences from avian, swine, and human isolates can identify mutations that may enable similar transitions in other mammalian species [9].
Henipaviruses and Filoviruses
Henipaviruses, including Nipah and Hendra viruses, exhibit broad host ranges with recurrent spillover events from bat reservoirs [14, 15, 16]. The conservation of ephrin receptors across mammalian species facilitates cross-species transmission, but host-specific differences in immune responses and viral replication kinetics influence disease outcomes [14, 15]. Deep learning models incorporating both viral sequence features and host transcriptomic data may improve prediction of spillover risk [17].
Filoviruses, including Ebola and Marburg viruses, have been studied using comparative transcriptomic approaches across bat and human models [17]. These analyses reveal species-specific differences in host responses that may determine susceptibility and transmission potential [17].
Arboviruses
Tick-borne viruses such as Crimean-Congo hemorrhagic fever virus, Nairobi sheep disease virus, and Kyasanur Forest disease virus present unique challenges for host-range prediction because their transmission cycles involve both arthropod vectors and vertebrate hosts [22, 23, 24]. The mosquito-borne Rift Valley fever virus exhibits complex evolutionary dynamics across human and non-human outbreaks in Africa, with molecular adaptations that may influence host range [19]. Deep learning models for these viruses must account for the tripartite interactions among virus, vector, and vertebrate host.
Poxviruses
Mpox virus (formerly monkeypox virus) has demonstrated the capacity for sustained human-to-human transmission, raising questions about host-range determinants in orthopoxviruses [25, 26, 27, 28]. Comparative genomic analyses have identified genetic markers associated with host adaptation, and deep learning approaches may improve prediction of zoonotic potential in this family [25, 28].
Data Sources and Feature Engineering
Sequence Databases
Public sequence databases, including those maintained by the National Center for Biotechnology Information (NCBI), provide the foundation for training deep learning models [9]. The NCBI Influenza Virus Resource and the Influenza Research Database contain hundreds of thousands of sequences with associated host metadata [9]. Multi-organ RNA virome profiling of wildlife species, such as edible rodents in Southwest China, expands the diversity of sequences available for training and validation [29].
Feature Representation
Sequence features for deep learning models include k-mer frequencies, one-hot encoding, composition-transition-distribution (CTD) descriptors, and Z-curve representations [8]. Codon usage patterns, which reflect host-specific selective pressures including tRNA abundance and GC content, provide information not captured by amino acid sequences alone [9]. Structural features derived from predicted or experimentally determined protein structures can be incorporated as additional input channels.
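Three of the representations named above can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than the preprocessing used in [8] or [9]; the function names and the toy sequence are hypothetical, and the codon function assumes its input is an in-frame coding sequence.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGT"


def kmer_frequencies(seq, k=3):
    """Relative frequency of each overlapping k-mer, in fixed lexicographic order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    vocab = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    # k-mers containing ambiguous bases (e.g. N) are ignored in the total.
    total = sum(counts[kmer] for kmer in vocab)
    return [counts[kmer] / total if total else 0.0 for kmer in vocab]


def one_hot(seq):
    """One row per position, one column per nucleotide; unknown bases become all zeros."""
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    rows = []
    for base in seq.upper():
        row = [0] * len(NUCLEOTIDES)
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


def codon_frequencies(cds):
    """Relative frequency of in-frame codons (non-overlapping triplets from position 0)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop any trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    return {codon: n / len(codons) for codon, n in counts.items()}


example = "ATGGCGGCGTAA"  # toy coding sequence: ATG GCG GCG TAA
print(len(kmer_frequencies(example, k=3)))  # one entry per possible 3-mer
print(one_hot(example[:3]))
print(codon_frequencies(example))
```

The codon frequencies carry information that is lost on translation: GCG and GCC both encode alanine, so an amino acid input cannot distinguish them while a nucleotide input can.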
Transfer Learning
Transfer learning addresses the challenge of limited training data for specific virus-host pairs by pretraining models on large, diverse datasets and fine-tuning on target systems [8]. The CBIL-VHPLI model demonstrated that pretraining on plant and animal VHPLI data improved performance on viral protein-human lncRNA interaction prediction [8]. This approach is particularly valuable for emerging viruses with limited available sequence data.
Limitations and Challenges
Data Imbalance and Bias
Training datasets for host prediction models are heavily biased toward well-studied viruses and host species [9]. Influenza A viruses from humans, swine, and poultry dominate available sequence data, while sequences from wildlife reservoirs are underrepresented [9, 29]. This imbalance may lead to models that perform poorly on viruses from under-sampled host species.
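One common mitigation, shown here only as a sketch, is to weight each class inversely to its frequency during training so that rare host categories contribute as much to the loss as abundant ones. The formula below is the widely used "balanced" heuristic (total samples divided by the number of classes times the class count); the label names and counts are invented for illustration and do not come from [9] or [29].

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-class weight N / (K * n_c): N samples, K classes, n_c samples in class c."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Hypothetical, deliberately skewed host labels.
labels = (
    ["human"] * 600
    + ["swine"] * 300
    + ["poultry"] * 90
    + ["wildlife"] * 10
)
weights = inverse_frequency_weights(labels)
for host, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{host}: {weight:.3f}")
```

Reweighting does not add information about under-sampled hosts; it only stops the majority classes from dominating the objective, so additional sampling of wildlife reservoirs remains necessary.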
Generalizability Across Viral Families
Models trained on one viral family may not generalize to others due to differences in genome organization, replication strategies, and host interaction mechanisms [8, 9]. Cross-family prediction requires identification of conserved features associated with host-range transitions, which remains an active research area.
Temporal Dynamics
Viral evolution is a continuous process, and models trained on historical data may not capture emerging adaptive mutations [3, 13]. Continuous retraining and validation against newly characterized viruses are essential for maintaining predictive accuracy.
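Validation against newly characterized viruses can be approximated offline with a temporal hold-out, in which the model is trained only on isolates collected before a cutoff date and evaluated on later ones. The sketch below assumes each record carries a collection date; the record layout, identifiers, and dates are hypothetical.

```python
from datetime import date


def temporal_split(records, cutoff):
    """Train on isolates collected before the cutoff; evaluate on those on or after it."""
    train = [r for r in records if r["collected"] < cutoff]
    test = [r for r in records if r["collected"] >= cutoff]
    return train, test


# Illustrative metadata records (not real isolates).
records = [
    {"id": "seq_a", "collected": date(2015, 3, 1), "host": "avian"},
    {"id": "seq_b", "collected": date(2019, 11, 20), "host": "swine"},
    {"id": "seq_c", "collected": date(2021, 6, 5), "host": "human"},
    {"id": "seq_d", "collected": date(2023, 1, 15), "host": "avian"},
]

train, test = temporal_split(records, cutoff=date(2021, 1, 1))
print([r["id"] for r in train], [r["id"] for r in test])
```

Compared with a random split, this setup avoids leaking future lineages into training and gives a more realistic estimate of how a model degrades as the virus population evolves.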
Future Directions
Integration of Multi-Omics Data
Combining viral sequence data with host transcriptomic, proteomic, and epigenomic data may improve prediction of host-range transitions [1, 8, 17]. Comparative analyses of bat and human responses to filovirus infection have identified species-specific pathways that influence susceptibility [17]. Similar approaches for other virus-host pairs may reveal conserved determinants of zoonotic potential.
Structural Modeling at Scale
Advances in protein structure prediction, including AlphaFold and related methods, enable structural analysis of viral proteins across diverse families [20]. Integrating predicted structures into deep learning models for receptor binding prediction may improve accuracy for viruses with limited experimental structural data.
Real-Time Surveillance Systems
Automated surveillance systems that combine deep learning-based risk prediction with genomic epidemiology can provide real-time assessment of emerging threats [4, 7]. The Viral Sentry AI framework represents an early example of such integrated systems, combining automated zoonotic surveillance with drug repurposing analysis [7].
Conclusion
Deep learning has emerged as a transformative approach for predicting viral host-range transitions and zoonotic potential. Sequence-based classification models can identify viruses with ambiguous host signatures that correlate with pandemic potential [9]. Integration of structural data, host factor interactions, and multi-omics information continues to improve predictive accuracy [3, 8, 10]. While challenges remain, including data imbalance and limited generalizability across viral families, ongoing advances in computational methods and data availability promise to enhance our ability to anticipate and mitigate emerging zoonotic threats.
References
[1] Gerodez A, Dos Santos M, Attia M et al. Toward Predicting Pandemic Potential: A Comparative Analysis of Virus-Host Interactions Between Diverse Influenza A Viruses and the Human Innate Immune System. Proteomics. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42319249/
[2] Cheng K, Qiao Y. Dynamical Behavior and Control Optimization of a Zoonotic Epidemic Model Incorporating Temperature Effects: Analysis and Simulations. Bull Math Biol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42313298/
[3] Jin X, Han P, Wang Y et al. Hemagglutinin double-mutation enhances binding of human-infecting avian influenza virus clade 2.3.4.4b H5Ny to human and SLe(X) receptors. EMBO Rep. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42303811/
[4] Gavotte L, Gaucherel C, Goubier T et al. Walking the path: exploring the pathways of emerging viral diseases. Environ Res. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42270061/
[5] Ferreira JDS, Conceição EC, Antunes JMAP et al. Genotyping of Mycobacterium leprae in humans and six-banded armadillos suggest lack of inter-species transmission in Rio Grande do Norte, northeast Brazil. Acta Trop. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42303196/
[6] Ho S, Bryant CE. Pyroptosis across species and its potential impact on host defense against zoonotic pathogens. Cell Chem Biol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42296960/
[7] Munteanu CR, Vázquez-Naya J, Tejera E. Viral Sentry AI-Automated zoonotic surveillance and drug repurposing agent. Biol Methods Protoc. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42292930/
[8] Zhang M, Zhang L, Liu T et al. CBIL-VHPLI: a model for predicting viral-host protein-lncRNA interactions based on machine learning and transfer learning. Sci Rep. 2024. URL: https://www.semanticscholar.org/paper/1e66433625b6992e1a062d91c8d06b2116a671ea
[9] Hatibi N, Dumont-Lagacé M, Alouani Z et al. Misclassified: identification of zoonotic transition biomarker candidates for influenza A viruses using deep neural network. Front Genet. 2023. URL: https://www.semanticscholar.org/paper/d3be8d23ed14a4987ce68ee765cb35f457e1c78d
[10] Pinto R, Bakshi S, Lytras S et al. BTN3A3 evasion promotes the zoonotic potential of influenza A viruses. Nature. 2023. URL: https://www.semanticscholar.org/paper/5ffc8451bec6b21f21d59a8a6519842f67ae6a73
[11] Klivleyeva N, Glebova T, Saktaganov N et al. Human-Origin Influenza A(H3N2) Viruses Revealed in Swine Farms During the Period 2022-2025 in Kazakhstan. Animals (Basel). 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42278183/
[12] Kwon HI, Kim EH, Kim YI et al. Comparison of the pathogenic potential of highly pathogenic avian influenza (HPAI) H5N6, and H5N8 viruses isolated in South Korea during the 2016-2017 winter season. Emerg Microbes Infect. 2018. URL: https://pubmed.ncbi.nlm.nih.gov/29535296/
[13] Yang B, Yin Y, Wang L et al. Mapping antigenic evolution of influenza A virus using deep learning-based prediction of hemagglutination inhibition titers. bioRxiv. 2025. URL: https://www.semanticscholar.org/paper/6294cb3f19bb216d5ff291264ea9fc63ecdc117d
[14] Wang X, Zhang W. Henipaviruses at the threshold: preparedness in the era of recurrent spillover. Emerg Microbes Infect. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42261208/
[15] Davies KA, Welch SR, Coleman-McCray JD et al. Natural History of Nipah Virus in Hamsters: Strain, Route, and Sex-Associated Variability Characterized Using Large Datasets to Inform Pre-Clinical Study Design. J Infect Dis. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41159859/
[16] Pandey P, Chauhan P, Pandey S et al. An Updated Review on Nipah Virus Infection with a Focus on Encephalitis, Vasculitis, and Therapeutic Approaches. Curr Top Med Chem. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40468923/
[17] Xuan DTM, Yeh IJ, Liu HL et al. A comparative analysis of Marburg virus-infected bat and human models from public high-throughput sequencing data. Int J Med Sci. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/39744175/
[18] Wang L, Zou B, Liu B et al. Cryo-EM structures of Měnglà virus GP reveal combined Ebola- and Marburg-like epitope masking strategies for antibody evasion. Proc Natl Acad Sci U S A. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42247561/
[19] Omara IE, Juma J, Tshiabuila D et al. Evolutionary dynamics and molecular adaptation of Rift Valley fever virus across human and non-human outbreaks in Africa. BMC Genomics. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42289653/
[20] Maurya VK, Kumar S, Maurya S et al. Structure-based drug designing for potential antiviral activity of selected natural product against Monkeypox (Mpox) virus and its host targets. VirusDisease. 2024. URL: https://www.semanticscholar.org/paper/1e116866973bbd66772bb9cf3a871c532fd24c82
[21] Duijvestijn MBHM, Broens EM, Schuurman NNMPNMP et al. EXPRESS: Highly pathogenic avian influenza H5 clade 2.3.4.4 and human new pandemic H1N1 virus exposure in domestic cats with outdoor access in the Netherlands in 2024. J Feline Med Surg. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42265865/
[22] Zhang X, Wang MH, Hu JH et al. Nairobi sheep disease virus: an emerging threat with unresolved pathogenesis and zoonotic potential. J Virol. 2026. URL: https://www.semanticscholar.org/paper/9b4bacf73a85f12d1ea818871bff8e3b4dc202ae
[23] Kaushal H, Kumar Meena V, Das S et al. Pathogenicity and virulence of Kyasanur Forest disease: A comprehensive review of an expanding zoonotic threat in southwestern India. Virulence. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/41165010/
[24] Aslam M, Abbas RZ, Alsayeqh A. Distribution pattern of Crimean-Congo Hemorrhagic Fever in Asia and the Middle East. Front Public Health. 2023. URL: https://pubmed.ncbi.nlm.nih.gov/36778537/
[25] Hajjo R, Abusara OH, Sabbah DA et al. Advancing the understanding and management of Mpox: insights into epidemiology, disease pathways, prevention, and therapeutic strategies. BMC Infect Dis. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40234789/
[26] Hou W, Wu N, Liu Y et al. Mpox: Global epidemic situation and countermeasures. Virulence. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/39921615/
[27] Eslamkhah S, Aslan ES, Yavas C et al. Mpox virus (MPXV): comprehensive analysis of pandemic risks, pathophysiology, treatments, and mRNA vaccine development. Naunyn Schmiedebergs Arch Pharmacol. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/39777535/
[28] Alakunle E, Kolawole D, Diaz-Cánova D et al. A comprehensive review of monkeypox virus and mpox characteristics. Front Cell Infect Microbiol. 2024. URL: https://pubmed.ncbi.nlm.nih.gov/38510963/
[29] Chen D, Zhou J, Ma Q et al. Multi-Organ RNA Virome Profiling of Edible Rodents Reveals Potential Zoonotic Viral Exposure at the Wildlife-Livestock-Human Interface in Southwest China. Pathogens. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42198684/
[30] Munro J, Melnyk D, Afzal M et al. Development of a recombinant single-cycle influenza viral vector as an intranasal vaccine against SARS-CoV-2. Sci Rep. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42288524/
[31] Liao C, Shin S, Rohrbaugh K et al. Global land rush concentrates potential zoonotic spillover risk in the tropics. Commun Sustain. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42261324/
[32] Kessler S, Burke B, Andrieux G et al. Deciphering bat influenza H18N11 infection dynamics in male Jamaican fruit bats on a single-cell level. Nat Commun. 2024. URL: https://pubmed.ncbi.nlm.nih.gov/38802391/
[33] Bezerra KC, Vieira CMAG, de Oliveira-Filho EF et al. Susceptibility of solid organ transplant recipients to viral pathogens with zoonotic potential: A mini-review. Braz J Infect Dis. 2024. URL: https://pubmed.ncbi.nlm.nih.gov/38670166/
[34] Simo Tchetgna HD, Nakoune E, Selekon B et al. Molecular Characterization of the Kamese Virus, an Unassigned Rhabdovirus, Isolated from Culex pruina in the Central African Republic. Vector Borne Zoonotic Dis. 2017. URL: https://pubmed.ncbi.nlm.nih.gov/28350284/
[35] Chediack V, Calvo M, Cunto E et al. Hantavirus infection: A narrative review focusing on epidemiology, diagnosis, infection control and treatment in the era of globalisation. Med Intensiva (Engl Ed). 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42191525/
Disclaimer: This article is for educational and informational purposes only. It is not intended to substitute for professional veterinary advice, diagnosis, treatment, or regulatory guidance. Always consult a licensed veterinarian or qualified specialist regarding animal health, disease diagnosis, and therapeutic decisions.