Section: Clinical Methods & Interventions

The Global Initiative on Sharing All Influenza Data (GISAID): A Technical Reference for Veterinary Virology

Introduction

The Global Initiative on Sharing All Influenza Data (GISAID) is a foundational platform for the international sharing of influenza virus genomic sequences, clinical metadata, and epidemiological data. Established to facilitate rapid and transparent access to viral genetic information, GISAID provides a curated repository that addresses key limitations of earlier public sequence archives, particularly regarding data provenance, author attribution, and pre-publication sharing rights. For veterinary virology, GISAID is an indispensable tool for the molecular surveillance of influenza A viruses circulating in avian and swine populations, serving as the primary resource for tracking the emergence, reassortment, and global spread of strains with zoonotic and epizootic potential.

Platform Architecture and Data Governance

GISAID operates under an access agreement model distinct from conventional open-access databases. Submitters retain rights to their data and are attributed for their contributions. Users must register and agree to specific terms of use, including a prohibition on redistributing data to parties outside the access agreement and a requirement to acknowledge the originating and submitting laboratories in any publications. This framework encourages data sharing by providing academic credit and protecting submitters' interests, a critical factor for veterinary laboratories in both developed and resource-limited settings.

The platform hosts the EpiFlu database, which is organized into four primary searchable domains:

  1. Virus name and identification: Standardized nomenclature including host species, geographic origin, strain identifier, and year of collection.
  2. Segment-specific sequence data: Full-length or partial sequences for all eight influenza A gene segments (PB2, PB1, PA, HA, NP, NA, MP, NS).
  3. Clinical and epidemiological metadata: Collection date, geographic location (country, province, detailed site), host species, clinical signs (if applicable), and passage history.
  4. Variant and phenotype annotations: Hemagglutinin (HA) and neuraminidase (NA) subtype designations, clade assignments (e.g., 2.3.4.4b for H5Nx high pathogenicity avian influenza), and antiviral resistance markers.
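
The standardized virus name in the first domain can be split into its fields programmatically. The following is a minimal sketch, assuming the conventional slash-delimited layout (type/host/location/strain identifier/year, with the host omitted for human isolates) and an optional trailing subtype in parentheses. The function and class names are illustrative and are not part of any GISAID interface; names whose strain identifier itself contains a slash would need extra handling.

```python
from typing import NamedTuple, Optional


class StrainName(NamedTuple):
    virus_type: str
    host: Optional[str]  # None for human isolates, which omit the host field
    location: str
    strain_id: str
    year: str


def parse_strain_name(name: str) -> StrainName:
    """Split an influenza strain name into its slash-delimited fields."""
    # Drop a trailing subtype such as "(H5N1)" if one is appended.
    core = name.split("(", 1)[0].strip()
    parts = core.split("/")
    if len(parts) == 5:
        virus_type, host, location, strain_id, year = parts
        return StrainName(virus_type, host, location, strain_id, year)
    if len(parts) == 4:
        virus_type, location, strain_id, year = parts
        return StrainName(virus_type, None, location, strain_id, year)
    raise ValueError(f"unexpected number of fields in strain name: {name!r}")


if __name__ == "__main__":
    print(parse_strain_name("A/goose/Guangdong/1/1996(H5N1)"))
    print(parse_strain_name("A/swine/Iowa/15/1930"))
```

Parsing the host field this way gives only the free-text host label; mapping it to a scientific name still requires a curated lookup.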

Utility in Veterinary Influenza Surveillance

Avian Influenza Surveillance

The continuous evolution of avian influenza viruses (AIVs) necessitates robust genomic surveillance in both domestic poultry and wild bird reservoirs. GISAID serves as the central repository for AIV sequences generated through national surveillance programs, outbreak investigations, and research studies. For instance, comprehensive genomic analysis of H6 AIVs detected in Vietnamese live bird markets over a four-year period relied on GISAID to contextualize newly sequenced isolates against a global dataset, revealing geographically distinct lineages and persistent circulation patterns [1]. Such analyses are critical for understanding the maintenance of low-pathogenicity AIVs that can serve as precursor viruses for highly pathogenic strains.

Molecular epidemiological studies leverage GISAID to track the introduction and dissemination of H5 subtypes such as H5N1, H5N8, and H5N6 across continents via wild bird migration. Phylogenetic reconstructions using sequences deposited in GISAID enable the identification of index cases in poultry outbreaks, estimation of transmission rates, and assessment of control measure efficacy. The platform is also essential for monitoring the antigenic evolution of AIVs, a process driven by selective pressure from population immunity in reservoir hosts. Studies on epitope-specific antibody immunodominance in influenza viruses rely on large-scale HA sequence datasets from GISAID to map antigenic drift at high resolution [2].

Swine Influenza Surveillance

Swine populations act as mixing vessels for influenza A viruses, facilitating reassortment between avian, human, and swine-adapted lineages. GISAID archives a substantial number of swine influenza A virus (IAV-S) sequences from global surveillance efforts. Genetic characterization of novel triple-reassortant viruses, such as those identified in pig populations in China, depends on the systematic comparison of newly generated whole-genome sequences with the comprehensive GISAID reference dataset. One study used GISAID to confirm the origin of internal gene segments from North American triple-reassortant lineages and the HA and NA segments from a European swine lineage in a novel H1N2 isolate [3]. Long-term evolutionary analyses of H1N1 IAV-S in China, spanning over four decades, utilized the GISAID archive to reconstruct phylodynamic patterns, revealing lineage turnover events and the expansion of specific genotypes [4]. The platform is similarly indispensable for continent-scale genomic epidemiology studies that describe the prevalence and evolution of enzootic swine influenza lineages and their periodic spillover into human populations [5].

Phylogenetic and Phylodynamic Analysis Workflow

The standard workflow for utilizing GISAID in veterinary research involves several sequential stages:

  1. Query and Retrieval: Sequences are selected using queries filtered by host species (e.g., Sus scrofa, Gallus gallus, Anas platyrhynchos), subtype (e.g., H5N1, H9N2), geographic region, and collection date range. Both complete genome sets and individual gene segments are retrievable in FASTA format with associated metadata.

  2. Quality Control and Alignment: Retrieved sequences are screened for internal stop codons, frameshift mutations, and ambiguous base calls. Multiple sequence alignment is performed using tools such as MAFFT or MUSCLE, often guided by codon-based alignment to preserve reading frames for protein-coding sequences.

  3. Phylogenetic Reconstruction: Maximum likelihood or Bayesian phylogenetic methods are applied to infer evolutionary relationships. Models of nucleotide substitution (e.g., GTR+I+G) are selected using likelihood ratio tests or information criteria such as AIC and BIC. Molecular clock models (strict or relaxed lognormal) are employed for temporal phylogenies, enabling the estimation of the time to the most recent common ancestor.

  4. Phylodynamic Inference: Coalescent-based models such as the Bayesian skyline are used to estimate changes in effective population size over time, inferring viral demographic history. Discrete trait phylogeographic analysis can reconstruct the spatial spread of a lineage across geographic regions.

  5. Clade and Genotype Assignment: Subclade classifications follow standardized nomenclature (e.g., the WHO/WOAH/FAO H5 naming system, established when WOAH was still known as OIE). Reassortment events are detected by comparing phylogenetic topologies across gene segments.
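
The quality-control screen in stage 2 can be sketched in a few lines. This is an illustrative example rather than a validated pipeline: it assumes each sequence has already been trimmed to a single coding region in DNA alphabet, treats a length that is not a multiple of three as a crude proxy for a frameshift, and uses a 1% ambiguity threshold that is an arbitrary assumption, not a GISAID rule.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
UNAMBIGUOUS = set("ACGT")


def ambiguous_fraction(seq: str) -> float:
    """Fraction of positions that are not A, C, G, or T (includes N and gaps)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base not in UNAMBIGUOUS for base in seq) / len(seq)


def has_internal_stop(cds: str) -> bool:
    """True if an in-frame stop codon occurs before the final codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return any(codon in STOP_CODONS for codon in codons[:-1])


def passes_qc(cds: str, max_ambiguous: float = 0.01) -> bool:
    """Apply the three screens: frame length, internal stops, ambiguity."""
    return (
        len(cds) % 3 == 0                      # crude frameshift proxy
        and not has_internal_stop(cds)
        and ambiguous_fraction(cds) <= max_ambiguous
    )


if __name__ == "__main__":
    for name, seq in [("ok", "ATGAAATAA"), ("stop", "ATGTAAAAATAA"), ("n", "ATGAANTAA")]:
        print(name, passes_qc(seq))
```

Sequences that fail the screen are usually inspected rather than discarded outright, since a genuine truncation or deletion can be biologically meaningful.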

graph TD
    A[Sample Collection from Avian or Swine Host] --> B[RNA Extraction and RT-PCR Amplification]
    B --> C[High-Throughput Sequencing]
    C --> D[Sequence Assembly and Annotation]
    D --> E[Submission to GISAID EpiFlu Database]
    E --> F[Metadata Curation: Host, Date, Location]
    F --> G[Query and Sequence Retrieval]
    G --> H[Quality Control and Alignment]
    H --> I[Phylogenetic Reconstruction]
    I --> J[Clade Assignment and Reassortment Detection]
    J --> K[Antigenic Drift Analysis]
    J --> L[Phylogeographic and Demographic Inference]
    K & L --> M[Publication and Data Release]
    M --> N[Updated Vaccine Strain Selection and Surveillance Guidelines]
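
The reassortment-detection step in this workflow compares tree topologies across segments. A much simpler proxy, sketched below, works from per-segment lineage labels that are assumed to have been assigned already from each segment's own tree. The labels "TRIG" and "EU" and both function names are hypothetical placeholders chosen for illustration; this is not a substitute for formal topology-based tests.

```python
from collections import Counter
from typing import Dict, List

SEGMENTS = ["PB2", "PB1", "PA", "HA", "NP", "NA", "MP", "NS"]


def constellation(lineages: Dict[str, str]) -> str:
    """Concatenate the lineage label of each segment in canonical order."""
    return "|".join(lineages[seg] for seg in SEGMENTS)


def discordant_segments(lineages: Dict[str, str]) -> List[str]:
    """Return segments whose lineage differs from the isolate's majority lineage."""
    counts = Counter(lineages[seg] for seg in SEGMENTS)
    majority, _ = counts.most_common(1)[0]
    return [seg for seg in SEGMENTS if lineages[seg] != majority]


if __name__ == "__main__":
    # Hypothetical isolate: six internal genes from one lineage, HA and NA from another.
    isolate = {seg: "TRIG" for seg in SEGMENTS}
    isolate["HA"] = "EU"
    isolate["NA"] = "EU"
    print(constellation(isolate))
    print(discordant_segments(isolate))
```

Isolates flagged this way are candidates for closer inspection; confirming a reassortment event still requires examining the segment trees themselves.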

Impact on Vaccine Strain Selection and Antigenic Characterization

The GISAID database directly informs the twice-yearly vaccine strain selection process conducted through the WHO Global Influenza Surveillance and Response System (GISRS) for human vaccines. In veterinary medicine, a similar paradigm applies to the development and update of autogenous or commercial vaccines against circulating AIV and IAV-S strains. The timely availability of full-length HA sequences from GISAID allows veterinary laboratories to assess the antigenic relatedness of field strains to existing vaccine strains using phylogenetic clustering and, when combined with serological data, antigenic cartography. The accuracy of evolutionary forecasts for influenza A/H3N2, for example, improves when vaccine strain selection is timely and recent genomic surveillance data are integrated into predictive models [6]. This principle extends to the prediction of future dominant variants in swine and poultry populations, where GISAID data form the empirical basis for these computational forecasts.
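
As a first-pass complement to phylogenetic clustering, a field-strain HA can be compared residue by residue with a vaccine-strain HA. The sketch below assumes the two protein sequences are already aligned to equal length, and it reports alignment-column positions rather than H3 numbering. Sequence identity is only a rough indicator: a few substitutions at antigenic sites can matter more than many elsewhere, so serological data remain necessary. Function names are illustrative.

```python
from typing import List

SKIP = {"-", "X"}  # gap and unknown-residue symbols are ignored


def aa_differences(field: str, vaccine: str) -> List[str]:
    """List substitutions (vaccine residue, position, field residue) between aligned sequences."""
    if len(field) != len(vaccine):
        raise ValueError("sequences must be aligned to equal length")
    diffs = []
    for pos, (v, f) in enumerate(zip(vaccine, field), start=1):
        if v != f and v not in SKIP and f not in SKIP:
            diffs.append(f"{v}{pos}{f}")
    return diffs


def percent_identity(field: str, vaccine: str) -> float:
    """Percentage of comparable aligned positions with identical residues."""
    compared = [(v, f) for v, f in zip(vaccine, field) if v not in SKIP and f not in SKIP]
    if not compared:
        return 0.0
    return 100.0 * sum(v == f for v, f in compared) / len(compared)


if __name__ == "__main__":
    vaccine_ha = "MKTIIA"   # toy fragments, not real HA sequences
    field_ha = "MKAIIV"
    print(aa_differences(field_ha, vaccine_ha))
    print(round(percent_identity(field_ha, vaccine_ha), 1))
```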

Integration with Other Surveillance Platforms

GISAID is not an isolated resource. It complements the global animal health reporting systems of the World Organisation for Animal Health (WOAH) and the Food and Agriculture Organization (FAO). Disease outbreak notifications from these agencies may reference GISAID accession numbers, enabling direct access to the genetic characterization of the outbreak strain. GISAID data also feed into analytical platforms such as Nextstrain, which provides near-real-time phylodynamic analyses and visualizations of circulating influenza lineages. Upstream of submission, assembly tools such as IRMA (Iterative Refinement Meta-Assembler) are used to generate the consensus genomes that laboratories deposit.

Limitations and Challenges

Despite its utility, GISAID is subject to several limitations. Data submission is voluntary, leading to substantial geographic and temporal biases. Many countries with high disease burden contribute sequences sporadically or not at all, as evidenced by gaps in surveillance in parts of Africa [7] and other resource-limited regions. Data quality can vary, with some submissions lacking complete metadata (e.g., precise collection date, species identification) crucial for robust epidemiological inference. The annotation of host species sometimes relies on general categories rather than specific scientific names, complicating fine-scale ecological analyses. For swine influenza, nomenclature for genotypes (e.g., H1N1pdm09-like, H1N2 triple-reassortant) is not fully standardized across submissions, requiring manual curation.
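
Metadata gaps of this kind can be audited before analysis. The following sketch flags records with empty required fields or with collection dates coarser than a full day. The field names ("accession", "collection_date", "host", "country") and the record identifiers are hypothetical and do not reflect the actual column headers of a GISAID export.

```python
import re
from typing import Dict, List, Sequence, Tuple


def date_precision(value: str) -> str:
    """Classify an ISO-style date string as 'day', 'month', 'year', or 'missing'."""
    value = value.strip()
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return "day"
    if re.fullmatch(r"\d{4}-\d{2}", value):
        return "month"
    if re.fullmatch(r"\d{4}", value):
        return "year"
    return "missing"


def incomplete_records(
    records: Sequence[Dict[str, str]],
    required: Sequence[str] = ("collection_date", "host", "country"),
) -> List[Tuple[str, List[str]]]:
    """Return (accession, problems) for every record with a metadata gap."""
    flagged = []
    for rec in records:
        problems = [field for field in required if not str(rec.get(field, "")).strip()]
        date_value = str(rec.get("collection_date", ""))
        if "collection_date" not in problems and date_precision(date_value) != "day":
            problems.append("collection_date (imprecise)")
        if problems:
            flagged.append((rec.get("accession", "?"), problems))
    return flagged


if __name__ == "__main__":
    demo = [
        {"accession": "rec-1", "collection_date": "2021-03-15", "host": "Sus scrofa", "country": "Vietnam"},
        {"accession": "rec-2", "collection_date": "2021", "host": "Avian", "country": ""},
    ]
    print(incomplete_records(demo))
```

Whether a year-only date is acceptable depends on the analysis; molecular clock models are far more sensitive to date precision than simple clade assignment.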

Comparative Host-Range Context and Zoonotic Implications

Influenza A viruses that infect multiple host species offer the most instructive material for comparative analysis. Cross-species transmission events, such as the spillover of H5N1 from poultry to mammalian hosts, can be monitored through GISAID by examining the genetic signatures associated with mammalian adaptation. These include mutations in the HA receptor binding site (e.g., E190D and G225D in H1 viruses, or Q226L and G228S in H5 and other avian-origin viruses, all given in H3 numbering) and in the polymerase basic protein 2 (PB2) subunit (e.g., E627K, D701N). The database also facilitates the detection of lineage-specific extinction events, such as the probable extinction of the influenza B/Yamagata lineage, which was assessed through a systematic review of global surveillance databases including GISAID [8].
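
Screening for the PB2 markers named above reduces to checking residues at fixed positions. This sketch assumes a complete, ungapped PB2 protein sequence numbered from the initiating methionine. HA markers are deliberately left out because they would first require conversion to H3 numbering. The function name and the synthetic test sequence are illustrative only.

```python
from typing import Dict, List, Tuple

# position -> (residue typical of avian viruses, mammalian-adaptation residue)
PB2_MARKERS: Dict[int, Tuple[str, str]] = {
    627: ("E", "K"),
    701: ("D", "N"),
}


def pb2_adaptation_markers(pb2_protein: str) -> List[str]:
    """Return the mammalian-adaptation substitutions present in a PB2 protein sequence."""
    found = []
    for pos, (avian, mammalian) in sorted(PB2_MARKERS.items()):
        # Skip positions beyond the end of a truncated sequence.
        if len(pb2_protein) >= pos and pb2_protein[pos - 1] == mammalian:
            found.append(f"{avian}{pos}{mammalian}")
    return found


if __name__ == "__main__":
    # Synthetic 759-residue placeholder, not a real PB2 sequence.
    residues = ["A"] * 759
    residues[626] = "K"   # position 627 carries the adaptation residue
    residues[700] = "D"   # position 701 keeps the avian-type residue
    print(pb2_adaptation_markers("".join(residues)))
```

A positive hit indicates only that a known marker is present; its phenotypic effect depends on genetic background and needs experimental confirmation.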

Future Directions

The continued utility of GISAID for veterinary virology hinges on several developments: the expansion of sequencing capacity in underrepresented regions; the incorporation of standardized, structured metadata for swine and avian hosts; the integration of phenotypic data, such as pathogenicity indices in chicken models; and the development of automated, validated bioinformatic pipelines for the rapid detection of reassortment and antigenic drift. The inclusion of influenza viruses from other animal hosts, such as equine, canine, and feline species, would further broaden its coverage of the host range.

Data Availability and Ethical Use

The data governance model of GISAID explicitly prohibits any user from distributing or redistributing the data in a manner that circumvents the access agreement. All researchers using GISAID are bound to:

  • Acknowledge the contributions of the data submitters.
  • Cite the GISAID platform itself in all publications.
  • Maintain the confidentiality of any pre-publication data identified by the submitter.

These conditions support an ethical framework that prioritizes collaborative research and reduces the risk of data misappropriation.

Conclusion

GISAID remains the cornerstone of global genomic influenza surveillance. For veterinary medicine, it is the principal repository for the genetic data that underpin outbreak response, vaccine development, and evolutionary tracking of influenza viruses in avian and swine populations. Its curated architecture and governance model have fostered unprecedented levels of data sharing, enabling the molecular epidemiology studies that are essential for managing the One Health threat posed by influenza A viruses.

References

[1] Guan L, Babujee L, Presler R, et al. Avian H6 Influenza Viruses in Vietnamese Live Bird Markets during 2018-2021. Viruses. 2024. URL: https://pubmed.ncbi.nlm.nih.gov/38543733/

[2] Lu X, Liu F, Tzeng WP, et al. Epitope-Specific Antibody Immunodominance Driving Antibody and Influenza Viral Evolution During 2010-2024. Open Forum Infectious Diseases. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40832595/

[3] Zhao Y, Liu C, Xia C, et al. Genetic characterization of a novel triple-reassortant influenza A (H1N2) virus from pigs, China, 2021. Frontiers in Microbiology. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42039829/

[4] Zhao Y, Han L, Sang H, et al. Genetic diversity and evolution of H1N1 subtype swine influenza virus in China: a comprehensive analysis from 1977 to 2020. Evolution. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41124035/

[5] Sun H, Liu H, Zhang J, et al. Genome-scale evolution and phylodynamics of swine influenza A viruses in China: a genomic epidemiology study. The Lancet Microbe. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40311646/

[6] Huddleston J, Bedford T. Timely vaccine strain selection and genomic surveillance improves evolutionary forecast accuracy of seasonal influenza A/H3N2. medRxiv. 2024. URL: https://pubmed.ncbi.nlm.nih.gov/39314963/

[7] Azbida M, Ferjani S, Elahmer O, et al. Sentinel Surveillance of Influenza A in Libya: Subtyping and Genomic Analysis During Recent Seasons (2022-2024). Tropical Medicine and Infectious Disease. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42188856/

[8] Caini S, Meijer A, Nunes MC, et al. Probable extinction of influenza B/Yamagata and its public health implications: a systematic literature review and assessment of global surveillance databases. The Lancet Microbe. 2024. URL: https://pubmed.ncbi.nlm.nih.gov/38729197/

[9] Ramuth M, Sonoo J, Mokshanand F, et al. Molecular Evolution of Influenza A Viruses From Mauritius, 2017-2019. Influenza and Other Respiratory Viruses. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40360235/

[10] Zhang M, Yang C, Wu X, et al. Antigenic analysis of the influenza B virus hemagglutinin protein. Virologica Sinica. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/39233140/