Single-Cell RNA-Seq Analysis Pipelines for Veterinary Immunology

Single-cell RNA sequencing (scRNA-seq) has transformed the study of host immune responses by enabling transcriptomic profiling of individual cells within heterogeneous tissues. In veterinary immunology, these methods provide unprecedented resolution for characterizing leukocyte subsets in species such as cattle, swine, chickens, and teleost fish. This article describes a comprehensive computational pipeline for scRNA-seq data analysis tailored to veterinary models, with emphasis on immune cell identification, differential expression, and trajectory inference. The pipeline integrates quality control, normalization, clustering, cell type annotation, and downstream functional analyses. The reference study on Atlantic salmon head kidney using both single-cell and single-nucleus transcriptomics [1] serves as a central example for deploying these methods in non-mammalian veterinary species.

Overview of the scRNA-seq Analysis Pipeline

A standard scRNA-seq analysis pipeline for veterinary immunology consists of six major stages:

Raw data processing and quantification.
Quality control and filtering.
Normalization and batch correction.
Dimensionality reduction and clustering.
Cell type annotation.
Downstream analyses (differential expression, trajectory inference, ligand-receptor interactions).

Each stage requires specific considerations when applied to veterinary species due to differences in genome annotation quality, tissue architecture, and immune cell marker conservation.

The following Mermaid diagram summarizes the workflow:

flowchart TD
    A[Raw FASTQ files] --> B["Alignment & quantification<br>(e.g., STARsolo, Alevin, Cell Ranger-like)"]
    B --> C[Generate count matrix]
    C --> D[Quality control<br>Mitochondrial content, gene count, UMI count]
    D --> E[Filter cells & genes]
    E --> F[Normalization<br>scran, SCTransform, or analytic Pearson residuals]
    F --> G[Batch correction<br>Harmony, Seurat CCA, or fastMNN]
    G --> H[Dimensionality reduction<br>PCA, t-SNE, UMAP]
    H --> I[Clustering<br>Louvain, Leiden, K-means]
    I --> J[Cell type annotation<br>Marker genes, reference mapping, SingleR]
    J --> K[Downstream analyses<br>DEGs, trajectory, cell-cell communication]

Raw Data Processing and Quantification

The first step involves converting raw sequencing reads into a gene-cell count matrix. For veterinary species, the reference genome and annotation must be appropriate for the target organism. Commonly used alignment tools include STARsolo, Alevin (from the Salmon suite), and the kallisto-bustools pipeline. These tools generate unique molecular identifier (UMI) counts per gene per cell.
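As a rough illustration of what "UMI counts per gene per cell" means, the sketch below collapses aligned reads into a count table by counting distinct UMIs for each cell barcode and gene. It is a toy model, not how STARsolo, Alevin, or kallisto-bustools are implemented: real tools also correct sequencing errors in barcodes and UMIs and resolve multi-mapping reads. The function name `count_umis`, the barcodes, and the gene names are illustrative assumptions.

```python
from collections import defaultdict


def count_umis(records):
    """Collapse aligned reads into a nested {cell: {gene: n_unique_umis}} table.

    records: iterable of (cell_barcode, umi, gene) tuples, one per aligned read.
    Reads sharing cell, UMI, and gene are treated as PCR duplicates.
    """
    seen = defaultdict(set)  # (cell, gene) -> set of UMIs observed
    for cell, umi, gene in records:
        seen[(cell, gene)].add(umi)

    counts = defaultdict(dict)
    for (cell, gene), umis in seen.items():
        counts[cell][gene] = len(umis)
    return dict(counts)


# Toy reads (hypothetical barcodes); the second read duplicates the first.
reads = [
    ("AAAC", "TTGA", "cd79a"),
    ("AAAC", "TTGA", "cd79a"),
    ("AAAC", "GGCT", "cd79a"),
    ("AAAC", "CATG", "cd3e"),
    ("TTTG", "TTGA", "cd3e"),
]
matrix = count_umis(reads)
print(matrix)
```

The duplicated read contributes only once, so cell AAAC ends up with two cd79a molecules rather than three.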

In the Atlantic salmon head kidney study [1], the authors compared single-cell and single-nucleus transcriptomics. For single-cell data, they used the droplet-based 10x Genomics platform and aligned reads to the salmon genome (Ssal_v3.1). For single-nucleus RNA-seq, nuclei were isolated from frozen tissues and processed similarly. The choice between whole cells and nuclei influences which transcripts are recovered: whole cells are enriched for cytoplasmic mRNAs, whereas nuclei capture more nascent, unspliced transcripts and are better suited to archived frozen samples.

Quality Control and Filtering

Quality control (QC) steps remove low-quality cells, empty droplets, and doublets. Key metrics include:

Number of unique genes detected per cell.
Total UMI count per cell.
Percentage of reads mapping to mitochondrial genes.

For mammalian immune cells, a high mitochondrial fraction (for example, above 20%) often indicates damaged or dying cells. In fish, the threshold may be adjusted based on tissue type; for salmon head kidney, a mitochondrial fraction cutoff of 10-15% is commonly applied. Low gene counts (e.g., fewer than 200 genes) may represent empty droplets, while very high counts (e.g., more than 5000 genes) often indicate doublets. These cutoffs are starting points and should be checked against the observed distributions for each dataset. Doublet detection can be performed using tools such as DoubletFinder or Scrublet.
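The three metrics and the cutoffs discussed above can be sketched as a simple per-cell filter. This is a minimal illustration, assuming counts are stored as a gene-to-UMI dictionary per cell and that mitochondrial genes are supplied as an explicit set (naming conventions for mitochondrial genes differ between species and annotations). The function names and the toy cells are hypothetical, and the mitochondrial percentage is computed from UMI counts.

```python
def qc_metrics(cell_counts, mito_genes):
    """Return genes detected, total UMIs, and percent mitochondrial UMIs for one cell."""
    total = sum(cell_counts.values())
    n_genes = sum(1 for c in cell_counts.values() if c > 0)
    mito = sum(c for g, c in cell_counts.items() if g in mito_genes)
    pct_mito = 100.0 * mito / total if total else 0.0
    return {"n_genes": n_genes, "n_umis": total, "pct_mito": pct_mito}


def passes_qc(metrics, min_genes=200, max_genes=5000, max_pct_mito=15.0):
    """Apply the gene-count and mitochondrial cutoffs; defaults follow the text."""
    return (
        min_genes <= metrics["n_genes"] <= max_genes
        and metrics["pct_mito"] <= max_pct_mito
    )


# Toy cells with tiny counts, so the gene-count cutoffs are lowered for the demo.
mito_genes = {"mt-co1"}  # assumed mitochondrial gene name
cells = {
    "healthy": {"cd79a": 5, "cd3e": 3, "mt-co1": 1},
    "stressed": {"cd79a": 1, "cd3e": 1, "mt-co1": 8},
    "empty": {"cd79a": 1},
}
kept = [
    name
    for name, counts in cells.items()
    if passes_qc(qc_metrics(counts, mito_genes), min_genes=2, max_genes=10)
]
print(kept)
```

Keeping the thresholds as parameters makes it straightforward to relax them for fragile tissues or tighten them for cleaner preparations.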

The QC filtering step is critical in veterinary species because tissue dissociation protocols (e.g., from spleen, lymph node, or kidney) can introduce variable cell stress. For example, cells from the chicken bursa of Fabricius are particularly fragile and may require lower stringency.

Normalization and Batch Correction

Normalization aims to remove technical variation while preserving biological heterogeneity. Methods include library-size scaling (Seurat's LogNormalize, scran's deconvolution) and model-based approaches (SCTransform, analytic Pearson residuals). For datasets with multiple experimental batches (e.g., different animals or flow-sorted populations), batch correction is essential. Popular algorithms include Harmony, Seurat's canonical correlation analysis (CCA), and fastMNN.
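Library-size scaling is the simplest of these methods to write down. The sketch below follows the log-normalization scheme: divide each count by the cell's total, multiply by a scale factor (10,000 is the usual default), and apply log(1 + x). It is a dependency-free illustration with a hypothetical function name, not a replacement for the Seurat or scran implementations, and it does not cover the model-based approaches.

```python
import math


def log_normalize(cell_counts, scale_factor=10_000):
    """Scale counts to a common library size, then apply the natural log1p."""
    total = sum(cell_counts.values())
    if total == 0:
        return {gene: 0.0 for gene in cell_counts}
    return {
        gene: math.log1p(count / total * scale_factor)
        for gene, count in cell_counts.items()
    }


# Two cells with identical composition but tenfold different sequencing depth.
shallow = {"cd79a": 2, "cd3e": 2}
deep = {"cd79a": 20, "cd3e": 20}
print(log_normalize(shallow)["cd79a"], log_normalize(deep)["cd79a"])
```

After normalization the two cells carry the same value for each gene, which is the intended effect: differences in sequencing depth are removed while relative composition is kept.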

In the Atlantic salmon study [1], the authors integrated single-cell and single-nucleus datasets to identify shared cell types. Batch effects arose from differences in dissociation protocols and sequencing runs. Harmony, which operates in a reduced dimensional space, was used to align the datasets without losing biological variation. The result was a unified representation of head kidney immune cells, including B cells, T cells, macrophages, and granulocytes.

Dimensionality Reduction and Clustering

After normalization and batch correction, principal component analysis (PCA) is applied to the highly variable genes. The number of principal components (PCs) to retain is typically determined by elbow plots or JackStraw analysis (in Seurat). Clustering is performed in the PCA-reduced space using graph-based methods (Louvain or Leiden) or K-means. For the graph-based methods, the resolution parameter controls the number of clusters; for K-means, the number of clusters is set directly.
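Of the clustering options named above, K-means is the most compact to sketch without external libraries; Louvain and Leiden require building a nearest-neighbor graph and are better taken from an established package. The example below runs Lloyd's algorithm on cells already projected onto two PCs. The farthest-first initialization is a deterministic simplification chosen for reproducibility here, and the coordinates are made up.

```python
import math


def farthest_first(points, k):
    """Pick k starting centers: the first point, then repeatedly the farthest point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, n_iter=50):
    """Lloyd's algorithm: alternate nearest-center assignment and center update."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable, so centers are too
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centers


# Toy PC coordinates for six cells forming two well-separated groups.
pcs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = kmeans(pcs, k=2)
print(labels)
```

Unlike the graph-based methods, K-means needs the number of clusters up front, which is one reason it is less often the default for exploratory immune cell data.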

In veterinary immunology, clustering must distinguish subtle subsets such as CD4+ versus CD8+ T cells, or M1 versus M2 macrophages. Marker gene panels for these subsets are often derived from mammalian studies but may require cross-species validation. For example, in salmon, CD3 epsilon serves as a pan-T cell marker, while IgM identifies B cells. The presence of novel or species-specific cell types (e.g., rodlet cells in fish kidney) demands careful manual annotation.

Cell Type Annotation

Cell type annotation can be performed using three approaches:

Manual annotation based on expression of known marker genes.
Reference-based annotation using tools like SingleR, which correlates cluster expression profiles with reference transcriptomic datasets.
Transfer learning from labeled datasets via methods such as Seurat's FindTransferAnchors.

For veterinary species, reference-based annotation is often limited by the availability of high-quality sorted cell populations. In the absence of species-specific references, cross-species mapping using orthologous gene symbols may be employed. The Atlantic salmon study [1] constructed a comprehensive cell atlas of the head kidney by combining manual annotation with reference data from zebrafish and mouse immune cells. They identified 15 distinct cell types, including thrombocytes, which are nucleated in fish and participate in both hemostasis and immunity.
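Cross-species mapping by orthologous gene symbols amounts to renaming genes before comparison with a reference. The sketch below shows one possible convention, assuming an ortholog table is already available as a dictionary: counts from several target-species genes that map to the same reference symbol are summed (relevant in salmonids, which carry many duplicated genes), and genes without an ortholog are dropped. The gene identifiers and the function name are hypothetical, and restricting to one-to-one orthologs is an equally defensible alternative.

```python
from collections import defaultdict


def map_to_orthologs(cell_counts, ortholog_table):
    """Rename genes to reference symbols, summing counts that share a symbol."""
    mapped = defaultdict(int)
    for gene, count in cell_counts.items():
        symbol = ortholog_table.get(gene)
        if symbol is not None:  # genes without an ortholog are dropped
            mapped[symbol] += count
    return dict(mapped)


# Hypothetical salmon gene IDs; two paralogs map to the same reference symbol.
ortholog_table = {
    "salmon_gene_A1": "CD79A",
    "salmon_gene_A2": "CD79A",
    "salmon_gene_B": "CD3E",
}
cell = {
    "salmon_gene_A1": 3,
    "salmon_gene_A2": 2,
    "salmon_gene_B": 1,
    "salmon_gene_novel": 7,
}
mapped_cell = map_to_orthologs(cell, ortholog_table)
print(mapped_cell)
```

Dropping unmapped genes discards species-specific signal, so the mapped matrix is best used only for reference comparison while the original matrix is kept for clustering and marker discovery.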

A table of commonly used immune cell markers in selected veterinary species is provided below.

Cell Type	Mammalian Marker (cattle, swine)	Avian Marker (chicken)	Teleost Marker (salmon)
T cell	CD3E, CD4, CD8A	CD3E, CD4, CD8A	CD3E, CD4, CD8A
B cell	CD19, MS4A1 (CD20), PAX5	PAX5, CD79A	CD79A, sIgM
Macrophage	CD14, CD68, CSF1R	CSF1R, CD68	CSF1R, CD209
Granulocyte	S100A8, FUT4, ELANE	CATH1, avBD	MPO, LYZ
NK cell	NKG7, KLRD1, NCR1	NKG7, KLRD1	Perforin, Granzyme
Dendritic cell	FLT3, XCR1, CD207	FLT3, XCR1	FLT3, CD83

Note: These markers are based on current literature and may require experimental validation.
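A marker table like the one above can drive a first-pass manual annotation. The sketch below scores each cluster by the average expression of each cell type's markers and assigns the best-scoring type, leaving the cluster unassigned when no panel reaches a minimum score. The panel is a subset of the teleost column (gene symbols only), and the cluster means, the threshold, and the function name are illustrative assumptions rather than recommended values.

```python
def annotate_cluster(mean_expr, marker_panel, min_score=0.5):
    """Return (label, scores) for one cluster from its mean expression per gene."""
    scores = {
        cell_type: sum(mean_expr.get(gene, 0.0) for gene in genes) / len(genes)
        for cell_type, genes in marker_panel.items()
    }
    best = max(scores, key=scores.get)
    label = best if scores[best] >= min_score else "unassigned"
    return label, scores


# Subset of the teleost markers from the table above.
marker_panel = {
    "T cell": ["CD3E", "CD4", "CD8A"],
    "B cell": ["CD79A"],
    "Macrophage": ["CSF1R", "CD209"],
    "Granulocyte": ["MPO", "LYZ"],
}

# Made-up normalized mean expression per cluster.
cluster_means = {
    "c0": {"CD3E": 2.1, "CD4": 1.0, "CD8A": 0.2},
    "c1": {"CD79A": 2.5, "CD3E": 0.1},
    "c2": {"MPO": 0.1},
}
cluster_labels = {
    cluster: annotate_cluster(expr, marker_panel)[0]
    for cluster, expr in cluster_means.items()
}
print(cluster_labels)
```

The "unassigned" outcome is deliberate: clusters that match no known panel are candidates for species-specific cell types and deserve manual inspection rather than a forced label.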

Downstream Analyses

Following annotation, several downstream analyses reveal biological insights.

Differential Expression Analysis

Differential expression (DE) between conditions (e.g., infected versus naive animals) is performed at the cell-type level using methods such as pseudobulk aggregation followed by edgeR or DESeq2, or single-cell level approaches such as MAST or the Wilcoxon rank-sum test. For example, in a study of porcine reproductive and respiratory syndrome virus (PRRSV) infection, DE analysis in alveolar macrophages can identify interferon-stimulated genes and antiviral factors.
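Pseudobulk aggregation itself is a small step: raw counts are summed across all cells of one type from one animal, so that each animal contributes a single profile per cell type and the animal, not the cell, becomes the unit of replication. The sketch below shows only this aggregation, assuming each cell record carries an animal ID and a cell type label; the statistical testing would then be done on the resulting table with a bulk RNA-seq method. The record layout, animal IDs, and counts are hypothetical.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum raw counts per (animal, cell_type) across all matching cells."""
    bulk = defaultdict(lambda: defaultdict(int))
    for cell in cells:
        key = (cell["animal"], cell["cell_type"])
        for gene, count in cell["counts"].items():
            bulk[key][gene] += count
    return {key: dict(genes) for key, genes in bulk.items()}


# Toy cells from two pigs; ISG15 and MX1 are interferon-stimulated genes.
cells = [
    {"animal": "pig1", "cell_type": "macrophage", "counts": {"ISG15": 4, "MX1": 1}},
    {"animal": "pig1", "cell_type": "macrophage", "counts": {"ISG15": 6}},
    {"animal": "pig2", "cell_type": "macrophage", "counts": {"ISG15": 1}},
    {"animal": "pig1", "cell_type": "T cell", "counts": {"MX1": 2}},
]
bulk = pseudobulk(cells)
print(bulk[("pig1", "macrophage")])
```

Aggregating raw rather than normalized counts matters here, because the bulk methods apply their own normalization and expect integer counts.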

Trajectory Inference

Trajectory inference methods (Monocle 3, Slingshot, PAGA) reconstruct developmental or activation paths from scRNA-seq data. In veterinary immunology, this is useful for studying B cell maturation in the chicken bursa of Fabricius or T cell differentiation in the bovine thymus. Because pseudotime orders cells by transcriptional similarity and does not by itself establish the direction of a process, the inferred ordering should be checked against known biology and, where possible, against RNA velocity estimates (e.g., velocyto, scVelo).

Cell-Cell Communication

Ligand-receptor interaction analysis (CellChat, NicheNet, SingleCellSignalR) infers intercellular signaling. For instance, in the ovine mammary gland during mastitis, interactions between macrophages and epithelial cells via TNF and IL-1 pathways can be dissected.
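The core idea behind ligand-receptor analysis can be shown with a deliberately simple score: the product of the mean ligand expression in a sender cell type and the mean receptor expression in a receiver cell type. This is not the statistical model used by CellChat, NicheNet, or SingleCellSignalR, which add curated databases, multi-subunit complexes, and significance testing. The expression values and the function name are made up; TNF with TNFRSF1A and IL1B with IL1R1 are used as example pairs.

```python
def lr_scores(mean_expr, pairs):
    """Score every sender -> receiver combination for each ligand-receptor pair."""
    scores = {}
    for sender, sender_expr in mean_expr.items():
        for receiver, receiver_expr in mean_expr.items():
            for ligand, receptor in pairs:
                score = sender_expr.get(ligand, 0.0) * receiver_expr.get(receptor, 0.0)
                if score > 0:  # keep only pairs expressed on both sides
                    scores[(sender, receiver, ligand, receptor)] = score
    return scores


# Made-up mean expression per cell type.
mean_expr = {
    "macrophage": {"TNF": 2.0, "IL1B": 1.5, "TNFRSF1A": 0.5},
    "epithelial": {"TNFRSF1A": 1.0, "IL1R1": 2.0},
}
pairs = [("TNF", "TNFRSF1A"), ("IL1B", "IL1R1")]

scores = lr_scores(mean_expr, pairs)
top = max(scores, key=scores.get)
print(top, scores[top])
```

Sender and receiver may be the same cell type, so autocrine signaling (macrophage to macrophage via TNF in this toy data) is scored alongside paracrine interactions.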

Integration with Other Data Types

Multi-omic integration (e.g., scRNA-seq with CITE-seq for surface protein, or scATAC-seq for chromatin accessibility) is increasingly applied in veterinary immunology. The pipeline can incorporate weighted nearest neighbor analysis (Seurat v4/v5) to jointly cluster cells based on RNA and protein data.

Practical Considerations for Veterinary Specimens

Several factors differentiate veterinary scRNA-seq studies from human-focused work:

Tissue availability: Post-mortem samples from slaughterhouses or diagnostic necropsies may have variable RNA quality. Single-nucleus RNA-seq is more robust for frozen tissues.
Species-specific genome annotations: Not all veterinary species have fully annotated immune gene sets. De novo assembly or cross-mapping with closely related genomes may be necessary.
Immune cell nomenclature: Veterinary immunologists often use CD nomenclature based on cross-reactivity with monoclonal antibodies; scRNA-seq can confirm expression of these markers.
Cost and throughput: Large animal studies (e.g., cattle, pigs) may require pooling multiple animals. Batch effects must be carefully modeled.

The Atlantic salmon study [1] demonstrated that single-nucleus transcriptomics can recover a comparable cell atlas to single-cell data, with the advantage of using archived frozen samples. This is particularly relevant for veterinary diagnostics where fresh tissue may not be available.

Conclusion

Single-cell RNA-seq analysis pipelines are now mature for veterinary immunology applications. The core computational steps of preprocessing, QC, normalization, clustering, annotation, and downstream analysis apply across species, but each stage requires careful tailoring to the target organism. The integration of single-cell and single-nucleus approaches, as exemplified in Atlantic salmon [1], expands the utility of these methods to field-collected specimens. As veterinary reference atlases grow, cross-species comparative immunology will benefit from standardized pipelines and marker gene databases. Continued development of computational tools that accept non-model organism inputs will further accelerate discovery in livestock, poultry, and aquatic species.

References

[1] Andresen AMS, Taylor RS, Grimholt U, et al. Mapping the cellular landscape of Atlantic salmon head kidney by single cell and single nucleus transcriptomics. Fish Shellfish Immunol. 2024. URL: https://pubmed.ncbi.nlm.nih.gov/38181891/