---
title: "Single-Cell RNA-Seq Analysis Pipelines for Veterinary Immunology"
category: "transcriptomics-omics"
metaDescription: "A technical review of single-cell RNA-seq analysis pipelines for profiling immune cells in livestock and companion animals, comparing Seurat, Scanpy, and Monocle workflows."
primaryKeyword: "single-cell RNA-seq veterinary immunology"
secondaryKeywords: ["scRNA-seq pipeline", "livestock immune profiling", "Seurat Scanpy Monocle", "veterinary bioinformatics", "B cell receptor reconstruction"]
---
Single-Cell RNA-Seq Analysis Pipelines for Veterinary Immunology
1. Introduction
Single-cell RNA sequencing (scRNA-seq) has transformed the resolution at which immune responses can be characterized across vertebrate species. In veterinary immunology, scRNA-seq enables the dissection of heterogeneous cell populations within lymphoid tissues of livestock and companion animals, revealing cell-type-specific transcriptional responses to pathogens, vaccines, and neoplasia. However, translating computational pipelines developed for human or murine data to non-model species introduces substantial challenges: incomplete genome annotations, the absence of validated cell surface markers, and the need for cross-species mapping strategies.
This article provides a technical overview of the major scRNA-seq analysis pipelines used in veterinary immunology, with emphasis on pre-processing steps, clustering algorithms, and cell type annotation. The tools Seurat, Scanpy, and Monocle are compared in the context of immune cell profiling in species such as Bos taurus, Salmo salar, Gallus gallus, and Sus scrofa. Integration strategies for reconstructing adaptive immune receptor repertoires and for analyzing intratumoral immune heterogeneity are also discussed, with reference to recent applications in veterinary oncology and fish immunology [1-4]. A workflow diagram and a comparison table are provided to guide pipeline selection for veterinary researchers.
2. Pre-Processing and Quality Control
Regardless of the chosen downstream tool, scRNA-seq data from veterinary samples require rigorous pre-processing to remove technical artifacts. Standard quality control (QC) metrics include:
- Minimum and maximum number of unique molecular identifiers (UMIs) per cell.
- Percentage of mitochondrial reads (typically below 20% for viable cells).
- Doublet detection using in silico methods (e.g., DoubletFinder or Scrublet).
- Ambient RNA removal using algorithms such as EmptyDrops or SoupX.
Veterinary samples often contain a high proportion of erythrocytes, or of granulocytes with high endogenous RNase activity, which can increase the fraction of damaged cells. Additionally, lipid-rich tissues from species such as Atlantic salmon require modified dissociation protocols that may yield higher ambient RNA levels [2]. Filtering thresholds must therefore be optimized per tissue and species rather than copied from human or mouse defaults.
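The per-tissue, per-species filtering described above can be sketched in a few lines of plain Python. This is a minimal illustration, not part of any named tool: the `QCThresholds` class, the `passes_qc` function, the cutoff values, and the mitochondrial gene names are all assumptions made for the example (many non-model annotations do not use an `MT-` prefix, so the mitochondrial gene set usually has to be supplied explicitly).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QCThresholds:
    """Cutoffs for one tissue/species combination (values below are illustrative)."""
    min_umis: int
    max_umis: int
    max_mito_fraction: float


def passes_qc(umi_counts: dict, mito_genes: set, t: QCThresholds) -> bool:
    """Return True if total UMIs and mitochondrial fraction are within bounds."""
    total = sum(umi_counts.values())
    if total == 0 or total < t.min_umis or total > t.max_umis:
        return False
    mito = sum(n for gene, n in umi_counts.items() if gene in mito_genes)
    return mito / total <= t.max_mito_fraction


# Hypothetical thresholds and gene names, for demonstration only.
thresholds = QCThresholds(min_umis=500, max_umis=50_000, max_mito_fraction=0.20)
mito_genes = {"MT-CO1", "MT-ND1"}

cells = {
    "cell_a": {"CD3E": 400, "PTPRC": 500, "MT-CO1": 100},   # 10% mito, kept
    "cell_b": {"CD3E": 300, "MT-CO1": 300, "MT-ND1": 200},  # 62.5% mito, dropped
    "cell_c": {"CD3E": 100, "MT-CO1": 5},                   # too few UMIs, dropped
}

kept = [name for name, counts in cells.items()
        if passes_qc(counts, mito_genes, thresholds)]
print(kept)
```

Keeping the thresholds in a small object per tissue makes it straightforward to record which cutoffs were applied to which sample.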
Normalization is performed either by global scaling (Seurat's NormalizeData, which log-transforms library-size-scaled counts) or by count-based size-factor models (scran's computeSumFactors, an R function that Scanpy workflows usually reach through an R interface such as rpy2). For cross-sample integration, anchor-based methods in Seurat (canonical correlation analysis, CCA) or Scanorama in Scanpy workflows are commonly applied. Batch effects arising from different sequencing runs or tissue processing dates are corrected using Harmony or mutual nearest neighbors (MNN) approaches.
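As a worked illustration of global-scaling normalization, the sketch below divides each gene's count by the cell's total, multiplies by a scale factor, and applies log(1 + x). The function name `log_normalize` and the toy counts are assumptions for this example; the default scale factor of 10,000 mirrors the commonly used default rather than a recommendation.

```python
import math


def log_normalize(counts, scale_factor=10_000.0):
    """Scale one cell's counts to a common library size, then apply log1p."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log1p(c / total * scale_factor) for c in counts]


# Two toy cells with the same composition but a 10-fold difference in depth.
shallow = [1, 3, 0, 6]
deep = [10, 30, 0, 60]

norm_shallow = log_normalize(shallow)
norm_deep = log_normalize(deep)
print(norm_shallow)
print(norm_deep)
```

Because both toy cells have the same relative composition, their normalized profiles coincide, which is the behavior global scaling is meant to provide: differences in sequencing depth alone should not separate cells.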
3. Clustering and Cell Type Annotation
Unsupervised clustering is typically performed with the Louvain or Leiden community detection algorithms, both of which are available in Seurat and Scanpy. Monocle 3 clusters cells in UMAP space (Leiden by default, with Louvain as an option) and then fits a principal graph that supports pseudotime ordering. The choice of clustering resolution determines the granularity of cell type identification; for immune profiling, a resolution that distinguishes major lineages (T cells, B cells, monocytes, granulocytes) while avoiding over-splitting is critical.
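To make the effect of the resolution parameter concrete, the sketch below evaluates a common resolution-weighted form of modularity, Q = Σ_c [e_c / m − γ (K_c / 2m)²], on a toy graph. It only scores two fixed partitions; it is not an implementation of Louvain or Leiden, and the exact quality function optimized by a given package may differ. The function name and the toy graph are assumptions for the example.

```python
def modularity(edges, communities, resolution=1.0):
    """Resolution-weighted modularity of a partition of an undirected graph.

    e_c is the number of edges inside community c, K_c the summed degree of
    its nodes, m the total edge count, and `resolution` the gamma parameter.
    """
    m = len(edges)
    membership = {node: idx for idx, group in enumerate(communities) for node in group}
    internal = [0] * len(communities)
    degree_sum = [0] * len(communities)
    for u, v in edges:
        degree_sum[membership[u]] += 1
        degree_sum[membership[v]] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    return sum(e / m - resolution * (k / (2 * m)) ** 2
               for e, k in zip(internal, degree_sum))


# Toy graph: two triangles joined by a single bridging edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
merged = [{0, 1, 2, 3, 4, 5}]
split = [{0, 1, 2}, {3, 4, 5}]

for gamma in (0.1, 1.0):
    print(gamma, modularity(edges, merged, gamma), modularity(edges, split, gamma))
```

At a low resolution the single merged community scores higher, while at a resolution of 1.0 the two-triangle split is preferred. The same mechanism explains why raising the resolution in an immune dataset breaks major lineages into progressively finer subclusters.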
Annotation of clusters is the primary bottleneck in veterinary scRNA-seq. Because most reference databases (e.g., CellMarker, PanglaoDB) are built on human and mouse data, veterinary researchers rely on:
- Differential expression of conserved orthologs. For example, CD3E for T cells, CD19/MS4A1 for B cells, CD14 for monocytes.
- Reference-based classifiers such as SingleR or Garnett, applied after mapping the target species' gene symbols onto a curated human or mouse reference.
- Manual annotation based on literature-derived marker lists for the target species.
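The cross-species mapping step that these annotation strategies depend on can be sketched as follows. This is a simplified illustration under stated assumptions: the ortholog pairs would normally come from an external orthology resource, the gene identifiers (`sp_gene_1` and so on) are hypothetical, and restricting to one-to-one orthologs is one conservative choice among several.

```python
from collections import Counter


def one_to_one_orthologs(pairs):
    """Keep pairs whose source gene and reference gene each occur exactly once."""
    source_counts = Counter(src for src, _ in pairs)
    target_counts = Counter(tgt for _, tgt in pairs)
    return {src: tgt for src, tgt in pairs
            if source_counts[src] == 1 and target_counts[tgt] == 1}


def rename_to_reference(expression, mapping):
    """Re-key an expression profile by reference symbols, dropping unmapped genes."""
    return {mapping[gene]: value for gene, value in expression.items() if gene in mapping}


# Hypothetical ortholog table: (target-species gene ID, reference gene symbol).
pairs = [
    ("sp_gene_1", "CD3E"),
    ("sp_gene_2", "MS4A1"),
    ("sp_gene_3", "CD14"),  # two target-species genes map to CD14,
    ("sp_gene_4", "CD14"),  # so both are excluded as ambiguous
]

mapping = one_to_one_orthologs(pairs)
profile = {"sp_gene_1": 5.0, "sp_gene_3": 2.0, "sp_gene_9": 1.0}
renamed = rename_to_reference(profile, mapping)
print(mapping)
print(renamed)
```

Dropping ambiguous and unmapped genes shrinks the feature space available to a reference-based classifier, which is one reason annotation confidence tends to be lower in species with incomplete annotations.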
Meta-analyses of publicly available transcriptomic datasets have helped consolidate species-specific immune gene signatures. For example, Marimuthu et al. [1] performed a meta-analysis of Bos taurus transcriptomic datasets to identify key immune gene profiles and signaling pathways, providing a resource for cell type annotation in bovine scRNA-seq studies.
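Once a marker list has been assembled, a simple way to apply it is to score each cluster by the mean expression of each marker set and assign the best-scoring label. The sketch below assumes cluster-level mean expression values are already available; the `annotate_cluster` function, the `min_score` cutoff, and the toy numbers are assumptions for illustration, and the marker sets reuse the conserved orthologs named earlier in this section.

```python
def annotate_cluster(mean_expression, marker_sets, min_score=0.5):
    """Label a cluster by its best-scoring marker set, or 'unassigned' if weak."""
    scores = {
        label: sum(mean_expression.get(gene, 0.0) for gene in genes) / len(genes)
        for label, genes in marker_sets.items()
    }
    best = max(scores, key=scores.get)
    label = best if scores[best] >= min_score else "unassigned"
    return label, scores


marker_sets = {
    "T cell": ["CD3E"],
    "B cell": ["CD19", "MS4A1"],
    "Monocyte": ["CD14"],
}

# Toy cluster-level mean expression (normalized units, made up for the example).
cluster_means = {
    "cluster_0": {"CD3E": 2.1, "CD14": 0.1},
    "cluster_1": {"MS4A1": 1.8, "CD19": 1.2, "CD3E": 0.05},
    "cluster_2": {"CD14": 0.2},
}

labels = {name: annotate_cluster(means, marker_sets)[0]
          for name, means in cluster_means.items()}
print(labels)
```

Leaving weakly supported clusters as "unassigned" avoids forcing a human-derived label onto a population that may have no clear counterpart in the reference species.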
4. Immune Cell Profiling in Livestock
4.1 Bovine Immune Responses
Cattle are a major livestock species for which scRNA-seq has been applied to study responses to viral infections such as Bovine Coronavirus (see Bovine Coronavirus Respiratory Disease). Single-cell profiling of peripheral blood mononuclear cells and lymph node tissues has revealed distinct subpopulations of gamma delta T cells and natural killer cells that expand during infection. The meta-analysis by Marimuthu et al. [1] identified conserved gene expression modules associated with Toll-like receptor signaling and interferon responses across multiple challenge studies, facilitating the interpretation of scRNA-seq clusters in the absence of species-specific antibody panels.
4.2 Teleost Fish Immunology
Atlantic salmon (Salmo salar) is a key species in aquaculture where scRNA-seq has been applied to understand immune responses to viral and bacterial pathogens. Andresen et al. [2] produced a comprehensive cellular map of the salmon head kidney using both single-cell and single-nucleus transcriptomics. This organ is functionally analogous to mammalian bone marrow. The study identified myeloid, lymphoid, and erythroid lineages and described novel markers for plasmacytoid dendritic cells and cytotoxic T cells. Importantly, the comparison between single-cell and single-nucleus methods revealed that nuclear transcriptomes retain sufficient immune gene information, offering an alternative for tissues where cell dissociation is challenging.
4.3 Avian and Porcine Systems
In chickens (Gallus gallus) and pigs (Sus scrofa), scRNA-seq has been used to characterize respiratory tract immune responses relevant to pathogens such as Avian Influenza A(H5N1) (see Avian Influenza A(H5N1) in Poultry and Wild Birds: Current Epidemiology, Molecular Diagnostics, and Biosecurity) and Porcine Reproductive and Respiratory Syndrome virus (PRRSV) (see Porcine Reproductive and Respiratory Syndrome (PRRS): Genotyping, Diagnostic Assays, and Control Strategies). One challenge in avian scRNA-seq is the presence of nucleated red blood cells, which raise the background of globin transcripts; pre-processing must therefore include aggressive filtering of erythroid cells.
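One simple way to flag erythroid cells is by the fraction of each cell's UMIs that come from globin genes. The sketch below is a minimal illustration under stated assumptions: the globin symbols are placeholders that should be checked against the annotation in use, and the 0.5 cutoff is arbitrary rather than a validated threshold.

```python
def globin_fraction(umi_counts, globin_genes):
    """Fraction of a cell's UMIs assigned to globin genes."""
    total = sum(umi_counts.values())
    if total == 0:
        return 0.0
    return sum(n for gene, n in umi_counts.items() if gene in globin_genes) / total


def split_erythroid(cells, globin_genes, max_fraction=0.5):
    """Separate cells into (kept, flagged_as_erythroid) by globin fraction."""
    kept, flagged = {}, {}
    for name, counts in cells.items():
        target = flagged if globin_fraction(counts, globin_genes) > max_fraction else kept
        target[name] = counts
    return kept, flagged


# Placeholder globin symbols; confirm the names in the genome annotation used.
GLOBIN_GENES = {"HBBA", "HBAD", "HBA1"}

cells = {
    "leukocyte_like": {"PTPRC": 80, "CD3E": 15, "HBBA": 5},      # 5% globin
    "erythroid_like": {"HBBA": 600, "HBAD": 300, "PTPRC": 100},  # 90% globin
}

kept, flagged = split_erythroid(cells, GLOBIN_GENES)
print(sorted(kept), sorted(flagged))
```

Cells with a low but non-zero globin fraction are kept here, since some globin signal in leukocytes is expected from ambient RNA in erythrocyte-rich samples.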
5. Comparative Analysis of Pipelines: Seurat, Scanpy, and Monocle
The three major analysis frameworks each offer distinct strengths for veterinary applications.
Table 1. Feature comparison of scRNA-seq analysis pipelines for veterinary immunology.
| Feature | Seurat (R) | Scanpy (Python) | Monocle 3 (R) |
|---|---|---|---|
| Normalization | LogNormalize, SCTransform | normalize_total, scran (via R) | LogNormalize, size factor |
| Batch correction | CCA, MNN, Harmony | Scanorama, BBKNN, Harmony | MNN, simple scaling |
| Clustering | Louvain, Leiden | Louvain, Leiden | Louvain, Leiden |
| Trajectory analysis | Via SeuratWrappers (conversion to Monocle 3) | PAGA, diffusion pseudotime | Pseudotime, principal graph |
| Differential expression | Wilcoxon, ROC, MAST | t-test, Wilcoxon, logistic regression | Regression models (fit_models), Moran's I (graph_test) |
| Cross-species annotation | SingleR, Azimuth (limited) | SingleR, scArches | Manual marker mapping |
| Scalability to 100k cells | Moderate (requires memory) | High (HDF5, anndata) | Low to moderate |
| Active development | Very high | High | Moderate |
For typical veterinary immune profiling experiments with 5,000 to 50,000 cells, Seurat offers a user-friendly environment with extensive documentation. Scanpy is preferred for very large datasets (e.g., full atlas projects) and for users already working in Python. Monocle 3 is advantageous when the biological question involves developmental or activation trajectories, such as B cell differentiation in the germinal center or T cell exhaustion during chronic infection.
6. Advanced Applications: B Cell Receptor Reconstruction and Tumor Immunity
6.1 Full-Length Immunoglobulin Reconstruction
Standard scRNA-seq libraries (3-prime or 5-prime) capture V(D)J information only where UMI-tagged reads happen to cover the variable region of an immunoglobulin transcript. The BALDR pipeline, developed by Upadhyay et al. [4], reconstructs paired heavy and light chain immunoglobulin sequences from scRNA-seq data. The tool assembles full-length variable regions from overlapping reads that cover the 5-prime end of the transcript, enabling the identification of clonal relationships among B cells. In veterinary immunology, BALDR has been applied to characterize B cell responses in chickens after vaccination against infectious bursal disease virus (see Infectious Bursal Disease Virus Variants). The pipeline requires a library protocol with read coverage of the complementarity-determining region 3 (CDR3) and can be adapted to non-mammalian species by providing custom constant region reference sequences.
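Once paired heavy and light chains have been reconstructed, clonal relationships can be inferred by grouping cells that share the same rearrangement. The sketch below is not BALDR's own logic; it uses a deliberately strict definition (identical V gene, J gene, and CDR3 amino acid sequence on both chains), whereas practical analyses often tolerate CDR3 mismatches to account for somatic hypermutation. All gene names and sequences are invented for the example.

```python
from collections import defaultdict


def group_clonotypes(cells):
    """Group cell IDs by identical heavy and light chain (V gene, J gene, CDR3)."""
    groups = defaultdict(list)
    for cell_id, chains in cells.items():
        key = (tuple(chains["heavy"]), tuple(chains["light"]))
        groups[key].append(cell_id)
    return list(groups.values())


# Hypothetical reconstructed chains: (V gene, J gene, CDR3 amino acid sequence).
cells = {
    "b_cell_1": {"heavy": ("IGHV1", "IGHJ1", "CARDYW"), "light": ("IGLV1", "IGLJ1", "CQSYDF")},
    "b_cell_2": {"heavy": ("IGHV1", "IGHJ1", "CARDYW"), "light": ("IGLV1", "IGLJ1", "CQSYDF")},
    "b_cell_3": {"heavy": ("IGHV1", "IGHJ1", "CARGYW"), "light": ("IGLV1", "IGLJ1", "CQSYDF")},
}

clonotypes = group_clonotypes(cells)
expanded = [members for members in clonotypes if len(members) > 1]
print(clonotypes)
print(expanded)
```

Here `b_cell_3` differs from the other two cells by a single CDR3 residue and is therefore placed in its own group; a similarity-based definition would likely merge it with the expanded clone.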
6.2 Intratumoral Immune Heterogeneity
In veterinary oncology, scRNA-seq has been used to dissect the tumor microenvironment. Ayers et al. [3] applied single-cell next-generation sequencing to assess intratumoral heterogeneity in canine osteosarcoma cell lines. While the study focused on tumor cells rather than immune infiltrates, the methodological framework can be extended to characterize tumor-associated macrophages and CD8+ T cells in spontaneous canine and feline neoplasms. Such approaches are becoming essential for understanding immune evasion mechanisms and for developing immunotherapeutics for companion animals.
7. Pipeline Integration and Computational Considerations
7.1 Combining scRNA-seq with Biological Foundation Models
Recent advances in biological foundation models have the potential to improve clustering and annotation for non-model species. Transcriptome models such as Geneformer are pre-trained on large collections of single-cell expression profiles and can provide cell embeddings that are less sensitive to incomplete annotations; protein models such as ESMFold, by contrast, operate on amino acid sequences rather than expression profiles and do not embed cells directly. For example, a foundation model fine-tuned on Bos taurus immune transcriptome data could be used as a feature extractor before clustering. This approach is still experimental but may reduce the reliance on manual marker identification. Relevant background is provided in the article Biological Foundation Models for Veterinary Virology: Predicting Host Tropism and Pathogenicity.
7.2 Hardware and Reproducibility
Processing scRNA-seq data from multiple veterinary samples can be computationally demanding. Seurat and Monocle run in R and can typically handle up to about 50,000 cells on a standard workstation with 32 GB RAM; beyond that scale, high-performance computing resources become necessary. For larger atlases (e.g., a multi-tissue salmon atlas on the scale of 500,000 cells, extending single-organ maps such as [2]), Scanpy with HDF5-backed anndata objects is recommended. Reproducibility is enhanced by using containerized environments (Docker, Singularity) with pinned package versions. Workflow managers such as Nextflow or Snakemake can automate QC, alignment, and clustering steps across batches.
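A back-of-the-envelope memory estimate helps explain why dataset size drives these recommendations. The sketch below compares a dense matrix with a compressed sparse row (CSR) layout. The gene count of 20,000, the 5% non-zero density, and the 4-byte values and indices are assumptions for illustration; real datasets and in-memory representations will differ.

```python
def dense_bytes(n_cells, n_genes, bytes_per_value=4):
    """Memory for a dense cells-by-genes matrix."""
    return n_cells * n_genes * bytes_per_value


def csr_bytes(n_cells, n_genes, density, bytes_per_value=4, bytes_per_index=4):
    """Memory for a CSR matrix: values, column indices, and row pointers."""
    nnz = round(n_cells * n_genes * density)
    return nnz * (bytes_per_value + bytes_per_index) + (n_cells + 1) * bytes_per_index


N_GENES = 20_000  # assumed number of annotated genes
DENSITY = 0.05    # assumed fraction of non-zero entries

for n_cells in (50_000, 500_000):
    dense_gb = dense_bytes(n_cells, N_GENES) / 1e9
    sparse_gb = csr_bytes(n_cells, N_GENES, DENSITY) / 1e9
    print(f"{n_cells:>7} cells: dense {dense_gb:.1f} GB, sparse {sparse_gb:.1f} GB")
```

Under these assumptions a dense matrix for 500,000 cells would need about 40 GB before any intermediate copies are made, which exceeds a 32 GB workstation, whereas the sparse form stays near 4 GB. Disk-backed access reduces the resident footprint further.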
8. Workflow Diagram
The following Mermaid diagram illustrates a typical scRNA-seq analysis pipeline for veterinary immune profiling. Steps specific to species with incomplete genomes (e.g., use of cross-species mapping) appear as a separate branch at the cell type annotation stage.
graph TD
A[Raw FASTQ from scRNA-seq] --> B[Alignment to reference genome / transcriptome]
B --> C[Cell barcode & UMI counting]
C --> D[Quality control: mito%, UMI count, doublet removal]
D --> E{Ambient RNA?}
E -->|Yes| F[Ambient RNA removal]
E -->|No| G[Normalization & log transformation]
F --> G
G --> H[Batch correction & integration]
H --> I[Dimensionality reduction: PCA / scVI]
I --> J[Clustering: Louvain / Leiden]
J --> K{Cell type annotation}
K --> L[Reference-based: SingleR / Garnett]
K --> M[Manual: conserved markers]
K --> N[Cross-species mapping: orthologs]
L --> O[Immune cell cluster identification]
M --> O
N --> O
O --> P[Differential expression & functional analysis]
O --> Q[Advanced: BCR reconstruction / trajectory]
9. Conclusion
Single-cell RNA-seq analysis pipelines for veterinary immunology are adapting to meet the challenges of non-model species. Seurat and Scanpy provide robust frameworks for pre-processing and clustering, while Monocle adds trajectory inference capabilities. The choice of pipeline should be guided by dataset size, the availability of a high-quality annotated genome, and the specific immunological questions being addressed. Tools such as SingleR and manual annotation using meta-analytic resources [1] partially circumvent the absence of comprehensive species-specific references. Future developments in foundation models and improved cross-species mapping will further enhance the resolution of immune cell profiling in livestock and companion animals. The integration of B cell receptor reconstruction pipelines such as BALDR [4] and tumor microenvironment analyses [3] will continue to expand the scope of veterinary immunology research.
References
[1] Marimuthu VKD, Matheswaran K, Thambiraja M et al. Meta-Analysis of Transcriptomic Datasets Reveals Key Immune Gene Profiles and Signaling Pathways in Bos taurus. Anim Genet. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42108216/
[2] Andresen AMS, Taylor RS, Grimholt U et al. Mapping the cellular landscape of Atlantic salmon head kidney by single cell and single nucleus transcriptomics. Fish Shellfish Immunol. 2024. URL: https://pubmed.ncbi.nlm.nih.gov/38181891/
[3] Ayers J, Milner RJ, Cortés-Hinojosa G et al. Novel application of single-cell next-generation sequencing for determination of intratumoral heterogeneity of canine osteosarcoma cell lines. J Vet Diagn Invest. 2021. URL: https://pubmed.ncbi.nlm.nih.gov/33446089/
[4] Upadhyay AA, Kauffman RC, Wolabaugh AN et al. BALDR: a computational pipeline for paired heavy and light chain immunoglobulin reconstruction in single-cell RNA-seq data. Genome Med. 2018. URL: https://pubmed.ncbi.nlm.nih.gov/29558968/
[5] Srivastava AK, Wang Y, Huang R et al. Human genome meeting 2016: Houston, TX, USA. 28 February - 2 March 2016. Hum Genomics. 2016. URL: https://pubmed.ncbi.nlm.nih.gov/27294413/