Section: Clinical Methods & Interventions

Pan-Cancer Analysis of Whole Genomes (PCAWG): A Veterinary Comparative Oncology Framework

Introduction

The Pan-Cancer Analysis of Whole Genomes (PCAWG) project represents a landmark collaborative effort to characterize the genomic landscape of human cancers through the analysis of over 2,600 whole cancer genomes and their matched normal tissues. While the primary focus of PCAWG has been on human oncology, the conceptual framework, analytical pipelines, and biological insights derived from this project have profound implications for veterinary comparative oncology. This article provides a technical review of PCAWG methodologies, including whole genome sequencing (WGS) data processing, somatic variant calling, mutational signature analysis, and evolutionary inference, and discusses their direct applicability to the study of spontaneous neoplasms in domestic and wild animal species.

For veterinary molecular diagnosticians and comparative oncologists, PCAWG serves as a template for systematic genomic characterization of tumors. The integration of PCAWG-derived knowledge with species-specific genomic resources can accelerate the identification of driver mutations, prognostic biomarkers, and therapeutic targets in veterinary patients.

Project Design and Data Generation

Cohort Composition and Sequencing Strategy

The PCAWG consortium aggregated WGS data from 2,658 cancer genomes and their matched normal tissues across 38 distinct tumor types. Sequencing was performed on high-throughput short-read platforms; the consortium reported mean coverage of about 39x for normal samples and a bimodal tumor coverage distribution with modes near 38x and 60x. Depth matters because the number of variant-supporting reads, and therefore the sensitivity for subclonal somatic mutations, scales with coverage.

Key technical parameters included:

  • Library preparation: PCR-free protocols to minimize amplification bias and enable accurate copy number profiling.
  • Read length: Paired-end reads of 100 to 150 base pairs.
  • Alignment: Reads were aligned to the human reference genome with the Burrows-Wheeler Aligner (BWA-MEM), followed by duplicate marking and base quality score recalibration.
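
The relationship between coverage depth and sensitivity can be illustrated with a simple binomial model. The sketch below is not part of the PCAWG pipelines; it assumes that reads sample alleles independently, ignores sequencing error, and uses a hypothetical rule that a caller needs at least three variant-supporting reads. The function name and its defaults are illustrative.

```python
from math import comb


def detection_probability(depth: int, vaf: float, min_alt_reads: int = 3) -> float:
    """Probability of observing at least `min_alt_reads` variant reads.

    Models the variant read count as Binomial(depth, vaf).
    `min_alt_reads` is a hypothetical caller requirement, not a PCAWG setting.
    """
    p_too_few = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_too_few


if __name__ == "__main__":
    # Compare a clonal-like VAF with a low, subclonal-like VAF at two depths.
    for depth in (30, 60):
        for vaf in (0.25, 0.05):
            prob = detection_probability(depth, vaf)
            print(f"depth={depth}x vaf={vaf:.2f} P(detect)={prob:.3f}")
```

Under these assumptions, the gain from doubling depth is much larger for low-VAF mutations than for clonal ones, which is the practical argument for sequencing tumors more deeply when subclonal variants are of interest.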

Quality Control and Data Harmonization

A centralized data coordination center performed uniform quality control across all contributing centers. Metrics included:

  • Contamination assessment: Estimation of cross-sample contamination using allele frequency deviations.
  • Coverage uniformity: Evaluation of GC bias and regional coverage depth.
  • Library complexity: Assessment of PCR duplicate rates and insert size distributions.

Samples failing predefined quality thresholds were excluded from downstream analyses. This rigorous harmonization ensured that batch effects and technical artifacts were minimized, a principle that is equally critical for veterinary WGS studies where sample quality may vary due to pre-analytical variables such as tissue preservation methods.
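
A quality gate of this kind can be sketched in a few lines. The thresholds, field names, and sample identifiers below are hypothetical and chosen only for illustration; they are not the PCAWG cut-offs, and a veterinary laboratory would need to calibrate its own values.

```python
from dataclasses import dataclass


@dataclass
class SampleQC:
    sample_id: str
    mean_coverage: float   # mean read depth across the genome
    contamination: float   # estimated fraction of reads from another individual
    duplicate_rate: float  # fraction of reads marked as duplicates


# Hypothetical thresholds for illustration only.
THRESHOLDS = {
    "min_mean_coverage": 25.0,
    "max_contamination": 0.015,
    "max_duplicate_rate": 0.20,
}


def qc_failures(sample: SampleQC, thresholds: dict = THRESHOLDS) -> list:
    """Return the names of all QC checks that the sample fails."""
    failures = []
    if sample.mean_coverage < thresholds["min_mean_coverage"]:
        failures.append("low_coverage")
    if sample.contamination > thresholds["max_contamination"]:
        failures.append("contaminated")
    if sample.duplicate_rate > thresholds["max_duplicate_rate"]:
        failures.append("high_duplicate_rate")
    return failures


if __name__ == "__main__":
    cohort = [
        SampleQC("canine_tumor_01", 58.2, 0.002, 0.08),
        SampleQC("feline_ffpe_07", 19.5, 0.004, 0.35),
    ]
    for sample in cohort:
        print(sample.sample_id, qc_failures(sample) or "pass")
```

Returning every failed check, rather than stopping at the first, makes it easier to see whether a pre-analytical problem such as fixation damage affects several metrics at once.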

Somatic Variant Detection and Annotation

Single Nucleotide Variants and Small Insertions/Deletions

The PCAWG project employed multiple independent variant callers to maximize sensitivity and specificity for somatic single nucleotide variants (SNVs) and small insertions/deletions (indels). The consensus approach involved:

  1. Primary calling: Running several independent somatic callers, such as MuTect, CaVEMan, and MuSE.
  2. Ensemble filtering: Retaining variants called by at least two independent algorithms.
  3. Post-call filtering: Applying filters for strand bias, read position bias, and mapping quality.
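
The ensemble filtering step can be sketched as a simple vote across call sets. This is a minimal illustration, assuming each caller's output has already been normalized to (chromosome, position, reference allele, alternate allele) tuples; the caller names and variants are placeholders, and the actual PCAWG consensus procedure involved additional caller-specific handling.

```python
from collections import Counter


def consensus_calls(callsets: dict, min_callers: int = 2) -> set:
    """Keep variants reported by at least `min_callers` independent callers.

    `callsets` maps a caller name to a set of
    (chromosome, position, ref, alt) tuples.
    """
    support = Counter(
        variant for calls in callsets.values() for variant in calls
    )
    return {variant for variant, n in support.items() if n >= min_callers}


if __name__ == "__main__":
    # Placeholder call sets; coordinates are invented for illustration.
    callsets = {
        "caller_a": {("chr5", 1000, "C", "T"), ("chr9", 2500, "G", "A")},
        "caller_b": {("chr5", 1000, "C", "T"), ("chr12", 400, "A", "G")},
        "caller_c": {("chr5", 1000, "C", "T"), ("chr9", 2500, "G", "A")},
    }
    for variant in sorted(consensus_calls(callsets)):
        print(variant)
```

Because each caller contributes a set, a variant is counted at most once per caller, so the vote reflects independent support rather than duplicate records.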

For veterinary applications, the same ensemble calling strategy can be applied to non-human genomes, provided that a high-quality reference genome is available. The increasing availability of annotated genomes for dogs, cats, horses, and livestock species makes this approach feasible.

Copy Number Alterations and Structural Variants

Copy number alterations (CNAs) were identified using a combination of read-depth and allele-frequency based methods. The PCAWG pipeline integrated:

  • Read-depth segmentation: Using tools such as ABSOLUTE and Battenberg to estimate absolute copy numbers.
  • Allele-specific copy number: Incorporating B-allele frequency from heterozygous germline SNPs to distinguish loss of heterozygosity from copy-neutral events.
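
To see why read depth and B-allele frequency (BAF) are complementary, it helps to compute what each signal should look like for a given copy number state. The sketch below uses a commonly used mixture model of tumor and normal cells at a heterozygous germline SNP; the function names are illustrative, the normal genome is assumed diploid, and the model ignores noise and subclonality.

```python
from math import log2


def expected_baf(purity: float, n_a: int, n_b: int) -> float:
    """Expected B-allele frequency at a heterozygous germline SNP.

    n_a and n_b are the tumor copy numbers of the A and B alleles;
    normal cells contribute one copy of each. Undefined (division by zero)
    for a homozygous deletion at purity 1.0.
    """
    return (purity * n_b + (1 - purity)) / (purity * (n_a + n_b) + 2 * (1 - purity))


def expected_log_ratio(purity: float, n_a: int, n_b: int,
                       tumor_ploidy: float = 2.0) -> float:
    """Expected log2 read-depth ratio relative to the sample-wide average."""
    sample_ploidy = purity * tumor_ploidy + 2 * (1 - purity)
    return log2((purity * (n_a + n_b) + 2 * (1 - purity)) / sample_ploidy)


if __name__ == "__main__":
    purity = 0.7  # assumed tumor cell fraction
    states = {
        "normal (1+1)": (1, 1),
        "hemizygous loss (1+0)": (1, 0),
        "copy-neutral LOH (2+0)": (2, 0),
    }
    for name, (n_a, n_b) in states.items():
        baf = expected_baf(purity, n_a, n_b)
        logr = expected_log_ratio(purity, n_a, n_b)
        print(f"{name:24s} BAF={baf:.3f} logR={logr:+.3f}")
```

In this model, hemizygous loss and copy-neutral loss of heterozygosity both shift BAF away from 0.5, but only the former lowers the depth ratio, so neither signal alone separates the two states.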

Structural variants (SVs), including translocations, inversions, and large deletions, were detected using discordant read pair and split-read analysis. The integration of multiple SV callers reduced false positive rates.
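
The discordant read pair principle can be sketched as a rule-based classifier. This is a simplified illustration, assuming a standard paired-end library in which concordant mates face each other (forward mate on the left, reverse mate on the right); the insert-size parameters are hypothetical, and real SV callers additionally cluster supporting pairs and use split reads.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    chrom1: str
    pos1: int      # leftmost aligned position of mate 1
    strand1: str   # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair: ReadPair, mean_insert: float = 350.0,
                  sd_insert: float = 50.0, n_sd: float = 3.0) -> str:
    """Assign a read pair to a candidate SV class from its geometry."""
    if pair.chrom1 != pair.chrom2:
        return "translocation_candidate"
    # Order the mates by genomic position.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == right_strand:
        return "inversion_candidate"
    if left_strand == "-":
        # Mates point away from each other (everted orientation).
        return "tandem_duplication_candidate"
    # Distance between mate start positions approximates the fragment span.
    if right_pos - left_pos > mean_insert + n_sd * sd_insert:
        return "deletion_candidate"
    return "concordant"


if __name__ == "__main__":
    pairs = [
        ReadPair("chr1", 1000, "+", "chr1", 1300, "-"),
        ReadPair("chr1", 1000, "+", "chr1", 9000, "-"),
        ReadPair("chr1", 1000, "+", "chr5", 1300, "-"),
    ]
    for pair in pairs:
        print(classify_pair(pair))
```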

A summary of the variant types and their detection methods is provided in Table 1.

Table 1. Somatic Variant Types and Detection Methods in PCAWG

| Variant Type | Detection Principle | Example Tools | Veterinary Relevance |
| --- | --- | --- | --- |
| SNV | Base substitution relative to matched normal | MuTect, CaVEMan, MuSE | Driver mutations in canine lymphoma |
| Indel | Small insertions/deletions | Pindel, Platypus | Frameshift mutations in tumor suppressor genes |
| CNA | Read depth and B-allele frequency | ABSOLUTE, Battenberg | MYC amplification in feline mammary carcinoma |
| SV | Discordant read pairs, split reads | DELLY, BRASS | Gene fusions in canine osteosarcoma |

Mutational Signatures and Etiological Inference

Decomposition of Mutational Processes

One of the most impactful contributions of PCAWG has been the comprehensive cataloging of mutational signatures. These signatures represent the characteristic patterns of base substitutions, indels, and dinucleotide changes that result from specific endogenous or exogenous mutational processes.

The PCAWG consortium identified 49 single base substitution (SBS) signatures, 11 doublet base substitution (DBS) signatures, and 17 small insertion/deletion (ID) signatures. Each SBS signature is a probability distribution over 96 mutation classes (six pyrimidine-referenced substitution types in each of 16 flanking trinucleotide contexts); DBS and ID signatures are defined over their own classification schemes.
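
The 96-class SBS scheme can be made concrete with a short sketch. The code below enumerates the classes and maps a substitution observed on the reference strand to its pyrimidine-centered class; the function names and the "T[C>T]A" label format are conventions assumed here for illustration.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def sbs_channels() -> list:
    """Enumerate the 96 classes: 6 substitution types x 16 flanking contexts."""
    substitutions = [("C", alt) for alt in "AGT"] + [("T", alt) for alt in "ACG"]
    return [
        f"{five}[{ref}>{alt}]{three}"
        for ref, alt in substitutions
        for five, three in product("ACGT", repeat=2)
    ]


def sbs_channel(trinucleotide: str, alt: str) -> str:
    """Classify one substitution.

    `trinucleotide` is the reference context centered on the mutated base.
    Purine reference bases are reported on the opposite strand so that the
    mutated base is always C or T.
    """
    if trinucleotide[1] in "AG":
        trinucleotide = reverse_complement(trinucleotide)
        alt = reverse_complement(alt)
    five, ref, three = trinucleotide
    return f"{five}[{ref}>{alt}]{three}"


if __name__ == "__main__":
    print(len(sbs_channels()))
    # A G>A change in a TGA context is the same event as C>T in TCA.
    print(sbs_channel("TGA", "A"), sbs_channel("TCA", "T"))
```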

For veterinary oncology, mutational signature analysis can provide insights into the etiological agents driving carcinogenesis in different species. For example:

  • SBS4 (associated with tobacco smoke in humans) may have parallels in dogs exposed to environmental pollutants.
  • SBS7a-d (ultraviolet light exposure) are relevant to solar-induced squamous cell carcinoma in cats and cattle.
  • SBS18 (possibly damage by reactive oxygen species) may be implicated in chronic inflammation-associated cancers in livestock.

Application to Veterinary Species

The computational framework for mutational signature extraction is species-agnostic, provided that mutation counts are normalized to the trinucleotide composition of the target species' reference genome, because each genome offers a different number of mutable sites in each context. A further consideration is that CpG content and methylation patterns differ between species, which can alter the baseline mutation rate at CpG dinucleotides.
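
One way to carry out this normalization is to rescale each mutation class by the ratio of trinucleotide frequencies between two genomes. The sketch below is a minimal illustration, assuming mutation classes are labeled in the "T[C>T]A" style and that frequencies are computed over pyrimidine-centered trinucleotides; the function names are hypothetical, and the toy frequencies in the example are not real genome values.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def trinucleotide_frequencies(sequence: str) -> dict:
    """Relative frequency of each pyrimidine-centered trinucleotide."""
    counts = Counter()
    seq = sequence.upper()
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if set(tri) - set("ACGT"):
            continue  # skip windows containing N or other ambiguity codes
        if tri[1] in "AG":
            tri = tri.translate(COMPLEMENT)[::-1]  # collapse onto C/T center
        counts[tri] += 1
    total = sum(counts.values())
    return {tri: n / total for tri, n in counts.items()}


def rescale_catalog(catalog: dict, source_freq: dict, target_freq: dict) -> dict:
    """Re-express counts observed in a source genome on a target genome's scale.

    `catalog` maps labels such as 'T[C>T]A' to mutation counts.
    """
    rescaled = {}
    for channel, count in catalog.items():
        tri = channel[0] + channel[2] + channel[-1]  # e.g. 'TCA'
        rescaled[channel] = count * target_freq[tri] / source_freq[tri]
    return rescaled


if __name__ == "__main__":
    # Toy frequencies, invented for illustration.
    dog_freq = {"ACG": 0.004, "TCA": 0.020}
    human_freq = {"ACG": 0.002, "TCA": 0.020}
    dog_catalog = {"A[C>T]G": 120, "T[C>T]A": 45}
    print(rescale_catalog(dog_catalog, dog_freq, human_freq))
```

Rescaling a canine catalog to human trinucleotide frequencies in this way allows it to be compared against signatures that were defined on the human genome.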

The workflow for mutational signature analysis in veterinary samples is illustrated in Figure 1.

graph TD
    A[Tumor WGS Data] --> B[Somatic SNV Calling]
    B --> C[Trinucleotide Context Extraction]
    C --> D["Signature Decomposition (NMF)"]
    D --> E[Signature Assignment to Samples]
    E --> F[Etiological Inference]
    F --> G[Exposure Assessment]
    F --> H[Defective DNA Repair]
    F --> I[Endogenous Processes]
    G --> J[Comparative Epidemiology]
    H --> J
    I --> J

Figure 1. Workflow for mutational signature analysis in veterinary comparative oncology. Non-negative matrix factorization (NMF) is used to decompose the mutation catalog into signatures, which are then assigned to individual tumors for etiological inference.
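
The decomposition step in Figure 1 can be sketched with a small, dependency-free NMF. This is a teaching example using the classic multiplicative update rules for the squared-error objective; it is not the procedure used by PCAWG, and production signature-extraction tools add resampling, repeated restarts, and model selection for the number of signatures. The matrix layout (mutation classes in rows, samples in columns) and all names are assumptions made here.

```python
import random


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(row) for row in zip(*m)]


def frobenius_error(v, w, h):
    """Euclidean distance between V and its reconstruction W x H."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    ) ** 0.5


def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factor V (classes x samples) into W (classes x k) and H (k x samples)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # Update exposures H with signatures W held fixed.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # Update signatures W with exposures H held fixed.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return w, h


if __name__ == "__main__":
    # Toy catalog: 4 mutation classes x 3 tumors (invented counts).
    v = [[10.0, 2.0, 6.0], [1.0, 8.0, 4.5], [5.0, 5.0, 5.0], [0.0, 3.0, 1.5]]
    w, h = nmf(v, k=2)
    print(round(frobenius_error(v, w, h), 4))
```

In a real analysis the columns of W would be normalized to sum to one so that each can be read as a signature, with the scaling absorbed into the exposures in H.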

Tumor Evolution and Clonal Architecture

Subclonal Reconstruction

PCAWG employed multiple computational methods to reconstruct the evolutionary history of tumors from bulk WGS data. The core approach involved:

  1. Variant allele frequency (VAF) clustering: Grouping somatic mutations by VAF, after correcting for tumor purity and local copy number, to identify clonal and subclonal populations.
  2. Phylogenetic tree inference: Using methods such as PyClone and PhyloWGS to infer the order of mutation acquisition.
  3. Timing of genomic events: Estimating the time of occurrence of CNAs and SVs relative to clonal expansions.
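
Before mutations are clustered, VAF is usually converted to a cancer cell fraction (CCF), the proportion of tumor cells carrying the mutation. The sketch below shows the standard conversion under stated assumptions: a known purity, a known total tumor copy number at the locus, a diploid normal genome, and a known number of mutated copies per cell (multiplicity). The 0.9 clonality cut-off is hypothetical; published methods use statistical clustering rather than a fixed threshold.

```python
def cancer_cell_fraction(vaf: float, purity: float, tumor_cn: int,
                         multiplicity: int = 1, normal_cn: int = 2) -> float:
    """Convert a variant allele frequency into a cancer cell fraction."""
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    # Average number of allele copies per cell at this locus in the sample.
    copies_per_cell = purity * tumor_cn + normal_cn * (1 - purity)
    return vaf * copies_per_cell / (purity * multiplicity)


def label_clonality(ccf: float, clonal_cutoff: float = 0.9) -> str:
    """Hypothetical fixed cut-off, used here only to illustrate the idea."""
    return "clonal" if ccf >= clonal_cutoff else "subclonal"


if __name__ == "__main__":
    purity = 0.6  # assumed tumor cell fraction
    for vaf in (0.30, 0.15):
        ccf = cancer_cell_fraction(vaf, purity, tumor_cn=2)
        print(f"VAF={vaf:.2f} CCF={ccf:.2f} {label_clonality(ccf)}")
```

The example shows why raw VAF is misleading in impure samples: at 60% purity, a heterozygous mutation present in every tumor cell of a diploid region is expected at a VAF of about 0.30, not 0.50.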

The concept of clonal heterogeneity is directly transferable to veterinary oncology. For example, studies of canine multicentric lymphoma have demonstrated substantial intratumoral heterogeneity, which may contribute to treatment resistance and disease progression.

Driver Gene Identification

The PCAWG consortium systematically identified driver genes using a combination of:

  • Mutational recurrence: Genes mutated at a frequency significantly higher than the background mutation rate.
  • Functional impact: Prioritization of non-silent mutations predicted to alter protein function.
  • Pathway analysis: Enrichment of mutations in canonical signaling pathways.
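
The mutational recurrence criterion can be illustrated with a deliberately simple binomial test. The sketch assumes a single uniform background mutation rate per base pair and asks how surprising it is to see a gene mutated in a given number of tumors; the function names and example values are hypothetical. Established driver-discovery methods model gene-specific covariates and sequence context, which this example does not attempt.

```python
from math import exp, lgamma, log, log1p


def binomial_upper_tail(k_min: int, n: int, p: float) -> float:
    """P(X >= k_min) for X ~ Binomial(n, p), summed in log space."""
    if k_min <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    total = 0.0
    for k in range(k_min, n + 1):
        log_term = (
            lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log1p(-p)
        )
        total += exp(log_term)
    return min(total, 1.0)


def recurrence_p_value(mutated_samples: int, total_samples: int,
                       gene_length_bp: int, background_rate: float) -> float:
    """Probability of at least this many mutated tumors under background alone.

    `background_rate` is an assumed per-base, per-tumor mutation probability.
    """
    # Chance that one tumor carries at least one background mutation in the gene.
    p_hit = 1.0 - (1.0 - background_rate) ** gene_length_bp
    return binomial_upper_tail(mutated_samples, total_samples, p_hit)


if __name__ == "__main__":
    # Invented example: a 1.5 kb coding sequence mutated in 10 of 100 tumors.
    print(recurrence_p_value(10, 100, 1500, 1e-6))
```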

A list of frequently mutated driver genes in human cancers and their veterinary orthologs is presented in Table 2.

Table 2. Selected PCAWG Driver Genes and Veterinary Orthologs

| Human Gene | Veterinary Ortholog | Associated Tumor Types in Animals |
| --- | --- | --- |
| TP53 | TP53 | Canine osteosarcoma, feline mammary carcinoma |
| PIK3CA | PIK3CA | Canine mammary tumor, equine melanoma |
| KRAS | KRAS | Feline pulmonary adenocarcinoma |
| APC | APC | Canine colorectal polyp |
| BRAF | BRAF | Canine urothelial carcinoma |

Translational Applications in Veterinary Medicine

Diagnostic Biomarker Development

The PCAWG resource has identified numerous recurrent mutations that can serve as diagnostic biomarkers. In veterinary medicine, analogous biomarkers can be developed for:

  • Liquid biopsy assays: Detection of circulating tumor DNA (ctDNA) harboring species-specific hotspot mutations.
  • Tissue-based diagnostics: Immunohistochemistry and targeted sequencing panels for driver gene mutations.

For example, the BRAF V595E mutation in canine urothelial carcinoma is orthologous to the human BRAF V600E mutation found in melanoma and other cancers, and it can be detected using allele-specific PCR or targeted sequencing.

Therapeutic Target Identification

The pathway-level analysis from PCAWG has highlighted druggable targets that are conserved across species. These include:

  • Receptor tyrosine kinases: Mutations in KIT, MET, and EGFR are observed in both human and canine cancers.
  • Cell cycle regulators: CDK4/6 inhibitors, originally developed for human breast cancer, are being evaluated in canine osteosarcoma.
  • DNA repair pathways: Tumors with homologous recombination deficiency (e.g., BRCA1/2 mutations) may be sensitive to PARP inhibitors.

Comparative Epidemiology

Mutational signatures can link environmental exposures to cancer risk in animal populations. For instance, the detection of aflatoxin-associated mutational signatures in livestock with hepatocellular carcinoma could inform feed management practices. Similarly, UV-associated signatures in solar-induced squamous cell carcinoma of the bovine ocular region can guide preventive management strategies.

Computational Infrastructure and Data Sharing

Data Repositories

The PCAWG data are publicly available through the International Cancer Genome Consortium (ICGC) data portal and the European Genome-phenome Archive (EGA). For veterinary researchers, analogous repositories such as the Canine Cancer Genome Project and the Feline Genome Project provide species-specific genomic data.

Analysis Pipelines

The PCAWG consortium developed standardized analysis pipelines that are openly available. These pipelines can be adapted for veterinary use with the following modifications:

  • Reference genome: Replace the human reference genome (GRCh37, which PCAWG used in its hs37d5 build) with the appropriate species reference (e.g., CanFam3.1 for dog, Felis_catus_9.0 for cat).
  • Annotation databases: Use species-specific gene annotations and known variant databases.
  • Mutational signature reference: Apply the PCAWG signature catalog with caution, as some signatures may be species-specific.

Limitations and Considerations for Veterinary Application

Reference Genome Quality

The accuracy of variant detection is highly dependent on the quality of the reference genome. While the dog and cat genomes are well-annotated, genomes for many livestock and wildlife species are less complete. This can lead to higher false positive rates for SV detection and reduced sensitivity for indels.

Tumor Purity and Heterogeneity

Veterinary tumor samples often contain variable amounts of stromal and inflammatory cells, which can reduce the sensitivity of somatic variant detection. Computational methods for estimating tumor purity, such as those used in PCAWG, should be applied to veterinary samples to adjust VAF thresholds.

Ethical and Regulatory Considerations

The use of WGS in veterinary clinical practice raises ethical considerations regarding incidental findings, data ownership, and client consent. Veterinary molecular diagnosticians should establish clear protocols for the return of genomic results to referring veterinarians and pet owners.

Conclusion

The Pan-Cancer Analysis of Whole Genomes project has established a comprehensive framework for the genomic characterization of cancer. The methodologies developed by PCAWG, including ensemble variant calling, mutational signature decomposition, and clonal evolution analysis, are directly applicable to veterinary comparative oncology. By leveraging these tools, veterinary researchers can identify species-specific driver mutations, elucidate etiological factors, and develop diagnostic and therapeutic strategies for spontaneous animal cancers. The continued integration of PCAWG-derived knowledge with veterinary genomic resources will accelerate the translation of genomic medicine from human to veterinary oncology.
