Pan-Cancer Analysis of Whole Genomes (PCAWG): A Veterinary Comparative Oncology Framework
Introduction
The Pan-Cancer Analysis of Whole Genomes (PCAWG) project represents a landmark collaborative effort to characterize the genomic landscape of human cancers through the analysis of over 2,600 whole cancer genomes and their matched normal tissues. While the primary focus of PCAWG has been on human oncology, the conceptual framework, analytical pipelines, and biological insights derived from this project have profound implications for veterinary comparative oncology. This article provides a technical review of PCAWG methodologies, including whole genome sequencing (WGS) data processing, somatic variant calling, mutational signature analysis, and evolutionary inference, and discusses their direct applicability to the study of spontaneous neoplasms in domestic and wild animal species.
For veterinary molecular diagnosticians and comparative oncologists, PCAWG serves as a template for systematic genomic characterization of tumors. The integration of PCAWG-derived knowledge with species-specific genomic resources can accelerate the identification of driver mutations, prognostic biomarkers, and therapeutic targets in veterinary patients.
Project Design and Data Generation
Cohort Composition and Sequencing Strategy
The PCAWG consortium aggregated WGS data from 2,658 cancer genomes and their matched normal tissue samples across 38 distinct tumor types. Sequencing was performed on high-throughput short-read platforms. The consortium reported a mean read coverage of about 39x for normal samples and a bimodal coverage distribution for tumors, with modes near 38x and 60x. Greater tumor depth matters most for subclonal somatic mutations, which are carried by only a fraction of tumor cells and therefore appear at low variant allele frequencies.
Key technical parameters included:
- Library preparation: PCR-free protocols to minimize amplification bias and enable accurate copy number profiling.
- Read length: Paired-end reads of 100 to 150 base pairs.
- Alignment: Reads were aligned to a GRCh37-based human reference genome using the BWA-MEM algorithm of the Burrows-Wheeler Aligner (BWA), with subsequent duplicate marking and base quality score recalibration.
Quality Control and Data Harmonization
A centralized data coordination center performed uniform quality control across all contributing centers. Metrics included:
- Contamination assessment: Estimation of cross-sample contamination using allele frequency deviations.
- Coverage uniformity: Evaluation of GC bias and regional coverage depth.
- Library complexity: Assessment of PCR duplicate rates and insert size distributions.
Samples failing predefined quality thresholds were excluded from downstream analyses. This rigorous harmonization ensured that batch effects and technical artifacts were minimized, a principle that is equally critical for veterinary WGS studies where sample quality may vary due to pre-analytical variables such as tissue preservation methods.
Somatic Variant Detection and Annotation
Single Nucleotide Variants and Small Insertions/Deletions
The PCAWG project employed multiple independent variant callers to maximize sensitivity and specificity for somatic single nucleotide variants (SNVs) and small insertions/deletions (indels). The consensus approach involved:
- Primary calling: Running several independent somatic callers on each tumor-normal pair; the PCAWG SNV consensus drew on CaVEMan, MuTect, MuSE, and the DKFZ SNV pipeline.
- Ensemble filtering: Retaining variants called by at least two independent algorithms.
- Post-call filtering: Applying filters for strand bias, read position bias, and mapping quality.
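The ensemble filtering step above can be sketched in a few lines. This is a minimal illustration, assuming each caller's output has already been parsed into a set of `(chromosome, position, ref, alt)` tuples; the caller names and variants are hypothetical, and real pipelines also normalize variant representation before comparing calls.

```python
from collections import Counter

# A variant is keyed by (chromosome, 1-based position, reference allele, alternate allele).
Variant = tuple[str, int, str, str]


def consensus_calls(calls_by_caller: dict[str, set[Variant]], min_callers: int = 2) -> set[Variant]:
    """Return the variants reported by at least `min_callers` of the callers."""
    support: Counter = Counter()
    for variants in calls_by_caller.values():
        # Each caller holds a set, so it contributes at most one vote per variant.
        support.update(variants)
    return {variant for variant, votes in support.items() if votes >= min_callers}


# Hypothetical call sets from three unnamed callers.
calls = {
    "caller_a": {("chr1", 100, "C", "T"), ("chr2", 200, "G", "A")},
    "caller_b": {("chr1", 100, "C", "T"), ("chr3", 300, "T", "C")},
    "caller_c": {("chr1", 100, "C", "T"), ("chr2", 200, "G", "A")},
}
consensus = consensus_calls(calls)
print(sorted(consensus))
```

Raising `min_callers` trades sensitivity for specificity, which may be worth considering when the species reference genome or the callers' error models are less mature than their human counterparts.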
For veterinary applications, the same ensemble calling strategy can be applied to non-human genomes, provided that a high-quality reference genome is available. The increasing availability of annotated genomes for dogs, cats, horses, and livestock species makes this approach feasible.
Copy Number Alterations and Structural Variants
Copy number alterations (CNAs) were identified using a combination of read-depth and allele-frequency based methods. The PCAWG pipeline integrated:
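- Read-depth segmentation: Segmenting tumor-versus-normal read depth into regions of constant copy number; PCAWG built a consensus from several callers, including ABSOLUTE, ACEseq, and Battenberg, to estimate absolute copy numbers.
- Allele-specific copy number: Incorporating B-allele frequency at heterozygous germline SNPs to detect loss of heterozygosity, including copy-neutral events that read depth alone cannot reveal.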
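To see why read depth and B-allele frequency (BAF) are complementary, it can help to compute the values expected for a segment under a simple mixture model. The sketch below assumes a diploid normal genome, a single tumor clone, and the commonly used purity/ploidy formulation; the function names and example values are illustrative, not taken from any PCAWG tool.

```python
import math


def expected_baf(purity: float, major_cn: int, minor_cn: int) -> float:
    """Expected BAF at a germline heterozygous SNP, taking B as the minor tumor allele."""
    total = purity * (major_cn + minor_cn) + 2.0 * (1.0 - purity)
    if total == 0:
        raise ValueError("no DNA expected at this segment (pure tumor, homozygous deletion)")
    # Tumor cells contribute minor_cn copies of B; normal cells contribute one copy.
    return (purity * minor_cn + (1.0 - purity)) / total


def expected_log_r(purity: float, major_cn: int, minor_cn: int, tumor_ploidy: float = 2.0) -> float:
    """Expected log2 ratio of segment read depth to the genome-wide average."""
    segment = purity * (major_cn + minor_cn) + 2.0 * (1.0 - purity)
    genome = purity * tumor_ploidy + 2.0 * (1.0 - purity)
    if segment == 0:
        raise ValueError("no DNA expected at this segment (pure tumor, homozygous deletion)")
    return math.log2(segment / genome)


# Illustrative states at 60% purity: (major, minor) tumor copy numbers.
for label, major, minor in [("heterozygous 1+1", 1, 1), ("hemizygous loss 1+0", 1, 0), ("copy-neutral LOH 2+0", 2, 0)]:
    print(label, round(expected_baf(0.6, major, minor), 3), round(expected_log_r(0.6, major, minor), 3))
```

Under these assumptions, copy-neutral LOH leaves the log ratio at zero while shifting the BAF away from 0.5, so only the allele-frequency signal exposes it.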
Structural variants (SVs), including translocations, inversions, and large deletions, were detected using discordant read pair and split-read analysis. The integration of multiple SV callers reduced false positive rates.
A summary of the variant types and their detection methods is provided in Table 1.
Table 1. Somatic Variant Types and Detection Methods in PCAWG
| Variant Type | Detection Principle | Example Tools | Veterinary Relevance |
|---|---|---|---|
| SNV | Base substitution relative to normal | MuTect, CaVEMan, MuSE | Driver mutations in canine lymphoma |
| Indel | Small insertions/deletions | Pindel, Platypus | Frameshift in tumor suppressor genes |
| CNA | Read-depth and allele frequency | ABSOLUTE, Battenberg, ACEseq | MYC amplification in feline mammary carcinoma |
| SV | Discordant read pairs, split reads | DELLY, BRASS, SvABA | Gene fusions in canine osteosarcoma |
Mutational Signatures and Etiological Inference
Decomposition of Mutational Processes
One of the most impactful contributions of PCAWG has been the comprehensive cataloging of mutational signatures. These signatures represent the characteristic patterns of base substitutions, indels, and dinucleotide changes that result from specific endogenous or exogenous mutational processes.
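The PCAWG consortium identified 49 single base substitution (SBS) signatures, 11 doublet base substitution (DBS) signatures, and 17 small insertion/deletion (ID) signatures. Each SBS signature is a probability distribution over 96 mutation classes: six substitution types, referred to the pyrimidine of the mutated base pair, each in 16 possible trinucleotide contexts. DBS and ID signatures are defined over their own classification schemes.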
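The conventional 96-class scheme for single base substitutions can be made concrete with a short sketch. It assumes the reference trinucleotide (mutated base in the middle) has already been extracted from the species reference genome; the function names and the `T[C>T]A` label format are illustrative choices, not a required standard.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sbs96_channel(context: str, alt: str) -> str:
    """Map a reference trinucleotide and mutant base to a pyrimidine-referenced class label."""
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or any(b not in COMPLEMENT for b in context) or alt not in COMPLEMENT:
        raise ValueError("expected a trinucleotide and a single base over A, C, G, T")
    if alt == context[1]:
        raise ValueError("mutant base equals the reference base")
    if context[1] in "AG":
        # Purine reference: describe the same event on the opposite strand.
        context = reverse_complement(context)
        alt = COMPLEMENT[alt]
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


print(sbs96_channel("TCA", "T"))  # C>T at a TCA site
print(sbs96_channel("TGA", "A"))  # G>A at TGA is the same class seen from the other strand
```

Because the classification depends only on the local sequence, the same function applies unchanged to any species with a reference assembly.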
For veterinary oncology, mutational signature analysis can provide insights into the etiological agents driving carcinogenesis in different species. For example:
- SBS4 (associated with tobacco smoke in humans) may have parallels in dogs exposed to environmental pollutants.
- SBS7 (ultraviolet light exposure, reported as subtypes SBS7a to SBS7d) is relevant to solar-induced squamous cell carcinoma in cats and cattle.
- SBS18 (attributed to damage by reactive oxygen species) may be implicated in chronic inflammation-associated cancers in livestock. SBS17, by contrast, has no established etiology.
Application to Veterinary Species
The computational framework for mutational signature extraction is species-agnostic, provided that the trinucleotide context is normalized to the reference genome of the target species. A key consideration is the difference in CpG methylation patterns between species, which can alter the baseline mutation rate at CpG dinucleotides.
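One way to carry out this normalization is to count how often each pyrimidine-centered trinucleotide occurs in each genome and rescale mutation counts by the ratio of frequencies. The sketch below is an illustration under stated assumptions: class labels follow a `T[C>T]A` style, the opportunity tables are tiny hypothetical examples rather than real genome counts, and whole-genome counting would in practice stream over a FASTA file.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def trinucleotide_opportunities(sequence: str) -> Counter:
    """Count trinucleotides, collapsing each onto its pyrimidine-centered strand."""
    counts: Counter = Counter()
    seq = sequence.upper()
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if any(base not in COMPLEMENT for base in tri):
            continue  # skip windows containing N or other ambiguity codes
        if tri[1] in "AG":
            tri = "".join(COMPLEMENT[base] for base in reversed(tri))
        counts[tri] += 1
    return counts


def rescale_counts(mutation_counts: dict, source_opps: dict, target_opps: dict) -> dict:
    """Rescale per-class counts from the source genome's composition to the target's."""
    source_total = sum(source_opps.values())
    target_total = sum(target_opps.values())
    rescaled = {}
    for channel, count in mutation_counts.items():
        tri = channel[0] + channel[2] + channel[6]  # "T[C>T]A" -> "TCA"
        source_freq = source_opps[tri] / source_total
        target_freq = target_opps[tri] / target_total
        rescaled[channel] = count * target_freq / source_freq
    return rescaled


print(trinucleotide_opportunities("ACGT"))
# Hypothetical opportunity tables for a source (e.g. canine) and target (e.g. human) genome.
print(rescale_counts({"T[C>T]A": 10}, {"TCA": 1, "ACG": 1}, {"TCA": 3, "ACG": 1}))
```

Rescaling corrects for sequence composition only; species differences in CpG methylation change the per-site mutation rate itself and are not removed by this step.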
The workflow for mutational signature analysis in veterinary samples is illustrated in Figure 1.
graph TD
A[Tumor WGS Data] --> B[Somatic SNV Calling]
B --> C[Trinucleotide Context Extraction]
C --> D["Signature Decomposition (NMF)"]
D --> E[Signature Assignment to Samples]
E --> F[Etiological Inference]
F --> G[Exposure Assessment]
F --> H[Defective DNA Repair]
F --> I[Endogenous Processes]
G --> J[Comparative Epidemiology]
H --> J
I --> J
Figure 1. Workflow for mutational signature analysis in veterinary comparative oncology. Non-negative matrix factorization (NMF) is used to decompose the mutation catalog into signatures, which are then assigned to individual tumors for etiological inference.
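The decomposition step in Figure 1 can be illustrated with a bare-bones NMF using the classic multiplicative update rules for squared error. This is a teaching sketch in pure Python, not the implementation used by any PCAWG signature tool: the toy catalog has 4 mutation classes instead of 96, and the rank, iteration count, and seed are arbitrary assumptions. Production tools add bootstrapping, model selection for the number of signatures, and stability checks.

```python
import random


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(row) for row in zip(*m)]


def frobenius_error(v, w, h):
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factor V (classes x samples) into W (classes x k) and H (k x samples), all non-negative."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n)]
    return w, h


# Toy mutation catalog: rows are mutation classes, columns are tumors.
catalog = [[20, 2, 11], [10, 1, 6], [1, 15, 8], [2, 30, 16]]
signatures, exposures = nmf(catalog, k=2)
print(round(frobenius_error(catalog, signatures, exposures), 4))
```

Columns of `signatures` are usually rescaled to sum to one so that they read as probability distributions, with the scale absorbed into `exposures`.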
Tumor Evolution and Clonal Architecture
Subclonal Reconstruction
PCAWG employed multiple computational methods to reconstruct the evolutionary history of tumors from bulk WGS data. The core approach involved:
- Variant allele frequency (VAF) clustering: Grouping somatic mutations by their VAF to identify clonal and subclonal populations.
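- Phylogenetic tree inference: Using methods such as PyClone to cluster mutations by cellular prevalence and PhyloWGS to arrange the resulting subclones into a tree, from which the order of mutation acquisition is inferred.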
- Timing of genomic events: Estimating the time of occurrence of CNAs and SVs relative to clonal expansions.
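VAF clustering is usually done on cancer cell fraction (CCF) rather than raw VAF, because purity and local copy number both distort the observed allele frequency. The following sketch applies the standard point-estimate conversion, assuming a known purity, a known total tumor copy number at the locus, and a chosen mutation multiplicity; the function and parameter names are illustrative.

```python
def cancer_cell_fraction(vaf: float, purity: float, tumor_cn: int,
                         multiplicity: int = 1, normal_cn: int = 2) -> float:
    """Point estimate of the fraction of tumor cells carrying a mutation."""
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    # Average number of locus copies per cell in the tumor-normal mixture.
    copies_per_cell = purity * tumor_cn + (1.0 - purity) * normal_cn
    return vaf * copies_per_cell / (purity * multiplicity)


# At 80% purity in a diploid region, VAF 0.40 is consistent with a clonal mutation
# and VAF 0.20 with a subclone in about half of the tumor cells.
print(round(cancer_cell_fraction(0.40, 0.8, 2), 3))
print(round(cancer_cell_fraction(0.20, 0.8, 2), 3))
```

Sampling noise can push the estimate above 1, which is one reason dedicated tools model read counts probabilistically instead of using this point estimate directly.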
The concept of clonal heterogeneity is directly transferable to veterinary oncology. For example, studies of canine multicentric lymphoma have demonstrated substantial intratumoral heterogeneity, which may contribute to treatment resistance and disease progression.
Driver Gene Identification
The PCAWG consortium systematically identified driver genes using a combination of:
- Mutational recurrence: Genes mutated at a frequency significantly higher than the background mutation rate.
- Functional impact: Prioritization of non-silent mutations predicted to alter protein function.
- Pathway analysis: Enrichment of mutations in canonical signaling pathways.
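The recurrence criterion can be illustrated with a simple binomial test. The sketch assumes a single uniform background mutation rate per base and independent samples, which is a deliberate oversimplification; real driver-discovery methods model gene-specific and sample-specific background rates. All numbers below are hypothetical.

```python
import math


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def recurrence_p_value(mutated_samples: int, total_samples: int,
                       background_rate: float, gene_length: int) -> float:
    """Chance of seeing at least this many mutated samples under the background model."""
    # Probability that one sample carries at least one background mutation in the gene.
    p_gene = 1.0 - (1.0 - background_rate) ** gene_length
    return binomial_tail(mutated_samples, total_samples, p_gene)


# Hypothetical gene: 1,000 coding bases, 1e-6 mutations per base, mutated in 5 of 100 tumors.
print(recurrence_p_value(5, 100, 1e-6, 1000))
```

In a genome-wide scan the resulting p-values would still need multiple-testing correction across all genes tested.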
A list of frequently mutated driver genes in human cancers and their veterinary orthologs is presented in Table 2.
Table 2. Selected PCAWG Driver Genes and Veterinary Orthologs
| Human Gene | Veterinary Ortholog | Associated Tumor Types in Animals |
|---|---|---|
| TP53 | TP53 | Canine osteosarcoma, feline mammary carcinoma |
| PIK3CA | PIK3CA | Canine mammary tumor, canine hemangiosarcoma |
| KRAS | KRAS | Feline pulmonary adenocarcinoma |
| APC | APC | Canine colorectal polyp |
| BRAF | BRAF | Canine urothelial carcinoma |
Translational Applications in Veterinary Medicine
Diagnostic Biomarker Development
The PCAWG resource has identified numerous recurrent mutations that can serve as diagnostic biomarkers. In veterinary medicine, analogous biomarkers can be developed for:
- Liquid biopsy assays: Detection of circulating tumor DNA (ctDNA) harboring species-specific hotspot mutations.
- Tissue-based diagnostics: Immunohistochemistry and targeted sequencing panels for driver gene mutations.
For example, the BRAF V595E mutation in canine urothelial carcinoma is directly analogous to the BRAF V600E mutation in human melanoma and can be detected using allele-specific PCR or targeted sequencing.
Therapeutic Target Identification
The pathway-level analysis from PCAWG has highlighted druggable targets that are conserved across species. These include:
- Receptor tyrosine kinases: Mutations in KIT, MET, and EGFR are observed in both human and canine cancers.
- Cell cycle regulators: CDK4/6 inhibitors, originally developed for human breast cancer, are being evaluated in canine osteosarcoma.
- DNA repair pathways: Tumors with homologous recombination deficiency (e.g., BRCA1/2 mutations) may be sensitive to PARP inhibitors.
Comparative Epidemiology
Mutational signatures can link environmental exposures to cancer risk in animal populations. For instance, the detection of aflatoxin-associated mutational signatures in livestock with hepatocellular carcinoma could inform feed management practices. Similarly, UV-associated signatures in solar-induced squamous cell carcinoma of the bovine ocular region can guide preventive management strategies.
Computational Infrastructure and Data Sharing
Data Repositories
PCAWG data are distributed through the International Cancer Genome Consortium (ICGC) data portal and the European Genome-phenome Archive (EGA). Summary-level results are open access, whereas raw reads and germline variants require controlled-access approval. For veterinary researchers, analogous repositories such as the Canine Cancer Genome Project and the Feline Genome Project provide species-specific genomic data.
Analysis Pipelines
The PCAWG consortium developed standardized analysis pipelines that are openly available. These pipelines can be adapted for veterinary use with the following modifications:
- Reference genome: Replace the human reference genome used by PCAWG (a GRCh37-based build) with the appropriate species reference (e.g., CanFam3.1 for dog, Felis_catus_9.0 for cat).
- Annotation databases: Use species-specific gene annotations and known variant databases.
- Mutational signature reference: Apply the PCAWG signature catalog with caution, as some signatures may be species-specific.
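A common way to judge whether a signature extracted from veterinary tumors matches a catalog entry is cosine similarity between the two probability vectors. The sketch below uses three-class toy vectors and made-up catalog names purely for illustration; real comparisons would use all 96 classes and a threshold chosen by the analyst.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length non-negative vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def best_match(query, catalog: dict):
    """Return the (name, similarity) of the catalog signature closest to the query."""
    return max(((name, cosine_similarity(query, sig)) for name, sig in catalog.items()),
               key=lambda item: item[1])


# Hypothetical three-class catalog and one extracted signature.
toy_catalog = {"sig_x": [0.7, 0.2, 0.1], "sig_y": [0.1, 0.2, 0.7]}
name, similarity = best_match([0.6, 0.3, 0.1], toy_catalog)
print(name, round(similarity, 3))
```

A low best-match similarity is one hint that a signature may be species-specific or an artifact, and that forcing it onto the human catalog would mislead etiological inference.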
Limitations and Considerations for Veterinary Application
Reference Genome Quality
The accuracy of variant detection is highly dependent on the quality of the reference genome. While the dog and cat genomes are well-annotated, genomes for many livestock and wildlife species are less complete. This can lead to higher false positive rates for SV detection and reduced sensitivity for indels.
Tumor Purity and Heterogeneity
Veterinary tumor samples often contain variable amounts of stromal and inflammatory cells, which can reduce the sensitivity of somatic variant detection. Computational methods for estimating tumor purity, such as those used in PCAWG, should be applied to veterinary samples to adjust VAF thresholds.
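The effect of purity on detection sensitivity can be estimated with a simple binomial model of read sampling. This sketch assumes error-free reads, a fixed minimum number of supporting reads, and a diploid normal genome; the threshold of three reads and the example depths are arbitrary illustrative choices.

```python
import math


def expected_vaf(purity: float, ccf: float = 1.0, multiplicity: int = 1,
                 tumor_cn: int = 2, normal_cn: int = 2) -> float:
    """Expected variant allele frequency of a somatic mutation in a bulk sample."""
    return purity * ccf * multiplicity / (purity * tumor_cn + (1.0 - purity) * normal_cn)


def detection_probability(depth: int, vaf: float, min_alt_reads: int = 3) -> float:
    """P(at least `min_alt_reads` variant reads) when reads are sampled binomially."""
    below = sum(math.comb(depth, i) * vaf ** i * (1.0 - vaf) ** (depth - i)
                for i in range(min(min_alt_reads, depth + 1)))
    return 1.0 - below


# A clonal heterozygous mutation at 50% purity and 60x versus 10% purity and 30x.
print(round(detection_probability(60, expected_vaf(0.5)), 4))
print(round(detection_probability(30, expected_vaf(0.1)), 4))
```

Under this model a low-purity sample sequenced at modest depth misses most clonal mutations, which is why purity estimates should inform both VAF thresholds and target coverage.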
Ethical and Regulatory Considerations
The use of WGS in veterinary clinical practice raises ethical considerations regarding incidental findings, data ownership, and client consent. Veterinary molecular diagnosticians should establish clear protocols for the return of genomic results to referring veterinarians and pet owners.
Conclusion
The Pan-Cancer Analysis of Whole Genomes project has established a comprehensive framework for the genomic characterization of cancer. The methodologies developed by PCAWG, including ensemble variant calling, mutational signature decomposition, and clonal evolution analysis, are directly applicable to veterinary comparative oncology. By leveraging these tools, veterinary researchers can identify species-specific driver mutations, elucidate etiological factors, and develop diagnostic and therapeutic strategies for spontaneous animal cancers. The continued integration of PCAWG-derived knowledge with veterinary genomic resources will accelerate the translation of genomic medicine from human to veterinary oncology.
References
- Campbell PJ, Getz G, Korbel JO, et al. Pan-cancer analysis of whole genomes. Nature. 2020;578(7793):82-93.
- Alexandrov LB, Kim J, Haradhvala NJ, et al. The repertoire of mutational signatures in human cancer. Nature. 2020;578(7793):94-101.
- Gerstung M, Jolly C, Leshchiner I, et al. The evolutionary history of 2,658 cancers. Nature. 2020;578(7793):122-128.
- Rheinbay E, Nielsen MM, Abascal F, et al. Analyses of non-coding somatic drivers in 2,658 cancer whole genomes. Nature. 2020;578(7793):102-111.
- Li Y, Roberts ND, Wala JA, et al. Patterns of somatic structural variation in human cancer genomes. Nature. 2020;578(7793):112-121.