Tumor mutation burden estimated by a 69-gene-panel is associated with overall survival in patients with diffuse large B-cell lymphoma

Background Tumor mutation burden (TMB) as estimated by cancer gene panels (CGPs) has been confirmed to be associated with prognosis and is effective in predicting clinical benefit from immune checkpoint blockade (ICB) in solid tumors. However, whether the TMB calculated by CGPs is associated with overall survival (OS) for patients with diffuse large B-cell lymphoma (DLBCL) is worth exploring. Methods The prognostic value of panel-TMB, calculated by a panel of 69 genes (GP69), for 87 DLBCL patients in our clinical center (GDPH dataset) was explored. The results were further validated using 37 DLBCL patients from the Cancer Genome Atlas (TCGA) database (TCGA dataset). Results Spearman correlation analysis suggested that panel-TMB is positively correlated with the TMB calculated by whole-exome sequencing (wTMB) in the TCGA dataset (R = 0.76, P < 0.0001). Both GDPH and TCGA results demonstrated that higher panel-TMB is significantly associated with a poor OS for DLBCL patients (P < 0.05) where a panel of 13 genes was associated with poor OS, and another panel of 26 genes was correlated with a favorable OS for DLBCL patients. Further subgroup analysis indicated that higher panel-TMB had shorter OS in DLBCL patients with younger than 60 years, elevated LDH, greater than one extranodal involvement, stage III/IV, an IPI score of 3–5, or HBsAg, anti-HBc, or HBV-DNA negativity (P < 0.05). Interestingly, the nomogram model constructed by panel-TMB, stage, and IPI could individually and visually predict the 1-, 2- and 3-year OS rates of DLBCL patients. Conclusions We established GP69 for the evaluation of OS for Chinese DLBCL patients. panel-TMB might be a potential predictor for prognostic stratification of DLBCL patients. Supplementary Information The online version contains supplementary material available at 10.1186/s40164-021-00215-4.


Introduction
Diffuse large B-cell lymphoma (DLBCL) is the most common type of aggressive non-Hodgkin lymphoma, which can occur de novo or is caused by the transformation of indolent lymphoma [1][2][3]. A large number of patients can be clinically relieved or even cured using standard chemo-immunotherapy; however, approximately one-third of patients still have a poor prognosis due to drug resistance or relapse, which is partially related to the heterogeneity of DLBCL [4][5][6][7]. This heterogeneity is manifested at the clinical level and in the morphology, genetics, and immunophenotype. However, the current prognostic scoring system stratifies DLBCL patients based on an international prognostic index (IPI) of clinical level, including age, stage, performance status (PS), serum lactate dehydrogenase (LDH) level, and the amount of extranodal involvement [8][9][10]. In fact, through next-generation sequencing (NGS) analysis of DLBCL, mutations in a number of genes that play a crucial role in tumor progression, maintenance, and response to treatment have been discovered [1]. Therefore, there is an urgent need for prognostic stratification of DLBCL patients based on mutations to guide clinical treatment.
Tumor mutation burden (TMB) is the number of somatic mutations per megabase (Mb) of the genome in a tumor, representing the instability in its genome. Tumors with high TMB are more likely to induce neoantigens' production, making them a target of activated immune cells [11,12]. Recent studies have shown that high TMB measured by whole-exome sequencing (WES) is closely related to higher response rates to ICB in cancers, thereby predicting favorable clinical outcomes [13,14]. However, because of the cost of whole-genome sequencing (WES), the timeliness and informatics challenges of WES in the clinical setting, it is difficult to popularize in clinical applications [15,16]. Instead, it is now clinically more common to use a smaller cancer gene panel (CGP) for precise therapy, immunotherapy, and patients' prognostic stratification [15,17,18].
Hence, in this study, the previously reported lymphoma-related genes [19] overlapped with the WES data in the Cancer Genome Atlas (TCGA) database, plus hot spot mutation genes, 69 genes were obtained for developing a panel for TMB estimation (panel-TMB), which could be used for overall survival (OS) analysis. We further validated our findings using data from the TCGA database.

Patient samples
A total of 87 whole blood and tumor biopsies were collected from patients newly diagnosed with DLBCL at Guangdong Provincial People's Hospital (GDPH) between January 21, 2014, and July 15, 2019, for use with targeted sequencing using the GP69 panel [20]. Clinical characteristics including gender, age, immunophenotype, LDH level, extranodal involvement, eastern cooperative oncology group performance status (ECOG PS), Ann Arbor stage, IPI score, double-hit and double-express or lymphomas (DHL/DEL), hepatitis B surface antigen (HBsAg), antibody to hepatitis B core antigen (anti-HBc), and hepatitis B virus DNA (HBV-DNA) status, and treatment options data ( Table 1). The last follow-up was completed on May 20, 2020, and the median followup time for the DLBCL patients was 435 days (range 5-1722 days). OS was defined as the time from diagnosis to death of any cause or last follow-up. The workflow of data mining in the GDPH cohort was shown in Fig. 1. This study was performed according to the Declaration of Helsinki principles and approved by the Ethics Committee of Guangdong Provincial People's Hospital. All participants provided written informed consent.

Library construction
Tumor genomic DNA was extracted from whole blood and tumor biopsies, and fragmented DNA was generated with a Bioruptor (Diagenode, Bioruptor UCD-200) following the manufacturer's protocol. Libraries were constructed using the KAPA Hyper DNA Library Prep Kit (KAPA Biosystem, KK8504). Finally, dual-indexed sequencing libraries were PCR amplified with KAPA HiFi Hot start-ready Mix (KAPA, KK2602) for 4-6 cycles, and they were then cleaned up with purification Beads (Corning, AxyPrep Fragment Select-I kit, 14223162). The library concentration and quality were determined using the Qubit 3.0 system (Invitrogen) and Bioanalyzer 2100 (Agilent, Agilent HS DNA Reagent, 5067-4627).

Hybrid selection and ultra-deep next-generation sequencing
A 5′-biotinylated probe solution was used as capture probes. The probes for targeted sequencing covered exons and selected introns in 69 DLBCL-related genes (Additional file 1: Table S1) in a cohort of 87 patients. A total of 1 μg of each fragmented sequencing library was mixed with 5 μg salmon sperm DNA, 5 μg human Cot-1 DNA, and 1unit adaptor-specific blocker DNA in hybridization buffer and then heated for 10 min at 95 °C and held for 5 min at 65 °C in a thermocycler. The capture probes were added to the mixture in 5 min, and solution hybridization was performed for 16-18 h at 65 °C. After hybridization was complete, the captured targets were selected by pulling down the biotinylated probe/target hybrids using streptavidin-coated magnetic beads, and the off-target library was removed using wash buffer. PCR master mix was directly added to amplify (6-8 cycles) the captured library from the washed beads. Afterward, the samples were purified by AMPure XP beads, quantified by qPCR (Kapa), and sized with a bioanalyzer 2100 (Agilent, Agilent HS DNA Reagent, 5067-4627). Libraries were normalized to 2.5 nM and then pooled. Finally, the library was sequenced as paired 150 bp reads with an Illumina HiSeq 4000 according to the manufacturer's instructions.

Sequence alignment and processing
Base-calling was performed using bcl2fastq v2.16.0.10 (Illumina, Inc.) to generate sequence reads in FASTQ format (Illumina 1.8 + encoding). Quality control (QC) was performed with Trimmomatic. High-quality reads were mapped to the human genome (hg19, GRCh37 Genome Reference Consortium Human Reference 37) using the BWA aligner 0.7.12 with the BWA-MEM algorithm using default parameters to create SAM files. Picard 1.119 was used to convert SAM files to compressed BAM files, which were then sorted according to chromosome coordinates. The genome analysis tool kit (GATK, version 3.4-0) was used to locally realign the BAM files at loci with indel mismatches and recalibrate the base quality scores of the reads in the BAM files.

SNV/indel/CNV detection
SNVs and short Indels were identified by VarScan2 2.3.9 with the minimum variant allele frequency threshold set at 0.01 and the p-value threshold for calling variants set at 0.05 to generate variant call format (VCF) files. All SNVs/ indels were annotated with ANNOVAR, and each SNV/ indel was manually checked in the Integrative Genomics Viewer (IGV). Copy number variations (CNVs) were detected using in-house-developed software.

TCGA dataset
The non-synonymous mutation data of 37  clinical information of the 37 DLBCL patients from the TCGA database was listed in Table 1. The TCGA database is publicly available; thus, approval from the local ethics committee was not required.

Statistical analysis
All statistical analyses were conducted with SPSS (version 22.0, IBM, Armonk, NY, USA) and R (version 3.6.1, https ://www.r-proje ct.org/) as appropriate. The optimal cut-off value for panel-TMB was determined using maximally selected rank statistics in the "maxstat" R package [24,25], which was reflected in the "survminer" package. The log-rank test was used to compare differences between Kaplan-Meier curves. The coefficients of the univariate COX regression model were acquired by SPSS 22.0 software. The Spearman method was applied to obtain correlation coefficients between two groups of quantitative variables. Differences in qualitative variables were compared by the chi-square test. A nomogram model was constructed based on the previous study [26]. The "survRM2" package was used to determine restricted mean survival time (RMST). A two-tailed P value < 0.05 was considered statistically significant.

The relationship between panel-TMB and tumor mutation burden estimated by whole-exome sequencing (wTMB)
To evaluate whether panel-TMB can reflect wTMB, 37 DLBCL patients in the TCGA dataset were used to analyze the correlation between the two. As shown in Fig. 2a, non-synonymous mutations (NsMs) derived from wholeexome sequencing and GP69 are relatively consistent in DLBCL patients. Further, Spearman correlation analysis found that panel-TMB and wTMB have a significant positive correlation (R = 0.76, P < 0.0001, Fig. 2b). These results indicated that the GP69 developed by us could well represent the wTMB in DLBCL patients.

The landscape of DLBCL-GP69
To visualize the distribution of mutations in the 69 genes for DLBCL patients, we depicted the landscape of DLBCL-GP69 in waterfall plots. Shown in Fig. 3a, b are the variant classifications of the 69 genes in the GDPH and TCGA datasets. We further analyzed genes with mutation frequencies greater than 10% and found the following ten genes in common in both the GDPH and TCGA datasets: KMT2D, PIM1, MYD88, B2M, FAT1, CD79B, TP53, CREBBP, MYC, and STAT3 (Fig. 3c). Besides, the GP69 genes were distributed on 21 chromosomes (Fig. 3d). Next, the biological processes (BPs) of the GP69 genes were investigated. As shown in Additional file 2: Fig. S1, GP69 was involved in eight BPs, including apoptosis/ cell proliferation, transcriptional regulation, cell cycle, chromatin modification, immune response, B cell receptor signaling pathway, JAK-STAT signaling pathway, and cell migration regulation, and the number of genes in each category was 26, 11, 7, 7, 6, 6, 4, and 2, respectively. These results suggested that the GP69 genes are primarily involved in important processes in tumor progression.

Higher TMB estimated by a 69-gene panel (panel-TMB) is associated with poor OS
To better assess the impact of panel-TMB on the OS of patients with DLBCL, the optimal cut-off value for panel-TMB was determined. The cut-off values for panel-TMB in the GDPH and TCGA datasets were 4 and 9, respectively (Additional file 3: Fig. S2).
We further explored which genes had a greater contribution to OS for panel-TMB using a univariate COX regression model.  Fig. S3A, B). Notably, we tried to streamline the gene number in the panel by using these 39 prognosis-related genes in both the GDPH and TCGA datasets. However, after reducing the number of genes in the panel, the calculated panel-TMB had no significant correlation with OS in both the GDPH and TCGA datasets (P > 0.05, Additional file 5: Fig. S4A, B). Therefore, panel-TMB estimated from 69 genes might be the minimal panel for OS analysis in this study.

Panel-TMB subgroup analysis
To investigate the correlation between panel-TMB and clinical characteristics, we conducted a subgroup analysis of the GDPH dataset. As shown in Fig. 5, among DLBCL patients younger than 60 years, higher panel-TMB was correlated with poor OS (HR > 100, 95% CI, 0 to > 100, P = 0.035). When LDH levels were elevated, DLBCL patients with higher panel-TMB had shorter OS (HR = 7.80, 95% CI, 1.01 to 60.12, P = 0.020). In patients with greater than one extranodal involvement, higher panel-TMB might predict poor OS for DLBCL patients (HR = 7.18, 95% CI, 0.92 to 56.25, P = 0.028). When the patients were at stage III/IV, there was a difference in survival according to the level of panel-TMB (HR = 4.25, 95% CI, 0.96 to 18.74, P = 0.037). In patients with an IPI of 3-5, higher panel-TMB was significantly associated with

Construction of nomogram model
To visually and personally predict the OS rate of DLBCL patients, clinical information was used to construct the nomogram model. Kaplan-Meier curves showed that compared with stage I/II, stage III/IV was associated with poor OS of patients (P = 0.025, Additional file 7: Fig. S6). Similarly, a high IPI score predicted poor OS of DLBCL (P = 0.021, Additional file 7: Fig. S6). However, gender, age, LDH, extranodal involvement, ECOG PS, subtype, DHL/DEL, HBsAg, anti-HBc, or HBV-DNA was not significantly correlated with OS (P > 0.05, Additional file 7: Fig. S6). Thus, panel-TMB, stage, and IPI were used to construct a nomogram model for predicting 1-, 2-and 3-year OS rates of DLBCL patients (Fig. 6a), and the detailed points and OS rates were shown in Additional file 8: Table S2. Then, the time-dependent receiver operating characteristic (ROC) curve suggested that the nomogram model constructed by panel-TMB, stage, and IPI had a good performance (area under the curve (AUC) = 0.723) (Fig. 6b upper panel). Moreover, the calibration curve indicated that the OS rate predicted by the nomogram was in line with the actual OS rate (Fig. 6b  bottom panel).

Discussion
To develop an OS prediction system for Chinese DLBCL patients, in this study, we developed the GP69 panel, which includes genes distributed on 21 chromosomes that are mainly involved in tumor progression. Excitingly, we found a significant positive correlation between the panel-TMB measured by GP69 and the wTMB assessed by WES. These results suggest that the panel-TMB estimated by 69 genes involved in important BPs could replace wTMB in evaluating OS and represent the genomic instability in DLBCL patients. These findings are in accordance with studies in which CGPs were developed to replace WES to estimate TMB in cancer patients [27,28].
To investigate the clinical significance of the panel-TMB estimated by GP69, we performed survival analysis of DLBCL patients. Firstly, we found that higher panel-TMB was significantly associated with a poor OS for DLBCL patients. Secondary, the nomogram model constructed by panel-TMB, stage, and IPI could individually and visually predict the 1-, 2-and 3-year OS rates of DLBCL. Interestingly, a previous study confirmed that higher TMB as estimated by a CGP is associated with a favorable prognosis and predicts the clinical benefits of ICB therapy [15,29]. Moreover, in the absence of checkpoint inhibitor treatment, cancer patients with higher TMB tend to have adverse outcomes [30], which is consistent with our findings, thus, it may also indicate that higher panel-TMB might be an adverse prognostic factor for DLBCL. To elucidate the factors that interact with TMB, we further stratified patients based on tumor burden-related clinical parameters, including extranodal involvement, LDH, IPI, and stage, and the results demonstrate that in cases with higher tumor burden, more extranodal involvement sites, elevated LDH, advanced stage, higher IPI score, and higher mutation burden might be worse for the prognosis of these subsets of cases, indicating that mutation burden and tumor burden act as doubly impaired factors for survival. HBV-infection contributes to mutagenesis and is associated with poor prognosis for DLBCL patients [31,32]; thus, we stratified cases based on HBV-infection. In HBV-infected patients, higher panel-TMB had no impact on survival, suggesting that HBV might act as an adverse factor but had no effect on the mutation burden induced by the virus. Studies have shown that TMB is significantly positively correlated with age in solid tumors, especially in patients with over 60 years of age [33]. However, little is known the prognostic importance between TMB in patients with age in DLBCL patients, in this study, we showed that higher panel-TMB was associated with a poor OS for younger than 60, while the level of panel-TMB was not significantly correlated with OS for greater than 60 years of age. The difference may be related to the characteristic of the malignancies, particularly the disorder in immune system. To further shorten the gene panel, we used univariate COX regression to analyze the contribution of the NsMs of the 69 genes to OS in panel-TMB. We found that a panel of 13 genes was associated with poor OS, and another panel of 26 genes was correlated with the favorable OS, indicating that mutations in these 39 genes could be used to calculate panel-TMB and conduct prognostic stratification of DLBCL patients. MYD88, CREBBP, MYC, and FAT1 particularly attracted our attention because, in addition to their greater contribution to OS, their mutation frequency was greater than 10%. Studies have shown that MYD88, CREBBP, and MYC mutations play a vital role in regulating apoptosis/cell proliferation and transcription and predict adverse clinical outcomes in DLBCL patients [34][35][36], which is consistent with our results. Besides, a previous study has suggested that FAT1 is a tumor suppressor or has carcinogenic effects and participates in the regulation of cell metastasis [37], but this study demonstrates that FAT1 mutation contributes to the favorable OS for DLBCL patients. These results indicate that the NsMs in MYD88, CREBBP, MYC, and FAT1 significantly contribute to panel-TMB and prognostic stratification of DLBCL patients.
The limitation in this study is that the sample size may make panel-TMB statistically biased in predicting the OS of DLBCL patients. Moreover, we have not conducted clinical trials to evaluate the immune response of DLBCL patients to ICB using panel-TMB. To better describe the clinical value of panel-TMB in DLBCL patients, further investigation is needed.

Conclusions
Herein, we reveal for the first time that the panel-TMB measured by GP69 could replace wTMB, and higher panel-TMB is associated with a poor OS for younger than 60, elevated LDH, greater than one extranodal involvement, stage III/IV, IPI score 3-5, and HBsAg, anti-HBc, or HBV-DNA negativity. Furthermore, nomogram constructed by panel-TMB, stage, and IPI could individually and visually predict the OS rates of DLBCL. Panel-TMB might be a potential predictor for the prognostic stratification of Chinese DLBCL patients.