Artificial Intelligence Predicts Survival or Death for 140,000 Women With Breast Cancer

SIBCS, 2023-05-11, published in Shanghai

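Prognosis is the prediction of an outcome, and no outcome matters more than survival or death. Clinical prognostic tools for breast cancer support medical decision making by providing individualised risk predictions. Existing tools, however, are essentially limited to subgroups of patients treated for specific stages of disease. Accurately predicting the risk of death after diagnosis for patients with breast cancer of any stage could help to stratify clinical follow-up, to counsel patients on their predicted outcomes, or to identify high risk individuals suitable for enrolment in clinical trials. In the past, clinical outcome prediction models were mostly built with statistical regression. In recent years, artificial intelligence methods such as machine learning have been widely used for clinical prediction modelling, but most earlier studies were likewise confined to small samples of patients with specific stages of breast cancer.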

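On 10 May 2023, The BMJ (British Medical Journal), the flagship journal of the British Medical Association and one of the four leading international medical journals, published a very large population based cohort study from the Cancer Research UK Oxford Centre and the University of Oxford. The study developed statistical regression models and machine learning models for predicting the 10 year risk of breast cancer related death in women with breast cancer of any stage, evaluated them with internal-external validation, and compared the results of the two approaches.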

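The study first linked the QResearch primary care database in England, at the individual patient level, to the national cancer registry, Hospital Episode Statistics, and national mortality registers. The cohort comprised 141,765 women aged 20 years and older who were diagnosed with invasive breast cancer between 1 January 2000 and 31 December 2020. Four modelling strategies were then applied: two regression models (Cox proportional hazards regression and competing risks regression) and two machine learning methods (XGBoost and an artificial neural network). Internal-external cross validation was used to evaluate the models; random effects meta-analysis pooled the estimates of discrimination and calibration metrics; and calibration plots and decision curve analysis were used to assess model performance, transportability, and clinical utility.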
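The evaluation scheme described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' code: the function names `internal_external_splits` and `pool_random_effects` are hypothetical, the region labels are made up, and the pooling uses the DerSimonian-Laird estimator, which is one common random effects method and not necessarily the one used in the paper.

```python
import math
from statistics import NormalDist


def internal_external_splits(regions):
    """Yield (held_out_region, train_indices, test_indices).

    Each region is held out once: the model is fitted on all other
    regions and evaluated on the held-out one.
    """
    for held_out in sorted(set(regions)):
        train = [i for i, r in enumerate(regions) if r != held_out]
        test = [i for i, r in enumerate(regions) if r == held_out]
        yield held_out, train, test


def pool_random_effects(estimates, variances):
    """Pool per-region estimates with a DerSimonian-Laird random effects model.

    Returns (pooled, ci_low, ci_high, tau2), where tau2 is the estimated
    between-region variance. The estimator choice is an assumption here.
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    z = NormalDist().inv_cdf(0.975)
    return pooled, pooled - z * se, pooled + z * se, tau2


# Hypothetical region labels for six patients, for illustration only.
example_regions = ["North", "South", "North", "East", "South", "East"]
for region, train_idx, test_idx in internal_external_splits(example_regions):
    print(region, train_idx, test_idx)
```

In a full analysis, each held-out region would contribute one estimate of a metric (for example a C index and its variance), and those per-region estimates would then be passed to the pooling function. A 95% prediction interval is wider than the confidence interval because it also incorporates the between-region variance.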

[Figure: Cox proportional hazards regression model]

[Figure: Competing risks regression model]

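During a median follow-up of 4.16 years (interquartile range 1.76 to 8.26), 21,688 breast cancer related deaths and 11,454 deaths from other causes occurred. When follow-up was restricted to a maximum of 10 years after breast cancer diagnosis, 20,367 breast cancer related deaths occurred during 688,564.81 person years. The crude breast cancer mortality rate was 295.79 per 10,000 person years (95% confidence interval 291.75 to 299.88).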
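The crude rate is simply deaths divided by person years, scaled to 10,000. The short sketch below redoes that arithmetic from the reported counts. The confidence interval method is an assumption (a normal approximation on the log rate, treating the death count as Poisson), since the abstract does not state how the interval was computed; under that assumption the interval comes out very close to the reported one. The function name `crude_rate_per_10000` is hypothetical.

```python
import math
from statistics import NormalDist


def crude_rate_per_10000(deaths, person_years, level=0.95):
    """Return (rate, ci_low, ci_high) per 10,000 person years.

    Assumes a Poisson death count and a normal approximation on the
    log rate scale: standard error of log(rate) = 1 / sqrt(deaths).
    """
    rate = deaths / person_years * 10_000
    z = NormalDist().inv_cdf(0.5 + level / 2)
    factor = math.exp(z / math.sqrt(deaths))
    return rate, rate / factor, rate * factor


# Counts reported in the study for the 10 year follow-up window.
rate, low, high = crude_rate_per_10000(20367, 688564.81)
print(f"{rate:.2f} per 10,000 person years ({low:.2f} to {high:.2f})")
```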

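The predictors differed between the Cox proportional hazards model and the competing risks model, but both included age at diagnosis, body mass index, smoking status, route to diagnosis, hormone receptor status, cancer stage, and breast cancer grade.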

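Across the models, the random effects meta-analysis pooled estimates of discrimination (Harrell's C index) were as follows:

Cox proportional hazards regression model: 0.858 (95% confidence interval 0.853 to 0.864; 95% prediction interval 0.843 to 0.873), and it appeared acceptably calibrated on calibration plots.
Competing risks regression model: 0.849 (95% confidence interval 0.839 to 0.859; 95% prediction interval 0.821 to 0.876), with no evidence of systematic miscalibration on summary metrics.
Artificial neural network: 0.847 (95% confidence interval 0.835 to 0.858; 95% prediction interval 0.816 to 0.878)
XGBoost: 0.821 (95% confidence interval 0.813 to 0.828; 95% prediction interval 0.805 to 0.837)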
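For readers unfamiliar with the metric, Harrell's C index is the share of comparable patient pairs in which the patient who died earlier was also assigned the higher predicted risk. The following is a minimal, quadratic-time sketch on toy data. It is a simplification and not the study's implementation: it ignores tied event times and does not handle competing risks, and the function name `harrell_c` and the example values are hypothetical.

```python
def harrell_c(times, events, risks):
    """Harrell's C index for right-censored data (simplified sketch).

    times  : follow-up time per subject
    events : 1 if the event was observed, 0 if censored
    risks  : predicted risk score (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is anchored on the subject with an observed event
        for j in range(n):
            if times[j] > times[i]:  # subject j was still event-free at times[i]
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, for illustration only.
print(harrell_c([2, 4, 6, 8], [1, 0, 1, 1], [0.8, 0.3, 0.6, 0.7]))
```

A value of 0.5 corresponds to predictions no better than chance and 1.0 to perfect ranking, which puts the reported range of 0.821 to 0.858 in context.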

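Compared with the statistical regression models, the machine learning models showed more complex patterns of miscalibration and more variable regional and stage specific performance.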

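Decision curve analysis suggested that the Cox proportional hazards and competing risks regression models tested in the study may have higher clinical utility than the two machine learning approaches.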
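Decision curve analysis compares models by their net benefit across a range of risk thresholds: true positives are credited, and false positives are penalised by the odds of the threshold. The sketch below shows that calculation for a plain binary outcome. It is a simplified illustration with hypothetical function names and toy data; the study's time-to-event setting with censoring would need a survival-adapted version, which is not shown here.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on everyone whose predicted risk >= threshold.

    y_true : 1 if the outcome occurred, 0 otherwise
    y_prob : predicted probability of the outcome
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: act on every patient regardless of prediction."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy outcomes and predictions, for illustration only.
outcomes = [1, 0, 1, 0]
predictions = [0.9, 0.2, 0.6, 0.4]
for t in (0.3, 0.5):
    print(t, net_benefit(outcomes, predictions, t), net_benefit_treat_all(outcomes, t))
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both the "treat all" strategy and the "treat none" strategy, whose net benefit is zero.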

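The study therefore indicates that, for women with breast cancer of any stage and with the predictors available in this dataset, statistical regression models performed better and more consistently than machine learning methods and may be worth further evaluation for potential clinical uses such as stratified follow-up.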

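However, the study was based on a primary care database and did not consider high risk gene mutations, polygenic or multi-omic data, or breast density, all of which might offer additional predictive value. Reliance on clinical coding for factors such as family history of breast cancer may skew the data towards more pronounced family pedigrees. In addition, because women without a recorded positive family history were assumed to have no family history, misclassification is possible. Prescription data may also be subject to misclassification bias, because not every prescribed drug is dispensed by a pharmacist or actually taken by the patient. A median follow-up of only 4.16 years is also short for predicting the 10 year risk of breast cancer related death, so further follow-up is needed.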

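So it seems that, as the old sayings go, a dragon stranded in the shallows is teased by shrimp, a tiger on the plain is bullied by dogs, a cat in its moment of triumph outswaggers a tiger, and a phoenix that has lost its feathers is no match for a chicken. Even artificial intelligence from the University of Oxford cuts a lonely figure in primary care and still cannot match traditional methods for economy, simplicity, and practicality. In some places, perhaps neither regression models nor clinical guidelines are needed at all, and medical decisions are simply settled by the department head on the basis of personal experience.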

BMJ. 2023 May 10;381:e073800. IF: 93.333

Development and internal-external validation of statistical and machine learning models for breast cancer prognostication: cohort study.

Ash Kieran Clift, David Dodwell, Simon Lord, Stavros Petrou, Michael Brady, Gary S Collins, Julia Hippisley-Cox.

Cancer Research UK Oxford Centre, Oxford, UK; University of Oxford, Oxford, UK.

OBJECTIVE: To develop a clinically useful model that estimates the 10 year risk of breast cancer related mortality in women (self-reported female sex) with breast cancer of any stage, comparing results from regression and machine learning approaches.

DESIGN: Population based cohort study.

SETTING: QResearch primary care database in England, with individual level linkage to the national cancer registry, Hospital Episodes Statistics, and national mortality registers.

PARTICIPANTS: 141765 women aged 20 years and older with a diagnosis of invasive breast cancer between 1 January 2000 and 31 December 2020.

MAIN OUTCOME MEASURES: Four model building strategies comprising two regression (Cox proportional hazards and competing risks regression) and two machine learning (XGBoost and an artificial neural network) approaches. Internal-external cross validation was used for model evaluation. Random effects meta-analysis that pooled estimates of discrimination and calibration metrics, calibration plots, and decision curve analysis were used to assess model performance, transportability, and clinical utility.

RESULTS: During a median 4.16 years (interquartile range 1.76-8.26) of follow-up, 21688 breast cancer related deaths and 11454 deaths from other causes occurred. Restricting to 10 years maximum follow-up from breast cancer diagnosis, 20367 breast cancer related deaths occurred during a total of 688564.81 person years. The crude breast cancer mortality rate was 295.79 per 10000 person years (95% confidence interval 291.75 to 299.88). Predictors varied for each regression model, but both Cox and competing risks models included age at diagnosis, body mass index, smoking status, route to diagnosis, hormone receptor status, cancer stage, and grade of breast cancer. The Cox model's random effects meta-analysis pooled estimate for Harrell's C index was the highest of any model at 0.858 (95% confidence interval 0.853 to 0.864, and 95% prediction interval 0.843 to 0.873). It appeared acceptably calibrated on calibration plots. The competing risks regression model had good discrimination: pooled Harrell's C index 0.849 (0.839 to 0.859, and 0.821 to 0.876), and evidence of systematic miscalibration on summary metrics was lacking. The machine learning models had acceptable discrimination overall (Harrell's C index: XGBoost 0.821 (0.813 to 0.828, and 0.805 to 0.837); neural network 0.847 (0.835 to 0.858, and 0.816 to 0.878)), but had more complex patterns of miscalibration and more variable regional and stage specific performance. Decision curve analysis suggested that the Cox and competing risks regression models tested may have higher clinical utility than the two machine learning approaches.

CONCLUSION: In women with breast cancer of any stage, using the predictors available in this dataset, regression based methods had better and more consistent performance compared with machine learning approaches and may be worthy of further evaluation for potential clinical use, such as for stratified follow-up.

DOI: 10.1136/bmj-2022-073800