在上篇文章中,我们初步探索了PHATE的使用方法,发现它在揭示一些连续分化过程中不同细胞状态之间的微小局部差异具有很好的效果,同时也能保留细胞全局的整体结构。在本节教程中,我们将复现演示近期发表在Science Immunology期刊上的一篇文章的结果,进一步学习PHATE的相关使用方法。
文章信息 文章题目:A reservoir of stem-like CD8+ T cells in the tumor-draining lymph node preserves the ongoing anti-tumor immune response 期刊:Science Immunology 日期:2021年9月2日 DOI:https:///10.1126/sciimmunol.abg7836
文章结果图:
本文复现图:
文章数据下载 文章处理后的基因表达矩阵文件存放在GEO数据库中,检索号为 GSE182509 [1] ,这里只提供了pkl格式的表达矩阵,需要用python的pickle包进行读取,我已将其转换为TSV文件存放在我的百度云盘中,有需要的可以下载使用。
链接:https://pan.baidu.com/s/1IoSIYoEfTzZarLWXWvgvzg
提取码:gkd9
加载示例数据 # 导入所需的python包 import pandas as pdimport numpy as npimport matplotlib.pyplot as pltimport phateimport scprepimport magic# 加载示例数据 chronic = scprep.io.load_tsv("GSM5530565_Chronic_LCMV_processed.matrix.txt" ) acute = scprep.io.load_tsv("GSM5530564_Acute_LCMV_processed.matrix.txt" ) tumor8 = scprep.io.load_tsv("GSM5530560_Lung_week8_processed.matrix.txt" ) dln8 = scprep.io.load_tsv("GSM5530561_LN_week8_processed.matrix.txt" ) tumor17 = scprep.io.load_tsv("GSM5530562_Lung_week17_processed.matrix.txt" ) dln17 = scprep.io.load_tsv("GSM5530563_LN_week17_processed.matrix.txt" )# 查看示例数据 chronic.head()# Mrpl15 (ENSMUSG00000033845) ... CAAA01147332.1 (ENSMUSG00000095742) #0 ... #AAACCTGAGAGTAAGG-1 0.0 ... 0.000000 #AAACCTGGTGTGAAAT-1 0.0 ... 0.000000 #AAACGGGAGGCTCATT-1 0.0 ... 0.000000 #AAACGGGGTAGCTTGT-1 0.0 ... 0.000000 #AAACGGGGTCGAATCT-1 0.0 ... 0.552073 #[5 rows x 11595 columns] chronic.shape#(1185, 11595) acute.shape#(10768, 12960) tumor8.shape#(806, 11749) dln8.shape#(1742, 12116) tumor17.shape#(731, 11150) dln17.shape#(876, 11595)
数据质控预处理和数据归一化 由于下载的数据是预先已经进行质控过滤和归一化处理的,这里将不再进行处理。详细用法见上期 [使用PHATE进行单细胞高维数据的可视化 ]
使用PHATE进行低维嵌入降维可视化 ### analysis for chronic sample ### #Embedding Data Using PHATE # Instantiating the PHATE estimator phate_operator = phate.PHATE(n_jobs=-2 ) Y_phate = phate_operator.fit_transform(chronic) Y_phate#array([[ -5.34563999, 6.11670333], # [ 29.8771147 , 8.33029219], # [ 9.44218856, -25.94568946], # ..., # [-10.86402299, -1.15554 ], # [-14.65615727, -1.03794057], # [-16.53150258, 4.50993521]])
细胞聚类分群 # cell clustering clusters = phate.cluster.kmeans(phate_operator, n_clusters=8 ) clusters#array([0, 2, 1, ..., 4, 4, 4], dtype=int32) # 聚类结果可视化 scprep.plot.scatter2d(Y_phate, c=clusters, figsize=(8 ,7 ), cmap="Spectral" , ticks=False , label_prefix="PHATE" , title= "Chronic LCMV" ) plt.savefig("plot_phate_chronic_2d_by_cluster.png" )
使用MAGIC进行数据填充 #Creating the MAGIC operator magic_op = magic.MAGIC()# Running MAGIC for all genes chronic_magic = magic_op.fit_transform(chronic, genes='all_genes' ) chronic_magic.head()# Mrpl15 (ENSMUSG00000033845) ... CAAA01147332.1 (ENSMUSG00000095742) #0 ... #AAACCTGAGAGTAAGG-1 0.150924 ... 0.076723 #AAACCTGGTGTGAAAT-1 0.153466 ... 0.067584 #AAACGGGAGGCTCATT-1 0.146601 ... 0.064838 # rename columns names chronic_magic.columns = [i.split(" " )[0 ] for i in chronic_magic.columns.tolist()]# marker基因可视化 markers = ["Sell" , "Ccr7" , "Tcf7" , "Slamf6" , "Xcl1" , "Il7r" , "Eomes" , "Tbx21" , "Gzmb" , "Prf1" , "Pdcd1" , "Havcr2" , "Cd101" , "Cx3cr1" , "Cxcr6" ]for marker in markers: # 2d plot scprep.plot.scatter2d(Y_phate, c=chronic_magic[marker], figsize=(8 ,7 ), cmap="Reds" , ticks=False , label_prefix="PHATE" , title=marker + " magic expression" ) plt.savefig("plot_chronic_magic_marker_2d_" + marker + ".png" )
根据这些marker基因的表达情况,我们将不同的cluster进行细胞类型的注释,得到以下的细胞注释结果。
Tex: Pdcd1, Havcr2, Cd101 多样本合并分析 ### analysis for combined five sample ### # Merge all datasets and create a vector representing the time point of each sample alldata = [chronic,tumor8,tumor17,dln8,dln17] EBT_counts, sample_labels = scprep.utils.combine_batches( alldata, ["Chronic" ,"Early Tumor" ,"Late Tumor" ,"Early LN" ,"Late LN" ], append_to_cell_names=True )del alldata # removes objects from memory EBT_counts.head()# Mrpl15 (ENSMUSG00000033845) ... CAAA01147332.1 (ENSMUSG00000095742) #AAACCTGAGAGTAAGG-1_Chronic 0.0 ... 0.000000 #AAACCTGGTGTGAAAT-1_Chronic 0.0 ... 0.000000 #AAACGGGAGGCTCATT-1_Chronic 0.0 ... 0.000000 #AAACGGGGTAGCTTGT-1_Chronic 0.0 ... 0.000000 #AAACGGGGTCGAATCT-1_Chronic 0.0 ... 0.552073 #[5 rows x 10246 columns] EBT_counts.shape#(5340, 10246) sample_labels#AAACCTGAGAGTAAGG-1_Chronic Chronic #AAACCTGGTGTGAAAT-1_Chronic Chronic #AAACGGGAGGCTCATT-1_Chronic Chronic #AAACGGGGTAGCTTGT-1_Chronic Chronic #AAACGGGGTCGAATCT-1_Chronic Chronic # ... #Name: sample_labels, Length: 5340, dtype: object
PHATE降维可视化 #Embedding Data Using PHATE # Instantiating the PHATE estimator phate_operator = phate.PHATE(n_jobs=-2 ) Y_phate = phate_operator.fit_transform(EBT_counts) Y_phate#array([[ 79.05651647, 15.42592929], # [ 20.72815444, 25.65566379], # [ 90.57712893, -4.77917562], # ..., # [-52.39011592, 39.20142516], # [-28.51731009, -8.66499775], # [-46.00734805, -17.37265621]]) scprep.plot.scatter2d(Y_phate, c=sample_labels, figsize=(10 ,8 ), cmap="Spectral" , ticks=False , label_prefix="PHATE" ) plt.savefig("plot_phate_2d_by_sample.png" )
# 3D visualization phate_operator.set_params(n_components=3 ) Y_phate_3d = phate_operator.transform() Y_phate_3d#array([[ 77.85894712, 13.21732854, -11.29708591], # [ 17.23971363, 19.47621167, -27.54015408], # [ 87.69788489, 4.0231593 , 13.16082805], # ..., # [-46.46850791, 42.68065699, -2.93836416], # [-20.53357918, -4.9821892 , -22.70265013], # [-36.18351133, -9.56494547, -29.67206442]]) scprep.plot.scatter3d(Y_phate_3d, c=sample_labels, figsize=(8 ,6 ), cmap="Spectral" , ticks=False , label_prefix="PHATE" ) plt.savefig("plot_phate_3d_by_sample.png" )
细胞聚类分群 # cell clustering clusters = phate.cluster.kmeans(phate_operator, n_clusters=12 ) clusters#array([8, 6, 1, ..., 2, 4, 4], dtype=int32) # save meta data meta = pd.merge(pd.DataFrame(sample_labels),pd.DataFrame(clusters,index=sample_labels.index,columns=["cluster" ]),left_index=True ,right_index=True ) meta.head()# sample_labels cluster #AAACCTGAGAGTAAGG-1_Chronic Chronic 8 #AAACCTGGTGTGAAAT-1_Chronic Chronic 6 #AAACGGGAGGCTCATT-1_Chronic Chronic 1 #AAACGGGGTAGCTTGT-1_Chronic Chronic 8 #AAACGGGGTCGAATCT-1_Chronic Chronic 8 meta.to_csv("metadata.csv" ) scprep.plot.scatter3d(Y_phate_3d, c=clusters, figsize=(8 ,6 ), cmap="Spectral" , ticks=False , label_prefix="PHATE" ) plt.savefig("plot_phate_3d_by_cluster.png" )
scprep.plot.scatter2d(Y_phate, c=clusters, figsize=(8 ,6 ), cmap="Spectral" , ticks=False , label_prefix="PHATE" ) plt.savefig("plot_phate_2d_by_cluster.png" )
# to save as a gif: scprep.plot.rotate_scatter3d(Y_phate_3d, c=sample_labels, figsize=(8 ,6 ), cmap="Spectral" , ticks=False , label_prefix="PHATE" , filename="phate.gif" )
使用MAGIC进行数据填充 # rename columns names EBT_counts.columns = [i.split(" " )[0 ] for i in EBT_counts.columns.tolist()]# MAGIC imputation #Creating the MAGIC operator magic_op = magic.MAGIC()#Running MAGIC with gene selection #bmmsc_magic = magic_op.fit_transform(bmmsc_data, genes=["Mpo", "Klf1", "Ifitm1"]) #bmmsc_magic.head() #Visualizing gene-gene relationships #fig, (ax1, ax2) = plt.subplots(1,2, figsize=(16, 6)) #scprep.plot.scatter(x=bmmsc_data['Mpo'], y=bmmsc_data['Klf1'], c=bmmsc_data['Ifitm1'], ax=ax1, # xlabel='Mpo', ylabel='Klf1', legend_title="Ifitm1", title='Before MAGIC') #scprep.plot.scatter(x=bmmsc_magic['Mpo'], y=bmmsc_magic['Klf1'], c=bmmsc_magic['Ifitm1'], ax=ax2, xlabel='Mpo', ylabel='Klf1', legend_title="Ifitm1", title='After MAGIC') #plt.tight_layout() #plt.show() # Running MAGIC for all genes EBT_counts_magic = magic_op.fit_transform(EBT_counts, genes='all_genes' ) EBT_counts_magic.head() markers = ["Sell" , "Ccr7" , "Tcf7" , "Slamf6" , "Xcl1" , "Il7r" , "Eomes" , "Tbx21" , "Gzmb" , "Prf1" , "Pdcd1" , "Havcr2" , "Cd101" , "Cx3cr1" , "Cxcr6" ]for marker in markers: # 2d plot scprep.plot.scatter2d(Y_phate, c=EBT_counts_magic[marker], figsize=(8 ,6 ), cmap="Reds" , ticks=False , label_prefix="PHATE" , title=marker + " magic expression" ) plt.savefig("plot_magic_marker_2d_" + marker + ".png" ) # 3d plot scprep.plot.scatter3d(Y_phate_3d, c=EBT_counts_magic[marker], figsize=(8 ,6 ), cmap="Reds" , ticks=False , label_prefix="PHATE" , title=marker + " magic expression" ) plt.savefig("plot_magic_marker_3d_" + marker + ".png" )
基因差异表达分析 # Differential analysis # DE by samples de_samples = scprep.stats.differential_expression_by_cluster(EBT_counts_magic,clusters=sample_labels,direction="up" ) de_samples de_samples["Chronic" ]for key,value in de_samples.items(): print(value.head(n=10 ))# save DE data de_samples["Chronic" ].to_csv("DEs_Chronic.csv" ) de_samples["Early Tumor" ].to_csv("DEs_Early_Tumor.csv" ) de_samples["Early LN" ].to_csv("DEs_Early_LN.csv" )# DE by clusters de_clusters = scprep.stats.differential_expression_by_cluster(EBT_counts_magic,clusters=clusters,direction="up" ) de_clusters de_clusters[0 ]for key,value in de_clusters.items(): print(value.head(n=10 ))
文中链接 [1] GSE182509: https://www.ncbi.nlm./geo/query/acc.cgi?acc=GSE182509