1、Overview

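A straightforward annotation approach is to compare the single-cell expression profiles with previously annotated reference datasets.
The key ingredient is the reference dataset.
A reference dataset is essentially a SummarizedExperiment-style object (an sce object works too) whose colData slot carries the cell type labels.
These notes focus on annotation with the SingleR package, which also provides a number of built-in reference datasets (in recent Bioconductor releases they are distributed through the celldex package).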

[Figure: overview of the reference datasets built into SingleR]
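The idea of comparing each cell against a labelled reference can be sketched in a few lines of base R. This is only a toy illustration under stated assumptions: the matrices `ref_profiles` and `test_cells` are made up, and the scoring is a plain Spearman correlation against one profile per label. It is not SingleR's actual implementation, which additionally selects marker genes and fine-tunes the assignments.

```r
# Toy reference: 6 genes x 3 cell types (made-up log-expression values).
ref_profiles <- matrix(
  c(9, 8, 1, 1, 2, 1,   # T_cell
    1, 2, 9, 8, 1, 2,   # B_cell
    1, 1, 2, 1, 9, 8),  # Monocyte
  nrow = 6,
  dimnames = list(paste0("gene", 1:6), c("T_cell", "B_cell", "Monocyte"))
)

# Toy test data: 6 genes x 2 cells (also made up).
test_cells <- matrix(
  c(7, 9, 0, 1, 1, 0,   # cell1
    0, 1, 1, 2, 8, 9),  # cell2
  nrow = 6,
  dimnames = list(paste0("gene", 1:6), c("cell1", "cell2"))
)

# Score every cell against every reference profile (rows = cells, columns = labels).
scores <- cor(test_cells, ref_profiles, method = "spearman")

# Give each cell the label with the highest score.
toy_labels <- colnames(scores)[max.col(scores, ties.method = "first")]
names(toy_labels) <- rownames(scores)

print(round(scores, 2))
print(toy_labels)
```

In this toy setup cell1 ends up closest to the T_cell profile and cell2 to the Monocyte profile.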

2、Annotation with SingleR

（1）Basic usage

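# load the sce to be annotated
load("fluidigm.clust.RData")
fluidigm.clust

# prepare a suitable reference dataset
library(SingleR)
# in recent Bioconductor releases the reference getters live in celldex:
# library(celldex)
ref <- BlueprintEncodeData()
ref

# annotate with the coarse labels (swap in label.fine for finer resolution)
pred <- SingleR(test=fluidigm.clust, ref=ref, labels=ref$label.main)
#pred <- SingleR(test=fluidigm.clust, ref=ref, labels=ref$label.fine)
table(pred$labels)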

ref$label.fine provides more resolution at the cost of speed and increased ambiguity in the assignments.
In short, ref$label.main gives coarse labels and ref$label.fine gives fine-grained ones.


fluidigm.clust
colnames(colData(fluidigm.clust))
# store the predicted labels in colData
fluidigm.clust$celltype <- pred$labels
table(fluidigm.clust$celltype)
library(scater)  # provides plotReducedDim()
plotReducedDim(fluidigm.clust, dimred="UMAP", colour_by="celltype")
# save the annotated object
fluidigm.anno <- fluidigm.clust
save(fluidigm.anno,file = "fluidigm.anno.Rdata")

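（2）Diagnostic visualization

Heatmap
Each column is a cell and each row is a reference cell type, coloured by the assignment score; the column annotation shows the cell type with the highest score.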

plotScoreHeatmap(pred)


Jitter and violin plots
These show the assignment scores or related values for all cells across one or more labels.

sum(is.na(pred$pruned.labels))
# no cells were pruned
plotScoreDistribution(pred)
# one point per cell
# with the default colours: yellow for cells assigned to the label,
# grey for cells not assigned to it (orange for assigned but pruned cells)
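To make the meaning of pred$pruned.labels more concrete, here is a minimal base R sketch of outlier-based pruning. It assumes the rule described in the OSCA text: for each cell, take the difference ("delta") between the score of the assigned label and the median score across all labels, then mark cells whose delta is unusually low for their label. The score matrix and the 3-MAD cutoff are assumptions of this sketch; SingleR's own pruning may differ in detail.

```r
# Toy score matrix: 6 cells x 3 labels (values are made up).
scores <- rbind(
  c(0.80, 0.30, 0.20),
  c(0.82, 0.28, 0.25),
  c(0.78, 0.31, 0.22),
  c(0.81, 0.29, 0.24),
  c(0.79, 0.33, 0.21),
  c(0.35, 0.33, 0.30)   # ambiguous cell: all scores are similar
)
colnames(scores) <- c("A", "B", "C")

# Assigned label = label with the highest score.
labels <- colnames(scores)[max.col(scores, ties.method = "first")]

# Delta = best score minus the median score across labels, per cell.
delta <- apply(scores, 1, max) - apply(scores, 1, median)

# Per-label lower bound: median(delta) - 3 * MAD(delta) within each label.
lower <- ave(delta, labels, FUN = function(d) median(d) - 3 * mad(d))

# Cells below the bound get NA, mirroring how pruned labels are reported.
pruned <- ifelse(delta < lower, NA_character_, labels)

print(data.frame(labels, delta = round(delta, 2), pruned))
print(sum(is.na(pruned)))
```

Only the last cell falls below the bound here, so it is the one set to NA.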


Finally, the SingleR predictions can be compared with an existing grouping of the cells (here the Cluster2 column).

tab <- table(Assigned=pred$pruned.labels, Cluster=fluidigm.clust$Cluster2)
tab
# Adding a pseudo-count of 10 to avoid strong color jumps with just 1 cell.
library(pheatmap)
pheatmap(log2(tab+10), color=colorRampPalette(c("white", "blue"))(101))
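The effect of the pseudo-count can be checked on a toy contingency table. The vectors `assigned` and `cluster` below are made up for illustration; note that table() drops NA entries by default, so pruned cells do not appear in the counts.

```r
# Made-up predictions and cluster assignments for 82 cells (one prediction is NA).
assigned <- c(rep("T_cell", 50), rep("B_cell", 30), "B_cell", NA)
cluster  <- c(rep("1", 50), rep("2", 30), "1", "2")

# Cross-tabulate predicted labels against clusters; the NA prediction is dropped.
tab <- table(Assigned = assigned, Cluster = cluster)
print(tab)

# Difference on the log2 scale between a count of 0 and a count of 1.
jump_plus1  <- log2(1 + 1)  - log2(0 + 1)    # pseudo-count of 1
jump_plus10 <- log2(1 + 10) - log2(0 + 10)   # pseudo-count of 10
print(c(plus1 = jump_plus1, plus10 = jump_plus10))

# Values that would be passed to the heatmap.
print(round(log2(tab + 10), 2))
```

With a pseudo-count of 10, a single stray cell barely changes the colour, while the well-populated entries still stand out.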

Reference data from other sources

A typical source is the scRNAseq package, which contains many single-cell datasets, many of which include the authors' manual annotations. These can be used as reference data.

library(scRNAseq)
sceM <- MuraroPancreasData()
sceM
# note that the genes are named by Ensembl ID
table(sceM$label)


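Data to be annotated

# sceG is not created in these notes; the "__chr" suffix in its row names suggests
# the Grun pancreas data from scRNAseq, e.g. sceG <- GrunPancreasData() (an assumption)
# ID conversion: symbol → Ensembl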
library(AnnotationHub)
edb <- AnnotationHub()[["AH73881"]]
gene.symb <- sub("__chr.*$", "", rownames(sceG))
gene.ids <- mapIds(edb, keys=gene.symb, 
                   keytype="SYMBOL", column="GENEID")
keep <- !is.na(gene.ids) & !duplicated(gene.ids)
sceG <- sceG[keep,]
rownames(sceG) <- gene.ids[keep]
counts(sceG)[1:4,1:4]
sceG
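The filtering logic in the ID conversion (drop symbols with no match, then drop repeated IDs) can be tried on a toy example without downloading an annotation. The lookup vector `symbol_to_id`, the row names, and the IDs below are all hypothetical stand-ins for the EnsDb query.

```r
# Hypothetical symbol-to-ID lookup standing in for the annotation database.
symbol_to_id <- c(INS = "ENSG_toy_1", GCG = "ENSG_toy_2", INS_ALT = "ENSG_toy_1")

# Hypothetical row names in the "SYMBOL__chrN" style.
row_names <- c("INS__chr11", "GCG__chr2", "UNKNOWN__chr1", "INS_ALT__chr11")

# Strip the chromosome suffix to recover the gene symbol.
gene_symb <- sub("__chr.*$", "", row_names)

# Look up the ID for each symbol; unmatched symbols become NA.
gene_ids <- unname(symbol_to_id[gene_symb])

# Keep rows with a valid ID, and only the first row for each repeated ID.
keep <- !is.na(gene_ids) & !duplicated(gene_ids)

print(data.frame(row_names, gene_symb, gene_ids, keep))
```

The unmatched symbol and the second row mapping to the same ID are both dropped, which keeps the new row names unique.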

Annotation

# both sceG and sceM need log-normalized expression (a "logcounts" assay) before this step
pred.sceG <- SingleR(test=sceG, ref=sceM, 
                      labels=sceM$label, de.method="wilcox")
table(pred.sceG$labels)

3、Other annotation methods

Only a brief introduction here, with no hands-on code; see the original text for details.

（1）Assigning cell labels from gene sets

A related strategy is to explicitly identify sets of marker genes that are highly expressed in each individual cell.
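In short, the marker gene signature of each cell type is compared with the expression profile of every cell in the sce to be classified, and the AUC is used as the score to pick the best-matching cell type.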
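As a rough illustration of AUC-based scoring, the sketch below ranks the genes of a single cell by expression and computes, for each marker set, the probability that a marker gene outranks a non-marker gene. The helper `auc_score`, the expression vector, and the marker sets are all hypothetical, and this full-ranking AUC is a simplification: dedicated tools such as AUCell compute the AUC over the top of the ranking only.

```r
# AUC of a marker set within one cell's expression ranking
# (rank-sum formulation; ties receive average ranks).
auc_score <- function(expr, markers) {
  in_set <- names(expr) %in% markers
  n_in <- sum(in_set)
  n_out <- sum(!in_set)
  r <- rank(expr)
  (sum(r[in_set]) - n_in * (n_in + 1) / 2) / (n_in * n_out)
}

# Made-up expression values for one cell.
expr <- c(g1 = 9, g2 = 8, g3 = 1, g4 = 0, g5 = 2, g6 = 7)

# Made-up marker sets for two cell types.
marker_sets <- list(typeA = c("g1", "g2"), typeB = c("g3", "g4"))

# Score the cell against every marker set and pick the best one.
aucs <- vapply(marker_sets, function(m) auc_score(expr, m), numeric(1))
best <- names(which.max(aucs))

print(aucs)
print(best)
```

Here the typeA markers are the two most highly expressed genes, so typeA gets the maximum AUC and is chosen for this cell.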

（2）Assigning cluster labels from markers

Yet another strategy for annotation is to perform a gene set enrichment analysis on the marker genes defining each cluster.
This identifies the pathways and processes that are (relatively) active in each cluster based on upregulation of the associated genes compared to other clusters.
In short, run a GO/KEGG enrichment analysis on the marker genes of each cluster and infer the cell type from the descriptions of the enriched terms.
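The core of such an enrichment analysis is an over-representation test. A minimal base R sketch with a hypergeometric test is shown below; the gene universe, the two gene sets, the cluster markers, and the helper `enrich_p` are all made up for illustration, and real analyses would use curated GO/KEGG sets and correct for multiple testing.

```r
# Made-up universe of 100 genes and two made-up gene sets.
universe <- paste0("gene", 1:100)
gene_sets <- list(
  T_cell_activation = paste0("gene", 1:10),
  B_cell_activation = paste0("gene", 51:60)
)

# Made-up marker genes for one cluster.
cluster_markers <- paste0("gene", c(1:8, 70, 71))

# P(overlap >= observed) under the hypergeometric distribution.
enrich_p <- function(markers, gene_set, universe) {
  overlap <- length(intersect(markers, gene_set))
  phyper(overlap - 1,
         m = length(gene_set),
         n = length(universe) - length(gene_set),
         k = length(markers),
         lower.tail = FALSE)
}

pvals <- vapply(gene_sets,
                function(gs) enrich_p(cluster_markers, gs, universe),
                numeric(1))

print(pvals)
print(names(which.min(pvals)))
```

The cluster would then be tentatively labelled from the description of the most enriched set, here the T cell one.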

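These are brief workflow notes on Chapter 12 (Cell type annotation), mainly covering cell type annotation with SingleR. For the other approaches, see the original Chapter 12 Cell type annotation.
This series of notes is a rough walk-through of the full OSCA workflow; see the original text for the underlying principles. Corrections are very welcome!
There is also a full set of translated notes compiled by 刘小泽 (Liu Xiaoze); see the table of contents.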
