# Benchmarking SignacFast with single cell flow-sorted data

#### 2021-02-26

This vignette shows how to use SignacFast to annotate flow-sorted synovial cells by integrating SignacX with Seurat. We start with raw counts from this publication.

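ReadCelseq <- function(counts.file, meta.file) {
    # Read the tab-separated count matrix; the first column ("gene") holds gene names.
    # NOTE: this read step is absent from the text above and is reconstructed here
    # as an assumption (readr::read_tsv also reads .gz files directly).
    E = suppressWarnings(readr::read_tsv(counts.file))
    gns <- E$gene
    E = E[, -1]
    # convert to a sparse genes x cells matrix
    E = Matrix::Matrix(as.matrix(E), sparse = TRUE)
    rownames(E) <- gns
    # meta.file is not used inside this function
    E
}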

counts.file = "./fls/celseq_matrix_ru10_molecules.tsv.gz"
meta.file = "./fls/celseq_meta.immport.723957.tsv"

E = ReadCelseq(counts.file = counts.file, meta.file = meta.file)

# filter data based on depth and number of genes detected
kmu = Matrix::colSums(E != 0)
kmu2 = Matrix::colSums(E)
E = E[, kmu > 200 & kmu2 > 500]

# filter by mitochondrial percentage
logik = grepl("^MT-", rownames(E))
MitoFrac = Matrix::colSums(E[logik, ])/Matrix::colSums(E) * 100
E = E[, MitoFrac < 20]
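
To make the two filters above easier to follow, here is a minimal sketch on a toy matrix. The gene names, counts, and thresholds (`> 1` genes, `> 5` counts) are invented for illustration and are far smaller than the real cutoffs; only the structure of the filtering mirrors the code above.

```r
library(Matrix)

# Toy genes x cells count matrix; all values are made up for illustration.
toy <- Matrix(
  c(1, 0, 4, 0,   # MT-CO1
    6, 2, 3, 5,   # CD3E
    3, 0, 2, 5,   # CD19
    0, 0, 1, 0),  # MT-ND1
  nrow = 4, ncol = 4, byrow = TRUE, sparse = TRUE,
  dimnames = list(c("MT-CO1", "CD3E", "CD19", "MT-ND1"),
                  paste0("cell", 1:4))
)

# Step 1: filter on genes detected and sequencing depth (toy thresholds).
genes_detected <- Matrix::colSums(toy != 0)  # non-zero genes per cell
depth <- Matrix::colSums(toy)                # total counts per cell
toy <- toy[, genes_detected > 1 & depth > 5, drop = FALSE]

# Step 2: filter on the percentage of counts from mitochondrial genes.
is_mito <- grepl("^MT-", rownames(toy))
mito_pct <- Matrix::colSums(toy[is_mito, , drop = FALSE]) /
  Matrix::colSums(toy) * 100
toy_filtered <- toy[, mito_pct < 20, drop = FALSE]

colnames(toy_filtered)
```

In this toy case `cell2` is dropped by the depth filter and `cell3` by the mitochondrial filter, so `cell1` and `cell4` remain. Using `drop = FALSE` keeps the result a matrix even if only one gene or cell survives.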

## Seurat

library(Seurat)

Create a Seurat object, and then perform SCTransform normalization. Note:

• You can use the legacy functions here (e.g., NormalizeData, ScaleData), SCTransform, or any other normalization method (including no normalization). We did not notice a significant difference in cell type annotations between normalization methods.
• We consider SCTransform best practice, but it is not a required step. Signac will work fine without it.

# load data
synovium <- CreateSeuratObject(counts = E, project = "FACs")

# run sctransform
synovium <- SCTransform(synovium)

Perform dimensionality reduction by PCA and UMAP embedding. Note:

• Signac requires these steps because it uses the nearest-neighbor graph generated by Seurat.

# These are now standard steps in the Seurat workflow for visualization and clustering
synovium <- RunPCA(synovium, verbose = FALSE)
synovium <- RunUMAP(synovium, dims = 1:30, verbose = FALSE)
synovium <- FindNeighbors(synovium, dims = 1:30, verbose = FALSE)

## SignacX

library(SignacX)

Generate Signac labels for the Seurat object. Note:

• Optionally, you can enable parallel computing by setting num.cores > 1 in the Signac function.

labels <- Signac(synovium, num.cores = 4)
celltypes = GenerateLabels(labels, E = synovium)

Training the neural networks can take a long time: the classification above took 27 minutes. To provide a faster method, we implemented SignacFast, which uses pre-trained models. Note:

• SignacFast uses an ensemble of 1,800 neural networks that were pre-computed with the GenerateModels function and the training_HPCA reference data set.
• Features that are present in the neural network but absent from the single cell data are set to zero.

# Run SignacFast
labels_fast <- SignacFast(synovium, num.cores = 4)
celltypes_fast = GenerateLabels(labels_fast, E = synovium)
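
The zero-filling of missing features mentioned in the notes above can be pictured with a small sketch. `align_features` is a hypothetical helper written for this illustration, not a SignacX function, and the gene names and counts are invented; how SignacFast handles this internally is not shown in this vignette.

```r
# Hypothetical helper (not part of SignacX): reorder the rows of a
# genes x cells matrix to match `model_features`. Genes the model expects
# but the data lacks become all-zero rows; genes the model does not use
# are dropped.
align_features <- function(E, model_features) {
  aligned <- matrix(0, nrow = length(model_features), ncol = ncol(E),
                    dimnames = list(model_features, colnames(E)))
  shared <- intersect(model_features, rownames(E))
  aligned[shared, ] <- as.matrix(E[shared, , drop = FALSE])
  aligned
}

# Toy data: the model expects CD3E, CD19 and MS4A1, but MS4A1 was not measured.
E_toy <- matrix(c(2, 0,
                  1, 4,
                  7, 3),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("CD19", "CD3E", "ACTB"),
                                c("cell1", "cell2")))

X <- align_features(E_toy, model_features = c("CD3E", "CD19", "MS4A1"))
X
```

The sketch uses a dense base R matrix for simplicity; for a real data set a sparse representation would likely be preferable.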

Compare results:

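Celltypes:

| | B | MPh | NonImmune | Plasma.cells | TNK | Unclassified |
|---|---|---|---|---|---|---|
| B | 681 | 0 | 0 | 0 | 0 | 0 |
| MPh | 0 | 835 | 0 | 0 | 0 | 68 |
| NonImmune | 0 | 0 | 2487 | 0 | 0 | 0 |
| Plasma.cells | 0 | 0 | 0 | 263 | 0 | 6 |
| TNK | 0 | 0 | 0 | 0 | 1768 | 0 |
| Unclassified | 0 | 13 | 7 | 0 | 0 | 174 |

Cellstates:

| | B.memory | B.naive | Fibroblasts | Macrophages | Mon.Classical | NonImmune | Plasma.cells | T.CD4.memory | T.CD4.naive | T.CD8.em | T.CD8.naive | T.regs | Unclassified |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| B.memory | 489 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| B.naive | 4 | 184 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| DC | 0 | 0 | 0 | 4 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 |
| Fibroblasts | 0 | 0 | 2110 | 0 | 0 | 136 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Macrophages | 0 | 0 | 0 | 662 | 33 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 73 |
| Mon.Classical | 0 | 0 | 0 | 23 | 93 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| NonImmune | 0 | 0 | 74 | 0 | 0 | 166 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
| Plasma.cells | 0 | 0 | 0 | 0 | 0 | 1 | 259 | 0 | 0 | 0 | 0 | 0 | 8 |
| T.CD4.memory | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 504 | 112 | 17 | 29 | 15 | 0 |
| T.CD4.naive | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 309 | 4 | 18 | 0 | 1 |
| T.CD8.em | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 7 | 4 | 574 | 1 | 0 | 2 |
| T.CD8.naive | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | 26 | 2 | 0 |
| T.regs | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 27 | 1 | 2 | 106 | 0 |
| Unclassified | 0 | 0 | 1 | 12 | 3 | 5 | 0 | 0 | 0 | 1 | 0 | 0 | 179 |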
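
One way to summarize the cell-type table above is the fraction of cells that fall on its diagonal, i.e. cells that received the same label from both methods. The sketch below copies the counts from that table; the variable names are chosen for this illustration. It also shows, on invented toy labels, how base R's `table()` cross-tabulates two label vectors; whether the tables above were produced exactly this way is an assumption.

```r
# Counts copied from the cell-type comparison table above.
types <- c("B", "MPh", "NonImmune", "Plasma.cells", "TNK", "Unclassified")
confusion <- matrix(
  c(681,   0,    0,   0,    0,   0,
      0, 835,    0,   0,    0,  68,
      0,   0, 2487,   0,    0,   0,
      0,   0,    0, 263,    0,   6,
      0,   0,    0,   0, 1768,   0,
      0,  13,    7,   0,    0, 174),
  nrow = 6, byrow = TRUE, dimnames = list(types, types)
)

n_cells <- sum(confusion)                     # total number of cells compared
n_agree <- sum(diag(confusion))               # cells with identical labels
agreement <- n_agree / n_cells                # fraction of matching labels
round(100 * agreement, 1)

# Toy illustration of building such a table from two label vectors.
labels_a <- c("B", "B", "TNK", "MPh")
labels_b <- c("B", "TNK", "TNK", "MPh")
toy_table <- table(labels_a, labels_b)
toy_table
```

For these counts, 6208 of 6302 cells (about 98.5%) receive the same cell-type label from both methods.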

Save results

saveRDS(synovium, file = "fls/seurat_obj_amp_synovium.rds")
saveRDS(celltypes, file = "fls/celltypes_amp_synovium.rds")
saveRDS(celltypes_fast, file = "fls/celltypes_fast_amp_synovium_celltypes.rds")

