
paper

workflow
article: 10.1186/s13059-024-03356-x, Genome Biology, 2024
challenges
sparse and noisy signal <- low copy number
no fixed feature sets
Packages and algorithms
Signac: LSI (TF-IDF + SVD)
ArchR: LSI
cisTopic: LDA
SnapATAC: diffusion map
SnapATAC2: Laplacian eigenmaps
BROCKMAN: gapped k-mer frequency
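
Sketch: LSI as used in the list above is TF-IDF weighting followed by a truncated SVD. The Python/NumPy snippet below is a minimal illustration on a toy cells x peaks matrix. The function name `lsi_embedding`, the binarization step, and the `scale` factor are assumptions for illustration, not the exact implementation of Signac or ArchR.

```python
import numpy as np


def lsi_embedding(counts, n_components=2, scale=1e4):
    """Toy LSI: binarize, TF-IDF weight, then truncated SVD.

    counts: array-like of shape (n_cells, n_peaks).
    Returns an (n_cells, n_components) embedding.
    The weighting is one common TF-IDF variant (an assumption here).
    """
    counts = np.asarray(counts, dtype=float)
    # Binarize: treat a peak as open or closed in each cell.
    binary = (counts > 0).astype(float)

    # Term frequency: each cell's row divided by its number of open peaks.
    cell_totals = binary.sum(axis=1, keepdims=True)
    tf = binary / np.maximum(cell_totals, 1.0)

    # Inverse document frequency: n_cells / number of cells with the peak open.
    n_cells = binary.shape[0]
    peak_totals = binary.sum(axis=0)
    idf = n_cells / np.maximum(peak_totals, 1.0)

    # Log-scaled TF-IDF matrix.
    tfidf = np.log1p(tf * idf * scale)

    # SVD; keep the leading components scaled by their singular values.
    u, s, _ = np.linalg.svd(tfidf, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Toy data: cells 0-1 open peaks 0-2, cells 2-3 open peaks 3-5.
toy_counts = np.array([
    [1, 1, 1, 0, 0, 0],
    [2, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 3, 1],
])
toy_embedding = lsi_embedding(toy_counts, n_components=2)
```

In the toy data, cells with the same set of open peaks get the same coordinates, and the two groups are separated in the embedding.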
datasets
6 published datasets spanning different sizes, sequencing protocols, tissues, and species
Assessment
cell embeddings, graph structure, and final partitions <- 10 metrics
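
Sketch: these notes do not name the 10 metrics. As an illustration of how a final partition can be scored against reference labels, the snippet below computes the adjusted Rand index (ARI), a widely used clustering agreement measure. Treating ARI as one of the 10 metrics is an assumption, and the function name `adjusted_rand_index` is hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same cells.

    1.0 means identical partitions (up to label renaming);
    values near 0 mean agreement no better than chance.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: partitions trivially agree

    # Contingency counts and marginal cluster sizes.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of cell pairs placed together in both, in truth, in prediction.
    sum_both = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / total_pairs
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (sum_both - expected) / (max_index - expected)


# Same grouping with renamed cluster labels.
ari_same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Clusters that cut across the reference groups.
ari_crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

ARI is invariant to cluster label names, so it can compare a clustering output directly with annotated cell types.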
Results
Feature aggregation, SnapATAC, and SnapATAC2 > LSI-based methods.
For datasets with complex cell-type structures, SnapATAC and SnapATAC2 were the most effective.
For large datasets, SnapATAC2 and ArchR were the most scalable.
