
Cell Hashing Demultiplexing

Chen Weng edited this page Aug 25, 2023 · 2 revisions


If BioLegend cell hashing was used, follow the steps below to demultiplex the cells.

Cell hashing library structure:

  • Read 1 has the same structure as a 10x scRNA-seq library: the first 16 nt are the cell barcode and the following 12 nt are the UMI.
  • Read 2: the first 15 nt are the hash (hashtag) barcode.
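The read layout above is all that is needed to pull the cell barcode, UMI and hash barcode out of each read pair. The sketch below assumes fixed offsets exactly as described (no leading spacer on Read 2) and gzipped 4-line FASTQ records; `fastq_sequences`, `split_read_pair` and `iter_read_pairs` are hypothetical helper names, not part of REDEEM-V.

```python
import gzip
from itertools import islice
from typing import Iterable, Iterator, Tuple

# Assumed fixed read layout, taken from the description above.
CB_LEN = 16    # cell barcode at the start of Read 1
UMI_LEN = 12   # UMI immediately after the cell barcode
HASH_LEN = 15  # hash barcode at the start of Read 2


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd line) of each 4-line FASTQ record."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            return  # end of file (or truncated last record)
        yield record[1].strip()


def split_read_pair(r1_seq: str, r2_seq: str) -> Tuple[str, str, str]:
    """Return (cell_barcode, umi, hash_barcode) for one read pair."""
    if len(r1_seq) < CB_LEN + UMI_LEN or len(r2_seq) < HASH_LEN:
        raise ValueError("read too short for the expected layout")
    cell_barcode = r1_seq[:CB_LEN]
    umi = r1_seq[CB_LEN:CB_LEN + UMI_LEN]
    hash_barcode = r2_seq[:HASH_LEN]
    return cell_barcode, umi, hash_barcode


def iter_read_pairs(r1_path: str, r2_path: str) -> Iterator[Tuple[str, str, str]]:
    """Walk gzipped R1/R2 files in lockstep (hypothetical helper)."""
    with gzip.open(r1_path, "rt") as r1, gzip.open(r2_path, "rt") as r2:
        for s1, s2 in zip(fastq_sequences(r1), fastq_sequences(r2)):
            yield split_read_pair(s1, s2)
```

For example, `iter_read_pairs("Hash_XX_L001_R1_001.fastq.gz", "Hash_XX_L001_R2_001.fastq.gz")` yields one `(cell_barcode, umi, hash_barcode)` tuple per read pair.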

The barcode sequence of each hashtag antibody can be found on the BioLegend website (e.g., A0257).

Some commonly used hash barcodes

Product ID  Hashtag    Barcode
A0251       Hashtag1   GTCAACTCTTTAGCG
A0252       Hashtag2   TGATGGCCTATTGGG
A0253       Hashtag3   TTCCGCCTCTCTTTG
A0254       Hashtag4   AGTAAGTTCAGCGTA
A0255       Hashtag5   AAGTATCGTTTCGCA
A0256       Hashtag6   GGTTGCCAGATGTCA
A0257       Hashtag7   TGTCTTTCCTGCCAG
A0258       Hashtag8   CTCCTCTGCAATTAC
A0259       Hashtag9   CAGTAGTCACGGTCA
A0260       Hashtag10  ATTGACCCGCGTTAG
A0262       Hashtag12  TAACGACCAGCCATA
A0263       Hashtag13  AAATCTCTCAGGCTC
A0264       Hashtag14  CTGTATGTCCGATTG
A0265       Hashtag15  TAAGATTCAGAGCGA
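When reads are matched to these barcodes, a mismatch tolerance of k is only unambiguous if every pair of barcodes in the pool differs at 2k + 1 or more positions. The sketch below computes that bound for any subset of the table; the helper names are hypothetical, and whether RunCellHash.py tolerates mismatches at all is not stated here.

```python
from itertools import combinations

# Hashtag barcodes copied from the table above.
HASHTAGS = {
    "Hashtag1": "GTCAACTCTTTAGCG",
    "Hashtag2": "TGATGGCCTATTGGG",
    "Hashtag3": "TTCCGCCTCTCTTTG",
    "Hashtag4": "AGTAAGTTCAGCGTA",
    "Hashtag5": "AAGTATCGTTTCGCA",
    "Hashtag6": "GGTTGCCAGATGTCA",
    "Hashtag7": "TGTCTTTCCTGCCAG",
    "Hashtag8": "CTCCTCTGCAATTAC",
    "Hashtag9": "CAGTAGTCACGGTCA",
    "Hashtag10": "ATTGACCCGCGTTAG",
    "Hashtag12": "TAACGACCAGCCATA",
    "Hashtag13": "AAATCTCTCAGGCTC",
    "Hashtag14": "CTGTATGTCCGATTG",
    "Hashtag15": "TAAGATTCAGAGCGA",
}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes) -> int:
    """Smallest Hamming distance between any two barcodes in the pool."""
    return min(hamming(a, b) for a, b in combinations(list(barcodes), 2))


def max_safe_mismatches(barcodes) -> int:
    """Largest k such that a read within k mismatches of one barcode is
    strictly closer to it than to any other: k = (d_min - 1) // 2."""
    return (min_pairwise_distance(barcodes) - 1) // 2
```

For example, `max_safe_mismatches([HASHTAGS[h] for h in ("Hashtag7", "Hashtag8", "Hashtag9")])` checks only the hashtags actually pooled in an experiment.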

The working folder should contain the following files:

├── barcodes.tsv.16bp
├── HashDic.csv
├── Hash_XX_L001_R1_001.fastq.gz
└── Hash_XX_L001_R2_001.fastq.gz

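Two input files must be prepared: barcodes.tsv.16bp and HashDic.csv.

barcodes.tsv.16bp lists the cell barcodes from the Cell Ranger filtered matrix, trimmed to their first 16 nt (this removes the "-1" suffix):

zcat ../CellRanger/xx/outs/filtered_feature_bc_matrix/barcodes.tsv.gz | awk '{print substr($1,1,16)}' > barcodes.tsv.16bp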
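For environments without `zcat`/`awk`, the same trimming can be sketched in Python. This is a hypothetical equivalent of the one-liner above, not part of REDEEM-V; it differs only in that blank lines are skipped instead of written out as empty lines.

```python
import gzip
from typing import Iterable, Iterator

CB_LEN = 16  # 10x cell barcode length


def trim_barcodes(lines: Iterable[str]) -> Iterator[str]:
    """Yield the first 16 nt of the first whitespace-separated field of each
    line, like awk '{print substr($1,1,16)}'; blank lines are skipped."""
    for line in lines:
        fields = line.split()
        if fields:
            yield fields[0][:CB_LEN]


def write_16bp_barcodes(barcodes_gz: str, out_path: str) -> int:
    """Write trimmed barcodes to out_path and return how many were written."""
    n = 0
    with gzip.open(barcodes_gz, "rt") as src, open(out_path, "w") as dst:
        for bc in trim_barcodes(src):
            dst.write(bc + "\n")
            n += 1
    return n
```

Usage: `write_16bp_barcodes("../CellRanger/xx/outs/filtered_feature_bc_matrix/barcodes.tsv.gz", "barcodes.tsv.16bp")`.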

Create HashDic.csv manually. It is a decoding table with no header: the first column is the hash barcode and the second column is the sample name. For example:

TGTCTTTCCTGCCAG,Sample-1
CTCCTCTGCAATTAC,Sample-2
CAGTAGTCACGGTCA,Sample-3
ATTGACCCGCGTTAG,Sample-4
TAACGACCAGCCATA,Sample-5
TAAGATTCAGAGCGA,Sample-6
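Because HashDic.csv is written by hand, a quick sanity check before running the pipeline can catch typos. The sketch below is a hypothetical checker (not part of REDEEM-V) that assumes the format shown above: two comma-separated columns, no header, 15 nt ACGT barcodes, no duplicates.

```python
import csv
from typing import Dict, Iterable

VALID_BASES = set("ACGT")


def parse_hash_dic(lines: Iterable[str], barcode_len: int = 15) -> Dict[str, str]:
    """Parse 'barcode,sample' rows into {barcode: sample}, rejecting
    malformed barcodes, wrong column counts and duplicate barcodes."""
    mapping: Dict[str, str] = {}
    for row_num, row in enumerate(csv.reader(lines), start=1):
        if not row or not row[0].strip():
            continue  # tolerate blank lines
        if len(row) != 2:
            raise ValueError(f"row {row_num}: expected 2 columns, got {len(row)}")
        barcode, sample = row[0].strip().upper(), row[1].strip()
        if len(barcode) != barcode_len or not set(barcode) <= VALID_BASES:
            raise ValueError(f"row {row_num}: {barcode!r} is not a {barcode_len} nt ACGT barcode")
        if barcode in mapping:
            raise ValueError(f"row {row_num}: duplicate barcode {barcode}")
        mapping[barcode] = sample
    return mapping


def load_hash_dic(path: str) -> Dict[str, str]:
    """Read and validate HashDic.csv from disk."""
    with open(path, newline="") as fh:
        return parse_hash_dic(fh)
```

Running `load_hash_dic("HashDic.csv")` in the working folder either returns the barcode-to-sample mapping or raises on the first bad row.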

Download REDEEM-V and set the REDEEMV variable to the location it was downloaded to:

git clone https://github.com/chenweng1991/REDEEM-V.git
REDEEMV=ThePathToREDEEM-V  # the location where REDEEM-V was downloaded

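Run the demultiplexing script. Its arguments are the REDEEM-V path, a name used as the prefix of the output files (Name in the example below, which produces pymulti/Name_calls.tsv), and the Read 1 and Read 2 FASTQ files: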

python3 $REDEEMV/JohnnyCellHash/RunCellHash.py $REDEEMV Name Hash_xxx_L001_R1_001.fastq.gz Hash_xxx_L001_R2_001.fastq.gz
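RunCellHash.py takes care of counting and calling on its own. Purely to illustrate the idea behind a per-cell hashtag UMI table such as umicount.csv, the sketch below counts unique UMIs per cell and sample from (cell barcode, UMI, hash barcode) tuples, restricted to the cells in barcodes.tsv.16bp and the barcodes in HashDic.csv. It uses exact matching only and is an assumption about the general approach, not the script's actual implementation; `count_hash_umis` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def count_hash_umis(
    read_pairs: Iterable[Tuple[str, str, str]],
    whitelist: Set[str],
    hash_dic: Dict[str, str],
) -> Dict[str, Dict[str, int]]:
    """Count unique UMIs per (cell, sample).

    read_pairs: (cell_barcode, umi, hash_barcode) tuples.
    whitelist:  16 nt cell barcodes, e.g. loaded from barcodes.tsv.16bp.
    hash_dic:   {hash_barcode: sample}, e.g. loaded from HashDic.csv.
    Only exact barcode matches are counted (no error correction).
    """
    umis: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for cell, umi, hash_bc in read_pairs:
        if cell not in whitelist:
            continue  # not a cell called by Cell Ranger
        sample = hash_dic.get(hash_bc)
        if sample is None:
            continue  # hash barcode not listed in HashDic.csv
        umis[cell][sample].add(umi)  # duplicate UMIs collapse in the set
    return {cell: {s: len(u) for s, u in per.items()} for cell, per in umis.items()}
```

Collapsing reads to unique UMIs avoids inflating counts for hashtag molecules that were sequenced many times because of PCR duplication.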

After the run, the folder will look like this:

├── barcodes.tsv.16bp
├── HashDic.csv
├── Hash_xxx_L001_R1_001.fastq.gz
├── Hash_xxx_L001_R2_001.fastq.gz
├── log
├── plot.pdf
└── pymulti
    ├── Name_calls.tsv
    ├── Name_reads.p
    ├── umicount.csv
    └── umicount_hist.pdf

Name_calls.tsv lists, for each cell, the assigned hash barcode and the probability of that assignment. Cells can be filtered by the significance column (sig). This table is used to demultiplex the cells and for downstream analysis.
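A minimal sketch of filtering Name_calls.tsv by significance and grouping the kept cells by their call. Apart from `sig`, the column names used here (`barcode`, `call`) are hypothetical, and the sketch assumes `sig` is a True/False flag; check the header of your own Name_calls.tsv and pass the real names as arguments.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List

TRUE_STRINGS = {"true", "1", "yes"}  # assumed spellings of a passing sig flag


def significant_calls(
    lines: Iterable[str],
    barcode_col: str = "barcode",  # hypothetical column name
    call_col: str = "call",        # hypothetical column name
    sig_col: str = "sig",
) -> Dict[str, str]:
    """Return {cell_barcode: call} for rows whose sig column reads as true."""
    reader = csv.DictReader(lines, delimiter="\t")
    missing = {barcode_col, call_col, sig_col} - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"columns not found: {sorted(missing)}")
    return {
        row[barcode_col]: row[call_col]
        for row in reader
        if row[sig_col].strip().lower() in TRUE_STRINGS
    }


def cells_per_call(calls: Dict[str, str]) -> Dict[str, List[str]]:
    """Group the kept cell barcodes by their call."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for cell, call in calls.items():
        groups[call].append(cell)
    return dict(groups)
```

Usage: `with open("pymulti/Name_calls.tsv", newline="") as fh: calls = significant_calls(fh)`, then `cells_per_call(calls)` gives the cell barcodes assigned to each hashtag or sample.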
