Cell Hashing Demultiplexing
If BioLegend cell hashing was applied, follow the steps below for demultiplexing.
Cell hashing library structure:
- Read 1 is the same as in a 10X scRNA-seq library: the first 16 nt are the cell barcode and the following 12 nt are the UMI.
- Read 2: the first 15 nt are the hash barcode.
The barcode sequences can be found on the BioLegend website (e.g., A0257).
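The read layout described above can be sketched in Python as plain string slicing. This is only an illustration of the stated positions (16 nt cell barcode and 12 nt UMI on Read 1, 15 nt hash barcode on Read 2); the names `HashRead` and `split_read_pair` are hypothetical and are not part of REDEEM-V, and the example sequences are made up.

```python
from typing import NamedTuple

CELL_BARCODE_LEN = 16  # Read 1, nt 1-16
UMI_LEN = 12           # Read 1, nt 17-28
HASH_BARCODE_LEN = 15  # Read 2, nt 1-15


class HashRead(NamedTuple):
    cell_barcode: str
    umi: str
    hash_barcode: str


def split_read_pair(read1_seq: str, read2_seq: str) -> HashRead:
    """Slice one R1/R2 sequence pair into cell barcode, UMI, and hash barcode."""
    if len(read1_seq) < CELL_BARCODE_LEN + UMI_LEN:
        raise ValueError("Read 1 is shorter than cell barcode + UMI (28 nt)")
    if len(read2_seq) < HASH_BARCODE_LEN:
        raise ValueError("Read 2 is shorter than the 15 nt hash barcode")
    return HashRead(
        cell_barcode=read1_seq[:CELL_BARCODE_LEN],
        umi=read1_seq[CELL_BARCODE_LEN:CELL_BARCODE_LEN + UMI_LEN],
        hash_barcode=read2_seq[:HASH_BARCODE_LEN],
    )


# Made-up read pair for illustration only; the Read 2 tail is arbitrary.
example = split_read_pair(
    "AAACCCAAGAAACACT" + "GGGTTTAAACCC",
    "TGTCTTTCCTGCCAG" + "AAAAAAAAAAAAAAA",
)
print(example)
```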
Some commonly used hash barcodes
Name1 | Name2 | Barcode |
---|---|---|
A0251 | Hashtag1 | GTCAACTCTTTAGCG |
A0252 | Hashtag2 | TGATGGCCTATTGGG |
A0253 | Hashtag3 | TTCCGCCTCTCTTTG |
A0254 | Hashtag4 | AGTAAGTTCAGCGTA |
A0255 | Hashtag5 | AAGTATCGTTTCGCA |
A0256 | Hashtag6 | GGTTGCCAGATGTCA |
A0257 | Hashtag7 | TGTCTTTCCTGCCAG |
A0258 | Hashtag8 | CTCCTCTGCAATTAC |
A0259 | Hashtag9 | CAGTAGTCACGGTCA |
A0260 | Hashtag10 | ATTGACCCGCGTTAG |
A0262 | Hashtag12 | TAACGACCAGCCATA |
A0263 | Hashtag13 | AAATCTCTCAGGCTC |
A0264 | Hashtag14 | CTGTATGTCCGATTG |
A0265 | Hashtag15 | TAAGATTCAGAGCGA |
The working folder should contain the following files:
├── barcodes.tsv.16bp
├── HashDic.csv
├── Hash_xxx_L001_R1_001.fastq.gz
└── Hash_xxx_L001_R2_001.fastq.gz
Prepare two files: barcodes.tsv.16bp and HashDic.csv
zcat ../CellRanger/xx/outs/filtered_feature_bc_matrix/barcodes.tsv.gz | awk '{print substr($1,1,16)}' > barcodes.tsv.16bp
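For readers who prefer Python, the truncation step can be sketched roughly as follows. It is meant as an approximate equivalent of the `zcat | awk` one-liner, not a replacement shipped with REDEEM-V; the function names are hypothetical and the sample barcodes are made up.

```python
import gzip
from typing import Iterable, Iterator


def truncate_barcodes(lines: Iterable[str], length: int = 16) -> Iterator[str]:
    """Yield the first `length` characters of the first field on each line."""
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines (awk would print an empty line here)
        yield fields[0][:length]


def write_16bp_barcodes(barcodes_gz: str, out_path: str) -> None:
    """Read a gzipped barcodes file and write the truncated barcodes."""
    with gzip.open(barcodes_gz, "rt") as src, open(out_path, "w") as dst:
        for barcode in truncate_barcodes(src):
            dst.write(barcode + "\n")


# Made-up barcodes carrying a "-1" suffix, which the truncation drops.
print(list(truncate_barcodes(["AAACCCAAGAAACACT-1\n", "AAACCCAAGAAACCAT-1\n"])))
```

Calling `write_16bp_barcodes` with the path to `barcodes.tsv.gz` and `"barcodes.tsv.16bp"` as the output would write the file used in the next steps.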
Manually create HashDic.csv, a decoding table in which the first column is the hash barcode and the second column is the sample name. For example:
TGTCTTTCCTGCCAG,Sample-1
CTCCTCTGCAATTAC,Sample-2
CAGTAGTCACGGTCA,Sample-3
ATTGACCCGCGTTAG,Sample-4
TAACGACCAGCCATA,Sample-5
TAAGATTCAGAGCGA,Sample-6
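Because HashDic.csv is written by hand, a quick sanity check before running the pipeline may save a failed run. The sketch below assumes the format shown above (no header, two comma-separated columns, 15 nt barcodes made of A, C, G, T); `parse_hash_dic` is a hypothetical helper, not part of REDEEM-V, and the script's actual requirements may be looser or stricter.

```python
import csv
import io
from typing import Dict

HASH_BARCODE_LEN = 15  # hash barcode length on Read 2


def parse_hash_dic(text: str) -> Dict[str, str]:
    """Parse barcode,sample lines and reject malformed or duplicated barcodes."""
    mapping: Dict[str, str] = {}
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # tolerate blank lines
        if len(row) != 2:
            raise ValueError(f"line {line_no}: expected 2 columns, got {len(row)}")
        barcode, sample = row[0].strip(), row[1].strip()
        if len(barcode) != HASH_BARCODE_LEN or set(barcode) - set("ACGT"):
            raise ValueError(f"line {line_no}: bad barcode {barcode!r}")
        if barcode in mapping:
            raise ValueError(f"line {line_no}: duplicated barcode {barcode}")
        mapping[barcode] = sample
    return mapping


# First two lines of the example table above.
hash_dic = parse_hash_dic(
    "TGTCTTTCCTGCCAG,Sample-1\n"
    "CTCCTCTGCAATTAC,Sample-2\n"
)
print(hash_dic)
```

To check a real file, pass its contents, for instance `parse_hash_dic(open("HashDic.csv").read())`.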
Download REDEEM-V
git clone https://github.com/chenweng1991/REDEEM-V.git
Assign the path
REDEEMV=ThePathToREDEEM-V # The location where REDEEM-V is downloaded to
Run the script below for demultiplexing. Name is the prefix used for the output files (e.g., Name_calls.tsv).
python3 $REDEEMV/JohnnyCellHash/RunCellHash.py $REDEEMV Name Hash_xxx_L001_R1_001.fastq.gz Hash_xxx_L001_R2_001.fastq.gz
After running, the folder will look like this:
├── barcodes.tsv.16bp
├── HashDic.csv
├── Hash_xxx_L001_R1_001.fastq.gz
├── Hash_xxx_L001_R2_001.fastq.gz
├── log
├── plot.pdf
└── pymulti
    ├── Name_calls.tsv
    ├── Name_reads.p
    ├── umicount.csv
    └── umicount_hist.pdf
Name_calls.tsv is a table giving, for each single cell, the probability of its assigned hash barcode. Calls can be filtered by significance (sig). This table is used for demultiplexing and for downstream analysis.
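Filtering by significance could look roughly like the sketch below. The exact layout of Name_calls.tsv is not documented here, so everything about the table is an assumption: that it is tab-separated with a header, that the significance column is literally named `sig`, and that it holds `True`/`False` values. The column names `cell` and `call` in the demo table are made up. Check the header of your own file and adjust `sig_column` and `is_significant` accordingly.

```python
import csv
import io
from typing import Callable, Dict, List


def significant_calls(
    tsv_text: str,
    sig_column: str = "sig",  # assumed column name
    is_significant: Callable[[str], bool] = lambda v: v.strip().lower() == "true",
) -> List[Dict[str, str]]:
    """Return the rows whose significance column passes `is_significant`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    if sig_column not in (reader.fieldnames or []):
        raise KeyError(f"column {sig_column!r} not found in {reader.fieldnames}")
    return [row for row in reader if is_significant(row[sig_column])]


# Made-up table; the real Name_calls.tsv columns may differ.
demo = (
    "cell\tcall\tsig\n"
    "AAACCCAAGAAACACT\tSample-1\tTrue\n"
    "AAACCCAAGAAACCAT\tSample-2\tFalse\n"
)
kept = significant_calls(demo)
print([row["cell"] for row in kept])
```

If `sig` turns out to hold a numeric value instead, a different predicate can be passed as `is_significant`, with the threshold chosen by the user.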