Skip to content

Multifuricating Developmental cEll Lineage Tree Alignment (mDELTA) algorithm

Notifications You must be signed in to change notification settings

Chenjy0212/mdelta_full

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

49 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

mDELTA: an algorithm for multifuricating Developmental cEll Lineage Tree Alignment.

PyPI - Python Version Jupyter r

  • MDELTA is an algorithm for Multifuricating Developmental cEll Lineage Tree Alignment. In essence, it compares two rooted, unordered, tip-labeled trees, and finds the best global/local correspondence between the nodes. The mDELTA program is designed for analyzing developmental cell lineage trees reconstructed through single-cell DNA barcoding (such as done by scGESTALT or SMALT, while greater cellular coverage is expected to yield more meaningful MDELTA alignments).

  • Except for dealing with cell lineage trees instead of biological sequences, MDELTA is conceptually similar to sequence alignment. It helps quantify similarity among different lineage trees, disentangle the consensus and variation, find recurrent motifs, and facilitate comparative/evolutionary analyses.

  • Also included in this repository are Python/R scripts for statistical analyses and visualization of MDELTA results, which facilitates their biological interpretation.

  • MDELTA was developed by Jingyu Chen (EeWhile) under the supervision of Professor Jian-Rong Yang at the Zhongshan School of Medicine of Sun Yat-Sen University in China.

For more details, please refer to the following content:

  • You can try to learn about the latest examples of dynamic MDELTAUI running results, which will help you better understand the purpose of mDERTA โš™๏ธ http://eewhile.cn/mdelta_ui

mdelta_ui

  • You can obtain a separate Python package for mDELTA for further development(Note that the version of this package is relatively old and many features have not been updated). https://github.com/Chenjy0212/mdelta

If you have any questions, please contact me. My contact information is located at the bottom

Readme Card Readme Card Readme Card

Before you begin โš ๏ธ

  1. It's best to use jupyter as your menu display tool, or you can install the corresponding plugin in vscode to use it.
  2. You can install the required packages automatically by running mdelta_menu.ipynb, or you can manually run the package_manger.py to install the required packages.

    Tips : The first download time may be relatively long. If any installation package fails, please go ahead and install a package that is suitable for your current running environment (Python and R)

  3. You should first install stable versions of Python and R language to facilitate subsequent operations

Quick start ๐Ÿ’ป

Open the Munu ๐Ÿ“–

Open jupyter and run mdelta_menu.ipynb, run the following code to automatically install the necessary Python packages and open the mdelta function menu.

%run package_manger.py
%run ./mdelta/mydefault.py
#from mdelta.mydefault import *
myargs = get_default()

่œๅ•

Running ๐Ÿš€

Select, click, and swipe to achieve automatic parameter acquisition. The relevant parameters are detailed in the "Parameter parsing table" section of the menu. Continuing to run the next section of code will obtain the desired result.

%run $mdelta $TREE $TREE2 -nt $N2T -nt2 $N2T2 -xsd $XScoreFile -lsd $LScoreFile -t $top -ma $ma -mi $mi -p $p -T $Tqdm -n $notebook -mg $mgg -x $diffs -o $output -P $PERM -c $cpu -pper $pper
!Rscript $match_tree $mdelta_json $XScoreFile $output $LScoreFile $mi $ma
%run $network $mdelta_json $output
!Rscript $densitree $mdelta_json $output $dcc
!Rscript $da $mdelta_json $output $dcc
show more ๐Ÿ‘ˆ๐Ÿ‘ˆ๐Ÿ‘ˆ CLICK

from itertools import product
TREE,TREE2,N2T,N2T2,XScoreFile,LScoreFile,top,mavv,mavvstep,mivv,mivvstep,ps,psstep,tqdm,n,mg,mgstep,xs,xsstep,o,PERM,cpu,mdelta,match_tree,network,densitree,da,dcc,pper = get_listvalue(myargs.values())
output = get_output(o)
notebook, Tqdm = TF_to_10(n, tqdm)
for ma, mi, p, mgg, diffs in product(forlist(mavv[0], mavv[1], mavvstep),
                                     forlist(mivv[0], mivv[1], mivvstep),
                                     forlist(ps[0], ps[1], psstep),
                                     forlist(mg[0], mg[1], mgstep),
                                     forlist(xs[0], xs[1], xsstep)):
    %run $mdelta $TREE $TREE2 -nt $N2T -nt2 $N2T2 -xsd $XScoreFile -lsd $LScoreFile -t $top -ma $ma -mi $mi -p $p -T $Tqdm -n $notebook -mg $mgg -x $diffs -o $output -P $PERM -c $cpu -pper $pper
    if not PERM > 0:
        mdelta_json = output + '{}_{}_top{}_diff{}_pv{}_miv{}_mav{}_mg{}.json'.format(os.path.basename(TREE).split('.')[0], os.path.basename(TREE2).split('.')[0], top, str(diffs),  str(p), str(mi), str(ma), str(mgg))
        !Rscript $match_tree $mdelta_json $XScoreFile $output $LScoreFile $mi $ma
        %run $network $mdelta_json $output
        !Rscript $densitree $mdelta_json $output $dcc
        !Rscript $da $mdelta_json $output $dcc

Help ๐Ÿงญ

Running the following command can obtain the parsing of relevant parameters

mDELTA.py -h
show more ๐Ÿ‘ˆ๐Ÿ‘ˆ๐Ÿ‘ˆ CLICK

Positional arguments ๐Ÿ‘

Parameter Type Description
TreeSeqFile [path/filename] A text file storing cell lineage tree #1 in newick format. Tips can be labeled by name or cell type. Branch lengths should be removed.
TreeSeqFile2 [path/filename] A text file storing cell lineage tree #2 in newick format. Tips can be labeled by name or cell type. Branch lengths should be removed.

Optional arguments ๐Ÿ‘Œ

Parameter Type Description
-h, --help show this help message and exit
-nt NAME2TYPEFILE,
--Name2TypeFile NAME2TYPEFILE
[path/filename] List of correspondance between tip name and cell type for cell lineage tree #1.
-nt2 NAME2TYPEFILE,
--Name2TypeFile2 NAME2TYPEFILE
[path/filename] List of correspondance between tip name and cell type for cell lineage tree #2.
-xsd XSCOREDICTFILE,
--ScoreDictFile SCOREDICTFILE
[path/filename] A comma-delimited text file used to determine similarity scores between cells. If there are exactly three columns, they will be interpreted as (1) the cell (name or type) in Tree #1, (2) the cell in Tree #2, and (3) the similarity score. If otherwise, the first column will be interpreted as the cell (name or type) and the remaining columns as features of the cell (e.g. expression of a gene). The similarity scores will be estimated between all pairs of cells based on the Euclidean distance calculated using all the features. Overrides `-ma` and `-mi`.
-lsd LSCOREDICTFILE1,
--ScoreDictFile1 SCOREDICTFILE1
[path/filename] Calculate the Euclidean distance based on the characteristics of the cell (such as gene expression) to obtain the result x, and obtain the corresponding score through a series of operations such as - ln (x+1).
-t TOP, --top TOP [int > 0] Performs local (instead of global) alignment, and output the top NUM local alignments with the highest score (e.g. `-t 10`). In the case of global alignment, this parameter should be omitted.
-ma MAV, --mav MAV [float] Default=2. Score of two paired end nodes with the same name or type.
-mi MIV, --miv MIV [float] Default=-1. Shorthand for a simple matching score scheme, where the matching score between a pair of the same cell types is MAV and all other pairs are MIV. (e.g. `-ma 2 -mi -2`). Overridden by `-sd`.
-p PV, --pv PV [float] Default=-1. The score for pruning a tip of the tree (e.g. `-p -2`). Default to -1.
-pper PPER, --prunepercent PRUNEPERCENT [float] Default=20. The comparison result requires either party's pruning rate to be less than or equal to PPER. The pruning rate is the proportion of pruned leaf nodes to all leaf nodes in the subtree.
-T TQDM, --Tqdm TQDM [0(off) or 1(on)] Whether to display the running Progress bar
-n NOTEBOOK,
--notebook NOTEBOOK
[0(off) or 1(on)] Toggle for the jupyter notebook environment.
-P PERM, --PERM PERM [int > 0] Toggle for the statistical significance. For each observed alignment, the aligned trees will be permuted PERM times to generate a null distribution of alignment scores, with which a P value can be calculated for the observed alignment score.
-c CPUS, --CPUs CPUS [int > 0] Number of threads for multi-processing. Default to 50., it can reach the maximum number of local CPU cores - 1.
-o OUTPUT,
--output OUTPUT
[path] Output path, eg:'/home/username'
-mg MERGE,
--merge MERGE
[float] This is the scaling factor for calculating the score of merging an internal node (e.g. -mg -1), which is multiplied by the number of tips of the internal node to be merged. Default to 0.
-x DIFF,
--diff DIFF
[int > 0] Alignment must consist of a minimal of DIFF percent aligned cell pairs that are different from previous(better) local alignments in order to be considered as another new alignment (e.g. `-x 20` means 20 percent).

Input & Output example ๐Ÿ””

input

ExampleFile ๐Ÿ“‚

tree1.nwk # A tree2.nwk # B

// Qualitative calculation

Name2Type.csv # C Xscorefile.csv

// Quantitative Calculation

Lscorefile.csv

network

output

Output_example ๐Ÿ“‚

network

Circle.pdf # A Cluster.pdf # B Line.pdf # C

network

Match_tree

topN.pdf // N represents the sequence number of the nth

network

DensitreeALL

_densitrees_info.csv
All density tree results are represented as ๏ผš

  1. MatchTimes_TreeX_Label_densitree_leaves_match.csv
  2. The nwk subtree sequence file of TreeX and its root node have been converted to a cell type nwk subtree sequence file '''

DensitreeBEST

  1. MatchTimes_TreeX_Label_densitree_leaves_match.csv
  2. The nwk subtree sequence file of TreeX and its root node have been converted to a cell type nwk subtree sequence file

network

Algorithm โŒจ๏ธ

Using the idea of dynamic programming, update the score matrix. Except for the scores of the end nodes derived from default values or user input socrefile, the scores of all other cells are calculated using the weighted bipartite graph most matching algorithm (KM) to obtain the optimal value

็ฎ—ๆณ•

Background ๐ŸŒ™

bg

Data preprocessing ๐Ÿค–

In order to speed up our computing speed, we can prepare the socrefile in advance, and use the data preprocessing program mdelta_prepro to calculate our biological data in advance, which can be reused or regenerated in the future. Greatly improve efficiency.

pre2

Readme Card

Procedure ๐ŸŽฌ

procedure

Authors ๐ŸŠ๐Ÿณโค๏ธ๐Ÿ‘

    ้™ˆๆ™ฏๅฎ‡ EeWhile ๐Ÿคตโ€โ™‚๏ธ

    Student of SYSU. ๐Ÿซ

    ZHONGSHAN SCHOOL OF MEDICINE,SYSU https://zssom.sysu.edu.cn/zh-hans ๐Ÿ’–

    Undergraduate majoring in computer science, master majoring in bioinformatics. ๐Ÿ‘จโ€๐Ÿ’ป

    Below is the contact information of the author. ๐Ÿ‘€

    Github Stars Bilibili Zhihu Weibo neteasy-mysic douyin instagram
    QQ wechat mail gmail
    sysu

    sysulogo

If you use this project in your research, please cite this project.

@misc{mdelta2022,
    author = {Jingyu Chen},
    title = {mDELTA: Multifuricating Developmental cEll Lineage Tree Alignment},
    year = {2023},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/Chenjy0212/mdelta_full}},
}

About

Multifuricating Developmental cEll Lineage Tree Alignment (mDELTA) algorithm

Resources

Stars

Watchers

Forks

Packages

No packages published