mDELTA: an algorithm for multifuricating Developmental cEll Lineage Tree Alignment.

MDELTA is an algorithm for Multifuricating Developmental cEll Lineage Tree Alignment. In essence, it compares two rooted, unordered, tip-labeled trees, and finds the best global/local correspondence between the nodes. The mDELTA program is designed for analyzing developmental cell lineage trees reconstructed through single-cell DNA barcoding (such as done by scGESTALT or SMALT, while greater cellular coverage is expected to yield more meaningful MDELTA alignments).
Except for dealing with cell lineage trees instead of biological sequences, MDELTA is conceptually similar to sequence alignment. It helps quantify similarity among different lineage trees, disentangle the consensus and variation, find recurrent motifs, and facilitate comparative/evolutionary analyses.
Also included in this repository are Python/R scripts for statistical analyses and visualization of MDELTA results, which facilitates their biological interpretation.
MDELTA was developed by Jingyu Chen (EeWhile) under the supervision of Professor Jian-Rong Yang at the Zhongshan School of Medicine of Sun Yat-Sen University in China.

For more details, please refer to the following content:

You can try to learn about the latest examples of dynamic MDELTAUI running results, which will help you better understand the purpose of mDERTA ⚙️ http://eewhile.cn/mdelta_ui

You can obtain a separate Python package for mDELTA for further development(Note that the version of this package is relatively old and many features have not been updated). https://github.com/Chenjy0212/mdelta

If you have any questions, please contact me. My contact information is located at the bottom

Before you begin ⚠️

It's best to use jupyter as your menu display tool, or you can install the corresponding plugin in vscode to use it.
You can install the required packages automatically by running mdelta_menu.ipynb, or you can manually run the package_manger.py to install the required packages.

Tips : The first download time may be relatively long. If any installation package fails, please go ahead and install a package that is suitable for your current running environment (Python and R)
You should first install stable versions of Python and R language to facilitate subsequent operations

Quick start 💻

Open the Munu 📖

Open jupyter and run mdelta_menu.ipynb, run the following code to automatically install the necessary Python packages and open the mdelta function menu.

%run package_manger.py
%run ./mdelta/mydefault.py
#from mdelta.mydefault import *
myargs = get_default()

Running 🚀

Select, click, and swipe to achieve automatic parameter acquisition. The relevant parameters are detailed in the "Parameter parsing table" section of the menu. Continuing to run the next section of code will obtain the desired result.

%run $mdelta $TREE $TREE2 -nt $N2T -nt2 $N2T2 -xsd $XScoreFile -lsd $LScoreFile -t $top -ma $ma -mi $mi -p $p -T $Tqdm -n $notebook -mg $mgg -x $diffs -o $output -P $PERM -c $cpu -pper $pper
!Rscript $match_tree $mdelta_json $XScoreFile $output $LScoreFile $mi $ma
%run $network $mdelta_json $output
!Rscript $densitree $mdelta_json $output $dcc
!Rscript $da $mdelta_json $output $dcc

show more 👈👈👈 CLICK


from itertools import product
TREE,TREE2,N2T,N2T2,XScoreFile,LScoreFile,top,mavv,mavvstep,mivv,mivvstep,ps,psstep,tqdm,n,mg,mgstep,xs,xsstep,o,PERM,cpu,mdelta,match_tree,network,densitree,da,dcc,pper = get_listvalue(myargs.values())
output = get_output(o)
notebook, Tqdm = TF_to_10(n, tqdm)
for ma, mi, p, mgg, diffs in product(forlist(mavv[0], mavv[1], mavvstep),
                                     forlist(mivv[0], mivv[1], mivvstep),
                                     forlist(ps[0], ps[1], psstep),
                                     forlist(mg[0], mg[1], mgstep),
                                     forlist(xs[0], xs[1], xsstep)):
    %run $mdelta $TREE $TREE2 -nt $N2T -nt2 $N2T2 -xsd $XScoreFile -lsd $LScoreFile -t $top -ma $ma -mi $mi -p $p -T $Tqdm -n $notebook -mg $mgg -x $diffs -o $output -P $PERM -c $cpu -pper $pper
    if not PERM > 0:
        mdelta_json = output + '{}_{}_top{}_diff{}_pv{}_miv{}_mav{}_mg{}.json'.format(os.path.basename(TREE).split('.')[0], os.path.basename(TREE2).split('.')[0], top, str(diffs),  str(p), str(mi), str(ma), str(mgg))
        !Rscript $match_tree $mdelta_json $XScoreFile $output $LScoreFile $mi $ma
        %run $network $mdelta_json $output
        !Rscript $densitree $mdelta_json $output $dcc
        !Rscript $da $mdelta_json $output $dcc

Help 🧭

Running the following command can obtain the parsing of relevant parameters

mDELTA.py -h

show more 👈👈👈 CLICK

Positional arguments 👍

Parameter	Type	Description
TreeSeqFile	[path/filename]	A text file storing cell lineage tree #1 in newick format. Tips can be labeled by name or cell type. Branch lengths should be removed.
TreeSeqFile2	[path/filename]	A text file storing cell lineage tree #2 in newick format. Tips can be labeled by name or cell type. Branch lengths should be removed.

Optional arguments 👌

Parameter	Type	Description
-h, --help		show this help message and exit
-nt NAME2TYPEFILE, --Name2TypeFile NAME2TYPEFILE	[path/filename]	List of correspondance between tip name and cell type for cell lineage tree #1.
-nt2 NAME2TYPEFILE, --Name2TypeFile2 NAME2TYPEFILE	[path/filename]	List of correspondance between tip name and cell type for cell lineage tree #2.
-xsd XSCOREDICTFILE, --ScoreDictFile SCOREDICTFILE	[path/filename]	A comma-delimited text file used to determine similarity scores between cells. If there are exactly three columns, they will be interpreted as (1) the cell (name or type) in Tree #1, (2) the cell in Tree #2, and (3) the similarity score. If otherwise, the first column will be interpreted as the cell (name or type) and the remaining columns as features of the cell (e.g. expression of a gene). The similarity scores will be estimated between all pairs of cells based on the Euclidean distance calculated using all the features. Overrides `-ma` and `-mi`.
-lsd LSCOREDICTFILE1, --ScoreDictFile1 SCOREDICTFILE1	[path/filename]	Calculate the Euclidean distance based on the characteristics of the cell (such as gene expression) to obtain the result x, and obtain the corresponding score through a series of operations such as - ln (x+1).
-t TOP, --top TOP	[int > 0]	Performs local (instead of global) alignment, and output the top NUM local alignments with the highest score (e.g. `-t 10`). In the case of global alignment, this parameter should be omitted.
-ma MAV, --mav MAV	[float]	Default=2. Score of two paired end nodes with the same name or type.
-mi MIV, --miv MIV	[float]	Default=-1. Shorthand for a simple matching score scheme, where the matching score between a pair of the same cell types is MAV and all other pairs are MIV. (e.g. `-ma 2 -mi -2`). Overridden by `-sd`.
-p PV, --pv PV	[float]	Default=-1. The score for pruning a tip of the tree (e.g. `-p -2`). Default to -1.
-pper PPER, --prunepercent PRUNEPERCENT	[float]	Default=20. The comparison result requires either party's pruning rate to be less than or equal to PPER. The pruning rate is the proportion of pruned leaf nodes to all leaf nodes in the subtree.
-T TQDM, --Tqdm TQDM	[0(off) or 1(on)]	Whether to display the running Progress bar
-n NOTEBOOK, --notebook NOTEBOOK	[0(off) or 1(on)]	Toggle for the jupyter notebook environment.
-P PERM, --PERM PERM	[int > 0]	Toggle for the statistical significance. For each observed alignment, the aligned trees will be permuted PERM times to generate a null distribution of alignment scores, with which a P value can be calculated for the observed alignment score.
-c CPUS, --CPUs CPUS	[int > 0]	Number of threads for multi-processing. Default to 50., it can reach the maximum number of local CPU cores - 1.
-o OUTPUT, --output OUTPUT	[path]	Output path, eg:'/home/username'
-mg MERGE, --merge MERGE	[float]	This is the scaling factor for calculating the score of merging an internal node (e.g. -mg -1), which is multiplied by the number of tips of the internal node to be merged. Default to 0.
-x DIFF, --diff DIFF	[int > 0]	Alignment must consist of a minimal of DIFF percent aligned cell pairs that are different from previous(better) local alignments in order to be considered as another new alignment (e.g. `-x 20` means 20 percent).

Input & Output example 🔔

input

ExampleFile 📂

tree1.nwk # A tree2.nwk # B

// Qualitative calculation

Name2Type.csv # C Xscorefile.csv

// Quantitative Calculation

Lscorefile.csv

output

Output_example 📂

network

Circle.pdf # A Cluster.pdf # B Line.pdf # C

Match_tree

topN.pdf // N represents the sequence number of the nth

DensitreeALL

_densitrees_info.csv
All density tree results are represented as ：

MatchTimes_TreeX_Label_densitree_leaves_match.csv

The nwk subtree sequence file of TreeX and its root node have been converted to a cell type nwk subtree sequence file '''

DensitreeBEST

MatchTimes_TreeX_Label_densitree_leaves_match.csv

The nwk subtree sequence file of TreeX and its root node have been converted to a cell type nwk subtree sequence file

Algorithm ⌨️

Using the idea of dynamic programming, update the score matrix. Except for the scores of the end nodes derived from default values or user input socrefile, the scores of all other cells are calculated using the weighted bipartite graph most matching algorithm (KM) to obtain the optimal value

Background 🌙

Data preprocessing 🤖

In order to speed up our computing speed, we can prepare the socrefile in advance, and use the data preprocessing program mdelta_prepro to calculate our biological data in advance, which can be reused or regenerated in the future. Greatly improve efficiency.

Procedure 🎬

Authors 🍊🐳❤️🍑

陈景宇 EeWhile 🤵‍♂️

Student of SYSU. 🏫

ZHONGSHAN SCHOOL OF MEDICINE,SYSU https://zssom.sysu.edu.cn/zh-hans 💖

Undergraduate majoring in computer science, master majoring in bioinformatics. 👨‍💻

Below is the contact information of the author. 👀

If you use this project in your research, please cite this project.

@misc{mdelta2022,
    author = {Jingyu Chen},
    title = {mDELTA: Multifuricating Developmental cEll Lineage Tree Alignment},
    year = {2023},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/Chenjy0212/mdelta_full}},
}

Name		Name	Last commit message	Last commit date
Latest commit History 49 Commits
.ipynb_checkpoints		.ipynb_checkpoints
.vscode		.vscode
ExampleFile		ExampleFile
Output_ex		Output_ex
feature		feature
image		image
mdelta		mdelta
packages_required		packages_required
result		result
.DS_Store		.DS_Store
README.md		README.md
mdelta_menu.ipynb		mdelta_menu.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

mDELTA: an algorithm for multifuricating Developmental cEll Lineage Tree Alignment.

Before you begin ⚠️

Quick start 💻

Open the Munu 📖

Running 🚀

Help 🧭

Positional arguments 👍

Optional arguments 👌

Input & Output example 🔔

input

output

Algorithm ⌨️

Background 🌙

Data preprocessing 🤖

Procedure 🎬

Authors 🍊🐳❤️🍑

About

Releases 1

Packages

Languages

Chenjy0212/mdelta_full

Folders and files

Latest commit

History

Repository files navigation

mDELTA: an algorithm for multifuricating Developmental cEll Lineage Tree Alignment.

Before you begin ⚠️

Quick start 💻

Open the Munu 📖

Running 🚀

Help 🧭

Positional arguments 👍

Optional arguments 👌

Input & Output example 🔔

input

output

Algorithm ⌨️

Background 🌙

Data preprocessing 🤖

Procedure 🎬

Authors 🍊🐳❤️🍑

About

Resources

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages