X-TF-GridNet: A Time-Frequency Domain Target Speaker Extraction Network with Adaptive Speaker Embedding Fusion

This repository contains the implementation of X-TF-GridNet, a target speaker extraction (TSE) network that operates in the time-frequency (T-F) domain. The paper has been accepted by Information Fusion. The proposed method makes two key extensions: a U²-Net style network extracts robust fixed speaker embeddings, and an adaptive embedding fusion (AEF) mechanism makes effective use of the target speaker information.

This project builds primarily on the original implementation of SpEx+ and the implementation of TF-GridNet. Note that it covers only the training and inference phases. For details on data preparation, please refer to those upstream implementations.

Running Experiments

# Train the X-TF-GridNet model.
bash train.sh
# Decode the X-TF-GridNet model.
bash decode.sh
# Output score metrics.
bash evaluate.sh

Results

We report PESQ, SDR, and SI-SDR on the WSJ0-2mix dataset for comparison with other TSE methods, most of which operate in the time domain (T).

Method	Domain	Param. (M)	MACs (G/s)	PESQ $\uparrow$	SDR (dB) $\uparrow$	SI-SDR (dB) $\uparrow$
Mixture	-	-	-	-	2.02	0.2
SpEx	T	10.79	3.55	-	16.3	15.8
SpEx+	T	11.14	3.76	3.43	17.2	16.9
X-DPRNN	T	6.32	63.92	-	-	17.4
SpEx++	T	34.08	11.88	3.53	18.4	18.0
SpEx_pc	T	28.40	40.54	-	18.8	18.6
VEVEN	T	2.63	85.11	3.66	19.2	19.0
X-SepFormer	T	26.66	61.34	3.74	19.5	18.9
X-TF-GridNet	T-F	7.79	68.32	3.70	20.4	19.7
X-TF-GridNet (Large)	T-F	12.68	113.24	3.77	21.7	20.7

(* More details can be found in the paper.)
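
For readers unfamiliar with the SI-SDR column, the following is a minimal sketch of the standard scale-invariant SDR definition in plain Python. It is an illustration only: the function name `si_sdr` and the `eps` parameter are assumptions made here, and the evaluation script in this repository may compute the metric differently.

```python
import math


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two equal-length 1-D signals.

    Hypothetical helper for illustration; not taken from this repository.
    """
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")

    # Remove the mean so a DC offset does not affect the score.
    est_mean = sum(estimate) / len(estimate)
    ref_mean = sum(reference) / len(reference)
    est = [e - est_mean for e in estimate]
    ref = [r - ref_mean for r in reference]

    # Project the estimate onto the reference to obtain the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    scale = dot / (ref_energy + eps)
    target = [scale * r for r in ref]

    # Everything in the estimate that is not explained by the target is noise.
    noise = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


if __name__ == "__main__":
    # Synthetic example: a sinusoid plus a weaker interfering sinusoid.
    reference = [math.sin(0.1 * n) for n in range(1000)]
    estimate = [r + 0.1 * math.cos(0.37 * n) for n, r in enumerate(reference)]
    print(f"SI-SDR: {si_sdr(estimate, reference):.2f} dB")
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate by a constant gain leaves the score unchanged, which is what distinguishes SI-SDR from plain SDR.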

Citation

If you use our code in your research or wish to refer to the baseline results, please use the following BibTeX entry.

@article{hao2024if,
    title = {{X-TF-GridNet}: A time–frequency domain target speaker extraction network with adaptive speaker embedding fusion},
    journal = {Information Fusion},
    volume = {112},
    pages = {102550},
    year = {2024},
    issn = {1566-2535},
    doi = {10.1016/j.inffus.2024.102550},
    url = {https://www.sciencedirect.com/science/article/pii/S1566253524003282},
    author = {Fengyuan Hao and Xiaodong Li and Chengshi Zheng},
}
