X-TF-GridNet: A Time-Frequency Domain Target Speaker Extraction Network with Adaptive Speaker Embedding Fusion

This repository contains the implementation of X-TF-GridNet, a target speaker extraction (TSE) network in the time-frequency (T-F) domain, described in a paper accepted by Information Fusion. The proposed method has two key extensions: a U2-Net style network extracts robust fixed speaker embeddings, and an adaptive embedding fusion (AEA) mechanism ensures effective use of the target speaker information.

This project is based primarily on the original implementation of SpEx+ and the implementation of TF-GridNet. Note that it covers only the training and inference phases. For details on data preparation, please refer to the implementations it is based on.

Running Experiments

# Train the X-TF-GridNet model.
bash train.sh
# Decode the X-TF-GridNet model.
bash decode.sh
# Output score metrics.
bash evalute.sh

Results

We report PESQ, SDR, and SI-SDR results on the WSJ0-2mix dataset for comparison with other time-domain TSE methods.

| Method | Domain | Param. (M) | MACs (G/s) | PESQ $\uparrow$ | SDR (dB) $\uparrow$ | SI-SDR (dB) $\uparrow$ |
| --- | --- | --- | --- | --- | --- | --- |
| Mixture | - | - | - | - | 2.02 | 0.2 |
| SpEx | T | 10.79 | 3.55 | - | 16.3 | 15.8 |
| SpEx+ | T | 11.14 | 3.76 | 3.43 | 17.2 | 16.9 |
| X-DPRNN | T | 6.32 | 63.92 | - | - | 17.4 |
| SpEx++ | T | 34.08 | 11.88 | 3.53 | 18.4 | 18.0 |
| SpExpc | T | 28.40 | 40.54 | - | 18.8 | 18.6 |
| VEVEN | T | 2.63 | 85.11 | 3.66 | 19.2 | 19.0 |
| X-SepFormer | T | 26.66 | 61.34 | 3.74 | 19.5 | 18.9 |
| X-TF-GridNet | T-F | 7.79 | 68.32 | 3.70 | 20.4 | 19.7 |
| X-TF-GridNet (Large) | T-F | 12.68 | 113.24 | 3.77 | 21.7 | 20.7 |

(* More details can be found in the paper.)
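
For readers unfamiliar with the SI-SDR column, the following is a minimal NumPy sketch of how scale-invariant SDR is commonly computed for a single utterance. It is not the evaluation code of this repository: the function name `si_sdr`, the `eps` stabilizer, and the mean removal step are assumptions chosen for illustration, and the scores in the table above may have been produced with a different implementation.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB for two 1-D signals of equal length.

    Hypothetical helper for illustration; not taken from this repository.
    """
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)

    # Assumption: remove the mean so a DC offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()

    # Project the estimate onto the reference to obtain the scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference

    # Everything in the estimate that is not explained by the target is error.
    error = estimate - target

    ratio = (np.dot(target, target) + eps) / (np.dot(error, error) + eps)
    return 10.0 * np.log10(ratio)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)
    noisy = clean + 0.1 * rng.standard_normal(16000)
    print(f"SI-SDR: {si_sdr(noisy, clean):.2f} dB")
```

Because the target is obtained by projection, rescaling the estimate leaves the score unchanged, which is the property that distinguishes SI-SDR from plain SDR.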

Citation

If you use our code in your research or wish to refer to the baseline results, please use the following BibTeX entry.

@article{hao2024if,
    title = {{X-TF-GridNet}: A time–frequency domain target speaker extraction network with adaptive speaker embedding fusion},
    journal = {Information Fusion},
    volume = {112},
    pages = {102550},
    year = {2024},
    issn = {1566-2535},
    doi = {https://doi.org/10.1016/j.inffus.2024.102550},
    url = {https://www.sciencedirect.com/science/article/pii/S1566253524003282},
    author = {Fengyuan Hao and Xiaodong Li and Chengshi Zheng},
}