
train loss of custom data #133

Open · Wangzhen-kris opened this issue Jun 12, 2023 · 2 comments

@Wangzhen-kris

Hi,

I tried to train on my dataset, but I seem to have an abnormal loss curve. Do you have any suggestions?
Thanks.

The loss of AR: https://drive.google.com/file/d/1-gZJX-mwYZ-2vkKTl0dTwBcp1A8MHrmV/view?usp=drive_link
The loss of NAR: https://drive.google.com/file/d/1-9L_AQZyyAgDRqKPpx06w6M99ZPSUIhe/view?usp=drive_link

@RuntimeRacer (Contributor) commented Jun 21, 2023

Hi @Wangzhen-kris, what kind of data does your dataset consist of? Does it by any chance contain very diverse speakers, or even multiple languages? Also, is it organized into separate cut sets that were combined for training?

While trying to train on Mozilla Common Voice I ran into similar graphs. I found out that using the Lhotse dynamic samplers leads to a static CutSet order, which means language C always gets trained after B, which in turn is trained after A.
This also leads to the model biasing heavily towards the CutSet it was trained on last. For example, all my inference tests at the end of one epoch had a French dialect.
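
To illustrate what I mean (a minimal sketch of my own, not code from this repo; the manifest names are hypothetical): concatenating per-language CutSets preserves file order, so a sampler that streams the combined manifest sees the languages in fixed blocks:

```python
# Hypothetical manifest names; combining per-language CutSets
# concatenates them, so the combined manifest keeps block order.
from lhotse import CutSet, combine

cuts_en = CutSet.from_file("cuts_en.jsonl.gz")
cuts_de = CutSet.from_file("cuts_de.jsonl.gz")
cuts_fr = CutSet.from_file("cuts_fr.jsonl.gz")

# The result iterates en -> de -> fr. A dynamic sampler's shuffle buffer
# only reorders cuts locally, so the language blocks survive largely
# intact and the last block dominates the end of every epoch.
combined = combine(cuts_en, cuts_de, cuts_fr)
```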

I worked out a solution for this by randomizing the CutSet contents before training. It is quite memory-intensive on a large dataset (~60 GB needed for almost the complete Common Voice 13) and also quite slow, since it's a single-threaded process; it takes about 10 minutes on my AI server.
I still want to improve this a bit, for example by having it reshuffle after each epoch (currently it runs once at training start, and only if there is no randomized file already). But you could have a look at my branch; maybe it's helpful for you:

main...RuntimeRacer:vall-e:cuts_randomizer
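
For reference, here is a minimal sketch of the idea (my own simplification, not the code in that branch; the file names are placeholders). It loads the combined manifest eagerly, shuffles it globally once, and writes it back out before training, which is exactly what makes it memory-hungry on a large corpus:

```python
import random

from lhotse import CutSet

SHUFFLED_PATH = "cuts_train_shuffled.jsonl.gz"  # placeholder output path

# to_eager() pulls the whole manifest into RAM, which is what costs
# tens of GB on a Common Voice-sized corpus.
cuts = CutSet.from_file("cuts_train.jsonl.gz").to_eager()

# shuffle() does a global, in-memory shuffle; a fixed seed makes the
# resulting order reproducible across runs.
cuts = cuts.shuffle(rng=random.Random(42))
cuts.to_file(SHUFFLED_PATH)

# Point the trainer at SHUFFLED_PATH instead of the original manifest.
```

Re-running this with a different seed before each epoch would approximate the per-epoch reshuffle mentioned above.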

I also attached a screenshot showing how this stabilized my training; the arrows point to where this was applied, after 2 epochs without this pre-processing:

[screenshot: training loss curves stabilizing after CutSet randomization was applied]

@lifeiteng pinned this issue Sep 14, 2023
@MajoRoth

I'm facing the same issue and trying to debug. What causes the Lhotse dynamic samplers to load in order?
I'm shuffling the files in the tokenization step and using shuffle=True,
but I'm still getting weird loss graphs that indicate something is wrong:

[screenshots: two loss graphs showing the recurring pattern]

This pattern occurs every epoch...
Any clues?
