[scripts] filter segment duration in vad_to_segments.sh #2447
@@ -58,7 +58,9 @@ if [ $stage -le 0 ]; then

 for n in `seq $nj`; do
 cat $sdata/$n/subsegments
-done | sort > $data/subsegments || exit 1;
+done | sort | \
+awk '{if (! (NF != 4 || $4 - $3 <= 0.25)) { print $0 }}' \
Sure, but I think it would make more sense to make the minimum length configurable as an option to the script.
You can pass it into awk using e.g. -v m=$min_duration
Also, I'd prefer that you write that as an && expression instead of a ! ||.
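A minimal sketch of the reviewer's suggestion, assuming the usual Kaldi subsegments line format (new utterance id, parent utterance id, start time, end time, in seconds). By De Morgan's law, ! (NF != 4 || $4 - $3 <= 0.25) is equivalent to NF == 4 && $4 - $3 > 0.25, and the threshold can be passed in with awk -v. The helper name filter_subsegments and the 0.25 default are illustrative only, not taken from the merged script.

```bash
#!/usr/bin/env bash
# Hypothetical default; 0.25 s is the value hard-coded in the first version of the PR.
min_duration=0.25

# Reads subsegments lines on stdin ("<new-utt> <old-utt> <start> <end>") and
# prints only well-formed lines (exactly 4 fields) whose duration (end - start)
# is strictly greater than $min_duration.
# Same logic as the original "! (NF != 4 || $4 - $3 <= 0.25)", rewritten with &&.
filter_subsegments() {
  awk -v m="$min_duration" '{ if (NF == 4 && $4 - $3 > m) print $0 }'
}
```

In the script, a filter like this would sit between sort and the redirect to $data/subsegments.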
Sorry, the conditional was indeed bad. I fixed it and made the duration an option.
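For context, Kaldi scripts usually expose a setting like this by declaring a shell variable with a default before sourcing utils/parse_options.sh, which maps --min-duration onto the variable min_duration. The self-contained loop below only imitates that behaviour so the example runs on its own; parse_args is a hypothetical helper, not part of Kaldi.

```bash
#!/usr/bin/env bash
# Default used when --min-duration is not given (hypothetical; mirrors the PR's 0.25 s).
min_duration=0.25

# Hypothetical stand-in for utils/parse_options.sh: consumes "--min-duration <m>"
# and skips every other argument. Returns 1 if the option is given without a value.
parse_args() {
  while [ $# -gt 0 ]; do
    case "$1" in
      --min-duration)
        if [ $# -lt 2 ]; then
          echo "parse_args: --min-duration needs a value" >&2
          return 1
        fi
        min_duration="$2"
        shift 2
        ;;
      *)
        shift
        ;;
    esac
  done
}
```

Calling parse_args "$@" near the top of a script would set min_duration, which can then be handed to awk with -v m="$min_duration".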
@@ -26,6 +27,7 @@ if [ $# -ne 2 ]; then
 echo " --stage (0|1) # start script from part-way through"
 echo " --cmd (run.pl|queue.pl...) # specify how to run the sub-processes"
 echo " --segmentation-opts '--opt1 opt1val --opt2 opt2val' # options for segmentation.pl"
+echo " --min-duration <m> # filtering out any generated subsegment with lower duration"
I'd change the comment to:
# min duration in seconds for segments (smaller ones are discarded)
I just realized that (I think) Vimal's VAD code already has configurable minimum durations in its algorithms.
This happened when using the 'basic' energy-based VAD.
Oh, OK. I think Vimal's tools output segments directly, so this script wouldn't be involved.
I just came upon the following issue: the subsegments created by vad_to_segments.sh included a few very short ones (e.g. 20 ms), which caused other issues further down the pipeline. I fixed this by filtering the subsegments by duration before writing them. I chose a threshold of 0.25 s, but maybe we could add a --min-subsegment-length option.
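Before picking a threshold, it can help to see which subsegments are actually too short. A minimal sketch, assuming the same four-column subsegments format; report_short_subsegments is a hypothetical helper that prints the id and duration of every well-formed line at or below the given threshold, i.e. exactly the lines a strict "duration > threshold" filter would drop.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic: list subsegments whose duration (end - start) is
# at or below the threshold given as $1 (in seconds).
# Example: report_short_subsegments 0.25 < "$data/subsegments"
report_short_subsegments() {
  awk -v m="$1" 'NF == 4 && $4 - $3 <= m { print $1, $4 - $3 }'
}
```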