
[egs] Add recipes for CN-Celeb #3758

Merged (16 commits) on Dec 14, 2019
[scripts] improve some comments
csltstu committed Dec 12, 2019
commit e79c44db3bbbffeecb98f4e2a79d8120e0748c29
9 changes: 7 additions & 2 deletions egs/wsj/s5/utils/data/combine_short_segments.sh
@@ -16,17 +16,22 @@

# begin configuration section
cleanup=true
speaker_only=false
speaker_only=false # If true, utterances are only combined from the same speaker.
# It may be useful for the speaker recognition task.
# If false, utterances are preferentially combined from the same speaker,
# and then combined across different speakers.
# end configuration section


. utils/parse_options.sh

if [ $# != 3 ]; then
  echo "Usage: "
  echo " $0 [options] <srcdir> <min-segment-length-in-seconds> <dir>"
  echo "e.g.:"
  echo " $0 data/train 1.55 data/train_comb"
  # options documentation here.
  echo " Options:"
  echo " --speaker-only <true|false> # options to internal/choose_utts_to_combine.py, default false."
  exit 1;
fi
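The `<min-segment-length-in-seconds>` argument decides which utterances count as too short to keep on their own. As a rough illustration only (not code from this script), the sketch below reads a Kaldi-style `utt2dur` file, where each line is `<utterance-id> <duration-in-seconds>`, and lists the utterances that fall below a threshold such as the 1.55 s used in the usage example. The helper names `read_utt2dur` and `find_short_utts` are hypothetical.

```python
import io
from typing import Dict, List, TextIO


def read_utt2dur(f: TextIO) -> Dict[str, float]:
    """Parse '<utt-id> <duration>' lines into a dict (hypothetical helper)."""
    utt2dur = {}
    for line in f:
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != 2:
            raise ValueError("bad utt2dur line: " + line.strip())
        utt2dur[fields[0]] = float(fields[1])
    return utt2dur


def find_short_utts(utt2dur: Dict[str, float], min_duration: float) -> List[str]:
    """Return the ids of utterances shorter than min_duration, sorted by id."""
    return sorted(utt for utt, dur in utt2dur.items() if dur < min_duration)


if __name__ == "__main__":
    example = io.StringIO("spk1-utt1 0.90\nspk1-utt2 2.40\nspk2-utt1 1.20\n")
    durations = read_utt2dur(example)
    print(find_short_utts(durations, 1.55))  # ['spk1-utt1', 'spk2-utt1']
```

Whether such short utterances are then merged only with same-speaker neighbours or also across speakers is what `--speaker-only` controls.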

8 changes: 4 additions & 4 deletions egs/wsj/s5/utils/data/internal/choose_utts_to_combine.py
@@ -39,9 +39,9 @@
help="Minimum utterance duration")
parser.add_argument("--merge-within-speakers-only", type = str, default = 'false',
choices = ['true', 'false'],
- help="If true, utterances were only combined from the same speaker."
+ help="If true, utterances are only combined from the same speaker."
"It may be useful for the speaker recognition task."
- "If false, utterances were preferentially combined from the same speaker,"
+ "If false, utterances are preferentially combined from the same speaker,"
"and then combined across different speakers.")
Contributor (review comment):
were -> are in the usage message

Contributor Author (reply):
Thanks a lot. I have modified it in the PR.

parser.add_argument("spk2utt_in", type = str, metavar = "<spk2utt-in>",
help="Filename of [input] speaker to utterance map needed "
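The option above stores the flag as the string 'true' or 'false' rather than as a boolean. A minimal, self-contained sketch of that pattern follows: it re-declares the same option name, default and choices, and adds a hypothetical helper `parse_merge_flag` that converts the string into a Python `bool`. It illustrates the pattern under those assumptions and is not an excerpt of the script.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Same option name, default and choices as in the diff; the string
    # value is restricted to 'true' or 'false'.
    parser = argparse.ArgumentParser(description="Sketch of the option handling")
    parser.add_argument("--merge-within-speakers-only", type=str, default="false",
                        choices=["true", "false"],
                        help="If true, utterances are only combined from the same speaker. "
                             "It may be useful for the speaker recognition task. "
                             "If false, utterances are preferentially combined from the same "
                             "speaker, and then combined across different speakers.")
    return parser


def parse_merge_flag(argv: Optional[List[str]] = None) -> bool:
    # Hypothetical helper: map the 'true'/'false' string onto a bool.
    # Any other value is rejected by argparse, which exits with status 2.
    args = build_parser().parse_args(argv)
    return args.merge_within_speakers_only == "true"


if __name__ == "__main__":
    print(parse_merge_flag(["--merge-within-speakers-only", "true"]))  # True
    print(parse_merge_flag([]))  # False
```

Python joins adjacent string literals with no separator, so each fragment of a multi-part help string needs its own trailing space (as in the sketch); otherwise the sentences run together in the `--help` output.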
@@ -222,8 +222,8 @@ def SelfTest():
# This function figures out the grouping of utterances.
# The input is:
# 'min_duration' which is the minimum utterance length in seconds.
- # 'merge_within_speakers_only' which is the option to control if
- # utterances were only combined from the same speaker.
+ # 'merge_within_speakers_only' which is a ['true', 'false'] choice.
+ # If true, then utterances are only combined if they belong to the same speaker.
# 'spk2utt' which is a list of pairs (speaker-id, [list-of-utterances])
# 'utt2dur' which is a dict from utterance-id to duration (as a float)
# It returns a list of lists of utterances; each list corresponds to
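The comment above documents the inputs and return value of the grouping function but not its body. The sketch below is a deliberately simplified greedy scheme written against that documented interface (`min_duration`, the 'true'/'false' `merge_within_speakers_only` string, `spk2utt` as a list of (speaker, utterances) pairs, `utt2dur` as a dict). It assumes utterances are combined in the order they are listed for each speaker. The function name `group_utts` is hypothetical, and the logic is not taken from `choose_utts_to_combine.py`; it only illustrates the same-speaker-first policy described by the option.

```python
from typing import Dict, List, Tuple


def group_utts(min_duration: float,
               merge_within_speakers_only: str,
               spk2utt: List[Tuple[str, List[str]]],
               utt2dur: Dict[str, float]) -> List[List[str]]:
    """Simplified greedy grouping (a sketch, not Kaldi's algorithm).

    Each speaker's utterances are accumulated in order until a group
    reaches min_duration. A too-short tail is merged into that speaker's
    previous group. If a speaker has no complete group at all, its
    utterances are kept as they are ('true') or pooled with other
    speakers' leftovers ('false').
    """
    if merge_within_speakers_only not in ("true", "false"):
        raise ValueError("expected 'true' or 'false'")
    groups: List[List[str]] = []
    leftovers: List[List[str]] = []  # short, single-speaker groups
    for _spk, utts in spk2utt:
        spk_groups: List[List[str]] = []
        current: List[str] = []
        total = 0.0
        for utt in utts:
            current.append(utt)
            total += utt2dur[utt]
            if total >= min_duration:
                spk_groups.append(current)
                current, total = [], 0.0
        if current:
            if spk_groups:
                spk_groups[-1].extend(current)  # stay within this speaker
            else:
                leftovers.append(current)
        groups.extend(spk_groups)
    if merge_within_speakers_only == "true":
        # Speakers may not be mixed, so short groups are kept as they are.
        return groups + leftovers
    # Second stage: combine the remaining short groups across speakers.
    current, total = [], 0.0
    for group in leftovers:
        current.extend(group)
        total += sum(utt2dur[u] for u in group)
        if total >= min_duration:
            groups.append(current)
            current, total = [], 0.0
    if current:
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)
    return groups


if __name__ == "__main__":
    utt2dur = {"a1": 1.0, "a2": 1.0, "a3": 0.5, "b1": 0.4, "c1": 0.8, "d1": 0.9}
    spk2utt = [("a", ["a1", "a2", "a3"]), ("b", ["b1"]), ("c", ["c1"]), ("d", ["d1"])]
    print(group_utts(1.5, "true", spk2utt, utt2dur))
    # [['a1', 'a2', 'a3'], ['b1'], ['c1'], ['d1']]
    print(group_utts(1.5, "false", spk2utt, utt2dur))
    # [['a1', 'a2', 'a3'], ['b1', 'c1', 'd1']]
```

With 'true', speakers b, c and d stay short because they cannot be mixed. With 'false', their utterances are pooled into one cross-speaker group that reaches the minimum duration.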