Fixed some bugs in mgb_data_prep.sh of mgb2_arabic #3501

Merged
merged 5 commits into from
Aug 13, 2019
fix bugs in mgb_data_prep.sh
Dongji Gao committed Aug 1, 2019
commit 86097ef272bfaecc9f71a3794db3192634e6d197
18 changes: 14 additions & 4 deletions egs/mgb2_arabic/s5/local/mgb_data_prep.sh
@@ -2,8 +2,9 @@

 # Copyright (C) 2016, Qatar Computing Research Institute, HBKU
 # 2016-2019 Vimal Manohar
+# 2019 Dongji Gao

-if [ $# -ne 3 ]; then
+if [ $# -ne 2 ]; then
 echo "Usage: $0 <DB-dir> <mer-sel>"
 exit 1;
 fi
@@ -18,7 +19,7 @@ for x in $train_dir $dev_dir; do
 mkdir -p $x
 if [ -f ${x}/wav.scp ]; then
 mkdir -p ${x}/.backup
-mv $x/{wav.scp,feats.scp,utt2spk,spk2utt,segments,text} ${train_dir}/.backup
+mv $x/{wav.scp,feats.scp,utt2spk,spk2utt,segments,text} ${x}/.backup
 fi
 done
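
Note on the backup step: the corrected line moves each directory's files into its own `.backup`, but the brace expansion still names all six files whether or not they exist, so `mv` would report an error for any that are missing (for example if `feats.scp` has not been created yet). A minimal sketch of a more tolerant variant follows. `backup_data_dir` and `demo_dir` are hypothetical names used only for illustration and are not part of the recipe.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in mgb_data_prep.sh): move only the files
# that actually exist in a data directory into its own .backup folder.
backup_data_dir() {
  local dir=$1
  local f
  mkdir -p "$dir/.backup"
  for f in wav.scp feats.scp utt2spk spk2utt segments text; do
    if [ -f "$dir/$f" ]; then
      mv "$dir/$f" "$dir/.backup/"
    fi
  done
}

# Demo on a throwaway directory that has wav.scp and text but no feats.scp.
demo_dir=$(mktemp -d)
touch "$demo_dir/wav.scp" "$demo_dir/text"
backup_data_dir "$demo_dir"
ls -A "$demo_dir/.backup"
```

The loop trades the one-line brace expansion for an explicit existence check, so a partially populated directory is backed up without error messages.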

@@ -44,7 +45,10 @@ cat $train_dir/wav_list | while read basename; do
 echo $basename $wavDir/$basename.wav >> $train_dir/wav.scp
 done

-cp $db_dir/dev/{text,segments} $dev_dir
+for x in text segments; do
+cp $db_dir/dev/${x}.all $dev_dir/${x}
+done
+
 find $db_dir/dev/wav -type f -name "*.wav" | \
 awk -F/ '{print $NF}' | perl -pe 's/\.wav//g' > \
 $dev_dir/wav_list
@@ -56,11 +60,17 @@ done
 #Creating a file reco2file_and_channel which is used by convert_ctm.pl in local/score.sh script
 awk '{print $1" "$1" 1"}' $dev_dir/wav.scp > $dev_dir/reco2file_and_channel

+# Creating utt2spk for dev from segments
+if [ ! -f $dev_dir/utt2spk ]; then
+cut -d ' ' -f1 $dev_dir/segments > $dev_dir/utt_id
+cut -d '_' -f1-2 $dev_dir/utt_id | paste -d ' ' $dev_dir/utt_id - > $dev_dir/utt2spk
+fi
+
 for list in overlap non_overlap; do
 rm -rf ${dev_dir}_$list || true
 cp -r $dev_dir ${dev_dir}_$list
 for x in segments text utt2spk; do
-utils/filter_scp.pl $db_dir/dev/${list}_speech.lst $dev_dir/$x > ${dev_dir}_$list/$x
+utils/filter_scp.pl $db_dir/dev/${list}_speech $dev_dir/$x > ${dev_dir}_$list/${x}
 done
 done
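
Note on the new `utt2spk` step: the two `cut` calls take the utterance ID from the first column of `segments` and use its first two underscore-separated fields as the speaker ID. The sketch below replays those commands on a small made-up `segments` file. The utterance IDs (`recA_spk1_0001` and so on) and the `work` directory are assumptions for illustration, not actual MGB-2 data.

```shell
#!/usr/bin/env bash
# Throwaway directory with a made-up segments file
# (format: <utt-id> <reco-id> <start> <end>).
work=$(mktemp -d)
cat > "$work/segments" <<'EOF'
recA_spk1_0001 recA 0.00 2.50
recA_spk1_0002 recA 2.50 5.10
recB_spk2_0001 recB 0.00 3.20
EOF

# Same two commands as in the diff, pointed at the throwaway directory.
cut -d ' ' -f1 "$work/segments" > "$work/utt_id"
cut -d '_' -f1-2 "$work/utt_id" | paste -d ' ' "$work/utt_id" - > "$work/utt2spk"

cat "$work/utt2spk"
# recA_spk1_0001 recA_spk1
# recA_spk1_0002 recA_spk1
# recB_spk2_0001 recB_spk2
```

This only yields sensible speaker IDs if every utterance ID has at least two underscore-separated fields that together identify the speaker, which is the convention the added lines assume.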

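
Note on the filtering loop: `utils/filter_scp.pl` is given an ID list and a data file, and keeps the lines of the data file whose first field appears in the list. The following is a rough `awk` stand-in for that behavior, shown on made-up data. It is not the Kaldi script itself, and the file names (`ids`, `text`, `text.filtered`) and the `work2` directory are hypothetical.

```shell
#!/usr/bin/env bash
# Made-up ID list and text file in a throwaway directory.
work2=$(mktemp -d)
printf '%s\n' utt1 utt3 > "$work2/ids"
cat > "$work2/text" <<'EOF'
utt1 hello
utt2 world
utt3 again
EOF

# First pass (NR==FNR) records the IDs; second pass prints lines whose
# first field was recorded. Assumes the ID list is non-empty.
awk 'NR==FNR { keep[$1]; next } $1 in keep' \
  "$work2/ids" "$work2/text" > "$work2/text.filtered"

cat "$work2/text.filtered"
# utt1 hello
# utt3 again
```

Because the filter matches on the first field only, the same command shape works for `segments`, `text`, and `utt2spk`, which all start with the utterance ID.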