All source URLs of the 1,000 songs for creating melody-lyric alignment data [1]
In progress (512 songs / 1,000 songs)
We provide scripts for melody-lyric alignment.
Python2
pip install romkan
pip install jaconv
pip install jcconv
Install the Stanford CoreNLP pywrapper
Japanese morphological analyzer MeCab (url)
Japanese dependency parser CaboCha (url)
Python modules for MeCab and CaboCha
MeCab dictionaries ipadic and UniDic
nkf (character code converter, Shift-JIS -> UTF8)
Change directory to the Python site-packages directory (e.g. ~/anaconda2/envs/py2/lib/python2.7/site-packages/).
Edit sitecustomize.py so that it contains:
import sys
sys.setdefaultencoding("utf-8")
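To confirm that the sitecustomize.py edit was picked up, the interpreter's default encoding can be inspected. This is only a minimal check; the helper name `default_encoding` is made up for this example and is not part of the provided scripts.

```python
import sys


def default_encoding():
    """Return the interpreter-wide default string encoding."""
    return sys.getdefaultencoding()


if __name__ == "__main__":
    # Expected to print "utf-8" once sitecustomize.py is picked up;
    # a stock Python 2 interpreter reports "ascii" instead.
    print(default_encoding())
```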
wget http://nlp.stanford.edu/software/stanford-corenlp-full-2013-06-20.zip
unzip stanford-corenlp-full-2013-06-20.zip
Download ipadic and unidic from MeCab: Yet Another Part-of-Speech and Morphological Analyzer and UniDic.
mv unidic dic/
mv dic/dicrc dic/unidic/
mv ipadic dic/
- Prepare lyrics.txt in the following format.
@title sample
@artist anonymous
これはサンプルです
歌詞は行と段落で構成されます
段落の間には1行の空行があります
英語が混ざっている日本語の曲も対応しています
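The sample lyrics read, in English: "This is a sample", "Lyrics consist of lines and paragraphs", "There is one blank line between paragraphs", and "Japanese songs with English mixed in are also supported". As a rough illustration of that layout, the sketch below splits such a file into metadata and paragraphs. It is not the parser used by the alignment scripts; the function names and the rule that every `@` line is a "key value" pair are assumptions made for this example.

```python
import io


def parse_lyrics(text):
    """Split lyrics text into a metadata dict and a list of paragraphs.

    Assumptions (not confirmed by the alignment scripts): lines starting
    with "@" are "key value" metadata, and blank lines separate paragraphs.
    """
    meta, paragraphs, current = {}, [], []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("@"):
            key, _, value = line[1:].partition(" ")
            meta[key] = value.strip()
        elif line:
            current.append(line)
        elif current:
            # A blank line closes the paragraph collected so far.
            paragraphs.append(current)
            current = []
    if current:
        paragraphs.append(current)
    return meta, paragraphs


def load_lyrics(path):
    """Read a UTF-8 lyrics file and parse it with parse_lyrics."""
    with io.open(path, encoding="utf-8") as f:
        return parse_lyrics(f.read())
```

Calling `load_lyrics("lyrics.txt")` returns a pair such as `({"title": ..., "artist": ...}, [[line, ...], ...])`, with one inner list per paragraph.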
- Prepare melody.ust in the following format. (See Utau - Wikipedia)
- Convert the character code of the UTAU file. (Shift-JIS -> UTF8)
nkf -w8 --overwrite melody.ust
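If nkf is not available, the same re-encoding can be approximated in Python. This is a sketch rather than a drop-in replacement: the function name `sjis_to_utf8` is hypothetical, the `cp932` codec is an assumption about how the .ust file was saved, and the output is written to a separate path instead of overwriting the input.

```python
import io


def sjis_to_utf8(src_path, dst_path, add_bom=True):
    """Re-encode a Shift-JIS text file as UTF-8.

    "cp932" (the Windows Shift-JIS variant) is an assumption about how the
    .ust file was saved; use "shift_jis" if that matches your file better.
    """
    # newline="" keeps the original line endings untouched on read and write.
    with io.open(src_path, encoding="cp932", newline="") as src:
        text = src.read()
    # "utf-8-sig" writes a byte order mark, mirroring nkf's -w8;
    # plain "utf-8" omits it.
    out_encoding = "utf-8-sig" if add_bom else "utf-8"
    with io.open(dst_path, "w", encoding=out_encoding, newline="") as dst:
        dst.write(text)
```

For example, `sjis_to_utf8("melody_sjis.ust", "melody.ust")` would leave the original file in place and write the converted copy next to it (both file names are placeholders).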
mkdir pair_data
mkdir pair_data/sample
cp lyrics.txt pair_data/sample/sample.txt
cp melody.ust pair_data/sample/sample.ust
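The commands above suggest a layout of pair_data/NAME/NAME.txt and pair_data/NAME/NAME.ust. When staging many songs, the same steps could be scripted as in the sketch below. The helper `stage_pair` is hypothetical, and the rule that the directory name and file base name must match is inferred from the sample commands, not confirmed by the scripts.

```python
import os
import shutil


def stage_pair(lyrics_path, ust_path, name, root="pair_data"):
    """Copy one lyrics/melody pair into root/<name>/<name>.{txt,ust}."""
    song_dir = os.path.join(root, name)
    if not os.path.isdir(song_dir):
        os.makedirs(song_dir)
    txt_dst = os.path.join(song_dir, name + ".txt")
    ust_dst = os.path.join(song_dir, name + ".ust")
    shutil.copy(lyrics_path, txt_dst)
    shutil.copy(ust_path, ust_dst)
    return txt_dst, ust_dst
```

`stage_pair("lyrics.txt", "melody.ust", "sample")` reproduces the four shell commands for the sample song.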
- Text format
python align_data_readable.py > data.txt
- JSON format
python align_data_json.py > data.jsonl
See sample data.txt or data.jsonl.
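Assuming data.jsonl follows the usual JSON Lines convention (one JSON value per line, UTF-8 encoded), it can be loaded with a few lines of Python. The reader below is a generic sketch: `read_jsonl` is a name chosen for this example, and no field names are assumed, so check the sample file for the actual structure of each record.

```python
import io
import json


def read_jsonl(path):
    """Yield one decoded JSON value per non-empty line of a JSON Lines file."""
    with io.open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)
```

For instance, `records = list(read_jsonl("data.jsonl"))` collects every record into memory, while iterating over the generator directly keeps memory use low for large files.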
- [1] Kento Watanabe, Yuichiroh Matsubayashi, Satoru Fukayama, Masataka Goto, Kentaro Inui and Tomoyasu Nakano. A Melody-conditioned Lyrics Language Model. In Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2018).