From 0c75a46321e6e4b15b1e6c751d9c69ff9cbdda69 Mon Sep 17 00:00:00 2001
From: Felix Kreuk
Date: Fri, 9 Jun 2023 16:10:10 +0300
Subject: [PATCH 1/8] Update README.md

---
 README.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index ed10bb91..a152cbc8 100644
--- a/README.md
+++ b/README.md
@@ -35,7 +35,10 @@ pip install -e . # or if you cloned the repo locally
 ```
 
 ## Usage
-You can play with MusicGen by running the jupyter notebook at [`demo.ipynb`](./demo.ipynb) locally, or use the provided [colab notebook](https://colab.research.google.com/drive/1fxGqfg96RBUvGxZ1XXN07s3DthrKUl4-?usp=sharing). Finally, a demo is also available on the [`facebook/MusiGen` HugginFace Space](https://huggingface.co/spaces/facebook/MusicGen) (huge thanks to all the HF team for their support).
+We offer a number of way to interact with MusicGen:
+1. You can play with MusicGen by running the jupyter notebook at [`demo.ipynb`](./demo.ipynb) locally, or use the provided [colab notebook](https://colab.research.google.com/drive/1fxGqfg96RBUvGxZ1XXN07s3DthrKUl4-?usp=sharing).
+2. You can use the gradio demo locally by running `python app.py`.
+3. Finally, a demo is also available on the [`facebook/MusiGen` HugginFace Space](https://huggingface.co/spaces/facebook/MusicGen) (huge thanks to all the HF team for their support).
 
 ## API
 

From b40a60c61616e7dab9dc27302f51c4300bc98822 Mon Sep 17 00:00:00 2001
From: Sungkyun Chang <1ronyar@gmail.com>
Date: Fri, 9 Jun 2023 23:51:48 +0100
Subject: [PATCH 2/8] typo (#3)

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index a152cbc8..495a2fb3 100644
--- a/README.md
+++ b/README.md
@@ -8,7 +8,7 @@ Audiocraft is a PyTorch library for deep learning research on audio generation.
 
 ## MusicGen
 Audiocraft provides the code and models for MusicGen, [a simple and controllable model for music generation][arxiv]. MusicGen is a single stage auto-regressive
-Transformer model trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. Unlike existing methods like [MusicLM](https://arxiv.org/abs/2301.11325), MusicGen doesn't not require a self-supervised semantic representation, and it generates
+Transformer model trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. Unlike existing methods like [MusicLM](https://arxiv.org/abs/2301.11325), MusicGen doesn't require a self-supervised semantic representation, and it generates
 all 4 codebooks in one pass. By introducing a small delay between the codebooks, we show we can predict
 them in parallel, thus having only 50 auto-regressive steps per second of audio.
 Check out our [sample page][musicgen_samples] or test the available demo!

From c81b8e6ad870918f9f0ccc9b3e586d17dbb3375a Mon Sep 17 00:00:00 2001
From: Jamie Pond
Date: Fri, 9 Jun 2023 15:52:30 -0700
Subject: [PATCH 3/8] Fix minor typos in `README.md` (#5)

* typos

* one more
---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 495a2fb3..a1126994 100644
--- a/README.md
+++ b/README.md
@@ -36,9 +36,9 @@ pip install -e . # or if you cloned the repo locally
 
 ## Usage
 We offer a number of way to interact with MusicGen:
-1. You can play with MusicGen by running the jupyter notebook at [`demo.ipynb`](./demo.ipynb) locally, or use the provided [colab notebook](https://colab.research.google.com/drive/1fxGqfg96RBUvGxZ1XXN07s3DthrKUl4-?usp=sharing). 
+1. You can play with MusicGen by running the jupyter notebook at [`demo.ipynb`](./demo.ipynb) locally, or use the provided [colab notebook](https://colab.research.google.com/drive/1fxGqfg96RBUvGxZ1XXN07s3DthrKUl4-?usp=sharing).
 2. You can use the gradio demo locally by running `python app.py`.
-3. Finally, a demo is also available on the [`facebook/MusiGen` HugginFace Space](https://huggingface.co/spaces/facebook/MusicGen) (huge thanks to all the HF team for their support).
+3. Finally, a demo is also available on the [`facebook/MusicGen` HuggingFace Space](https://huggingface.co/spaces/facebook/MusicGen) (huge thanks to all the HF team for their support).
 
 ## API
@@ -55,7 +55,7 @@ GPUs will be able to generate short sequences, or longer sequences with the `sma
 
 **Note**: Please make sure to have [ffmpeg](https://ffmpeg.org/download.html) installed when using newer version of `torchaudio`. You can install it with:
 ```
-apt get install ffmpeg
+apt-get install ffmpeg
 ```
 
 See after a quick example for using the API.

From 5c52af7f9e17870643a41381021cbdb71b9945e9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Alexandre=20D=C3=A9fossez?=
Date: Sat, 10 Jun 2023 00:53:21 +0200
Subject: [PATCH 4/8] Update requirements.txt

---
 requirements.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/requirements.txt b/requirements.txt
index f7adda22..aa3fa0dd 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -17,3 +17,4 @@ transformers
 xformers
 demucs
 librosa
+gradio

From b15aea084614954a5e4d5c97cf9ee9eb52cee524 Mon Sep 17 00:00:00 2001
From: syhw
Date: Sat, 10 Jun 2023 09:12:51 +0200
Subject: [PATCH 5/8] Update README.md

added line on dataset source (licensed data)
---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index a1126994..10a0200c 100644
--- a/README.md
+++ b/README.md
@@ -21,6 +21,8 @@ Check out our [sample page][musicgen_samples] or test the available demo!
 
+We use 20K hours of licensed music to train MUSICGEN. Specifically, we rely on an internal dataset of 10K high-quality music tracks, and on the ShutterStock and Pond5 music data.
+
 ## Installation
 Audiocraft requires Python 3.9, PyTorch 2.0.0, and a GPU with at least 16 GB of memory (for the medium-sized model). To install Audiocraft, you can run the following:

From e61b2989ea87a24faaf994e822d97d84b2a8c8b7 Mon Sep 17 00:00:00 2001
From: syhw
Date: Sat, 10 Jun 2023 09:13:32 +0200
Subject: [PATCH 6/8] Update README.md

formatting
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 10a0200c..b36ac079 100644
--- a/README.md
+++ b/README.md
@@ -21,7 +21,7 @@ Check out our [sample page][musicgen_samples] or test the available demo!
 
-We use 20K hours of licensed music to train MUSICGEN. Specifically, we rely on an internal dataset of 10K high-quality music tracks, and on the ShutterStock and Pond5 music data.
+We use 20K hours of licensed music to train MusicGen. Specifically, we rely on an internal dataset of 10K high-quality music tracks, and on the ShutterStock and Pond5 music data.
 
 ## Installation
 Audiocraft requires Python 3.9, PyTorch 2.0.0, and a GPU with at least 16 GB of memory (for the medium-sized model). To install Audiocraft, you can run the following:

From 984b3755a1b37c85dcff24fb516b946ea75da4aa Mon Sep 17 00:00:00 2001
From: syhw
Date: Sat, 10 Jun 2023 09:26:02 +0200
Subject: [PATCH 7/8] Update MODEL_CARD.md

wording
---
 MODEL_CARD.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/MODEL_CARD.md b/MODEL_CARD.md
index fe8159e6..6c2c9f88 100644
--- a/MODEL_CARD.md
+++ b/MODEL_CARD.md
@@ -52,7 +52,7 @@ The model was evaluated on the [MusicCaps benchmark](https://www.kaggle.com/data
 
 ## Training datasets
 
-The model was trained using the following sources: the [Meta Music Initiative Sound Collection](https://www.fb.com/sound), [Shutterstock music collection](https://www.shutterstock.com/music) and the [Pond5 music collection](https://www.pond5.com/). See the paper for more details about the training set and corresponding preprocessing.
+The model was trained on licensed data using the following sources: the [Meta Music Initiative Sound Collection](https://www.fb.com/sound), [Shutterstock music collection](https://www.shutterstock.com/music) and the [Pond5 music collection](https://www.pond5.com/). See the paper for more details about the training set and corresponding preprocessing.
 
 ## Quantitative analysis
 
@@ -62,7 +62,7 @@ More information can be found in the paper [Simple and Controllable Music Genera
 
 **Data:** The data sources used to train the model are created by music professionals and covered by legal agreements with the right holders. The model is trained on 20K hours of data, we believe that scaling the model on larger datasets can further improve the performance of the model.
 
-**Mitigations:** All vocals have been removed from the data source using a state-of-the-art music source separation method, namely using the open source [Hybrid Transformer for Music Source Separation](https://github.com/facebookresearch/demucs) (HT-Demucs). The model is therefore not able to produce vocals.
+**Mitigations:** Vocals have been removed from the data source using corresponding tags, and then using using a state-of-the-art music source separation method, namely using the open source [Hybrid Transformer for Music Source Separation](https://github.com/facebookresearch/demucs) (HT-Demucs).
 
 **Limitations:**

From bffb181b33ec7d30bc7928da60411c54d35ed665 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Alexandre=20D=C3=A9fossez?=
Date: Sun, 11 Jun 2023 11:52:16 +0200
Subject: [PATCH 8/8] Update README.md

---
 README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index b36ac079..3cc10867 100644
--- a/README.md
+++ b/README.md
@@ -40,7 +40,8 @@ pip install -e . # or if you cloned the repo locally
 We offer a number of way to interact with MusicGen:
 1. You can play with MusicGen by running the jupyter notebook at [`demo.ipynb`](./demo.ipynb) locally, or use the provided [colab notebook](https://colab.research.google.com/drive/1fxGqfg96RBUvGxZ1XXN07s3DthrKUl4-?usp=sharing).
 2. You can use the gradio demo locally by running `python app.py`.
-3. Finally, a demo is also available on the [`facebook/MusicGen` HuggingFace Space](https://huggingface.co/spaces/facebook/MusicGen) (huge thanks to all the HF team for their support).
+3. A demo is also available on the [`facebook/MusicGen` HuggingFace Space](https://huggingface.co/spaces/facebook/MusicGen) (huge thanks to all the HF team for their support).
+4. Finally, @camenduru did a great notebook that combines [the MusicGen Gradio demo with Google Colab](https://github.com/camenduru/MusicGen-colab)
 
 ## API
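
The `## API` section these patches lead into documents a small Python interface for generation. As a companion to the series, here is a minimal sketch of that interface, assuming the `audiocraft` package from this repo is installed and that `MusicGen.get_pretrained`, `set_generation_params`, `generate`, and `audio_write` behave as the README documents at these revisions; treat it as a sketch rather than the canonical example.

```python
# Minimal sketch of the MusicGen Python API referenced by the README's
# "## API" section. Assumes audiocraft is installed (pip install -e .)
# and that the names below match the repository at these revisions.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('small')  # other checkpoints: 'medium', 'melody', 'large'
model.set_generation_params(duration=8)   # seconds of audio generated per prompt

descriptions = ['happy rock', 'energetic EDM']
wav = model.generate(descriptions)        # one waveform per text description

for idx, one_wav in enumerate(wav):
    # Saves {idx}.wav at the model's 32 kHz sample rate with loudness normalization.
    audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness")
```

The gradio demo added in patch 1 (`python app.py`) wraps this same call path in a web UI, which is why patch 4 adds `gradio` to `requirements.txt`.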