Merge branch 'main' of github.com:facebookresearch/audiocraft
adefossez committed Jun 11, 2023
2 parents 2471bc3 + bffb181 commit 10b60b3
Showing 3 changed files with 12 additions and 5 deletions.
MODEL_CARD.md: 2 additions & 2 deletions
@@ -52,7 +52,7 @@ The model was evaluated on the [MusicCaps benchmark](https://www.kaggle.com/data

## Training datasets

-The model was trained using the following sources: the [Meta Music Initiative Sound Collection](https://www.fb.com/sound), [Shutterstock music collection](https://www.shutterstock.com/music) and the [Pond5 music collection](https://www.pond5.com/). See the paper for more details about the training set and corresponding preprocessing.
+The model was trained on licensed data using the following sources: the [Meta Music Initiative Sound Collection](https://www.fb.com/sound), [Shutterstock music collection](https://www.shutterstock.com/music) and the [Pond5 music collection](https://www.pond5.com/). See the paper for more details about the training set and corresponding preprocessing.

## Quantitative analysis

@@ -62,7 +62,7 @@ More information can be found in the paper [Simple and Controllable Music Genera

**Data:** The data sources used to train the model are created by music professionals and covered by legal agreements with the right holders. The model is trained on 20K hours of data; we believe that scaling the model on larger datasets can further improve its performance.

-**Mitigations:** All vocals have been removed from the data source using a state-of-the-art music source separation method, namely using the open source [Hybrid Transformer for Music Source Separation](https://github.com/facebookresearch/demucs) (HT-Demucs). The model is therefore not able to produce vocals.
+**Mitigations:** Vocals have been removed from the data source using corresponding tags, and then using a state-of-the-art music source separation method, namely the open source [Hybrid Transformer for Music Source Separation](https://github.com/facebookresearch/demucs) (HT-Demucs).
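As an illustration of this kind of mitigation, the open source `demucs` package exposes HT-Demucs directly. The sketch below is a plausible way to strip vocals with its public Python API; it is not the actual preprocessing pipeline used for training:

```
import torch
from demucs.apply import apply_model
from demucs.pretrained import get_model

# Load the pretrained Hybrid Transformer Demucs model.
model = get_model("htdemucs")
model.eval()

# Placeholder mix: 10 seconds of stereo noise at the model's sample rate.
wav = torch.randn(2, model.samplerate * 10)

# Separate the mix into the model's stems (drums, bass, other, vocals).
with torch.no_grad():
    stems = apply_model(model, wav[None], device="cpu")[0]

# Keep everything except the vocal stem.
accompaniment = sum(
    stem for stem, name in zip(stems, model.sources) if name != "vocals"
)
```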

**Limitations:**

README.md: 9 additions & 3 deletions
@@ -8,7 +8,7 @@ Audiocraft is a PyTorch library for deep learning research on audio generation.
## MusicGen

Audiocraft provides the code and models for MusicGen, [a simple and controllable model for music generation][arxiv]. MusicGen is a single stage auto-regressive
-Transformer model trained over a 32kHz <a href="https://github.com/facebookresearch/encodec">EnCodec tokenizer</a> with 4 codebooks sampled at 50 Hz. Unlike existing methods like [MusicLM](https://arxiv.org/abs/2301.11325), MusicGen doesn't not require a self-supervised semantic representation, and it generates
+Transformer model trained over a 32kHz <a href="https://github.com/facebookresearch/encodec">EnCodec tokenizer</a> with 4 codebooks sampled at 50 Hz. Unlike existing methods like [MusicLM](https://arxiv.org/abs/2301.11325), MusicGen doesn't require a self-supervised semantic representation, and it generates
all 4 codebooks in one pass. By introducing a small delay between the codebooks, we show we can predict
them in parallel, thus having only 50 auto-regressive steps per second of audio.
Check out our [sample page][musicgen_samples] or test the available demo!
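To make the codebook delay concrete, here is a minimal sketch of the interleaving it describes, using dummy token arrays and a hypothetical padding id rather than audiocraft's actual implementation:

```
import torch

num_codebooks, num_frames = 4, 8
pad = -1  # hypothetical padding id for positions with no token yet

# codes[k, t] is the EnCodec token of codebook k at frame t (dummy values).
codes = torch.arange(num_codebooks * num_frames).reshape(num_codebooks, num_frames)

# Shift codebook k right by k frames: at each auto-regressive step the model
# then emits one token per codebook, each for a slightly earlier frame, so
# one second of 50 Hz audio still takes only 50 steps.
delayed = torch.full((num_codebooks, num_frames + num_codebooks - 1), pad)
for k in range(num_codebooks):
    delayed[k, k : k + num_frames] = codes[k]
```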
@@ -21,6 +21,8 @@ Check out our [sample page][musicgen_samples] or test the available demo!
</a>
<br>

+We use 20K hours of licensed music to train MusicGen. Specifically, we rely on an internal dataset of 10K high-quality music tracks, and on the Shutterstock and Pond5 music data.

## Installation
Audiocraft requires Python 3.9, PyTorch 2.0.0, and a GPU with at least 16 GB of memory (for the medium-sized model). To install Audiocraft, you can run the following:

@@ -35,7 +37,11 @@ pip install -e . # or if you cloned the repo locally
```

## Usage
You can play with MusicGen by running the jupyter notebook at [`demo.ipynb`](./demo.ipynb) locally, or use the provided [colab notebook](https://colab.research.google.com/drive/1fxGqfg96RBUvGxZ1XXN07s3DthrKUl4-?usp=sharing). Finally, a demo is also available on the [`facebook/MusiGen` HugginFace Space](https://huggingface.co/spaces/facebook/MusicGen) (huge thanks to all the HF team for their support).
We offer a number of way to interact with MusicGen:
1. You can play with MusicGen by running the jupyter notebook at [`demo.ipynb`](./demo.ipynb) locally, or use the provided [colab notebook](https://colab.research.google.com/drive/1fxGqfg96RBUvGxZ1XXN07s3DthrKUl4-?usp=sharing).
2. You can use the gradio demo locally by running `python app.py`.
3. A demo is also available on the [`facebook/MusicGen` HuggingFace Space](https://huggingface.co/spaces/facebook/MusicGen) (huge thanks to all the HF team for their support).
4. Finally, @camenduru did a great notebook that combines [the MusicGen Gradio demo with Google Colab](https://github.com/camenduru/MusicGen-colab)

## API

@@ -52,7 +58,7 @@ GPUs will be able to generate short sequences, or longer sequences with the `sma
**Note**: Please make sure to have [ffmpeg](https://ffmpeg.org/download.html) installed when using a newer version of `torchaudio`.
You can install it with:
```
-apt get install ffmpeg
+apt-get install ffmpeg
```

See below for a quick example of using the API.
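The example itself is collapsed in this diff view. As a rough sketch of what the API looks like around this version (model names and signatures here are assumptions, not verified against this exact revision):

```
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Load a pretrained MusicGen model and set the target duration in seconds.
model = MusicGen.get_pretrained('small')
model.set_generation_params(duration=8)

# One generated waveform per text description.
descriptions = ['happy rock', 'energetic EDM']
wav = model.generate(descriptions)

for idx, one_wav in enumerate(wav):
    # Saves {idx}.wav with loudness normalization.
    audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness")
```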
requirements.txt: 1 addition & 0 deletions
@@ -17,3 +17,4 @@ transformers
xformers
demucs
librosa
+gradio
