kuleshov/diffusion-llm

By Subham Sekhar Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin T Chiu, Alexander Rush, Volodymyr Kuleshov

[Open In Colab] [arXiv]

(Figure: graphical abstract)

This is an experimental fork of the main MDLM repo. The code here may be broken and is being actively hacked on as a personal experiment; please use the official repo for anything serious.

Code Organization

  1. main.py: Routines for training and evaluation
  2. noise_schedule.py: Noise schedules
  3. diffusion.py: Forward/reverse diffusion
  4. dataloader.py: Dataloaders
  5. utils.py: LR scheduler, logging, fsspec handling
  6. models/: Denoising network architectures. Supports DiT, AR transformer, and Mamba
  7. configs/: Config files for datasets/denoising networks/noise schedules/LR schedules
  8. scripts/: Shell scripts for training/evaluation

Getting started in this repository

To get started, create a conda environment containing the required dependencies.

conda env create -f requirements.yaml
conda activate mdlm
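
As an optional sanity check that the environment resolved correctly, verify that PyTorch sees your GPU (this assumes a CUDA-capable machine; the command is plain PyTorch, not specific to this repo):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"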

Create the following directories to store saved models and slurm logs:

mkdir outputs
mkdir watch_folder

and run the training as a batch job:

sbatch scripts/train_owt_mdlm.sh
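
If you are not on a slurm cluster, the same run can in principle be launched directly through main.py. The sketch below is an assumption based on the Hydra-style overrides used elsewhere in this README (data, model.length, loader.eval_batch_size); the mode=train and loader.batch_size names are not verified here, so check scripts/train_owt_mdlm.sh for the exact overrides used in the paper runs:

python main.py \
  mode=train \
  data=openwebtext-split \
  model.length=1024 \
  loader.batch_size=32 \
  loader.eval_batch_size=32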

Checkpoints

We have uploaded an MDLM model trained on OpenWebText for 1M training steps to the Hugging Face hub 🤗: kuleshov-group/mdlm-owt. Furthermore, we have released the checkpoints for the AR and SEDD baselines trained on OpenWebText in this Google Drive folder.
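
If you prefer to fetch the Hugging Face checkpoint ahead of time (e.g. on a cluster where compute nodes have no internet access), a standard hub download works; this assumes a recent huggingface_hub, and the --local-dir path below is just an example:

pip install -U huggingface_hub
huggingface-cli download kuleshov-group/mdlm-owt --local-dir ./outputs/mdlm-owt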

Reproducing Experiments

Below, we describe the steps required for reproducing the experiments in the paper. Throughout, the main entry point for running experiments is the main.py script. We also provide sample slurm scripts for launching pre-training and downstream fine-tuning experiments in the scripts/ directory.

Generate Samples

The sampling.predictor argument specifies the sampler and takes one of the following values:

  • ddpm_cache: our proposed sampler, which is ~3-4x faster than the samplers proposed in D3PM and SEDD.
  • ddpm: Ancestral sampling proposed in D3PM.
  • analytic: Analytic sampler proposed in SEDD.

To generate samples from a pre-trained model use one of the following commands:

Huggingface model

python main.py \
  mode=sample_eval \
  eval.checkpoint_path=kuleshov-group/mdlm-owt \
  data=openwebtext-split  \
  model.length=1024  \
  sampling.predictor=ddpm_cache  \
  sampling.steps=1000 \
  loader.eval_batch_size=1 \
  sampling.num_sample_batches=10 \
  backbone=hf_dit

Local checkpoint

python main.py \
  mode=sample_eval \
  eval.checkpoint_path=/path/to/checkpoint/mdlm.ckpt \
  data=openwebtext-split  \
  model.length=1024  \
  sampling.predictor=ddpm_cache  \
  sampling.steps=10000 \
  loader.eval_batch_size=1 \
  sampling.num_sample_batches=1 \
  backbone=dit
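
The paper also reports validation perplexities. Below is a hedged sketch of such an evaluation run; mode=ppl_eval is an assumption that mirrors mode=sample_eval above and may not match this fork's configs, so check configs/ for the actual mode names:

python main.py \
  mode=ppl_eval \
  eval.checkpoint_path=/path/to/checkpoint/mdlm.ckpt \
  data=openwebtext-split \
  model.length=1024 \
  loader.eval_batch_size=16 \
  backbone=dit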

Citation

@misc{sahoo2024simple,
      title={Simple and Effective Masked Diffusion Language Models}, 
      author={Subham Sekhar Sahoo and Marianne Arriola and Yair Schiff and Aaron Gokaslan and Edgar Marroquin and Justin T Chiu and Alexander Rush and Volodymyr Kuleshov},
      year={2024},
      eprint={2406.07524},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
