Deblurring Variational Autoencoders with Short-Time Fourier Transform

Vibhu Dalal


Abstract: Variational Autoencoders (VAEs) are powerful generative models; however, their generated samples are known to suffer from a characteristic blurriness compared to the outputs of alternative generative techniques. Extensive research efforts have been made to tackle this problem, and several works have focused on modifying the reconstruction term of the evidence lower bound (ELBO). In particular, many have experimented with augmenting the reconstruction loss with losses in the frequency domain. Such loss functions usually employ the Fourier transform to explicitly penalise the lack of higher-frequency components in the generated samples, which are responsible for sharp visual features. In this paper, we explore aspects of previous such approaches that are not well understood, and we propose an augmentation to the reconstruction term in response to them. Our reasoning leads us to use the short-time Fourier transform and to emphasise local phase coherence between the input and output samples. We illustrate the potential of our proposed loss on the MNIST dataset by providing both qualitative and quantitative results.

Access the full paper here.
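
To convey the idea, the following is a minimal TensorFlow sketch of an STFT-based reconstruction penalty with a local-phase term. The function name stft_recon_loss, the row-wise application of the STFT, the window sizes, and the magnitude-weighted phase term are illustrative assumptions; see the paper and the training code for the actual formulation.

import tensorflow as tf

def stft_recon_loss(x, y, frame_length=8, frame_step=4):
    """x, y: batches of grayscale images, shape [B, H, W], float32 in [0, 1]."""
    # Treat each image row as a 1-D signal and take its short-time Fourier
    # transform, giving one complex spectrum per local window.
    X = tf.signal.stft(x, frame_length, frame_step)
    Y = tf.signal.stft(y, frame_length, frame_step)
    # Magnitude term: penalise missing high-frequency energy in the output.
    mag = tf.reduce_mean(tf.abs(tf.abs(X) - tf.abs(Y)))
    # Phase term: penalise local phase mismatch, weighted by the input
    # magnitude so the penalty concentrates where the signal has energy.
    phase = tf.reduce_mean(
        tf.abs(X) * (1.0 - tf.cos(tf.math.angle(X) - tf.math.angle(Y))))
    return mag + phase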

Usage

Install requirements

After cloning the repository, install the requirements with:

$ pip install -r requirements.txt

Generate samples

The repository contains 5 checkpoints of a VAE model in checkpoints/, corresponding to the reconstruction loss functions that were tested while training the VAE (a sketch of the DFT + SSIM combination follows the list):

  • L1
  • L2
  • SSIM
  • DFT + SSIM
  • Ours
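
For concreteness, the DFT + SSIM entry combines a structural-similarity term with a Fourier-magnitude term. Below is a minimal TensorFlow sketch of one such combination; the function name dft_ssim_loss, the weighting alpha, and the exact form of each term are assumptions for illustration, not the repository's implementation.

import tensorflow as tf

def dft_ssim_loss(x, y, alpha=0.5):
    """x, y: images of shape [B, H, W, 1], float32 in [0, 1]."""
    # SSIM term, written as 1 - SSIM so that lower is better.
    ssim_term = 1.0 - tf.reduce_mean(tf.image.ssim(x, y, max_val=1.0))
    # DFT term: compare 2-D Fourier magnitudes to penalise the absence of
    # high-frequency components in the reconstruction.
    X = tf.signal.fft2d(tf.cast(tf.squeeze(x, -1), tf.complex64))
    Y = tf.signal.fft2d(tf.cast(tf.squeeze(y, -1), tf.complex64))
    dft_term = tf.reduce_mean(tf.abs(tf.abs(X) - tf.abs(Y)))
    return alpha * ssim_term + (1.0 - alpha) * dft_term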

Samples can be generated from the models by running generate.py. An example run would be:

$ python generate.py --loss=ssim --num_samples=16 --name=gen_samples --out_dir=results

The following flags can be specified:

$ python generate.py --help
usage: generate.py [-h] [--loss LOSS] [--num_samples NUM_SAMPLES] [--name NAME]
               [--out_dir OUT_DIR]

optional arguments:
  -h, --help            show this help message and exit
  --loss LOSS           Options: l1, l2, ssim, dft+ssim, ours
  --num_samples NUM_SAMPLES
                        Number of samples to generate
  --name NAME           Name of the generated image
  --out_dir OUT_DIR     Name of output directory

Train the model

To train the VAE model, please refer to the configuration files config/train_config.yml and config/model_config.yml to customise the training procedure. Once the configuration files are ready, start the training with:

$ python train.py 

Note: the configuration parameters can either be set beforehand in the configuration files or passed directly on the command line, e.g.

$ python train.py --batch_size=40 --min_lr=0.0001 --max_lr=0.001 --epochs=30
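
For reference, the training configuration might contain entries along the following lines. The actual keys are defined in config/train_config.yml; only batch_size, min_lr, max_lr, and epochs are confirmed by the command above, so treat this as an illustrative sketch.

# Illustrative values only -- consult config/train_config.yml for the
# keys the repository actually defines.
batch_size: 40
min_lr: 0.0001
max_lr: 0.001
epochs: 30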

Citation

If you use or extend this work, please cite it as follows:

@software{Vibhu_Dalal_Deblurring-Variational-Autoencoders-with-Short-Time-Fourier-transform_2023,
  author = {Dalal, Vibhu},
  month = {3},
  title = {Deblurring-Variational-Autoencoders-with-Short-Time-Fourier-transform},
  url = {https://github.com/Vibhu04/Deblurring-Variational-Autoencoders-with-STFT},
  version = {1.0.0},
  year = {2023}
}
