Windows Support #425

vincentqb · 2020-02-04T14:40:24Z

peterjc123 · 2020-02-05T04:16:23Z

The kaldi_io test is passing on Windows now. BTW, I think it's hard to compile Sox on Windows. Other things sound reasonable to me.

vincentqb · 2020-02-05T16:20:38Z

Thanks for the input. Can you share the output of CircleCI where the kaldi_io tests are passing?

If SoX is not possible to compile on Windows, we'll need to identify an alternative backend that offers similar file support on Windows: mp3, flac, wav, at least. soundfile unfortunately doesn't support mp3. See e.g. comparison.

peterjc123 · 2020-02-06T13:38:47Z

> Thanks for the input. Can you share the output of CircleCI where the kaldi_io tests are passing?

Sure. It was posted here: #419 (comment).

> If SoX is not possible to compile on Windows, we'll need to identify an alternative backend that offers similar file support on Windows: mp3, flac, wav, at least.

What about this one? https://github.com/beetbox/audioread or https://github.com/librosa/librosa?

vincentqb · 2020-02-06T20:57:29Z

> What about this one? https://github.com/beetbox/audioread or https://github.com/librosa/librosa?

aubio seems to perform better than librosa, according to this, and supports more formats than audioread. Thoughts?

peterjc123 · 2020-02-07T04:56:48Z

Well, it looks good to me except its package on pypi is a source package. However, if we use the C/C++ part then we should be okay.

vincentqb · 2020-02-14T21:02:29Z

> Well, it looks good to me except its package on pypi is a source package. However, if we use the C/C++ part then we should be okay.

What is the implication of a source package?

peterjc123 · 2020-02-15T05:16:08Z

As you can see from https://pypi.org/project/aubio/#files, only a file ending in .tar.gz is available, so there is no prebuilt binary package to install.
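To illustrate the implication: a release that ships only a source archive has to be built on the user's machine at install time, which for a C extension means having a working compiler toolchain. A minimal sketch of telling the two cases apart by file extension follows. The file names are made up for illustration and are not the actual aubio listing.

```python
from typing import Iterable


def has_prebuilt_wheel(filenames: Iterable[str]) -> bool:
    """Return True if at least one release file is a wheel (.whl)."""
    return any(name.endswith(".whl") for name in filenames)


# Hypothetical release listings, for illustration only.
source_only = ["example-1.0.tar.gz"]
with_wheels = ["example-1.0.tar.gz", "example-1.0-cp38-cp38-win_amd64.whl"]

print(has_prebuilt_wheel(source_only))
print(has_prebuilt_wheel(with_wheels))
```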

vincentqb · 2020-02-18T18:22:43Z

> What about this one? https://github.com/beetbox/audioread or https://github.com/librosa/librosa?

Both seem good. Let's go with audioread, since it appears to be faster than librosa. I've updated the description above to reflect the choice of audioread over sox for Windows.

dachosen1 · 2020-02-19T00:06:54Z

Have you looked into pydub? https://github.com/jiaaro/pydub

I've been using it on Windows, and it works great for mp3 and wav files. The installation is a bit involved, though, since it requires the user to add ffmpeg to the PATH environment variable.

faroit · 2020-02-19T10:47:57Z

@vincentqb Just some small remarks...

For the various use cases of audio i/o there are two scenarios where loading is used within torchaudio:

Training

Here, loading and decoding performance is crucial and easily becomes the bottleneck of dataloaders that deal with raw audio. Typically, expensive compression formats should be avoided, and simple formats such as wav, flac and mp3 should be used instead. Furthermore, seeking support is crucial for loading chunks of audio from the original (larger) tracks.
For this use case we already have libsndfile, interfaced through pysoundfile, which covers wav and flac (at some point it would make sense to interface with libsndfile directly to avoid numpy). Regarding MP3 support (plus Windows), I just discovered minimp3, which ticks all the boxes. It is also ridiculously fast and could therefore easily be the best tradeoff between loading and decoding speed.

Inference

Here, performance is not that crucial but support for various formats such as m4a/mp4/aac would be beneficial. As we often discussed in torchaudio-contrib, I still don't see any way around ffmpeg. ;-)

To sum up, I don't think it makes sense to add another Python package for audio i/o. Instead, we should focus on lower-level and faster alternatives such as minimp3 that also come with fewer dependencies. What do you think?

vincentqb · 2020-03-07T23:35:29Z

> For this use case we already have libsndfile, interfaced through pysoundfile, which covers wav and flac (at some point it would make sense to interface with libsndfile directly to avoid numpy). Regarding MP3 support (plus Windows), I just discovered minimp3, which ticks all the boxes. It is also ridiculously fast and could therefore easily be the best tradeoff between loading and decoding speed.

@faroit -- Have you run your benchmark with minimp3? I'd love to see how it compares.

Are you suggesting a mix of backends for different formats? That could be an option, yes. However, the goal of this particular pull request is to make torchaudio available on Windows with the same features as on the other supported OSs, so it doesn't push the boundaries of speed :)

> Inference
>
> Here, performance is not that crucial but support for various formats such as m4a/mp4/aac would be beneficial. As we often discussed in torchaudio-contrib, I still don't see any way around ffmpeg. ;-)
>
> To sum up, I don't think it makes sense to add another Python package for audio i/o. Instead, we should focus on lower-level and faster alternatives such as minimp3 that also come with fewer dependencies. What do you think?

I agree that there are already many Python libraries for loading audio files. In particular, those that load into numpy can then be used to load into PyTorch, since PyTorch can convert tensors from/to numpy at no cost. This means most users who need to load some very specific audio format can already do so.

It is still convenient for users to get support for some common audio file formats directly in torchaudio. But we can focus on the most critical formats (wav, flac, mp3), and support them well and fast.

In that context, since ffmpeg is a heavy dependency, I would avoid depending on it for as long as I can. :)

peterjc123 · 2020-03-09T03:37:05Z

@vincentqb Actually, both audioread and aubio rely on ffmpeg.

vincentqb · 2020-03-09T14:42:12Z

Ah, good point. Have any of you faced challenges like this when installing audioread? If not, I'd say we move forward anyway.

By the way, torchvision is also moving toward ffmpeg for video.

@cpuhrsch -- You voiced not being in favor of ffmpeg in the past. Any comments?

peterjc123 · 2020-03-09T14:48:51Z

@vincentqb It will be easy for conda users because they can simply do conda install -c conda-forge ffmpeg. To make it convenient for other users, we may just distribute the DLLs for them.

peterjc123 · 2020-03-09T14:52:34Z

@vincentqb BTW, users can only read a file using audioread, but not write. If we want to create a new backend like sndfile and sox, we'd better choose something else.

vincentqb · 2020-03-09T14:59:06Z

Let's list the requirements for a backend:

Easy installation with torchaudio for users on Windows (for this PR).
Read whole wav/mp3/flac files, or chunks at a specified location in a file.
Write whole wav/mp3/flac files.
Optional: Perform well in this benchmark.

@peterjc123 -- Please do let me know if I forgot anything in this list. Do you know of any other backend that would work well with those criteria?
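As an illustration of the "chunks at a specified location" requirement, here is a minimal sketch using only Python's standard `wave` module. It covers uncompressed wav only and is not torchaudio code or any of the backends discussed here. The helper names `read_chunk` and `make_test_wav` are made up for this example.

```python
import io
import struct
import wave


def read_chunk(source, start_frame: int, num_frames: int) -> bytes:
    """Return raw PCM bytes for num_frames frames starting at start_frame."""
    with wave.open(source, "rb") as wav:
        # Seek directly to the requested frame instead of reading from the start.
        wav.setpos(start_frame)
        return wav.readframes(num_frames)


def make_test_wav(num_frames: int, sample_rate: int = 8000) -> io.BytesIO:
    """Build an in-memory mono 16-bit wav whose samples are 0, 1, 2, ..."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{num_frames}h", *range(num_frames)))
    buffer.seek(0)
    return buffer


chunk = read_chunk(make_test_wav(100), start_frame=10, num_frames=5)
samples = struct.unpack("<5h", chunk)
print(samples)
```

Because the test file stores each sample's own index, the five samples printed are the indices 10 through 14, which confirms that the read started at the requested offset.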

faroit · 2020-03-09T15:18:37Z

@vincentqb

> Have you run your benchmark with minimp3? I'd love to see how it compares.

There is no functional python/numpy interface yet – see status of pyminimp3, so I used the implementation recently added to tf.io. The performance looks incredible:

(ar_ffmpeg is audioread's ffmpeg interface)

faroit · 2020-03-09T15:23:58Z

@vincentqb @peterjc123

Sorry for hijacking this thread.

> In that context, since ffmpeg is a heavy dependency, I would avoid depending on it for as long as I can. :)

I totally agree with you. FFmpeg is going to be painful. But I don't think there is any other alternative for supporting a large number of formats.

That's why I think we should have some fast decoder-only alternatives for a limited number of formats (useful for training). I am still in favor of removing sox and just going with sndfile/minimp3 for this scenario, then ffmpeg for writing and everything else where loading speed is not an issue.

cpuhrsch · 2020-03-12T00:56:18Z

On ffmpeg, I'd like to add the idea that, in general, we want backends to be opt-in.

By default we should pick a light library that works for most common formats and then allow the user to switch to different backends (such as ffmpeg) for either performance or features.

Figuring out how to set up this backend dispatch mechanism could probably resolve many of the discussions here. Essentially, we want load and save to dispatch to a different backend depending on the file format and the user's settings.

The simplest approach is to make a choice at compile-time. We're already beyond that with our global run-time backend mechanism.

A more granular approach is to then allow users to set different backends for each file format.

Beyond that, we can even introduce preferred orders per file format based on what is available (e.g. use specialized library X over Y when available, but transparently default to Y otherwise).
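The per-format dispatch with a preferred order could look roughly like the sketch below. This is only an illustration of the idea and not torchaudio's actual backend mechanism. The `BackendRegistry` class, its method names, and the stub loaders are all hypothetical, and the loaders return strings instead of decoding audio.

```python
from pathlib import Path
from typing import Callable, Dict, List

# A loader takes a file path and returns the loaded data (a string in this sketch).
Loader = Callable[[str], str]


class BackendRegistry:
    """Hypothetical registry: picks a loader per file format, in preferred order."""

    def __init__(self) -> None:
        self._backends: Dict[str, Loader] = {}
        self._preferences: Dict[str, List[str]] = {}

    def register(self, name: str, loader: Loader) -> None:
        """Make a backend available under the given name."""
        self._backends[name] = loader

    def set_preference(self, extension: str, names: List[str]) -> None:
        """Set the ordered list of backend names to try for one file format."""
        self._preferences[extension.lower()] = list(names)

    def load(self, path: str) -> str:
        """Dispatch to the first preferred backend that is actually registered."""
        extension = Path(path).suffix.lower().lstrip(".")
        for name in self._preferences.get(extension, []):
            loader = self._backends.get(name)
            if loader is not None:
                return loader(path)
        raise RuntimeError(f"no available backend for '.{extension}' files")


registry = BackendRegistry()
registry.register("sndfile", lambda path: f"sndfile:{path}")
registry.register("ffmpeg", lambda path: f"ffmpeg:{path}")
registry.set_preference("wav", ["sndfile", "ffmpeg"])
# minimp3 is preferred for mp3 but not registered yet, so ffmpeg is used.
registry.set_preference("mp3", ["minimp3", "ffmpeg"])

before = registry.load("song.mp3")
registry.register("minimp3", lambda path: f"minimp3:{path}")
after = registry.load("song.mp3")
print(before, after)
```

Keeping the preference list separate from the set of registered backends is what gives the transparent fallback: installing the specialized library changes which loader is picked without the user touching any settings.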

vincentqb · 2020-03-12T14:19:34Z

Right, although with the current choice of global run-time backend dispatch, we do not support mp3 on Windows. One option is to switch the default global backend to something that also supports mp3 on Windows. Another is to add a file-format-dependent dispatch.

The former would favor going all-in with ffmpeg. The latter favors minimp3.

Based on feedback above from @faroit and @cpuhrsch, the latter is preferred as the next step. I'm good with that conclusion, so I'll update the todo/description above to reflect that.

peterjc123 · 2020-05-16T01:18:28Z

@vincentqb I saw a post that describes how to compile torchaudio with Sox. Will try that later.

peterjc123 · 2020-05-18T04:34:15Z

Torchaudio with Sox: #648

vincentqb · 2020-11-03T21:13:12Z

mp3 for windows without sox in #1000

adefossez · 2021-04-30T12:41:53Z

@vincentqb if you also want to support writing MP3s on Windows, I would recommend https://github.com/chrisstaite/lameenc

I have been using it for a while inside demucs, and it is amazing (in the sense that it is small, no extra dependencies, and works perfectly with just a pip install on all OSes). At the moment though it seems their build for python3.9 is broken...

vincentqb · 2021-04-30T19:03:51Z

thanks for the input :)

zackees · 2023-02-22T20:24:43Z

Hi there, I see that ffmpeg and sox are issues for this library. I want to let you know that I've solved these exact problems for tools like this so that these binaries can be easily deployed for Mac/Win/Linux.

Please see:

https://github.com/zackees/static-ffmpeg
https://github.com/zackees/static-sox

Using tools like ffmpeg will allow you to write mp3's with minimal code and have it work everywhere. I recommend using static_ffmpeg.add_paths(weak=True) and static_sox.add_paths(weak=True).

These python packages are available through pip as well so can be included in your dependency management. The binaries are only downloaded when they are first used. By specifying weak=True the libraries will only download ffmpeg/sox if the binaries don't already exist on the system.
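The "only download if the binary isn't already on the system" behaviour described above can be sketched in plain Python as follows. This is an illustration of the idea only and is not how static_ffmpeg or static_sox are implemented. The `resolve_binary` helper and its `fetch` callback are hypothetical.

```python
import shutil
from typing import Callable


def resolve_binary(name: str, fetch: Callable[[str], str]) -> str:
    """Return a path to the executable, calling fetch only if it is not on PATH."""
    found = shutil.which(name)
    if found is not None:
        # A system-wide install exists, so nothing needs to be downloaded.
        return found
    # Fall back to the (hypothetical) downloader, which returns the new path.
    return fetch(name)


def fake_fetch(name: str) -> str:
    """Stand-in for a real download step; only builds a placeholder path."""
    return "/downloaded/" + name


print(resolve_binary("no-such-tool-xyz123", fake_fetch))
```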

haideraltahan mentioned this issue Feb 18, 2020

No matching distribution found for torchaudio #436

Closed

vincentqb mentioned this issue Apr 1, 2020

Windows build failing in CircleCI #488

Closed

This was referenced Apr 17, 2020

_audio_backends platform independent #554

Closed

any specific usage for Windows #563

Closed

peterjc123 mentioned this issue May 2, 2020

Turn on tests when building through conda-build #493

Closed

peterjc123 mentioned this issue Jun 16, 2020

Test building with Sox on Windows #648

Closed

vincentqb added the module: windows label Jun 16, 2020

vincentqb mentioned this issue Jul 21, 2020

🚀 Feature Request: Loading audio data from BytesIO or memory #800

Closed

vincentqb mentioned this issue Jan 14, 2021

support on windows #1178

Closed

1enn0 mentioned this issue Feb 26, 2021

Windows build missing on PyPI? #1320

Closed

mthrok closed this as completed Jan 8, 2023

jpc mentioned this issue Jan 30, 2024

need a simple python script to run WhisperSpeech locally to compare to bark collabora/WhisperSpeech#67

Closed
