Phase vocoder #90

Open
jpedrick opened this issue Apr 30, 2024 · 10 comments

Comments

@jpedrick

I'd like to be able to:

  1. Change the pitch of a sample without speeding it up or slowing it down. It seems the tool for this is a phase vocoder, which involves taking a short-time FFT, shifting the frequencies, and then applying the inverse FFT. From the existing filters it's not clear to me how to do this. For my use case, I don't mind preprocessing an audio resource (it doesn't have to be realtime) to get the desired pitch change.
  2. Compress or extend a sample without changing the frequency.

There's at least one Rust library that can do this: https://github.com/ajyoon/rocoder, though it claims it may not be quite correct.

Mostly, I'm hoping to open a discussion about how frequency domain filters could be best implemented in Kira.

@tesselode
Owner

This is a common feature request and I don't have the DSP know-how to implement it. I don't think the existing effects will help you here, and this should be built in to the sound implementations anyway. So if you learn how to do it, let me know!

@jpedrick
Author

jpedrick commented May 8, 2024

> This is a common feature request and I don't have the DSP know-how to implement it. I don't think the existing effects will help you here, and this should be built in to the sound implementations anyway. So if you learn how to do it, let me know!

I don't think it should be too difficult if I can get something like the following:

pub trait TransformAudio { fn apply_transform( left: &Vec<f32>, right: &Vec<f32> ) -> ( Vec<f32>, Vec<f32> ); }

Then, the user can implement whatever transformations they need for their effect and return ( left, right ) to be processed by the next transformer.

I believe streaming transforms should be able to apply the same function on subsamples.
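
A minimal sketch of how such a chain of transformers could look. This is not an existing Kira API: the `&self` receiver, the slice parameters, and the `Gain`, `SwapChannels`, and `apply_chain` names are assumptions added for illustration, since a trait method without a receiver cannot carry per-transform settings.

```rust
/// Hypothetical trait (not part of Kira): one offline transform over a stereo pair.
pub trait TransformAudio {
    fn apply_transform(&self, left: &[f32], right: &[f32]) -> (Vec<f32>, Vec<f32>);
}

/// Example transform: scales both channels by a constant gain.
pub struct Gain(pub f32);

impl TransformAudio for Gain {
    fn apply_transform(&self, left: &[f32], right: &[f32]) -> (Vec<f32>, Vec<f32>) {
        let scale = |ch: &[f32]| ch.iter().map(|s| s * self.0).collect::<Vec<f32>>();
        (scale(left), scale(right))
    }
}

/// Example transform: swaps the left and right channels.
pub struct SwapChannels;

impl TransformAudio for SwapChannels {
    fn apply_transform(&self, left: &[f32], right: &[f32]) -> (Vec<f32>, Vec<f32>) {
        (right.to_vec(), left.to_vec())
    }
}

/// Runs each transform in order, feeding every output into the next one.
pub fn apply_chain(
    chain: &[Box<dyn TransformAudio>],
    left: &[f32],
    right: &[f32],
) -> (Vec<f32>, Vec<f32>) {
    let mut current = (left.to_vec(), right.to_vec());
    for transform in chain {
        current = transform.apply_transform(&current.0, &current.1);
    }
    current
}
```

Stateless transforms like these two give the same result whether they run on a whole file or on consecutive chunks. Transforms that need context across chunk boundaries (FFT-based ones, for instance) would need extra state that this sketch does not model.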

@SolarLiner
Contributor

The first step to writing these kinds of algorithms is to provide a way to buffer the input. Algorithms like the phase vocoder require a buffer with a fixed size, consistent between calls, and usually work best when that size is a power of 2 because of the FFT algorithms they use (although these days the FFT is performant in many more cases than just power-of-2 lengths). The way it works is with a circular buffer that fills up with audio samples, and the buffered processing method is called every time the circular buffer has done a full rotation of its contents.

The buffered processing algorithm then does not have to care about this buffering process.
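
The buffering described above can be sketched roughly as follows. `BlockAdapter` is a hypothetical type, not something from Kira or valib, and it uses two swapped buffers in place of a true circular buffer. It also processes non-overlapping blocks, whereas a real phase vocoder would typically use overlapping windows, so treat it as an illustration of the buffering idea only.

```rust
/// Hypothetical adapter (not a Kira or valib type): collects single samples
/// into fixed-size blocks and runs a block-based processor on each full block.
pub struct BlockAdapter<F: FnMut(&mut [f32])> {
    input: Vec<f32>,  // samples waiting for a full block
    output: Vec<f32>, // last processed block, read back sample by sample
    pos: usize,       // shared read/write position inside the block
    process_block: F, // the buffered algorithm; it only ever sees full blocks
}

impl<F: FnMut(&mut [f32])> BlockAdapter<F> {
    pub fn new(block_size: usize, process_block: F) -> Self {
        assert!(block_size > 0, "block size must be non-zero");
        Self {
            input: vec![0.0; block_size],
            output: vec![0.0; block_size],
            pos: 0,
            process_block,
        }
    }

    /// Latency introduced by the buffering, in samples.
    pub fn latency(&self) -> usize {
        self.input.len()
    }

    /// Pushes one input sample and returns one output sample, delayed by one block.
    pub fn process_sample(&mut self, sample: f32) -> f32 {
        let out = self.output[self.pos];
        self.input[self.pos] = sample;
        self.pos += 1;
        if self.pos == self.input.len() {
            // The buffer is full: process it in place, then swap it in as the output.
            (self.process_block)(&mut self.input);
            std::mem::swap(&mut self.input, &mut self.output);
            self.pos = 0;
        }
        out
    }
}
```

The closure passed to `new` never has to deal with partial blocks, which is the separation of concerns described above. The cost is one block of latency.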

If I may shoehorn my own library, here's an example (which uses two slices instead of a circular buffer, but the process is the same): https://github.com/SolarLiner/valib/blob/febaa5bab58d103fa31b4a06cdb63ec3a330815d/src/dsp/mod.rs#L186

@jpedrick
Author

jpedrick commented May 8, 2024

@SolarLiner, that sounds correct, and it looks like you have a lot more experience with writing this kind of transformation. Do you think you could submit a PR or describe, within the framework of Kira's existing code, what you would expect for that kind of plugin architecture?

I picture two use cases.

  1. My current use case, which is taking samples from instruments and changing the pitch or stretching/compressing without changing the pitch. These would ideally be preprocessed and kept as assets in memory.
  2. Applying a transform to a stream, for example to change the pitch of someone's voice or apply a filter to a track (many audio sources summed together).

For 1, I would want something where I can transform a StaticSoundData and get a new StaticSoundData which has the transformation applied.

@SolarLiner
Contributor

If you want to pre-process a sound then your best bet is to use another tool (either graphical or CLI) for that, because the pitch shifting algorithms there are most probably more robust and reliable, not to mention that some of the best implementations are paid, so using a paid tool will probably give you the best results (look up Elastique by zplane for the library, used in tools like FL Studio, Ableton, Traktor, etc.). For an open-source alternative, check out Rubberband. It's always going to be better to do this processing ahead of time, where the best quality parameters can be used for the best output. I personally don't see any advantage to doing it at runtime and in memory, rather than having the files already processed, ready to be loaded.

Your next best bet is to write an integration to Rubberband as a Kira effect -- you won't be able to time stretch the files, but you will be able to pitch-shift. Rubberband will probably expect a buffer of audio samples as input, which means you would also need to provide an adapter that does the buffering that I explained above. You can even make the buffering code generic, so that it becomes a normal Kira effect that wraps buffered effects.

If you want to stay with pure Rust solutions, then I don't know of any high-quality time-stretching/pitch-shifting crates.

Note that using StaticSoundData as the input and output types for the transform isn't ideal, as they are just holding the file contents, and have not yet been decoded into audio data. A better solution would be to have a function that takes in a Vec<f32> and returns another one, as it's more generic. I don't think StaticSoundData has a way to be created with raw audio data, though, which is a bummer.

@tesselode
Owner

> Note that using StaticSoundData as the input and output types for the transform isn't ideal, as they are just holding the file contents, and have not yet been decoded into audio data. A better solution would be to have a function that takes in a Vec<f32> and returns another one, as it's more generic. I don't think StaticSoundData has a way to be created with raw audio data, though, which is a bummer.

Do you mean StreamingSoundData? StaticSoundData is raw audio and can be constructed manually.

@SolarLiner
Contributor

Oh I didn't see that the data fields were public (I need to change my handling in bevy-kira-components 😆 )

@jpedrick
Author

jpedrick commented May 9, 2024

> If you want to pre-process a sound then your best bet is to use another tool (either graphical or CLI) for that, because the pitch shifting algorithms there are most probably more robust and reliable, not to mention that some of the best implementations are paid, so using a paid tool will probably give you the best results (look up Elastique by zplane for the library, used in tools like FL Studio, Ableton, Traktor, etc.). For an open-source alternative, check out Rubberband. It's always going to be better to do this processing ahead of time, where the best quality parameters can be used for the best output. I personally don't see any advantage to doing it at runtime and in memory, rather than having the files already processed, ready to be loaded.

Yeah, so in my case, I'm essentially wanting to build something like a MIDI instrument in Rust that can play sounds from samples, stretching them or changing the pitch. I would know what notes will be played in a just-in-time manner, but pre-rendering all the samples would be a bit of an i * j * k combinatorial problem. I may also want to apply additional effects (for example, as in an app like sforzando).

> Your next best bet is to write an integration to Rubberband as a Kira effect -- you won't be able to time stretch the files, but you will be able to pitch-shift. Rubberband will probably expect a buffer of audio samples as input, which means you would also need to provide an adapter that does the buffering that I explained above. You can even make the buffering code generic, so that it becomes a normal Kira effect that wraps buffered effects.
>
> If you want to stay with pure Rust solutions, then I don't know of any high-quality time-stretching/pitch-shifting crates.

I'm not set on a pure-Rust solution. Rubberband looks like an appropriate tool. Can we discuss what changes would be needed to Kira and what an implementation might look like to integrate Rubberband to do pitch changes and time stretching?

I'd like to break ground on this, I just need a bit of direction on where to start.

@tesselode @SolarLiner

@SolarLiner
Contributor

Kira is a game audio engine, not a general-purpose one, and as such you'll run into obstacles. Again, @tesselode is the arbiter here, but I don't think it would make sense to add those features in for this reason.

That being said, sample players (like the SFZ players mentioned above) don't do time-stretching when pitch-shifting; they simply change the playback rate, which is already what's implemented in Kira. The reason it doesn't sound bad with SFZ instruments is that they're multi-sampled: you have different files depending on pitch, velocity, and/or other parameters (one of my piano SFZ libraries has 3 velocity layers per note, all 88 notes, with sustain pedal on and off, and from two mics. That is 3 * 88 * 2 * 2 = 1056 wav files, for a grand total of 12 GB of audio files :)
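
To make the pitch/duration coupling of a plain rate change concrete, here is a small sketch of offline resampling with linear interpolation. The function name `resample_at_rate` and the choice of linear interpolation are assumptions for illustration; this is not Kira's resampling code.

```rust
/// Resamples `input` as if it were played back at `rate` (2.0 = one octave up
/// and half as long, 0.5 = one octave down and twice as long), using linear
/// interpolation between neighbouring samples.
pub fn resample_at_rate(input: &[f32], rate: f64) -> Vec<f32> {
    assert!(rate > 0.0, "playback rate must be positive");
    if input.is_empty() {
        return Vec::new();
    }
    let last = input.len() - 1;
    // Output length shrinks or grows by the same factor as the pitch changes.
    let out_len = (input.len() as f64 / rate).ceil() as usize;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * rate; // fractional read position in the input
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            let a = input[idx.min(last)];
            let b = input[(idx + 1).min(last)]; // clamp at the final sample
            a + (b - a) * frac
        })
        .collect()
}
```

Doubling the rate halves the number of output samples, which is why a single recording only sounds natural within a limited range around its root note.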

All that is to say, with sampling, the combinatorial problem is very much real.

For additional effects like delay, chorus or reverb, those are already possible (and some already available) in Kira.

If you really want to choose Kira for this anyway, your first step is going to be implementing a multi-sample player that can select the right sample to play based on parameters of your choosing (the most commonly used being pitch), and do relative playback rate changes based on the root note of the selected sample, to make it match the key being played.
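
A rough sketch of that selection step, assuming MIDI note numbers and 12-tone equal temperament. `SampleZone`, `pick_zone`, and `playback_rate` are hypothetical names, not Kira types; the resulting factor would be handed to whatever playback-rate setting the engine exposes.

```rust
/// One recorded sample in a multi-sample instrument (hypothetical type).
pub struct SampleZone {
    pub root_note: u8,      // MIDI note number the recording was made at
    pub name: &'static str, // placeholder for the actual audio data or file path
}

/// Picks the zone whose root note is closest to the requested note.
/// On a tie, the zone that appears first in the slice is chosen.
pub fn pick_zone(zones: &[SampleZone], note: u8) -> Option<&SampleZone> {
    zones
        .iter()
        .min_by_key(|z| (i32::from(z.root_note) - i32::from(note)).abs())
}

/// Playback rate that shifts a recording from `root_note` to `note`
/// in 12-tone equal temperament: 2^(semitones / 12).
pub fn playback_rate(root_note: u8, note: u8) -> f64 {
    let semitones = f64::from(note) - f64::from(root_note);
    2f64.powf(semitones / 12.0)
}
```

For example, with zones recorded at notes 48, 60, and 72, a request for note 64 selects the zone at 60 and plays it four semitones up, at a rate of roughly 1.26. Selecting by velocity or other parameters would extend the key used in `pick_zone`.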

@jpedrick
Author

jpedrick commented May 9, 2024

@SolarLiner I see, that makes a lot of sense (just changing the playback rate) and sounds like it'll do what I need. I had noticed the SFZ files were multi-sampled. So it would make sense to preprocess the frequency shifts to generate higher/lower frequency samples using another tool and then use Kira to get the specific pitch shifts and add effects. I'll give that a try! Hopefully I won't need a full 12 GB of sample data.

I am using Kira in a game through bevy_kira_audio, but the game has a MIDI-like component to it. I guess even for a game-oriented library, FFT-based transformations would be a cool feature to have. So it might be worth continuing the discussion on how to add a plugin interface to Kira that would support transformations from a circular buffer.

As is, your explanation on SFZ should allow me to do what I'm trying to accomplish for now.
