Improved detect silence #745

lumip · 2023-07-23T15:10:52Z

Overview

Reimplementation of detect_silence: Previously this function would invoke RMS computations independently for each slice of min_silence_len in the given audio segment, which leads to a lot of recomputation of similar values if the seek_step is small. The new implementation avoids this, resulting in much shorter detection times.
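To illustrate the idea of avoiding recomputation, here is a minimal sketch that derives the RMS of every window from a single cumulative sum of squared samples. It is not the code from this PR: the function name `sliding_window_rms` is hypothetical, and it works on a plain array of samples with window sizes given in samples rather than milliseconds.

```python
import numpy as np


def sliding_window_rms(samples, window, step=1):
    """Return (starts, rms) for every window of `window` samples, taken every `step` samples.

    Hypothetical helper for illustration only; not part of pydub or this PR.
    """
    squares = np.asarray(samples, dtype=np.float64) ** 2
    # prefix[i] holds the sum of the first i squared samples, so each window's
    # sum of squares is a difference of two prefix entries instead of a new pass.
    prefix = np.concatenate(([0.0], np.cumsum(squares)))
    starts = np.arange(0, len(squares) - window + 1, step)
    window_sums = prefix[starts + window] - prefix[starts]
    return starts, np.sqrt(window_sums / window)


starts, rms = sliding_window_rms([3, 4, 3, 4], window=2)
```

With this layout each sample is squared and summed once, regardless of how small the step is; only the final subtraction and square root are done per window.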

Caveats

This introduces numpy as a new dependency. This is for two reasons:

it makes the computation easy to express
it is very performant due to numpy being highly optimized for computations on large numeric arrays

While implementing this without numpy would be possible, it would likely not see the same performance increase or ease of implementation.

detect_silence previously used audioop to compute RMS values of slices, which rounds the computed value down to the nearest integer, while the silence threshold is not rounded. This is no longer the case in the new implementation, so some slices that were previously detected as silent are no longer detected as such. In practice this means that detected silent regions might be slightly shorter than before (usually by one or two seek_steps).
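A small worked example of this effect, as a sketch under stated assumptions: the rounding is emulated with `int()` rather than by calling audioop, the audio is assumed to be 16-bit (maximum amplitude 32768), the comparison is assumed to be `rms <= threshold`, and `db_to_float` here is a local stand-in using the amplitude convention.

```python
import math


def db_to_float(db):
    # Local stand-in (assumption): amplitude convention, 20 dB per factor of 10.
    return 10 ** (db / 20)


MAX_AMPLITUDE = 32768  # assumed 16-bit audio
threshold = db_to_float(-50) * MAX_AMPLITUDE  # about 103.62, never rounded

samples = [104, 104, 104, 103]
exact_rms = math.sqrt(sum(s * s for s in samples) / len(samples))  # about 103.75
floored_rms = int(exact_rms)  # 103, emulating the integer rounding-down

old_is_silent = floored_rms <= threshold  # rounded value falls below the threshold
new_is_silent = exact_rms <= threshold    # exact value sits just above it
```

Here the same slice counts as silent with the rounded RMS and as non-silent with the exact one, which is the kind of borderline slice that can shorten a detected region.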

Performance results

%timeit results on audio segments consisting mostly of silence

20 minute segment

# old
> %timeit detect_silence(aus_short, silence_thresh=-50, seek_step=1)
1min 36s ± 914 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# new
> %timeit detect_silence(aus_short, silence_thresh=-50, seek_step=1)
2.66 s ± 20.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

~114 minute segment

# old
> %timeit detect_silence(aus, silence_thresh=-50, seek_step=1)
8min 37s ± 10.5 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

# new
> %timeit detect_silence(aus, silence_thresh=-50, seek_step=1)
15 s ± 392 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

detect_silence finds separate slices of silence and in a final step combines consecutive silent slices into ranges of continuous silence. The added tests specifically ensure that this combination step works correctly.

Previously, detect_silence would collect all slices of min_silence_len in a list, then process that list to merge consecutive slices into continuous silent ranges. This change performs the merging immediately when silence is detected for a slice, eliminating the need for a second pass over the list and the memory overhead associated with the internal list of silent slices.
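The single-pass merge described above could look roughly like the following sketch. The function name `merge_silent_slices` and the rule that a slice extends the current range when it starts at or before that range's end are assumptions for illustration; the actual adjacency condition in the PR may differ.

```python
def merge_silent_slices(silent_starts, min_silence_len):
    """Merge silent slices into continuous ranges as they are encountered.

    `silent_starts` is any iterable of ascending slice start positions (ms);
    each slice covers [start, start + min_silence_len]. Hypothetical helper.
    """
    ranges = []
    for start in silent_starts:
        end = start + min_silence_len
        if ranges and start <= ranges[-1][1]:
            # Slice touches or overlaps the current range: extend it in place.
            ranges[-1][1] = end
        else:
            # Gap since the last range: open a new one.
            ranges.append([start, end])
    return ranges


merged = merge_silent_slices([0, 1, 2, 10, 11, 50], min_silence_len=5)
```

Because `silent_starts` can be a generator, no intermediate list of individual silent slices has to be kept; only the merged ranges are stored.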

Use numpy to compute RMS for silence detection to reduce redundant computation (and benefit from numpy's highly optimized implementation) compared to the previous implementation of detect_silence. Some caveats:
- adds numpy as a new dependency
- previously RMS values were rounded down to the nearest integer; this is no longer the case, so the borders of silence ranges may vary slightly compared to the previous implementation

emsi · 2024-02-05T12:43:45Z

pydub/silence.py


 from .utils import db_to_float


-def detect_silence(audio_segment, min_silence_len=1000, silence_thresh=-16, seek_step=1):
+def _convert_to_numpy(audio_segment):


How about adding a property to AudioSegment?
Something like:

@property
def as_numpy(self):
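A rough sketch of what such a property might look like, using a small stand-in class rather than the real AudioSegment. The class name `SegmentSketch`, its constructor arguments, and the little-endian signed sample layout are all assumptions made for illustration; none of this is existing pydub API.

```python
import numpy as np


class SegmentSketch:
    """Hypothetical stand-in for AudioSegment, holding raw interleaved PCM bytes."""

    # Assumed mapping from sample width in bytes to a little-endian signed dtype.
    _DTYPES = {2: "<i2", 4: "<i4"}

    def __init__(self, raw_data, sample_width, channels):
        self._data = raw_data
        self.sample_width = sample_width
        self.channels = channels

    @property
    def as_numpy(self):
        """Samples as a read-only array of shape (frames, channels)."""
        try:
            dtype = self._DTYPES[self.sample_width]
        except KeyError:
            raise ValueError("unsupported sample width: %r" % self.sample_width)
        # frombuffer creates a view on the bytes without copying them.
        return np.frombuffer(self._data, dtype=dtype).reshape(-1, self.channels)


raw = np.array([1, -2, 3, -4], dtype="<i2").tobytes()
stereo = SegmentSketch(raw, sample_width=2, channels=2).as_numpy
```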

emsi · 2024-02-05T15:22:41Z

It does not seem to work properly.
When run on a YouTube video (~4h length) with:
split_on_silence(audio_segment, min_silence_len=800, keep_silence=True)

It returns the following ranges (just 4 segments):
(0, 6812736, 6812736, 13615464, 13615464, 13635080, 13635080, 13677621)

When running the same file with the same arguments (min_silence_len=800, silence_thresh=-16) in Audacity it finds lots and lots of silence (and I can confirm at a glance that those findings are correct).

lumip · 2024-02-24T16:17:46Z

Hey, sorry I saw your responses a bit late just now. Could you perhaps provide a link to the video in question so that I can have a look?

emsi · 2024-02-24T17:20:39Z

I believe I was processing the audio from this video:

https://youtu.be/AY9MnQ4x3zk

BTW: I've used ffmpeg eventually. Super fast and accurate.

lumip · 2024-02-29T20:17:08Z

> It does not seem to work properly. When run on a YouTube video (~4h length) with: split_on_silence(audio_segment, min_silence_len=800, keep_silence=True)
>
> It returns the following ranges (just 4 segments): (0, 6812736, 6812736, 13615464, 13615464, 13635080, 13635080, 13677621)
>
> When running the same file with the same arguments (min_silence_len=800, silence_thresh=-16) in Audacity it finds lots and lots of silence (and I can confirm at a glance that those findings are correct).

To come back to this, I first want to point out that the changes made in this PR match the regions of silence found by the current implementation in pydub overall fairly well, although there were some larger deviations that I might look into a bit more, but I think these are all explained by the caveats I already pointed out.

With regard to the discrepancy with Audacity and ffmpeg: If I run detect_silence with silence_thresh=-32 I obtain results that also reasonably match those produced by ffmpeg with threshold -16. pydub's db_to_float applies a different conversion depending on whether the using_amplitude keyword argument is True or not: in one case the divisor applied to the passed-in decibel value is a factor of 2 larger than in the other. I therefore believe that there is a difference in the interpretation of the dB value between pydub's silence detection and that of Audacity and ffmpeg. I tried to figure out which one would be more canonical, but I couldn't find reliable definitions for dBFS that do not contradict each other.
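To make the factor of 2 concrete, here is a small sketch with a local stand-in for the conversion, assuming divisors of 20 (amplitude) and 10 (power) for the two cases. It only shows that -32 under one convention gives the same ratio as -16 under the other; it says nothing about what Audacity or ffmpeg do internally.

```python
import math


def db_to_float(db, using_amplitude=True):
    # Local stand-in (assumption): divisor 20 for amplitude ratios, 10 for power ratios.
    db = float(db)
    if using_amplitude:
        return 10 ** (db / 20)
    return 10 ** (db / 10)


amplitude_ratio = db_to_float(-32, using_amplitude=True)   # 10 ** -1.6
power_ratio = db_to_float(-16, using_amplitude=False)      # 10 ** -1.6
same_ratio = math.isclose(amplitude_ratio, power_ratio)
```

So a threshold of -16 read with the power convention corresponds to -32 with the amplitude convention, which would be consistent with the observation above.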

lumip added 3 commits July 23, 2023 17:51

Adding specific tests for detect_silence.

eef2e46


lumip force-pushed the improved_detect_silence_rebased branch from 29837f7 to 61a9459 Compare July 25, 2023 07:54

emsi reviewed Feb 5, 2024
