Merge branch 'master' into patch-1
GreyAlien502 committed Apr 19, 2021
2 parents 5658398 + 72b474e commit ebdedc8
Showing 14 changed files with 449 additions and 36 deletions.
7 changes: 4 additions & 3 deletions .travis.yml
@@ -1,15 +1,16 @@
sudo: required
dist: bionic
os: linux
dist: bionic # focal
language: python
before_install:
- sudo apt-get update --fix-missing
install:
- sudo apt-get install -y ffmpeg libopus-dev python-scipy python3-scipy
python:
- "2.7"
- "3.5"
- "3.6"
- "3.7"
- "3.8"
- "3.9"
- "pypy2"
- "pypy3"
script:
114 changes: 112 additions & 2 deletions API.markdown
@@ -10,7 +10,6 @@ Currently Undocumented:
- Signal Processing (compression, EQ, normalize, speed change - `pydub.effects`, `pydub.scipy_effects`)
- Signal generators (Sine, Square, Sawtooth, Whitenoise, etc - `pydub.generators`)
- Effect registration system (basically the `pydub.utils.register_pydub_effect` decorator)
- Silence utilities (detect silence, split on silence, etc - `pydub.silence`)


## AudioSegment()
@@ -91,14 +90,18 @@ The first argument is the path (as a string) of the file to read, **or** a file

**Supported keyword arguments**:

- `format` | example: `"aif"` | default: `"mp3"`
- `format` | example: `"aif"` | default: autodetected
Format of the output file. Supports `"wav"` and `"raw"` natively, requires ffmpeg for all other formats. `"raw"` files require 3 additional keyword arguments, `sample_width`, `frame_rate`, and `channels`, denoted below with: **`raw` only**. This extra info is required because raw audio files do not have headers to include this info in the file itself like wav files do.
- `sample_width` | example: `2`
  **`raw` only**: Use `1` for 8-bit audio, `2` for 16-bit (CD quality), and `4` for 32-bit. It’s the number of bytes per sample.
- `channels` | example: `1`
  **`raw` only**: `1` for mono, `2` for stereo.
- `frame_rate` | example: `44100`
  **`raw` only**: Also known as sample rate; common values are `44100` (44.1kHz - CD audio) and `48000` (48kHz - DVD audio).
- `start_second` | example: `2.0` | default: `None`
Offset (in seconds) to start loading the audio file. If `None`, the audio will start loading from the beginning.
- `duration` | example: `2.5` | default: `None`
Number of seconds to be loaded. If `None`, full audio will be loaded.
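
As a rough illustration of how `start_second` and `duration` relate to millisecond slicing, here is a minimal sketch. The helper `slice_bounds_ms` is hypothetical (it is not part of pydub), and a plain list stands in for an `AudioSegment`, whose slices are also indexed in milliseconds.

```python
def slice_bounds_ms(start_second=None, duration=None):
    """Return (start_ms, end_ms) slice bounds; None means open-ended."""
    start_ms = None if start_second is None else int(start_second * 1000)
    if duration is None:
        end_ms = None
    else:
        # With no explicit start, the duration is measured from 0.
        end_ms = int(((start_second or 0) + duration) * 1000)
    return start_ms, end_ms


# Stand-in for audio: one list item per millisecond (10 seconds total).
samples = list(range(10000))
start, end = slice_bounds_ms(start_second=2.0, duration=2.5)
clip = samples[start:end]
print(start, end, len(clip))
# 2000 4500 2500
```

Because `None` is a valid slice bound in Python, the four combinations of "given" and "not given" collapse into a single `samples[start:end]` expression.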


### AudioSegment(…).export()
@@ -553,6 +556,33 @@ shifted_samples_array = array.array(sound.array_type, shifted_samples)
new_sound = sound._spawn(shifted_samples_array)
```

Here's how to convert to a numpy float32 array:

```python
import numpy as np
from pydub import AudioSegment

sound = AudioSegment.from_file("sound1.wav")
sound = sound.set_frame_rate(16000)
channel_sounds = sound.split_to_mono()
samples = [s.get_array_of_samples() for s in channel_sounds]

fp_arr = np.array(samples).T.astype(np.float32)
fp_arr /= np.iinfo(samples[0].typecode).max
```

And how to convert it back to an AudioSegment:

```python
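import io
import scipy.io.wavfile
from pydub import AudioSegment

# fp_arr is the float32 array from the previous example (16 kHz frame rate)
wav_io = io.BytesIO()
scipy.io.wavfile.write(wav_io, 16000, fp_arr)
wav_io.seek(0)
sound = AudioSegment.from_wav(wav_io)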
```
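
The scaling step in the numpy example (dividing by the integer type's maximum) can also be sketched with only the standard library. This is an illustrative assumption about the normalization, not pydub code; `to_unit_floats` is a hypothetical helper name.

```python
import array


def to_unit_floats(samples):
    """Scale signed integer samples to floats in roughly [-1.0, 1.0]."""
    # Largest positive value for this sample width, e.g. 32767 for 16-bit.
    max_value = 2 ** (8 * samples.itemsize - 1) - 1
    return [s / max_value for s in samples]


pcm16 = array.array("h", [0, 16384, 32767, -32768])
floats = to_unit_floats(pcm16)
print(floats[2])
# 1.0
```

Note that the most negative sample maps to slightly below -1.0, because signed integer types have one more negative value than positive.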

### AudioSegment(…).get_dc_offset()

Returns a value between -1.0 and 1.0 representing the DC offset of a channel. This is calculated using `audioop.avg()` and normalizing the result by the maximum sample value.
@@ -581,3 +611,83 @@ Collection of DSP effects that are implemented by `AudioSegment` objects.
### AudioSegment(…).invert_phase()

Makes a copy of this `AudioSegment` and inverts the phase of the signal. Can generate anti-phase waves for noise suppression or cancellation.

## Silence

Various functions for finding/manipulating silence in AudioSegments. For creating silent AudioSegments, see AudioSegment.silent().

### silence.detect_silence()

Returns a list of all silent sections [start, end] in milliseconds of audio_segment. Inverse of detect_nonsilent(). Can be very slow since it has to iterate over the whole segment.

```python
from pydub import AudioSegment, silence

print(silence.detect_silence(AudioSegment.silent(2000)))
# [[0, 2000]]
```

**Supported keyword arguments**:

- `min_silence_len` | example: `500` | default: 1000
The minimum length for silent sections in milliseconds. If it is greater than the length of the audio segment an empty list will be returned.

- `silence_thresh` | example: `-20` | default: -16
The upper bound for how quiet is silent in dBFS.

- `seek_step` | example: `5` | default: 1
Size of the step for checking for silence in milliseconds. Smaller is more precise. Must be a positive whole number.
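
To make the `silence_thresh` unit concrete: dBFS is measured relative to full scale, so 0 dBFS is the loudest representable level and each step of -20 dB divides the amplitude by 10. The sketch below assumes the usual 20·log10 amplitude convention and a full-scale value of 2^(bits-1); `dbfs_to_amplitude` is a hypothetical helper, not a pydub function.

```python
def dbfs_to_amplitude(dbfs, sample_width=2):
    """Convert a dBFS level to a linear amplitude for a sample width in bytes."""
    max_possible_amplitude = 2 ** (8 * sample_width) / 2  # 32768.0 for 16-bit
    return (10 ** (dbfs / 20)) * max_possible_amplitude


threshold = dbfs_to_amplitude(-20)
print(round(threshold, 1))
# 3276.8
```

A lower (more negative) `silence_thresh` therefore means audio has to be quieter before it counts as silence.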

### silence.detect_nonsilent()

Returns a list of all nonsilent sections [start, end] in milliseconds of audio_segment. Inverse of detect_silence() and has all the same arguments. Can be very slow since it has to iterate over the whole segment.

**Supported keyword arguments**:

- `min_silence_len` | example: `500` | default: 1000
The minimum length for silent sections in milliseconds. If it is greater than the length of the audio segment an empty list will be returned.

- `silence_thresh` | example: `-20` | default: -16
The upper bound for how quiet is silent in dBFS.

- `seek_step` | example: `5` | default: 1
Size of the step for checking for silence in milliseconds. Smaller is more precise. Must be a positive whole number.
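
The "inverse" relationship between the two functions can be illustrated by taking the complement of a list of silent ranges. This is a sketch under the assumption that the ranges are sorted and non-overlapping; `invert_ranges` is a hypothetical name and not pydub's implementation.

```python
def invert_ranges(silent_ranges, total_len):
    """Return the [start, end] gaps between silent ranges within [0, total_len]."""
    nonsilent = []
    prev_end = 0
    for start, end in silent_ranges:
        if start > prev_end:
            nonsilent.append([prev_end, start])
        prev_end = end
    # Anything after the last silent range is nonsilent too.
    if prev_end < total_len:
        nonsilent.append([prev_end, total_len])
    return nonsilent


print(invert_ranges([[0, 500], [1200, 1500]], 2000))
# [[500, 1200], [1500, 2000]]
```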

### silence.split_on_silence()

Returns list of audio segments from splitting audio_segment on silent sections.

**Supported keyword arguments**:

- `min_silence_len` | example: `500` | default: 1000
The minimum length for silent sections in milliseconds. If it is greater than the length of the audio segment an empty list will be returned.

- `silence_thresh` | example: `-20` | default: -16
The upper bound for how quiet is silent in dBFS.

- `seek_step` | example: `5` | default: 1
Size of the step for checking for silence in milliseconds. Smaller is more precise. Must be a positive whole number.

- `keep_silence` | example: `True` | default: 100
  How much silence to keep, in milliseconds, or a bool. Leaves some silence at the beginning and end of the chunks, which keeps the sound from sounding like it is abruptly cut off.
  When the length of the silence is less than the `keep_silence` duration, it is split evenly between the preceding and following non-silent segments.
  If `True` is specified, all the silence is kept; if `False`, none is kept.
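
The even split described for `keep_silence` can be sketched as follows. This is an illustration for the integer (millisecond) case only, working on plain `[start, end]` lists; `pad_ranges` is a hypothetical helper and the bool forms of `keep_silence` are not handled here.

```python
def pad_ranges(nonsilent_ranges, keep_silence, total_len):
    """Pad each [start, end] range by keep_silence ms, sharing short gaps evenly."""
    padded = [[start - keep_silence, end + keep_silence]
              for start, end in nonsilent_ranges]
    # Where two padded ranges overlap, meet in the middle of the overlap.
    for current, following in zip(padded, padded[1:]):
        if following[0] < current[1]:
            midpoint = (current[1] + following[0]) // 2
            current[1] = midpoint
            following[0] = midpoint
    # Clamp to the bounds of the audio.
    return [[max(start, 0), min(end, total_len)] for start, end in padded]


print(pad_ranges([[100, 200], [250, 400]], keep_silence=100, total_len=500))
# [[0, 225], [225, 500]]
```

In this example the 50 ms gap between the two chunks is shorter than the padding requested on each side, so the gap is divided at its midpoint (225 ms) instead of being duplicated in both chunks.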

### silence.detect_leading_silence()

Returns the millisecond/index at which the leading silence ends. If there is no end it will return the length of the audio_segment.

```python
from pydub import AudioSegment, silence

print(silence.detect_leading_silence(AudioSegment.silent(2000)))
# 2000
```

**Supported keyword arguments**:

- `silence_thresh` | example: `-20` | default: -50
The upper bound for how quiet is silent in dBFS.

- `chunk_size` | example: `5` | default: 10
Size of the step for checking for silence in milliseconds. Smaller is more precise. Must be a positive whole number.
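
The chunk-by-chunk scan implied by `chunk_size` might look roughly like the sketch below. It is a simplification under stated assumptions: the input is a hypothetical list with one loudness value (in dBFS) per millisecond, and a chunk counts as silent only if its loudest millisecond is below the threshold. `leading_silence_ms` is not a pydub function.

```python
def leading_silence_ms(levels_dbfs, silence_thresh=-50.0, chunk_size=10):
    """Return how many leading milliseconds are below silence_thresh."""
    assert chunk_size > 0
    trim_ms = 0
    while trim_ms < len(levels_dbfs):
        chunk = levels_dbfs[trim_ms:trim_ms + chunk_size]
        # Stop at the first chunk containing anything at or above the threshold.
        if max(chunk) >= silence_thresh:
            break
        trim_ms += chunk_size
    # Never report more silence than there is audio.
    return min(trim_ms, len(levels_dbfs))


quiet_then_loud = [-80.0] * 30 + [-10.0] * 20
print(leading_silence_ms(quiet_then_loud))
# 30
```

The result is always a multiple of `chunk_size` (or the total length), which is why a smaller `chunk_size` gives a more precise answer at the cost of more iterations.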
12 changes: 12 additions & 0 deletions AUTHORS
@@ -84,3 +84,15 @@ Carlos del Castillo

Yudong Sun
github: sunjerry019

Jorge Perianez
github: JPery

Chendi Luo
github: Creonalia

Daniel Lefevre
github: dplefevre

Grzegorz Kotfis
github: gkotfis
12 changes: 11 additions & 1 deletion CHANGELOG.md
@@ -1,5 +1,15 @@
# on master
# v0.25.1
- Fix crashing bug in new scipy-powered EQ effects

# v0.25.0
- Don't show a runtime warning about the optional ffplay dependency being missing until someone tries to use it
- Documentation improvements
- Python 3.9 support
- Improved efficiency of loading wave files with `pydub.AudioSegment.from_file()`
- Ensure `pydub.AudioSegment().export()` always returns files with a seek position at the beginning of the file
- Added more EQ effects to `pydub.scipy_effects` (requires scipy to be installed)
- Fix a packaging bug where the LICENSE file was not included in the source distribution
- Add a way to instantiate a `pydub.AudioSegment()` with a portion of an audio file via `pydub.AudioSegment().from_file()`

# v0.24.1
- Fix bug where ffmpeg errors in Python 3 are illegible
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -0,0 +1 @@
include LICENSE
6 changes: 3 additions & 3 deletions README.markdown
@@ -210,12 +210,12 @@ Mac (using [homebrew](http://brew.sh)):

```bash
# libav
brew install libav --with-libvorbis --with-sdl --with-theora
brew install libav

#### OR #####

# ffmpeg
brew install ffmpeg --with-libvorbis --with-sdl2 --with-theora
brew install ffmpeg
```

Linux (using aptitude):
@@ -249,7 +249,7 @@ some of a number of potential codecs (see page 3 of the rfc) that can be used fo
encapsulated data.

When no codec is specified exporting to `ogg` will _default_ to using `vorbis`
as a convinence. That is:
as a convenience. That is:

```python
from pydub import AudioSegment
2 changes: 1 addition & 1 deletion appveyor.yml
@@ -21,7 +21,7 @@ install:
- "%PYTHON%/python.exe -m pip install wheel"
- "%PYTHON%/python.exe -m pip install -e ."
# Install ffmpeg
- ps: Start-FileDownload ('https://ffmpeg.zeranoe.com/builds/win64/shared/ffmpeg-' + $env:FFMPEG + '-win64-shared.zip') ffmpeg-shared.zip
- ps: Start-FileDownload ('https://github.com/advancedfx/ffmpeg.zeranoe.com-builds-mirror/releases/download/20200915/ffmpeg-' + $env:FFMPEG + '-win64-shared.zip') ffmpeg-shared.zip
- 7z x ffmpeg-shared.zip > NULL
- "SET PATH=%cd%\\ffmpeg-%FFMPEG%-win64-shared\\bin;%PATH%"
# check ffmpeg installation (also shows version)
85 changes: 72 additions & 13 deletions pydub/audio_segment.py
@@ -503,7 +503,7 @@ def from_mono_audiosegments(cls, *mono_segments):
)

@classmethod
def from_file_using_temporary_files(cls, file, format=None, codec=None, parameters=None, **kwargs):
def from_file_using_temporary_files(cls, file, format=None, codec=None, parameters=None, start_second=None, duration=None, **kwargs):
orig_file = file
file, close_file = _fd_or_path_or_tempfile(file, 'rb', tempfile=False)

@@ -526,7 +526,14 @@ def is_format(f):
obj = cls._from_safe_wav(file)
if close_file:
file.close()
return obj
if start_second is None and duration is None:
return obj
elif start_second is not None and duration is None:
return obj[start_second*1000:]
elif start_second is None and duration is not None:
return obj[:duration*1000]
else:
return obj[start_second*1000:(start_second+duration)*1000]
except:
file.seek(0)
elif is_format("raw") or is_format("pcm"):
@@ -542,7 +549,14 @@ def is_format(f):
obj = cls(data=file.read(), metadata=metadata)
if close_file:
file.close()
return obj
if start_second is None and duration is None:
return obj
elif start_second is not None and duration is None:
return obj[start_second * 1000:]
elif start_second is None and duration is not None:
return obj[:duration * 1000]
else:
return obj[start_second * 1000:(start_second + duration) * 1000]

input_file = NamedTemporaryFile(mode='wb', delete=False)
try:
@@ -581,10 +595,17 @@ def is_format(f):
conversion_command += [
"-i", input_file.name, # input_file options (filename last)
"-vn", # Drop any video streams if there are any
"-f", "wav", # output options (filename last)
output.name
"-f", "wav" # output options (filename last)
]

if start_second is not None:
conversion_command += ["-ss", str(start_second)]

if duration is not None:
conversion_command += ["-t", str(duration)]

conversion_command += [output.name]

if parameters is not None:
# extend arguments with arbitrary set
conversion_command.extend(parameters)
@@ -610,10 +631,18 @@ def is_format(f):
os.unlink(input_file.name)
os.unlink(output.name)

return obj
if start_second is None and duration is None:
return obj
elif start_second is not None and duration is None:
return obj[0:]
elif start_second is None and duration is not None:
return obj[:duration * 1000]
else:
return obj[0:duration * 1000]


@classmethod
def from_file(cls, file, format=None, codec=None, parameters=None, **kwargs):
def from_file(cls, file, format=None, codec=None, parameters=None, start_second=None, duration=None, **kwargs):
orig_file = file
try:
filename = fsdecode(file)
@@ -637,7 +666,14 @@ def is_format(f):

if is_format("wav"):
try:
return cls._from_safe_wav(file)
if start_second is None and duration is None:
return cls._from_safe_wav(file)
elif start_second is not None and duration is None:
return cls._from_safe_wav(file)[start_second*1000:]
elif start_second is None and duration is not None:
return cls._from_safe_wav(file)[:duration*1000]
else:
return cls._from_safe_wav(file)[start_second*1000:(start_second+duration)*1000]
except:
file.seek(0)
elif is_format("raw") or is_format("pcm"):
@@ -650,7 +686,14 @@ def is_format(f):
'channels': channels,
'frame_width': channels * sample_width
}
return cls(data=file.read(), metadata=metadata)
if start_second is None and duration is None:
return cls(data=file.read(), metadata=metadata)
elif start_second is not None and duration is None:
return cls(data=file.read(), metadata=metadata)[start_second*1000:]
elif start_second is None and duration is not None:
return cls(data=file.read(), metadata=metadata)[:duration*1000]
else:
return cls(data=file.read(), metadata=metadata)[start_second*1000:(start_second+duration)*1000]

conversion_command = [cls.converter,
'-y', # always overwrite existing files
@@ -703,10 +746,17 @@ def is_format(f):

conversion_command += [
"-vn", # Drop any video streams if there are any
"-f", "wav", # output options (filename last)
"-"
"-f", "wav" # output options (filename last)
]

if start_second is not None:
conversion_command += ["-ss", str(start_second)]

if duration is not None:
conversion_command += ["-t", str(duration)]

conversion_command += ["-"]

if parameters is not None:
# extend arguments with arbitrary set
conversion_command.extend(parameters)
@@ -726,12 +776,20 @@ def is_format(f):

p_out = bytearray(p_out)
fix_wav_headers(p_out)
obj = cls._from_safe_wav(BytesIO(p_out))
p_out = bytes(p_out)
obj = cls(p_out)

if close_file:
file.close()

return obj
if start_second is None and duration is None:
return obj
elif start_second is not None and duration is None:
return obj[0:]
elif start_second is None and duration is not None:
return obj[:duration * 1000]
else:
return obj[0:duration * 1000]

@classmethod
def from_mp3(cls, file, parameters=None):
@@ -839,6 +897,7 @@ def export(self, out_f=None, format='mp3', codec=None, bitrate=None, parameters=

# for easy wav files, we're done (wav data is written directly to out_f)
if easy_wav:
out_f.seek(0)
return out_f

output = NamedTemporaryFile(mode="w+b", delete=False)
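
For readers following the `from_file` hunks in this file, the way the optional `-ss` (seek offset) and `-t` (duration) arguments are appended before the output name can be sketched in isolation. This is a simplified illustration, not the code from the commit: `build_conversion_command` and the file names are hypothetical, and the real method also handles codecs, extra `parameters`, and input format flags.

```python
def build_conversion_command(converter, input_name, output_name,
                             start_second=None, duration=None):
    """Assemble an ffmpeg-style argument list with optional trimming."""
    command = [converter, "-y", "-i", input_name, "-vn", "-f", "wav"]
    if start_second is not None:
        command += ["-ss", str(start_second)]  # seek offset in seconds
    if duration is not None:
        command += ["-t", str(duration)]       # length to keep, in seconds
    command += [output_name]                   # output filename goes last
    return command


print(build_conversion_command("ffmpeg", "in.mp3", "out.wav", 2.0, 2.5))
# ['ffmpeg', '-y', '-i', 'in.mp3', '-vn', '-f', 'wav', '-ss', '2.0', '-t', '2.5', 'out.wav']
```

Since the converter has already applied the offset in this path, the decoded audio starts at 0, which is consistent with the hunks slicing from `0` rather than from `start_second * 1000` after conversion.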