
Filter banks

Tim Sharii edited this page Aug 31, 2021 · 8 revisions

The static class FilterBanks provides methods for obtaining the most widely used frequency bands, as well as the general shapes of the frequency responses (weights) of filters that can be used in various filter bank systems.

Frequency bands

A frequency band is essentially a tuple of three values, (double, double, double): the left, center and right frequencies of the band. Center frequencies are distributed uniformly on one of the following well-known scales:

  • Hertz bands
  • Mel bands
  • Mel bands according to Malcolm Slaney
  • Bark bands (version 1)
  • Bark bands (version 2)
  • Critical bands
  • Octaves (as specified in MPEG-7 standard)

For each scale there's a corresponding static method (e.g. MelBands, OctaveBands, etc.).

The sampling rate must be given. The shape-generating methods described below also need the FFT size, since they must know the spectral frequency resolution in Hz.

Example:

// 12 overlapping mel bands in frequency range [0, 4000] Hz
var melBands1 = FilterBanks.MelBands(12, sampRate, 0, 4000, true);
var melBands2 = FilterBanks.MelBandsSlaney(12, sampRate, 0, 4000, true);

// 16 non-overlapping bark bands in frequency range [200, 4200] Hz
var barkBands1 = FilterBanks.BarkBands(16, sampRate, 200, 4200, false);
var barkBands2 = FilterBanks.BarkBandsSlaney(16, sampRate, 200, 4200, false);

// 10 non-overlapping critical bands in frequency range [1000, 5000] Hz
var criticalBands = FilterBanks.CriticalBands(10, sampRate, 1000, 5000);

// 6 non-overlapping octave bands in frequency range [100, samplingRate/2] Hz
var octaveBands = FilterBanks.OctaveBands(6, sampRate, 100, overlap: false);

// 3 custom bands - [100,500] Hz, [500, 1500] Hz, [1500, 3500] Hz:
var customBands = new (double, double, double)[]
{
    (100,  300,  500),
    (500,  1000, 1500),
    (1500, 2500, 3500)
};

Note. The CriticalBands method ignores the overlap flag: critical bands are always non-overlapping. In fact, all critical band frequencies are pre-computed:

double[] centerFrequencies = 
{ 
    50, 150, 250, 350, 450, 570, 700, 840, 1000, 1170, 1370,
    //...
};

This method will also ignore the filterCount parameter if it is greater than the number of actual bands in the given frequency range. The same goes for the OctaveBands() method.

By default, the low frequency is 0 and the high frequency is samplingRate/2, except for OctaveBands(), where the low frequency defaults to 62.5 Hz.

Finally, let's take a look at the variations of the Hertz/mel and Hertz/Bark mappings.

Mel frequencies are computed from Hertz frequencies by the formula:

double HerzToMel(double herz) => 1127 * Math.Log(herz / 700 + 1);
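
To make the idea of bands "distributed uniformly on the mel scale" concrete, here is a minimal sketch. It assumes that overlapping bands start at the previous band's center and non-overlapping bands start at the previous band's right edge. UniformMelBands and MelToHerz are hypothetical helpers written for this illustration; they are not the NWaves implementation.

```csharp
using System;

// Mapping from the formula above and its algebraic inverse.
double HerzToMel(double herz) => 1127 * Math.Log(herz / 700 + 1);
double MelToHerz(double mel) => 700 * (Math.Exp(mel / 1127) - 1);

// Hypothetical helper (not part of NWaves): bands whose centers are uniformly spaced in mel.
(double, double, double)[] UniformMelBands(int count, double lowFreq, double highFreq, bool overlap)
{
    double lowMel = HerzToMel(lowFreq);
    double highMel = HerzToMel(highFreq);

    // halfWidth is the distance (in mel) from a band's edge to its center.
    double halfWidth = overlap ? (highMel - lowMel) / (count + 1)
                               : (highMel - lowMel) / (2 * count);

    // Overlapping bands advance by half a band, non-overlapping bands by a full band.
    double shift = overlap ? halfWidth : 2 * halfWidth;

    var result = new (double, double, double)[count];
    for (int i = 0; i < count; i++)
    {
        double leftMel = lowMel + i * shift;
        result[i] = (MelToHerz(leftMel),
                     MelToHerz(leftMel + halfWidth),
                     MelToHerz(leftMel + 2 * halfWidth));
    }
    return result;
}

// 12 overlapping mel bands in [0, 4000] Hz, printed as "left center right".
var bands = UniformMelBands(12, 0, 4000, overlap: true);
foreach (var (lo, mid, hi) in bands)
{
    Console.WriteLine($"{lo:F1} {mid:F1} {hi:F1}");
}
```

The bands are uniform in mel but get progressively wider in Hz, which is the point of using a perceptual scale.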

Malcolm Slaney suggested another algorithm described here.

Bark frequencies are computed from Hertz frequencies by the formula [Traunmüller, 1990]:

double HerzToBark(double herz) => (26.81 * herz) / (1960 + herz) - 0.53;

In Slaney's filterbank another formula is used [Wang et al., 1992]:

double HerzToBarkSlaney(double herz) => 6 * MathUtils.Asinh(herz / 600);
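
For reference, the three mappings above can be written in mathematical notation together with their inverses. The inverses are not given in the text; they are derived here by solving each formula for f (frequency in Hz). The symbols m, z_T and z_W are notation introduced for this summary.

```latex
\begin{aligned}
m &= 1127 \ln\!\left(1 + \frac{f}{700}\right), & f &= 700\left(e^{m/1127} - 1\right) \\
z_{\mathrm{T}} &= \frac{26.81\, f}{1960 + f} - 0.53, & f &= \frac{1960\,(z_{\mathrm{T}} + 0.53)}{26.28 - z_{\mathrm{T}}} \\
z_{\mathrm{W}} &= 6 \operatorname{asinh}\!\left(\frac{f}{600}\right), & f &= 600 \sinh\!\left(\frac{z_{\mathrm{W}}}{6}\right)
\end{aligned}
```

Here m is the mel value, z_T the Bark value after Traunmüller, and z_W the Bark value after Wang et al.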

Shapes

The methods above return an array of frequency band tuples (left, center, right). The array is not very useful on its own, but it can be passed to one of the methods that generate frequency responses of a particular shape:

  • Triangular
  • Rectangular
  • Trapezoidal (slightly overlapping frequency responses of FIR-bandpass filters)
  • Overlapping frequency responses of BiQuad-bandpass filters
  • ERB

var filterbank1 = FilterBanks.Triangular(fftSize, samplingRate, melBands);
var filterbank2 = FilterBanks.Trapezoidal(fftSize, samplingRate, barkBands);
var filterbank3 = FilterBanks.BiQuad(fftSize, samplingRate, criticalBands);
var filterbank4 = FilterBanks.Rectangular(fftSize, samplingRate, octaveBands);
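
As an illustration of what a "shape" method produces, the sketch below builds triangular weights on the Hertz scale: one row per band, one weight per FFT bin. TriangularSketch is a hypothetical helper written for this page, not the library's code, and details such as edge handling or normalization may differ in NWaves.

```csharp
using System;

// Hypothetical helper (not the NWaves implementation): triangular weights on the Hertz scale.
float[][] TriangularSketch(int fftSize, int samplingRate, (double, double, double)[] bands)
{
    double resolution = (double)samplingRate / fftSize;   // Hz per FFT bin
    int binCount = fftSize / 2 + 1;                       // bins from 0 Hz to Nyquist

    var bank = new float[bands.Length][];
    for (int i = 0; i < bands.Length; i++)
    {
        var (left, center, right) = bands[i];
        bank[i] = new float[binCount];

        for (int k = 0; k < binCount; k++)
        {
            double f = k * resolution;

            if (f > left && f <= center)
                bank[i][k] = (float)((f - left) / (center - left));     // rising slope
            else if (f > center && f < right)
                bank[i][k] = (float)((right - f) / (right - center));   // falling slope
        }
    }
    return bank;
}

var customBands = new (double, double, double)[]
{
    (100, 300, 500), (500, 1000, 1500), (1500, 2500, 3500)
};

// fftSize = 512 at 16 kHz gives 31.25 Hz per bin and 257 weights per band.
var weights = TriangularSketch(512, 16000, customBands);
Console.WriteLine($"{weights.Length} filters, {weights[0].Length} weights each");
```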

There are actually two additional optional parameters in each of the methods above:

  • VtlnWarper object
  • Frequency scaler (function)

The implementation of VTLN (Vocal Tract Length Normalization) is similar to Kaldi's. Read more about it here.

Although filterbank center frequencies are computed on the scale specified by the user (e.g., mel), the filterbank weights are, by default, computed on the Hertz scale. This is how librosa works, for example. Kaldi and HTK, by contrast, compute weights on the mel scale, so the final weights will be slightly different (though quite close). You can pass a Func<double, double> (frequency scaler / mapper) to map frequency in Hz to any other scale during weighting.

For the most widely used conversions there are ready-to-use mappers in the NWaves.Util.Scale static class:

  • Scale.HerzToMel
  • Scale.HerzToBark
  • Scale.HerzToMelSlaney
  • Scale.HerzToBarkSlaney
  • Scale.HerzToErb
  • Scale.FreqToPitch

// Kaldi, HTK:
var filterbank = FilterBanks.Triangular(fftSize, samplingRate, melBands, null, Scale.HerzToMel);

// Librosa:
var filterbank = FilterBanks.Triangular(fftSize, samplingRate, melBands);


// VTLN warp factor = 0.85:
var vtln = new VtlnWarper(0.85, 0, 200, 7200, 8000);

// Kaldi, HTK with VTLN:
var filterbank = FilterBanks.Triangular(fftSize, samplingRate, melBands, vtln, Scale.HerzToMel);
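
To see why weighting on the Hertz scale and on the mel scale gives close but not identical results, consider a single point on the rising slope of a triangular filter. The sketch below is self-contained and does not call NWaves. RisingWeight is a hypothetical helper, and the band edges are arbitrary example values.

```csharp
using System;

double HerzToMel(double herz) => 1127 * Math.Log(herz / 700 + 1);

// Hypothetical helper: position of frequency f between the band's left edge and center,
// measured on whatever scale the mapper converts Hz to.
double RisingWeight(double f, double left, double center, Func<double, double> mapper) =>
    (mapper(f) - mapper(left)) / (mapper(center) - mapper(left));

double bandLeft = 500, bandCenter = 1000, freq = 750;

// Identity mapper: weights computed directly in Hz (the default behaviour described above).
double onHertz = RisingWeight(freq, bandLeft, bandCenter, x => x);

// Mel mapper: weights computed on the mel scale (the Kaldi/HTK behaviour described above).
double onMel = RisingWeight(freq, bandLeft, bandCenter, HerzToMel);

Console.WriteLine($"Hertz scale: {onHertz:F3}, mel scale: {onMel:F3}");
```

Because the mel mapping is concave, the midpoint in Hz lies slightly past the midpoint in mel, so the mel-scale weight comes out a little above 0.5.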

There are also filter banks that do not follow the general scheme described above:

  • Gammatone (ERB)
  • Slaney mel filters
  • Slaney bark filters
  • Chroma filters

For these filter banks there are no methods for generating frequency bands. Instead, all frequency responses are generated at once like this:

var filterbank = FilterBanks.Erb(filterCount, fftSize, samplingRate, lowFreq, highFreq);

Gammatone filterbank is calculated as described here:

Malcolm Slaney (1998) "Auditory Toolbox Version 2", Technical Report #1998-010, Interval Research Corporation, 1998

Similarly, there are methods returning Slaney's mel filterbank and bark filterbank in one line:

var melbank = FilterBanks.MelBankSlaney(filterCount, fftSize, samplingRate, lowFreq, highFreq);
var barkbank = FilterBanks.BarkBankSlaney(filterCount, fftSize, samplingRate, lowFreq, highFreq);

And chroma banks can be obtained like this:

var chromabank = FilterBanks.Chroma(fftSize, samplingRate, chromaCount, tuning, centerOctave, octaveWidth, norm, baseC);

Most of the parameters exist for compatibility with librosa. All but the first two are optional and have the following defaults:

var chromabank = FilterBanks.Chroma(fftSize, samplingRate);

// chromaCount = 12;
// tuning = 0.0;
// centerOctave = 5.0;
// octaveWidth = 2;
// norm = 2;
// baseC = true;

Applying filter banks

Let's see how previously evaluated filter banks can be applied to spectra and spectrograms.

The Apply() method calculates the total spectral energy in each frequency band and writes it to the output array (3rd parameter):

float[] bandEnergies = new float [filterCount];

FilterBanks.Apply(filterbank, spectrum, bandEnergies);
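
Conceptually, this amounts to a weighted sum of spectral samples per band. The following sketch shows that computation on toy data. ApplySketch is a hypothetical stand-in written for illustration, not the library method.

```csharp
using System;

// Hypothetical stand-in for Apply(): weighted sum of spectral samples in each band.
void ApplySketch(float[][] filterbank, float[] spectrum, float[] filtered)
{
    for (int i = 0; i < filterbank.Length; i++)
    {
        float sum = 0;
        for (int k = 0; k < spectrum.Length; k++)
        {
            sum += filterbank[i][k] * spectrum[k];
        }
        filtered[i] = sum;
    }
}

// Two toy filters over a 5-bin spectrum (i.e. fftSize = 8).
float[][] toyBank =
{
    new float[] { 0, 1, 0.5f, 0, 0 },
    new float[] { 0, 0, 0.5f, 1, 0 }
};
float[] toySpectrum = { 1, 2, 4, 6, 8 };
float[] energies = new float[toyBank.Length];

ApplySketch(toyBank, toySpectrum, energies);
Console.WriteLine(string.Join(", ", energies));   // 4, 8
```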

There are also a few similar methods that post-process the filtered spectra:

FilterBanks.ApplyAndLog(filterbank, spectrum, bandEnergies);
FilterBanks.ApplyAndLog10(filterbank, spectrum, bandEnergies);
FilterBanks.ApplyAndToDecibel(filterbank, spectrum, bandEnergies);
FilterBanks.ApplyAndPow(filterbank, spectrum, bandEnergies, 0.33);

These methods are used in MFCC-like feature extractors, and you can choose which post-processing scheme should be applied in a particular case. Read more about it here.
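
The sketch below shows one plausible reading of the four post-processing steps, applied to band energies that have already been computed. These are assumptions based on the method names: the decibel variant is taken to be 10·log10 (energies treated as power-like values), and the library may additionally clamp very small values before taking logarithms.

```csharp
using System;

float[] bandEnergies = { 1f, 10f, 100f };

// Assumed meaning of each post-processing step (not taken from the NWaves source).
float[] logged   = Array.ConvertAll(bandEnergies, e => (float)Math.Log(e));          // natural log
float[] logged10 = Array.ConvertAll(bandEnergies, e => (float)Math.Log10(e));        // base-10 log
float[] decibel  = Array.ConvertAll(bandEnergies, e => (float)(10 * Math.Log10(e))); // power in dB
float[] powered  = Array.ConvertAll(bandEnergies, e => (float)Math.Pow(e, 0.33));    // power-law compression

Console.WriteLine(string.Join(" ", decibel));
```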

Finally, there's an overloaded Apply() method for spectrograms (collections of spectra). A spectrogram can be any IList<float[]>.

var bandSpectrogram = FilterBanks.Apply(filterbank, spectrogram);

Note: in all cases, each spectrum must have length fftSize/2 + 1.
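
A minimal sketch of applying a filter bank to a spectrogram, including the length check from the note above. ApplyToSpectrogram is a hypothetical helper (it takes fftSize explicitly so the check can be shown) and is not the library overload.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the spectrogram overload: one row of band energies per spectrum.
float[][] ApplyToSpectrogram(float[][] filterbank, IList<float[]> spectrogram, int fftSize)
{
    int expectedLength = fftSize / 2 + 1;
    var result = new float[spectrogram.Count][];

    for (int t = 0; t < spectrogram.Count; t++)
    {
        if (spectrogram[t].Length != expectedLength)
        {
            throw new ArgumentException($"Spectrum {t} must have {expectedLength} samples");
        }

        result[t] = new float[filterbank.Length];
        for (int i = 0; i < filterbank.Length; i++)
        {
            for (int k = 0; k < expectedLength; k++)
            {
                result[t][i] += filterbank[i][k] * spectrogram[t][k];
            }
        }
    }
    return result;
}

// Two rectangular toy filters and two 5-bin spectra (fftSize = 8).
float[][] bank = { new float[] { 1, 1, 0, 0, 0 }, new float[] { 0, 0, 1, 1, 1 } };
var frames = new List<float[]> { new float[] { 1, 2, 3, 4, 5 }, new float[] { 5, 4, 3, 2, 1 } };

var bandFrames = ApplyToSpectrogram(bank, frames, fftSize: 8);
foreach (var row in bandFrames)
{
    Console.WriteLine(string.Join(", ", row));
}
```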
