
Add SI-SDR #3

Open
faroit opened this issue May 21, 2019 · 12 comments

faroit (Member) commented May 21, 2019

Add scale invariant SDR metric. See https://arxiv.org/abs/1811.02508

faroit (Member, Author) commented May 22, 2019

@Jonathan-LeRoux would you be able to contribute Python code, or could you actively do a code review if we need to figure out the code from the paper ourselves?

Jonathan-LeRoux (Collaborator) commented May 22, 2019

I'm happy to help in the most efficient way. I could try to insert our code into the framework, but I'm thinking that, given the simplicity of SI-SDR, giving you our code and discussing how to insert it may be faster and also less risky in terms of introducing bugs.
SI-SDR basically involves slightly modified (and much simpler) versions of _bss_decomp_mtifilt and _bss_crit. The main function is this one, in which the decomposition and the criterion computation are merged:

import math

import numpy as np


def compute_measures(estimated_signal, reference_signals, j, scaling=True):
    # Gram matrix of the references; reference j is the target source
    Rss = np.dot(reference_signals.transpose(), reference_signals)
    this_s = reference_signals[:, j]

    if scaling:
        # get the scaling factor for clean sources
        a = np.dot(this_s, estimated_signal) / Rss[j, j]
    else:
        a = 1

    e_true = a * this_s
    e_res = estimated_signal - e_true

    Sss = (e_true ** 2).sum()
    Snn = (e_res ** 2).sum()

    SDR = 10 * math.log10(Sss / Snn)

    # Get the SIR: project the residual onto all references
    Rsr = np.dot(reference_signals.transpose(), e_res)
    b = np.linalg.solve(Rss, Rsr)

    e_interf = np.dot(reference_signals, b)
    e_artif = e_res - e_interf

    SIR = 10 * math.log10(Sss / (e_interf ** 2).sum())
    SAR = 10 * math.log10(Sss / (e_artif ** 2).sum())

    return SDR, SIR, SAR

I'm hoping that this could be easily inserted within a function that figures out the best permutation.
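
For illustration, a wrapper along those lines might look like this (a minimal sketch only, reusing compute_measures from above; best_permutation_measures is a hypothetical name, not part of the reference code):

from itertools import permutations

import numpy as np


def best_permutation_measures(estimated_signals, reference_signals, scaling=True):
    # estimated_signals and reference_signals are (num_samples, num_sources);
    # evaluate every assignment of estimates to references and keep the one
    # with the highest mean SI-SDR.
    num_sources = reference_signals.shape[1]
    best = None
    for perm in permutations(range(num_sources)):
        measures = [
            compute_measures(estimated_signals[:, perm[j]], reference_signals, j,
                             scaling=scaling)
            for j in range(num_sources)
        ]
        mean_sdr = np.mean([m[0] for m in measures])
        if best is None or mean_sdr > best[0]:
            best = (mean_sdr, perm, measures)
    return best[1], best[2]  # best permutation and its (SDR, SIR, SAR) per source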

faroit (Member, Author) commented May 24, 2019

Thanks, that's impressively simple. I will add this as soon as the refactoring is done.

pseeth commented May 30, 2019

Happy to give this a shot once the refactor is done! Putting together a PyTorch version should be easy as well. How does this work for multichannel audio? Do we evaluate each channel separately and then take the mean, or report the numbers separately? Also, I don't remember an equivalent of ISR for this measure, so the return spec might differ between bsseval and SI-SDR.

pseeth commented Jun 10, 2019

I refactored the SI-SDR code for nussl. Once bsseval is mature, I'll strip it from nussl and have nussl depend on bsseval instead. Putting my code here in case it's helpful!

from . import EvaluationBase
from itertools import permutations
import numpy as np

class ScaleInvariantSDR(EvaluationBase):
    def __init__(self, true_sources_list, estimated_sources_list, 
                 compute_permutation=False, source_labels=None, scaling=True):
        self.true_sources_list = true_sources_list
        self.estimated_sources_list = estimated_sources_list
        self.compute_permutation = compute_permutation
        self.scaling = scaling
        
        if source_labels is None:
            source_labels = []
            for i, x in enumerate(self.true_sources_list):
                if x.path_to_input_file:
                    label = x.path_to_input_file
                else:
                    label = f'source_{i}'
                source_labels.append(label)
        self.source_labels = source_labels
        self.reference_array, self.estimated_array = self._preprocess_sources()
        
    def evaluate(self):
        num_sources = self.reference_array.shape[-1]
        num_channels = self.reference_array.shape[1]
        orderings = (
            list(permutations(range(num_sources))) 
            if self.compute_permutation 
            else [list(range(num_sources))]
        )
        results = np.empty((len(orderings), num_channels, num_sources, 3))
        
        for o, order in enumerate(orderings):
            for c in range(num_channels):
                for j in order:
                    SDR, SIR, SAR = self._compute_sdr(
                        self.estimated_array[:, c, j], self.reference_array[:, c, order], j, scaling=self.scaling
                    )
                    results[o, c, j, :] = [SDR, SIR, SAR]
        return self._populate_scores_dict(results, orderings)
    
    def _populate_scores_dict(self, results, orderings):
        best_permutation_by_sdr = np.argmax(results[:, :, :, 0].mean(axis=1).mean(axis=-1))
        results = results[best_permutation_by_sdr]
        best_permutation = orderings[best_permutation_by_sdr]
        scores = {'permutation': list(best_permutation)}
        for j in best_permutation:
            label = self.source_labels[j]
            scores[label] = {
                metric: results[:, j, m].tolist()
                for m, metric in enumerate(['SDR', 'SIR', 'SAR'])
            }
        return scores
    
    @staticmethod
    def _compute_sdr(estimated_signal, reference_signals, source_idx, scaling=True):
        references_projection = reference_signals.T @ reference_signals
        source = reference_signals[:, source_idx]
        scale = (source @ estimated_signal) / references_projection[source_idx, source_idx] if scaling else 1

        e_true = scale * source
        e_res = estimated_signal - e_true

        signal = (e_true ** 2).sum()
        noise = (e_res ** 2).sum()
        SDR = 10 * np.log10(signal / noise)
        
        references_onto_residual = np.dot(reference_signals.transpose(), e_res)
        b = np.linalg.solve(references_projection, references_onto_residual)

        e_interf = np.dot(reference_signals , b)
        e_artif = e_res - e_interf

        SIR = 10 * np.log10(signal / (e_interf**2).sum())
        SAR = 10 * np.log10(signal / (e_artif**2).sum())
        return SDR, SIR, SAR
    
    def _preprocess_sources(self):
        """
        Prepare the :ref:`audio_data` in the sources. Uses the format:
            (num_samples, num_channels, num_sources)
        Returns:
            (:obj:`np.ndarray`, :obj:`np.ndarray`) reference_source_array, estimated_source_array

        """
        reference_source_array = np.stack([np.copy(x.audio_data.T)
                                            for x in self.true_sources_list], axis=2)
        estimated_source_array = np.stack([np.copy(x.audio_data.T)
                                            for x in self.estimated_sources_list], axis=2)
        reference_source_array -= reference_source_array.mean(axis=0)
        estimated_source_array -= estimated_source_array.mean(axis=0)
        
        return reference_source_array, estimated_source_array
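
A quick usage sketch (my own example, not from nussl: FakeSignal is a hypothetical stand-in for nussl's AudioSignal, i.e., anything exposing audio_data shaped (num_channels, num_samples) and a path_to_input_file attribute; the relative EvaluationBase import above still needs to resolve):

import numpy as np

class FakeSignal:
    # minimal stand-in for the signal objects expected by ScaleInvariantSDR
    def __init__(self, audio_data, path_to_input_file=None):
        self.audio_data = audio_data
        self.path_to_input_file = path_to_input_file

rng = np.random.default_rng(0)
refs = [FakeSignal(rng.standard_normal((1, 16000))) for _ in range(2)]
ests = [FakeSignal(r.audio_data + 0.1 * rng.standard_normal((1, 16000))) for r in refs]

scores = ScaleInvariantSDR(refs, ests, compute_permutation=True).evaluate()
print(scores['permutation'], scores['source_0']['SDR'])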

mpariente commented

Sorry to barge in on this issue, but is there any news regarding SI-SDR?
I'll need to rewrite my evaluation scripts soon, and between speechmetrics, bsseval, mireval, and others, the choice is quite hard ;-)

j-paulus commented

This is perhaps more an algorithmic question for Jonathan about the computation than about plain BSSEval, but since the reference implementation is shown here, I'm taking the opportunity:

The way I understand the algorithm, the full SI-BSSEval computation first applies the greedy target scaling (for SI-SDR) and then an MMSE fit to all references (for SI-SIR and SI-SAR). If the reference sources are uncorrelated, the j-th entry in the mixing array b should be 0, since the maximal contribution was already used in the SI-SDR computation and the residual should be orthogonal to the j-th reference component. If the reference sources are not fully uncorrelated, the j-th element of b will be non-zero, and some of the j-th source may be assigned to the interference term. If b were computed without the initial greedy subtraction of the target component, this would be identical to the scaling-only BSSEval projection (v2.1).

Assuming the above is correct:

  • The main computational efficiency benefit (over BSSEval v2.1) arises only if we use SI-SDR alone and omit the other two measures (i.e., omit the MMSE fit).
  • If we compute all three measures, the main difference from BSSEval v2.1 lies in the definition of the SIR and SAR components, and in the use of a 2-step decomposition instead of a 1-step MMSE fit.
  • If we compute all three measures and the reference components are correlated, the measures from this greedy + MMSE approach may deviate slightly from the MMSE-only results. These differences are quite likely negligible in practical use.

No real question here; I'm just trying to check whether my understanding of the algorithm is correct.
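
As a side note, a minimal numerical check of the orthogonality argument above (my own sketch, using only numpy, not the reference code): with orthogonal references, the target entry of b comes out as (numerically) zero, so nothing of the target leaks into e_interf.

import numpy as np

rng = np.random.default_rng(0)
s1 = rng.standard_normal(16000)
s2 = rng.standard_normal(16000)
s2 -= (s1 @ s2) / (s1 @ s1) * s1           # make the references exactly orthogonal
references = np.stack([s1, s2], axis=1)    # (num_samples, num_sources)
estimate = 0.8 * s1 + 0.1 * s2             # toy estimate of source 0

j = 0
Rss = references.T @ references
a = (references[:, j] @ estimate) / Rss[j, j]   # greedy target scaling
e_res = estimate - a * references[:, j]
b = np.linalg.solve(Rss, references.T @ e_res)
print(b)  # b[0] is ~0, so e_interf contains no contribution from the target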

Jonathan-LeRoux (Collaborator) commented

I think your understanding is correct. SI-SDR implicitly assumes that the references are orthogonal to each other (i.e., uncorrelated). In most cases, the cross-correlation terms between the references will be negligible compared to their norms, so the results will be almost the same.

jvel07 commented Dec 13, 2022


Thanks for the code, @Jonathan-LeRoux!
As the function returns SDR, SIR, and SAR, how do you actually compute SI-SDR or SI-SDRi from there? Also, what is the role of j?

Jonathan-LeRoux (Collaborator) commented

The code says SDR, SIR, SAR, but it's really SI-SDR, SI-SIR, SI-SAR. SI-SDRi can be computed as the difference between the SI-SDR obtained with the estimate and the SI-SDR obtained when using the original mixture as the estimate.
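
For illustration, a minimal sketch of that computation (reusing the compute_measures function posted earlier in this thread; si_sdr_improvement is a hypothetical helper name):

def si_sdr_improvement(estimate, mixture, references, j):
    # SI-SDRi: SI-SDR of the estimate minus SI-SDR of the unprocessed mixture
    sdr_est, _, _ = compute_measures(estimate, references, j)
    sdr_mix, _, _ = compute_measures(mixture, references, j)
    return sdr_est - sdr_mix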

jvel07 commented Dec 14, 2022

I see, thanks for the reply @Jonathan-LeRoux :) What are the roles of the arguments j and scaling? Does scaling really impact the metrics?

Jonathan-LeRoux (Collaborator) commented

j determines which reference source to compare with, and scaling switches between SNR and SI-SDR.
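
For example (a usage sketch with hypothetical variable names: est is an estimate of reference source 0 and refs is a (num_samples, num_sources) array):

si_sdr, si_sir, si_sar = compute_measures(est, refs, j=0, scaling=True)   # scale-invariant metrics
snr, sir, sar = compute_measures(est, refs, j=0, scaling=False)           # plain SNR-style metrics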
