
An explanation of the source code for finding the alignment path in GlowTTS? #72

alifarrokh opened this issue Sep 12, 2022 · 2 comments


Hi. I'm reading the source code of the GlowTTS model for educational purposes. One of the sections I can't really understand is where the alignment path is found using Monotonic Alignment Search during the training phase. Could anyone please explain the following lines of code to me?

with torch.no_grad():
    x_s_sq_r = torch.exp(-2 * x_logs)
    logp1 = torch.sum(-0.5 * math.log(2 * math.pi) - x_logs, [1]).unsqueeze(-1) # [b, t, 1]
    logp2 = torch.matmul(x_s_sq_r.transpose(1,2), -0.5 * (z ** 2)) # [b, t, d] x [b, d, t'] = [b, t, t']
    logp3 = torch.matmul((x_m * x_s_sq_r).transpose(1,2), z) # [b, t, d] x [b, d, t'] = [b, t, t']
    logp4 = torch.sum(-0.5 * (x_m ** 2) * x_s_sq_r, [1]).unsqueeze(-1) # [b, t, 1]
    logp = logp1 + logp2 + logp3 + logp4 # [b, t, t']
attn = monotonic_align.maximum_path(logp, attn_mask.squeeze(1)).unsqueeze(1).detach()

Thanks in advance.

shivammehta25 commented Jan 17, 2023

Hi! I assume that, this late, you might not need this anymore, but I am still writing it down for the future in case someone else runs into the same question.

    x_s_sq_r = torch.exp(-2 * x_logs)
    logp1 = torch.sum(-0.5 * math.log(2 * math.pi) - x_logs, [1]).unsqueeze(-1) # [b, t, 1]
    logp2 = torch.matmul(x_s_sq_r.transpose(1,2), -0.5 * (z ** 2)) # [b, t, d] x [b, d, t'] = [b, t, t']
    logp3 = torch.matmul((x_m * x_s_sq_r).transpose(1,2), z) # [b, t, d] x [b, d, t'] = [b, t, t']
    logp4 = torch.sum(-0.5 * (x_m ** 2) * x_s_sq_r, [1]).unsqueeze(-1) # [b, t, 1]
    logp = logp1 + logp2 + logp3 + logp4 # [b, t, t']

It is the log-likelihood of the latent frames z under a Gaussian with mean x_m and log standard deviation x_logs, evaluated for every (text position, frame) pair.
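
As a sanity check, here is a minimal, self-contained sketch (mine, not from the repo) that builds the same four terms on small random tensors and compares them against a direct evaluation with torch.distributions.Normal. The shapes [b, d, t] for the text-side statistics and [b, d, t'] for z are assumed from the shape comments in the snippet.

import math
import torch

b, d, t, t_prime = 2, 4, 5, 7                        # hypothetical small sizes
x_m = torch.randn(b, d, t)                           # per-text-position means
x_logs = 0.1 * torch.randn(b, d, t)                  # per-text-position log-std
z = torch.randn(b, d, t_prime)                       # latent frames from the flow

# The factorised computation from the snippet.
x_s_sq_r = torch.exp(-2 * x_logs)                    # 1 / sigma^2
logp1 = torch.sum(-0.5 * math.log(2 * math.pi) - x_logs, [1]).unsqueeze(-1)
logp2 = torch.matmul(x_s_sq_r.transpose(1, 2), -0.5 * (z ** 2))
logp3 = torch.matmul((x_m * x_s_sq_r).transpose(1, 2), z)
logp4 = torch.sum(-0.5 * (x_m ** 2) * x_s_sq_r, [1]).unsqueeze(-1)
logp = logp1 + logp2 + logp3 + logp4                 # [b, t, t']

# Reference: evaluate N(z_j; mu_i, sigma_i^2) directly for every (i, j) pair
# and sum the per-dimension log-densities over d.
normal = torch.distributions.Normal(
    x_m.transpose(1, 2).unsqueeze(2),                # [b, t, 1, d]
    torch.exp(x_logs).transpose(1, 2).unsqueeze(2),  # [b, t, 1, d]
)
ref = normal.log_prob(z.transpose(1, 2).unsqueeze(1)).sum(-1)  # [b, t, t']
print(torch.allclose(logp, ref, atol=1e-5))          # True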

And in

attn = monotonic_align.maximum_path(logp, attn_mask.squeeze(1)).unsqueeze(1).detach()

They run a Viterbi-style dynamic program (the monotonic alignment search) over logp to find the most likely monotonic alignment between text positions and latent frames, which is then used to further maximise the data likelihood.
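
For intuition, here is an illustrative, single-utterance version of that dynamic program in plain PyTorch. The function name maximum_path_ref and the unbatched, unmasked handling are my own simplifications; the repo's monotonic_align.maximum_path performs the same search batched, takes attn_mask to ignore padding, and is implemented as a compiled extension for speed.

import torch

def maximum_path_ref(log_prob):
    # log_prob: [t, t'] pairwise log-likelihoods for a single utterance.
    # Returns a 0/1 matrix of the same shape that assigns every latent frame j
    # to exactly one text position i, with i non-decreasing in j.
    t, t_prime = log_prob.shape
    neg_inf = float("-inf")
    # value[i, j] = best cumulative log-likelihood of a monotonic path that
    # ends with frame j aligned to text position i.
    value = torch.full((t, t_prime), neg_inf)
    for j in range(t_prime):
        for i in range(t):
            if j == 0:
                prev = 0.0 if i == 0 else neg_inf             # a path must start at (0, 0)
            else:
                stay = value[i, j - 1]                        # frame j stays on position i
                move = value[i - 1, j - 1] if i > 0 else stay.new_tensor(neg_inf)
                prev = torch.maximum(stay, move)              # best predecessor
            value[i, j] = prev + log_prob[i, j]
    # Backtrack from (t - 1, t' - 1) to recover the most likely path.
    path = torch.zeros_like(log_prob)
    i = t - 1
    for j in range(t_prime - 1, -1, -1):
        path[i, j] = 1.0
        if j > 0 and i > 0 and value[i - 1, j - 1] >= value[i, j - 1]:
            i -= 1
    return path

With no padding, maximum_path_ref(logp[0]) should match attn[0, 0] from the snippet.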

Hope this helps!

xiaozhah commented Apr 9, 2024

For a Gaussian distribution $\mathcal{N}(\mu, \sigma^2)$, the probability density function is:

$$f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

Taking the logarithm, we get:

$$\log f(x; \mu, \sigma) = -\frac{1}{2}\log(2\pi\sigma^2) - \frac{(x-\mu)^2}{2\sigma^2}$$
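
One intermediate step makes the correspondence explicit: expanding the squared term gives

$$-\frac{(x-\mu)^2}{2\sigma^2} = -\frac{x^2}{2\sigma^2} + \frac{x\mu}{\sigma^2} - \frac{\mu^2}{2\sigma^2}$$

In the code, $x$ is a latent frame from z while $\mu$ and $\sigma$ belong to a text position, and the log-density is summed over the $d$ feature dimensions, which is why the two terms that mix z with the text statistics become matrix products over $d$.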

Now, let's see how each term in the code corresponds to the above formula:

  1. x_logs corresponds to $\log\sigma$, thus x_s_sq_r = torch.exp(-2 * x_logs) corresponds to $\frac{1}{\sigma^2}$.

  2. logp1 = torch.sum(-0.5 * math.log(2 * math.pi) - x_logs, [1]).unsqueeze(-1) # [b, t, 1]:
    This term corresponds to $-\frac{1}{2}\log(2\pi\sigma^2)$, which is the logarithm of the normalization constant of the Gaussian distribution.

  3. logp2 = torch.matmul(x_s_sq_r.transpose(1,2), -0.5 * (z ** 2)) # [b, t, d] x [b, d, t'] = [b, t, t']:
    This term corresponds to $-\frac{x^2}{2\sigma^2}$ in $-\frac{(x-\mu)^2}{2\sigma^2}$.

  4. logp3 = torch.matmul((x_m * x_s_sq_r).transpose(1,2), z) # [b, t, d] x [b, d, t'] = [b, t, t']:
    This term corresponds to $\frac{x\mu}{\sigma^2}$ in $-\frac{(x-\mu)^2}{2\sigma^2}$.

  5. logp4 = torch.sum(-0.5 * (x_m ** 2) * x_s_sq_r, [1]).unsqueeze(-1) # [b, t, 1]:
    This term corresponds to $-\frac{\mu^2}{2\sigma^2}$ in $-\frac{(x-\mu)^2}{2\sigma^2}$.

Adding these four terms together, we get:

logp = logp1 + logp2 + logp3 + logp4 # [b, t, t']

which corresponds to $\log f(x; \mu, \sigma)$ summed over the $d$ feature dimensions, for every (text position, frame) pair.
