Can this system run inference with only an audio prompt and text input? #121
Comments
If you want to run inference with a target text but without an audio prompt, the answer is yes. In that case the model will generate speech with a random speaker and style.
Is this training system identical to the original paper? If not, I would like to know why the AR training was done differently.
Yes, as far as I know it is identical to the official paper.
Then I think this system should also work with a text input and an audio prompt, which is the setup Microsoft used to produce the demo wave files on their official site. I know that the structure is identical.
First, let's clarify the terms. By "text input", do you mean the target text only, which is "the phoneme sequence for …"? This is what the paper says in Section 4.2.1, take a look:
However, I believe it would take only a few lines of code changes to satisfy your requirement of an audio prompt without its text.
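To make the three cases in this thread concrete, here is a minimal sketch of how the inputs to the AR stage could be assembled. It is not the repository's actual code: `build_ar_inputs` and its parameters are hypothetical names, and the layout (prompt phonemes prepended to the target phonemes, prompt audio tokens used as an acoustic prefix) is an assumption based on my reading of the paper's inference description.

```python
from typing import List, Optional, Tuple


def build_ar_inputs(
    target_phonemes: List[str],
    prompt_phonemes: Optional[List[str]] = None,
    prompt_audio_tokens: Optional[List[int]] = None,
) -> Tuple[List[str], List[int]]:
    """Hypothetical helper: returns (phoneme sequence, acoustic prefix).

    Assumed layout: the prompt transcript, if given, is prepended to the
    target phonemes; the prompt audio tokens, if given, become the prefix
    that the AR decoder continues from.
    """
    phonemes = list(prompt_phonemes or []) + list(target_phonemes)
    acoustic_prefix = list(prompt_audio_tokens or [])
    return phonemes, acoustic_prefix


target = ["HH", "AH", "L", "OW"]
prompt_text = ["HH", "AY"]
prompt_audio = [17, 305, 42]  # made-up first-quantizer token ids

# 1) Target text only: no acoustic prefix, so speaker and style are random.
text_only = build_ar_inputs(target)

# 2) Text prompt + audio prompt + target text: the paired-prompt setting.
paired = build_ar_inputs(target, prompt_text, prompt_audio)

# 3) Audio prompt + target text, no prompt transcript: the case asked about.
textless = build_ar_inputs(target, prompt_audio_tokens=prompt_audio)
```

Case 3 only drops the prompt transcript from the phoneme sequence, which is why the change should be small. Whether the output quality holds up without the transcript is not something this thread confirms.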
Hi, since I got reasonable results on the LibriTTS dataset, I want this model to work on the zero-shot task.
In the inference code, I think it needs both a text prompt and an audio prompt when prompts are given.
Could it work with only an audio prompt and the text input?