Add support for additional TTS integrations through non-Microsoft focused SpeechService interface #2379

Open
druggedhippo opened this issue Aug 19, 2022 · 3 comments
Labels
9. enhancement The behaviour is as specified, but we would like to modify or extend the spec.

Comments

@druggedhippo

EDDI currently uses whichever built-in Windows TTS engine is installed. Unfortunately, the built-in Windows TTS voices are not particularly good.

This feature request asks for a more modular SpeechService class that lets other speech engines plug in without relying on the Windows TTS interfaces, provided they deliver the same WAV stream that the existing class uses.
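To make the request concrete, here is a minimal sketch of what such a plugin contract could look like. It is written in Python for brevity (EDDI itself is a C# project), and every name in it (`SpeechEngine`, `ToneEngine`, `SpeechService.register`, `SpeechService.speak`) is hypothetical rather than taken from EDDI's code. The stand-in engine emits a sine tone instead of speech; the only point is that each engine returns WAV bytes and the service does not care where they came from.

```python
import io
import math
import struct
import wave
from abc import ABC, abstractmethod


class SpeechEngine(ABC):
    """Hypothetical plugin contract: turn text into WAV bytes."""

    name: str

    @abstractmethod
    def synthesize(self, text: str, voice: str) -> bytes:
        """Return a complete WAV file (header plus PCM data) as bytes."""


class ToneEngine(SpeechEngine):
    """Stand-in engine: produces a 440 Hz tone, not real speech."""

    name = "tone"

    def synthesize(self, text: str, voice: str) -> bytes:
        rate = 16000
        # 10 ms of audio per character, at least one character's worth.
        n_frames = (rate // 100) * max(len(text), 1)
        frames = b"".join(
            struct.pack("<h", int(8000 * math.sin(2 * math.pi * 440 * i / rate)))
            for i in range(n_frames)
        )
        buf = io.BytesIO()
        with wave.open(buf, "wb") as w:
            w.setnchannels(1)   # mono
            w.setsampwidth(2)   # 16-bit samples
            w.setframerate(rate)
            w.writeframes(frames)
        return buf.getvalue()


class SpeechService:
    """Dispatches to whichever registered engine the caller selects."""

    def __init__(self) -> None:
        self._engines: dict[str, SpeechEngine] = {}

    def register(self, engine: SpeechEngine) -> None:
        self._engines[engine.name] = engine

    def speak(self, text: str, voice: str, engine: str) -> bytes:
        return self._engines[engine].synthesize(text, voice)


service = SpeechService()
service.register(ToneEngine())
wav_bytes = service.speak("hello", voice="any", engine="tone")
```

A cloud-backed engine would implement the same `synthesize` method and convert the provider's audio to WAV before returning it, so the playback path would stay unchanged.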

Examples of other engines could include (but are not limited to):

As a proof of concept, here is an Amazon Polly implementation I created:

https://gist.github.com/druggedhippo/0a887973ee019dea1fc9e522f513b0f5

Example audio of Amazon Polly processing an EDDI TTS prompt in real time:

https://imgur.com/zyoWmQg

@Tkael Tkael added the 9. enhancement The behaviour is as specified, but we would like to modify or extend the spec. label Aug 19, 2022
@Tkael
Member

Tkael commented Oct 3, 2022

Thank you for this. 😀

As you have effectively demonstrated, it is indeed possible to add additional speech synthesizers to EDDI, including for voices sourced from various cloud development environments (Azure, AWS, etc.).

These cloud voices typically require the user to provide specific credentials and are limited in some way (either as timed trials or offering to render a limited number of words for free each month).

We're happy to support additional voices in EDDI, but it is also important to note that voices from different sources do not always behave alike (in terms of SSML support, lexicons, etc.).
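One way to cope with those differences would be for each voice to declare what it supports, so that prompts can be downgraded before they are sent. The sketch below is only an illustration in Python, not EDDI code: `VoiceCapabilities` and `prepare_prompt` are hypothetical names, and the regex-based tag stripping is a naive stand-in for a real SSML parser.

```python
import re
from dataclasses import dataclass
from html import unescape


@dataclass(frozen=True)
class VoiceCapabilities:
    """Hypothetical per-voice feature flags."""

    supports_ssml: bool = False
    supports_lexicons: bool = False


def prepare_prompt(ssml: str, caps: VoiceCapabilities) -> str:
    """Pass SSML through untouched, or reduce it to plain text."""
    if caps.supports_ssml:
        return ssml
    # Naive fallback: drop every tag, decode entities, collapse whitespace.
    plain = re.sub(r"<[^>]+>", " ", ssml)
    return " ".join(unescape(plain).split())


prompt = '<speak>Hello <break time="1s"/>Commander</speak>'
plain_text = prepare_prompt(prompt, VoiceCapabilities())
passthrough = prepare_prompt(prompt, VoiceCapabilities(supports_ssml=True))
```

With the default flags the prompt is reduced to plain text; with `supports_ssml=True` it is returned unchanged.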

We would need to do some additional work to document the new capability and help users enter their credentials for accessing the voice. Some UI changes to allow capturing credentials in EDDI would probably also be very welcome.

@Tkael
Member

Tkael commented Nov 18, 2022

@Tkael
Member

Tkael commented May 15, 2023
