-
Notifications
You must be signed in to change notification settings - Fork 163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Text to Speech Support #755
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job!
The comments I've added are mostly "food for thought"
pyproject.toml
Outdated
@@ -57,6 +57,7 @@ pandas = {version = "^1.3", optional = true} | |||
pypdf = {version = "^3.9", optional = true} | |||
pillow = {version = "^10.2.0", optional = true} | |||
mail-parser = {version = "^3.15.0", optional = true} | |||
elevenlabs = "^1.1.2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be an optional dependency?
griptape/utils/play_audio.py
Outdated
def play_audio(artifact: AudioArtifact) -> AudioArtifact: | ||
elevenlabs.play(artifact.value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it at all matter what format the AudioArtifact.value
is? (Or are we ok with relying on elevenlabs
to throw a runtime error?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Realistically, we shouldn't rely on the Eleven Labs SDK to play audio, that's just convenience for demo purposes and this should be reworked before approval/merge. We should expect to receive audio data in common enough formats that we should be able to play it with common Python/OS utilities.
class ImageArtifactFileOutputMixin: | ||
class MediaArtifactFileOutputMixin: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this mixin is just for make it easier to writes bytes to a file, then why not generalize all the way to BlobArtifactFileOutputMixin
? (Or a FileOutputMixin
that takes a bytes
in the write method)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'll want to accept some sort of artifact here because we fall back to the artifact name as output filename if one isn't provided (if output_dir
is set and we might expect multiple artifacts to end up there). Agreed that there's no reason to limit ourselves to MediaArtifacts
, though.
Nice work but...docs. There's no escaping them now 😄 |
49295bb
to
29cd347
Compare
29cd347
to
41f8f0c
Compare
metadata={"serializable": True}, | ||
) | ||
voice: str = field(kw_only=True, metadata={"serializable": True}) | ||
output_format: str = field(default="mp3_44100_128", kw_only=True, metadata={"serializable": True}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe move this default to a top level constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure what the guideline is for inline defaults vs top-level constants. seems like its done both ways.
Introduces support for Text to Speech workloads. For example:
todos:
TextToSpeechClient
implementationTextToSpeechClient
documentationMediaArtifactFileOutputMixin