Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

community: add hugging face text-to-speech inference API #18880

Merged
merged 9 commits into from
Mar 29, 2024

Conversation

h0rv
Copy link
Contributor

@h0rv h0rv commented Mar 10, 2024

Description: I implemented a tool to use Hugging Face text-to-speech inference API.

Issue: n/a

Dependencies: n/a

Twitter handle: No Twitter, but do have LinkedIn lol.

Copy link

vercel bot commented Mar 10, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
langchain ⬜️ Ignored (Inspect) Visit Preview Mar 29, 2024 3:00pm

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. Ɑ: Runnables Related to Runnables 🔌: huggingface Primarily related to HuggingFace integrations 🤖:enhancement A large net-new component, integration, or chain. Use sparingly. The largest features labels Mar 10, 2024
@h0rv h0rv force-pushed the huggingface-tts branch 2 times, most recently from 82860ef to 5cac9ec Compare March 11, 2024 02:54
model: str
api_url: str
huggingface_api_key: SecretStr
format: HuggingFaceSupportedAudioFormat
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's avoid using the enum here instead use Literal and do a run-time check, it saves user the trouble of having to import enums in their code.

format = Literal['wave']

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed here: d49cf82

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just turned it into a string. The model I was exploring supported only wav, but I believe others support other formats, but wasn't able to find anything in the huggingface docs. So I think it would be easier to let the user specify rather than trying to keep up with supported formats.

output_name = (
f"{output_name or self._default_output_name()}.{self.format.value}"
)
output_path = os.path.join(self.output_dir, output_name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic forces the user to reinstantiate the tool every time. I think it would be better to have an option to generate an output_name that's a uuid.uuid4. what do you think about making that the default? Users would be allowed to over-ride to specify a name if they really want

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it would be good for the user to instantiate the tool with a directory, so it doesn't give flood the directory where its called from, and then output everything there, this way they can define either a tmp directory they will throw out or a place they want to save it.

I am using a timestamp rather than a UUID if the user does not specify a specific filename. I will switch it to a UUID though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But yeah I will remove the output_dir field and the user can handle that logic by optionally passing in a file name.

@h0rv h0rv force-pushed the huggingface-tts branch 2 times, most recently from 0287697 to 91701d7 Compare March 14, 2024 13:18
@h0rv
Copy link
Contributor Author

h0rv commented Mar 14, 2024

This is ready for another review

def _run(
self,
query: str,
output_base_name: Optional[str] = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

output_base_name should not be parameter for the the tool input -- without proper validation this exposes a security risk and allows an LLM or a malicious user using the LLM to write content anywhere on the file system.

It's OK to specify a containing folder as part of the initializer of the tool. (e.g., directory)

HuggingFaceTextToSpeechModelInference(
   destination_dir='...'
)

We can also add some additional configuration in the initializer that user to specify what file names are chosen (e.g., timestamp or UUID etc) -- if we want to parameterize this aspect of file naming.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added destination_dir and added functionality to name the files using uuid4 or timestamps

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default will just be uuid4 for as well. Additionally, I made it so each run call creates the destination directory if it does not exist.

Is that the right approach, or create it on tool initialization? If the directory gets deleted between init and writing to the output, the file write will fail

@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Mar 29, 2024
@eyurtsev eyurtsev enabled auto-merge (squash) March 29, 2024 15:00
@eyurtsev eyurtsev merged commit f7e8a38 into langchain-ai:master Mar 29, 2024
59 checks passed
@h0rv h0rv deleted the huggingface-tts branch March 29, 2024 17:47
@h0rv
Copy link
Contributor Author

h0rv commented Mar 30, 2024

@eyurtsev Thanks for merging and the review! Good timing for the release of VoiceCraft: HuggingFace Demo

gkorland pushed a commit to FalkorDB/langchain that referenced this pull request Mar 30, 2024
…chain-ai#18880)

Description: I implemented a tool to use Hugging Face text-to-speech
inference API.

Issue: n/a

Dependencies: n/a

Twitter handle: No Twitter, but do have
[LinkedIn](https://www.linkedin.com/in/robby-horvath/) lol.

---------

Co-authored-by: Robby <h0rv@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
hinthornw pushed a commit that referenced this pull request Apr 26, 2024
Description: I implemented a tool to use Hugging Face text-to-speech
inference API.

Issue: n/a

Dependencies: n/a

Twitter handle: No Twitter, but do have
[LinkedIn](https://www.linkedin.com/in/robby-horvath/) lol.

---------

Co-authored-by: Robby <h0rv@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤖:enhancement A large net-new component, integration, or chain. Use sparingly. The largest features 🔌: huggingface Primarily related to HuggingFace integrations lgtm PR looks good. Use to confirm that a PR is ready for merging. Ɑ: Runnables Related to Runnables size:L This PR changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants