Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AI] Semantic Cache API Updates, (mode, score_threshold) #9931

Merged
merged 18 commits into from
Aug 26, 2024

Conversation

EItanya
Copy link
Member

@EItanya EItanya commented Aug 22, 2024

Description

AI Semantic caching API updates

API changes

  • Added a mode field to semantic caching to allow the user to selectively add new entries to the cache
  • Added a score_threshold field to semantic caching to allow the user to decide how similar a prompt must be in order for caching to trigger.

@github-actions github-actions bot added keep pr updated signals bulldozer to keep pr up to date with base branch work in progress signals bulldozer to keep pr open (don't auto-merge) labels Aug 22, 2024
@EItanya EItanya changed the title [AI] Semantic Cache API Updates [AI] Semantic Cache API Updates, (mode, score_threshold) Aug 22, 2024
@EItanya EItanya marked this pull request as ready for review August 22, 2024 15:19
@EItanya EItanya requested a review from a team as a code owner August 22, 2024 15:19
@solo-changelog-bot
Copy link

Issues linked to changelog:
https://github.com/solo-io/solo-projects/issues/6783

// two queries need to be in order to return a cached result.
// The lower the number, the more similar the queries need to be for a cache hit.
//
// +kubebuilder:validation:Minimum=0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our code gen via solo-kit likely isn't enabling kubebuilder markers, we may want to do that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a much bigger issue, I agree with you but probably not for right now.

enum Mode {
// Read and write to the cache as a part of the request/response lifecycle
READ_WRITE = 0;
// Only read from the cache, do not write to it. Data will be written to the cache outside the request/response cycle.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: drop the period or add it to all comments for consistency

// Which data store to use
DataStore datastore = 1;
// Model to use to get embeddings for prompt
Embedding embedding = 2;
// Time before data in the cache is considered expired
uint32 ttl = 3;
// Cache mode to use: READ_WRITE or READ_ONLY
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe another comment line on what it does similar to the enum desc

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? if a user wants more detail they can click into the enum, this is just meant as high level.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes it easier in the doc to now click another link, nothing more

Copy link

github-actions bot commented Aug 22, 2024

Visit the preview URL for this PR (updated for commit 0dc82d3):

https://gloo-edge--pr9931-eitanya-ai-cache-v2-tvom25a5.web.app

(expires Mon, 02 Sep 2024 20:06:05 GMT)

🔥 via Firebase Hosting GitHub Action 🌎

Sign: 77c2b86e287749579b7ff9cadb81e099042ef677

@@ -249,6 +249,8 @@ spec:
{{- if (($gateway.aiExtension).enabled) }}
- name: gloo-ai-extension
image: "{{ template "gloo-gateway.gateway.image" $gateway.aiExtension.image }}"
args:
- "extproc"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the entire spec is opaque, so perhaps this should be exposed via GatewayParameters as well. However, I would rather not make this change and instead use an env var and let the process use that instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still opaque, essentially I wanted to re-use the program rather than creating a new image for the apiserver. If we make this configurable the user can accidentally not include this, but it's necessary 100% of the time if this is enabled, so it belongs here. Really this entire spec should be located alongside the code rather than in a different repo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't because "extproc" doesn't make sense without knowing about the container being used and its impl. Why not use an env var as it allows to do exactly what you are intending to?

Copy link
Member Author

@EItanya EItanya Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean "use an env var", where would we use one that's opaque to the user? Putting it in the Params requires the user to but it in the params, but I specifically want to avoid the user having to do anything. I can name it sidecar if extproc is specifically bad in this case, but I don't think that's what you're saying

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can default the impl to use a sidecar and we can set an env var when running as an apiservice?

@EItanya EItanya merged commit 34b8664 into main Aug 26, 2024
18 checks passed
@EItanya EItanya deleted the eitanya/ai-cache-v2 branch August 26, 2024 20:32
npolshakova pushed a commit that referenced this pull request Aug 30, 2024
Co-authored-by: Shashank Ram <shashank.ram@solo.io>
Co-authored-by: soloio-bulldozer[bot] <48420018+soloio-bulldozer[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
keep pr updated signals bulldozer to keep pr up to date with base branch work in progress signals bulldozer to keep pr open (don't auto-merge)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants