-
Notifications
You must be signed in to change notification settings - Fork 438
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AI] Semantic Cache API Updates, (mode, score_threshold) #9931
Conversation
…into eitanya/ai-cache-v2
Issues linked to changelog: |
// two queries need to be in order to return a cached result. | ||
// The lower the number, the more similar the queries need to be for a cache hit. | ||
// | ||
// +kubebuilder:validation:Minimum=0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our code gen via solo-kit likely isn't enabling kubebuilder markers, we may want to do that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a much bigger issue, I agree with you but probably not for right now.
enum Mode { | ||
// Read and write to the cache as a part of the request/response lifecycle | ||
READ_WRITE = 0; | ||
// Only read from the cache, do not write to it. Data will be written to the cache outside the request/response cycle. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: drop the period or add it to all comments for consistency
// Which data store to use | ||
DataStore datastore = 1; | ||
// Model to use to get embeddings for prompt | ||
Embedding embedding = 2; | ||
// Time before data in the cache is considered expired | ||
uint32 ttl = 3; | ||
// Cache mode to use: READ_WRITE or READ_ONLY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe another comment line on what it does similar to the enum desc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? if a user wants more detail they can click into the enum
, this is just meant as high level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes it easier in the doc to now click another link, nothing more
Visit the preview URL for this PR (updated for commit 0dc82d3): https://gloo-edge--pr9931-eitanya-ai-cache-v2-tvom25a5.web.app (expires Mon, 02 Sep 2024 20:06:05 GMT) 🔥 via Firebase Hosting GitHub Action 🌎 Sign: 77c2b86e287749579b7ff9cadb81e099042ef677 |
@@ -249,6 +249,8 @@ spec: | |||
{{- if (($gateway.aiExtension).enabled) }} | |||
- name: gloo-ai-extension | |||
image: "{{ template "gloo-gateway.gateway.image" $gateway.aiExtension.image }}" | |||
args: | |||
- "extproc" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the entire spec is opaque, so perhaps this should be exposed via GatewayParameters as well. However, I would rather not make this change and instead use an env var and let the process use that instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still opaque, essentially I wanted to re-use the program rather than creating a new image for the apiserver. If we make this configurable the user can accidentally not include this, but it's necessary 100% of the time if this is enabled, so it belongs here. Really this entire spec should be located alongside the code rather than in a different repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't because "extproc" doesn't make sense without knowing about the container being used and its impl. Why not use an env var as it allows to do exactly what you are intending to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean "use an env var", where would we use one that's opaque to the user? Putting it in the Params requires the user to but it in the params, but I specifically want to avoid the user having to do anything. I can name it sidecar if extproc
is specifically bad in this case, but I don't think that's what you're saying
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can default the impl to use a sidecar and we can set an env var when running as an apiservice?
…into eitanya/ai-cache-v2
Co-authored-by: Shashank Ram <shashank.ram@solo.io> Co-authored-by: soloio-bulldozer[bot] <48420018+soloio-bulldozer[bot]@users.noreply.github.com>
Description
AI Semantic caching API updates
API changes
score_threshold
field to semantic caching to allow the user to decide how similar a prompt must be in order for caching to trigger.