Envoy to pass hits_addend to RateLimitService #12969

Open
wwillsey opened this issue Sep 3, 2020 · 20 comments

Comments

@wwillsey

wwillsey commented Sep 3, 2020

Description

The RLS v3 API describes the RateLimitService as able to ingest a hits_addend field that determines the number of tokens to use for the rate limiting request.
Envoy should provide a way to extract a value from a request header (or some other source) to populate this field on a per-request basis. If hits_addend is only static, then it is effectively the same as modifying the rate limit.

Use case

In the HTTP Rate Limit Filter, allow configuring a request header that contains an integer hits_addend value to send with the rate limit request. This would make the rate limiting capabilities more configurable.
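
As a rough illustration of the requested behavior (not Envoy code), the sketch below reads an integer hits_addend from a request header and falls back to 1 when the header is missing or invalid. The header name `x-hits-addend`, the function name, and the fallback policy are all assumptions made for this example; the only detail taken from the RLS API is that hits_addend is an unsigned 32-bit integer and that an unset value counts as 1 hit.

```python
from typing import Mapping

UINT32_MAX = 2**32 - 1  # hits_addend is a uint32 in the RLS v3 API


def hits_addend_from_headers(
    headers: Mapping[str, str],
    header_name: str = "x-hits-addend",  # hypothetical header name
    default: int = 1,
) -> int:
    """Return the hits_addend carried by a request header.

    Falls back to `default` when the header is absent, is not a plain
    non-negative decimal integer, or does not fit in a uint32.
    Assumes header names in `headers` are already lower-cased.
    """
    raw = headers.get(header_name)
    if raw is None:
        return default
    text = raw.strip()
    # Accept ASCII digits only; this rejects signs, decimals, and underscores.
    if not (text.isascii() and text.isdigit()):
        return default
    value = int(text)
    return value if value <= UINT32_MAX else default
```

For example, a request carrying `x-hits-addend: 25` would count as 25 hits against the matched limit, while a request without the header would count as 1.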

@wwillsey
Author

wwillsey commented Sep 3, 2020

Hey @mattklein123, I've created this issue to follow up on envoyproxy/ratelimit#167. Please let me know if you think any more details would be helpful.

Thanks!

@mattklein123
Member

Yeah this makes sense to me. Marking help wanted.

@medalliaerlich

any news regarding this?

@sc0ttbeardsley

Pinterest is interested in this also. cc @fishcakez @JuniorHsu

@lizzzcai

lizzzcai commented Jul 5, 2023

Hi, any news regarding this? We would like to use it to limit tokens per minute for the LLM use case, as LLM APIs are usually limited by tokens per minute rather than requests per second.

@PeterL328
Contributor

Related work.
I updated the ratelimit client to support the hits_addend field (#28939). Some extra work would be required so users can configure the ratelimit sidecar to send hits_addend.

@PeterL328
Contributor

PeterL328 commented Aug 18, 2023

@lizzzcai In case you are using the OpenAI API, I think they limit on request tokens + response tokens. So further work would be required, either in the ratelimit filter or in a new filter, so that the response token count can be sent to the ratelimit sidecar on the response flow.

@lizzzcai

Hi @PeterL328 , thanks for your update, I will follow your other PR for the progress.

In case you are using the OpenAI API, I think they limit on request token + response token.

In our case, we are using Azure OpenAI. However, I think the limit is not on the response tokens, at least for Azure OpenAI. We use the prompt text tokens + max_tokens (the maximum number of tokens that will be returned) from the request.

Reference: Azure OpenAI

As each request is received, Azure OpenAI computes an estimated max processed-token count that includes the following:

Prompt text and count
The max_tokens parameter setting
The best_of parameter setting

As requests come into the deployment endpoint, the estimated max-processed-token count is added to a running token count of all requests that is reset each minute. If at any time during that minute, the TPM rate limit value is reached, then further requests will receive a 429 response code until the counter resets.
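
The counting scheme quoted above can be sketched as a fixed one-minute window in which each request adds its estimated token count, playing the role of hits_addend. This is a simplified illustration, not Azure OpenAI's or Envoy's implementation: the class and function names are hypothetical, and the estimate uses only prompt tokens + max_tokens as described in this comment, ignoring best_of.

```python
import time
from typing import Callable


def estimated_tokens(prompt_tokens: int, max_tokens: int) -> int:
    """Estimate used in this thread: prompt tokens plus max_tokens."""
    return prompt_tokens + max_tokens


class TokenWindowLimiter:
    """Running token count that resets each minute (hypothetical sketch)."""

    def __init__(
        self,
        tokens_per_minute: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = tokens_per_minute
        self.clock = clock
        self.window_start = clock()
        self.used = 0

    def check(self, hits_addend: int) -> int:
        """Return 200 if the request is admitted, 429 once the limit is reached."""
        now = self.clock()
        if now - self.window_start >= 60.0:
            # A new minute has started: reset the running count.
            self.window_start = now
            self.used = 0
        if self.used >= self.limit:
            return 429
        # Admit the request and add its estimated tokens to the running count.
        self.used += hits_addend
        return 200
```

With a limit of 100 tokens per minute, two requests estimated at 60 and 50 tokens are both admitted (the second pushes the count to 110), and any further request in that minute receives 429 until the window resets.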

@PeterL328
Contributor

Hi @lizzzcai,
We use the OpenAI API and also Azure OpenAI. I believe both report back the total tokens consumed (request + response tokens) in the response body.

Yes, you can use max_tokens as a stand-in for the response size, but it will not be accurate, if accuracy is what you need when tracking usage.

@EItanya
Contributor

EItanya commented May 16, 2024

I have opened #34184 as a potential solution to setting hits_addend in an unobtrusive way.

@zirain
Contributor

zirain commented Aug 8, 2024

After #34184 is merged, can we close this?

@OS-ramamurtisubramanian

OS-ramamurtisubramanian commented Aug 13, 2024

Hi @EItanya, I'm trying to use hits_addend with Istio. Can you please provide an example of how to configure this as an EnvoyFilter?

I was trying to use the set_filter_state filter to set the envoy.ratelimit.hits_addend filter state from a request header, but it was not working.

I get the following error.

Error adding/updating listener(s) virtualInbound: 'envoy.ratelimit.hits_addend' does not have an object factory.

@zirain
Contributor

zirain commented Aug 13, 2024

Please use the master branch.

@OS-ramamurtisubramanian

OS-ramamurtisubramanian commented Aug 14, 2024

Hi @zirain, I managed to build and use the pilot and proxyv2 images of Istio from the master branch.

I am trying to create the EnvoyFilter objects.
envoyfilter_hits_addend.txt

Is this the correct way to set the envoy.ratelimit.hits_addend filter state from a request header called hits, before the rate limit filter?

@zirain
Contributor

zirain commented Aug 14, 2024

Be careful about inserting a filter relative to something that is created by another EnvoyFilter.

@gcalmettes

gcalmettes commented Oct 8, 2024

I'm seeing the same problem as the one described by @OS-ramamurtisubramanian on the latest v1.31.2 when trying to use the envoy.filters.http.set_filter_state filter to set the envoy.ratelimit.hits_addend state key.

          - name: envoy.filters.http.set_filter_state
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.set_filter_state.v3.Config
              on_request_headers:
              - object_key: envoy.ratelimit.hits_addend
                format_string:
                  text_format_source:
                    inline_string: "0"

Error log is:

[main] [source/server/server.cc:412] error initializing config '  /etc/envoy/envoy.yaml': 'envoy.ratelimit.hits_addend' does not have an object factory 

Is there another configuration to add?

@zirain
Contributor

zirain commented Oct 8, 2024

I cannot recall, but can you give it a try with the main branch?

@gcalmettes

@zirain I just tried using a freshly built envoy binary from the main branch.

> ./envoy --version          

./envoy  version: 51e253405a2be7f94df8c0ba78bd884dc79bb8a5/1.32.0-dev/Modified/DEBUG/BoringSSL

Configuration tested:

admin:
  address:
    socket_address: { address: 127.0.0.1, port_value: 9901 }

static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address: { address: 127.0.0.1, port_value: 10000 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          codec_type: AUTO
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: some_service }
          http_filters:
          - name: envoy.filters.http.set_filter_state
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.set_filter_state.v3.Config
              on_request_headers:
              - object_key: envoy.ratelimit.hits_addend
                format_string:
                  text_format_source:
                    inline_string: "0"
          - name: envoy.filters.http.ratelimit
            typed_config:
              '@type': type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
              domain: rpm
              enable_x_ratelimit_headers: DRAFT_VERSION_03
              failure_mode_deny: false
              rate_limit_service:
                grpc_service:
                  envoy_grpc:
                    cluster_name: ratelimit
                transport_api_version: V3
              rate_limited_as_resource_exhausted: true
              request_type: external
              stage: 0
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

  clusters:
  - name: some_service
    connect_timeout: 0.25s
    type: STATIC
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: some_service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 1234
  - name: ratelimit
    connect_timeout: 1s
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: ratelimit
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 5001

Same error:

[2024-10-09 18:14:52.204][619209][info][main] [source/server/server.cc:871] runtime: {}
[2024-10-09 18:14:52.206][619209][info][admin] [source/server/admin/admin.cc:65] admin address: 127.0.0.1:9901
[2024-10-09 18:14:52.209][619209][info][config] [source/server/configuration_impl.cc:168] loading tracing configuration
[2024-10-09 18:14:52.209][619209][info][config] [source/server/configuration_impl.cc:124] loading 0 static secret(s)
[2024-10-09 18:14:52.209][619209][info][config] [source/server/configuration_impl.cc:130] loading 2 cluster(s)
[2024-10-09 18:14:52.231][619209][info][config] [source/server/configuration_impl.cc:138] loading 1 listener(s)
[2024-10-09 18:14:52.241][619209][info][config] [source/server/configuration_impl.cc:168] loading tracing configuration
[2024-10-09 18:14:52.241][619209][info][config] [source/server/configuration_impl.cc:124] loading 0 static secret(s)
[2024-10-09 18:14:52.241][619209][info][config] [source/server/configuration_impl.cc:130] loading 2 cluster(s)
[2024-10-09 18:14:52.258][619209][info][config] [source/server/configuration_impl.cc:138] loading 1 listener(s)
[2024-10-09 18:14:52.266][619209][critical][main] [source/server/server.cc:412] error initializing config '  envoy-basic.yaml': 'envoy.ratelimit.hits_addend' does not have an object factory
[2024-10-09 18:14:52.268][619209][info][main] [source/server/server.cc:1042] exiting
'envoy.ratelimit.hits_addend' does not have an object factory

@zirain
Contributor

zirain commented Oct 9, 2024

I'm not sure how you built it; I cannot reproduce it on my machine.

bazel build envoy
cp bazel-bin/source/exe/envoy-static /usr/local/bin/envoy-dev
envoy-dev -c envoy.yaml

@gcalmettes

gcalmettes commented Oct 10, 2024

@zirain, sorry, I must have missed something in my first build (I was using the docker script provided). Building with your commands indeed works. Thank you!
It's very useful to be able to set a different hits_addend value per filter for different domains when multiple ratelimit filters are chained.
