Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Config revamp and purpose #108

Open
alexsnaps opened this issue Oct 11, 2024 · 3 comments
Open

Config revamp and purpose #108

alexsnaps opened this issue Oct 11, 2024 · 3 comments

Comments

@alexsnaps
Copy link
Member

alexsnaps commented Oct 11, 2024

We have a few issues with the way we configure our code to do the proper thing based of how a Policy gets attached to Gateway API objects. Here is a tentative at a proposal to refine the config to a bare minimal of what wasm is supposed to do within the Gateway.

What is wasm even need for?

Upon serving a request, our wasm module needs to decide whether some actions need to be performed. This happens in "conceptual" steps:

  • Are there any Policy attached (directly or indirectly) to the HTTPRouteMatch?
  • Are the conditions of that Policy met?
  • If an action is taken, how does it affect the request?

The first step is to evaluate whether the request matches any of HTTPRouteMatch which do have a Policy attached, otherwise there is no action to be taken.

Mapping HTTPRouteRule to actual requests

In order to map a request to an HTTPRouteMatch, we need to evaluate which is matched by the request. There is a whole section in the Gateway API specification that describes the precedence rules for matching requests. Ideally the configuration would have the list of Actions to be sorted accordingly. Simplistically, we'd just "fall through" the list to find the best (i.e. the first) match. Obviously that list would contain only HTTPRouteMatch for which actions are to be taken (i.e. ones for which at least one policy has been attached to). That "ActionSet" to be taken would be identified by these "RouteRuleConditions", composed of whatever the HTTPRouteMatch expresses, added to the Hostnames defined by both the HTTPRouteSpec and the GatewaySpec's Listeners.

That first step would be the RouteRuleConditions expressed as "top-level" conditions in the order list of ActionSet.

Evaluating whether an action is to be taken

Actions within that ActionSet are themselves conditional. These Conditions are expressed by the user as part of their Policy definition (e.g. only apply rate limiting if the request isn't from a known set of IP addresses, like our offices).

Each individual Action has first its Conditions evaluated and only upon them being satisfied, it is invoked. Otherwise, the next Action of the ActionSet gets evaluated for invocation (e.g. Authorino). Further down the ActionSets, Conditions can reuse data from the previous stage, e.g. only limit "anonymous" traffic.

Invocation itself can the be "enriched" with request specific data and/or data from a previous step. e.g. passing the IP address to Limitador so it can maintain a counter per IP address; pass the user_id from the "auth phase" to Limitador.

Dealing with the request

Upon each Action evaluated, there are 3 possible outcome for the request:

  • wait on the result of the action/invoke a further action, i.e. pause the request;
  • allow the request thru;
  • hijack the request and terminate it

Concretely, what do we need?

Most of the config as it stands does what we need it to do, e.g. definition of services and their invocations. But we could probably simplify by having a ordered list of ActionSet narrowed down by RouteRuleConditions, which then define Actions to be taken, each with the option to be Conditionally invoked, and have data flow thru that Action chain (to be used in Conditions or invocation of the service.

Example

  • Pre-auth RL per IP for all requests, but from an allow-list of IPs
  • Auth for admin requests
  • Post-auth RL for all requests, with exceptions for certain users

Config (WIP)

---
- routeRuleConditions:
    hostnames:
    - "*.foo.com"
    - "users.bar.com"
    matches:
    - request.method == "GET"
    - request.path.startsWith('/bar')
  actions:
  - conditions:
    - "!['127.0.0.1', '192.168.1.1'].contains(source.ip)"
    action:
      service: limitador
      data:
        ip: source.ip
        magic_RL_name: "'1'"
  - conditions: []
    action:
      service: authorino
      data:
        authCfg: "'someIdentifier'"
  - conditions:
    - auth.user.name == 'bob'
    service: limitador
    data:
      user: auth.user_id
      magic_RL_bob: "'1'"
      magic_RL_all: "'1'" 
  - conditions:
    - auth.user.name == 'alice'
    service: limitador
    data:
      user: auth.user_id
      magic_RL_alice: "'1'"
      magic_RL_all: "'1'" 
  - conditions:
    - auth.user.name != 'alice'
    - auth.user.name != 'bob'
    service: limitador
    data:
      magic_RL_all: "'1'" 

That last bit about all, alice & bob, could be "simplified" by letting the wasm-shim "merge" adjacent calls to a same service, and merge the data part automagically. I think we can come up with the heuristics to do that there. I think this only applies to calls to Limitador as well. That mostly if we want to keep the contract of applying multiple RateLimitPolicyies despite knowing that the most restrictive kicks in first (which is what this example does for alice & bob), so that the last bit of the config would be more like this:

  - conditions:
    - auth.user.name == 'bob'
    service: limitador
    data:
      user: auth.user_id
      magic_RL_bob: "'1'"
  - conditions:
    - auth.user.name == 'alice'
    service: limitador
    data:
      user: auth.user_id
      magic_RL_alice: "'1'"
  - conditions:
    service: limitador
    data:
      magic_RL_all: "'1'" 

but would then be this:

  - conditions: []
    service: limitador
    data:
      user: auth.user.name
      user_id: auth.user_id
      magic_RL_alice: "'1'"
      magic_RL_bob: "'1'"
      magic_RL_all: "'1'" 

Where the Limit in limitador would be the one honoring the user == bob or alice part.

@alexsnaps
Copy link
Member Author

So wrt the case where multiple limits are to possibly be applied to a HTTPRouteRule, I think I'm in favor of merging the Action (i.e. its data) and fold it in a single request to Limitador where the conditions will be evaluated on its side.
So indeed something like:

  - conditions: []
    service: limitador
    data:
      user: auth.user.name
      user_id: auth.user_id
      magic_RL_alice: "'1'"
      magic_RL_bob: "'1'"
      magic_RL_all: "'1'" 

i.e. always call into Limitador and let it evaluate with of the 3 (alice, bob and/or all) have to be enforced. It becomes the reconciler's job to merge these and have the conditions propagate to the Limitador CR. All three would be potential candidates (based of the same namespace the matching Limits are declared in) and will be discriminated based on the DescriptorEntryies we send along. So in this case, for auth.user.name == "charlie", only all would apply, as we check that all entries required by a Limit are present, but there could be more entries... which will just be ignored.

@guicassolato
Copy link

Overall, +1 to the proposal of reviewing the config.

One thing I still find hard to ensure though, based on the (WIP) sample config provided, is the sorting the ActionSets and more so the sorting of Actions within an action set.

ActionSet conditions (routeRuleConditions) seem to map to HTTPRouteRules plus hostnames – or, equivalently, things we can obtain from Gateway API resources. My understanding of the specification provided by Gateway API is that it works fine for sorting HTTPRouteMatches, but not groupings of those, such as HTTPRouteRules and HTTPRoutes exactly.

E.g.:

Between /foo, /foo*, and /fo*, the spec clearly states that /foo ("Exact" path match) has precedence over /foo* ("Prefix" path match), which in turn has precedence over /fo* (also "Prefix" path match but with lower number of characters).

However, for the following two ActionSets, I suppose we would not be able to tell which one goes first:

- routeRuleConditions:
    matches:
    - request.path.startsWith('/foo')
    - request.path.startsWith('/')
    actions: <A>
- routeRuleConditions:
    matches:
    - request.path.startsWith('/fo')
    actions: <B>

And it's similar for hostnames of course. One can sort ["foo.com", "*.com", "*"], but not quite [["foo.com", "*"], ["*.com"]].

This makes me wonder if the configuration wouldn't have to be modeled so each HTTPRouteMatch-hostname combination maps to one ActionSet.

Regarding the sorting of Actions within an ActionSet, I honestly have no idea how we can calculate specificity and thus infer precedence for conditions based on completely arbitrary sets, i.e. not otherwise limited by a well-defined domain, such as the domain of all possible HTTP requests.

@alexsnaps
Copy link
Member Author

alexsnaps commented Oct 12, 2024

I'm confused:

This makes me wonder if the configuration wouldn't have to be modeled so each HTTPRouteMatch-hostname combination maps to one ActionSet.

Yes! That's what I'd do. No, actually, not HTTPRouteMatch, I don't think we need that precision Yes! 🤦 HTTPRouteMatch you are absolutely correct! I mixed up the two! "The set of conditions on the request that are AND together! also don't think we need a single "hostname" always - tho it might make is simpler indeed! Based on same reasoning: the conditions AND together. Or is you fear exactly that, the explosion of "options"? i.e. a combination of "startsWith" & "exact match" because of that mix being present in a single HTTPRouteRule? With the previous I think it'd be good, agreed?

However, for the following two ActionSets, I suppose we would not be able to tell which one goes first:

The first goes first, as it is first. Why did it come first? Because the Kuadrant operator did put it first, based of the sorting algorithm spec'ed by GwAPI and as such decided that's the order. Now I think /foo, may need to resolve in a 3rd (that'd be before these 2 tho that does say request.path == "/foo" tho. But rereading the spec this could not be distinguished from the GwAPI objects, is that correct? Everything is a startsWith if I read this right?

Now, both HTTPPathMatch and HTTPMethod appear once on a HTTPRouteMatch, so with the mapping of HTTPRouteMatch to ActionSet, that'd work fine. The hostname, if treated explicitly, can leverage the Trie to keep on hitting the "most specific" match.

Regarding the sorting of Actions within an ActionSet

What? The order of the actions is "pre-auth RL", then "Auth" and finally "authenticated RL"... Confused as to where the problem is?


Again, the wasm module would simply fall through this list (conceptually again, this can be optimized to not be O(n)). So the order and routeRuleConditions are of the essence here to properly end up matching to the proper equivalent HTTPRouteRule(s) HTTPRuleMatch(es) affected by a given (synthetic) Kuadrant policy. But afaict everything needed is available to the Kuadrant controller to output such a configuration for the wasm module to consume, just as Envoy Gateway orders these HTTPRouteMatches that do map to its Route entries

ActionSet conditions (routeRuleConditions) seem to map to HTTPRouteRules plus hostnames

Yeah, I mixed up there, that was my intent, but we need HTTPRouteMatch. We could tho have routeRuleConditions be a set of ORed together HTTPRouteMatch equivalent, if we'd want to keep the mapping closer to what Gateway API does. And probably as such avoid loads of repetition.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

No branches or pull requests

2 participants