Add implementer's guide #2454

Merged
merged 6 commits into from
Oct 10, 2023
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -77,6 +77,7 @@ nav:
- gRPC Routing: guides/grpc-routing.md
- Migrating from Ingress: guides/migrating-from-ingress.md
- Reference:
- Implementer's Guide: reference/implementers-guide.md
- API Types:
GatewayClass: api-types/gatewayclass.md
Gateway: api-types/gateway.md
2 changes: 1 addition & 1 deletion site-src/concepts/guidelines.md
@@ -1,4 +1,4 @@
# Implementation guidelines
Contributor Author

I started this PR reviewing this page, but it turned out that these guidelines are actually for us, the project contributors and maintainers, not for implementers. So "Design guidelines" felt more appropriate.

# Design guidelines

There are some general design guidelines used throughout this API.

337 changes: 337 additions & 0 deletions site-src/references/implementers-guide.md
@@ -0,0 +1,337 @@
# Gateway API Implementer's Guide

Everything you wanted to know about building a Gateway API implementation
but were too afraid to ask.

This document is a place to collect tips and tricks for _writing a Gateway API
implementation_ that have no straightforward place within the godoc fields of the
underlying types.

It's also intended to be a place to write down some guidelines to
help implementers of this API avoid common mistakes.

It may not be very relevant if you are intending to _use_ this API as an end
user as opposed to _building_ something that uses it.

This is a living document; if you see something missing, PRs are welcome!

## Important things to remember about Gateway API

Hopefully most of these are not surprising, but they sometimes have non-obvious
implications that we'll try and lay out here.

### Gateway API is a `kubernetes.io` API

Gateway API uses the `gateway.networking.k8s.io` API group. This means that,
each time a release happens, the APIs have been reviewed by upstream Kubernetes
reviewers, just like the APIs delivered in the core Kubernetes binaries.

### Gateway API is delivered using CRDs

Gateway API is supplied as a set of CRDs, version controlled using our [versioning
policy][versioning].

The most important part of that versioning policy is that what _appears to be_
the same object (that is, it has the same `group`, `version`, and `kind`) may have
a slightly different schema. We make changes in ways that are _compatible_, so
things should generally "just work", but there are some actions implementations
need to take to make this more reliable; these are detailed below.

The CRD-based delivery also means that if an implementation tries to use (that is,
get, list, watch, etc.) Gateway API objects when the CRDs have _not_ been installed,
then its Kubernetes client code will likely return serious errors.
Tips to deal with this are also detailed below.

The CRD definitions for Gateway API objects all contain two specific
annotations:

- `gateway.networking.k8s.io/bundle-version: <semver-release-version>`
- `gateway.networking.k8s.io/channel: <channel-name>`

The concepts of "bundle version" and "channel" (short for "release channel") are
explained in our [versioning][versioning] documentation.

Implementations may use these to determine what schema versions are installed in
the cluster, if any.
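
As a minimal sketch of reading these annotations, the snippet below pulls both values out of a CRD's `metadata.annotations` map. The `crdInfo` type and `readCRDInfo` function are hypothetical helpers invented for illustration, and the version and channel values in `main` are example data only; how the annotations map is fetched from the cluster is left out.

```go
package main

import "fmt"

// Annotation keys as listed in the text above.
const (
	bundleVersionAnnotation = "gateway.networking.k8s.io/bundle-version"
	channelAnnotation       = "gateway.networking.k8s.io/channel"
)

// crdInfo is a hypothetical helper type for this sketch; it is not part of
// Gateway API.
type crdInfo struct {
	BundleVersion string
	Channel       string
}

// readCRDInfo extracts the bundle version and channel from a CRD's
// annotations. The boolean is false if either annotation is missing.
func readCRDInfo(annotations map[string]string) (crdInfo, bool) {
	version, hasVersion := annotations[bundleVersionAnnotation]
	channel, hasChannel := annotations[channelAnnotation]
	if !hasVersion || !hasChannel {
		return crdInfo{}, false
	}
	return crdInfo{BundleVersion: version, Channel: channel}, true
}

func main() {
	// Example data only; a real implementation would read this map from the
	// CustomResourceDefinition object in the cluster.
	annotations := map[string]string{
		bundleVersionAnnotation: "v1.0.0",
		channelAnnotation:       "standard",
	}
	if info, ok := readCRDInfo(annotations); ok {
		fmt.Printf("bundle version %s, channel %s\n", info.BundleVersion, info.Channel)
	}
}
```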

[versioning]: /concepts/versioning

### Changes to the Gateway API CRDs are backwards compatible

Part of the contract for Gateway API CRDs is that changes _within an API version_
must be _compatible_.

"Within an API Version" means changes to a CRD that occur while the same API version
(`v1alpha2` or `v1` for example) is in use, and "compatible" means that any new
fields, values, or validation will be added in a way that ensures that _previous_
objects _will still be valid objects_ after the change.

This means that once Gateway API objects move to the `v1` API version, then _all_
changes must be compatible.

This contract also means that an implementation will not fail with a higher version
of the API than the version it was written with, because the newer schema being
stored by Kubernetes can always be deserialized into the older version
used in code by the implementation.

Similarly, if an implementation was written with a _higher_ version, the newer
values that it understands will simply _never be used_, as they are not present
in the older version.
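
The two directions described above can be illustrated with plain JSON decoding. This is only a sketch: `oldSpec` and `newSpec` are hypothetical types, not Gateway API types, and it assumes the client decodes leniently, as Go's `encoding/json` does by default (unknown fields are ignored, absent fields keep their zero value).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// oldSpec stands in for a struct compiled from an older release
// (hypothetical type for this sketch).
type oldSpec struct {
	Hostname string `json:"hostname"`
}

// newSpec stands in for the same struct in a newer release, with one
// additional optional field (also hypothetical).
type newSpec struct {
	Hostname string `json:"hostname"`
	NewField string `json:"newField,omitempty"`
}

// decodeOld decodes stored JSON into the older struct. Fields the older
// struct does not know about are silently dropped.
func decodeOld(data []byte) (oldSpec, error) {
	var spec oldSpec
	err := json.Unmarshal(data, &spec)
	return spec, err
}

// decodeNew decodes stored JSON into the newer struct. Fields missing from
// the stored object are left at their zero value.
func decodeNew(data []byte) (newSpec, error) {
	var spec newSpec
	err := json.Unmarshal(data, &spec)
	return spec, err
}

func main() {
	newerObject := []byte(`{"hostname":"example.com","newField":"added-later"}`)
	olderObject := []byte(`{"hostname":"example.com"}`)

	o, err := decodeOld(newerObject)
	fmt.Println(o, err)

	n, err := decodeNew(olderObject)
	fmt.Println(n.NewField == "", err)
}
```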

## Implementation Rules and Guidelines

### CRD Management

For a Gateway API implementation to work, the Gateway API CRDs must be installed
in the Kubernetes cluster the implementation is watching.

Implementations have two possible options: installing CRDs themselves (implementation
controlled) or requiring installation by some other mechanism before working
(externally controlled). Both have tradeoffs, but implementation controlled has
significantly more, and so we DO NOT recommend using implementation controlled
methods at this time.

Either way, there are certain things that SHOULD be true:

Whatever method is used, infra and cluster admins SHOULD attempt to ensure that
the bundle version of the CRDs is not _downgraded_. Although we ensure that
API changes are backwards compatible, changing CRD definitions can change the
storage version of the resource, which could have unforeseen effects. Most of the
time, things will probably work, but if they don't, they will most likely
break in weird ways.

Additionally, older versions of the API may be missing fields or features, which
could be very disruptive for users.

Try your best to ensure that the bundle version doesn't roll backwards. It's safer.

Implementations SHOULD also handle the Gateway API CRDs _not_ being present in
the cluster without crashing or panicking. Exiting with a clear fatal error is
acceptable in this case, as is disabling Gateway API support even if enabled in
configuration.

Practically, implementations using `controller-runtime` or similar tooling may
need to check for the _presence_ of the CRDs by getting the list of installed
CRDs before attempting to watch those resources.
(Note that this requires the implementation to have `read` access to those
resources.)
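
A minimal sketch of that presence check follows. It deliberately leaves out the Kubernetes client: `installed` is assumed to be the set of CRD names the implementation has already listed from the cluster, and `missingCRDs` is a hypothetical helper. The names in `required` are examples of what one implementation might need, not a definitive list.

```go
package main

import (
	"fmt"
	"sort"
)

// missingCRDs returns the required CRD names that are absent from the set of
// installed CRD names, in sorted order. An empty result means it is safe to
// start watches.
func missingCRDs(installed map[string]bool, required []string) []string {
	var missing []string
	for _, name := range required {
		if !installed[name] {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Assumption: this set was built by listing CRDs in the cluster.
	installed := map[string]bool{
		"gatewayclasses.gateway.networking.k8s.io": true,
	}
	// Example requirement list for a hypothetical implementation.
	required := []string{
		"gatewayclasses.gateway.networking.k8s.io",
		"gateways.gateway.networking.k8s.io",
		"httproutes.gateway.networking.k8s.io",
	}

	if missing := missingCRDs(installed, required); len(missing) > 0 {
		// Disable Gateway API support (or exit with a clear error)
		// instead of crashing inside the watch machinery.
		fmt.Println("Gateway API support disabled; missing CRDs:", missing)
		return
	}
	fmt.Println("all required CRDs present; starting watches")
}
```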

#### Implementation-controlled CRD installation

Implementation-controlled CRD installation also includes automatic installation
mechanisms such as Helm, if the CRDs are included in a Helm chart with the
implementation's installation.

Because of significant caveats we DO NOT recommend doing implementation-controlled
CRD management at this time.

However, if you really must, CRD definitions MAY be installed by implementations,
but if they do, they MUST have a way to ensure:

- there are no other Gateway API CRDs installed in the cluster before starting, or
- that the CRD definitions are only installed if they are a higher bundle version
than any existing Gateway API CRDs. Note that even this may not be safe if there
are breaking changes in the experimental channel resources, so implementations
should be _very_ careful with doing this.

This avoids problems if another implementation is also installed in the cluster
and expects a higher version of the CRDs to be installed.

The worst outcome here would be two implementations trying to do automatic install
of _different_ CRD versions, resulting in the CRD versions flapping between
versions or channels. This would _not_ produce good outcomes.

The safer method for an automatic installation would require the implementation
to:

- Check if there are any Gateway API CRDs installed in the cluster.
- If not, install its most compatible version of the CRDs.
- If so, only install its version of the CRDs if the bundle version is higher
than the existing one, and the mechanism will also need to check if there are
incompatible changes included in any versions as well.

This is going to be _very_ difficult to pull off in practice.
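
To make the decision steps above concrete, here is a rough sketch of the version comparison only. All function names are hypothetical. It assumes bundle versions look like `vMAJOR.MINOR.PATCH`, rejects anything else (including pre-release suffixes) rather than guessing, and uses "same channel" as a crude stand-in for the incompatible-changes check, which a real mechanism would need to do far more carefully.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseBundleVersion parses "vMAJOR.MINOR.PATCH" into three integers.
// Anything else is rejected in this sketch.
func parseBundleVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("unsupported bundle version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, fmt.Errorf("unsupported bundle version %q", v)
		}
		out[i] = n
	}
	return out, nil
}

// isHigher reports whether version a is strictly higher than version b.
func isHigher(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] > b[i]
		}
	}
	return false
}

// shouldInstall decides whether to install our CRDs. An empty
// existingVersion means no Gateway API CRDs were found in the cluster.
func shouldInstall(existingVersion, existingChannel, ourVersion, ourChannel string) (bool, error) {
	if existingVersion == "" {
		return true, nil // nothing installed yet
	}
	if existingChannel != ourChannel {
		return false, nil // assumption: never switch channels automatically
	}
	existing, err := parseBundleVersion(existingVersion)
	if err != nil {
		return false, err
	}
	ours, err := parseBundleVersion(ourVersion)
	if err != nil {
		return false, err
	}
	return isHigher(ours, existing), nil
}

func main() {
	install, err := shouldInstall("v0.8.1", "standard", "v1.0.0", "standard")
	fmt.Println(install, err)
}
```

Refusing to act on an unparseable version is a deliberate choice here: when the installed state is unclear, leaving the CRDs alone is the less destructive option.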

It should also be noted that many infra and cluster admins manage CRDs using
externally controlled methods that will not be visible to a Gateway
implementation, so if you still proceed with automatic installation, it MUST be
able to be disabled by the installation owner (whether that is the infra or cluster
admin).

Because of all these caveats, we DO NOT recommend doing automatic CRD management
at this time.
Comment on lines +162 to +163
Member

This feels like something that should be closer to the top of the section.

Contributor Author

I reworded this pretty substantially, PTAL.


#### Externally controlled CRD installation

Because of all of the complexities mentioned in the "Implementation controlled"
section of this document, we recommend that implementations supply documentation
on how to check if CRDs are installed and upgrade versions if required.

Additions to this document with suggested commands are welcome.

### Conformance and Version compatibility

A conformant Gateway API implementation is one that passes the conformance tests
that are included in each Gateway API bundle version release.

An implementation MUST pass the conformance suite with _no_ skipped tests to be
conformant. Tests may be skipped during development, but a version you want to
be conformant MUST have no skipped tests.

Extended features may, as per the contract for Extended status, be disabled.

Gateway API conformance is version-specific. An implementation that passes
conformance for version N may not pass conformance for version N+1 without changes.

Implementations SHOULD submit a report from the conformance testing suite back
to the Gateway API GitHub repo containing details of their testing.

The conformance suite output includes the Gateway API version supported.

#### Version compatibility

Once v1.0 is released, implementations supporting Gateway and GatewayClass
MUST set a new Condition, `SupportedVersion`, with `status: true` meaning
that the installed CRD version is supported, and `status: false` meaning that it
is not.

### Standard Status fields and Conditions

Gateway API has many resources, but when designing this, we've worked to keep
the status experience as consistent as possible across objects, using the
Condition type and the `status.conditions` field.

Most resources have a `status.conditions` field, but some also have a namespaced
field that _contains_ a `conditions` field.

For the latter, Gateway's `status.listeners` and the Route `status.parents`
fields are examples where each item in the slice identifies the Conditions
associated with some subset of configuration.

For the Gateway case, it's to allow Conditions per _Listener_, and in the Route
case, it's to allow Conditions per _implementation_ (since Route objects can
be used in multiple Gateways, and those Gateways can be reconciled by different
implementations).

In all of these cases, there are some relatively-common Condition types that have
similar meanings:

- `Accepted` - the resource or part thereof contains acceptable config that will
produce some configuration in the underlying data plane that the implementation
controls. This does not mean that the _whole_ configuration is valid, just that
_enough_ is valid to produce some effect.
- `Programmed` - this represents a later phase of operation, after `Accepted`,
when the resource or part thereof has been Accepted and programmed into the
underlying dataplane. Users should expect the configuration to be ready for
traffic to flow _at some point in the near future_. This Condition does _not_
say that the dataplane is ready _when it's set_, just that everything is valid
and it _will become ready soon_. "Soon" may have different meanings depending
on the implementation.
- `ResolvedRefs` - this Condition indicates that all references in the resource
or part thereof were valid and pointed to an object that both exists and allows
that reference. If this Condition is set to `status: false`, then _at least one_
reference in the resource or part thereof is invalid for some reason, and the
`message` field should indicate which ones are invalid.

Implementers should check the godoc for each type to see the exact details of
these Conditions on each resource or part thereof.

Additionally, the upstream `Condition` struct contains an optional
`observedGeneration` field. Implementations MUST use this field and set it to
the `metadata.generation` field of the object at the time the status is generated.
This allows users of the API to determine if the status is relevant to the current
version of the object.
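
A small sketch of how `observedGeneration` might be stamped and later checked is shown below. The `condition` struct is a hypothetical local mirror of a few fields of the upstream type, used so the example stays self-contained; `newCondition` and `isStale` are likewise invented names, and the reason and message strings are example data.

```go
package main

import "fmt"

// condition mirrors a few fields of the upstream Condition type
// (hypothetical local type for this sketch).
type condition struct {
	Type               string
	Status             string // "True" or "False" in this sketch
	Reason             string
	Message            string
	ObservedGeneration int64
}

// newCondition builds a condition stamped with the metadata.generation the
// controller saw when it computed the status.
func newCondition(condType string, ok bool, reason, message string, generation int64) condition {
	status := "False"
	if ok {
		status = "True"
	}
	return condition{
		Type:               condType,
		Status:             status,
		Reason:             reason,
		Message:            message,
		ObservedGeneration: generation,
	}
}

// isStale reports whether the condition was computed for an older generation
// of the object than the one currently stored.
func isStale(c condition, currentGeneration int64) bool {
	return c.ObservedGeneration < currentGeneration
}

func main() {
	// The controller reconciled generation 3 of the object.
	accepted := newCondition("Accepted", true, "Accepted", "configuration accepted", 3)

	// A user then edits the spec, bumping metadata.generation to 4.
	fmt.Println(isStale(accepted, 3), isStale(accepted, 4))
}
```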


### Resource details

For each currently available conformance profile, there is a set of resources
that implementations are expected to reconcile.

The following section goes through each Gateway API object and indicates expected
behaviors.

#### GatewayClass

GatewayClass has one main `spec` field - `controllerName`. Each implementation
is expected to claim a domain-prefixed string value (like
`example.com/example-ingress`) as its `controllerName`.

Implementations MUST watch _all_ GatewayClasses, and reconcile GatewayClasses
that have a matching `controllerName`. The implementation must choose at least
one compatible GatewayClass out of the set of GatewayClasses that have a matching
`controllerName`, and indicate that it accepts processing of that GatewayClass
by setting an `Accepted` Condition to `status: true` in each. Any GatewayClasses
that have a matching `controllerName` but are _not_ Accepted must have the
`Accepted` Condition set to `status: false`.

Implementations MAY choose only one GatewayClass out of the pool of otherwise
acceptable GatewayClasses if they can only reconcile one, or, if they are capable
of reconciling multiple GatewayClasses, they may also choose as many as they like.

If something in the GatewayClass renders it incompatible (at the time of writing,
the only possible reason for this is that there is a pointer to a `parametersRef`
object that is not supported by the implementation), then the implementation
SHOULD mark the incompatible GatewayClass as not `Accepted`.
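
The selection logic above might be sketched as follows, for an implementation that can reconcile several GatewayClasses at once. The `gatewayClass` struct and `acceptedStatuses` function are hypothetical simplifications; in particular, `Compatible` stands in for whatever compatibility check the implementation performs, and the controller names are example values.

```go
package main

import "fmt"

// gatewayClass is a simplified, hypothetical view of a GatewayClass.
type gatewayClass struct {
	Name           string
	ControllerName string
	// Compatible is the result of the implementation's own checks, for
	// example whether it supports the referenced parameters object.
	Compatible bool
}

// acceptedStatuses returns, for every GatewayClass whose controllerName
// matches ours, whether its Accepted Condition should be true or false.
// Classes owned by other controllers are watched but left out of the result,
// so their status is never written.
func acceptedStatuses(classes []gatewayClass, ourController string) map[string]bool {
	statuses := make(map[string]bool)
	for _, gc := range classes {
		if gc.ControllerName != ourController {
			continue
		}
		statuses[gc.Name] = gc.Compatible
	}
	return statuses
}

func main() {
	classes := []gatewayClass{
		{Name: "internal", ControllerName: "example.com/example-ingress", Compatible: true},
		{Name: "custom-params", ControllerName: "example.com/example-ingress", Compatible: false},
		{Name: "someone-else", ControllerName: "example.org/other", Compatible: true},
	}
	fmt.Println(acceptedStatuses(classes, "example.com/example-ingress"))
}
```

An implementation that can only reconcile one GatewayClass would need an extra tie-breaking step to pick a single compatible class and mark the rest as not `Accepted`; that step is not shown here.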

#### Gateway

Gateway objects MUST refer in the `spec.gatewayClassName` field to a GatewayClass
that exists and is `Accepted` by an implementation for that implementation to
reconcile them.

Gateway objects that fall out of scope (for example, because the GatewayClass
they reference was deleted) for reconciliation MAY have their status removed by
the implementation as part of the delete process, but this is not required.

#### General Route information

All Route objects share some properties:

- They MUST be attached to an in-scope parent for the implementation to consider
them reconcilable.
- The implementation MUST update the status for each in-scope Route with the
relevant Conditions, using the namespaced `parents` field. See the specific Route
types for details, but this usually includes `Accepted`, `Programmed` and
`ResolvedRefs` Conditions.
- Routes that fall out of scope SHOULD NOT have status updated, since it's possible
that these updates may overwrite any new owners. The `observedGeneration` field
will indicate that any remaining status is out of date.
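
One way to respect the status rules above is to rewrite only the `parents` entries that belong to the current implementation. The sketch below assumes each entry records which controller wrote it (called `ControllerName` here); `parentStatus` and `mergeParentStatus` are hypothetical names, and the struct is heavily simplified.

```go
package main

import "fmt"

// parentStatus is a simplified, hypothetical view of one entry in a Route's
// status.parents list.
type parentStatus struct {
	ParentName         string
	ControllerName     string
	Accepted           bool
	ObservedGeneration int64
}

// mergeParentStatus returns the status.parents list to write back.
// Entries written by other controllers are preserved as-is. If we have no
// entries to report, the Route is out of scope for us and the existing
// status is returned untouched.
func mergeParentStatus(existing, ours []parentStatus, ourController string) []parentStatus {
	if len(ours) == 0 {
		return existing
	}
	merged := make([]parentStatus, 0, len(existing)+len(ours))
	for _, ps := range existing {
		if ps.ControllerName != ourController {
			merged = append(merged, ps)
		}
	}
	return append(merged, ours...)
}

func main() {
	existing := []parentStatus{
		{ParentName: "gw-a", ControllerName: "example.org/other", Accepted: true, ObservedGeneration: 2},
		{ParentName: "gw-b", ControllerName: "example.com/example-ingress", Accepted: false, ObservedGeneration: 1},
	}
	ours := []parentStatus{
		{ParentName: "gw-b", ControllerName: "example.com/example-ingress", Accepted: true, ObservedGeneration: 2},
	}
	fmt.Println(mergeParentStatus(existing, ours, "example.com/example-ingress"))
}
```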


#### HTTPRoute

HTTPRoutes route HTTP traffic that is _unencrypted_ and available for inspection.
Member

@robscott robscott Oct 9, 2023

I think this may be a bit confusing when combined with BackendTLSPolicy. My understanding is that HTTPRoute can/should be combined with BackendTLSPolicy, but in that (and other) cases, traffic to the backend would be encrypted. I think it might be more accurate to say that this applies to HTTP(S) traffic that is terminated at the Gateway, but not TLS Passthrough traffic. I'm actually not sure how to word this, just need to be careful that we're not being unnecessarily restrictive here.

Contributor Author

I updated this wording, PTAL.

This includes HTTPS traffic that's terminated at the Gateway (since that is then
decrypted), and allows the HTTPRoute to use HTTP properties, like path, method,
or headers in its routing directives.

#### TLSRoute

TLSRoutes route encrypted TLS traffic using the SNI header, _without decrypting
the traffic stream_, to the relevant backends.

#### TCPRoute

TCPRoutes route a TCP stream that arrives at a Listener to one of the given
backends.

#### UDPRoute

UDPRoutes route UDP packets that arrive at a Listener to one of the given
backends.

#### ReferenceGrant

ReferenceGrant is a special resource that is used by resource owners in one
namespace to _selectively_ allow references from Gateway API objects in other
namespaces.

A ReferenceGrant is created in the same namespace as the thing it's granting
reference access to, and allows access from other namespaces, from other Kinds,
or both.

Implementations that support cross-namespace references MUST watch ReferenceGrant
and reconcile any ReferenceGrant that points to an object that's referred to by
an in-scope Gateway API object.
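
The check an implementation performs when it meets a cross-namespace reference could look roughly like this. The `referenceGrant` struct is a flattened, hypothetical simplification (one "from" and one "to" per grant, no API groups or object names), and the sketch assumes that same-namespace references need no grant.

```go
package main

import "fmt"

// referenceGrant is a flattened, hypothetical view of a ReferenceGrant.
type referenceGrant struct {
	// Namespace is where the grant lives, which is also the namespace of the
	// object it grants access to.
	Namespace     string
	FromNamespace string
	FromKind      string
	ToKind        string
}

// referenceAllowed reports whether an object of fromKind in fromNamespace may
// refer to an object of toKind in toNamespace.
func referenceAllowed(grants []referenceGrant, fromNamespace, fromKind, toNamespace, toKind string) bool {
	if fromNamespace == toNamespace {
		return true // assumption: same-namespace references need no grant
	}
	for _, g := range grants {
		if g.Namespace == toNamespace &&
			g.FromNamespace == fromNamespace &&
			g.FromKind == fromKind &&
			g.ToKind == toKind {
			return true
		}
	}
	return false
}

func main() {
	grants := []referenceGrant{
		{Namespace: "backends", FromNamespace: "routes", FromKind: "HTTPRoute", ToKind: "Service"},
	}
	fmt.Println(referenceAllowed(grants, "routes", "HTTPRoute", "backends", "Service"))
	fmt.Println(referenceAllowed(grants, "other", "HTTPRoute", "backends", "Service"))
}
```

When the check fails, the `ResolvedRefs` Condition described earlier is the natural place to report it.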
2 changes: 2 additions & 0 deletions site-src/references/spec.md
@@ -1,3 +1,5 @@
# API Specification

This page contains the API field specification for Gateway API.

REPLACE_WITH_GENERATED_CONTENT