Introduce links to CDEvents

This commit introduces the concept of linking events to CDEvents. We take inspiration from Eiffel for certain components, and also outline new APIs and use cases. Signed-off-by: benjamin-j-powell <benjamin_j_powell@apple.com>
cdevents · Jun 21, 2023 · fc04bcf · fc04bcf
1 parent c8e0f85
commit fc04bcf
Showing 2 changed files with 317 additions and 0 deletions.
diff --git a/images/links_flow.jpeg b/images/links_flow.jpeg
diff --git a/links.md b/links.md
@@ -0,0 +1,317 @@
+# Connecting Events - Links Proposal
+
+## Abstract
+
+This proposal outlines how to connect individual CDEvents to each other.
+Right now there is no way of associating events with one another without
+backtracking across certain context attributes, e.g. [id](https://github.com/cdevents/spec/blob/main/spec.md#id-context).
+That approach has limitations, however, in that we don't know when an event
+begins or finishes.
+
+This proposal outlines a new approach that adds semantics for marking where a
+span starts and where it finishes.
+
+## Semantics
+
+This section lays out various definitions to ensure no assumptions are made
+when we talk about linking events.
+
+* **CI** - Continuous integration
+* **CD** - Continuous delivery
+* **cdEvents Span** - A cdEvents span is an end-to-end representation of the
+  CI/CD lifecycle of some artifact
+* **Add Link** - A start span is the beginning of the CI/CD lifecycle.
+  Usually a start span is associated with some CI event, but it is not limited
+  to CI events.
+* **Link** - TODO
+    * **Parent Link** - TODO
+    * **Child Link** - TODO
+* **Global ID** - TODO
+
+## Goals
+
+The biggest challenge in this whole process is ensuring that links can be
+retrieved quickly and efficiently, while also providing the necessary metadata
+and structure to give a detailed picture of how things occurred.
+
+1) Provide a way of quickly retrieving all related links
+2) Keep the link data structure small and simple
+3) Be scalable
+
+## Design
+
+This section proposes two designs. The first covers how individual events
+can be linked to one another and described in a way that represents the
+complete picture of the whole CI/CD span. The second addresses the
+goal of scalability.
+
+### Eiffel Links
+
+[Eiffel links](https://github.com/eiffel-community/eiffel) is a links protocol
+that enables events to have relation(s) to one another. This proposal will use
+it to connect events in a meaningful manner.
+
+We will utilize the [custom data](https://github.com/eiffel-community/eiffel/blob/master/customization/custom-data.md)
+field in Eiffel to fill any gaps that Eiffel doesn't cover.
+
+An individual link may represent some event that occurred in a CI or CD system.
+For example, a CD pipeline may have started because it was triggered by some
+CI job.
+
+A full Eiffel trigger event may look something like this:
+```json
+{
+    "parent": [ // include local id for pipeline stage?
+        {
+            "meta": { // this needs to be moved out https://github.com/eiffel-community/eiffel/blob/master/examples/events/EiffelActivityTriggeredEvent/simple-customdata.json
+                "id": "ed621490-f27b-4d4c-9c90-2b6e51a8bfc6",
+                "type": "dev.cdevent", // update type
+                "version": "4.2.0",
+                "time": 1681185367, 
+                "parentID": "7078376c-61dc-4a00-b1fb-9d509c330c78",
+                "globalID": "f6df13fa-472d-462f-b6ad-2237e917b306"
+            },
+            "data": {
+                "name": "trigger CD",
+                "triggers": [
+                    {
+                        "type": "TIMER",
+                        "description": "some cron"
+                    }
+                ]
+            },
+            "links": [
+                ...
+            ]
+        }
+    ]
+}
+```
+
+Here we utilize the `customData` bag to populate our cdEvent span with all the
+necessary information. Users can further provide more metadata in `customData`
+to suit their needs to make choices based on some link(s).
+
+Links will be a new optional field in **all** CDEvent types.
+
+```json
+{
+    "context": {
+        "version": "0.3.0-draft",
+        "id": "505b31c2-8bc8-47b3-a1a0-269d7a8530ac",
+        "source": "dev/jenkins",
+        "type": "dev.cdevents.testsuite.finished.0.1.1",
+        "timestamp": "2023-03-20T14:27:05.315384Z"
+    },
+    "subject": {
+        "id": "MyTestSuite",
+        "source": "/tests/com/org/package",
+        "type": "testSuite",
+        "content": {}
+    },
+    "links": [ // new proposed field here
+        {
+            "meta": {
+                "id": "c4616b01-cc03-4c46-b0ca-21be4df8c6c8",
+                "type": "EiffelActivityFinishedEvent",
+                "version": "3.3.0",
+                "time": 1681757386,
+                "tags": [
+                    "ci",
+                    "prod"
+                ]
+            },
+            "source": {
+                "domainId": "jenkins",
+                "host": "some-host-id",
+                "name": "Jenkins My Build Started",
+                "uri": "https://my-jenkins.org/build/714"
+            },
+            "data": {
+                "name": "Jenkins Build #714",
+                "outcome": {
+                    "conclusion": "SUCCESSFUL",
+                    "description": "PR build #714 succeeded"
+                },
+                "persistentLogs": [
+                    {
+                        "name": "build logs",
+                        "uri": "https://my-jenkins.org/build/714/logs"
+                    }
+                ],
+                "customData": {
+                    "spanContext": {
+                        "globalId": "",
+                        "parentId": ""
+                    }
+                }
+            }
+        }
+    ]
+}
+```
+
+### Scalability
+
+Scalability is one of the bigger goals in this proposal and we wanted to ensure
+fast lookups. This section is going to describe how the proposed links format
+will be scalable and also provide tactics on how DB read/writes can be done.
+
+The purpose of the `globalID` is to ensure very fast lookups no matter the
+database. We could require graph DBs and do a full span lookup without a
+`globalID`, but that poses two problems:
+
+* Slower lookups as the graph DB needs to backtrack to find the full span
+* Limits storage to graph DBs only
+
+Instead, a link service that processes links and stores them in a
+database-agnostic way is much preferred, as it gives companies and developers
+options to choose from. When using an SQL database, the `globalID` could be
+the secondary key to easily retrieve indexed entities. Links could be easily
+sorted by timestamp, which should roughly correspond to the order of their
+linked neighbors, parent and child.
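
A minimal sketch of the SQL lookup described above, using Python's built-in `sqlite3`. The table name, column names, and sample ids are assumptions made for illustration; the proposal does not define a storage schema.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE links ("
    "id TEXT PRIMARY KEY, global_id TEXT NOT NULL, "
    "parent_id TEXT, time INTEGER NOT NULL)"
)
# Secondary index on global_id so a whole span is one indexed lookup.
conn.execute("CREATE INDEX idx_links_global_id ON links (global_id, time)")

rows = [
    ("link-b", "span-1", "link-a", 1681185400),
    ("link-a", "span-1", None, 1681185367),
    ("link-c", "span-1", "link-b", 1681185450),
    ("link-x", "span-2", None, 1681185390),
]
conn.executemany("INSERT INTO links VALUES (?, ?, ?, ?)", rows)


def links_for_span(db, global_id):
    """Return the ids of every link in a span, oldest first."""
    cursor = db.execute(
        "SELECT id FROM links WHERE global_id = ? ORDER BY time", (global_id,)
    )
    return [row[0] for row in cursor]


span_links = links_for_span(conn, "span-1")
```

Sorting by `time` is only an approximation of parent/child order, as the text notes; a consumer that needs the exact order would still follow the parent ids.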
+
+Services that ingest CDEvents would also have to worry about the number of
+links returned. This problem is mitigated in that only the immediate parent
+link(s) are returned, and any higher ancestry is excluded.
+If some service needs access to a higher ancestor (e.g. a parent's parent), it
+would need to use the links API to retrieve it.
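
As a rough illustration of retrieving higher ancestry, the sketch below walks parent ids upward one lookup at a time. `LINK_STORE` and `fetch_link` are hypothetical stand-ins for a links API call, and the sketch assumes each link has at most one parent, which the proposal does not require.

```python
# Hypothetical in-memory stand-in for the link service, keyed by link id.
LINK_STORE = {
    "trigger": {"id": "trigger", "parentId": None},
    "build": {"id": "build", "parentId": "trigger"},
    "test": {"id": "test", "parentId": "build"},
}


def fetch_link(link_id):
    """Stand-in for a links API call; a real client would make a request here."""
    return LINK_STORE[link_id]


def ancestors(link):
    """Walk parentId references upward, one lookup per level."""
    chain = []
    seen = {link["id"]}  # guards against accidental cycles in the data
    parent_id = link["parentId"]
    while parent_id is not None and parent_id not in seen:
        seen.add(parent_id)
        parent = fetch_link(parent_id)
        chain.append(parent["id"])
        parent_id = parent["parentId"]
    return chain


ancestor_ids = ancestors(LINK_STORE["test"])
```

Each level costs one lookup, which is the backtracking cost that a single `globalID` query avoids when the whole span is wanted.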
+
+## Client APIs and Links Storage
+
+So far we've only talked about what a service may receive when expecting a
+parent link. However, when we store a link, there's a lot more metadata that
+can and should be added.
+
+The idea is that we'd expect users to start, group, and end links accordingly
+through APIs we'd provide. This is very similar to how tracing works.
+Granularity in tracing is completely up to the engineer, and this proposal
+intends the same for links.
+
+This will introduce new APIs to the CDEvents SDKs, such as `addLink`.
+This API will be used to create a new link based on some CDEvent context.
+The context may contain things like parent caller and other useful metadata.
+
+```
+(context: CDEventContext) addLink(link)
+```
+
+`addLink` will utilize the parent ID to make some sort of relation back to the
+parent.
+This method is attached to the LinkContext, which will contain the
+metadata about the link currently being composed, e.g. which link is being
+built along with the parent link ID.
+
+When calling `addLink`, it is important to understand the association to the
+parent.
+
+```
+// Adds a new link to the CDEvent context which will be sent to the link
+// service at some point.
+cdEventContext.addLink(link: Link);
+// Link may be a class that can be a ToLink, FromLink, WithLink
+```
+
+Here we see an enum of `CAUSE` which is one of the few types of relations that
+a link can have.
+The table below lists the enum values a relation can take:
+
+| Name  | Description                                                                               |
+|-------|-------------------------------------------------------------------------------------------|
+| TO    | When a link is creating an event, it will use TO to signal to what target ID              |
+| FROM  | When a link is receiving an event, it will use FROM to signal what service called it      |
+| GROUP | When a link is to be grouped with other events, it will use GROUP to establish a grouping |
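
One way the relation types and `addLink` could look in an SDK is sketched below. `LinkKind`, `Link`, the `CDEventContext` fields, and the `add_link` method name are all assumptions for illustration and do not come from an existing CDEvents SDK.

```python
from dataclasses import dataclass, field
from enum import Enum


class LinkKind(Enum):
    """Relation types from the table above."""
    TO = "TO"
    FROM = "FROM"
    GROUP = "GROUP"


@dataclass(frozen=True)
class Link:
    kind: LinkKind
    target_id: str  # id of the event or service on the other end of the relation


@dataclass
class CDEventContext:
    event_id: str
    links: list = field(default_factory=list)

    def add_link(self, link):
        """Queue a link on this context; sending it to a link service is out of scope."""
        self.links.append(link)
        return self


context = CDEventContext(event_id="505b31c2-8bc8-47b3-a1a0-269d7a8530ac")
context.add_link(Link(LinkKind.FROM, "git-website"))
context.add_link(Link(LinkKind.TO, "artifact-store"))
kinds = [link.kind.value for link in context.links]
```

Returning `self` from `add_link` is a convenience choice that allows chaining; nothing in the proposal depends on it.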
+
+These links can be sent to some collector or to the link service itself, for
+example when a CDEvent has completed, though sending is not limited to that
+moment. Further, the link service will allow for tagging of various metadata.
+
+For instance, a `WithLink` may be used with some test that is run in our custom
+pipeline, `FooPipeline`. It runs a test suite, `BarSuite`, which runs tests `A`,
+`B`, and `C`.
+
+Assume our pipeline was triggered by some git website. `FooPipeline`
+would then create a `FromLink` indicating the trigger from our git website.
+Further, at some point our pipeline decides to start the `BarSuite`. `BarSuite`
+would add a new `WithLink` saying it's with the `FooPipeline`. We can keep going
+with other `WithLink`s for each test, but each test will be "with"-ed to the
+suite instead of the pipeline. It's important to note that, when using the
+`GROUP` link type, links that are grouped under some `ID` will be grouped
+**under** that event.
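
The walkthrough above can be sketched as plain data. This assumes a `WithLink` corresponds to the `GROUP` relation and represents each link as an (event, relation, target) triple; both are illustrative choices rather than part of the proposal.

```python
from collections import defaultdict

# (event, relation, target) triples following the FooPipeline walkthrough.
links = [
    ("FooPipeline", "FROM", "git-website"),
    ("BarSuite", "GROUP", "FooPipeline"),
    ("A", "GROUP", "BarSuite"),
    ("B", "GROUP", "BarSuite"),
    ("C", "GROUP", "BarSuite"),
]


def group_children(triples):
    """Collect, for each target, the events grouped under it."""
    grouped = defaultdict(list)
    for event, relation, target in triples:
        if relation == "GROUP":
            grouped[target].append(event)
    return dict(grouped)


groups = group_children(links)
```

The tests end up under `BarSuite` rather than `FooPipeline`, which matches the note that grouped links sit under the event they name.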
+
+Some users may prefer not to run a separate links service, especially if they
+know their overall flow may only contain a few links. If that is the case,
+simply turning on linking payload aggregation will send all links in the
+payload. Mind you, this can make the payload very large, but may be good for
+debugging.
+
+![link flow](images/links_flow.jpeg)
+
+This shows an example of how these different types would be used in a CI/CD setting.
+
+## Links API
+* `/links/{global id}`
+  * `GET`
+    * Returns a list of links associated with some global ID
+
+    ```
+    REQUEST
+    ---
+    {
+        "maxLinks": ([0-9]+|null),
+        "pagination_token": <some-pagination-token> # can be null for first page
+    }
+
+    RESPONSE
+    ---
+    {
+        "global_id": <some UUID>,
+        "token": <pagination token>, # used for second request to retrieve further pages
+        "links": [
+            ...
+        ]
+    }
+    ```
+
+* `/links`
+  * `POST`
+    * Uploads a series of created links.
+
+  ```
+  REQUEST
+  ---
+    {
+        "links": [
+            ...
+        ]
+    }
+  ```
+
+* `/links`
+  * `GET`
+    * Returns a list of links
+
+  ```
+  REQUEST
+  ---
+    {
+      "pagination_token": "<some token>"
+    }
+  ```
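
A client-side pagination loop for `GET /links/{global id}` might look like the sketch below. `get_links` is a hypothetical in-memory stand-in for the HTTP call, its token is simply a list offset, and treating a null `token` as the last page is an assumption the proposal does not state.

```python
ALL_LINKS = [f"link-{n}" for n in range(5)]


def get_links(global_id, max_links, pagination_token=None):
    """Stand-in for GET /links/{global id}; the token here is just a list offset."""
    start = int(pagination_token) if pagination_token is not None else 0
    end = start + max_links
    next_token = str(end) if end < len(ALL_LINKS) else None
    return {"global_id": global_id, "token": next_token, "links": ALL_LINKS[start:end]}


def fetch_all_links(global_id, page_size=2):
    """Request pages until the service stops returning a token."""
    collected = []
    token = None
    while True:
        page = get_links(global_id, page_size, token)
        collected.extend(page["links"])
        token = page["token"]
        if token is None:  # assumed end-of-results signal
            return collected


all_links = fetch_all_links("f6df13fa-472d-462f-b6ad-2237e917b306")
```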
+
+## Use Cases
+
+This section will go over a few use cases and explain how this approach can be
+used to solve for each particular case.
+
+### 1. Fan Out Fan In
+
+The fan out fan in use case is an example where a system may make parallel
+requests (fan out), and merge back to some other or the very same system (fan in).
+
+Let us assume we have three systems in our CI/CD environment: a continuous
+integration environment, which we will call the CI system, that runs tests and
+builds artifacts; an artifact store that receives artifacts from the CI system;
+and lastly the CD system, which consumes these artifacts.
+
+### 2. Generic UI