Add support for Kibana knowledge base entry assets #807

Merged
merged 13 commits into from
Oct 3, 2024
8 changes: 8 additions & 0 deletions code/go/pkg/validator/validator_test.go
@@ -44,6 +44,7 @@ func TestValidateFile(t *testing.T) {
"custom_ilm_policy": {},
"profiling_symbolizer": {},
"logs_synthetic_mode": {},
"knowledge_base": {},
"bad_additional_content": {
"bad-bad",
[]string{
@@ -227,6 +228,13 @@ func TestValidateFile(t *testing.T) {
`field data_stream.vars.data_stream.dataset: Does not match pattern '^[a-zA-Z0-9]+[a-zA-Z0-9\._]*$'`,
},
},
"bad_knowledge_base": {
"kibana/kb_entry/foo/manifest.yml",
[]string{
`field (root): Additional property unknown is not allowed`,
`field index: name is required`,
},
},
}

for pkgName, test := range tests {
3 changes: 3 additions & 0 deletions spec/changelog.yml
@@ -7,6 +7,9 @@
- description: Add support for content packages.
type: enhancement
link: https://github.com/elastic/package-spec/pull/777
- description: Add support for "Kibana knowledge base entry" assets.
type: enhancement
link: https://github.com/elastic/package-spec/pull/807
- version: 3.3.0-next
changes:
- description: Add support for `slo` assets.
5 changes: 5 additions & 0 deletions spec/content/kibana/spec.yml
@@ -29,3 +29,8 @@ spec:
pattern: '^{PACKAGE_NAME}-.+\.json$'
forbiddenPatterns:
- '^.+-(ecs|ECS)\.json$' # ECS suffix is forbidden
- description: Folder containing Kibana knowledge base entries
type: folder
name: kb_entry
Member

No need to abbreviate; "kb" may actually be confusing, since in this context it could be read as "kibana". Would it make sense to call this "knowledge_base" directly?

Suggested change
name: kb_entry
name: knowledge_base

Or I would also prefer "knowledge_base_entry" rather than its abbreviated form.

Suggested change
name: kb_entry
name: knowledge_base_entry

Contributor Author

That's totally fair. knowledge_base_entry sounds good; I will adapt accordingly.

required: false
$ref: "../../integration/kibana/kb_entry/spec.yml"
Contributor Author

So, I decided to make knowledge base entries (first-class) citizens of the kibana spec instead of having them live in their own, dedicated root folder.

The reasoning is that:

  1. KB entries will be "managed" by Kibana
  2. KB entries will only be used via Kibana (at least for now)

So I felt like it made more sense to have it done that way.

Member

I was thinking that this was just data, and it could maybe be reused for other features, but your reasoning makes sense. If it is managed and used only by Kibana, let's keep it under kibana.

> at least for now

Do you foresee other uses of this data?

Contributor Author

@pgayvallet, Sep 25, 2024

> I was thinking that this was just data, and it could maybe be reused for other features

Yeah I was initially thinking that too, but the KB entry installation will do more than just creating the index and indexing the docs - it will also create an entry in the knowledge base registry to register the source, so I felt like it would be potentially complicated to convert that to some abstract generic concept...

> Do you foresee other uses of this data?

We might want at some point to expose this data more directly to the end user, and make it less of an implementation detail. But we don't have any use case for this right now, so I think we should just ignore it for now and revisit it later if needed.

45 changes: 45 additions & 0 deletions spec/integration/kibana/kb_entry/manifest.spec.yml
@@ -0,0 +1,45 @@
##
## Describes the specification for a Kibana knowledge base entry's manifest.yml file
##
spec:
  type: object
  additionalProperties: false
  properties:
    name:
      description: The name of the knowledge base entry
      type: string
    description:
      description: The description of the knowledge base entry
      type: string
    index:
      type: object
      additionalProperties: false
      properties:
        name:
          description: The name of the index associated with the knowledge base entry
          type: string
        system:
          description: Specify whether the index is system-managed or not
          type: boolean
      required:
        - name
        - system
    retrieval:
      type: object
      additionalProperties: false
      properties:
        syntactic_fields:
          description: List of fields that should be used for syntactic search during retrieval.
          type: array
          items:
            type: string
        semantic_fields:
          description: List of fields that should be used for semantic search during retrieval.
          type: array
          items:
            type: string
      required:
        - syntactic_fields
        - semantic_fields
  required:
    - name
27 changes: 27 additions & 0 deletions spec/integration/kibana/kb_entry/spec.yml
@@ -0,0 +1,27 @@
spec:
  additionalContents: false
  totalContentsLimit: 50
  contents:
    - description: Folder containing a single knowledge base entry definition
      type: folder
      pattern: '^[a-z0-9_]+[a-z0-9]$'
      required: true
      additionalContents: false
      contents:
        - description: A knowledge base entry's manifest file
          type: file
          contentMediaType: "application/x-yaml"
          sizeLimit: 2MB
          name: "manifest.yml"
          required: true
          $ref: "./manifest.spec.yml"
        - description: A knowledge base entry's index mapping
          type: file
          contentMediaType: "application/json"
          sizeLimit: 5MB
          name: "index-mapping.json"
          required: true
Contributor Author

I decided to keep it simple and keep the index's mapping as a plain JSON file. First, I don't think adding validation on the index mapping's format, or having it as a YAML file, would bring much value (AFAIK we're not doing it for index templates; mappings are just defined as an object type with extra properties allowed). Second, I did not use the fields.yml feature because we don't need, or even want, any kind of discoverability on this type of content.

Please tell me if you think this doesn't make sense.

Member

I understand that having to convert to the fields.yml format can be some additional effort, but it has some advantages.

For example, Fleet already has the code to generate mappings from this format, and on top of that it can more easily add opinionated settings or mappings, for example settings that could be defined by users, or fields known to be present everywhere.

Also, elastic-package has code to validate documents against the mappings defined in the package. This could be leveraged, for example, to check that the provided content complies with the defined mappings.

As you mention, in principle these features are not wanted, at least for now. But just in case, and if only for consistency with other features, I would prefer to use the fields.yml format.

Contributor Author

@pgayvallet, Sep 25, 2024

So another part of the reasoning was that each knowledge base entry can, technically, have different mappings. I'm not sure how that would play with that fields.yml feature, given that from my understanding, this is only a top level section in the package?

Or should we use the same format but with a different pattern, like having one fields.yml file per knowledge base entry folder (that would then replace the index-mapping.json file)?

Member

We reuse the field definitions in different places. The most direct way to add them here would be a fields directory, something like this:

        - description: Folder containing field definitions to be used as the mappings for the index template
          type: folder
          name: fields
          required: false
          $ref: "../../data_stream/fields/spec.yml"

        - description: A knowledge base entry's content file
          type: file
          contentMediaType: "application/json"
          pattern: '^content(\.[0-9]+)?\.json$'
Contributor Author

As suggested by @jsoriano, we're allowing an arbitrary number of files following the fixed content.json and dynamic content.{num}.json naming format. Each file has a single documents property containing the source of the documents we will be indexing into the KB's index.

I did not add any validation on the file's content format, both because I wasn't sure we can add validation on JSON documents, and also because we wouldn't have much to validate (the only thing we could do is make sure the only top-level property present is documents and that it's an array).

But if there is a way to perform validation on JSON docs with the spec validator, I can absolutely add it.
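
For illustration only, the structural check described above (a single top-level documents property holding an array) could be sketched in Go roughly as follows. The `validateContent` helper is hypothetical and is not part of the package-spec validator; it relies only on the standard library.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// validateContent is a hypothetical helper (not part of package-spec).
// It checks that raw is a JSON object whose only top-level property is
// "documents" and that the value of that property is an array.
func validateContent(raw []byte) error {
	var top map[string]json.RawMessage
	if err := json.Unmarshal(raw, &top); err != nil {
		return fmt.Errorf("content is not a JSON object: %w", err)
	}
	docsRaw, ok := top["documents"]
	if !ok {
		return errors.New(`missing "documents" property`)
	}
	if len(top) != 1 {
		return errors.New(`unexpected top-level properties besides "documents"`)
	}
	var docs []json.RawMessage
	if err := json.Unmarshal(docsRaw, &docs); err != nil {
		return fmt.Errorf(`"documents" must be an array: %w`, err)
	}
	// A JSON null decodes into a nil slice without error, so reject it explicitly.
	if docs == nil {
		return errors.New(`"documents" must be an array, got null`)
	}
	return nil
}

func main() {
	fmt.Println(validateContent([]byte(`{"documents": [{"content_title": "example"}]}`)))
	fmt.Println(validateContent([]byte(`{"documents": {}, "extra": 1}`)))
}
```

The sketch deliberately ignores the shape of each document, since the thread only mentions checking the top-level structure.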

Member

> As suggested by @jsoriano, we're allowing an arbitrary number of files following the fixed content.json and dynamic content.{num}.json naming format. Each file has a single documents property containing the source of the documents we will be indexing into the KB's index.

👍

In other cases where we allow multiple files, we also allow letters, so that files can be given meaningful names when this is used for organizational purposes, and we usually separate with hyphens.

Suggested change
- description: A knowledge base entry's content file
type: file
contentMediaType: "application/json"
pattern: '^content(\.[0-9]+)?\.json$'
- description: A knowledge base entry's content file
type: file
contentMediaType: "application/json"
pattern: '^content(-[a-z0-9]+)?\.json$'

> I did not add any validation on the file's content format, both because I wasn't sure we can add validation on JSON documents, and also because we wouldn't have much to validate (the only thing we could do is make sure the only top-level property present is documents and that it's an array).
>
> But if there is a way to perform validation on JSON docs with the spec validator, I can absolutely add it.

In principle we can add validation to JSON documents if we have a schema, but I'm not sure it is worth it in this case.

The spec checks that the JSON is valid for files with contentMediaType: "application/json".
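
To make the difference between the two naming patterns in this thread concrete, here is a small Go sketch that runs both regular expressions against a few sample file names. The patterns are copied from the suggestion above; the sample file names are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns copied verbatim from the review thread.
var (
	// Original proposal: content.json or content.{num}.json
	numericPattern = regexp.MustCompile(`^content(\.[0-9]+)?\.json$`)
	// Suggested change: content.json or content-{lowercase letters and digits}.json
	hyphenPattern = regexp.MustCompile(`^content(-[a-z0-9]+)?\.json$`)
)

func main() {
	// Hypothetical file names, chosen only to exercise both patterns.
	names := []string{
		"content.json",
		"content.0.json",
		"content-0.json",
		"content-alerting.json",
		"content.alerting.json",
	}
	for _, name := range names {
		fmt.Printf("%-24s numeric=%-5t hyphen=%t\n",
			name, numericPattern.MatchString(name), hyphenPattern.MatchString(name))
	}
}
```

Both patterns accept the bare content.json; they differ only in the separator and in whether letters are allowed in the suffix.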

Contributor Author

> In other cases where we allow multiple files, we also allow letters, so that files can be given meaningful names when this is used for organizational purposes, and we usually separate with hyphens.

Sounds good, I will follow the pattern for consistency. I'll just wait for the other question about the format (json vs ndjson) to be addressed to perform both changes at the same time if required.

11 changes: 10 additions & 1 deletion spec/integration/kibana/spec.yml
@@ -135,8 +135,17 @@ spec:
contentMediaType: "application/json"
pattern: '^{PACKAGE_NAME}-.+\.json$'
forbiddenPatterns:
- '^.+-(ecs|ECS)\.json$' # ECS suffix is forbidden
- '^.+-(ecs|ECS)\.json$' # ECS suffix is forbidden
- description: Folder containing Kibana knowledge base entries
type: folder
name: kb_entry
required: false
$ref: "./kb_entry/spec.yml"
versions:
- before: 3.4.0
patch:
- op: remove
path: "/contents/14" # remove kb_entry definitions
- before: 3.3.0
patch:
- op: remove
93 changes: 93 additions & 0 deletions test/packages/bad_knowledge_base/LICENSE.txt
@@ -0,0 +1,93 @@
Elastic License 2.0

URL: https://www.elastic.co/licensing/elastic-license

## Acceptance

By using the software, you agree to all of the terms and conditions below.

## Copyright License

The licensor grants you a non-exclusive, royalty-free, worldwide,
non-sublicensable, non-transferable license to use, copy, distribute, make
available, and prepare derivative works of the software, in each case subject to
the limitations and conditions below.

## Limitations

You may not provide the software to third parties as a hosted or managed
service, where the service provides users with access to any substantial set of
the features or functionality of the software.

You may not move, change, disable, or circumvent the license key functionality
in the software, and you may not remove or obscure any functionality in the
software that is protected by the license key.

You may not alter, remove, or obscure any licensing, copyright, or other notices
of the licensor in the software. Any use of the licensor’s trademarks is subject
to applicable law.

## Patents

The licensor grants you a license, under any patent claims the licensor can
license, or becomes able to license, to make, have made, use, sell, offer for
sale, import and have imported the software, in each case subject to the
limitations and conditions in this license. This license does not cover any
patent claims that you cause to be infringed by modifications or additions to
the software. If you or your company make any written claim that the software
infringes or contributes to infringement of any patent, your patent license for
the software granted under these terms ends immediately. If your company makes
such a claim, your patent license ends immediately for work on behalf of your
company.

## Notices

You must ensure that anyone who gets a copy of any part of the software from you
also gets a copy of these terms.

If you modify the software, you must include in any modified copies of the
software prominent notices stating that you have modified the software.

## No Other Rights

These terms do not imply any licenses other than those expressly granted in
these terms.

## Termination

If you use the software in violation of these terms, such use is not licensed,
and your licenses will automatically terminate. If the licensor provides you
with a notice of your violation, and you cease all violation of this license no
later than 30 days after you receive that notice, your licenses will be
reinstated retroactively. However, if you violate these terms after such
reinstatement, any additional violation of these terms will cause your licenses
to terminate automatically and permanently.

## No Liability

*As far as the law allows, the software comes as is, without any warranty or
condition, and the licensor will not be liable to you for any damages arising
out of these terms or the use or nature of the software, under any kind of
legal claim.*

## Definitions

The **licensor** is the entity offering these terms, and the **software** is the
software the licensor makes available under these terms, including any portion
of it.

**you** refers to the individual or entity agreeing to these terms.

**your company** is any legal entity, sole proprietorship, or other kind of
organization that you work for, plus all organizations that have control over,
are under the control of, or are under common control with that
organization. **control** means ownership of substantially all the assets of an
entity, or the power to direct its management and policies by vote, contract, or
otherwise. Control can be direct or indirect.

**your licenses** are all the licenses granted to you for the software under
these terms.

**use** means anything you do with the software requiring one of your licenses.

**trademark** means trademarks, service marks, and similar rights.
5 changes: 5 additions & 0 deletions test/packages/bad_knowledge_base/changelog.yml
@@ -0,0 +1,5 @@
- version: 0.1.0
  changes:
    - description: Initial release
      type: enhancement
      link: https://github.com/elastic/package-spec/pull/807
1 change: 1 addition & 0 deletions test/packages/bad_knowledge_base/docs/README.md
@@ -0,0 +1 @@
This is a template for the package README.
1 change: 1 addition & 0 deletions test/packages/bad_knowledge_base/img/system.svg
106 changes: 106 additions & 0 deletions test/packages/bad_knowledge_base/kibana/kb_entry/foo/content.0.json
@@ -0,0 +1,106 @@
{
"documents": [
Member

I wonder if this file could just be an NDJSON file.

Is the documents entry needed? Do we expect other entries in this file?

How are these files generated? Do we expect manual edits to these files?

Contributor Author

I'm being (probably overly) cautious with NDJSON, as we've learned the hard way with Kibana import/export that evolving the format can be a pain (it is very difficult to add global metadata alongside the entries). But for Kibana import/export we only generate a single file; we don't have a full folder structure like we do here, so the limitations of the format were more impactful.

So no, I don't foresee us needing to add anything to those files, so if you prefer using NDJSON, I'm fine with it.

Member

Not a strong opinion, but I think I'd prefer using NDJSON here, yes. If there is some case where we need to add metadata in the future, we can use the manifest file, or some other file.

{
"content_title": "Kibana 8.13.3 | Kibana Guide [8.15] | Elastic",
"content_body": {
"text": "\n\nKibana 8.13.3edit\n\nThe 8.13.3 release includes the following bug fixes.\nBug Fixesedit\n\n\n\n\nAlerting\n\n\n\n\n\n\nManage loading fields at initialization (#180412).\n\n\n\n\n\n\nElastic Security\n\n\n\nFor the Elastic Security 8.13.3 release information, refer to Elastic Security Solution Release Notes.\n\n\n\nFleet\n\n\n\n\n\n\nFixes managed agent policy preconfiguration update (#181624).\n\n\nUse lowercase dataset in template names (#180887).\n\n\nFixes KQL/kuery for getting Fleet Server agent count (#180650).\n\n\n\n\n\n\nIndex Management\n\n\n\n\n\n\nFixes allow_auto_create field in the Index Template form (#178321).\n\n\n\n\n\n\nLens & Visualizations\n\n\n\n\n\n\nFixes controls on fields with custom label (#180615).\n\n\n\n\n\n\nMachine Learning\n\n\n\n\n\n\nFixes deep link for Index data visualizer & ES|QL data visualizer (#180389).\n\n\n\n\n\n\nObservability\n\n\n\n\n\n\nMake anomalyDetectorTypes optional (#180717).\n\n\n\n\n\n\nSharedUX\n\n\n\n\n\n\nRevert change to shared UX markdown component for dashboard vis (#180906).\n\n\n\n\n\n\nSharing\n\n\n\n\n\n\nDefault to saved object description when panel description is not provided (#181177).\n\n\n\n\n\n\n",
"inference": {
"inference_id": "kibana-elser2",
"model_settings": {
"task_type": "sparse_embedding"
},
"chunks": [
{
"text": "Kibana 8.13.3edit\n\nThe 8.13.3 release includes the following bug fixes.\nBug Fixesedit\n\n\n\n\nAlerting\n\n\n\n\n\n\nManage loading fields at initialization (#180412).\n\n\n\n\n\n\nElastic Security\n\n\n\nFor the Elastic Security 8.13.3 release information, refer to Elastic Security Solution Release Notes.\n\n\n\nFleet\n\n\n\n\n\n\nFixes managed agent policy preconfiguration update (#181624).\n\n\nUse lowercase dataset in template names (#180887).\n\n\nFixes KQL/kuery for getting Fleet Server agent count (#180650).\n\n\n\n\n\n\nIndex Management\n\n\n\n\n\n\nFixes allow_auto_create field in the Index Template form (#178321).\n\n\n\n\n\n\nLens & Visualizations\n\n\n\n\n\n\nFixes controls on fields with custom label (#180615).\n\n\n\n\n\n\nMachine Learning\n\n\n\n\n\n\nFixes deep link for Index data visualizer & ES|QL data visualizer (#180389).\n\n\n\n\n\n\nObservability\n\n\n\n\n\n\nMake anomalyDetectorTypes optional (#180717).\n\n\n\n\n\n\nSharedUX\n\n\n\n\n\n\nRevert change to shared UX markdown component for dashboard vis (#180906).\n\n\n\n\n\n\nSharing\n\n\n\n\n\n\nDefault to saved object description when panel description is not provided (#181177).",
"embeddings": {
"3": 0.5052793,
"8": 1.3079101
}
}
]
}
},
"root_type": "documentation",
"ai_subtitle": {
"text": "Kibana 8.13.3 Update; Critical Bug Fixes in Alerting, Fleet, Index Management, Visualizations, ML, Observability, SharedUX, Sharing.",
"inference": {
"inference_id": "kibana-elser2",
"model_settings": {
"task_type": "sparse_embedding"
},
"chunks": [
{
"text": "Kibana 8.13.3 Update; Critical Bug Fixes in Alerting, Fleet, Index Management, Visualizations, ML, Observability, SharedUX, Sharing.",
"embeddings": {
"3": 0.37727332,
"8": 1.184852
}
}
]
}
},
"ai_summary": {
"text": "Kibana 8.13.3 Release; Alerting; Elastic Security; Fleet; Index Management; Lens & Visualizations; Machine Learning; Observability; SharedUX; Sharing. The Kibana 8.13.3 update delivers critical bug fixes across various components including Alerting, Fleet, Index Management, Lens & Visualizations, Machine Learning, Observability, SharedUX, and Sharing. Key improvements involve loading fields at initialization for Alerting, managed policy updates and lowercase dataset handling in Fleet, auto-create settings in Index Management, field controls in Lens & Visualizations, deep linking in Machine Learning, anomaly detection in Observability, markdown component in SharedUX, and saved object descriptions in Sharing.",
"inference": {
"inference_id": "kibana-elser2",
"model_settings": {
"task_type": "sparse_embedding"
},
"chunks": [
{
"text": "Kibana 8.13.3 Release; Alerting; Elastic Security; Fleet; Index Management; Lens & Visualizations; Machine Learning; Observability; SharedUX; Sharing. The Kibana 8.13.3 update delivers critical bug fixes across various components including Alerting, Fleet, Index Management, Lens & Visualizations, Machine Learning, Observability, SharedUX, and Sharing. Key improvements involve loading fields at initialization for Alerting, managed policy updates and lowercase dataset handling in Fleet, auto-create settings in Index Management, field controls in Lens & Visualizations, deep linking in Machine Learning, anomaly detection in Observability, markdown component in SharedUX, and saved object descriptions in Sharing.",
"embeddings": {
"3": 0.48509678,
"7": 0.029003777
}
}
]
}
},
"ai_tags": [
"Kibana",
"Alerting",
"Elastic Security",
"Fleet",
"Index Management",
"Lens & Visualizations",
"Machine Learning",
"Observability",
"SharedUX",
"Sharing"
],
"product_name": "Kibana",
"version": "8.15",
"slug": "guide-en-kibana-release-notes-8.13.3.html",
"url": "https://www.elastic.co/guide/en/kibana/8.15/release-notes-8.13.3.html",
"ai_questions_answered": {
"text": [
"What are the main components updated in Kibana 8.13.3?",
"How does the Alerting feature improve in Kibana 8.13.3?",
"What changes were made to Fleet in the Kibana 8.13.3 release?",
"What was the fix applied to Index Management in Kibana 8.13.3?",
"How were field controls enhanced in Lens & Visualizations for Kibana 8.13.3?",
"What Machine Learning deep linking issues were fixed in Kibana 8.13.3?",
"What anomaly detection updates are included in the Observability module for Kibana 8.13.3?",
"What changes occurred in the SharedUX markdown component in Kibana 8.13.3?",
"How does the Sharing feature handle descriptions differently in Kibana 8.13.3?"
],
"inference": {
"inference_id": "kibana-elser2",
"model_settings": {
"task_type": "sparse_embedding"
},
"chunks": [
{
"text": "What are the main components updated in Kibana 8.13.3?",
"embeddings": {
"0": 0.06638469,
"3": 0.4976597
}
}
]
}
}
}
]
}
@@ -0,0 +1,42 @@
{
"dynamic": "strict",
"properties": {
"content_title": {
"type": "text"
},
"content_body": {
"type": "semantic_text",
"inference_id": "kibana-elser2"
},
"product_name": {
"type": "keyword"
},
"root_type": {
"type": "keyword"
},
"slug": {
"type": "keyword"
},
"url": {
"type": "keyword"
},
"version": {
"type": "version"
},
"ai_subtitle": {
"type": "semantic_text",
"inference_id": "kibana-elser2"
},
"ai_summary": {
"type": "semantic_text",
"inference_id": "kibana-elser2"
},
"ai_questions_answered": {
"type": "semantic_text",
"inference_id": "kibana-elser2"
},
"ai_tags": {
"type": "keyword"
}
}
}