Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds validations for Saved Object types when calling create or bulkCreate. #118969

Merged
merged 15 commits into from
Jan 5, 2022

Conversation

lukeelmers
Copy link
Member

@lukeelmers lukeelmers commented Nov 17, 2021

Closes #104088

Summary

This adds a schemas option to SavedObjectType that will allow consumers to provide a validation schema for the attributes of their saved object type. This schema is provided per-version similar to migration functions.

coreSetup.savedObjects.registerType({
  name: 'my-type',
  hidden: false,
  namespaceType: 'agnostic',
  mappings: {
    properties: {
      a: { type: 'integer' },
      b: { type: 'text' },
      c: { type: 'boolean' },
    },
  },
  schemas: {
    '8.0.0': schema.object({
      a: schema.number(),
      b: schema.string(),
    }),
    '8.1.0': schema.object({
      a: schema.number(),
      b: schema.string(),
      c: schema.maybe(schema.boolean()),
    })
  }
});

This is not a full implementation of SO type validation; there are a few caveats:

  • Validation is only enforced on create and bulkCreate ATM, because for update operations users are not guaranteed to provide all required fields, which makes strict validation much more difficult. More discussion on this in the original issue.
  • When a new object is added via create and bulkCreate, SO repository does an on-the-fly migration on the object as a convenience. Since objects could be coming from any version, we decided to only run validations after the migrations are complete in this initial implementation.
    • This means that even though consumers can provide schemas for multiple Kibana versions, we will not yet be validating objects against any of those schemas other than the one for the current kibanaVersion.

A more complete solution would be to run validations from within the migration algorithm itself so that each document could be validated before it is passed to the migration function for the current version... But this is something that would be much longer term as it would change the current behavior of migrations which are permissive and can selectively ignore invalid data: #104088 (comment)

Plugin API Changes

Saved Objects now support optional validation schemas using @kbn/config-schema.

When a validation schema is provided during type registration, it will be used to validate subsequent calls to the SavedObjectClient's create and bulkCreate APIs. Validation schemas are not currently enforced on update operations.

We recommend adding a new validation schema any time attributes are added, removed, or changed in your Saved Object type, as they help protect data integrity by preventing users from inadvertently importing malformed objects.

Usage example:

import { SavedObjectsValidationMap } from 'src/core/server';

const validationMap: SavedObjectsValidationMap = {
  '8.0.0': schema.object({
    foo: schema.string(),
  }),
  '8.1.0': schema.object({
    foo: schema.string({
      minLength: 2,
      validate(value) {
        if (!/^[a-z]+$/.test(value)) {
          return 'must be lowercase letters only';
        }
      }
    }),
  }),
};

coreSetup.savedObjects.registerType({
  name: 'my-type',
  hidden: false,
  namespaceType: 'agnostic',
  mappings: {
    properties: {
      foo: { type: 'text' },
    },
  },
  migrations: {
    '8.1.0': (doc) => ({ ...doc, attributes: { ...doc.attributes, foo: doc.attributes.foo.toLowerCase() } }),
  },
  schemas: validationMap,
});

@lukeelmers lukeelmers added release_note:plugin_api_changes Contains a Plugin API changes section for the breaking plugin API changes section. Feature:Saved Objects backport:skip This commit does not require backporting v8.1.0 labels Nov 17, 2021
@lukeelmers lukeelmers self-assigned this Nov 17, 2021
Copy link
Member Author

@lukeelmers lukeelmers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self-review

Comment on lines 59 to 66
* '2.1.0': (data) => {
* if (typeof data.bar !== 'string') {
* throw new Error(`[bar]: expected value of type [string] but got [${typeof data.bar}]`);
* }
* if (typeof data.foo !== 'string' && typeof data.foo !== 'boolean') {
* throw new Error(`[foo]: expected value of type [string,boolean] but got [${typeof data.foo}]`);
* }
* }
Copy link
Member Author

@lukeelmers lukeelmers Nov 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it would be a good idea to allow for custom validation functions, similar to what we do for http routes, because it feels like an inevitability that someone would eventually ask to do something that can't be handled by @kbn/config-schema.

However, implementing the custom functions just raised more questions that I'd like to get others' thoughts on:

  1. IMO a validation function isn't the right place for folks to actually change the contents of the objects, so for now I structured the validation functions differently from how they are set up for http routes... I didn't include a factory for returning ok(value) or badRequest(). It's just a function that exists to throw errors, and otherwise doesn't return. It doesn't let you modify or do anything with the data.
    • Is it worth trying to follow the same factory pattern for consistency and keeping safeguards to prevent mutation, or do we prefer this simpler approach?
  2. Do we think users need access to anything other than the attributes? Currently the SO repository just passes attributes in. But there may be situations for custom functions where it is useful to have more info about the object.
    • A more future-proof alternative might be providing ({ attributes }) to the validation functions, so that we can add more later if it proves to be necessary, WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have an opinion on (1) yet.

(2) I think we should rather pass in the entire saved object, would there be any downside? Might not be a strong use case, but theoretically you could want to do validation on the references e.g. a dashboard can't have references to a config saved object type or on the updated_at field, e.g. can't set a date in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth trying to follow the same factory pattern for consistency and keeping safeguards to prevent mutation, or do we prefer this simpler approach

Ihmo let's KISS for now, and just document that custom validation function should not be used to mutate the data. If at some point we discover that API consumers are doing, or planning to do, so, we can then put real safeguards.

(2) I think we should rather pass in the entire saved object, would there be any downside?

I agree that it would probably be helpful to allow owners to validate more than the attributes, BUT this means that custom validation functions would be more powerful than schema validation, which is an idea I hate to be honest, as ideally config-schema validation should always be favored unless there is a very specific reason to use custom validation instead. This is what we did for route validation for instance.

So if we think that it make sense to validate more than the attributes (which I think I agree on), we should find a way to do so also when using the by-schema validation? wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I agree. I think the kind of validation I describe is related to the business domain and should ideally live in a layer one higher than saved objects, e.g. a service exposed to other plugins or in the http controller. In such a layer you could do async validation like fetching other saved objects before accepting a create.

So an alternative could be to make sure we have schema validation in core for "root properties" like references and updated_at and then keep only validation on attributes. And then if someone needs more powerful validation the answer is "build your own http API" like we anyway already recommend.

I was thinking when all saved objects are hidden: true saved object validation might become obsolete, but we would always need import/export validation and this would skip any business logic in other layers. I still feel a bit uneasy about import, you wouldn't always apply the same validation as on a create since the import includes other related objects, and it becomes quite messy to enforce the same business logic on an HTTP API as on a batch of imported objects. One potential solution could be to "sign" exports with e.g. a hash of it's contents so that we know it wasn't tampered with and can safely import it, but it also removes the usefulness of this API to some extent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So an alternative could be to make sure we have schema validation in core for "root properties" like references and updated_at and then keep only validation on attributes. And then if someone needs more powerful validation the answer is "build your own http API" like we anyway already recommend.

That would be aligned with our recommendations, so I'd say it make sense.

I was thinking when all saved objects are hidden: true saved object validation might become obsolete, but we would always need import/export validation and this would skip any business logic in other layers

When all types will have their own validation layer directly implemented in their services / endpoints, I think that import/export would be the only place remaining owned by core where we'll want to perform validation yes. Once this is the case, I imagine we could still allow owners to register a specific validation function / hook for this with their types, to be used specifically when validating during import or export?

One potential solution could be to "sign" exports with e.g. a hash of it's contents so that we know it wasn't tampered with and can safely import it, but it also removes the usefulness of this API to some extent.

I think quite a few consumers are currently tampering or manually creating export files, so I'm not sure if we could introduce such thing without it being a breaking change. Plus, given that I think we'll want validation for import/export regardless, I'm not sure this would bring a significant plus value, as least for validation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ihmo let's KISS for now, and just document that custom validation function should not be used to mutate the data.

Sounds good, I hate the idea of someone abusing this as a hook to rewrite attributes, but I also think a wait-and-see approach is reasonable here.

So an alternative could be to make sure we have schema validation in core for "root properties" like references and updated_at and then keep only validation on attributes

This sounds like a good middle ground, but to keep us future proof if we change our minds, I'll still update the custom validation functions to take an object with a single attributes key.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and added a full schema validation for the "root properties" that gets applied as well, so we are now validating the whole object, not just attributes.

src/core/server/saved_objects/validation/validator.ts Outdated Show resolved Hide resolved

private validateFunction(data: A, validateFn: SavedObjectsValidationFunction) {
try {
validateFn({ ...data }); // shallow clone to prevent mutation
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in my other comment, maybe this should be future-proofed?

Suggested change
validateFn({ ...data }); // shallow clone to prevent mutation
validateFn({ attributes: { ...data } }); // shallow clone to prevent mutation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to future-proofing from the get go

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shallow clone does not really prevent mutation, e.g

const data = {
   foo: {
      bar: 12
   }
};

const shallow = { ...data };

shallow.foo.bar = 7;
data.foo.bar // <== 7

(and having attributes with nested properties is quite common)

So as I said on a previous comment, I think we should either KISS and do nothing for now, or do it properly via a proper deepCopy or another mutation-prevention pattern.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shallow clone does not really prevent mutation

Yeah I know, I was trying to hit a middle ground between the perf hit of a deep clone while still providing some protection against mutation. But point taken about nested properties being common.

For now I'll go with KISS as discussed in the thread above. If we spot anyone abusing this, we can come back and deep clone.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I know, I was trying to hit a middle ground between the perf hit of a deep clone while still providing some protection against mutation.

The problem with this approach that is implicit. validateFn({ attributes: Object.freeze(data) } }); will throw an error, so Kibana devs can see the mutations are explicitly forbidden.

@lukeelmers lukeelmers changed the title Adds validations for Saved Object types. Adds validations for Saved Object types when calling create or bulkCreate. Nov 18, 2021
@lukeelmers lukeelmers marked this pull request as ready for review November 19, 2021 00:49
@lukeelmers lukeelmers requested a review from a team as a code owner November 19, 2021 00:49
@lukeelmers lukeelmers added the Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc label Nov 19, 2021
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-core (Team:Core)

Copy link
Contributor

@TinaHeiligers TinaHeiligers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, it's looking good as an initial implementation. I still wish we could figure out how to add validation against updates though.


private validateFunction(data: A, validateFn: SavedObjectsValidationFunction) {
try {
validateFn({ ...data }); // shallow clone to prevent mutation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to future-proofing from the get go

src/core/server/saved_objects/validation/validator.ts Outdated Show resolved Hide resolved
/** Validate a migrated doc against the registered saved object type's schema. */
private validateObjectAttributes<T>(type: string, doc: SavedObjectSanitizedDoc<T>) {
const savedObjectType = this._registry.getType(type);
if (!savedObjectType?.schemas) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we log a warning if there isn't a validation schema declared? That way, we'll have some way of monitoring migration failures (or any other failures from malformed SO creation actions) over time and address the critical ones that cause issues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do that, we need to enable it in dev mode only, as it's a developer warning and not a production one.

It could be very verbose though, and I'm not sure it would replace the usual meta issue pinging all the type owners anyway?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be very verbose though, and I'm not sure it would replace the usual meta issue pinging all the type owners anyway?

Verbosity is what we want in debug mode but you're right, we don't know if it will help in the long run or be more "noise" than we can reasonably sift through.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are we going to track the SO without validation? How soon do we want to get other teams to cover SO with validation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering that with the latest updates to this PR we are now performing validation on the whole SO regardless of whether any validations have been registered for the attributes, IMO it is okay to do the usual pinging of type owners so that they can decide how to prioritize adding their own validations for the actual attributes. I think for some teams this will likely be a much larger concern than for others, so I don't think we necessarily need to put a timetable on it yet...

src/core/server/saved_objects/validation/validator.ts Outdated Show resolved Hide resolved
src/core/server/saved_objects/validation/validator.ts Outdated Show resolved Hide resolved
src/core/server/saved_objects/validation/validator.ts Outdated Show resolved Hide resolved
/** Validate a migrated doc against the registered saved object type's schema. */
private validateObjectAttributes<T>(type: string, doc: SavedObjectSanitizedDoc<T>) {
const savedObjectType = this._registry.getType(type);
if (!savedObjectType?.schemas) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do that, we need to enable it in dev mode only, as it's a developer warning and not a production one.

It could be very verbose though, and I'm not sure it would replace the usual meta issue pinging all the type owners anyway?

Comment on lines 2235 to 2238
const validator = new SavedObjectsTypeValidator({
type,
validationMap: savedObjectType.schemas,
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT/optional: do we really need to instantiate this on every call to validateObjectAttributes? can't we instantiate a validator per type when the repository is created?

src/core/server/saved_objects/types.ts Outdated Show resolved Hide resolved
Comment on lines 59 to 66
* '2.1.0': (data) => {
* if (typeof data.bar !== 'string') {
* throw new Error(`[bar]: expected value of type [string] but got [${typeof data.bar}]`);
* }
* if (typeof data.foo !== 'string' && typeof data.foo !== 'boolean') {
* throw new Error(`[foo]: expected value of type [string,boolean] but got [${typeof data.foo}]`);
* }
* }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth trying to follow the same factory pattern for consistency and keeping safeguards to prevent mutation, or do we prefer this simpler approach

Ihmo let's KISS for now, and just document that custom validation function should not be used to mutate the data. If at some point we discover that API consumers are doing, or planning to do, so, we can then put real safeguards.

(2) I think we should rather pass in the entire saved object, would there be any downside?

I agree that it would probably be helpful to allow owners to validate more than the attributes, BUT this means that custom validation functions would be more powerful than schema validation, which is an idea I hate to be honest, as ideally config-schema validation should always be favored unless there is a very specific reason to use custom validation instead. This is what we did for route validation for instance.

So if we think that it make sense to validate more than the attributes (which I think I agree on), we should find a way to do so also when using the by-schema validation? wdyt?

src/core/server/saved_objects/validation/validator.ts Outdated Show resolved Hide resolved

private validateFunction(data: A, validateFn: SavedObjectsValidationFunction) {
try {
validateFn({ ...data }); // shallow clone to prevent mutation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shallow clone does not really prevent mutation, e.g

const data = {
   foo: {
      bar: 12
   }
};

const shallow = { ...data };

shallow.foo.bar = 7;
data.foo.bar // <== 7

(and having attributes with nested properties is quite common)

So as I said on a previous comment, I think we should either KISS and do nothing for now, or do it properly via a proper deepCopy or another mutation-prevention pattern.

src/core/server/saved_objects/validation/validator.ts Outdated Show resolved Hide resolved
Copy link
Member Author

@lukeelmers lukeelmers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay - the last week has been crazy, but was finally able to pick this back up.

src/core/server/saved_objects/validation/validator.ts Outdated Show resolved Hide resolved

private validateFunction(data: A, validateFn: SavedObjectsValidationFunction) {
try {
validateFn({ ...data }); // shallow clone to prevent mutation
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shallow clone does not really prevent mutation

Yeah I know, I was trying to hit a middle ground between the perf hit of a deep clone while still providing some protection against mutation. But point taken about nested properties being common.

For now I'll go with KISS as discussed in the thread above. If we spot anyone abusing this, we can come back and deep clone.

Comment on lines 59 to 66
* '2.1.0': (data) => {
* if (typeof data.bar !== 'string') {
* throw new Error(`[bar]: expected value of type [string] but got [${typeof data.bar}]`);
* }
* if (typeof data.foo !== 'string' && typeof data.foo !== 'boolean') {
* throw new Error(`[foo]: expected value of type [string,boolean] but got [${typeof data.foo}]`);
* }
* }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ihmo let's KISS for now, and just document that custom validation function should not be used to mutate the data.

Sounds good, I hate the idea of someone abusing this as a hook to rewrite attributes, but I also think a wait-and-see approach is reasonable here.

So an alternative could be to make sure we have schema validation in core for "root properties" like references and updated_at and then keep only validation on attributes

This sounds like a good middle ground, but to keep us future proof if we change our minds, I'll still update the custom validation functions to take an object with a single attributes key.

src/core/server/saved_objects/validation/validator.ts Outdated Show resolved Hide resolved
src/core/server/saved_objects/types.ts Outdated Show resolved Hide resolved
@lukeelmers
Copy link
Member Author

@elasticmachine merge upstream

@lukeelmers
Copy link
Member Author

@elasticmachine merge upstream

@lukeelmers
Copy link
Member Author

@elasticmachine merge upstream

Copy link
Contributor

@pgayvallet pgayvallet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation looking good. A bunch of comments and NITs, but overall LGTM

src/core/server/server.api.md Outdated Show resolved Hide resolved
src/core/server/saved_objects/validation/schema.ts Outdated Show resolved Hide resolved
src/core/server/saved_objects/validation/schema.ts Outdated Show resolved Hide resolved
src/core/server/saved_objects/validation/types.ts Outdated Show resolved Hide resolved
Comment on lines +46 to +47
const validationSchema = createSavedObjectSanitizedDocSchema(validationRule);
validationSchema.validate(data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT/optional: are we going to use a SavedObjectsTypeValidator to validate multiple objects? if so, we could eventually keep the instanciated per-version schemas to avoid reinstantiating them for each validation? Not sure if the perf increase is very significant though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think at least caching the validator per-type would be useful. Will do a small follow-up with this

@lukeelmers
Copy link
Member Author

@elasticmachine merge upstream

@lukeelmers lukeelmers enabled auto-merge (squash) January 5, 2022 03:51
@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
core 1000 1001 +1
Unknown metric groups

API count

id before after diff
core 2329 2333 +4

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @lukeelmers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting Feature:Saved Objects release_note:plugin_api_changes Contains a Plugin API changes section for the breaking plugin API changes section. Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc v8.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Saved Objects] introduce per-type validation schemas.
8 participants