Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC][skip-ci] Prevent plugins from blocking Kibana startup #45796

Merged
merged 17 commits into from
Dec 18, 2019
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
225 changes: 225 additions & 0 deletions rfcs/text/0006_lifecycle_unblocked.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,225 @@
- Start Date: 2019-09-11
- RFC PR: (leave this empty)
- Kibana Issue: (leave this empty)

# Summary

Prevent plugin lifecycle methods from blocking Kibana startup by making the
following changes:
1. Synchronous lifecycle methods
2. Synchronous context provider functions
3. Core should not expose API's as observables

# Motivation
Plugin lifecycle methods and context provider functions are async
(promise-returning) functions. Core runs these functions in series and waits
for each plugin's lifecycle function to resolve before calling the next. This
allows plugins to depend on the API's returned from other plugins.

With the current design, a single lifecycle method or context provider that
blocks will block all of Kibana from starting up.

We should make it impossible for a single plugin lifecycle function to
stall all of kibana.

# Detailed design

### 1. Synchronous lifecycle methods
Lifecycle methods are synchronous functions, they can perform asynchronous
operations but Core doesn't wait for these to complete. This guarantees that
no plugin lifecycle function can block other plugins or core from starting up.

### 2. Synchronous Context Provider functions
Making context provider functions synchronous guarantees that a context
handler will never be blocked by registered context providers. They can expose
asynchronous API's which could potentially have blocking behaviour.

```ts
export type IContextProvider<
TContext extends Record<string, any>,
TContextName extends keyof TContext,
TProviderParameters extends any[] = []
> = (
context: Partial<TContext>,
...rest: TProviderParameters
) => TContext[TContextName];
```

### 3. Core should not expose API's as observables
All Core API's should be reactive, when internal state changes their behaviour
should change accordingly. But, exposing these internal state changes as part
of the API contract leaks internal implementation details consumers can't do
anything useful with and don't care about.

For example: Core currently exposes `core.elasticsearch.adminClient$`, an
Observable which emits a pre-configured elasticsearch client every time there's
a configuration change. This includes changes such as the elasticsearch
cluster `hosts` that alter the behaviour of the elasticsearch client. As a
plugin author who wants to make search requests against elasticsearch I
shouldn't have to care about, react to, or keep track of, how many times the
underlying configuration has changed. I want to use the `callAsInternalUser`
method and I expect Core to use the most up to date configuration to send this
request to the correct `hosts`.

This does not mean we should remove all observables from Core's API's. When an
API consumer is interested in the *state changes itself* it absolutely makes
sense to expose this as an Observable. Good examples of this is exposing
plugin config as this is state that changes over time to which a plugin should
directly react to.

This is important in the context of synchronous lifecycle methods and context
handlers since exposing convenient API's become very ugly:

*(3.1): exposing Observable-based API's through the route handler context:*
```ts
// Before: Using an asynchronous context provider
coreSetup.http.registerRouteHandlerContext(coreId, 'core', async (context, req) => {
const adminClient = await coreSetup.elasticsearch.adminClient$.pipe(take(1)).toPromise();
const dataClient = await coreSetup.elasticsearch.dataClient$.pipe(take(1)).toPromise();
return {
elasticsearch: {
adminClient: adminClient.asScoped(req),
dataClient: dataClient.asScoped(req),
},
};
});

// After: Using a synchronous context provider
coreSetup.http.registerRouteHandlerContext(coreId, 'core', async (context, req) => {
return {
elasticsearch: {
// (3.1.1) We can expose a convenient API by doing a lot of work
adminClient: () => {
callAsInternalUser: async (...args) => {
adminClient = await coreSetup.elasticsearch.adminClient$.pipe(take(1)).toPromise();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this proposal, every call of callAsInternalUser can have a different version of adminClient. I suspect it contradicts RFC handler design https://github.com/elastic/kibana/pull/36509/files#diff-8e0211fd03975bea94cb15762f6c7eaaR92

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These examples show what the implementation would look like if we implemented proposal (1) and (2) but not (3). If we do not implement (3), I agree that we would contradict the handler design.

If we implement (3) the context handler will look like example (4.3) which I don't think contradicts the handler design.

I didn't put this in the RFC itself, but I think the handler pattern tried to solve two problems:
A. Improve the ergonomics of consuming API's that are strongly tied to short-lived state like an incoming request. This means not having to call .asScoped(req) inside a request handler the whole time.
B. Improve the ergonomics of consuming Observable-based API's. This means not having to do await someApi$.pipe(first()).toPromise()

I believe proposal (3) solves (B) across all of Core's API's not just API's that are relevant in handlers. I think this is a better alternative to introducing context everywhere even if there isn't some kind of state to "fold" into the context. IMO the contexts should only be used to solve (A).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If context doesn't try to solve (B) because we solved it with (3) then functionality being "fixed" inside a context as described by the Context pattern becomes less important.

return adminClient.asScoped(req).callAsinternalUser(args);
},
callAsCurrentUser: async (...args) => {
adminClient = await coreSetup.elasticsearch.adminClient$.pipe(take(1)).toPromise();
return adminClient.asScoped(req).callAsCurrentUser(args);
}
},
// (3.1.2) Or a lazy approach which perpetuates the problem to consumers:
dataClient: async () => {
const dataClient = await coreSetup.elasticsearch.dataClient$.pipe(take(1)).toPromise();
dataClient.asScoped(req),
},
},
};
});
```

### 4. Complete example code
*(4.1) Doing async operations in a plugin's setup lifecycle*
```ts
export class Plugin {
public setup(core: CoreSetup) {
// Async setup is possible and any operations involving asynchronous API's
// will still block until these API's are ready, (savedObjects find only
// resolves once the elasticsearch client has established a connection to
// the cluster). The difference is that these details are now internal to
// the API.
(async () => {
const docs = await context.core.savedObjects.client.find({...});
...
await context.core.savedObjects.client.update(...);
})();
}
}
```

*(4.2) Exposing an API from a plugin's setup lifecycle*
```ts
export class Plugin {
public async setup(core: CoreSetup) {
return {
ping: async () => {
// async & await isn't necessary here, but makes example a bit clearer.
// Note that the elasticsearch client no longer exposes an adminClient$
// observable.
const result = await core.elasticsearch.adminClient.callAsInternalUser('ping', ...);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logging service and Elasticsearch services have a different strategy for handling config updates.
Elasticsearch service re-creates cluster client, so dependent plugins know for sure that something has changed and they should use a newer version of the API.
Logging service provides non-reactive API. it handles all config updates internally and dependent plugins know nothing about internal changes.
@azasypkin I believe you mentioned that different design decisions were done deliberately. Does the proposed approach eliminate all the benefits of having explicit elasticsearch client updates?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't think of it, but the logging service is a great example of an API that hides the reactivity 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@azasypkin I believe you mentioned that different design decisions were done deliberately.

Yeah, it was done on purpose, logging is something that is supposed to be used all over the place and adding Rx to the mix could turn into nightmare for the developers. I actually wanted to have the same for ES client, but we didn't get quorum on that in the past.

return result;
}
};
}
}
```

*(4.3) Exposing an observable free Elasticsearch API from the route context*
```ts
coreSetup.http.registerRouteHandlerContext(coreId, 'core', async (context, req) => {
return {
elasticsearch: {
adminClient: coreSetup.elasticsearch.adminClient.asScoped(req),
dataClient: coreSetup.elasticsearch.adminClient.asScoped(req),
},
};
});
```

# Drawbacks
Not being able to block on a lifecycle method also means plugins can no longer
be certain that all setup is complete before they reach the start lifecycle.
Plugins will have to manage such state internally. Core will still expose
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes a plugin needs to get async API result. Does it mean that core API has to switch to observables? For example, we have a case for the security plugin
https://github.com/restrry/kibana/blob/48d34abd99ee17fa610e8e5d3ab8dd05e5e4d1d3/src/core/server/http/http_server.ts#L314-L332

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way to implement this RFC for the createCookieSessionStorageFactory API could look something like this:

Core Setup API becomes synchronous:

export inter HttpServerSetup {
  createCookieSessionStorageFactory: <T>(
    cookieOptions: SessionStorageCookieOptions<T>
  ) => SessionStorageFactory<T>;

Using the SessionStorageFactory becomes async:

export interface SessionStorageFactory<T> {
  asScoped: (request: KibanaRequest) => Promise<SessionStorage<T>>;
}

So the "setup" API is completely synchronous, all registration happens synchronously, but using the API becomes async.

So this line in the security plugin will become

const sessionStorage = await this.options.sessionStorageFactory.asScoped(request);

special API's that are able block the setup lifecycle such as registering
Saved Object migrations, but this will be limited to operations where the risk
of blocking all of kibana starting up is limited.

# Alternatives
## 1. Introduce a lifecycle/context provider timeout
Lifecycle methods and context providers would timeout after X seconds and any
API's they expose would not be available if the timeout had been reached.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have clear boundaries for every stage. We could monitor execution time and log that a plugin initialization lasts too long (kinda similar to the next approach). After that, a user can decide to disable slow plugin or not.


Drawbacks:
1. For lifecycle methods: there would be no way to recover from a timeout,
once a timeout had been reached the API will remain unavailable.

Context providers have the benefit of being re-created for each handler
call, so a single timeout would not permanently disable the API.

3. Plugins have less control over their behaviour. When an upstream server
becomes unavailable, a plugin might prefer to keep retrying the request
indefinitely or only timeout after more than X seconds. It also isn't able
to expose detailed error information to downstream consumers such as
specifying which host or service is unavailable.

5. (minor) Introduces an additional failure condition that needs to be handled.
Consumers should handle the API not being available in setup, as well as,
error responses from the API itself. Since remote hosts like Elasticsearch
could go down even after a successful setup, this effectively means API
consumers have to handle the same error condition in two places.

## 2. Treat anything that blocks Kibana from starting up as a bug
Effectively do what we've always been doing. We can improve on the status quo
by logging detailed diagnostic info on any plugins that appear to be blocking
startup.

Drawbacks:
1. Since plugins load serially, even if they don't block startup, all the
delays could add up and potentially make startup very slow.
2. This opens up the potential for a third-party plugin to effectively "break"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if 3rd party plugin stucks on init step, it's hard to diagnose the problem

setup(core, deps) {
  return {
    async getData() {
       return await deps.pluginA.search('data'); // Timeout error because pluginA is not ready or request timeout?
    }
  }
}

kibana which creates a bad user experience.

# Adoption strategy (WIP)

Making context provider functions synchronous (2) and not exposing core API's
as observables (3) would require the least amount of change from plugins since
adoption on these API's are still fairly low.

Having synchronous lifecycle methods (1) would have a bigger impact on plugins
since most NP shims were built with asynchronous methods in mind.

# How we teach this (TBD)

What names and terminology work best for these concepts and why? How is this
idea best presented? As a continuation of existing Kibana patterns?

Would the acceptance of this proposal mean the Kibana documentation must be
re-organized or altered? Does it change how Kibana is taught to new developers
at any level?

How should this feature be taught to existing Kibana developers?

# Unresolved questions
Are the drawbacks worth the benefits or can we live with Kibana potentially
being blocked for the sake of convenient asynchronous lifecycle stages?