
Kibana authentication troubleshooting guide #83914

Open
azasypkin opened this issue Nov 20, 2020 · 5 comments
Labels
discuss docs Feature:Security/Authentication Platform Security - Authentication Team:Security Team focused on: Auth, Users, Roles, Spaces, Audit Logging, and more!

Comments

@azasypkin
Member

azasypkin commented Nov 20, 2020

Kibana's authentication subsystem includes quite a bit of functionality these days, and it's not always easy to troubleshoot problems or misconfigurations in this area. This issue is intended to gather the most common issues our users experience with our authentication layer, along with ideas on how we can help them troubleshoot these issues.

We can tackle this from two different angles:

  • Provide a proper troubleshooting guide, similar to the Common SAML issues guide the Elasticsearch team created.

  • Try to detect possible configuration issues in the code and log appropriate warnings.

Most frequent issues

Inconsistent (autogenerated) xpack.security.encryptionKey in Kibana HA setup

This is by far the most common source of confusion. If one instance of Kibana cannot decrypt a cookie that was created by another instance, the cookie will be cleared.

What we already do:

  • We log a warning on Kibana startup if the key is autogenerated: Generating a random key for xpack.security.encryptionKey. To prevent sessions from being invalidated on restart, please set xpack.security.encryptionKey in the kibana.yml or use the bin/kibana-encryption-keys command.
  • Core logs the following message when it cannot decrypt a cookie: [debug][server][Kibana][cookie-session-storage][http] Error: Unauthorized
  • We mention in our docs that the key should be the same on every instance in an HA setup

What we can do:

  • Additionally explain this issue in the troubleshooting guide
  • Core can log a more helpful message if it cannot decrypt a cookie
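
As a rough illustration of the check an admin could run across instances, here is a minimal sketch. It assumes you have already collected the xpack.security.encryptionKey value from each instance's kibana.yml; the InstanceKeyConfig shape and the findEncryptionKeyProblems helper are hypothetical and not part of Kibana.

```typescript
// Hypothetical shape: one entry per Kibana instance, carrying the value of
// xpack.security.encryptionKey from that instance's kibana.yml (undefined if unset).
interface InstanceKeyConfig {
  instance: string;
  encryptionKey?: string;
}

// Returns human-readable problems; an empty array means the keys look consistent.
function findEncryptionKeyProblems(configs: InstanceKeyConfig[]): string[] {
  const problems: string[] = [];
  const distinctKeys = new Set<string>();

  for (const config of configs) {
    if (config.encryptionKey === undefined) {
      // An unset key is autogenerated on startup, so it cannot match any other instance.
      problems.push(`${config.instance}: encryptionKey is not set and will be autogenerated`);
    } else {
      distinctKeys.add(config.encryptionKey);
    }
  }

  if (distinctKeys.size > 1) {
    problems.push(`instances use ${distinctKeys.size} different encryptionKey values`);
  }
  return problems;
}

// Example: kibana-2 has no explicit key, so cookies it creates are unreadable elsewhere.
const keyProblems = findEncryptionKeyProblems([
  { instance: "kibana-1", encryptionKey: "a-shared-key-of-at-least-32-characters" },
  { instance: "kibana-2" },
]);
console.log(keyProblems);
```

An empty result only means the configured values match; it says nothing about whether the key itself is strong enough.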

Inconsistent session and authentication settings in Kibana HA setup

Every instance of Kibana schedules a regular session cleanup job to remove sessions that weren't explicitly invalidated. There are a number of criteria we use to determine that a session can be safely removed, but the most notable are:

  • If Kibana is configured with a non-0/null session lifespan or idle timeout, it will remove all existing sessions that were created without one.
  • Kibana will also remove all sessions that were created using providers that aren't configured anymore (based on the provider type and provider name tuple).

That means that if multiple Kibana instances that rely on the same .kibana-x index have different session or provider settings, a cleanup job scheduled by one Kibana instance may invalidate sessions created by another instance. By default, a cleanup job runs on startup and every hour after that, so users may experience sporadic logouts that are hard to debug.

What we already do:

  • When a cleanup job removes sessions, it logs a message similar to: Cleaned up 5 invalid or expired session values. We can potentially correlate this message with user logouts, but it's too vague and may not indicate any problem.

What we can do:

  • Additionally explain this issue in the troubleshooting guide
  • We can make cleanup a bit more complex and slower so that it specifically logs sessions that were removed because of a config mismatch. I'd wait until it becomes a problem before doing anything here.
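
To make the two criteria above concrete, here is a simplified model of which sessions one instance's cleanup job would remove. It is a sketch only: the StoredSession and InstanceSessionConfig shapes and the sessionsRemovedByCleanup function are hypothetical, and the model deliberately ignores ordinary expiration.

```typescript
// Hypothetical, simplified model of a stored session and of one instance's settings.
interface ProviderRef {
  type: string;
  name: string;
}

interface StoredSession {
  id: string;
  provider: ProviderRef;
  idleTimeoutExpiration: number | null; // null: created while idle timeout was disabled
  lifespanExpiration: number | null; // null: created while lifespan was disabled
}

interface InstanceSessionConfig {
  idleTimeoutMs: number | null; // 0 or null means disabled
  lifespanMs: number | null; // 0 or null means disabled
  providers: ProviderRef[];
}

// Returns the sessions this instance would treat as invalid because of config mismatch.
function sessionsRemovedByCleanup(
  sessions: StoredSession[],
  config: InstanceSessionConfig
): StoredSession[] {
  const idleEnabled = config.idleTimeoutMs !== null && config.idleTimeoutMs !== 0;
  const lifespanEnabled = config.lifespanMs !== null && config.lifespanMs !== 0;

  return sessions.filter((session) => {
    const providerConfigured = config.providers.some(
      (p) => p.type === session.provider.type && p.name === session.provider.name
    );
    if (!providerConfigured) return true; // provider is unknown to this instance
    if (idleEnabled && session.idleTimeoutExpiration === null) return true;
    if (lifespanEnabled && session.lifespanExpiration === null) return true;
    return false;
  });
}

// Example: instance B enables an idle timeout that instance A did not have.
const sessionFromInstanceA: StoredSession = {
  id: "s1",
  provider: { type: "saml", name: "saml1" },
  idleTimeoutExpiration: null,
  lifespanExpiration: null,
};
const removedByInstanceB = sessionsRemovedByCleanup([sessionFromInstanceA], {
  idleTimeoutMs: 60 * 60 * 1000,
  lifespanMs: null,
  providers: [{ type: "saml", name: "saml1" }],
});
console.log(removedByInstanceB.map((s) => s.id));
```

In this model the session created by instance A is removed by instance B even though the user did nothing wrong, which matches the sporadic logouts described above.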

Multi-tenancy using the same host name, but different ports

Per RFC 6265, cookies for a given host are shared across all the ports on that host, even though the usual "same-origin policy" used by web browsers isolates content retrieved via different ports. That means that if you have multiple Kibana tenants (Kibana instances that use different .kibana-x indices) that use the same host name but different ports, the session cookies will be shared between them.

This will lead to sporadic logouts if both tenants are opened in the same browsing context (same browser window): if one tenant receives a session cookie that references a session that lives in another tenant, the cookie will be treated as invalid and Kibana will clear it.

The most correct solution is to never host different applications on the same host name, because cookies leak between them. If that's not possible, the workaround is to configure a different session cookie name for every tenant with the xpack.security.cookieName setting.

What we already do:

  • If tenants don't share an encryption key, this case will be indistinguishable from "Inconsistent (autogenerated) xpack.security.encryptionKey in Kibana HA setup". Otherwise, we detect this case and log the following debug message: Session value is not available in the index, session cookie will be invalidated.

What we can do:

  • Additionally explain this issue in the troubleshooting guide
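
The port-blind cookie scoping described in this section can be checked with a small sketch like the one below. The Tenant shape and findCookieCollisions are hypothetical, the "sid" cookie name in the example is an assumed value for tenants that don't override xpack.security.cookieName, and the model ignores the cookie Path and Domain attributes.

```typescript
// Hypothetical description of a Kibana tenant as seen by the browser.
interface Tenant {
  label: string;
  hostname: string;
  port: number;
  cookieName: string; // value of xpack.security.cookieName for this tenant
}

// Groups tenants whose session cookies would overwrite each other in a browser.
// The port is intentionally left out of the key: cookies are not isolated by port.
function findCookieCollisions(tenants: Tenant[]): string[][] {
  const groups = new Map<string, string[]>();
  for (const tenant of tenants) {
    const key = `${tenant.hostname.toLowerCase()}|${tenant.cookieName}`;
    const labels = groups.get(key) ?? [];
    labels.push(tenant.label);
    groups.set(key, labels);
  }
  return Array.from(groups.values()).filter((labels) => labels.length > 1);
}

// Example: two tenants on the same host share a cookie name, a third one does not.
const collisions = findCookieCollisions([
  { label: "tenant-a", hostname: "kibana.example.com", port: 5601, cookieName: "sid" },
  { label: "tenant-b", hostname: "kibana.example.com", port: 5602, cookieName: "sid" },
  { label: "tenant-c", hostname: "kibana.example.com", port: 5603, cookieName: "sid-tenant-c" },
]);
console.log(collisions);
```

Giving every tenant in a reported group its own cookie name removes the collision without changing host names.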

Multiple authentication providers without Login Selector

It's still possible to use multiple authentication providers even if Login Selector is disabled. The support is very limited though, and we generally discourage our users from using that setup. The main reason why we still support this is BWC. There is nothing we can do here, so I'll just outline a few notable things about this setup:

  • When a user opens Kibana, only the provider with the lowest order will try to authenticate them
  • If a provider that uses the Kibana native login form (e.g. basic or token right now) is configured in addition to other providers (e.g. saml or oidc), but its order is higher than that of another provider, it's still possible to use it to log in even though it's not used automatically. To do this, one should log out and go to the /login route directly.
  • In this setup one can also log in to Kibana using multiple SAML/OIDC providers even if their order isn't the lowest one, but only through IdP or OP initiated login. The caveat here is that when the user logs out from Kibana, they may be automatically logged in again using the provider with the lowest order instead.

What we can do:

  • Discourage, discourage, discourage and eventually deprecate
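
The "lowest order wins" rule for this setup can be sketched as follows. The ConfiguredProvider shape and pickAutomaticProvider are hypothetical names used only for illustration; they are not Kibana's internal API.

```typescript
// Hypothetical view of one configured authentication provider.
interface ConfiguredProvider {
  type: string;
  name: string;
  order: number;
}

// Without Login Selector, only the provider with the lowest order tries to
// authenticate a user who simply opens Kibana.
function pickAutomaticProvider(
  providers: ConfiguredProvider[]
): ConfiguredProvider | undefined {
  return providers.reduce<ConfiguredProvider | undefined>(
    (lowest, provider) =>
      lowest === undefined || provider.order < lowest.order ? provider : lowest,
    undefined
  );
}

// Example: saml1 is used automatically; basic1 is only reachable through /login.
const automaticProvider = pickAutomaticProvider([
  { type: "basic", name: "basic1", order: 1 },
  { type: "saml", name: "saml1", order: 0 },
]);
console.log(automaticProvider?.name);
```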

Kibana session settings vs access/refresh token expiration

Many of the Kibana authentication providers use Elasticsearch access/refresh tokens under the hood: SAML, OpenID Connect, PKI, Kerberos and Token. These tokens have their own expiration settings that are separate from Kibana's own session expiration settings.

If Kibana's session idle timeout is higher than the expiration time of the underlying access token, Kibana will automatically refresh the access token once the user becomes active again. But if the admin disables the Kibana session idle timeout or lifespan, or sets it higher than 24 hours, and the user isn't active during this period, then the underlying refresh token expires and the access token cannot be refreshed anymore. Such a setup effectively limits Kibana session timeouts to 24 hours.

For example, if Kibana is configured to work with one of the token based authentication providers and the admin wants to disable the idle timeout, they would do something like this:

xpack.security.session.idleTimeout: 0

But in reality, because of the hard-coded 24-hour lifetime of the refresh token, the idle timeout will be approximately equal to only 24 hours.

It's even more problematic for PKI authentication: Elasticsearch doesn't provide a refresh token in this case at all, which effectively limits the idle timeout for the PKI authentication provider to the lifetime of the access token (max 1 hour).

What we already do:

  • We briefly mention this limitation in our docs and only for SAML and OIDC.

What we can do:

  • Additionally explain this issue in the troubleshooting guide
  • Mention this for every affected provider in the main documentation
  • Fine-tune the session config schema:
    • Don't allow disabling the provider-specific idle timeout for any token based authentication provider
    • Limit the max provider-specific idle timeout to 24 hours for SAML/OIDC/Kerberos/Token and to 1 hour for PKI, with a custom error message
    • Detect if either of the two conditions is violated through the global session settings and log a clear warning if so.
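
The limits discussed in this section can be summarized in a small sketch that computes the idle timeout a user would actually experience. The function names are hypothetical; the 24-hour and 1-hour figures are the ones quoted above, with 1 hour used as the upper bound for the PKI access token lifetime.

```typescript
type ProviderType = "basic" | "token" | "saml" | "oidc" | "kerberos" | "pki";

const HOUR_MS = 60 * 60 * 1000;

// Upper bound imposed by the underlying Elasticsearch tokens, or null if the
// provider does not rely on tokens.
function tokenImposedLimitMs(type: ProviderType): number | null {
  switch (type) {
    case "pki":
      return HOUR_MS; // no refresh token: bounded by the access token lifetime (max 1 hour)
    case "saml":
    case "oidc":
    case "kerberos":
    case "token":
      return 24 * HOUR_MS; // bounded by the refresh token lifetime
    default:
      return null;
  }
}

// configuredMs mirrors xpack.security.session.idleTimeout; 0 or null means disabled.
// Returns null when there is effectively no idle timeout.
function effectiveIdleTimeoutMs(
  type: ProviderType,
  configuredMs: number | null
): number | null {
  const limit = tokenImposedLimitMs(type);
  const configured = configuredMs === null || configuredMs === 0 ? null : configuredMs;
  if (limit === null) return configured;
  if (configured === null) return limit;
  return Math.min(configured, limit);
}

// Example: disabling the idle timeout for SAML still leaves roughly 24 hours.
const samlEffectiveHours = (effectiveIdleTimeoutMs("saml", 0) ?? 0) / HOUR_MS;
console.log(samlEffectiveHours);
```

A schema check along the lines proposed above could compare the configured value against tokenImposedLimitMs and warn when the configured value is disabled or larger.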

Misconfigured role mappings

It's more of an Elasticsearch issue, but it's usually Kibana where the user finally gets stuck, so we can try to help debug this.

What we already do:

  • We display the "You do not have permission to access the requested page" screen, which is already a good enough solution.

What we can do:

  • We can and will eventually disable login if the user doesn't have enough privileges to access anything in Kibana.
  • We might want to document the /internal/security/me endpoint in a troubleshooting guide so that admins can see exactly which roles were applied to a particular user.

Misconfigured refresh interval for the security-related indices

Security-related indices (and many other system indices) are very sensitive to refresh intervals higher than 1s, as most update operations are issued with a wait_for_refresh in order to handle concurrent edits safely.

Changing the default refresh intervals for the security-related indices is highly discouraged. Typical causes are match-all index templates that apply some common settings or mappings to all indices, or a user mistakenly setting a common refresh interval on ALL indices.

Note: This should happen less frequently once #134900 merges (target 8.4.0).

A misconfigured refresh interval can lead to significant delays and failures during request authentication. To make sure the security-related indices have proper refresh intervals, you can check the settings file in the Elastic support diagnostics bundle:

[Screenshot: misconfig]
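
As a rough sketch of that check, the helper below flags security-related indices whose refresh interval is above 1s. It assumes you have already extracted a map of index name to refresh_interval value from the settings file; the function names, the input shape, and the default index name prefixes are assumptions made for illustration.

```typescript
// Milliseconds per supported time unit.
const UNIT_MS: Record<string, number> = {
  ms: 1,
  s: 1000,
  m: 60 * 1000,
  h: 60 * 60 * 1000,
  d: 24 * 60 * 60 * 1000,
};

// Parses values such as "1s" or "30s"; "-1" means refresh is disabled.
// Returns null for values this sketch does not understand.
function parseRefreshIntervalMs(value: string): number | null {
  const trimmed = value.trim();
  if (trimmed === "-1") return Infinity;
  const match = /^(\d+)(ms|s|m|h|d)$/.exec(trimmed);
  if (match === null) return null;
  const factor = UNIT_MS[match[2] ?? ""];
  if (factor === undefined) return null;
  return Number(match[1]) * factor;
}

// settings maps index name to its refresh_interval (undefined: default is used).
// The default prefixes are assumptions; adjust them to the indices in your cluster.
function findSlowRefreshIndices(
  settings: Record<string, string | undefined>,
  prefixes: string[] = [".security", ".kibana_security_session"]
): string[] {
  return Object.keys(settings).filter((index) => {
    if (!prefixes.some((prefix) => index.startsWith(prefix))) return false;
    const raw = settings[index];
    if (raw === undefined) return false; // default interval, nothing to flag
    const intervalMs = parseRefreshIntervalMs(raw);
    return intervalMs !== null && intervalMs > 1000;
  });
}

// Example: a match-all template pushed a 30s interval onto every index.
const slowIndices = findSlowRefreshIndices({
  ".security-7": "30s",
  ".kibana_security_session_1": undefined,
  "logs-app": "30s",
});
console.log(slowIndices);
```

Values the parser does not recognize are skipped rather than flagged, so an empty result is not proof that every setting is sane.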

@elastic/kibana-security I'll be gradually filling this issue with info I remember, but please feel free to comment here or edit the issue description to include issues you know about that I missed.

@azasypkin azasypkin added discuss Team:Security Team focused on: Auth, Users, Roles, Spaces, Audit Logging, and more! Feature:Security/Authentication Platform Security - Authentication docs labels Nov 20, 2020
@azasypkin
Member Author

Okay, I described all the cases I could remember so far. I'll get back to this issue in a few weeks so that everyone has time to share any other ideas/issues.

@legrego
Member

legrego commented Dec 7, 2020

Thanks for putting this together! I agree with a lot of what you said here, and I don't see any glaring omissions.

Multiple authentication providers without Login Selector

Would it be possible to use the new auth_provider_hint query parameter to attempt authentication?

Discourage, discourage, discourage and eventually deprecate

As much as I'd love to deprecate this, I worry that we will end up having to support this in some capacity.

@azasypkin
Member Author

azasypkin commented Dec 7, 2020

Would it be possible to use the new auth_provider_hint query parameter to attempt authentication?

Yeah, it should allow you to pick any provider.

As much as I'd love to deprecate this, I worry that we will end up having to support this in some capacity.

Right, my suspicion is that many users upgrade Kibana and just keep their legacy authc config, and hence don't leverage Login Selector by default. Right now our Telemetry cannot tell us whether that's the case or whether users explicitly disabled Login Selector. In 8.0.0, when we drop the legacy config completely, we'll be able to see how many users explicitly disable it.

@exalate-issue-sync exalate-issue-sync bot added impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. loe:small Small Level of Effort labels Aug 5, 2021
@legrego legrego removed EnableJiraSync loe:small Small Level of Effort impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. labels Aug 18, 2022
@aniketpant1

We are currently encountering this issue. We recently integrated Azure AD OIDC realms for authentication in Elasticsearch and Kibana. Our end users who use Kibana are frequently logged out after anywhere from 5 minutes to 1 hour, and they have to log in again by providing their email ID, password, and passcode. Even if we disable or comment out the session settings in the kibana.yml file, users are still logged out. We have escalated this issue to elastic engineers.
We haven't been able to find what is causing these frequent logouts. Users are logged out even though we set xpack.security.session.idleTimeout: "15m" and xpack.security.session.lifespan: "24h".

@azasypkin
Member Author

We have escalated this issue to elastic engineers.

If you've escalated this issue to our support team, we'll look into it soon. If you don't have access to our support, then please post this question at our Discuss forum. There are many more users like you there who can help and who have probably already solved the problem you have. A GitHub issue isn't the right place to debug issues like that.

Having said that, I'm almost sure that you have multiple Kibana instances connected to the same cluster that have different security configurations or something along these lines: https://www.elastic.co/guide/en/kibana/current/production.html#load-balancing-kibana
