[Fleet] Display errors in Agent activity with link to Logs #152583

Merged Mar 7, 2023 (12 commits)

juliaElastic
Contributor

@juliaElastic juliaElastic commented Mar 2, 2023

Summary

Improvement of Agent activity to show action errors with a link to Review error logs

Part of #141206

Extended the action_status API to return the latest errors. These are the most recent docs from .fleet-action-results that contain errors.
We could do something more clever, such as aggregating the most frequent errors and taking the top hits from each bucket, if grouping identical errors together is a desirable feature.
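
The bucketing alternative mentioned above could look roughly like the following. This is an in-memory sketch only, not the query this PR ships: the `ActionError` and `ErrorBucket` types and the `topErrorBuckets` helper are hypothetical names, and the fields simply mirror the `latestErrors` entries rather than the real `.fleet-action-results` mapping.

```typescript
// Hypothetical shape of one failed action result (assumption, not the index mapping).
interface ActionError {
  agentId: string;
  error: string;
  timestamp: string; // ISO 8601
  hostname?: string;
}

// One bucket per distinct error message.
interface ErrorBucket {
  error: string;
  count: number;
  latest: ActionError; // the "top hit": most recent doc in the bucket
}

// Group errors by message and return the `size` most frequent buckets.
function topErrorBuckets(errors: ActionError[], size: number): ErrorBucket[] {
  const buckets = new Map<string, ErrorBucket>();
  for (const doc of errors) {
    const bucket = buckets.get(doc.error);
    if (!bucket) {
      buckets.set(doc.error, { error: doc.error, count: 1, latest: doc });
      continue;
    }
    bucket.count += 1;
    // Keep the most recent document as the bucket's representative.
    if (Date.parse(doc.timestamp) > Date.parse(bucket.latest.timestamp)) {
      bucket.latest = doc;
    }
  }
  return Array.from(buckets.values())
    .sort((a, b) => b.count - a.count)
    .slice(0, size);
}
```

One caveat with exact-match grouping: messages that embed the agent id, such as "Agent <id> is not upgradeable", would each land in their own bucket unless the message is normalized first.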

To verify:

  • Enroll agents (with horde/normally)
  • Trigger some actions with failures (e.g. upgrade agents that are not upgradeable, change artifact repo to an invalid url)
  • Go to Agent Activity and click on Show errors under the failed actions.
  • The last 3 errors will be shown, with buttons to Review error log. These are distinct errors per agent id.
  • Click on Review error log, verify that the Logs UI shows the expected filters (see here)

GET kbn:/api/fleet/agents/action_status

    {
      "actionId": "3de4a573-011b-4c8c-9ccb-c6516bcc27d2",
      "nbAgentsActionCreated": 1,
      "nbAgentsAck": 0,
      "version": "8.6.1",
      "startTime": "2023-02-28T16:34:10.553Z",
      "type": "UPGRADE",
      "nbAgentsActioned": 102,
      "status": "FAILED",
      "expiration": "2023-02-28T16:54:10.553Z",
      "creationTime": "2023-02-28T16:34:50.352Z",
      "nbAgentsFailed": 102,
      "hasRolloutPeriod": true,
      "completionTime": "2023-02-28T16:39:28.000Z",
      "latestErrors": [
        {
          "agentId": "906560bc-2af4-4916-8261-3769e8c38931",
          "error": """failed verification of agent binary: 2 errors occurred:
	* fetching asc file from '/Library/Elastic/Agent/data/elastic-agent-496e7e/downloads/elastic-agent-8.6.1-darwin-x86_64.tar.gz.asc': open /Library/Elastic/Agent/data/elastic-agent-496e7e/downloads/elastic-agent-8.6.1-darwin-x86_64.tar.gz.asc: no such file or directory
	* invalid signature for /Library/Elastic/Agent/data/elastic-agent-496e7e/downloads/elastic-agent-8.6.1-darwin-x86_64.tar.gz: openpgp: invalid signature: hash tag doesn't match

""",
          "timestamp": "2023-02-28T16:39:28Z",
          "hostname": "Julias-MacBook-Pro.local"
        },
        {
          "agentId": "080bf24f-f3ac-4256-b525-41d5bec1514e",
          "error": "Agent 080bf24f-f3ac-4256-b525-41d5bec1514e is not upgradeable",
          "timestamp": "2023-02-28T16:34:50.715Z",
          "hostname": "Julias-MacBook-Pro.local"
        },
        {
          "agentId": "6c6cbc39-5214-4001-928d-374bfed8ef1d",
          "error": "Agent 6c6cbc39-5214-4001-928d-374bfed8ef1d is not upgradeable",
          "timestamp": "2023-02-28T16:34:50.715Z",
          "hostname": "Julias-MacBook-Pro.local"
        }
      ]
    },
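
To make the "last 3 errors, distinct per agent id" behaviour concrete, here is a small in-memory sketch. The PR page does not show the server-side query, so this is an illustration under assumptions: the `LatestError` type is inferred from the example response above, and `latestDistinctErrors` is a hypothetical helper name.

```typescript
// Entry of the latestErrors array, inferred from the example response.
interface LatestError {
  agentId: string;
  error: string;
  timestamp: string; // ISO 8601
  hostname?: string; // assumed optional
}

// Keep only the most recent error per agent id, newest first, capped at `limit`.
function latestDistinctErrors(results: LatestError[], limit = 3): LatestError[] {
  const newestFirst = results
    .slice() // do not mutate the caller's array
    .sort((a, b) => Date.parse(b.timestamp) - Date.parse(a.timestamp));
  const seen = new Set<string>();
  const picked: LatestError[] = [];
  for (const result of newestFirst) {
    if (picked.length >= limit) break;
    if (seen.has(result.agentId)) continue; // an older error from the same agent
    seen.add(result.agentId);
    picked.push(result);
  }
  return picked;
}
```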

Added an accordion on the UI to show error messages with a link to Logs.
The design had only one Review error logs button per action. I thought it was better to drill down to a specific agent id, but we could do either or both.
See reasoning here #141206 (comment)

Latest styling, included host name on UI after feedback from Nima:
image

image

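Checklist

  • Any text added follows EUI's writing guidelines, uses sentence case text and includes i18n support
  • Unit or functional tests were updated or added to match the most common scenarios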

@juliaElastic juliaElastic changed the title from "[Fleet] display errors in activity" to "[Fleet] Display errors in Agent activity with link to Logs" Mar 6, 2023
buildQuery({
  agentId,
  datasets: ['elastic_agent'],
  logLevels: ['error'],
Contributor Author

I tried to set some reasonable defaults here: query on agentId, the elastic_agent dataset, and error-level logs, positioned around the timestamp of the error (±1h). We can tweak this if we want to include more datasets or log levels.
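
A minimal sketch of those defaults, assuming a KQL-style filter string and a one-hour window on either side of the error timestamp. `errorLogQuery` and `ErrorLogQuery` are hypothetical names, and the field names (`elastic_agent.id`, `data_stream.dataset`, `log.level`) are assumptions; the real `buildQuery` signature is only partially visible in the diff excerpt.

```typescript
const ONE_HOUR_MS = 60 * 60 * 1000;

interface ErrorLogQuery {
  filter: string; // KQL-style filter (field names are assumptions)
  start: string; // ISO 8601, one hour before the error
  end: string; // ISO 8601, one hour after the error
}

// Build the default Logs filter and time range for one agent error.
function errorLogQuery(agentId: string, errorTimestamp: string): ErrorLogQuery {
  const position = Date.parse(errorTimestamp);
  if (Number.isNaN(position)) {
    throw new Error(`Invalid timestamp: ${errorTimestamp}`);
  }
  const filter = [
    `elastic_agent.id:"${agentId}"`,
    `data_stream.dataset:"elastic_agent"`,
    `log.level:"error"`,
  ].join(" and ");
  return {
    filter,
    start: new Date(position - ONE_HOUR_MS).toISOString(),
    end: new Date(position + ONE_HOUR_MS).toISOString(),
  };
}
```

Widening the defaults later would then be a matter of adding datasets or log levels to the filter and changing the window constant.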

@juliaElastic juliaElastic marked this pull request as ready for review March 6, 2023 14:21
@juliaElastic juliaElastic requested a review from a team as a code owner March 6, 2023 14:21
@juliaElastic
Contributor Author

@elasticmachine merge upstream

@botelastic botelastic bot added the Team:Fleet label (Team label for Observability Data Collection Fleet team) Mar 6, 2023
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@nchaulet nchaulet self-requested a review March 6, 2023 16:16
@nimarezainia
Contributor

@juliaElastic same comment as before, but copying it here as well. Regarding the screenshot in #152583 (comment): it would be better to show the host.name instead of the agent.id (which, as you mention, is a UUID) so the user can more easily correlate the error with the agent that generated it.

I would also limit the number of errors shown to, say, 5 or 6. Thousands of agents may hit the same error, and we wouldn't want this flyout to get cluttered.
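
Both suggestions can be sketched in a few lines. Everything here is hypothetical: the `AgentError` type, the `toErrorRows` helper, and the cap of 5, which is just one value picked from the "5 or 6" range above.

```typescript
interface AgentError {
  agentId: string;
  hostname?: string;
  error: string;
}

const MAX_ERRORS_SHOWN = 5; // assumption: one value from the suggested "5 or 6"

// Prefer the host name as the row label and cap how many rows are rendered.
function toErrorRows(errors: AgentError[], max: number = MAX_ERRORS_SHOWN) {
  const rows = errors.slice(0, max).map((e) => ({
    label: e.hostname || e.agentId, // fall back to the UUID when no host name is known
    error: e.error,
  }));
  return { rows, hiddenCount: Math.max(errors.length - max, 0) };
}
```

Returning `hiddenCount` lets the UI say how many errors were left out instead of silently dropping them.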

import type { ActionStatus } from '../../../../types';
import { useStartServices } from '../../../../hooks';

const TruncatedEuiText = styled(EuiText)`
Member

nit: could we use the eui-textTruncate class here instead of a custom style?

Contributor Author

Tried it, but it doesn't work well with the auto-sized table; the third column becomes invisible:

image

Member

@nchaulet nchaulet left a comment

LGTM 🚀

@kibana-ci
Collaborator

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

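| id | before | after | diff |
| --- | --- | --- | --- |
| fleet | 783 | 795 | +12 |

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| fleet | 933.6KB | 937.1KB | +3.5KB |

Unknown metric groups

ESLint disabled line counts

| id | before | after | diff |
| --- | --- | --- | --- |
| securitySolution | 428 | 430 | +2 |

Total ESLint disabled count

| id | before | after | diff |
| --- | --- | --- | --- |
| securitySolution | 506 | 508 | +2 |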


cc @juliaElastic

@juliaElastic juliaElastic merged commit cdc8ec0 into elastic:main Mar 7, 2023
@kibanamachine kibanamachine added the backport:skip label (This commit does not require backporting) Mar 7, 2023
sloanelybutsurely pushed a commit to sloanelybutsurely/kibana that referenced this pull request Mar 8, 2023
bmorelli25 pushed a commit to bmorelli25/kibana that referenced this pull request Mar 10, 2023
nkhristinin pushed a commit that referenced this pull request Mar 22, 2023
Labels
backport:skip (This commit does not require backporting), release_note:enhancement, Team:Fleet (Team label for Observability Data Collection Fleet team), v8.8.0
6 participants