[Fleet] Display errors in Agent activity with link to Logs #152583
    buildQuery({
      agentId,
      datasets: ['elastic_agent'],
      logLevels: ['error'],
I was trying to set some reasonable defaults here: query on `agentId`, the `elastic_agent` dataset, and `error` logs, with the position around the timestamp of the error (±1h). We can tweak this if we want to include more datasets or log levels.
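To make the defaults above concrete, here is a minimal sketch of how such a filter and time window could be assembled. It is not the PR's `buildQuery` implementation: `buildLogsKuery`, `orClause`, and `errorTimeWindow` are hypothetical names, and the field names `elastic_agent.id`, `data_stream.dataset`, and `log.level` are assumptions about how the agent logs are indexed.

```typescript
// Hypothetical helpers for illustration only; not Fleet's actual buildQuery.
interface LogQueryParams {
  agentId: string;
  datasets: string[];
  logLevels: string[];
}

// Build a KQL clause like field:("a" or "b"); returns undefined for an empty list.
function orClause(field: string, values: string[]): string | undefined {
  if (values.length === 0) return undefined;
  return `${field}:(${values.map((v) => `"${v}"`).join(' or ')})`;
}

// Combine the agent id, dataset, and log level filters with "and".
// Field names here are assumptions, not confirmed by the PR.
function buildLogsKuery({ agentId, datasets, logLevels }: LogQueryParams): string {
  return [
    `elastic_agent.id:"${agentId}"`,
    orClause('data_stream.dataset', datasets),
    orClause('log.level', logLevels),
  ]
    .filter((clause): clause is string => clause !== undefined)
    .join(' and ');
}

// Time window centred on the error timestamp; the default padding is 1 hour each side.
function errorTimeWindow(timestamp: string, paddingMs: number = 60 * 60 * 1000) {
  const position = new Date(timestamp).getTime();
  return {
    position,
    start: new Date(position - paddingMs).toISOString(),
    end: new Date(position + paddingMs).toISOString(),
  };
}
```

Widening the defaults would then only mean passing more entries in `datasets` or `logLevels`, or a larger `paddingMs`.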
@elasticmachine merge upstream
Pinging @elastic/fleet (Team:Fleet)
@juliaElastic same comment as before, but copying it here as well. Regarding the screenshot in #152583 (comment): it would be better to show the `host.name` instead of the `agent.id` (which, as you mention, is a UUID) so the user can more easily correlate the error with the agent that generated it. I would also limit the number of errors shown to maybe 5 or 6. We may have thousands of agents that hit the same error, so we wouldn't want this flyout to be so cluttered.
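As a rough illustration of this suggestion (not the code in the PR), the sketch below keeps the newest error per agent, caps the list, and prefers the host name as the display label. The `ActionError` shape mirrors the `latestErrors` entries in the PR description; `latestErrorsForDisplay`, `label`, and the default `limit` of 5 are assumptions made for the example.

```typescript
// Shape modelled on the `latestErrors` entries in the PR description.
interface ActionError {
  agentId: string;
  error: string;
  timestamp: string;
  hostname?: string;
}

// Hypothetical helper: newest error per agent, capped at `limit`,
// labelled with the host name when available and the agent id otherwise.
function latestErrorsForDisplay(errors: ActionError[], limit: number = 5) {
  const seenAgents = new Set<string>();
  return [...errors]
    // Newest first, so the first entry seen for each agent is its latest error.
    .sort((a, b) => Date.parse(b.timestamp) - Date.parse(a.timestamp))
    .filter((e) => {
      if (seenAgents.has(e.agentId)) return false;
      seenAgents.add(e.agentId);
      return true;
    })
    .slice(0, limit)
    .map((e) => ({ ...e, label: e.hostname ?? e.agentId }));
}
```

Capping after de-duplication means one noisy agent cannot fill the whole list.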
    import type { ActionStatus } from '../../../../types';
    import { useStartServices } from '../../../../hooks';

    const TruncatedEuiText = styled(EuiText)`
nit: could we use the `eui-textTruncate` class here instead of a custom style?
LGTM 🚀
💚 Build Succeeded
## Summary

Improvement of Agent activity to show action errors with a link to `Review error logs`.

Part of #141206

Extended the `action_status` API to return the latest errors. These are the most recent docs from `.fleet-action-results` that contain errors.
We could do something more clever, like aggregating the most frequent errors and taking the top hits from each bucket, if grouping the same errors together is a desirable feature.

To verify:
- Enroll agents (with horde/normally)
- Trigger some actions with failures (e.g. upgrade agents that are not upgradeable, change artifact repo to an invalid url)
- Go to Agent Activity and click on `Show errors` under the failed actions.
- The last 3 errors will be shown, with buttons to `Review error log`. These are distinct errors per agent id.
- Click on `Review error log`, verify that the `Logs UI` shows the expected filters (see [here](#152583 (comment)))

```
GET kbn:/api/fleet/agents/action_status

{
  "actionId": "3de4a573-011b-4c8c-9ccb-c6516bcc27d2",
  "nbAgentsActionCreated": 1,
  "nbAgentsAck": 0,
  "version": "8.6.1",
  "startTime": "2023-02-28T16:34:10.553Z",
  "type": "UPGRADE",
  "nbAgentsActioned": 102,
  "status": "FAILED",
  "expiration": "2023-02-28T16:54:10.553Z",
  "creationTime": "2023-02-28T16:34:50.352Z",
  "nbAgentsFailed": 102,
  "hasRolloutPeriod": true,
  "completionTime": "2023-02-28T16:39:28.000Z",
  "latestErrors": [
    {
      "agentId": "906560bc-2af4-4916-8261-3769e8c38931",
      "error": """failed verification of agent binary: 2 errors occurred:
  * fetching asc file from '/Library/Elastic/Agent/data/elastic-agent-496e7e/downloads/elastic-agent-8.6.1-darwin-x86_64.tar.gz.asc': open /Library/Elastic/Agent/data/elastic-agent-496e7e/downloads/elastic-agent-8.6.1-darwin-x86_64.tar.gz.asc: no such file or directory
  * invalid signature for /Library/Elastic/Agent/data/elastic-agent-496e7e/downloads/elastic-agent-8.6.1-darwin-x86_64.tar.gz: openpgp: invalid signature: hash tag doesn't match

""",
      "timestamp": "2023-02-28T16:39:28Z",
      "hostname": "Julias-MacBook-Pro.local"
    },
    {
      "agentId": "080bf24f-f3ac-4256-b525-41d5bec1514e",
      "error": "Agent 080bf24f-f3ac-4256-b525-41d5bec1514e is not upgradeable",
      "timestamp": "2023-02-28T16:34:50.715Z",
      "hostname": "Julias-MacBook-Pro.local"
    },
    {
      "agentId": "6c6cbc39-5214-4001-928d-374bfed8ef1d",
      "error": "Agent 6c6cbc39-5214-4001-928d-374bfed8ef1d is not upgradeable",
      "timestamp": "2023-02-28T16:34:50.715Z",
      "hostname": "Julias-MacBook-Pro.local"
    }
  ]
},
```

Added an accordion on the UI to show error messages with a link to Logs.
In the design there was only one `Review error logs` button per action. I thought it was better to drill down to a specific agent id, but we could do either or both.
See reasoning here #141206 (comment)

Latest styling, included host name on UI after feedback from Nima:

<img width="577" alt="image" src="https://user-images.githubusercontent.com/90178898/223428882-bfecf2fe-0b71-4c7e-8359-8110c74eb6a0.png">

<img width="1769" alt="image" src="https://user-images.githubusercontent.com/90178898/222465434-99170fbe-441b-48f0-b585-dbf18e0e8e9b.png">

### Checklist

- [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
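
The "more clever" grouping mentioned in the summary could look roughly like the in-memory sketch below. This is only an illustration of the idea: the PR does not implement it, a real version would more likely run as an aggregation in Elasticsearch, and `groupMostFrequentErrors`, `ErrorBucket`, and the `normalize` parameter are hypothetical names.

```typescript
// Shape modelled on the `latestErrors` entries shown in the response above.
interface ActionError {
  agentId: string;
  error: string;
  timestamp: string;
  hostname?: string;
}

interface ErrorBucket {
  error: string; // bucket key (the normalized error message)
  count: number; // how many results fell into this bucket
  topHits: ActionError[]; // newest results in the bucket
}

// Group errors by (normalized) message, keep the most frequent buckets,
// and return the newest `hitsPerBucket` results from each one.
function groupMostFrequentErrors(
  errors: ActionError[],
  maxBuckets: number = 3,
  hitsPerBucket: number = 1,
  normalize: (message: string) => string = (message) => message
): ErrorBucket[] {
  const buckets = new Map<string, ActionError[]>();
  for (const e of errors) {
    const key = normalize(e.error);
    const bucket = buckets.get(key) ?? [];
    bucket.push(e);
    buckets.set(key, bucket);
  }
  return Array.from(buckets.entries())
    .map(([error, hits]) => ({
      error,
      count: hits.length,
      topHits: [...hits]
        .sort((a, b) => Date.parse(b.timestamp) - Date.parse(a.timestamp))
        .slice(0, hitsPerBucket),
    }))
    .sort((a, b) => b.count - a.count)
    .slice(0, maxBuckets);
}
```

Messages such as `Agent <id> is not upgradeable` embed the agent id, so they would only land in the same bucket if `normalize` strips the id first.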