[SecuritySolution] Migrate bulk enable/diable rules to Alerting methods #180796

xcrzx · 2024-04-15T12:23:53Z

Resolves: #171350
Resolves partially: #177634

Summary

This PR migrates the /api/detection_engine/rules/_bulk_action API endpoint to use the rulesClient.bulkEnableRules and rulesClient.bulkDisableRules methods under the hood. This change helps mitigate Task Manager flooding when users enable many detection rules at once. The alerting framework's bulk methods implement task staggering to ensure that multiple tasks are not scheduled for execution simultaneously. For more details, refer to the ticket.
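To make the idea of task staggering concrete, here is a minimal sketch, not the alerting framework's actual implementation. The names `ScheduledTask` and `staggerTasks`, and the even-spacing strategy, are assumptions made purely for illustration: instead of giving every task the same first run time, each task's first run is offset across a time window.

```typescript
// Hypothetical illustration of task staggering; NOT the alerting framework's code.
interface ScheduledTask {
  id: string;
  runAt: Date;
}

/**
 * Spreads the first run of each task evenly across `windowMs`,
 * so the tasks do not all become due at the same instant.
 */
function staggerTasks(ids: readonly string[], start: Date, windowMs: number): ScheduledTask[] {
  if (ids.length === 0) {
    return [];
  }
  // Distance in milliseconds between two consecutive first runs.
  const step = windowMs / ids.length;
  return ids.map((id, index) => ({
    id,
    runAt: new Date(start.getTime() + Math.floor(index * step)),
  }));
}
```

With four rule IDs and a four-second window, the first runs in this sketch land one second apart rather than all at `start`.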

elasticmachine · 2024-04-17T15:11:04Z

Pinging @elastic/security-detections-response (Team:Detections and Resp)

elasticmachine · 2024-04-17T15:11:05Z

Pinging @elastic/security-solution (Team: SecuritySolution)

elasticmachine · 2024-04-17T15:11:06Z

Pinging @elastic/security-detection-rule-management (Team:Detection Rule Management)

jpdjere · 2024-04-18T09:43:40Z

...urity_solution/server/lib/detection_engine/rule_management/logic/bulk_actions/validations.ts

@@ -55,19 +55,15 @@ const throwMlAuthError = (mlAuthz: MlAuthz, ruleType: RuleType) =>
 * @param params - {@link BulkActionsValidationArgs}
 */
 export const validateBulkEnableRule = async ({ rule, mlAuthz }: BulkActionsValidationArgs) => {
-  if (!rule.enabled) {


What's the reasoning behind this change? If a rule is already enabled but is included in the payload to be enabled anyway, throwing a validation error for it doesn't look too helpful at first glance. But I would like to understand your thought process here.

Because we now delegate to the rules client the decision of whether a rule should be enabled or not, we need to validate access to the rules regardless of their current state. This is necessary because a rule's status might change to enabled or disabled after our initial check, potentially due to a race condition.

maximpn

@xcrzx it's a great improvement to bulk actions 👍

I've tested enabling and disabling all installed Elastic prebuilt rules and it works like a charm 🙌

The major part of the PR's diff is moving logic out of the bulk actions route file. A special thanks for that 👍 I left some nit comments. It looks like the implementation can be improved. It would be interesting to check whether we can use rulesClient aggregations to optimize rules data fetching, but that's mostly out of this PR's scope.

maximpn · 2024-04-18T08:52:53Z

x-pack/plugins/alerting/server/rules_client/methods/bulk_enable.ts

+ * Updating too many rules in parallel can cause the denial of service of the
+ * Elasticsearch cluster.
+ */
+const MAX_RULES_TO_UPDATE_IN_PARALLEL = 50;


nit: I saw the constant 50 a few times, but there is no explanation of why it works better than 20 or 70. Is it possible to add a comment explaining the reasoning behind it? Even if it's just a round number, everyone will know that.

I've noticed the same constant used several times across Kibana. I'm not sure why it was originally chosen, but it works pretty well in the rules management area, so I don't see a need to question it too much. My intuition tells me that this default is sensible 🙂

maximpn · 2024-04-18T09:15:25Z

...ver/lib/detection_engine/rule_management/api/rules/bulk_actions/bulk_enable_disable_rules.ts

+
+  // Go through the original rules array and update rules that were not returned
+  // as failed from the bulkEnableRules. We cannot rely on the results from the
+  // bulkEnableRules because the response is not consistent and might not


I'm curious what "the response is not consistent" means here. Do you have examples of such inconsistent responses?

Yes, some rules you pass to the bulk method might disappear from the response. I've expanded on this in the comment and included a link to a corresponding issue with examples: #181050.
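The reconciliation described in the code comment above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: the `RuleLike` and `BulkOperationError` shapes and the `reconcileBulkResult` name are hypothetical. The point is that the set of updated rules is derived from the original input minus the reported failures, rather than from the rules returned by the bulk method.

```typescript
// Hypothetical shapes for illustration; the real Kibana types differ.
interface RuleLike {
  id: string;
  enabled: boolean;
}

interface BulkOperationError {
  ruleId: string;
  message: string;
}

/**
 * Treats every input rule that was not reported as failed as updated.
 * The returned rules of the bulk call are deliberately not consulted,
 * because some input rules may be missing from that response.
 */
function reconcileBulkResult<T extends RuleLike>(
  originalRules: readonly T[],
  errors: readonly BulkOperationError[],
  enabled: boolean
): { updated: T[]; failedIds: string[] } {
  const failed = new Set(errors.map((error) => error.ruleId));
  const updated = originalRules
    .filter((rule) => !failed.has(rule.id))
    .map((rule) => ({ ...rule, enabled }));
  return { updated, failedIds: Array.from(failed) };
}
```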

maximpn · 2024-04-18T09:43:36Z

...ver/lib/detection_engine/rule_management/api/rules/bulk_actions/bulk_enable_disable_rules.ts

+  const ruleIds = validatedRules.map(({ id }) => id);
+
+  // Perform actual update using the rulesClient
+  let results;


nit: results doesn't have to be a variable declared with let. Here let is needed only to handle both the enable and disable cases. It could be a const if defined like

const results = operation === 'enable' ? await rulesClient.bulkEnableRules({ ids: ruleIds }) : await rulesClient.bulkDisableRules({ ids: ruleIds });

or a separate function to return results.
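The "separate function" alternative could look roughly like the sketch below. `BulkRulesClient` is a minimal stand-in declared here for illustration, with the result type left generic because the thread does not show it; `runBulkOperation` and `BulkOperation` are hypothetical names. Only the `bulkEnableRules({ ids })` and `bulkDisableRules({ ids })` call shapes come from the snippet above.

```typescript
// Minimal stand-in for the part of the rules client used here (an assumption,
// not the real RulesClient type). The result type R is left generic on purpose.
interface BulkRulesClient<R> {
  bulkEnableRules(params: { ids: string[] }): Promise<R>;
  bulkDisableRules(params: { ids: string[] }): Promise<R>;
}

type BulkOperation = 'enable' | 'disable';

/**
 * Picks the bulk method that matches the requested operation,
 * so that the caller can keep `results` as a const.
 */
async function runBulkOperation<R>(
  rulesClient: BulkRulesClient<R>,
  operation: BulkOperation,
  ruleIds: string[]
): Promise<R> {
  return operation === 'enable'
    ? rulesClient.bulkEnableRules({ ids: ruleIds })
    : rulesClient.bulkDisableRules({ ids: ruleIds });
}
```

A caller would then write `const results = await runBulkOperation(rulesClient, operation, ruleIds);`.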

maximpn · 2024-04-18T09:46:25Z

...r/lib/detection_engine/rule_management/api/rules/bulk_actions/fetch_rules_by_query_or_ids.ts

+import { findRules } from '../../../logic/search/find_rules';
+import { MAX_RULES_TO_PROCESS_TOTAL } from './route';
+
+export const fetchRulesByQueryOrIds = async ({


nit: This function should be split into fetchRulesByIds and fetchRulesByQuery, since the two implementations don't have anything in common.

I think it's fine to keep those methods together. They encapsulate two different methods for retrieving rules based on the input and switch logic. From a business logic standpoint, it's a single operation: rule retrieval. How the rules are retrieved is just a detail of implementation.

maximpn · 2024-04-18T09:49:59Z

...ty_solution/server/lib/detection_engine/rule_management/api/rules/bulk_actions/route.test.ts

-    it('returns error if disable rule throws error', async () => {
-      clients.rulesClient.disable.mockImplementation(async () => {
-        throw new Error('Test error');
+    it('returns error if disable rule returns an error', async () => {


Suggested change

it('returns error if disable rule returns an error', async () => {

it('returns an error when rulesClient.bulkDisableRules fails', async () => {

jpdjere

Thanks for these changes and the more than welcome refactor! Tested and all looking good 👍 ✅

jpdjere · 2024-04-18T09:48:16Z

...ecurity_solution/server/lib/detection_engine/rule_management/api/rules/bulk_actions/route.ts

-): IKibanaResponse<BulkEditActionResponse> => {
-  const numSucceeded = updated.length + created.length + deleted.length;
-  const numSkipped = skipped.length;
-  const numFailed = errors.length;
-
-  const summary: BulkEditActionSummary = {
-    failed: numFailed,
-    succeeded: numSucceeded,
-    skipped: numSkipped,
-    total: numSucceeded + numFailed + numSkipped,
-  };
-
-  // if response is for dry_run, empty lists of rules returned, as rules are not actually updated and stored within ES
-  // thus, it's impossible to return reliably updated/duplicated/deleted rules
-  const results: BulkEditActionResults = isDryRun
-    ? {
-        updated: [],
-        created: [],
-        deleted: [],
-        skipped: [],
-      }
-    : {
-        updated: updated.map((rule) => internalRuleToAPIResponse(rule)),
-        created: created.map((rule) => internalRuleToAPIResponse(rule)),
-        deleted: deleted.map((rule) => internalRuleToAPIResponse(rule)),
-        skipped,
-      };
-
-  if (numFailed > 0) {
-    return response.custom<BulkEditActionResponse>({
-      headers: { 'content-type': 'application/json' },
-      body: {
-        message: summary.succeeded > 0 ? 'Bulk edit partially failed' : 'Bulk edit failed',
-        status_code: 500,
-        attributes: {
-          errors: normalizeErrorResponse(errors),
-          results,
-          summary,
-        },
-      },
-      statusCode: 500,
-    });
-  }
-
-  const responseBody: BulkEditActionResponse = {
-    success: true,
-    rules_count: summary.total,
-    attributes: { results, summary },
-  };
-
-  return response.ok({ body: responseBody });
-};
-
-const fetchRulesByQueryOrIds = async ({
-  query,
-  ids,
-  rulesClient,
-  abortSignal,
-}: {
-  query: string | undefined;
-  ids: string[] | undefined;
-  rulesClient: RulesClient;
-  abortSignal: AbortSignal;
-}): Promise<PromisePoolOutcome<string, RuleAlertType>> => {
-  if (ids) {
-    return initPromisePool({
-      concurrency: MAX_RULES_TO_UPDATE_IN_PARALLEL,
-      items: ids,
-      executor: async (id: string) => {
-        const rule = await readRules({ id, rulesClient, ruleId: undefined });
-        if (rule == null) {
-          throw Error('Rule not found');
-        }
-        return rule;
-      },
-      abortSignal,
-    });
-  }
-
-  const { data, total } = await findRules({
-    rulesClient,
-    perPage: MAX_RULES_TO_PROCESS_TOTAL,
-    filter: query,
-    page: undefined,
-    sortField: undefined,
-    sortOrder: undefined,
-    fields: undefined,
-  });
-
-  if (total > MAX_RULES_TO_PROCESS_TOTAL) {
-    throw new BadRequestError(
-      `More than ${MAX_RULES_TO_PROCESS_TOTAL} rules matched the filter query. Try to narrow it down.`
-    );
-  }
-
-  return {
-    results: data.map((rule) => ({ item: rule.id, result: rule })),
-    errors: [],
-  };
-};


Very welcome refactor 👍 🚀 Thanks!

jpdjere · 2024-04-18T09:50:33Z

...ecurity_solution/server/lib/detection_engine/rule_management/api/rules/bulk_actions/route.ts

+          const actionsClient = ctx.actions.getActionsClient();

-          const { getExporter, getClient } = (await ctx.core).savedObjects;
+          const { getExporter, getClient } = ctx.core.savedObjects;


🤔

What changed here? Were we just awaiting properties that didn't need awaiting?

That's right. ctx.core is not a promise, so no need for await.

jpdjere · 2024-04-18T09:58:46Z

...ver/lib/detection_engine/rule_management/api/rules/bulk_actions/bulk_enable_disable_rules.ts

+
+  // Go through the original rules array and update rules that were not returned
+  // as failed from the bulkEnableRules. We cannot rely on the results from the
+  // bulkEnableRules because the response is not consistent and might not


"Not consistent" how? Is this logic not correct?

total number of rules to enable/disable = results.rules.length + results.errors.length

I.e., sometimes some of the passed ruleIds are completely dropped from the response?

Yes, some rules you pass to the bulk method might disappear from the response. I've expanded on this in the comment and included a link to a corresponding issue with examples: #181050.

JiaweiWu · 2024-04-18T14:54:23Z

x-pack/plugins/alerting/server/rules_client/methods/bulk_enable.ts

+      await pMap(
+        rulesFinderRules,
+        async (rule) => {


can we inline this so it doesn't create such a big diff

Inline what?

await pMap(rulesFinderRules, async (rule) => { try { if (scheduleValidationError) { ...

since it's hard to tell what actually changed in this PR

Hey @JiaweiWu, the formatting is applied by Prettier - like it or not, there's nothing I can do about it. The actual change is the addition of the concurrency: MAX_RULES_TO_UPDATE_IN_PARALLEL param to the pMap config. It was previously set up incorrectly, which led to too many simultaneous requests to Elasticsearch, resulting in 503 errors when enabling many rules at once.

sounds good
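For readers unfamiliar with what a concurrency limit on a mapping call does, here is a dependency-free sketch of the general technique. It is not p-map's source and not the PR's code; `mapWithConcurrency` is a hypothetical helper written for illustration. A fixed number of workers pull items from a shared cursor, so at most `concurrency` mapper calls are in flight at any time.

```typescript
/**
 * Maps `items` through an async `mapper`, running at most `concurrency`
 * mapper calls at once. Results keep the order of the input.
 * Illustrative helper only; the PR itself passes a concurrency option to pMap.
 */
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  concurrency: number,
  mapper: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let nextIndex = 0;

  // Each worker repeatedly claims the next unprocessed index until none remain.
  const worker = async (): Promise<void> => {
    while (nextIndex < items.length) {
      const index = nextIndex++;
      results[index] = await mapper(items[index], index);
    }
  };

  const workerCount = Math.max(1, Math.min(concurrency, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With a limit of 50, as in `MAX_RULES_TO_UPDATE_IN_PARALLEL`, enabling several hundred rules would issue requests in overlapping groups of at most 50 instead of all at once.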

JiaweiWu

LGTM

kibana-ci · 2024-04-22T11:43:41Z

💚 Build Succeeded

Buildkite Build
Commit: acdeb8d

Metrics [docs]

✅ unchanged

History

💔 Build #204880 failed 3c57bd7
💚 Build #204597 succeeded 20247e4
💚 Build #204351 succeeded e5c6dca
💔 Build #204011 failed dfc63f3
💔 Build #203986 failed e0c7fa0

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @xcrzx

…ds (elastic#180796)

**Resolves: elastic#171350
**Resolves partially: elastic#177634

## Summary

This PR migrates the `/api/detection_engine/rules/_bulk_action` API endpoint to use the `rulesClient.bulkEnableRules` and `rulesClient.bulkDisableRules` methods under the hood. This change helps mitigate Task Manager flooding when users enable many detection rules at once. The alerting framework's bulk methods implement task staggering to ensure that multiple tasks are not scheduled for execution simultaneously. For more details, refer to [the ticket](elastic#171350).

(cherry picked from commit 2169c0f)

kibanamachine · 2024-04-22T13:25:06Z

💚 All backports created successfully

Status	Branch	Result
✅	8.14

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

…g methods (#180796) (#181312)

# Backport

This will backport the following commits from `main` to `8.14`:
- [[SecuritySolution] Migrate bulk enable/diable rules to Alerting methods (#180796)](#180796)

### Questions ?

Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

Co-authored-by: Dmitrii Shevchenko <dmitrii.shevchenko@elastic.co>

xcrzx self-assigned this Apr 15, 2024

xcrzx force-pushed the bulk-enable-rules branch 6 times, most recently from 3ca10e9 to e5c6dca Compare April 17, 2024 12:28

xcrzx marked this pull request as ready for review April 17, 2024 15:11

xcrzx requested review from a team as code owners April 17, 2024 15:11

xcrzx requested a review from maximpn April 17, 2024 15:11

xcrzx mentioned this pull request Apr 18, 2024

[Security Solution] Improve rules enable/disable reliability by migrating to Alerting bulk methods #177634

Closed

jpdjere reviewed Apr 18, 2024

maximpn approved these changes Apr 18, 2024

jpdjere approved these changes Apr 18, 2024

banderror added bug Fixes for quality problems that affect the customer experience impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. v8.14.0 v8.15.0 and removed 8.14 candidate labels Apr 18, 2024

xcrzx force-pushed the bulk-enable-rules branch from e5c6dca to 20247e4 Compare April 18, 2024 11:17

JiaweiWu reviewed Apr 18, 2024

JiaweiWu approved these changes Apr 22, 2024

xcrzx force-pushed the bulk-enable-rules branch from 20247e4 to 3c57bd7 Compare April 22, 2024 08:42

Migrate bulk enable/diable rules to Alerting methods

acdeb8d

xcrzx force-pushed the bulk-enable-rules branch from 3c57bd7 to acdeb8d Compare April 22, 2024 10:12

xcrzx merged commit 2169c0f into elastic:main Apr 22, 2024
37 checks passed

kibanamachine mentioned this pull request Apr 22, 2024

[8.14] [SecuritySolution] Migrate bulk enable/diable rules to Alerting methods (#180796) #181312

Merged

szaffarano mentioned this pull request Jul 12, 2024

[8.14] [Telemetry][Security Solution] Use the proper index to query builtin alerts (#187859) #188233

Closed
