EQL: Add CircuitBreaker for sequence queries #74381

matriv · 2021-06-21T21:14:54Z

The sequence matching algorithm holds several structures to keep track of
the matched and potentially matching sequences of events. When a large
number of events or sequence stages needs to be processed, or when
the requested size of the query (number of sequences to return) is large,
those structures can substantially increase the memory footprint.

Add a CircuitBreaker, configurable through cluster settings, that
accounts for the memory used during the execution of a sequence
query. The memory accounting takes place once every fetch_size
processed events (docs), to avoid significant performance overhead.

costin · 2021-06-22T10:49:50Z

.../plugin/eql/src/main/java/org/elasticsearch/xpack/eql/execution/sequence/KeyToSequences.java

@@ -34,23 +40,42 @@
            this.groups = new SequenceGroup[stages];
        }

-        void add(int stage, Sequence sequence) {
+        long add(int stage, Sequence sequence) {


Accounting/ram usage should not leak into the return values of add/remove/until.

costin · 2021-06-22T10:57:24Z

.../plugin/eql/src/main/java/org/elasticsearch/xpack/eql/execution/sequence/KeyToSequences.java

+        }
+
+        @Override
+        public long ramBytesUsed() {


Since the RAM used is computed in a batch, there's no point in keeping track per remove/until/add method. It's either incremental or a batch; currently it's a mix.

costin · 2021-06-22T10:58:30Z

...plugin/eql/src/main/java/org/elasticsearch/xpack/eql/execution/sequence/SequenceMatcher.java


        stats.seen++;
+
+        bytesUsed += sequence.ramBytesUsed();
+        addAccountedMemory(bytesUsed, "matcher_sequence");


Please put the labels ("matcher_sequence") into a dedicated class.

costin

Left a few comments.

costin

It's a good start however I see a number of issues:

The estimation is too intrusive.

Modifying the signatures of the methods that add/remove matches so that they return estimates is a no-go. The estimation should be 'invisible' to the logic.

Too expensive

Most objects are really light - a couple of strings and some ints. Whether it's estimating on the fly the object tree or caching its result this adds significant overhead (whether it's method calls or memory usage) since it's per object.
Furthermore, the memory is evaluated for every hit, which is accurate but expensive.

I think moving the estimation outside the object will work better (since the index strings are cached for example which is better addressed across objects instead of per instance) and can be estimated and thus cached.
That's because for example the doc id has typically the same length (so it's the same size across hits), the tiebreaker is the same across all hits while the timestamp is typically the same per batch of results.

Moreover the breaker check can be done per batch of entries (say every 100/256/500/1k hits) instead of every single one.
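
To illustrate the per-batch check suggested above, here is a minimal sketch. It does not use the Elasticsearch CircuitBreaker API: BatchedAccounting is a hypothetical helper, and the breaker is modelled as a plain LongConsumer that is assumed to throw once its limit is exceeded. Per-hit estimates are only accumulated locally, and the breaker is consulted once per batch.

```java
import java.util.function.LongConsumer;

// Hypothetical helper (not part of Elasticsearch): accumulates per-hit byte
// estimates and reports them to a breaker callback once per batch.
final class BatchedAccounting {
    private final int batchSize;
    private final LongConsumer breaker; // assumed to throw when the limit is exceeded
    private long pendingBytes;
    private int pendingHits;
    private int breakerCalls;

    BatchedAccounting(int batchSize, LongConsumer breaker) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.breaker = breaker;
    }

    // Cheap per-hit path: two additions and one comparison.
    void onHit(long estimatedBytes) {
        pendingBytes += estimatedBytes;
        pendingHits++;
        if (pendingHits >= batchSize) {
            flush();
        }
    }

    // Reports whatever is pending; call it once more at the end of the stream.
    void flush() {
        if (pendingHits == 0) {
            return;
        }
        long bytes = pendingBytes;
        pendingBytes = 0;
        pendingHits = 0;
        breakerCalls++;
        breaker.accept(bytes);
    }

    int breakerCalls() {
        return breakerCalls;
    }
}
```

With a batch size of 256, processing 1000 hits reaches the breaker three times during the loop and once more on the final flush(), instead of 1000 times.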

costin · 2021-06-23T09:46:33Z

...ugin/eql/src/main/java/org/elasticsearch/xpack/eql/execution/assembler/ExecutionManager.java


        TumblingWindow w = new TumblingWindow(new PITAwareQueryClient(session),
                criteria.subList(0, completionStage),
                criteria.get(completionStage),
-                matcher);
+                matcher,
+                session.circuitBreaker());


The last parameter can be extracted in the TumblingWindow constructor from the matcher.

matcher doesn't have circuitBreaker anymore, as all the calculations are done in the TumblingWindow.

costin · 2021-06-23T09:51:25Z

x-pack/plugin/eql/src/main/java/org/elasticsearch/xpack/eql/execution/search/HitReference.java

@@ -33,6 +37,11 @@ public String id() {
        return id;
    }

+    @Override
+    public long ramBytesUsed() {
+        return SHALLOW_SIZE + RamUsageEstimator.sizeOf(index) + RamUsageEstimator.sizeOf(id);


Since this doesn't change during the lifecycle of the object, it can be computed once, inside the constructor, and stored.
As a separate concern, the index string is cached during the same request to reduce object churn. The estimation should take this into account, otherwise it overestimates, which becomes an issue across a large number of objects.
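
Both points can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: HitRefSketch is a hypothetical class, and roughStringBytes with its constants is a crude stand-in for RamUsageEstimator so that the example stays self-contained. The per-instance size is computed once in the constructor, and the shared index string is counted once per distinct instance rather than once per hit.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a hit reference; the byte constants are
// illustrative assumptions, not measured JVM sizes.
final class HitRefSketch {
    static final long SHALLOW_BYTES = 24;

    // Crude string estimate (assumption): fixed overhead plus two bytes per char.
    static long roughStringBytes(String s) {
        return 40 + 2L * s.length();
    }

    final String index;  // typically the same cached instance across many hits
    final String id;
    final long ownBytes; // shallow size + id, computed once since the fields are final

    HitRefSketch(String index, String id) {
        this.index = index;
        this.id = id;
        this.ownBytes = SHALLOW_BYTES + roughStringBytes(id);
    }

    // Sums the per-instance sizes and counts each distinct index String
    // instance only once, so shared strings are not overestimated.
    static long estimate(List<HitRefSketch> hits) {
        Map<String, Boolean> seenIndices = new IdentityHashMap<>();
        long total = 0;
        for (HitRefSketch hit : hits) {
            total += hit.ownBytes;
            if (seenIndices.put(hit.index, Boolean.TRUE) == null) {
                total += roughStringBytes(hit.index);
            }
        }
        return total;
    }
}
```

The identity-based map is a deliberate choice here: two equal but distinct String instances do occupy memory twice, so only the very same instance is deduplicated.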

costin · 2021-06-23T09:53:23Z

x-pack/plugin/eql/src/main/java/org/elasticsearch/xpack/eql/execution/search/Ordinal.java

@@ -33,6 +37,15 @@ public long implicitTiebreaker() {
        return implicitTiebreaker;
    }

+    @Override
+    public long ramBytesUsed() {


Same as above - since the fields are final and this method is going to be called multiple times it's worth considering caching the long (and thus increasing its memory size) to save on virtual calls.

That's already cached, as it's computed once and saved in a static variable.

The method is incorrect - the timestamp and implicitTiebreaker need to be accounted (the tiebreaker comparable is only a reference so shallow).

Either these are accounted for or there's no point in tracking Ordinal at all.
It would be useful to see the impact the accounting has on the call tree through some basic profiling.

Those are accounted for by RamUsageEstimator.shallowSizeOfInstance(Ordinal.class), which covers all 3 fields, with the tiebreaker counted just as an object reference.

matriv · 2021-06-23T11:55:39Z

The estimation is too intrusive.

This is reverted in favour of calling a "batch" estimation after the calls to match() and trim() on the matcher.

Too expensive

Most objects are really light - a couple of strings and some ints. Whether it's estimating on the fly the object tree or caching its result this adds significant overhead (whether it's method calls or memory usage) since it's per object.
Furthermore, the memory is evaluated for every hit, which is accurate but expensive.

This is done only in TumblingWindow.wrapValues().next() as I cannot find any other way to do this in batches.
Outside this lambda we no longer have a grasp of which objects were created. The only solution I can see is to
have a counter increasing inside next() and once it reaches the 128/256/etc. mark inside the lambda we make
the call to the circuitbreaker and add the memory.

I think moving the estimation outside the object will work better (since the index strings are cached for example which is better addressed across objects instead of per instance) and can be estimated and thus cached.
That's because for example the doc id has typically the same length (so it's the same size across hits), the tiebreaker is the same across all hits while the timestamp is typically the same per batch of results.

Could you please provide some details on this idea? What do you mean by "outside the object"?

costin

Looks good - should be moved to a proper PR and shared with the rest of the group.

costin · 2021-06-24T11:03:50Z

...plugin/eql/src/main/java/org/elasticsearch/xpack/eql/execution/sequence/SequenceMatcher.java


    private final Logger log = LogManager.getLogger(SequenceMatcher.class);

-    static class Stats {
+    static class Stats implements Accountable {


I don't think we need to count this class since it's just one per matcher.

costin · 2021-06-24T11:07:06Z

x-pack/plugin/eql/src/main/java/org/elasticsearch/xpack/eql/execution/search/Limit.java

-public class Limit {
+public class Limit implements Accountable {
+
+    private static final long SHALLOW_SIZE = RamUsageEstimator.shallowSizeOfInstance(Limit.class);


I believe there's only one instance of this class so no need to account for it.

elasticmachine · 2021-06-24T11:20:41Z

Pinging @elastic/es-ql (Team:QL)

costin

Left another round of comments, essentially around moving the circuit breaker directly into SequenceMatcher (which contains all the hooks and data structures needed).

costin · 2021-06-24T20:41:06Z

x-pack/plugin/eql/src/main/java/org/elasticsearch/xpack/eql/execution/search/Limit.java

@@ -16,7 +16,7 @@

 import static java.util.Collections.emptyList;

-public class Limit {
+public class Limit  {


Should be removed.

costin · 2021-06-24T20:41:49Z

...k/plugin/eql/src/main/java/org/elasticsearch/xpack/eql/execution/sequence/KeyAndOrdinal.java

@@ -12,6 +12,7 @@
 import java.util.Objects;

 public class KeyAndOrdinal {
+


No change, should be removed.

costin · 2021-06-24T20:45:27Z

.../plugin/eql/src/main/java/org/elasticsearch/xpack/eql/execution/sequence/KeyToSequences.java

-class KeyToSequences {
+class KeyToSequences implements Accountable {
+
+    private static final long SHALLOW_SIZE = RamUsageEstimator.shallowSizeOfInstance(KeyToSequences.class);


Since there's only one instance for this class, I don't think the shallow size is necessary.

costin · 2021-06-24T20:48:09Z

...plugin/eql/src/main/java/org/elasticsearch/xpack/eql/execution/sequence/SequenceMatcher.java


    private final Logger log = LogManager.getLogger(SequenceMatcher.class);

-    static class Stats {
+    static class Stats  {


Extra whitespace, needs removal.

Still there.

costin · 2021-06-24T20:48:52Z

...plugin/eql/src/main/java/org/elasticsearch/xpack/eql/execution/sequence/SequenceMatcher.java

-public class SequenceMatcher {
+public class SequenceMatcher implements Accountable {
+
+    private static final long SHALLOW_SIZE = RamUsageEstimator.shallowSizeOfInstance(SequenceMatcher.class);


There's only one instance for this class, no need to account for it.

costin · 2021-06-24T20:50:28Z

x-pack/plugin/eql/src/main/java/org/elasticsearch/xpack/eql/execution/sequence/StageToKeys.java


    @SuppressWarnings(value = { "unchecked", "rawtypes" })
    StageToKeys(int stages) {
        // use asList to create an immutable list already initialized to null
        this.stageToKey = Arrays.asList(new Set[stages]);
+        ramBytesUsed = SHALLOW_SIZE + RamUsageEstimator.sizeOfCollection(stageToKey);


This one is wrong. stageToKey has a fixed length, but each entry contains a set of SequenceKeys that varies in size, so the estimate is supposed to be dynamic.
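
A minimal sketch of the difference, assuming a hypothetical StageToKeysSketch class with String keys and made-up byte constants (the real class holds SequenceKey sets and would rely on RamUsageEstimator): the list length is fixed at construction, but the sets grow afterwards, so the estimate has to be recomputed on demand rather than frozen in the constructor.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simplification of a stage-to-keys structure.
final class StageToKeysSketch {
    static final long SHALLOW_BYTES = 16; // assumption, for illustration only
    static final long BYTES_PER_KEY = 32; // assumption: flat cost per stored key

    private final List<Set<String>> stageToKey;

    StageToKeysSketch(int stages) {
        // Fixed number of slots, all initially empty (null).
        this.stageToKey = new ArrayList<>(stages);
        for (int i = 0; i < stages; i++) {
            stageToKey.add(null);
        }
    }

    void add(int stage, String key) {
        Set<String> keys = stageToKey.get(stage);
        if (keys == null) {
            keys = new LinkedHashSet<>();
            stageToKey.set(stage, keys);
        }
        keys.add(key);
    }

    // Computed from the current contents, so it follows the growth of the sets.
    long ramBytesUsed() {
        long total = SHALLOW_BYTES;
        for (Set<String> keys : stageToKey) {
            if (keys != null) {
                total += keys.size() * BYTES_PER_KEY;
            }
        }
        return total;
    }
}
```

A value captured in the constructor would stay at the empty-structure size no matter how many keys are added later.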

costin · 2021-06-24T20:52:39Z

.../plugin/eql/src/main/java/org/elasticsearch/xpack/eql/execution/sequence/TumblingWindow.java

@@ -53,6 +54,7 @@
 public class TumblingWindow implements Executable {

    private static final int CACHE_MAX_SIZE = 64;
+    private static final String CIRCUIT_BREAKER_LABEL = "sequence_matches";


Not sure what's the convention for circuit breakers however I would expect an eql prefix - if this is not picked up from the package name/plugin then let's add it here.
Regarding the name, I would relabel it to "sequences" and differentiate between "completed" and "in-flight" or "in-progress"

https://github.com/elastic/elasticsearch/pull/74381/files#diff-92c7ef4c47b713f656bb9dd1c5dc054ccfa0473e6c485e4d0c362d9e7671539bR54
The name is "eql_sequence", I will rename it to plain "eql", and use sequences_completes and sequences_inflight for the method call.

costin · 2021-06-24T20:55:11Z

x-pack/plugin/eql/src/test/java/org/elasticsearch/xpack/eql/analysis/CancellationTests.java

@@ -58,7 +56,7 @@ public void testCancellationBeforeFieldCaps() throws InterruptedException {
        ClusterService mockClusterService = mockClusterService();

        IndexResolver indexResolver = new IndexResolver(client, randomAlphaOfLength(10), DefaultDataTypeRegistry.INSTANCE);
-        PlanExecutor planExecutor = new PlanExecutor(client, indexResolver, new NamedWriteableRegistry(Collections.emptyList()));
+        PlanExecutor planExecutor = new PlanExecutor(client, indexResolver, new NoopCircuitBreaker("test"));


Let's define only one method in the test for creating the executor and reuse that instead:

PlanExecutor planExecutor = planExecutor(); ... public PlanExecutor planExecutor() { return new PlanExecutor(client, indexResolver, new NoopCircuitBreaker("test")); }

costin · 2021-06-24T21:05:11Z

...plugin/eql/src/main/java/org/elasticsearch/xpack/eql/execution/sequence/SequenceMatcher.java

@@ -285,6 +290,11 @@ public void clear() {
        completed.clear();
    }

+    @Override
+    public long ramBytesUsed() {
+        return SHALLOW_SIZE + RamUsageEstimator.sizeOf(keyToSequences) + RamUsageEstimator.sizeOf(stageToKeys);


This also needs to include completed, which holds the completed sequences, while the other two structures hold the in-flight sequences.
I'm not sure whether we want to differentiate between them or not (and label them appropriately since the results are typically significantly smaller than the intermediate data).

costin · 2021-06-24T21:07:49Z

.../plugin/eql/src/main/java/org/elasticsearch/xpack/eql/execution/sequence/TumblingWindow.java

+    // and for each subquery every "fetch_size" docs. Doing RAM accounting on object creation is
+    // expensive, so we just calculate the difference in bytes of the total memory that the matcher's
+    // structure occupy, before and after the match() call.
+    private boolean match(int stage, Iterable<Tuple<KeyAndOrdinal, HitReference>> hits) {


Since all the data and thus tracking is done in the sequence matcher, it's easier to move the accounting there.
Keep the TumblingWindow intact, move the tracking directly into matcher.match (no need to wrap it) including clearing the memory.
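
One way to picture accounting that lives inside the matcher is the sketch below. It is an assumption-laden illustration rather than the PR's implementation: MatcherAccountingSketch is hypothetical, the breaker is modelled as an ObjLongConsumer taking a label and a byte delta, and the two labels are placeholders loosely modelled on the names discussed in this thread. Only the difference since the previous call is reported, separately for in-flight and completed sequences, and clear() releases everything.

```java
import java.util.function.ObjLongConsumer;

// Hypothetical sketch: delta-based accounting kept next to the matcher state.
final class MatcherAccountingSketch {
    // Placeholder labels, kept in one place.
    static final String INFLIGHT = "sequences_inflight";
    static final String COMPLETED = "sequences_completed";

    // (label, deltaBytes); assumed to throw when the breaker limit is exceeded.
    private final ObjLongConsumer<String> breaker;
    private long lastInFlight;
    private long lastCompleted;

    MatcherAccountingSketch(ObjLongConsumer<String> breaker) {
        this.breaker = breaker;
    }

    // Intended to be called once at the end of match(): reports only the
    // change in estimated bytes since the previous call. Deltas may be
    // negative when sequences were trimmed or moved to completed.
    void trackMemory(long inFlightNow, long completedNow) {
        long inFlightDelta = inFlightNow - lastInFlight;
        long completedDelta = completedNow - lastCompleted;
        lastInFlight = inFlightNow;
        lastCompleted = completedNow;
        if (inFlightDelta != 0) {
            breaker.accept(INFLIGHT, inFlightDelta);
        }
        if (completedDelta != 0) {
            breaker.accept(COMPLETED, completedDelta);
        }
    }

    // Releases everything that was previously reported.
    void clear() {
        trackMemory(0, 0);
    }
}
```

Keeping the previous totals inside the matcher means the caller (here, the window) does not need to know anything about memory tracking.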

Luegg · 2021-06-28T07:23:36Z

...plugin/eql/src/main/java/org/elasticsearch/xpack/eql/execution/sequence/SequenceMatcher.java

        }
-        log.trace("{}", stats);
-        return true;
+        trackMemory(ramBytesUsedInFlight, ramBytesUsedCompleted);


I noticed that trackMemory is not called in the if (headLimit) {... branch. Why is that?

Because it's an early exit, and no new objects are created/added in the structures.

Luegg · 2021-06-28T07:24:34Z

x-pack/plugin/eql/src/main/java/org/elasticsearch/xpack/eql/plugin/EqlPlugin.java

+
+    @Override
+    public void setCircuitBreaker(CircuitBreaker circuitBreaker) {
+        //assert circuitBreaker.getName().equals(TRAINED_MODEL_CIRCUIT_BREAKER_NAME);


Suggested change

//assert circuitBreaker.getName().equals(TRAINED_MODEL_CIRCUIT_BREAKER_NAME);

Luegg · 2021-06-28T07:32:01Z

...n/eql/src/test/java/org/elasticsearch/xpack/eql/execution/assembler/CircuitBreakerTests.java

+        }));
+
+        CIRCUIT_BREAKER.startBreaking();
+        window.execute(wrap(p -> {}, ex -> assertEquals(CircuitBreakingException.class, ex.getClass())));


I think this test would also pass if the circuit breaker never throws / is never called.

Thx! Fixed with expectThrows()
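
The pitfall raised here can be reproduced in isolation. The sketch below is hypothetical and independent of the Elasticsearch test framework: execute stands in for an operation that reports failures through a callback. An assertion placed inside the callback passes vacuously when the callback is never invoked, whereas capturing the exception and asserting afterwards does not.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical stand-ins, for illustration only.
final class ListenerAssertionSketch {
    // Reports a failure through the callback only when asked to fail.
    static void execute(boolean fail, Consumer<Exception> onFailure) {
        if (fail) {
            onFailure.accept(new IllegalStateException("breaker tripped"));
        }
    }

    // Weak check: the assertion lives inside the callback, so it is
    // skipped entirely when no failure is reported.
    static boolean weakCheck(boolean fail) {
        execute(fail, ex -> {
            if (!(ex instanceof IllegalStateException)) {
                throw new AssertionError("unexpected exception type");
            }
        });
        return true; // reached even if the callback never ran
    }

    // Stronger check: capture the exception, then assert outside the callback.
    static boolean strongCheck(boolean fail) {
        AtomicReference<Exception> captured = new AtomicReference<>();
        execute(fail, captured::set);
        return captured.get() instanceof IllegalStateException;
    }
}
```

Here weakCheck(false) still reports success although nothing failed, while strongCheck(false) does not.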

astefan

Is CircuitBreakerTests enough? It only tests that a breaker trips when a sequence runs, but there is nothing checked about when it trips, if it reaches the limit or not.

matriv · 2021-06-29T08:32:39Z

Is CircuitBreakerTests enough? It only tests that a breaker trips when a sequence runs, but there is nothing checked about when it trips, if it reaches the limit or not.

I've added a more low-level test with b246745

costin

LGTM.

costin · 2021-06-29T09:58:34Z

x-pack/plugin/eql/src/main/java/org/elasticsearch/xpack/eql/execution/search/Ordinal.java

@@ -33,6 +37,15 @@ public long implicitTiebreaker() {
        return implicitTiebreaker;
    }

+    @Override
+    public long ramBytesUsed() {


The method is incorrect - the timestamp and implicitTiebreaker need to be accounted (the tiebreaker comparable is only a reference so shallow).

Either these are accounted for or there's no point in tracking Ordinal at all.
It would be useful to see the impact the accounting has on the call tree through some basic profiling.

costin · 2021-06-29T10:00:10Z

x-pack/plugin/eql/src/main/java/org/elasticsearch/xpack/eql/execution/sequence/SequenceKey.java


    public static final SequenceKey NONE = new SequenceKey();

    private final Object[] keys;
    private final int hashCode;

-    SequenceKey(Object... keys) {
+    public SequenceKey(Object... keys) {


Why public?

Moved the tests into the sequence package to avoid this, so I reverted it.

costin · 2021-06-29T10:00:29Z

...plugin/eql/src/main/java/org/elasticsearch/xpack/eql/execution/sequence/SequenceMatcher.java


    private final Logger log = LogManager.getLogger(SequenceMatcher.class);

-    static class Stats {
+    static class Stats  {


Still there.

The sequence matching algorithm holds several structures to keep track of the matched and potentially matching sequences of events. When a large number of events or sequence stages needs to be processed, or when the requested size of the query (number of sequences to return) is large, those structures can substantially increase the memory footprint. Add a CircuitBreaker, configurable through cluster settings, that accounts for the memory used during the execution of a sequence query. The memory accounting takes place once every fetch_size processed events (docs), to avoid significant performance overhead. (cherry picked from commit c6f0fb8)

Add documentation for the newly introduced CircuitBreaker, which is used to restrict the memory usage for an EQL sequence query to avoid OutOfMemory exceptions. Follows: elastic#74381

Add documentation for the newly introduced CircuitBreaker, which is used to restrict the memory usage for an EQL sequence query to avoid OutOfMemory exceptions. Follows: #74381 Co-authored-by: Marios Trivyzas <matriv@gmail.com>

EQL: Add Circuit Breaker for sequence queries

ae1dc91

matriv force-pushed the cb-eql branch from 3643a70 to ae1dc91 Compare June 22, 2021 08:25

matriv requested a review from costin June 22, 2021 08:25

matriv added :Analytics/EQL EQL querying v7.14.0 v8.0.0 labels Jun 22, 2021

costin reviewed Jun 22, 2021

costin requested changes Jun 22, 2021

Use batch ram accouning for sequence matcher

d1ea4fe

costin requested changes Jun 23, 2021

clean up ram accounting

3ba5da5

costin added the Team:QL (Deprecated) Meta label for query languages team label Jun 23, 2021

matriv and others added 5 commits June 23, 2021 17:11

Call to CB in batches

3f994ca

Remove accounting for temp objects

cbcd6db

Fix imports

2b9904e

Merge remote-tracking branch 'upstream/master' into cb-eql

ba3117d

Add comments, remove cb calls for trim()

c412097

matriv requested a review from costin June 24, 2021 08:02

costin reviewed Jun 24, 2021

matriv added 2 commits June 24, 2021 14:10

Add unit test

309c59f

remove limit and stats from counting

6485744

matriv requested review from costin, astefan, bpintea and Luegg June 24, 2021 11:20

matriv marked this pull request as ready for review June 24, 2021 11:20

matriv changed the title ~~EQL: Add CircuitBreaker~~ EQL: Add CircuitBreaker for sequence queries Jun 24, 2021

costin requested changes Jun 24, 2021

matriv and others added 3 commits June 25, 2021 16:33

Address review pt1

b7a5404

move the ram accounting in SequnceMatcher

367c42d

Merge remote-tracking branch 'upstream/master' into cb-eql

118e13f

matriv requested a review from costin June 25, 2021 14:06

Luegg approved these changes Jun 28, 2021

matriv added 3 commits June 28, 2021 11:10

address review

3c4469c

fix test

700bc97

Merge remote-tracking branch 'upstream/master' into cb-eql

168f745

astefan reviewed Jun 29, 2021

Add low level unit test

b246745

costin approved these changes Jun 29, 2021

address comments

d149f65

matriv merged commit c6f0fb8 into elastic:master Jun 29, 2021

matriv deleted the cb-eql branch June 29, 2021 11:31

matriv added the backport pending label Jun 29, 2021

matriv removed the backport pending label Jun 29, 2021

matriv mentioned this pull request Jul 5, 2021

EQL: [Docs] Add documentation for the CircuitBreaker #74897

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
