-
Notifications
You must be signed in to change notification settings - Fork 24.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EQL: Minimise CCS roundtrips #76076
EQL: Minimise CCS roundtrips #76076
Conversation
This intorduces an optimisation of the EQL requests when these target one remote cluster only (i.e. no mixed local and remote indices or multiple remote clusters). In this case, the EQL request is forwarded to the remote cluster and executed there, instead of having the local cluster perform multiple querries of the remote cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docs LGTM. I left some non-blocking nits.
(Optional, Boolean) | ||
+ | ||
If `true`, network round-trips between the local and the remote cluster are | ||
minimized when executing cross-cluster search (CCS) requests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just fixes some wrapping.
(Optional, Boolean) | |
+ | |
If `true`, network round-trips between the local and the remote cluster are | |
minimized when executing cross-cluster search (CCS) requests. | |
(Optional, Boolean) If `true`, network round-trips between the local and the | |
remote cluster are minimized when running cross-cluster search (CCS) requests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, fixed!
If `true`, network round-trips between the local and the remote cluster are | ||
minimized when executing cross-cluster search (CCS) requests. | ||
+ | ||
This setting is currently only effective if the request does not include both |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd use option
or parameter
to avoid confusion with cluster/index settings.
This setting is currently only effective if the request does not include both | |
This option is currently only effective if the request does not include both |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
Adress doc review comments. Update error messages
Pinging @elastic/es-ql (Team:QL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only some minor comments and questions, otherwise all makes sense to me 👍
@@ -89,6 +89,15 @@ request that targets only `bar*` still returns an error. | |||
+ | |||
Defaults to `true`. | |||
|
|||
`ccs_minimize_roundtrips`:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give some background behind the decision to make this configurable? Is there a use case for not minimizing round trips or is that a general convention for configurability of optimizations like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 thanks a lot @jrodewig , I wasn't aware of this doc yet. That clarifies a lot!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, the setting follows suite of the one for search and is similar to it, but I've hesitated to add the reference, since we don't actually link the two. We could add an explanation note to clarify the distinction, though.
if (clusterAliases.size() != 1 || clusterAliases.contains(RemoteClusterAware.LOCAL_CLUSTER_GROUP_KEY)) { | ||
return false; | ||
} | ||
// The remote async ID would not be valid on local cluster; furthermore on results fetching we would know neither if the ID is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's worth to mention this limitation in the docs as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might, but wanted to have the TODO
proposal reviewed, in case it emerges as a "blocker" for this optimisation.
transportService.sendRequest(transportService.getRemoteClusterService().getConnection(clusterAlias), | ||
EqlSearchAction.INSTANCE.name(), request.indices(remoteIndices), TransportRequestOptions.EMPTY, | ||
new ActionListenerResponseHandler<>(wrap(r -> listener.onResponse(qualifyHits(r, clusterAlias)), | ||
e -> listener.onFailure(qualifyException(e, request.indices(), clusterAlias))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e -> listener.onFailure(qualifyException(e, request.indices(), clusterAlias))), | |
e -> listener.onFailure(qualifyException(e, remoteIndices, clusterAlias))), |
Minor change but at least to me it would help to be explicit about this because the mutation of request.indices()
is very subtle. Assuming that it's indeed the remote indices that should be passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's a good point. I think current format was more of a copy that happened to work. Will update with next batch of changes.
And yes, we need the remote-cluster-name:
prefix-stripped remote indices passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a first round of comments.
One thing that missing from this PR is the client
EQL code which needs to be updated (in a backwards compatible way) with the new settings.
(Optional, Boolean) If `true`, network round-trips between the local and the | ||
remote cluster are minimized when running cross-cluster search (CCS) requests. | ||
+ | ||
This option is currently only effective if the request does not include both |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs rephrasing:
This option is effective for requests that target data fully contained in one remote cluster; when data is spread across multiple clusters, the setting is ignored
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This explanation is compacter, though maybe less explicit. I've applied the proposal, though.
@@ -17,21 +17,19 @@ | |||
|
|||
@Override | |||
protected String getInexistentIndexErrorMessage() { | |||
return "\"caused_by\":{\"type\":\"verification_exception\",\"reason\":\"Found 1 problem\\nline -1:-1: Unknown index "; | |||
return "\"root_cause\":[{\"type\":\"index_not_found_exception\",\"reason\":\"no such index "; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Subtle different - what's the reason behind it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Qualifying" the index in the exception replaces the VerificationException with an IndexNotFoundException, to "harmonise" the failure reporting. This would be needed to not differentiate the failure format depending on which request parameters are set.
@@ -15,11 +17,10 @@ public static String remoteClusterIndex(String indexName) { | |||
} | |||
|
|||
public static String remoteClusterPattern(String pattern) { | |||
StringBuilder sb = new StringBuilder(); | |||
StringJoiner sj = new StringJoiner(","); | |||
for (String index: pattern.split(",")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For future reference String.split
is expensive since it compiles a regex pattern - however it does have a shortcut (at least in the recent JVMs) where for single chars it does string iteration. For the former case better to use Strings.commaDelimitedListToStringArray
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
return false; | ||
} | ||
// The remote async ID would not be valid on local cluster; furthermore on results fetching we would know neither if the ID is | ||
// remote or not, nor which remote cluster it belongs to (TODO: could rewrite the ID to smth like [alias:ID]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay with dealing with the async bit after this gets merged. Curious how async search and CCS deals with this scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Cross-cluster search is supported only with ccs_minimize_roundtrips set to false
.". I.e. it won't optimise it either, it'll query the remote cluster, but store the results locally.
private static EqlSearchResponse qualifyHits(EqlSearchResponse r, String clusterAlias) { | ||
EqlSearchResponse.Hits hits = r.hits(); | ||
List<EqlSearchResponse.Sequence> sequences = qualifySequences(hits.sequences(), clusterAlias); | ||
List<EqlSearchResponse.Event> events = sequences != null ? null : qualifyEvents(hits.events(), clusterAlias, true); | ||
return new EqlSearchResponse(new EqlSearchResponse.Hits(events, sequences, hits.totalHits()), r.took(), r.isTimeout()); | ||
} | ||
|
||
private static List<EqlSearchResponse.Sequence> qualifySequences(List<EqlSearchResponse.Sequence> sequences, String clusterAlias) { | ||
List<EqlSearchResponse.Sequence> qualifiedSequences = null; | ||
if (sequences != null && sequences.isEmpty() == false) { | ||
qualifiedSequences = new ArrayList<>(sequences.size()); | ||
for (EqlSearchResponse.Sequence s : sequences) { | ||
qualifiedSequences.add(new EqlSearchResponse.Sequence(s.joinKeys(), qualifyEvents(s.events(), clusterAlias, false))); | ||
} | ||
} | ||
return qualifiedSequences; | ||
} | ||
|
||
private static List<EqlSearchResponse.Event> qualifyEvents(List<EqlSearchResponse.Event> events, String clusterAlias, | ||
boolean emptyListIfNoEvents) { | ||
List<EqlSearchResponse.Event> qualifiedEvents = null; | ||
if (events != null && events.isEmpty() == false) { | ||
qualifiedEvents = new ArrayList<>(events.size()); | ||
for (EqlSearchResponse.Event e : events) { | ||
qualifiedEvents.add(new EqlSearchResponse.Event(buildRemoteIndexName(clusterAlias, e.index()), e.id(), e.source(), | ||
e.fetchFields())); | ||
} | ||
} | ||
return qualifiedEvents == null && emptyListIfNoEvents ? new ArrayList<>() : qualifiedEvents; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I'm missing something, these methods copy the response to prefix the index which is a waste.
Either change the actual returned objects to be mutable so that the remote cluster alias is added to the index or push the clusterAlias info so it's available during the EqlSearchResponse.*
creation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've change the mutability to allow setting an index name. The alternative would require passing the cluster name thorough multiple layer, which is doable, but seemed a bit more contrived.
// index [...]"); this is also similar to a non-CCS `POST inexistent/_eql/search?ignore_unavailable=false`, but | ||
// unfortunately unlike an inexistent pattern search: `POST inexistent*/_eql/search?ignore_unavailable=false, which raises a | ||
// VerificationException as it's root cause. I.e. the failures are not homogenous. | ||
finalException = new RemoteTransportException(e.getMessage(), new IndexNotFoundException(qualifiedIndex)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When @astefan come back please raise this with him since he worked on something similar. It's worth creating an issue to see whether the two errors can be aligned and if not why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great. It would seem good to me to return/report a not found index in the same way; i.e. no matter if the verifier finds the query error or other layer under EQL. (This could probably also be a subsequent PR.)
Strings.splitStringByCommaToArray(indexPattern)); | ||
indicesMap.remove(RemoteClusterAware.LOCAL_CLUSTER_GROUP_KEY); | ||
return indicesMap; | ||
public Set<String> clusterAliases(String[] indices, boolean discardLocal) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does discardLocal
has any other value than true
? If not, better to keep the method specialized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does
discardLocal
has any other value thantrue
?
Yes, the optimisation code needs to know if the queried indices are all local or not (so it calls with this parameter set to false
).
EqlConfiguration cfg = new EqlConfiguration(request.indices(), zoneId, username, clusterName, filter, | ||
Set<String> clusterAliases = remoteClusterRegistry.clusterAliases(request.indices(), false); | ||
if (canMinimizeRountrips(request, clusterAliases)) { | ||
String clusterAlias = clusterAliases.iterator().next(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using two methods - canMinimize
plus calling the cluster, how about combining them into a method that returns the cluster on which to re-route the call or otherwise returning null.
String singleCssCluster = minimizedCss(request, clusterAliases);
if (singleCssCluster != null) {
....
} else {
...
}
This helps with the different methods around checking, selecting and massaging the string into one method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did initially consider this format too (before the small RemoteClusterRegistry refactoring), but found delegating the decision to a method a clearer implementation (similar to TransportSearchAction#shouldMinimizeRoundtrips().
String clusterAlias = clusterAliases.iterator().next(); | ||
String[] remoteIndices = new String[request.indices().length]; | ||
for (int i = 0; i < request.indices().length; i++) { | ||
remoteIndices[i] = request.indices()[i].substring(clusterAlias.length() + /*`:`*/1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That inline comment is too l33t. please move it after the statement or above it since as it stands requires multi-parsing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pulled that into a trailing comment.
- rephrase docs; - EqlSearchResponse.Event allows setting the index value; - update comment. Also, fix a case of index qualifying in an exception / failure case.
EqlClient? I'll look into it. |
Since it's a parameter, it's the request that needs to be changed not the client - e.g. EqlSearchRequest |
Add de-/serialization of ccs_minimize_roundtrips setting to action/EqlSearchRequest.
Serialisation of the new parameter has been added. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
💔 Backport failed
To backport manually run |
This introduces an optimisation of the EQL requests when these target one remote cluster only (i.e. no mixed local and remote indices or multiple remote clusters). In this case, the EQL request is forwarded to the remote cluster and executed there, instead of having the local cluster perform multiple queries to the remote cluster. (cherry picked from commit 7a5ac3e)
This introduces an optimisation of the EQL requests when these target one remote cluster only (i.e. no mixed local and remote indices or multiple remote clusters). In this case, the EQL request is forwarded to the remote cluster and executed there, instead of having the local cluster perform multiple queries to the remote cluster. (cherry picked from commit 7a5ac3e)
This introduces an optimisation of the EQL requests when these target
one remote cluster only (i.e. no mixed local and remote indices or
multiple remote clusters). In this case, the EQL request is forwarded
to the remote cluster and executed there, instead of having the local
cluster perform multiple queries of the remote cluster.
This behaviour is enabled by default, a new request parameter
ccs_minimize_roundtrips
can be set tofalse
to disable it.