Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EQL: Minimise CCS roundtrips #76076

Merged
merged 7 commits into from
Aug 18, 2021
Merged

EQL: Minimise CCS roundtrips #76076

merged 7 commits into from
Aug 18, 2021

Conversation

bpintea
Copy link
Contributor

@bpintea bpintea commented Aug 4, 2021

This introduces an optimisation of the EQL requests when these target
one remote cluster only (i.e. no mixed local and remote indices or
multiple remote clusters). In this case, the EQL request is forwarded
to the remote cluster and executed there, instead of having the local
cluster perform multiple queries of the remote cluster.

This behaviour is enabled by default, a new request parameter
ccs_minimize_roundtrips can be set to false to disable it.

This intorduces an optimisation of the EQL requests when these target
one remote cluster only (i.e. no mixed local and remote indices or
multiple remote clusters). In this case, the EQL request is forwarded
to the remote cluster and executed there, instead of having the local
cluster perform multiple querries of the remote cluster.
Copy link
Contributor

@jrodewig jrodewig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs LGTM. I left some non-blocking nits.

Comment on lines 93 to 96
(Optional, Boolean)
+
If `true`, network round-trips between the local and the remote cluster are
minimized when executing cross-cluster search (CCS) requests.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just fixes some wrapping.

Suggested change
(Optional, Boolean)
+
If `true`, network round-trips between the local and the remote cluster are
minimized when executing cross-cluster search (CCS) requests.
(Optional, Boolean) If `true`, network round-trips between the local and the
remote cluster are minimized when running cross-cluster search (CCS) requests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed!

If `true`, network round-trips between the local and the remote cluster are
minimized when executing cross-cluster search (CCS) requests.
+
This setting is currently only effective if the request does not include both
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use option or parameter to avoid confusion with cluster/index settings.

Suggested change
This setting is currently only effective if the request does not include both
This option is currently only effective if the request does not include both

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Adress doc review comments. Update error messages
@bpintea bpintea marked this pull request as ready for review August 4, 2021 13:46
@elasticmachine elasticmachine added the Team:QL (Deprecated) Meta label for query languages team label Aug 4, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-ql (Team:QL)

Copy link
Contributor

@Luegg Luegg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only some minor comments and questions, otherwise all makes sense to me 👍

@@ -89,6 +89,15 @@ request that targets only `bar*` still returns an error.
+
Defaults to `true`.

`ccs_minimize_roundtrips`::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give some background behind the decision to make this configurable? Is there a use case for not minimizing round trips or is that a general convention for configurability of optimizations like this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 thanks a lot @jrodewig , I wasn't aware of this doc yet. That clarifies a lot!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the setting follows suite of the one for search and is similar to it, but I've hesitated to add the reference, since we don't actually link the two. We could add an explanation note to clarify the distinction, though.

if (clusterAliases.size() != 1 || clusterAliases.contains(RemoteClusterAware.LOCAL_CLUSTER_GROUP_KEY)) {
return false;
}
// The remote async ID would not be valid on local cluster; furthermore on results fetching we would know neither if the ID is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's worth to mention this limitation in the docs as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might, but wanted to have the TODO proposal reviewed, in case it emerges as a "blocker" for this optimisation.

transportService.sendRequest(transportService.getRemoteClusterService().getConnection(clusterAlias),
EqlSearchAction.INSTANCE.name(), request.indices(remoteIndices), TransportRequestOptions.EMPTY,
new ActionListenerResponseHandler<>(wrap(r -> listener.onResponse(qualifyHits(r, clusterAlias)),
e -> listener.onFailure(qualifyException(e, request.indices(), clusterAlias))),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
e -> listener.onFailure(qualifyException(e, request.indices(), clusterAlias))),
e -> listener.onFailure(qualifyException(e, remoteIndices, clusterAlias))),

Minor change but at least to me it would help to be explicit about this because the mutation of request.indices() is very subtle. Assuming that it's indeed the remote indices that should be passed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's a good point. I think current format was more of a copy that happened to work. Will update with next batch of changes.
And yes, we need the remote-cluster-name: prefix-stripped remote indices passed.

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a first round of comments.
One thing that missing from this PR is the client EQL code which needs to be updated (in a backwards compatible way) with the new settings.

(Optional, Boolean) If `true`, network round-trips between the local and the
remote cluster are minimized when running cross-cluster search (CCS) requests.
+
This option is currently only effective if the request does not include both
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs rephrasing:

This option is effective for requests that target data fully contained in one remote cluster; when data is spread across multiple clusters, the setting is ignored.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This explanation is compacter, though maybe less explicit. I've applied the proposal, though.

@@ -17,21 +17,19 @@

@Override
protected String getInexistentIndexErrorMessage() {
return "\"caused_by\":{\"type\":\"verification_exception\",\"reason\":\"Found 1 problem\\nline -1:-1: Unknown index ";
return "\"root_cause\":[{\"type\":\"index_not_found_exception\",\"reason\":\"no such index ";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Subtle different - what's the reason behind it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Qualifying" the index in the exception replaces the VerificationException with an IndexNotFoundException, to "harmonise" the failure reporting. This would be needed to not differentiate the failure format depending on which request parameters are set.

@@ -15,11 +17,10 @@ public static String remoteClusterIndex(String indexName) {
}

public static String remoteClusterPattern(String pattern) {
StringBuilder sb = new StringBuilder();
StringJoiner sj = new StringJoiner(",");
for (String index: pattern.split(",")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future reference String.split is expensive since it compiles a regex pattern - however it does have a shortcut (at least in the recent JVMs) where for single chars it does string iteration. For the former case better to use Strings.commaDelimitedListToStringArray

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

return false;
}
// The remote async ID would not be valid on local cluster; furthermore on results fetching we would know neither if the ID is
// remote or not, nor which remote cluster it belongs to (TODO: could rewrite the ID to smth like [alias:ID])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with dealing with the async bit after this gets merged. Curious how async search and CCS deals with this scenario.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Cross-cluster search is supported only with ccs_minimize_roundtrips set to false.". I.e. it won't optimise it either, it'll query the remote cluster, but store the results locally.

Comment on lines 200 to 229
private static EqlSearchResponse qualifyHits(EqlSearchResponse r, String clusterAlias) {
EqlSearchResponse.Hits hits = r.hits();
List<EqlSearchResponse.Sequence> sequences = qualifySequences(hits.sequences(), clusterAlias);
List<EqlSearchResponse.Event> events = sequences != null ? null : qualifyEvents(hits.events(), clusterAlias, true);
return new EqlSearchResponse(new EqlSearchResponse.Hits(events, sequences, hits.totalHits()), r.took(), r.isTimeout());
}

private static List<EqlSearchResponse.Sequence> qualifySequences(List<EqlSearchResponse.Sequence> sequences, String clusterAlias) {
List<EqlSearchResponse.Sequence> qualifiedSequences = null;
if (sequences != null && sequences.isEmpty() == false) {
qualifiedSequences = new ArrayList<>(sequences.size());
for (EqlSearchResponse.Sequence s : sequences) {
qualifiedSequences.add(new EqlSearchResponse.Sequence(s.joinKeys(), qualifyEvents(s.events(), clusterAlias, false)));
}
}
return qualifiedSequences;
}

private static List<EqlSearchResponse.Event> qualifyEvents(List<EqlSearchResponse.Event> events, String clusterAlias,
boolean emptyListIfNoEvents) {
List<EqlSearchResponse.Event> qualifiedEvents = null;
if (events != null && events.isEmpty() == false) {
qualifiedEvents = new ArrayList<>(events.size());
for (EqlSearchResponse.Event e : events) {
qualifiedEvents.add(new EqlSearchResponse.Event(buildRemoteIndexName(clusterAlias, e.index()), e.id(), e.source(),
e.fetchFields()));
}
}
return qualifiedEvents == null && emptyListIfNoEvents ? new ArrayList<>() : qualifiedEvents;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm missing something, these methods copy the response to prefix the index which is a waste.
Either change the actual returned objects to be mutable so that the remote cluster alias is added to the index or push the clusterAlias info so it's available during the EqlSearchResponse.* creation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've change the mutability to allow setting an index name. The alternative would require passing the cluster name thorough multiple layer, which is doable, but seemed a bit more contrived.

// index [...]"); this is also similar to a non-CCS `POST inexistent/_eql/search?ignore_unavailable=false`, but
// unfortunately unlike an inexistent pattern search: `POST inexistent*/_eql/search?ignore_unavailable=false, which raises a
// VerificationException as it's root cause. I.e. the failures are not homogenous.
finalException = new RemoteTransportException(e.getMessage(), new IndexNotFoundException(qualifiedIndex));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When @astefan come back please raise this with him since he worked on something similar. It's worth creating an issue to see whether the two errors can be aligned and if not why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. It would seem good to me to return/report a not found index in the same way; i.e. no matter if the verifier finds the query error or other layer under EQL. (This could probably also be a subsequent PR.)

Strings.splitStringByCommaToArray(indexPattern));
indicesMap.remove(RemoteClusterAware.LOCAL_CLUSTER_GROUP_KEY);
return indicesMap;
public Set<String> clusterAliases(String[] indices, boolean discardLocal) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does discardLocal has any other value than true? If not, better to keep the method specialized.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does discardLocal has any other value than true?

Yes, the optimisation code needs to know if the queried indices are all local or not (so it calls with this parameter set to false).

EqlConfiguration cfg = new EqlConfiguration(request.indices(), zoneId, username, clusterName, filter,
Set<String> clusterAliases = remoteClusterRegistry.clusterAliases(request.indices(), false);
if (canMinimizeRountrips(request, clusterAliases)) {
String clusterAlias = clusterAliases.iterator().next();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using two methods - canMinimize plus calling the cluster, how about combining them into a method that returns the cluster on which to re-route the call or otherwise returning null.

String singleCssCluster = minimizedCss(request, clusterAliases);
if (singleCssCluster != null) {
....
} else {
...
}

This helps with the different methods around checking, selecting and massaging the string into one method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did initially consider this format too (before the small RemoteClusterRegistry refactoring), but found delegating the decision to a method a clearer implementation (similar to TransportSearchAction#shouldMinimizeRoundtrips().

String clusterAlias = clusterAliases.iterator().next();
String[] remoteIndices = new String[request.indices().length];
for (int i = 0; i < request.indices().length; i++) {
remoteIndices[i] = request.indices()[i].substring(clusterAlias.length() + /*`:`*/1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That inline comment is too l33t. please move it after the statement or above it since as it stands requires multi-parsing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pulled that into a trailing comment.

- rephrase docs;
- EqlSearchResponse.Event allows setting the index value;
- update comment.

Also, fix a case of index qualifying in an exception / failure case.
@bpintea
Copy link
Contributor Author

bpintea commented Aug 8, 2021

One thing that missing from this PR is the client EQL code which needs to be updated (in a backwards compatible way) with the new settings.

EqlClient? I'll look into it.

@costin
Copy link
Member

costin commented Aug 9, 2021

One thing that missing from this PR is the client EQL code which needs to be updated (in a backwards compatible way) with the new settings.

EqlClient? I'll look into it.

Since it's a parameter, it's the request that needs to be changed not the client - e.g. EqlSearchRequest

Add de-/serialization of ccs_minimize_roundtrips setting to
action/EqlSearchRequest.
@bpintea
Copy link
Contributor Author

bpintea commented Aug 18, 2021

One thing that missing from this PR is the client EQL code which needs to be updated (in a backwards compatible way) with the new settings.

Serialisation of the new parameter has been added.

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bpintea bpintea added the auto-backport Automatically create backport pull requests when merged label Aug 18, 2021
@bpintea bpintea merged commit 7a5ac3e into elastic:master Aug 18, 2021
@bpintea bpintea deleted the enh/optimize_ccs branch August 18, 2021 12:01
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
7.x Commit could not be cherrypicked due to conflicts

To backport manually run backport --upstream elastic/elasticsearch --pr 76076

bpintea added a commit to bpintea/elasticsearch that referenced this pull request Aug 18, 2021
This introduces an optimisation of the EQL requests when these target
one remote cluster only (i.e. no mixed local and remote indices or
multiple remote clusters). In this case, the EQL request is forwarded
to the remote cluster and executed there, instead of having the local
cluster perform multiple queries to the remote cluster.

(cherry picked from commit 7a5ac3e)
elasticsearchmachine pushed a commit that referenced this pull request Aug 18, 2021
This introduces an optimisation of the EQL requests when these target
one remote cluster only (i.e. no mixed local and remote indices or
multiple remote clusters). In this case, the EQL request is forwarded
to the remote cluster and executed there, instead of having the local
cluster perform multiple queries to the remote cluster.

(cherry picked from commit 7a5ac3e)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/EQL EQL querying auto-backport Automatically create backport pull requests when merged >enhancement Team:QL (Deprecated) Meta label for query languages team v7.15.0 v8.0.0-alpha2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants