Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Add ability to perform CCS through SQL querying #78903

Merged
merged 18 commits into from
Oct 20, 2021

Conversation

bpintea
Copy link
Contributor

@bpintea bpintea commented Oct 11, 2021

This adds the ability to perform cross cluster searching through SQL.
This is done by simply using the CCS-specific cluster:index notation,
where the "cluster" maps to an SQL catalog and "index" to a table;
patterns are supported for both. Noteworthy here is that a search runs
either on the local or remote clusters, but not both (which is a
CCS-specific behavior).

Another limitation is currently that table enumeration in remote
clusters will not contain aliases (not reported through field caps).
Also, frozen indices are currently not listed as such (they're listed as
normal indices).

SYS COLUMNS command will not take a catalog pattern - an exact match is
required -, following xDBC spec. SYS TABLES can, however.

The query API now takes a new parameter, catalog, whose value is prefixed
to the index in the query, before execution, if the index is not already qualified
(which takes precedence).

This adds the ability to perform cross cluster searching through SQL.
This is done by simply using the CCS-specific `cluster:index` notation,
where the "cluster" maps to an SQL catalog and "index" to a table;
patterns are supported for both. Noteworthy here is that a search runs
either on the local or remote clusters, but not both (which is a
CCS-specific behavior).

Another limitation is currently that table enumeration in remote
clusters will not contain aliases (not reported through field caps).
Also, frozen indices are currently not listed as such (they're listed as
normal indices).

`SYS COLUMNS` command will not take a catalog pattern - an exact match is
required -, following xDBC spec. `SYS TABLES` can, however.
License and WS fixes.
Add a comment.
Revert a debugging change.
@bpintea
Copy link
Contributor Author

bpintea commented Oct 11, 2021

@elasticmachine update branch

@bpintea bpintea marked this pull request as ready for review October 11, 2021 14:27
@elasticmachine elasticmachine added the Team:QL (Deprecated) Meta label for query languages team label Oct 11, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-ql (Team:QL)

@bpintea bpintea marked this pull request as draft October 13, 2021 07:33
Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gave a first round of feedback. Looks good overall.
It's not clear to me what test are running against a catalog and which one are not.
SELECT FROM table
vs
SELECT FROM local:table
vs
SELECT FROM remote:table

this.client = client;
this.clusterName = clusterName;
this.typeRegistry = typeRegistry;
this.remoteClusters = new RemoteClusterResolver(settings, clusterSettings);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing wise I think it's better to pass a provider that returns a Set vs the ClusterSettings which is an implementation detail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pulled this class out as standalone and passed IndexResolver just a Supplier of the set.

Comment on lines -194 to +220
client.admin().indices().getAliases(aliasRequest, wrap(aliases ->
resolveIndices(indices, javaRegex, aliases, retrieveIndices, retrieveFrozenIndices, listener),
ex -> {
// with security, two exception can be thrown:
// INFE - if no alias matches
// security exception is the user cannot access aliases

// in both cases, that is allowed and we continue with the indices request
if (ex instanceof IndexNotFoundException || ex instanceof ElasticsearchSecurityException) {
resolveIndices(indices, javaRegex, null, retrieveIndices, retrieveFrozenIndices, listener);
} else {
listener.onFailure(ex);
client.admin().indices().getAliases(aliasRequest, wrap(aliases -> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you comment on what's the difference between the old code and the new one - the whole block was formatted so it's difficult to keep track.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The significant changes are: (1) decouple and defer the output filtering from index/alias collection and (2) selectively cascade the resolution: aliases -> local indices -> remote indices (-> filtering), as these are needed.
The aliases and local indices resolution haven't otherwise changed.

Comment on lines 269 to 283
client.admin().indices().getIndex(indexRequest, wrap(indices -> {
if (indices != null) {
for (String indexName : indices.getIndices()) {
boolean isFrozen = retrieveFrozenIndices
&& indices.getSettings().get(indexName).getAsBoolean("index.frozen", false);
indexInfos.add(new IndexInfo(clusterName, indexName,
isFrozen ? IndexType.FROZEN_INDEX : IndexType.STANDARD_INDEX));
}
}
resolveRemoteIndices(clusterWildcard, indexWildcard, javaRegex, retrieveFrozenIndices, indexInfos, listener);
},
listener::onFailure)
);
} else {
resolveRemoteIndices(clusterWildcard, indexWildcard, javaRegex, retrieveFrozenIndices, indexInfos, listener);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This piece of code is similar to the method below (resolveRemoteIndices) - can the two be merged somehow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also thought about it, but didn't come to a solution. The flow is similar, but different request types and API and a "generified" would be more verbose eventually. But happy to apply suggestions.

Comment on lines 46 to 47
sb.append(query.substring(i, j).replaceAll("(?i)(FROM)(\\s+)(\\w+|\"[^\"]+\")",
"$1$2" + buildRemoteIndexName(REMOTE_CLUSTER_NAME, "$3")));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😓

Extract RemoteClusterResolver into own class and ony pass a supplier of
remote clusters to the IndexResolver.
This adds a new "catalog" request parameter, to be used by the xDBC
clients. It's value is whatever the set-catalog xDBC action is passed to
by the application and will have ES/SQL run the query on the indicated
cluster, in case the query itself doesn't spacify a cluster target
(which takes precedence over the parameter).
@bpintea
Copy link
Contributor Author

bpintea commented Oct 14, 2021

@elasticmachine update branch

@bpintea bpintea marked this pull request as ready for review October 14, 2021 10:08
@bpintea
Copy link
Contributor Author

bpintea commented Oct 14, 2021

It's not clear to me what test are running against a catalog and which one are not.
SELECT FROM table
vs
SELECT FROM local:table
vs
SELECT FROM remote:table

The JDBC driver now takes a new parameter, catalog, whose value is prefixed to the index, if this is not already qualified.

The JdbcCsvSpecIT test will randomly alternate between SELECT FROM table (with a "catalog": "remote" param) and SELECT FROM remote:table. Currently all the data is indexed in the remote cluster only.

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. Left some minor comments.


// first get aliases (if specified)
boolean retrieveAliases = CollectionUtils.isEmpty(types) || types.contains(IndexType.ALIAS);
boolean retrieveIndices = CollectionUtils.isEmpty(types) || types.contains(IndexType.STANDARD_INDEX);
boolean retrieveFrozenIndices = CollectionUtils.isEmpty(types) || types.contains(IndexType.FROZEN_INDEX);

String[] indices = Strings.commaDelimitedListToStringArray(indexWildcard);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why you removed this. Now you are explicitly calling Strings.commaDelimitedListToStringArray(indexWildcard) here and here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've restored it. In a previous design, the non-split version was called in 3 places, but now it makes sense to have it split at the beginning. (Renamed it tho, to be able to use the same name across invocations with no conflict of not-changed code.)

;

showTablesWithFrozen
SHOW TABLES CATALOG 'my_remote_cluster' INCLUDE FROZEN;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

@@ -179,7 +179,7 @@ private static FieldAttribute getFieldAttribute(String name) {
}

public void testPruneSubQueryAliases() {
ShowTables s = new ShowTables(EMPTY, null, null, false);
ShowTables s = new ShowTables(EMPTY, null, null, null,null, false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whitespace

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed.

@@ -65,9 +65,13 @@
// NB: this is password instead of pass since that's what JDBC DriverManager/tools use
public static final String AUTH_PASS = "password";

// Default catalog

public static final String CATALOG = "catalog";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why public?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

// Default catalog

public static final String CATALOG = "catalog";

protected static final Set<String> OPTION_NAMES = new LinkedHashSet<>(
Arrays.asList(PROPERTIES_VALIDATION, BINARY_COMMUNICATION, CONNECT_TIMEOUT, NETWORK_TIMEOUT, QUERY_TIMEOUT, PAGE_TIMEOUT,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the catalog defined on the connection configuration? This is about doing a network connection to the cluster not about its content.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value of the catalog parameter will contain what the drivers receive from clients through the setCatalog()/SQL_ATTR_CURRENT_CATALOG (/USE/SET database potentially for CLI, in the future), i.e. the context in which to execute a query if it contains no catalog itself.

@@ -256,6 +264,9 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws
if (zoneId != null) {
builder.field(TIME_ZONE_NAME, zoneId.getId());
}
if (catalog != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why add this code here? Not at the beginning nor at the end of the code block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to consistently keep the catalog after zoneId (member declaration, parameter order, code inserts), since catalog seems as "connection relevant" as zoneId (i.e. not really a meta config, but rather execution-relevant).

Revert to splitting index string higher up in the execution.
Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a second round of comments/questions.

apply plugin: 'elasticsearch.rest-test'

dependencies {
testImplementation project(path: xpackModule('eql:qa:common'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is eql here needed? Most likely it shouldn't be needed as dependency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well spotted. Replaced the dependency with a minimal one (ql:test-fixtures).

setting 'node.roles', '[data,ingest,master]'
setting 'xpack.ml.enabled', 'false'
setting 'xpack.watcher.enabled', 'false'
setting 'xpack.security.enabled', 'false'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security is disabled but the test project is called multi-cluster-with-security and you set an username and password later in the file. Doesn't look right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enabled back, thanks.

remoteClusterReg.get().getAllTransportPortURI().collect { "\"$it\"" }.toString()
}
setting 'cluster.remote.connections_per_cluster', "1"
setting 'xpack.security.enabled', 'false'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

assertFalse(rs.next());

es.setCatalog(LOCAL_CLUSTER_NAME);
expectThrows(SQLException.class, ps::executeQuery);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth checking for the error message as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked.

super(fileName, groupName, testName, lineNumber, randomBoolean() ? qualifyFromClause(testCase) : testCase);
}

// qualify the query FROM clause with the cluster name, but (crudely) skip `EXTRACT(a FROM b)` calls.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why skipping the EXTRACT function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid changing queries to an incorrect format like SELECT ... EXTRACT(YEAR cluster:FROM date) FROM cluster:index ....

for (String table : tables) {
String line = null;
/*
* Security automatically creates either a `.security` or a
* `.security6` index but it might not have created the index
* by the time the test runs.
*/
while (line == null || line.startsWith(".security")) {
while (line == null || line.contains("|.security")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change if the comment says ".security" or ".security6"? Either leave it as is, or do also change the comment, because there is a slight contradiction now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SHOW TABLES will now have as first column the catalog and the table will be second. So for both .security and .security6 indices, the listing will no longer startWith() that value, but the line will contain a |.security (and/or |.security6).

Comment on lines +101 to +110
private void cleanup() throws IOException {
try {
deleteTestIndex();
} catch (ResponseException e) {
// while the majority of tests use the index(...) test method, few create their own index
if (e.getResponse().getStatusLine().getStatusCode() != 404) {
throw e;
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed? I thought that ESRestTestCase already does this if preserveIndicesUponCompletion() is not overridden, which should not be the case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

preserveIndicesUponCompletion() is used to clear (or not) the local cluster. The indices in these tests are created on the remote cluster and only queried locally.

@@ -255,6 +273,8 @@ public void testNextPageWithDatetimeAndTimezoneParam() throws IOException {
expected,
runSql(new StringEntity(cursor(cursor).mode(mode).toString(), ContentType.APPLICATION_JSON), StringUtils.EMPTY, mode)
);

deleteIndex("test_date_timezone");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question about this operation. Why is it specifically needed if one of the base classes should do the cleanup in an @After method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

() -> runSql(randomMode(), "SELECT * FROM test JOIN other"),
containsString("line 1:21: Queries with JOIN are not yet supported")
() -> runSql(randomMode(), "SELECT * FROM " + indexPattern("test") + " JOIN other"),
containsString(": Queries with JOIN are not yet supported")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line is removed because of the index pattern that now can have the cluster name in front of it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. These tests are executed both on local cluster and with CCS so the offsets would differ.

Address review comments
Enable test cluster security
Propagate credentials to JdbcIntegrationTestCase hierarchy.
Remove no longer used imports.
Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bpintea bpintea added the auto-backport-and-merge Automatically create backport pull requests and merge when ready label Oct 20, 2021
@bpintea bpintea merged commit 733837d into elastic:master Oct 20, 2021
@bpintea bpintea deleted the feat/sql_ccs branch October 20, 2021 06:25
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
7.x Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 78903

bpintea added a commit to bpintea/elasticsearch that referenced this pull request Oct 20, 2021
This adds the ability to perform cross cluster searching through SQL.
This is done by simply using the CCS-specific `cluster:index` notation,
where the "cluster" maps to an SQL catalog and "index" to a table;
patterns are supported for both. Noteworthy here is that a search runs
either on the local or remote clusters, but not both (which is a
CCS-specific behaviour).

Another limitation is currently that table enumeration in remote
clusters will not contain aliases (not reported through field caps).
Also, frozen indices are currently not listed as such (they're listed as
normal indices).

`SYS COLUMNS` command will not take a catalog pattern - an exact match is
required -, following xDBC spec. `SYS TABLES` can, however.

This adds a new "catalog" request parameter, to be used by the xDBC
clients. It's value is whatever the set-catalog xDBC action is passed to
by the application and will have ES/SQL run the query on the indicated
cluster, in case the query itself doesn't specify a cluster target
(which takes precedence over the parameter).

(cherry picked from commit 733837d)
elasticsearchmachine pushed a commit that referenced this pull request Oct 20, 2021
) (#79536)

* SQL: Add ability to perform CCS through SQL querying (#78903)

This adds the ability to perform cross cluster searching through SQL.
This is done by simply using the CCS-specific `cluster:index` notation,
where the "cluster" maps to an SQL catalog and "index" to a table;
patterns are supported for both. Noteworthy here is that a search runs
either on the local or remote clusters, but not both (which is a
CCS-specific behaviour).

Another limitation is currently that table enumeration in remote
clusters will not contain aliases (not reported through field caps).
Also, frozen indices are currently not listed as such (they're listed as
normal indices).

`SYS COLUMNS` command will not take a catalog pattern - an exact match is
required -, following xDBC spec. `SYS TABLES` can, however.

This adds a new "catalog" request parameter, to be used by the xDBC
clients. It's value is whatever the set-catalog xDBC action is passed to
by the application and will have ES/SQL run the query on the indicated
cluster, in case the query itself doesn't specify a cluster target
(which takes precedence over the parameter).

(cherry picked from commit 733837d)

* Remove test cluster setting not avail in 7.x

Remove 'xpack.security.autoconfiguration.enabled' setting.
giladgal added a commit that referenced this pull request Dec 7, 2021
The QL team added support for SQL via CCS to both 7.16 and 8.0 - #78903
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/SQL SQL querying auto-backport-and-merge Automatically create backport pull requests and merge when ready >feature release highlight Team:QL (Deprecated) Meta label for query languages team v7.16.0 v8.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants