Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ES|QL] Validate index name in parser #112081

Merged
merged 22 commits into from
Oct 2, 2024

Conversation

fang-xing-esql
Copy link
Member

Resolves: #112040

Calls ES APIs to validate index names in parser.

IndexNameExpressionResolver.resolveDateMathExpression
MetadataCreateIndexService.validateIndexOrAliasName

@elasticsearchmachine
Copy link
Collaborator

Hi @fang-xing-esql, I've created a changelog YAML for you.

@fang-xing-esql fang-xing-esql marked this pull request as ready for review August 24, 2024 02:24
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Aug 24, 2024
@elasticsearchmachine
Copy link
Collaborator

Hi @fang-xing-esql, I've created a changelog YAML for you.

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Please add more date math examples.

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this PR is introducing a breaking change.
A query like from test, --bla* fails with this PR but was accepted by ES|QL before. Now, we strip the wildcard and the first minus sign and validate -bla (which fails).

ES itself doesn't fail for localhost:9200/test,--bla*/_search, but even if it did I am not sure we should copy the ES behavior here. ES|QL has a different behavior than ES when resolving indices anyway; see this PR for more info on that.

I would also be reluctant on validating the index names and not letting ES do it. We assume stuff when we validate the index names (we strip wildcards and the first minus sign) but is ES doing the same steps? If ES is introducing some restrictions or lifting them and those are not reflected in the two methods we borrow from ES, we risk restricting queries when ES accepts them.

@fang-xing-esql fang-xing-esql requested a review from a team as a code owner September 9, 2024 19:47
@fang-xing-esql
Copy link
Member Author

Unfortunately, this PR is introducing a breaking change. A query like from test, --bla* fails with this PR but was accepted by ES|QL before. Now, we strip the wildcard and the first minus sign and validate -bla (which fails).

I apologize for getting back to this late. Yes, there is possibility that this might introduce a breaking change... Discussed with Andrei offline, I'd like to put my analysis and experiments here.

The breaking change could happen when:

  1. The validation on indices names in parser is wrong, we might block a valid index name. - This is a situation that we want to avoid.
  2. The validation on indices names in parser is correct:
    a) The query succeeded before, the invalid index name is ignored, but it failed with ParsingException now. - This is indeed a breaking change, and it could be a surprise to users, however from another perspective it can help users identify bugs in their queries earlier. The solution is to use valid index names in the queries.
    b) The query failed before with a different exception than ParsingException. - This could be a breaking change too, and the application needs to catch a different exception.

I did some experiments, to compare the behavior when there is invalid index names in the query between main and branch. I haven't come across #1 yet.

On today's main, if there is an index name with invalid name, or if there is a non-existing index referenced in the query. The query may fail with two exceptions:

  1. VerificationException if all the indices referenced in the query not exist.
  2. IndexNotFoundException if some indices referenced exist, some don't.

These are what I have found so far, we do have to be very careful at validating index names in parser, not block valid indices names.

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the change is benefic for ES|QL ignoring the breaking changes it may introduce.
Regarding tests, I think they can be improved a bit:

  • I added a comment regarding the check for the error message itself
  • some tests are needed for multiple comma-separated index patterns are provided
  • also, testInvalidCharacterInIndexPattern can be improved further and cluster: suffix can be provided or skipped for two types of queries
  • the metrics command is a snapshot only command. Please, see other test methods in this class that skip these tests when the test is a non-snapshot run. To test your code with non-snapshot check use -Dbuild.snapshot=false -Dtests.jvm.argline="-Dbuild.snapshot=false" -Dlicense.key=x-pack/license-tools/src/test/resources/public.key in local runs. If you'd like to do it in the PR, the test-release label is needed, but it will take quite a big chunk of time.

@fang-xing-esql fang-xing-esql added the test-release Trigger CI checks against release build label Sep 25, 2024
Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fang-xing-esql
Copy link
Member Author

Thanks for reviewing @bpintea and @astefan! I updated to PR according to the recent discussion about index names that start with exclusions, some more related tests are added, hope to make it more consistent with current behavior and less confusing. Just let me know if there is anything that I missed.

In a summary:

  • If there is no index name with wildcard in the index pattern:
    If one of the indexes has invalid names reported from either IndexNameExpressionResolver.resolveDateMathExpression or MetadataCreateIndexService.validateIndexOrAliasName, ParsingExceptions will be thrown.

  • If there is index name with wildcard in the index pattern:
    If IndexNameExpressionResolver.resolveDateMathExpression complains about the DateMath expression in one of the index names, ParsingExceptions will be thrown.
    If IndexNameExpressionResolver.resolveDateMathExpression passed, but there is no index name starting with exclusion, ParsingExceptions will be thrown if MetadataCreateIndexService.validateIndexOrAliasNamereports invalid index name.
    If IndexNameExpressionResolver.resolveDateMathExpression passed, and there is index name starting with exclusion, ParsingExceptions will NOT be thrown, no matter MetadataCreateIndexService.validateIndexOrAliasName passes or fails.

@fang-xing-esql fang-xing-esql added auto-backport-and-merge and removed test-release Trigger CI checks against release build labels Oct 1, 2024
Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm

}

// comma separated indices
// These fail because there is an invalid index name, and there is no index name with wildcard before it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any test for index,-index and/or just -index?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it, it is a good point. I'll add tests for index,-index, I'll leave it to the IndicesOptions to decide whether to error out or not.

@fang-xing-esql fang-xing-esql merged commit 91dca8d into elastic:main Oct 2, 2024
16 checks passed
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

The backport operation could not be completed due to the following error:

An unexpected error occurred when attempting to backport this PR.

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 112081

@fang-xing-esql
Copy link
Member Author

💚 All backports created successfully

Status Branch Result
8.x

Questions ?

Please refer to the Backport tool documentation

fang-xing-esql added a commit to fang-xing-esql/Elasticsearch that referenced this pull request Oct 2, 2024
* validate index name in parser

(cherry picked from commit 91dca8d)
gmarouli pushed a commit to gmarouli/elasticsearch that referenced this pull request Oct 3, 2024
matthewabbott pushed a commit to matthewabbott/elasticsearch that referenced this pull request Oct 4, 2024
fang-xing-esql added a commit that referenced this pull request Oct 7, 2024
@fang-xing-esql fang-xing-esql deleted the validate-index-name branch October 8, 2024 02:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.16.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ESQL: early validation of the index name
6 participants