Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parameterize segment IDs #16174

Merged
merged 4 commits into from
Mar 22, 2024

Conversation

abhishekrb19
Copy link
Contributor

@abhishekrb19 abhishekrb19 commented Mar 19, 2024

Issue:

The markUsed API accepts either an interval or segmentIds. When using segment IDs, the underlying query uses an IN clause with the user-provided values inlined directly in the query instead of being parameterized.

Fix:

This patch fixes that by adding the following utilities specifically for queries that contain an IN clause:

  • getParameterizedInConditionForColumn()
  • bindColumnValuesToQueryWithInCondition()

Other related changes:

Renamed the following functions for consistency and code hygiene:

  • appendConditionForIntervalsAndMatchMode() to getConditionForIntervalsAndMatchMode(). Pass only select arguments to the function.
  • bindQueryIntervals() to bindIntervalsToQuery()

Release note:

The segmentIds filter in the markUsed API payload is parameterized in the DB query.

This PR has:

  • been self-reviewed.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • been tested in a test Druid cluster.

@kfaraz kfaraz self-requested a review March 20, 2024 05:37
Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the change, @abhishekrb19 . This is very useful!

It would be nice to have a test to verify that the code throws an exception when passing illegal values for the version or any other column.

@abhishekrb19
Copy link
Contributor Author

@kfaraz, thanks for taking a look!

Thanks for the change, @abhishekrb19 . This is very useful!

It would be nice to have a test to verify that the code throws an exception when passing illegal values for the version or any other column.

What kind of validation and exception did you have in mind? Currently, the code doesn't perform any type checks for the columns.

@kfaraz
Copy link
Contributor

kfaraz commented Mar 22, 2024

What kind of validation and exception did you have in mind? Currently, the code doesn't perform any type checks for the columns.

I should think that JDBI would throw some exception if we try to bind am invalid string which contains, say a substring like '); DROP TABLE abc;.
We could try to verify that behaviour. By the way, this is optional and need not block this PR.

@abhishekrb19
Copy link
Contributor Author

Ah, I see. The JDBI library doesn't check for "illegal" values for the named parameters as it's contextual. I ran a test with such values and it seems the library just treats them as any other string literal. That said, the binding itself makes the query safe to execute and I think if we want to detect illegal values or perform semantic validation, it has to be done at a layer on top (I'm not sure if that check can be full-proof though).

@kfaraz
Copy link
Contributor

kfaraz commented Mar 22, 2024

That said, the binding itself makes the query safe to execute

Yeah, that's a fair point. Thanks for the clarification.

@abhishekrb19 abhishekrb19 merged commit a70e28a into apache:master Mar 22, 2024
85 checks passed
@abhishekrb19 abhishekrb19 deleted the parameterized_segment_ids branch March 22, 2024 15:21
@adarshsanjeev adarshsanjeev added this to the 30.0.0 milestone May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants