Async reads for JDBC #13196
Merged: 8 commits merged into apache:master from the 221010-jdbc-async branch on Oct 18, 2022

Conversation

paul-rogers
Contributor

Prevents JDBC timeouts on long queries by returning empty batches when a batch fetch takes too long. Uses an async model to run the result fetch concurrently with JDBC requests.

Release Notes

Druid's Avatica-based JDBC handler is now the preferred way to run long-running queries in Druid.

Druid supports the Avatica JDBC driver. Avatica uses HTTP requests to communicate with the server. When using JDBC with long-running queries, the HTTP request can time out, producing an error. This PR uses an internal feature of JDBC to avoid timeouts by returning "empty batches" of rows when Druid takes too long to return the actual rows. The JDBC client automatically requests more rows, resulting in Druid queries running asynchronously with JDBC requests. The result is that JDBC queries no longer time out.

This feature is enabled by default with a timeout of 5 seconds. You can change the timeout by setting the druid.sql.avatica.fetchTimeoutMs property, specified in milliseconds. Druid enforces a minimum of 1000 milliseconds to prevent hammering of the Broker.
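
For example, to raise the fetch timeout to 10 seconds, a Broker's runtime.properties could include a line like the following (the 10000 value is only an illustration):

    # Return an empty batch if a fetch has not finished within 10 seconds
    druid.sql.avatica.fetchTimeoutMs=10000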

Description

Druid's Avatica handler implementation already uses async execution to open and close the "yielder" for a query. This PR extends the idea by using the same executor for fetches. The fetch state (yielder, fetch offset) resides in a new ResultFetcher class that is invoked on each request to get results.

The async logic is simple (a sketch follows this list):

  • If a future exists from the previous fetch, use it.
  • Otherwise, invoke the fetcher using the existing ExecutorService which returns a future.
  • Wait for the future to finish, but only up to the fetch timeout.
  • If the wait times out, save the future and return an empty batch.
  • If the results arrive in time, return them.
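
A minimal Java sketch of this decision logic, using illustrative class and field names rather than the actual Druid code (the ResultFetcher is modeled here as a plain Callable):

    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.TimeoutException;

    // Illustrative sketch of the async fetch decision, not the actual Druid classes.
    class AsyncFetchSketch
    {
      private final ExecutorService queryExecutor = Executors.newSingleThreadExecutor();
      private final Callable<List<Object[]>> resultFetcher; // fetches the next batch of rows
      private final long fetchTimeoutMs;
      private Future<List<Object[]>> fetchFuture;           // in-flight fetch, if any

      AsyncFetchSketch(Callable<List<Object[]>> resultFetcher, long fetchTimeoutMs)
      {
        this.resultFetcher = resultFetcher;
        this.fetchTimeoutMs = fetchTimeoutMs;
      }

      List<Object[]> nextBatch() throws Exception
      {
        // Reuse the future from a previous fetch that timed out, if any;
        // otherwise submit a new fetch to the executor.
        if (fetchFuture == null) {
          fetchFuture = queryExecutor.submit(resultFetcher);
        }
        try {
          // Wait for the rows, but only up to the fetch timeout.
          List<Object[]> rows = fetchFuture.get(fetchTimeoutMs, TimeUnit.MILLISECONDS);
          fetchFuture = null;
          return rows;
        }
        catch (TimeoutException e) {
          // Fetch still running: keep the future and return an empty batch.
          // The Avatica client will simply ask for the next batch.
          return Collections.emptyList();
        }
      }
    }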

The close operation is modified to wait for any in-flight fetch to return before shutting down the result set. A ResultFetcherFactory creates the fetcher for each query. The factory is needed only to allow introducing artificial delays for testing.
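
Continuing the same hypothetical sketch, a close method would first drain any in-flight fetch before tearing things down:

      // Added to the sketch above: wait for any in-flight fetch before closing.
      void close()
      {
        if (fetchFuture != null) {
          try {
            fetchFuture.get();   // wait for the outstanding fetch to complete
          }
          catch (Exception e) {
            // ignore: the result set is being torn down anyway
          }
          fetchFuture = null;
        }
        queryExecutor.shutdown(); // stands in for closing the yielder and result set
      }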

The Avatica handler tests are cleaned up to eliminate some redundancy, which allowed the async tests to be written with less copy/paste than the existing code required.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

Prevents JDBC timeouts on long queries by returning empty batches
when a batch fetch takes too long. Uses an async model to run the
result fetch concurrently with JDBC requests.
@FrankChen021
Member

This PR uses an internal feature of JDBC to avoid timeouts by returning "empty batches" of rows when Druid takes too long to return the actual rows.

I always thought the HTTP timeout was a client behavior, and didn't know that it can be controlled by the server.
And I'm curious: is such an internal feature implemented only by the Avatica JDBC driver? Could you show me some materials to learn more about it?

@FrankChen021 (Member) left a comment

Overall LGTM with some minor comments

@paul-rogers
Contributor Author

@FrankChen021, you get the Fastest Review Ever award! Thanks!

I always thought the HTTP timeout was a client behavior, and didn't know that it can be controlled by the server.
And I'm curious: is such an internal feature implemented only by the Avatica JDBC driver? Could you show me some materials to learn more about it?

You are right: the client (and any proxies and the Router) all determine the timeout; the shortest timeout wins. The challenge is when that HTTP timeout is shorter than the amount of time Druid needs to compute the next batch of results. In a regular REST API call via the /sql endpoint, we have to compute and return all the results within this timeout. This is usually not a problem because Druid is intended for simple queries: those that return in 100ms with a few hundred or thousand rows, and that exploit Druid's time partitioning, filters, etc.

The challenge is with the occasional "BI" style query that, for whatever reason, returns many rows, or takes a long time to compute. Maybe you've got billions of rows and need to find the one event, anywhere in that set, that represents access to some resource. There is nothing for it but to scan all the data, and that might take a while depending on the filter used. Or, maybe someone wants to grab several million rows of data to feed into an ML model. And so on.

Normally you would not want to run such a query on Druid: that's not what Druid is designed for. But sometimes you just gotta do what you gotta do. In this case, if your query takes a minute to run, and one of the network elements in the chain has a 30-second timeout, you can't successfully run the query with request-response.

But, JDBC is different. It works by sending multiple requests for each query, each returning some number of rows, generally in the thousands. The client keeps asking for more batches until the server returns EOF. So, if we had a long running query, but each batch of records could be computed in less time than the HTTP timeout, the query would succeed.

Most of the time in a big query, however, is the work before we get the first row. So, we want to generalize the above idea. JDBC allows us to return a batch of 0 rows. If so, the client just turns around and requests another batch until it gets rows or is told that an EOF occurred. Just to be clear, this polling is a feature of the Avatica client. The application just iterates through a ResultSet, blissfully ignorant of the goings-on under the covers.
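
For illustration, a client using the Avatica driver looks like ordinary JDBC. The connection URL, datasource, and columns below are example values, and the batch polling, including any empty batches, happens entirely inside the driver:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.Properties;

    public class AvaticaClientExample
    {
      public static void main(String[] args) throws Exception
      {
        // Example URL: the Broker's Avatica endpoint (host and port are assumptions).
        String url = "jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/";
        try (Connection conn = DriverManager.getConnection(url, new Properties());
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT __time, channel FROM wikipedia")) {
          while (rs.next()) {
            // The driver fetches batches behind the scenes; empty batches are retried
            // automatically, so this loop never sees them.
            System.out.println(rs.getTimestamp(1) + " " + rs.getString(2));
          }
        }
      }
    }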

That's what the async thing in this PR does for us. We tell Druid to go get a batch of rows. If this takes longer than the fetch timeout, we return an empty batch, and remember that Druid is busily working away to get us a batch. Maybe it will be ready next time. If so, we return it. If not, we again return an empty batch and the whole thing repeats.

For this to work, the fetch timeout has to be less than the HTTP timeout. Given jitter, we might use the Nyquist rule and set the fetch timeout to half of the HTTP timeout. It doesn't have to be exact; any value less than half the network timeout should be fine.

Voila! We've decoupled query run time from network timeouts by running the query (or at least each batch fetch) asynchronously with the Avatica REST HTTP requests. So, you see, we don't set the HTTP timeout; we just work around it by always responding within that timeout, knowing that the client will do the right thing if we return 0 rows.
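
As a concrete illustration, if the shortest HTTP timeout anywhere along the path (client, proxies, Router) were 30 seconds, an operator might set the fetch timeout to roughly half of that:

    # Hypothetical values: the network path times out at 30 s, so fetch within ~15 s
    druid.sql.avatica.fetchTimeoutMs=15000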

@FrankChen021
Member

Just to be clear, this polling is a feature of the Avatica client.

I get it; it's a mechanism provided by the client. No wonder I didn't see such behaviour in other HTTP-based JDBC clients.
This is a good feature that makes it easier for client users to choose proper timeout settings.

@FrankChen021
Member

I was thinking about the proper tag to label this PR.

We have a label Apache Avatica, but that seems more related to the Avatica client than to our server-side implementation.

And Area - Querying is more related to query planning, the query engine, query execution, etc. Maybe we need an Area - JDBC tag for any issues about our server-side processing related to JDBC.

What do you think, @paul-rogers @kfaraz?

Fixed race condition in Druid's Avatica server-side handler
Fixed issue with no-user connections
@paul-rogers
Contributor Author

@FrankChen021, thanks for taking a look at the PR and for working out the proper tag. You suggest:

Maybe we need an Area - JDBC tag for any issues about our server-side processing related to JDBC.

When used for issues, it is hard for a user to know that an issue is in JDBC vs. the other areas you mention. We developers know, but JDBC is a rather small area. Maybe Area - Clients or Area - API to encompass JDBC, REST, and any other APIs we might invent? But, really, any of those tags would be fine if it helps us sort PRs in a useful way.

@zachjsh (Contributor) left a comment

🚀

@paul-rogers paul-rogers merged commit b34b435 into apache:master Oct 18, 2022
@paul-rogers paul-rogers deleted the 221010-jdbc-async branch October 18, 2022 18:41
@kfaraz kfaraz added this to the 25.0 milestone Nov 22, 2022