Consider using google-cloud-bigquery library instead of google-api-services-bigquery #1555

Open
clairemcginty opened this issue Nov 26, 2018 · 7 comments
Labels
enhancement New feature or request

Comments

@clairemcginty
Contributor

Google documentation recommends using the client library google-cloud-bigquery rather than the API library google-api-services-bigquery.

Pros

  • google-cloud-bigquery uses typed protobuf API request/response params rather than plain strings/ints, implicitly handles transport layer configurations, and potentially improves performance by making direct RPC calls rather than JSON over HTTP. We've seen intermittent failures with the scio-bigquery IT suite due to network timeouts, which might be solved by migrating.
  • The google-api-services-bigquery library is in maintenance mode and, aside from critical bug fixes, won't have any new features added.

Cons

  • Unfortunately, the data models are quite different, and the three classes from the API library that we publicly expose in Scio - TableSchema, TableReference, and TableRow - map to Schema, TableId, and FieldValueList in the client library. So, if we end up migrating, we'd have to decide whether to change the externally facing Scio API or handle those conversions ourselves in private methods. I have a WIP branch for this migration I'll link to as soon as it's cleaned up.
  • While developing that branch I found an issue with the client library that breaks cross-project extraction jobs ("ExtractJobConfiguration's setProjectId makes cross-project BQ extracts impossible", googleapis/google-cloud-java#3924), so in its current state the client library is not fully usable in Scio.
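
To make the "handle those conversions ourselves in private methods" option concrete, here is a minimal sketch for the TableReference / TableId pair only. It assumes nothing about the real classes beyond the three identifying strings: `ApiTableReference` and `ClientTableId` are hypothetical stand-ins for the API library's `TableReference` and the client library's `TableId`, and `describe` is an invented example of a public method. The point is only that the public signature keeps the API-library type while the client-library type stays internal.

```java
import java.util.Objects;

public class TableRefConversion {

    // Hypothetical stand-in for the API library's TableReference,
    // reduced to the three strings that identify a table.
    static final class ApiTableReference {
        final String projectId;
        final String datasetId;
        final String tableId;

        ApiTableReference(String projectId, String datasetId, String tableId) {
            this.projectId = projectId;
            this.datasetId = datasetId;
            this.tableId = tableId;
        }
    }

    // Hypothetical stand-in for the client library's TableId.
    static final class ClientTableId {
        final String project;
        final String dataset;
        final String table;

        ClientTableId(String project, String dataset, String table) {
            this.project = project;
            this.dataset = dataset;
            this.table = table;
        }
    }

    // Internal conversion: API-library shape to client-library shape.
    static ClientTableId toClient(ApiTableReference ref) {
        Objects.requireNonNull(ref, "ref");
        return new ClientTableId(ref.projectId, ref.datasetId, ref.tableId);
    }

    // Internal conversion in the other direction, for values coming back
    // from the client library.
    static ApiTableReference toApi(ClientTableId id) {
        Objects.requireNonNull(id, "id");
        return new ApiTableReference(id.project, id.dataset, id.table);
    }

    // Invented public method: the signature still takes the API-library type,
    // so callers would not see the migration.
    static String describe(ApiTableReference ref) {
        ClientTableId id = toClient(ref);
        return id.project + ":" + id.dataset + "." + id.table;
    }

    public static void main(String[] args) {
        ApiTableReference ref =
            new ApiTableReference("my-project", "my_dataset", "my_table");
        System.out.println(describe(ref)); // prints "my-project:my_dataset.my_table"
    }
}
```

TableSchema and TableRow would need the same treatment and are likely harder, since nested fields and row values do not map one-to-one the way these three strings do.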
@clairemcginty clairemcginty added the enhancement New feature or request label Nov 26, 2018
@clairemcginty
Contributor Author

Update: the client library bug affecting extract jobs has been fixed! googleapis/google-cloud-java#3924

@jbx jbx added the P2 label Dec 17, 2018
@jbx jbx added this to the 0.8.0 milestone Jan 18, 2019
@mewppis mewppis removed this from the 0.8.0 milestone Mar 22, 2019
@nevillelyh
Contributor

@clairemcginty is this still worth looking into?

@nevillelyh
Contributor

Talked IRL, closing.

@regadas
Contributor

regadas commented May 20, 2020

I would like us to consider re-opening this. I think there are still some subtle bugs in our current internal BigQuery client. Some of them are related to not falling back to settings taken from the environment.
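
For illustration, the kind of fallback meant here could look roughly like the sketch below. It is not the current Scio client code, and the specific bugs are not identified in this thread. The key names `bigquery.project` and `GOOGLE_CLOUD_PROJECT`, and the precedence order (explicit value, then system property, then environment variable), are assumptions chosen for the example.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Properties;

public class ProjectIdFallback {

    // Assumed key names, used here only for illustration.
    static final String PROPERTY_KEY = "bigquery.project";
    static final String ENV_KEY = "GOOGLE_CLOUD_PROJECT";

    // Resolve a project id: an explicit value wins, then the system property,
    // then the environment variable. Properties and environment are passed in
    // so the lookup order can be tested without touching the real process state.
    static Optional<String> resolve(String explicit, Properties props, Map<String, String> env) {
        return firstNonBlank(explicit, props.getProperty(PROPERTY_KEY), env.get(ENV_KEY));
    }

    // Return the first candidate that is neither null nor blank, trimmed.
    private static Optional<String> firstNonBlank(String... candidates) {
        for (String candidate : candidates) {
            if (candidate != null && !candidate.trim().isEmpty()) {
                return Optional.of(candidate.trim());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<String> project = resolve(null, System.getProperties(), System.getenv());
        System.out.println(project.orElse("<no project configured>"));
    }
}
```

A client library that already performs this kind of resolution internally would remove the need to maintain it by hand, which is part of the argument for revisiting the migration.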

@regadas regadas reopened this May 20, 2020
@regadas
Contributor

regadas commented May 20, 2020

@nevillelyh @clairemcginty what was the reason for not going forward with this?

@clairemcginty
Contributor Author

@regadas If I remember right, it was due to the complexity of integrating with Beam's BigQuery sources/sinks -- Beam returned types from google-api-services-bigquery and a lot of the Google library functions that could convert those to google-cloud-bigquery types were private.

This was a while ago though, so maybe worth a second look?

@regadas
Contributor

regadas commented May 20, 2020

@clairemcginty interesting! I think it's worth looking into again, since we are already using the storage implementation to actually retrieve data.

Let's see if the other types are good to go as well. I'll book some time to look into this.

Thanks
