Transition releases data off of non-versioned API #608

Open
richaagarwal opened this issue Jul 13, 2022 · 3 comments

@richaagarwal
Collaborator

richaagarwal commented Jul 13, 2022

This is a follow-up to #598, in which we implemented a hotfix for a breaking change in the PyPI API we query: the releases key was removed from the versioned API endpoint we were using. More information here: pypi/warehouse#11775. For the hotfix, we changed the query to hit the non-versioned API endpoint instead, knowing that the field is considered deprecated there as well, though it has not yet been removed.

As this comment notes, there's no current timeline to remove releases from the non-versioned API endpoint, but it would be prudent to start thinking about a transition plan. In #598 I outlined these two options:

Option 1: Re-introduce the fields by using the recommended simple API instead. This could in turn be broken into two parts: first reintroducing just those two fields, and later possibly reworking all of format_plugin to rely on the simple API. (Ideally both would be done at once, but depending on how important it is to resume populating these fields, we could delay the latter work.)
Option 2: If upload_time_iso_8601 from the urls array is a reliable source for release_date (which it appears to be), we may not need to switch APIs at all and could instead rework how we handle first_released. Ideally, we would populate first_released only the first time we grab data for a plugin, in which case it would equal release_date, and no request would ever need to fetch previous releases.
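
To make option 2 concrete, here is a rough sketch rather than the hub's actual code. The helper names (parse_upload_time, release_date_from_urls, merge_first_released) are hypothetical, the payload is assumed to be the already-parsed JSON from a PyPI JSON API response, and taking the earliest upload among a release's files as its release_date is an assumption.

```python
from datetime import datetime
from typing import Optional


def parse_upload_time(value: str) -> datetime:
    # PyPI timestamps end in "Z"; fromisoformat on older Pythons needs "+00:00".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def release_date_from_urls(payload: dict) -> Optional[str]:
    """Return the earliest upload_time_iso_8601 among the files in payload["urls"].

    Assumption: the earliest file upload marks the release date.
    """
    times = [
        f["upload_time_iso_8601"]
        for f in payload.get("urls", [])
        if f.get("upload_time_iso_8601")
    ]
    if not times:
        return None
    return min(times, key=parse_upload_time)


def merge_first_released(stored: Optional[dict], payload: dict) -> dict:
    """Hypothetical helper: set first_released only the first time a plugin is seen.

    `stored` is whatever was previously saved for the plugin (None if new).
    """
    release_date = release_date_from_urls(payload)
    first_released = (stored or {}).get("first_released") or release_date
    return {"release_date": release_date, "first_released": first_released}
```

The second helper is where the storage question comes in: it only works if the previously stored record can be read back cheaply at ingest time.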

It turns out that option 2 is not very straightforward given our current S3 architecture for storing data, so I'd recommend that we revisit this work when we are ready to prioritize moving to a database.

@neuromusic let's connect on this when you're back!

@richaagarwal
Collaborator Author

This work may end up addressing the bug reported in #611 as well: our hypothesis is that the lag reported there is caused by PyPI's non-versioned endpoint taking a while to catch up to the latest release's data.

@neuromusic
Collaborator

Note: to support #712, #702, and #703, we'll need to ingest ALL release dates from PyPI.
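
For illustration, ingesting every release date could look roughly like the sketch below. It assumes the parsed payload of the non-versioned endpoint, where the (deprecated) releases key maps each version to a list of file entries carrying upload_time_iso_8601. The function names are hypothetical, and using the earliest file upload as a version's date is an assumption.

```python
from datetime import datetime
from typing import Dict, Optional


def _parse(value: str) -> datetime:
    # Normalize the trailing "Z" so fromisoformat works on older Pythons.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def all_release_dates(payload: dict) -> Dict[str, str]:
    """Map each version to the earliest upload time of its files.

    Versions with no uploaded files are skipped.
    """
    dates = {}
    for version, files in payload.get("releases", {}).items():
        times = [
            f["upload_time_iso_8601"]
            for f in files
            if f.get("upload_time_iso_8601")
        ]
        if times:
            dates[version] = min(times, key=_parse)
    return dates


def first_released(dates: Dict[str, str]) -> Optional[str]:
    """Earliest date across all versions, or None if there are none."""
    return min(dates.values(), key=_parse) if dates else None
```

Because this depends on the releases key, it would break if that key is removed from the non-versioned endpoint.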

@richaagarwal
Collaborator Author

This is currently blocked: the simple API may not be an option because it doesn't support upload time at the moment (see jwodder/pypi-simple#5).

Status: Backlog