Cache doesn't update due to issue with format_plugin #598

Closed
lgan-czi opened this issue Jul 7, 2022 · 6 comments
Labels: bug (Something isn't working)

Comments

@lgan-czi
Contributor

lgan-czi commented Jul 7, 2022

Description

The S3 cache failed to populate for all plugins when I pushed a branch to build in the remote dev environment (rdev).

Steps/Code to Reproduce

Push any branch to rdev, then check its CloudWatch logs for the backend. You should see an error saying that 'str' object has no attribute 'items'.

Expected Results

The S3 cache should populate with information scraped from PyPI and GitHub on build.

Actual Results

The S3 cache only populates information for around 30 plugins before erroring out. I printed a stack trace, and the error appears to come from the step where we gather metadata from PyPI. The code tries to read JSON but gets a string instead.
[Image: Screen Shot 2022-07-07 at 3 06 00 PM]
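
For anyone reproducing this locally: the message above is what Python raises when `.items()` is called on a string rather than a dict. The sketch below is only an illustration of that failure mode and of a type check that fails with a clearer message. `iter_release_files` is a hypothetical helper, not the project's actual code.

```python
def iter_release_files(releases):
    """Yield (version, files) pairs from a PyPI-style `releases` mapping.

    Hypothetical helper for illustration only. It checks the type up
    front so that a string (or a missing field) fails with a clear
    message instead of "'str' object has no attribute 'items'".
    """
    if not isinstance(releases, dict):
        raise TypeError(
            f"expected a mapping of releases, got {type(releases).__name__}"
        )
    for version, files in releases.items():
        yield version, files


if __name__ == "__main__":
    # Calling .items() on a string reproduces the message from the logs.
    try:
        "not a dict".items()
    except AttributeError as err:
        print(err)
```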

@lgan-czi lgan-czi added the bug Something isn't working label Jul 7, 2022
@lgan-czi
Contributor Author

Update: It looks like this happened because the PyPI API changed to no longer include previous version releases when we request a specific version (URL in this format: https://pypi.org/pypi/{plugin}/{version}/json). When that happens, our code for filling in the "first_released" field of our metadata fails because it expects a "releases" field.

The immediate issue could be fixed by only querying https://pypi.org/pypi/{plugin}/json, but I'm not sure what the downstream consequences of this change might be.
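
To make the two URL shapes concrete, here is a minimal sketch of building either URL and deriving "first_released" from a parsed response. The function names are made up for illustration and are not the backend's actual code. It assumes "releases" maps each version to a list of file entries that carry `upload_time_iso_8601`, and that those timestamps share one UTC format so plain string comparison orders them correctly.

```python
from typing import Optional


def pypi_json_url(plugin: str, version: Optional[str] = None) -> str:
    """Build the PyPI JSON URL; omit the version for the project-level URL."""
    base = f"https://pypi.org/pypi/{plugin}"
    return f"{base}/{version}/json" if version else f"{base}/json"


def first_released(payload: dict) -> Optional[str]:
    """Return the earliest upload timestamp across all releases.

    Returns None when "releases" is absent or not a mapping, which is
    the case that currently breaks for version-specific responses.
    """
    releases = payload.get("releases")
    if not isinstance(releases, dict):
        return None
    upload_times = [
        file_info["upload_time_iso_8601"]
        for files in releases.values()
        for file_info in files
        if "upload_time_iso_8601" in file_info
    ]
    # Assumption: identically formatted UTC timestamps sort as strings.
    return min(upload_times) if upload_times else None
```

With this shape, a response that lacks "releases" yields None instead of raising, so the caller can decide whether to skip the field or fall back to the project-level URL.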

@lgan-czi
Contributor Author

@potating-potato, @neuromusic, @richaagarwal: tagging for visibility

@richaagarwal
Collaborator

richaagarwal commented Jul 12, 2022

Thank you for opening this ticket, @lgan-czi! This is really helpful detail.

I did some more digging to learn more from the PyPI side, and it looks like the releases field we've been relying on has been deprecated according to this page: https://warehouse.pypa.io/api-reference/json.html. In the future, they may also remove it from the project level (https://pypi.org/pypi/{plugin}/json), and they're recommending using the simple API instead to get release-specific information. The removal from the version-specific URL appears to have been made 5 days ago without warning (pypi/warehouse@de96b45) due to stability concerns.

This impacts our first_released field as Larry noted, as well as our release_date field.

Here are some options for hotfixes and for less urgent fixes:

Hotfixes

  • Option 1: Ignore first_released and release_date fields in updates for now. (Technically release_date appears as upload_time_iso_8601 in the urls array, but it'd be worth confirming this is a reliable source).
  • Option 2: Rely on the non-versioned API for those two fields only. We know that releases may be removed from there as well, but this may buy us more time to implement a less urgent fix.

Less urgent fixes:

  • Option 1: Re-introduce fields by using the recommended simple API instead. This in turn could be broken out into two parts: reintroducing just those two fields, and then later possibly re-working all of format_plugin to rely on the simple API. (Ideally these would both be done at once, but depending on how important it is to get back to populating these fields, we could delay the latter work).
  • Option 2: If accessing upload_time_iso_8601 from the urls array is a reliable source for the release_date, we may not need to switch APIs at all, and instead could re-work how we handle first_released. Ideally, we would only populate first_released the first time we grab data for a plugin, in which case it would be the same as release_date with no need to ever get previous version releases in any given request.
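
As a rough sketch of the second less urgent option: read the release date from the urls array and only set first_released when nothing is cached yet. The helper names and the shape of the cached record are assumptions for illustration, and whether upload_time_iso_8601 is a reliable source still needs confirming.

```python
from typing import Optional


def release_date(payload: dict) -> Optional[str]:
    """Return the earliest upload timestamp among this version's files."""
    times = [
        file_info["upload_time_iso_8601"]
        for file_info in payload.get("urls", [])
        if "upload_time_iso_8601" in file_info
    ]
    # Assumption: identically formatted UTC timestamps sort as strings.
    return min(times) if times else None


def update_dates(cached: dict, payload: dict) -> dict:
    """Return a copy of the cached record with refreshed date fields.

    release_date always follows the latest payload. first_released is
    written only when the cached record has no value for it yet.
    """
    updated = dict(cached)
    date = release_date(payload)
    if date is not None:
        updated["release_date"] = date
        if not updated.get("first_released"):
            updated["first_released"] = date
    return updated
```

One consequence of this design: a plugin first seen at a later version would record that version's date as first_released rather than its true first release, so existing cached values would need to be kept as they are.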

@potating-potato and @lgan-czi let me know if you have any thoughts on the above. I'll also grab some time for the 3 of us to sync up on next steps for this.

@liu-ziyang
Contributor

We have decided to hotfix with option 2 and use the non-versioned URL, and then look into the simple API library to apply the proper long-term fix.

lgan-czi added a commit that referenced this issue Jul 12, 2022
set version=None whenever get_plugin_pypi_metadata is called to fix issue #598
@richaagarwal
Collaborator

Just to clarify from the above, the hotfix we're moving forward with actually relies on the non-versioned API for all fields (rather than just two fields as I noted above) - but @lgan-czi did research to confirm this won't have any unintended side-effects. As this is meant to be a short-term fix, we feel confident moving forward with this (as it would be more work to treat the two fields differently).

@richaagarwal
Collaborator

This was deployed earlier and I can confirm we're seeing data updates as expected. I'll create a ticket for the follow-up work on possibly switching APIs, or reworking how we populate first_released.

Projects
Status: Done
Development

No branches or pull requests

3 participants