Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: Should pypiserver switch default hash algorithm to sha256? #452

Closed
jayeff opened this issue Oct 5, 2022 · 11 comments · Fixed by #459
Closed

Discussion: Should pypiserver switch default hash algorithm to sha256? #452

jayeff opened this issue Oct 5, 2022 · 11 comments · Fixed by #459

Comments

@jayeff
Copy link
Contributor

jayeff commented Oct 5, 2022

Current default configuration of pypiserver does no longer work with poetry >= 1.2 as pypiserver uses md5 by default for hashing. There is an open issue for this here: python-poetry/poetry#6301 (and workaround with using --hash-algo=sha256 here: python-poetry/poetry#4523 (comment)).

This may still be fixed by a future release of poetry. Still I wonder if it would be useful to switch the default hashing algorithm to sha256 as this is current recommendation by PEP 503

Repositories SHOULD choose a hash function from one of the ones guaranteed to be available via the hashlib module in the Python standard library (currently md5, sha1, sha224, sha256, sha384, sha512). The current recommendation is to use sha256.

(note: highlight is mine)

Changing the algorithm would be a breaking change

@lucsorel
Copy link

🙏 thank you @jayeff for opening this issue.

We have our own instance of pypiserver and would like to upgrade poetry from 1.1.15 to 1.2.0.

If there is a command to update the hashes (from md5 to sha256) in the pypiserver side, I am also interested.

@dee-me-tree-or-love
Copy link
Member

dee-me-tree-or-love commented Oct 31, 2022

Hello @jayeff and @lucsorel for bringing this up!

That's a very important point and I also see the importance of going towards sha256 as a default algorithm for pypiserver too. However, as @jayeff correctly remarks, that would be a breaking change. So I truly hope to include it in the 2.0.0 release along with a few other new modernisation changes. For now --hash-algo=sha256 is indeed the best workaround we have at hand, but I would love to move towards sha256 as default.

@lucsorel I believe that restarting the pypiserver instance with --hash-algo=sha256 while keeping the same package directories should be out of the box and allow updating the hashes. It will invalidate any caches of course, but then if you generate a Lockfile on the poetry side, it will use the sha256 hashes provided by the running instance of pypiserver. 💭 Important to note that best is to double check that this works out by maybe starting an instance running in parallel and redirecting the traffic. I hope that this is the right approach, and I stand very open to corrections! (I am just starting to dive into this part of pypiserver, so maybe I am still missing a few pieces of the puzzle 🙏) ✌️🙃

@lucsorel
Copy link

lucsorel commented Oct 31, 2022

thank you very much @dee-me-tree-or-love for your answer. Just to make sure about your advice of restarting the server with --hash-algo=sha256: would that break the download of artefacts that are referred to by "old" poetry.lock files that use md5 hashes too?

If I should update the poetry.lock files with sha256 hashes beforehand, then it is the chicken and egg dilemma 😃

@dee-me-tree-or-love
Copy link
Member

dee-me-tree-or-love commented Oct 31, 2022

Hey @lucsorel, happy to hear this may be helpful! That's a good point 😅 I am not too familiar with the internals of poetry, but afaik, there exists poetry lock --no-update command that should update the lockfile with the right metadata without reinstalling any packages. I am not 100% sure that this will do the trick, but I think that could be an option. There's also a discussion about it on StackOverflow and in the poetry project itself: python-poetry/poetry#3248

Could you share if it works as needed if you give it a try? :)

@lucsorel
Copy link

I am not too familiar with the internals of poetry, but afaik, there exists poetry lock --no-update command that should update the lockfile with the right metadata without reinstalling any packages

That is a good lead to follow 👍

Could you share if it works as needed if you give it a try? :)

Yes, that may happen next week, after the holidays. And I will give some feedback here for the other interested people

@dee-me-tree-or-love
Copy link
Member

Hey @lucsorel, this sounds great! Yes, of course, any insights will be very much appreciated! :) Thanks a good luck there ✌️

@jayeff
Copy link
Contributor Author

jayeff commented Nov 1, 2022

@lucsorel I believe that restarting the pypiserver instance with --hash-algo=sha256 while keeping the same package directories should be out of the box and allow updating the hashes.

I validated this for our switch and this works without issues. Restart pypiserver with --hash-algo=sha256 and it will start serve packages with sha256 hashes. Remove the option, restart pypiserver and it will serve packages again with md5 hashes.

Just to make sure about your advice of restarting the server with --hash-algo=sha256: would that break the download of artefacts that are referred to by "old" poetry.lock files that use md5 hashes too?

Yes, it will.

If I should update the poetry.lock files with sha256 hashes beforehand, then it is the chicken and egg dilemmas

Yes, it is 😊
We had the PR with poetry.lock change ready to merge. Restarted pypiserver with hash-algo change and right afterwards merged the prepared PR. This worked for us but depending on your context you may need consider additional things.

I am not too familiar with the internals of poetry, but afaik, there exists poetry lock --no-update command that should update the lockfile with the right metadata without reinstalling any packages

I did not try this. For our case I did run poetry install and install failed on all packages served by pypiserver as expected ("Retrieved digest for link (md5:) not in poetry.lock metadata ['sha256:']"). I copied the sha256-hashes and replaced them manually in our poetry.lock file. As it was less than a handful packages for us this worked out fine.

On more thing: When we upgraded some unrelated packages afterwards the old md5 hashes did find their way back into the poetry.lock file. I did not do a detailed analysis why this happened but my working theory is that it pulled the md5-hashes from locally cached packages. After clearing the poetry cache md5-hashes were gone. Alternatively you could consider bumping versions for all packages served from your pypiserver. This should install new packages with sha256 hashes in your local poetry caches.

Hope this helps

jayeff added a commit to jayeff/pypiserver that referenced this issue Nov 1, 2022
@lucsorel
Copy link

lucsorel commented Nov 2, 2022

🙏 thank you @jayeff for your detailed contribution 😃

@jeffwidman
Copy link

@dee-me-tree-or-love may I suggest that rather than trying to shoehorn everything into a single large breaking release that you instead start doing smaller breaking changes and not be scared to have 2.0, 3.0, 4.0 etc... ??

I've been a maintainer on a lot of open source projects and found over time that's a much more successful route to actually getting things shipped.

For example you could ship this here, and then in 3.0 drop support for python < 3.8 etc...

@dee-me-tree-or-love
Copy link
Member

@jeffwidman thanks a lot for the suggestion! Of course, I fully agree that's definitely a good idea! In fact I've been quite unhappy with my release frequency (sadly often linked to my availability 🙈) in general over the past times. So I'm certainly aiming to increase the cadence by enabling faster changes but still keeping an eye at long-term support :)

I'd be very happy to hear some more tips from your experience! :) I'm aiming to open a new announcement regarding the releases soon, so if you could chime in there, I'd really appreciate. I'll give you a ping then :)

Hope that sounds alright?

jayeff added a commit to jayeff/pypiserver that referenced this issue Sep 8, 2023
@dee-me-tree-or-love
Copy link
Member

@jayeff thanks a lot for your PR! That's a great step ⛰️

I have one related idea: before releasing 2.0.0 I'd like to add a little warning log message & a brief migration guide for projects relying on pypiserver, what do you think? I'll try to give it a go today and will ping you in the PR if you have time - if you have any suggestions and comments, I would really appreciate any, if not, also no problems! :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants