Python 3.10 installing multiprocess has incompatibility with apache-beam due to dill version #125

Closed
rpeloff-id opened this issue Dec 1, 2022 · 4 comments

Comments

rpeloff-id commented Dec 1, 2022

On Python 3.10, apache-beam==2.43.0 cannot be installed together with multiprocess>=0.70.12. Python 3.10 is supported only by multiprocess>=0.70.12, which requires dill>=0.3.4, and that conflicts with the apache-beam requirement of dill>=0.3.1.1,<0.3.2.
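
The conflict can be checked mechanically. The following is a minimal sketch, not pip's actual resolver: it compares dotted version strings as integer tuples (a simplification of PEP 440), and the `candidates` list and the helper names `parse` and `satisfies` are illustrative assumptions rather than anything taken from either project.

```python
from typing import Optional, Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Turn '0.3.1.1' into (0, 3, 1, 1). Simplified: numeric releases only."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version: str, lower: str, upper: Optional[str] = None) -> bool:
    """True if lower <= version, and version < upper when upper is given."""
    v = parse(version)
    if v < parse(lower):
        return False
    if upper is not None and v >= parse(upper):
        return False
    return True


# Illustrative sample of dill versions (an assumption, not a full release list).
candidates = ["0.3.1.1", "0.3.2", "0.3.3", "0.3.4", "0.3.5.1", "0.3.6"]

# apache-beam==2.43.0 wants dill>=0.3.1.1,<0.3.2
beam_ok = [v for v in candidates if satisfies(v, "0.3.1.1", "0.3.2")]
# multiprocess>=0.70.12 wants dill>=0.3.4
mp_ok = [v for v in candidates if satisfies(v, "0.3.4")]

both = sorted(set(beam_ok) & set(mp_ok))
print("apache-beam accepts:", beam_ok)
print("multiprocess accepts:", mp_ok)
print("accepted by both:", both)
```

No candidate falls in both ranges, which is why the two packages cannot be resolved together.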

These libraries are used together, for example, in the datasets library.

Is there a specific reason for the higher version requirement of the dill package? If not, is it safe to ignore the dill dependency of multiprocess and install a version that satisfies the apache-beam requirement?

That is, first install apache-beam, then install multiprocess without its dependencies:

pip install apache-beam==2.43.0
pip install --no-deps multiprocess==0.70.14
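
After a `--no-deps` install it can help to confirm which versions actually ended up in the environment. This is a minimal sketch using the standard-library `importlib.metadata`; the helper names `installed_version` and `report` are hypothetical.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version (or None)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, version in report(["apache-beam", "multiprocess", "dill"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Running `pip check` afterwards should also flag the unmet dill requirement of multiprocess, which is expected with this workaround.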

mmckerns (Member) commented Dec 1, 2022

> Is there a specific reason for the higher version requirement of the dill package?

I'm trying to minimize the number of different versions that need support. dill>=0.3.1.1,<0.3.2 dates from late 2019, and a lot has changed since then. However...

> If not, is it safe to ignore the dill dependency of multiprocess and install a version that satisfies the apache-beam requirement?

...yes, you can probably get away with doing this. Fundamentally, the issue is that Python doesn't guarantee that an object pickled with a newer version of pickle can be loaded with an older version, and that limitation carries forward into dill. So, I believe, apache-beam has pinned a particular old version of dill for both server and client. But the short answer is yes.
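
One visible piece of this is the pickle protocol number, which a pickle written with protocol 2 or later declares in its first two bytes. The sketch below only illustrates that aspect; dill's compatibility also depends on how interpreter internals are serialized, which this does not cover. The helpers `pickle_protocol` and `loadable_by` are hypothetical names.

```python
import pickle
from typing import Optional


def pickle_protocol(data: bytes) -> Optional[int]:
    """Return the protocol number declared in a pickle header, if any."""
    # Protocol 2+ pickles start with the PROTO opcode (0x80) followed by
    # one byte holding the protocol number. Protocols 0 and 1 have no header.
    if len(data) >= 2 and data[0] == 0x80:
        return data[1]
    return None


def loadable_by(data: bytes, highest_supported: int) -> bool:
    """Rough check: could a reader limited to `highest_supported` accept this?"""
    declared = pickle_protocol(data)
    return declared is None or declared <= highest_supported


# Written by a "new" writer using the newest protocol this interpreter knows.
payload = pickle.dumps({"answer": 42}, protocol=pickle.HIGHEST_PROTOCOL)

# A hypothetical "old" reader that only understands protocol 2.
old_reader_ok = loadable_by(payload, highest_supported=2)
print("declared protocol:", pickle_protocol(payload))
print("old reader can load it:", old_reader_ok)
```

Pinning one dill version on both the writing and the reading side, as apache-beam appears to do, sidesteps this class of mismatch.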

rpeloff commented Jan 9, 2023

Hi again, thanks for the response. While this works as a temporary solution, is there any possibility to lower or remove the dill constraint in a future version of multiprocess? This is making it very difficult to build a package that depends on the datasets package for newer versions of Python.

See also my discussion in apache/beam#24458.

mmckerns (Member) commented Jan 10, 2023

> is there any possibility to lower or remove the dill constraint in a future version of multiprocess?

Looking at version diagnostics, there are still a lot of downloads/users of old versions of dill. Instead of tightening the versions for management's sake, I can look at it from a breakage standpoint and see if I can loosen the constraints a bit.

mmckerns (Member) commented

Python made a number of backward-incompatible changes to the code object between Python 3.10 and 3.11. So, the unfortunate thing is that while Python 3.7, 3.8, etc. can use older dill, more recent Python versions break dill (and multiprocess) pretty hard when you roll the dill version back too far. Apart from Python changing the code object in a backward-incompatible way, it looks like it would have been OK.
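
One way to see this kind of change is to list the attributes of a code object under different interpreters. This is a minimal sketch; `code_object_fields` and `sample` are hypothetical names. On CPython 3.11 the list is expected to include `co_qualname` and `co_exceptiontable`, which 3.10 does not have.

```python
import sys
from typing import Callable, List


def code_object_fields(func: Callable) -> List[str]:
    """Return the sorted `co_*` attribute names on a function's code object."""
    return sorted(name for name in dir(func.__code__) if name.startswith("co_"))


def sample(x):
    return x + 1


fields = code_object_fields(sample)
print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
for name in fields:
    print(" ", name)
```

A serializer such as dill has to rebuild code objects from these fields, so a version written before a field existed has no way to supply it.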

However... I'm unfortunately going to need to keep pinning to more recent versions of dill, or 3.11 will break. It's something that datasets will need to deal with when supporting 3.11.

Closing, but feel free to comment, etc here.

mmckerns closed this as not planned on Jan 29, 2023