Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LIN-712 use cloudpickle #857

Merged
merged 4 commits into from
Dec 14, 2022
Merged

LIN-712 use cloudpickle #857

merged 4 commits into from
Dec 14, 2022

Conversation

lionsardesai
Copy link
Contributor

Description

Start using cloudpickle. In general, cloudpickle supports more use cases but specifically it works around the constraint of the pickle class that requires class objects to be registered in globals(). This will let linea keep hijacking the default globals and work with a separate copy in executor.

Fixes LIN-712

Type of change

Please delete options that are not relevant.

  • [ x ] New feature (non-breaking change which adds functionality)

How Has This Been Tested?

All existing tests.

Copy link
Contributor

@mingjerli mingjerli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blocker but two comments.

I think your PR gets rid of pandas dependency, so we probably want to remove it from requirements.txt and setup.py (maybe in other places I'm not aware).

I am also curious about scenarios that cloudpickle doesn't work and need to fallback to vanilla pickle. I thought cloudpickle is still using pickle under the hood.

@@ -57,6 +57,7 @@ def version(path):
"fsspec",
"pandas",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can get rid of pandas dependency, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i still use your older version of code which uses get_handle for different remote filepaths. so only for that i do need pandas right now. if we can refactor that piece too somehow then ya we should be able to get rid of this piece.

@lionsardesai
Copy link
Contributor Author

No blocker but two comments.

I think your PR gets rid of pandas dependency, so we probably want to remove it from requirements.txt and setup.py (maybe in other places I'm not aware).

I am also curious about scenarios that cloudpickle doesn't work and need to fallback to vanilla pickle. I thought cloudpickle is still using pickle under the hood.

  1. addressed the pandas point in the sub-comment
  2. i'm not sure about the implementation of cloudpickle. but i added vanilla pickle in case someone already has pickled using our older version of lineapy that did not use cloudpickle. if it fails with cloudpickle it will atleast try using pickle library before finally failing. a secondary goal was in case there is a scenario where cloudpickle fails and pickle does not, there's no downside to adding this in because pickle comes inbuilt with python.

@lionsardesai lionsardesai merged commit a853e5d into main Dec 14, 2022
@lionsardesai lionsardesai deleted the LIN-712-cloudpickle branch December 14, 2022 18:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants