Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(rust, python): impl hex and base64 for binary #5892

Merged
merged 19 commits into from
Dec 26, 2022
Merged

feat(rust, python): impl hex and base64 for binary #5892

merged 19 commits into from
Dec 26, 2022

Conversation

ozgrakkurt
Copy link
Contributor

mostly replicated the implementation for string

@ritchie46
Copy link
Member

Thanks for the PR. I think we should finish extending this the same way we do for Utf8. That means that we need a Binary namespace on expressions that is also exposed to python Expr/Series similar to what we do for str, cat and arr.

@ozgrakkurt
Copy link
Contributor Author

@ritchie46 added namespace as far as I can understand

@ritchie46
Copy link
Member

@ritchie46 added namespace as far as I can understand

Thanks. Can you expose it to python and add tests on the python side as well? That's our ultimate integration test. Let me know if you need any pointers.

@ozgrakkurt
Copy link
Contributor Author

how can I expose it to python? where should I add the code?

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Dec 23, 2022

how can I expose it to python? where should I add the code?

Expression support:

  • Add the new functions to PyExpr, in py-polars/src/lazy/dsl.rs (probably with a bin prefix; eg: pub fn bin_xyz(&self, ...) -> PyExpr {, consistent with their str-prefixed counterparts).
  • Create ExprBinaryNameSpace in py-polars/polars/internals/expr/binary.py, patterned after the other Expr*NameSpace objects, with _accessor = "bin"; this is the object on which your new binary expression functions will be exposed.
  • Import the new namespace and make it available on the python-side Expr by adding to py-polars/polars/internals/expr/expr.py (just look for @accessor to see how/where).

Series support (trivial - will redirect to the expression code above):

  • Create BinaryNameSpace in py-polars/polars/internals/series/binary.py, patterned after the other Series namespaces. Create the function definition/docstrings, as per the Expr one above, but omit the function body (it'll automatically call the expression implementation via @expr_dispatch black magic ;)
  • Register the namespace in py-polars/polars/internals/series/series.py (via @accessor, as above).

@ozgrakkurt
Copy link
Contributor Author

@alexander-beedie Thanks!

what would be the type for binary to use in py-polars/polars/internals/series/binary.py? I see it uses str in string file

@ritchie46
Copy link
Member

@alexander-beedie Thanks!

what would be the type for binary to use in py-polars/polars/internals/series/binary.py? I see it uses str in string file

Do you mean the accessor name? I think this should be bin.

@ozgrakkurt
Copy link
Contributor Author

I mean str in def ends_with(self, sub: str) -> pli.Series:

@ozgrakkurt
Copy link
Contributor Author

Also shouldn't the decode functions on string return binary?

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Dec 23, 2022

Ahh, gotcha; the python binary type is bytes.

@ozgrakkurt
Copy link
Contributor Author

does this look good for exposing to python part?

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Dec 23, 2022

does this look good for exposing to python part?

Almost there; change the binary accessor/namespace shortcut to bin, and then add some python-side unit test coverage for the new functions to help validate it all 👍

(Looks like there is already a test_binary.py, so would suggest adding the tests there).

@ozgrakkurt
Copy link
Contributor Author

I added a test file but didn't register it anywhere. does it look good?

@ozgrakkurt
Copy link
Contributor Author

should be good to merge

@ritchie46
Copy link
Member

I think the base_64 encode must return a Utf8Chunked. It is free to cast from Utf8Chunked to BinaryChunked but not in the other direction as we need to do utf8 checking.

In base64 we already have this utf8 guarantee, so I think we should encode this in the type we return.

@ozgrakkurt
Copy link
Contributor Author

I think the base_64 encode must return a Utf8Chunked. It is free to cast from Utf8Chunked to BinaryChunked but not in the other direction as we need to do utf8 checking.

In base64 we already have this utf8 guarantee, so I think we should encode this in the type we return.

yes makes sense to me but also decode should return binary because there is no guarantee the returned value is valid string, it can be arbitrary bytes

@ozgrakkurt
Copy link
Contributor Author

Is there a way to do something like cast_and_apply on BinaryChunked and Utf8Chunked?

@ozgrakkurt
Copy link
Contributor Author

ozgrakkurt commented Dec 24, 2022

@ritchie46 made the changes but had to convert encode functions to return Series and not Utf8Chunked in BinaryChunked encode functions. Not sure the implementation is efficient. Changed Utf8Chunked decode functions to cast it to binary first and then do the decode and also return BinaryChunked

@ritchie46
Copy link
Member

@ritchie46 made the changes but had to convert encode functions to return Series and not Utf8Chunked in BinaryChunked encode functions. Not sure the implementation is efficient. Changed Utf8Chunked decode functions to cast it to binary first and then do the decode and also return BinaryChunked

Thank you @ozgrakkurt. I still have two comments on the decoding of utf8 data via hex/base64.

@ozgrakkurt
Copy link
Contributor Author

ozgrakkurt commented Dec 24, 2022

@ritchie46 can you check again? I cleaned up the decode functions, also is it possible to cast while applying for string and binary? I see it is possible for integers

@ozgrakkurt ozgrakkurt requested review from ritchie46 and removed request for ritchie46 December 24, 2022 14:25
@ozgrakkurt
Copy link
Contributor Author

closes #5118

@ritchie46
Copy link
Member

Thank you @ozgrakkurt. And merry christmas! :)

@ritchie46 ritchie46 changed the title impl hex and base64 for binary feat(rust, python): impl hex and base64 for binary Dec 26, 2022
@ritchie46 ritchie46 merged commit 3dccb4a into pola-rs:master Dec 26, 2022
@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars labels Dec 26, 2022
@ozgrakkurt
Copy link
Contributor Author

Thank you @ozgrakkurt. And merry christmas! :)

thanks, merry Christmas to you too!

zundertj pushed a commit to zundertj/polars that referenced this pull request Jan 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants