feat: Added escape_regex operation to the str namespace and as a global function #19257

Draft
wants to merge 9 commits into
base: main

barak1412
Contributor

Fixes #19207.

github-actions bot added the enhancement, python, and rust labels on Oct 16, 2024
@barak1412
Contributor Author

barak1412 commented Oct 16, 2024

@orlp Just to be sure -

The code:

import polars as pl

df = pl.DataFrame({"text": ["abc", "def", None, "abc(\\w+)"]})

df.with_columns(escaped=pl.escape_regex('text'))

Should escape the 'text' literal, not the column, right?

Besides, I need to:

  1. Add tests.
  2. Add pl.escape_regex as a function.
  3. Refactor the code so that the namespace Expr is translated to the function Expr.
  4. Add to docs.

@orlp
Collaborator

orlp commented Oct 16, 2024

@barak1412 pl.escape_regex should only work on Python strings and should not interact with the expression API at all. I would suggest raising a warning or error if an expression is passed to it, pointing the user to Expr.str.escape_regex instead.
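
A minimal sketch of the guard described above, not the code in this PR. It assumes a hypothetical stand-in `Expr` class so the snippet runs without Polars, and it uses Python's `re.escape` in place of the Rust-side escaping, which may differ in exactly which characters it escapes:

```python
import re


class Expr:
    """Hypothetical stand-in for pl.Expr, so this sketch runs without Polars."""


def escape_regex(s):
    """Escape regex metacharacters in a plain Python string."""
    if isinstance(s, Expr):
        # Point expression users at the namespace method instead.
        msg = (
            "escape_regex function is unsupported for `Expr`, "
            "you may want to use `Expr.str.escape_regex` instead"
        )
        raise TypeError(msg)
    if not isinstance(s, str):
        msg = f"escape_regex expects a str, got {type(s).__name__!r}"
        raise TypeError(msg)
    # Assumption: re.escape stands in for the Rust-side escape here.
    return re.escape(s)
```

With this shape, `escape_regex("text")` always treats its argument as a literal string, and passing an expression fails loudly instead of silently escaping a column name.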

codecov bot commented Oct 16, 2024

Codecov Report

Attention: Patch coverage is 47.36842% with 30 lines in your changes missing coverage. Please review.

Project coverage is 80.02%. Comparing base (d89fdcd) to head (27d460b).
Report is 23 commits behind head on main.

Files with missing lines Patch % Lines
crates/polars-plan/src/dsl/string.rs 0.00% 8 Missing ⚠️
...rates/polars-plan/src/dsl/function_expr/strings.rs 0.00% 7 Missing ⚠️
...lars-ops/src/chunked_array/strings/escape_regex.rs 33.33% 6 Missing ⚠️
.../polars-ops/src/chunked_array/strings/namespace.rs 0.00% 4 Missing ⚠️
crates/polars-python/src/expr/string.rs 0.00% 3 Missing ⚠️
.../polars-python/src/lazyframe/visitor/expr_nodes.rs 0.00% 1 Missing ⚠️
py-polars/polars/expr/string.py 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #19257      +/-   ##
==========================================
- Coverage   80.04%   80.02%   -0.02%     
==========================================
  Files        1528     1531       +3     
  Lines      209564   209876     +312     
  Branches     2415     2421       +6     
==========================================
+ Hits       167741   167963     +222     
- Misses      41275    41362      +87     
- Partials      548      551       +3     


use regex::escape;

#[inline]
pub fn escape_regex_str(s: &str) -> String {
Contributor Author

Added this function so that pl.escape_regex and str.escape_regex share the same implementation.

@barak1412
Contributor Author

@orlp I would be glad if you could take a look, thanks.

}

pub fn escape_regex(ca: &StringChunked) -> StringChunked {
    unary_elementwise(ca, escape_regex_helper)
Collaborator

@orlp orlp Oct 18, 2024

Can you add a dependency on regex_syntax (we already have this indirectly) in our main Cargo.toml, add a dependency with regex_syntax = { workspace = true } in the polars-ops Cargo.toml, and use a loop similar to this?

pub fn escape_regex(ca: &StringChunked) -> StringChunked {
    // Reuse one buffer across all values instead of allocating per string.
    let mut buffer = String::new();
    let mut builder = StringChunkedBuilder::new(ca.name().clone(), ca.len());
    for opt_s in ca.iter() {
        if let Some(s) = opt_s {
            buffer.clear();
            regex_syntax::escape_into(s, &mut buffer);
            builder.append_value(&buffer);
        } else {
            // Nulls pass through unchanged.
            builder.append_null();
        }
    }
    builder.finish()
}

This prevents us from having to allocate for each string.
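
For readers following along in Python, the per-value behaviour of that loop can be modelled roughly as below. This is only a sketch under assumptions: `escape_regex_values` is a hypothetical helper name, `re.escape` stands in for `regex_syntax::escape_into`, and a Python list cannot show the buffer-reuse benefit, only the null handling.

```python
import re
from typing import Optional


def escape_regex_values(values: list[Optional[str]]) -> list[Optional[str]]:
    """Escape each non-null string; keep nulls (None) in place."""
    out: list[Optional[str]] = []
    for value in values:
        if value is None:
            # Mirrors builder.append_null() in the Rust loop.
            out.append(None)
        else:
            # Assumption: re.escape stands in for regex_syntax::escape_into.
            out.append(re.escape(value))
    return out


escaped = escape_regex_values(["abc", "def", None, "abc(\\w+)"])
```

The output has the same length as the input, and each null stays at its original position.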

Contributor Author

I had some Cargo.lock mess, will get to it tomorrow evening.


"""
    if isinstance(s, pl.Expr):
        msg = "escape_regex function is unsupported for `Exp`, you may want use `Expr.str.escape_regex` instead"
Collaborator

Typo: Exp -> Expr.

barak1412 marked this pull request as draft on October 18, 2024, 10:35

Successfully merging this pull request may close these issues.

Expose regex::escape in Polars Python API
2 participants