feat: Added escape_regex operation to the str namespace and as a global function #19257
base: main
@orlp Just to be sure: the code

import polars as pl
df = pl.DataFrame({"text": ["abc", "def", None, "abc(\\w+)"]})
df.with_columns(escaped=pl.escape_regex('text'))

should escape the … Besides, I need to:
@barak1412
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #19257      +/-   ##
==========================================
- Coverage   80.04%   80.02%   -0.02%
==========================================
  Files        1528     1531       +3
  Lines      209564   209876     +312
  Branches     2415     2421       +6
==========================================
+ Hits       167741   167963     +222
- Misses      41275    41362      +87
- Partials      548      551       +3
use regex::escape;

#[inline]
pub fn escape_regex_str(s: &str) -> String {
Added this function so that pl.escape_regex and str.escape_regex share the same implementation.
@orlp I'd be glad if you could take a look, thanks.
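To make the per-string behaviour concrete, here is a std-only sketch of a helper in the spirit of escape_regex_str. It is not the PR's code: is_meta_character and escape_into are local stand-ins, and the metacharacter list is an assumption meant to mirror the set that regex_syntax escapes in recent regex-syntax releases. Each call allocates a fresh String, which is the cost the review below is concerned with.

```rust
/// Returns true if `c` must be backslash-escaped to be matched literally.
/// Assumption: this list mirrors regex_syntax's metacharacter set in recent
/// regex-syntax releases; check against the version pinned in Cargo.lock.
fn is_meta_character(c: char) -> bool {
    matches!(
        c,
        '\\' | '.' | '+' | '*' | '?' | '(' | ')' | '|' | '[' | ']' | '{' | '}'
            | '^' | '$' | '#' | '&' | '-' | '~'
    )
}

/// Appends the escaped form of `text` to `buf` (does not clear `buf`).
fn escape_into(text: &str, buf: &mut String) {
    for c in text.chars() {
        if is_meta_character(c) {
            buf.push('\\');
        }
        buf.push(c);
    }
}

/// Per-string helper: allocates a new String on every call.
fn escape_regex_str(s: &str) -> String {
    let mut buf = String::with_capacity(s.len());
    escape_into(s, &mut buf);
    buf
}
```

For example, escape_regex_str(r"abc(\w+)") returns r"abc\(\\w\+\)", while strings without metacharacters such as "abc" come back unchanged.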
}

pub fn escape_regex(ca: &StringChunked) -> StringChunked {
    unary_elementwise(ca, escape_regex_helper)
Can you add a dependency on regex_syntax (we already have this indirectly) in our main Cargo.toml, add a dependency with regex_syntax = { workspace = true } in the polars-ops Cargo.toml, and use a loop similar to this?
pub fn escape_regex(ca: &StringChunked) -> StringChunked {
    // One scratch buffer, cleared and reused for every value.
    let mut buffer = String::new();
    let mut builder = StringChunkedBuilder::new(ca.name().clone(), ca.len());
    for opt_s in ca.iter() {
        if let Some(s) = opt_s {
            buffer.clear();
            regex_syntax::escape_into(s, &mut buffer);
            builder.append_value(&buffer);
        } else {
            // Nulls stay null.
            builder.append_null();
        }
    }
    builder.finish()
}
This prevents us from having to allocate for each string.
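The buffer-reuse idea can be sketched without polars. StringColumn below is a hypothetical stand-in for a string builder (one contiguous value buffer plus an offset and a validity flag per row, loosely Arrow-like), and escape_into is a local stand-in for regex_syntax::escape_into that assumes the same metacharacter set. The point is that buffer is cleared, not reallocated, for every row.

```rust
/// Assumed metacharacter set (mirrors regex_syntax in recent releases).
fn is_meta_character(c: char) -> bool {
    matches!(
        c,
        '\\' | '.' | '+' | '*' | '?' | '(' | ')' | '|' | '[' | ']' | '{' | '}'
            | '^' | '$' | '#' | '&' | '-' | '~'
    )
}

/// Local stand-in for regex_syntax::escape_into: appends to `buf`.
fn escape_into(text: &str, buf: &mut String) {
    for c in text.chars() {
        if is_meta_character(c) {
            buf.push('\\');
        }
        buf.push(c);
    }
}

/// Hypothetical column type: all values live in one String.
#[derive(Debug)]
struct StringColumn {
    values: String,
    offsets: Vec<usize>, // row i spans values[offsets[i]..offsets[i + 1]]
    validity: Vec<bool>, // false marks a null row
}

impl StringColumn {
    fn with_capacity(len: usize) -> Self {
        let mut offsets = Vec::with_capacity(len + 1);
        offsets.push(0);
        StringColumn { values: String::new(), offsets, validity: Vec::with_capacity(len) }
    }

    /// Copies `s` into the shared value buffer.
    fn append_value(&mut self, s: &str) {
        self.values.push_str(s);
        self.offsets.push(self.values.len());
        self.validity.push(true);
    }

    /// A null row occupies zero bytes.
    fn append_null(&mut self) {
        self.offsets.push(self.values.len());
        self.validity.push(false);
    }

    fn get(&self, i: usize) -> Option<&str> {
        if self.validity[i] {
            Some(&self.values[self.offsets[i]..self.offsets[i + 1]])
        } else {
            None
        }
    }
}

/// Escapes every non-null row, reusing a single scratch buffer.
fn escape_regex(column: &[Option<&str>]) -> StringColumn {
    let mut buffer = String::new();
    let mut builder = StringColumn::with_capacity(column.len());
    for opt_s in column.iter().copied() {
        match opt_s {
            Some(s) => {
                buffer.clear(); // keeps the allocation from earlier rows
                escape_into(s, &mut buffer);
                builder.append_value(&buffer);
            }
            None => builder.append_null(),
        }
    }
    builder
}
```

Calling escape_regex(&[Some("abc"), None, Some("a.b")]) yields rows that read back as Some("abc"), None and Some(r"a\.b"). Per row, the cost is a copy into the column's value buffer rather than a new heap allocation.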
I ran into a Cargo.lock mess; I'll get to it tomorrow evening.
"""
if isinstance(s, pl.Expr):
    msg = "escape_regex function is unsupported for `Exp`, you may want use `Expr.str.escape_regex` instead"
Typo: Exp -> Expr.
Fixes #19207.