Optimize Regex match check #3779

HalidOdat · 2024-04-02T04:04:51Z

No description provided.

jedel1043 · 2024-04-02T04:09:46Z

@raskad This is a pretty big optimization because it removes O(n^2) complexity from our match implementation, but we are not sure how to adapt it to work with Unicode matching. Can you take a look?
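
For readers following along, here is a minimal sketch of how quadratic behavior of this kind can arise and how a single scan avoids it. It is not the PR's code: `find_from`, `exec_quadratic`, and `exec_single_scan` are hypothetical names, and the literal-needle search is only a stand-in for a regex engine's "find the first match at or after `start`" call. The assumption is that a per-index retry loop calls a forward-scanning search on every iteration.

```rust
use std::ops::Range;

/// Hypothetical stand-in for an engine call that finds the first match at or
/// after `start`. `steps` counts how many candidate positions were examined.
fn find_from(
    haystack: &[u16],
    needle: &[u16],
    start: usize,
    steps: &mut usize,
) -> Option<Range<usize>> {
    if needle.is_empty() || needle.len() > haystack.len() {
        return None;
    }
    for i in start..=haystack.len() - needle.len() {
        *steps += 1;
        if &haystack[i..i + needle.len()] == needle {
            return Some(i..i + needle.len());
        }
    }
    None
}

/// Per-index retry loop: every iteration triggers a forward scan and only
/// accepts a match that begins exactly at `index`, so the work adds up to
/// O(n^2) in the distance to the match.
fn exec_quadratic(
    haystack: &[u16],
    needle: &[u16],
    last_index: usize,
    steps: &mut usize,
) -> Option<Range<usize>> {
    let mut index = last_index;
    while index <= haystack.len() {
        match find_from(haystack, needle, index, steps) {
            Some(m) if m.start == index => return Some(m),
            _ => index += 1,
        }
    }
    None
}

/// Single scan: ask for the first match once and use it directly.
/// With `sticky` set, the match must begin exactly at `last_index`.
fn exec_single_scan(
    haystack: &[u16],
    needle: &[u16],
    last_index: usize,
    sticky: bool,
    steps: &mut usize,
) -> Option<Range<usize>> {
    let m = find_from(haystack, needle, last_index, steps)?;
    if sticky && m.start != last_index {
        return None;
    }
    Some(m)
}

fn main() {
    // 999 'a' code units followed by a single 'b'.
    let haystack: Vec<u16> = "a".repeat(999).encode_utf16().chain("b".encode_utf16()).collect();
    let needle: Vec<u16> = "b".encode_utf16().collect();

    let (mut slow, mut fast) = (0, 0);
    let a = exec_quadratic(&haystack, &needle, 0, &mut slow);
    let b = exec_single_scan(&haystack, &needle, 0, false, &mut fast);

    assert_eq!(a, b);
    assert!(slow > fast);
    println!("quadratic steps: {slow}, single-scan steps: {fast}");
}
```

Both functions return the same range, but the retry loop repeats most of the scan for every starting index, while the single scan touches each position once.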

github-actions · 2024-04-02T04:19:19Z

Test262 conformance changes

Test result	main count	PR count	difference
Total	50,268	50,268	0
Passed	42,773	42,773	0
Ignored	1,391	1,391	0
Failed	6,104	6,104	0
Panics	18	18	0
Conformance	85.09%	85.09%	0.00%

HalidOdat · 2024-04-02T05:31:49Z

Reduced the failing tests from 48 to 4; the remaining ones are all Unicode-related.

HalidOdat · 2024-04-02T15:01:40Z

After some more debugging, I think it's an issue with regress's InputUtf16 iterator; I managed to get a minimal test case. We are matching against the low surrogate of the Unicode character 𝌆, which should not be possible in Unicode mode, since matching there is code-point aware. (In non-Unicode mode it matches, as expected.)

/// Test262: test/built-ins/RegExp/prototype/exec/u-lastindex-adv.js
///
/// Test case:
///
/// ```JavaScript
/// assert.sameValue(/\udf06/u.exec('\ud834\udf06'), null);
/// ```
#[test]
fn utf16_correct_unicode_scan() {
    // '𝌆' is U+1D306 "Tetragram for Centre", encoded in UTF-16 as the surrogate pair 0xD834 0xDF06.
    // See: https://www.compart.com/en/unicode/U+1D306
    const INPUT: &[u16] = &[0xd834, 0xdf06];
    const MATCHER: &[u16] = &[0xdf06];

    let regex = Regex::from_unicode(MATCHER.iter().copied().map(u32::from), Flags::from("u"))
        .expect("valid regex");
    let m = regex.find_from_utf16(INPUT, 0).next();

    println!("{m:#?}");

    assert!(m.is_none());
}

The test fails because the regex reports a match on the low surrogate:

Some(
    Match {
        range: 1..2,
        captures: [],
        named_captures: {},
    },
)
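
To illustrate why a match at `1..2` is wrong in Unicode mode, here is a small standard-library sketch of the difference between searching by code unit and searching by code point. It does not use regress and is not how regress is implemented; `find_code_unit` and `find_code_point_aware` are hypothetical helpers written only to show the semantics.

```rust
use std::char::decode_utf16;

/// Code-unit search: ignores surrogate pairing, so it can land in the middle
/// of a pair (the non-Unicode-mode view of the input).
fn find_code_unit(haystack: &[u16], needle: u16) -> Option<usize> {
    haystack.iter().position(|&u| u == needle)
}

/// Code-point-aware search: walks the haystack one code point at a time, so a
/// surrogate needle can only match an unpaired surrogate in the input.
fn find_code_point_aware(haystack: &[u16], needle: u16) -> Option<usize> {
    let mut index = 0;
    for decoded in decode_utf16(haystack.iter().copied()) {
        match decoded {
            Ok(c) => {
                if c as u32 == u32::from(needle) {
                    return Some(index);
                }
                // A supplementary-plane character occupies two code units.
                index += c.len_utf16();
            }
            Err(e) => {
                if e.unpaired_surrogate() == needle {
                    return Some(index);
                }
                index += 1;
            }
        }
    }
    None
}

fn main() {
    // '𝌆' (U+1D306) as a UTF-16 surrogate pair.
    let input = [0xd834, 0xdf06];

    // Code-unit view: the low surrogate is found at index 1.
    assert_eq!(find_code_unit(&input, 0xdf06), Some(1));

    // Code-point view: the pair is one character, so there is no match.
    assert_eq!(find_code_point_aware(&input, 0xdf06), None);

    // A genuinely unpaired low surrogate is still matched.
    assert_eq!(find_code_point_aware(&[0x0061, 0xdf06], 0xdf06), Some(1));
}
```

The first assertion corresponds to the `1..2` match above, and the second to the `null` result the Test262 case expects under the `u` flag.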

core/engine/src/builtins/regexp/mod.rs

raskad · 2024-04-02T19:34:12Z

The tests seem fine with the new regress version, so I think the only thing missing is the TODO comment.

HalidOdat · 2024-04-02T20:13:24Z

I added the comments; this is ready for review/merge :)

raskad

Great find and optimization!

Initial optimization

76827f3

HalidOdat added the performance label (Performance related changes and issues) Apr 2, 2024

jedel1043 requested a review from raskad April 2, 2024 04:09

Fix sticky flag

77bc4a8

jedel1043 reviewed Apr 2, 2024

core/engine/src/builtins/regexp/mod.rs (two review comments, now outdated and resolved)

HalidOdat and others added 2 commits April 2, 2024 19:50

Apply review

3d10824

Update regress to 0.9.1

86eb225

Add spec deviation comments

166b44a

HalidOdat requested a review from a team April 2, 2024 20:13

HalidOdat added this to the v0.18.1 milestone Apr 2, 2024

raskad approved these changes Apr 2, 2024

jedel1043 approved these changes Apr 2, 2024

raskad added this pull request to the merge queue Apr 2, 2024

Merged via the queue into main with commit b1f0780 Apr 2, 2024
13 checks passed

jedel1043 deleted the optimize-regex branch April 2, 2024 22:29

raskad modified the milestones: v0.18.1, v0.19.0 Jul 8, 2024
