Migrate regex linter rules to use parsed patterns #5416

Closed
8 tasks done
rzvxa opened this issue Sep 2, 2024 · 1 comment · Fixed by #6129
Labels
C-enhancement Category - New feature or request good first issue Experience Level - Good for newcomers

Comments

@rzvxa
Collaborator

rzvxa commented Sep 2, 2024

Merging #5256 provides support for optionally parsed regex literals. We can enhance our regex-related linter rules to use the parsed pattern whenever it is available.

Here is the list of rules that operate on regex patterns and were written before the regex parser was introduced.

@Boshen
Member

Boshen commented Sep 3, 2024

I'll add "good first issue" when the required PRs are merged.

rzvxa added a commit that referenced this issue Sep 4, 2024
Part of #5416. Paves the way for upcoming refactors by adding the `oxc_regular_expression` dependency and a helper method for easier access.
Boshen added the good first issue (Experience Level - Good for newcomers) label Sep 4, 2024
camc314 pushed a commit that referenced this issue Sep 21, 2024
…ace-all` rule (#5943)

- part of #5416

Replaces the `is_simple_string` method with a more robust check against the parsed terms from the regular expression.
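A minimal sketch of the kind of check the parsed terms make possible. It assumes a hypothetical, flattened `Term` enum in place of the real `oxc_regular_expression` AST: a pattern can become a plain `replaceAll` string argument only if every term is a literal character. Accepting only a lone `g` flag is also an assumption made here, since flags such as `i` change what a plain string would match.

```rust
/// Hypothetical, flattened stand-in for the parsed pattern's terms; the real
/// `oxc_regular_expression` AST has many more node kinds.
enum Term {
    /// A literal character such as `a` or an escaped `\.`.
    Character(char),
    /// The `.` wildcard.
    Dot,
    /// Any other construct (groups, classes, quantifiers, anchors, ...).
    Other,
}

/// Returns the literal string matched by `terms` if it consists only of plain
/// characters, or `None` as soon as any other construct appears.
fn simple_string(terms: &[Term]) -> Option<String> {
    terms
        .iter()
        .map(|term| match term {
            Term::Character(c) => Some(*c),
            _ => None,
        })
        // `Option<String>` collects from `Option<char>` items and becomes
        // `None` if any item is `None`.
        .collect()
}

/// Assumption: only a lone `g` flag is accepted, because other flags
/// (for example `i`) change what the regex matches.
fn replace_all_candidate(terms: &[Term], flags: &str) -> Option<String> {
    if flags != "g" {
        return None;
    }
    simple_string(terms)
}
```

Working on terms rather than on the source string means escapes like `\.` are already resolved to the character they match, which is the main weakness of a hand-written "is this a simple string" scan.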
camc314 pushed a commit that referenced this issue Sep 21, 2024
…s-ends-with` (#5949)

- part of #5416

This change enhances the accuracy of the `prefer_string_starts_ends_with` rule by using the parsed regex patterns for analysis. It allows for more precise detection of patterns that can be replaced with `startsWith()` and `endsWith()` methods, reducing false positives and improving the overall effectiveness of the linter.

### What changed?

- Replaced the simple string-based regex analysis with a more robust AST-based approach.
- Removed the `is_simple_string` function as it's no longer needed.
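With a parsed pattern, the detection reduces to a slice-pattern match. A leading `^` followed only by literal characters suggests `startsWith()`, and literal characters followed by a trailing `$` suggest `endsWith()`. The sketch below uses a hypothetical `Term` enum rather than the real parser output. Rejecting the `i` and `m` flags is an assumption about when the rewrite stays equivalent, not a description of the rule's exact logic.

```rust
/// Hypothetical, flattened stand-in for the parsed pattern's terms.
enum Term {
    /// `^`
    StartAnchor,
    /// `$`
    EndAnchor,
    /// A literal character.
    Character(char),
    /// Any other construct (classes, groups, quantifiers, ...).
    Other,
}

#[derive(Debug, PartialEq)]
enum Suggestion {
    StartsWith(String),
    EndsWith(String),
}

/// Collects `terms` into a non-empty literal string, or `None` if any term
/// is not a plain character.
fn literal(terms: &[Term]) -> Option<String> {
    let text: String = terms
        .iter()
        .map(|term| match term {
            Term::Character(c) => Some(*c),
            _ => None,
        })
        .collect::<Option<String>>()?;
    if text.is_empty() { None } else { Some(text) }
}

fn suggest(terms: &[Term], flags: &str) -> Option<Suggestion> {
    // Assumption: `i` (case-insensitive) and `m` (anchors match at line
    // breaks) make the regex differ from a plain string method.
    if flags.contains('i') || flags.contains('m') {
        return None;
    }
    match terms {
        [Term::StartAnchor, rest @ ..] => literal(rest).map(Suggestion::StartsWith),
        [rest @ .., Term::EndAnchor] => literal(rest).map(Suggestion::EndsWith),
        _ => None,
    }
}
```

For example, `/^foo/` maps to `StartsWith("foo")`. `/^foo$/` yields no suggestion because the trailing `$` is not a literal character.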
Boshen pushed a commit that referenced this issue Sep 22, 2024
- part of #5416

Use the `oxc_regular_expression` parser to make these checks more robust. A few snapshots were updated because they now output more accurate diagnostics based on the regex AST. For example, `/   ?/` now correctly highlights only two spaces rather than three, because the last space is part of a quantifier.
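The span behaviour described above can be sketched over a hypothetical `Term` type that carries byte offsets into the pattern. It is not the real AST. A space wrapped in a quantifier arrives as a separate `Quantified` term, so in `/   ?/` only the first two spaces form a reportable run of two or more.

```rust
/// Hypothetical top-level term with byte offsets into the pattern.
enum Term {
    Character { value: char, start: usize, end: usize },
    /// A term wrapped in a quantifier such as `?`, `*`, `+`, or `{n}`.
    Quantified,
    /// Any other construct.
    Other,
}

/// Pushes the pending run as a `(start, end)` span if it has 2+ spaces.
fn flush(current: &mut Option<(usize, usize, usize)>, runs: &mut Vec<(usize, usize)>) {
    if let Some((start, end, count)) = current.take() {
        if count >= 2 {
            runs.push((start, end));
        }
    }
}

/// Returns the span of every run of two or more consecutive unquantified spaces.
fn space_runs(terms: &[Term]) -> Vec<(usize, usize)> {
    let mut runs = Vec::new();
    // Pending run as (start, end, number of spaces).
    let mut current: Option<(usize, usize, usize)> = None;
    for term in terms {
        match term {
            Term::Character { value: ' ', start, end } => {
                current = match current {
                    Some((run_start, _, count)) => Some((run_start, *end, count + 1)),
                    None => Some((*start, *end, 1)),
                };
            }
            // Any other term, including a quantified space, ends the run.
            _ => flush(&mut current, &mut runs),
        }
    }
    flush(&mut current, &mut runs);
    runs
}
```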
DonIsaac pushed a commit that referenced this issue Sep 22, 2024
…5974)

- part of #5416

Replaces the handwritten regex parsing logic with the `oxc_regular_expression` parser, which should be more accurate and enables support for Unicode sets (the `v` flag).
DonIsaac pushed a commit that referenced this issue Sep 23, 2024
…ule (#5980)

- part of #5416

Uses the parsed regular expression patterns for detecting empty character classes. This is more robust than the handwritten pattern matching from before and allows us to provide more accurate diagnostics and actually point to the empty character class in the literal.
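A minimal sketch of finding empty character classes by walking the AST. It uses a hypothetical `Node` type with byte offsets, not the real `oxc_regular_expression` node definitions. Because each class node carries its own span, the diagnostic can point directly at the `[]`. Recursing into class bodies catches nested classes such as `[[]]` under the `v` flag. A full visitor would also descend into groups and alternatives. Skipping the negated `[^]`, which matches any character, is an assumption modelled on ESLint's rule.

```rust
/// Hypothetical, simplified regex AST node with byte offsets into the pattern.
enum Node {
    /// Any single character.
    Character,
    /// `[...]` or `[^...]`; under the `v` flag the body may contain nested classes.
    CharacterClass {
        negated: bool,
        body: Vec<Node>,
        start: usize,
        end: usize,
    },
    /// Any other construct.
    Other,
}

fn collect_empty(nodes: &[Node], out: &mut Vec<(usize, usize)>) {
    for node in nodes {
        if let Node::CharacterClass { negated, body, start, end } = node {
            // Assumption: only a non-negated `[]` is reported, not `[^]`.
            if body.is_empty() && !*negated {
                out.push((*start, *end));
            }
            // Recurse so nested classes such as `[[]]` are found too.
            collect_empty(body, out);
        }
    }
}

/// Returns the `(start, end)` span of every empty character class.
fn empty_classes(nodes: &[Node]) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    collect_empty(nodes, &mut out);
    out
}
```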
DonIsaac pushed a commit that referenced this issue Sep 23, 2024
)

- part of #5416

This pull request significantly improves the `no_hex_escape` rule in the `oxc_linter` crate. It detects and replaces hexadecimal escapes within regular expressions more thoroughly by traversing the regex AST.

  - Implemented a new `visit_terms` function and its helper functions to traverse the regex AST and apply checks on individual terms.
  - Introduced the `check_character` function to replace hexadecimal escapes with Unicode escapes within regex patterns.
  - Updated snapshots to reflect the new diagnostic messages and replacements for hexadecimal escapes in regex patterns.
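The replacement step can be sketched as follows. The sketch assumes a hypothetical `Character` record that keeps each character term's kind, raw source slice, and start offset. The actual `visit_terms` and `check_character` functions are not shown in this thread, so this is not their implementation. Each `\xHH` escape is rewritten as `\u00HH` with the original hex digits kept.

```rust
/// Hypothetical classification of how a character was written in the pattern.
enum CharacterKind {
    Symbol,
    HexadecimalEscape,
    UnicodeEscape,
}

/// Hypothetical character term: its kind, raw source text, and start offset.
struct Character<'a> {
    kind: CharacterKind,
    raw: &'a str,
    start: usize,
}

/// Returns `(start, end, replacement)` for every hexadecimal escape.
fn hex_escape_fixes(chars: &[Character<'_>]) -> Vec<(usize, usize, String)> {
    chars
        .iter()
        .filter_map(|c| match c.kind {
            CharacterKind::HexadecimalEscape => {
                // `\xHH` becomes `\u00HH`, keeping the original hex digits.
                let digits = c.raw.strip_prefix("\\x")?;
                Some((c.start, c.start + c.raw.len(), format!("\\u00{digits}")))
            }
            // Literal characters and existing `\u` escapes are left alone.
            CharacterKind::Symbol | CharacterKind::UnicodeEscape => None,
        })
        .collect()
}
```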
camchenry self-assigned this Sep 26, 2024
Boshen pushed a commit that referenced this issue Sep 28, 2024
- closes #5416

Rewrites the `no-control-regex` rule to use a regular expression AST visitor instead of the `regex` crate and parsing by hand. This change simplifies the code and makes it easier to maintain.

One notable change in the snapshots is how control characters are printed. Previously, we always printed them from the source text. Now, we print a representation of the control character itself, based on its numeric value. This resulted in the nonprintable characters being printed, which are invisible. The other reason for this change is that the spans the regex parser outputs for Unicode escapes do not match the source text 1:1 when raw strings and escapes are involved. This resulted in odd-looking spans in the output:

```
  ⚠ eslint(no-control-regex): Unexpected control character: '*\\x'
   ╭─[no_control_regex.tsx:1:22]
 1 │ new RegExp('\\u{1111}*\\x1F', 'u')
   ·                      ────
   ╰────
```

I'm not sure yet where the bug lies.
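A sketch of printing a control character from its numeric value instead of the source text. The U+0000 to U+001F range follows ESLint's definition of control characters as the invisible characters in ASCII 0 to 31. The `\xHH` output format and the function names are assumptions for illustration, not the rule's actual message format.

```rust
/// Control characters, as `no-control-regex` treats them: U+0000 to U+001F.
fn is_control(code_point: u32) -> bool {
    code_point <= 0x1F
}

/// Renders a control character from its numeric value (not from the source
/// text) as a visible `\xHH` escape; `None` for other code points.
fn describe(code_point: u32) -> Option<String> {
    is_control(code_point).then(|| format!("\\x{code_point:02X}"))
}

/// Describes every control character among the pattern's code points.
fn control_chars(code_points: &[u32]) -> Vec<String> {
    code_points.iter().copied().filter_map(describe).collect()
}
```

Because the text is derived from the code point, the output no longer depends on the parser's spans. In the snapshot above, for example, `\x1F` is printed the same way however it was written in the source.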