Consider 2-character EOL before line continuation #12035

dhruvmanila · 2024-06-26T03:17:29Z

Summary

This PR fixes a bug introduced in #12008 which didn't consider the two character newline after the line continuation character.

For example, consider the following code highlighted with whitespaces:

call(foo # comment \\r\n
\r\n
def bar():\r\n
....pass\r\n

The lexer is at def when it's running the re-lexing logic and trying to move back to a newline character. It encounters \n and it's being escaped (incorrect) but \r is being escaped, so it moves the lexer to \n character. This creates an overlap in token ranges which causes the panic.

Name 0..4
Lpar 4..5
Name 5..8
Comment 9..20
NonLogicalNewline 20..22 <-- overlap between
Newline 21..22           <-- these two tokens
NonLogicalNewline 22..23
Def 23..26
...

fixes: #12028

Test Plan

Add a test case with line continuation and windows style newline character.

github-actions · 2024-06-26T03:36:57Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

dhruvmanila · 2024-06-26T04:42:41Z

crates/ruff_python_parser/src/lexer.rs

+            match ch {
+                '\n' => {
+                    current_position -= ch.text_len();
+                    if let Some(carriage_return) = reverse_chars.next_if_eq(&'\r') {
+                        current_position -= carriage_return.text_len();
                    }
-                    current_position -= TextSize::new('\\'.text_len().to_u32() * backslash_count);
                }
+                '\r' => {
+                    current_position -= ch.text_len();
+                }


The main change is to split the match with \r and \n into it's own branch and then a common check for line continuation character.

MichaReiser

Looks good to me. But I agree with you that this is all backwards lexing again. Could we use the previous token kind (e.g. by storing it on the lexer) and test if it is a NonLogicalNewline to avoid all the complexity of guessing if something's a newline or not?

Or is this not sufficient if there are multiple newlines (in which case storing just one previous token isn't enough?)

I guess there's also:

[a, 
	# comment
	# more comment

def foo

where there's more in between def and ,

MichaReiser · 2024-06-26T06:35:00Z

crates/ruff_python_parser/src/lexer.rs

+            match ch {
+                '\n' => {
+                    current_position -= ch.text_len();
+                    if let Some(carriage_return) = reverse_chars.next_if_eq(&'\r') {
+                        current_position -= carriage_return.text_len();
                    }
-                    current_position -= TextSize::new('\\'.text_len().to_u32() * backslash_count);
                }
+                '\r' => {
+                    current_position -= ch.text_len();
+                }


I think we can simplify this logic a bit. It doesn't seem to matter how many new line characters there are, we just want to skip all of them

if matches!(c, '\n' | '\r') { while Some(newline) = reverse_chars.next_if(|c| matches!(c, '\n' | '\r')) { current_position -= newline.text_len(); } // after any new line, handle line continuation }

I don't think this will work because we want the position of the last unescaped newline character. We would still need to keep track of that position, so we might as well be explicit about this.

dhruvmanila · 2024-06-26T07:23:10Z

Looks good to me. But I agree with you that this is all backwards lexing again. Could we use the previous token kind (e.g. by storing it on the lexer) and test if it is a NonLogicalNewline to avoid all the complexity of guessing if something's a newline or not?

Or is this not sufficient if there are multiple newlines (in which case storing just one previous token isn't enough?)

I guess there's also:
[a, 
	# comment
	# more comment

def foo
where there's more in between def and ,

Yeah, that's correct. There can be multiple trivia tokens in between which makes it hard to just look at the last token.

Although we could split the logic and move "finding the non-logical newline" part to the TokenSource who has access to all the tokens and then it'll ask the lexer to move back to the provided location. I'm not sure if this is a good way to model this behavior.

dhruvmanila added bug Something isn't working parser Related to the parser labels Jun 26, 2024

dhruvmanila requested a review from MichaReiser as a code owner June 26, 2024 03:17

dhruvmanila mentioned this pull request Jun 26, 2024

Re-lexing should avoid line continuation in a comment #12036

Closed

dhruvmanila commented Jun 26, 2024

View reviewed changes

Consider 2-character EOL before line continuation

61d7288

dhruvmanila force-pushed the dhruv/line-continuation-eol branch from 0ed0107 to 61d7288 Compare June 26, 2024 04:48

MichaReiser approved these changes Jun 26, 2024

View reviewed changes

dhruvmanila merged commit 47c9ed0 into main Jun 26, 2024
20 checks passed

dhruvmanila deleted the dhruv/line-continuation-eol branch June 26, 2024 08:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider 2-character EOL before line continuation #12035

Consider 2-character EOL before line continuation #12035

dhruvmanila commented Jun 26, 2024 •

edited

Loading

github-actions bot commented Jun 26, 2024 •

edited

Loading

dhruvmanila Jun 26, 2024

MichaReiser left a comment •

edited

Loading

MichaReiser Jun 26, 2024

dhruvmanila Jun 26, 2024

dhruvmanila commented Jun 26, 2024 •

edited

Loading

Consider 2-character EOL before line continuation #12035

Consider 2-character EOL before line continuation #12035

Conversation

dhruvmanila commented Jun 26, 2024 • edited Loading

Summary

Test Plan

github-actions bot commented Jun 26, 2024 • edited Loading

ruff-ecosystem results

Linter (stable)

Linter (preview)

Formatter (stable)

Formatter (preview)

dhruvmanila Jun 26, 2024

Choose a reason for hiding this comment

MichaReiser left a comment • edited Loading

Choose a reason for hiding this comment

MichaReiser Jun 26, 2024

Choose a reason for hiding this comment

dhruvmanila Jun 26, 2024

Choose a reason for hiding this comment

dhruvmanila commented Jun 26, 2024 • edited Loading

dhruvmanila commented Jun 26, 2024 •

edited

Loading

github-actions bot commented Jun 26, 2024 •

edited

Loading

`ruff-ecosystem` results

MichaReiser left a comment •

edited

Loading

dhruvmanila commented Jun 26, 2024 •

edited

Loading