Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backslash Escape Semantics #2740

Closed
wants to merge 6 commits into from
Closed

Backslash Escape Semantics #2740

wants to merge 6 commits into from

Conversation

masenf
Copy link
Collaborator

@masenf masenf commented Dec 17, 2022

Semantics

As written in this patch, the following rules now apply for escaping the literal backslash

  1. \\ will be treated as an escape sequence for a literal backslash \. In a value, two backslashes will never result in escaping some other special character to the right.
  2. If the value of a replacement is a single backslash \, then it is treated as a literal and will never result in escaping some other special character to the right.
  3. If the value of a replacement contains a backslash and other characters, then it is NOT treated as a literal backslash and may still affect adjacent characters. This is to allow recursive env expansion to expand to values with escaped specials. (i.e. {env:FOO} where FOO="fun with \\{env:BAR}" to avoid expanding {env:BAR} after the sub). However, this also opens the door to the shenanigans seen in tox 3.27.1 to 4.0.9, environment var substitution failing #2732 where FOO="bad idea\\" and the value is something like {env:FOO}{env:BAR}... in this case, the trailing backslash from $FOO ends up escaping the next substitution. I couldn't really think of a nice way to handle this while remaining general and not over complicating the code. I don't think it will come up often.

Fixes #2732

Thanks for contribution

Please, make sure you address all the checklists (for details on how see
development documentation)!

  • ran the linter to address style issues (tox -e fix)
  • wrote descriptive pull request text
  • ensured there are test(s) validating the fix
  • added news fragment in docs/changelog folder
  • updated/extended the documentation

The resulting escaped value should not then be
subsequently used as an escape in future replacement
rounds.

Additional friends, like triple backslash, which
should escape the following, and quad backslash,
which should escape to two single backslashes
and not escape the following characters.

The Quad backslash test case also has escaped
double backslash at the beginning and end of the
string for more edge coverage.
explicit test case for issue tox-dev#2732 which affects
{/} os.sep replacement on windows platform.
When the replacement value is a backslash, it shouldn't escape the next
part of the value.

However, if the replacement contains a backslash, then it could affect
the next value.
1. Split the value to be replaced on double
backslash.
2. If there is more than 1 split, recursively
perform replace on each sub value.
3. Rejoin replaced sub values on single backslash.

This allows "\\" to be an escape for "\" while
preserving recursive expansion semantics, and
not having already-escaped backslashes affect
processing replacements in other parts of
the string.

Fixes tox-dev#2732
if the replaced value is a single backslash, then process replace on the
left and right parts of the value separately so that the replacement
cannot act as an escape in other parts of the value.

Fixes tox-dev#2732
Copy link
Member

@gaborbernat gaborbernat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll need to make sure the CI passes 👍

@masenf masenf marked this pull request as draft December 17, 2022 09:07
@masenf
Copy link
Collaborator Author

masenf commented Dec 17, 2022

Converted back to draft to add some additional unit test coverage. The CI just happened to expose a missing edge case. And I've written a test that passes before this patch and exposes the issue:

    result = replace_one("{env:_:{posargs}{/}{posargs}}", ["foo"])
    assert result == f"foo{os.sep}foo"

splitting on the backslash and calling replace on the left and right subexpression breaks the parser because it can't find the trailing } to go with the {env, so it doesn't expand it. i actually don't think i can proceed with this approach.

next simplest solution I can think of is replacing \\ with a really special unicode point as a surrogate during the processing, and then just putting it back at the very end.

to truly solve this in a general way, might require parsing subexpressions with strict bounds. the code would be more complicated, but it could fix other weird edge cases around subexpressions expanding and breaking surrounding syntax, or having constructs that are impossible to escape within the current "whole expression" iterative replace algorithm.

the simple is surely a hack, but maybe a nice compromise to unbreak windows {/} in the short term with almost no code 😁.

@gaborbernat
Copy link
Member

It's up to you because you're the one needing to do the work. In the long term we'd obviously prefer the more complicated one 😅

@gaborbernat
Copy link
Member

@masenf you still plan to push this through?

@masenf
Copy link
Collaborator Author

masenf commented Dec 20, 2022

@gaborbernat yes, i still plan to work on it but I have been delayed with the holidays upcoming.

I will close this PR since this solution is no good, and then when I write the updated parser tests and the recursive replacer, I will open a new PR to fix #2732.

In the meantime if someone else wanted to take a stab at #2732, they are free to proceed or build upon my research here.

@masenf masenf closed this Dec 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tox 3.27.1 to 4.0.9, environment var substitution failing
2 participants