-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Increased Support for Printing Escape Sequences #1502
Conversation
src/lpython/parser/semantics.h
Outdated
@@ -803,6 +803,15 @@ char* unescape(Allocator &al, LCompilers::Str &s) { | |||
} else if (s.p[idx] == '\\' && s.p[idx+1] == '\'') { | |||
x += "'"; | |||
idx++; | |||
} else if (s.p[idx] == '\\' && s.p[idx+1] == '\"') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you recheck for this case -
A tricky one I came across while randomly trying out a bunch of strings. Although a very improbable input, I would like this pr to perfectly handle all cases.
print('I said \'\\\'hi\\''hey\' ')
(lp) C:\Users\kunni\lpython>python try.py
I said '\'hi\hey'
(lp) C:\Users\kunni\lpython>src\bin\lpython try.py
I said '\'hi\hey\'
Thanks !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have observed that LPython generates incorrect output for the following cases:
print(' ''hey\' ')
print(" " " hey\" ")
Meaning, when there are 2 strings in a print function, which have no comma or any operator between them. This is accepted behavior in python.
Like you said this is an impossible input, since a user should put an operator like a comma between two strings, in which case lpython executes correctly.
I do not yet understand why lpython outputs that escape sequence after 'hey', but will try after my classes end. But right now I think that this might be a more parser related issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But right now I think that this might be a more parser related issue.
Try checking the AST using --show-ast
command line argument in lpython
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was kinda right - lpython considers these both of the print statements I wrote as single strings and ignores the quotes in the middle, which somehow leads to the problem of printing the escape sequence.
I believe that in ("a" " b") or ('a' 'b'), a and b should be considered two different strings by lpython, which it does not.
Some AST results for further investigations:
print('a b\' ')
[(ConstantStr "a b' " ())] // Correct
print('a'' b\' ')
[(ConstantStr "a b\' " ())] // Incorrect (Same AST but '\' also printed)
print('a' ' b\' ')
[(ConstantStr "a b\' " ())] // Incorrect (Whitespace ignored)
print('a' , ' b\' ')
[(ConstantStr "a" ()) (ConstantStr " b' " ())] // Correct (Two strings because of comma)
print("a " " b\' ")
[(ConstantStr "a b\' " ())] // Incorrect (Same issue with double quotes)
print("a' ' b\' ")
[(ConstantStr "a' ' b' " ())] // Correct (Nested quotes work)
I think we should open a new issue for this, if it is a problem that needs to be solved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, thanks for investigating. You can create a new issue or directly open a new PR and work towards fixing it.
If you get any doubts regarding the parser, do ping me I can help you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a runnable test for this. See https://github.com/lcompilers/lpython/tree/main/integration_tests for the pattern.
def test_escape_sequences(): | ||
print('I said \'Wow\' ') | ||
print("I said \"hi\" ") | ||
print("I said \\\"hi\" ") | ||
print('I said \'\\\'hi\\''hey\' ') | ||
print('a''b') | ||
print("a" "b") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add this test to tests/parser/string2.py
as well? (so that we test the AST changes)
(Module [(Expr (ConstantStr ""\" ())) (Expr (ConstantStr " | ||
\" ())) (Expr (ConstantStr "\" ())) (Expr (ConstantStr " \" ())) (Expr (ConstantStr "\ " ())) (Expr (ConstantStr "\" ())) (Expr (ConstantStr "\|" ())) (Expr (ConstantStr "Text\\" ())) (Expr (ConstantStr "\" ())) (Expr (Call (Name x Load) [(ConstantStr "\class a: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, ""\"
, "\"
, ... seems to be wrong.
Can you cross check with Python's output and make the changes accordingly in semantics.h
? you can use python -m ast tests/parser/string2.py
to verify the Python's AST.
src/lpython/parser/semantics.h
Outdated
@@ -774,9 +799,12 @@ static inline ast_t* concat_string(Allocator &al, Location &l, | |||
throw LCompilers::LPython::parser_local::ParserError( | |||
"The byte and non-byte literals can not be combined", l); | |||
} | |||
} else { | |||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} else { | |
} else { |
src/lpython/parser/semantics.h
Outdated
} else if (s.p[idx] == '\\' && s.p[idx+1] == ' ') { | ||
x += "\\ "; | ||
idx++; | ||
} else if (s.p[idx] == '\\') { | ||
x += "\\"; | ||
idx++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment this, check the output, uncomment or add new lines, check the output...; play around with this. Please make sure that you get the same output as CPython ast
In the latest commit I have :
There is one issue left to solve. After making the changes mentioned in first point, I observed that |
src/libasr/asdl_cpp.py
Outdated
elif field.name == "value": | ||
self.emit( 's.append(\"\\\"\");',2) | ||
self.emit( "std::string str = std::string(x.m_value);",2) | ||
self.emit( "for (size_t idx=0; idx < str.size(); idx++) {",2) | ||
self.emit( "if (str[idx] == \'\\n\') {",3) | ||
self.emit( "s += \"\\\\n\";",4) | ||
self.emit( "} else if (str[idx] == '\\\\') {",3) | ||
self.emit( "s += \"\\\\\\\\\";",4) | ||
self.emit( "} else {",3) | ||
self.emit( "s += str[idx];",4) | ||
self.emit( "}",3) | ||
self.emit( "}",2) | ||
self.emit( "s.append(\"\\\"\");",2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have to handle this in semantics.h
itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tried handling this in the semantics alone, but could not do it because that changes the output of the code. It changes the value
in the ConstantStr
node. If we change that, we'll have to handle this again in the backend (python_ast_to_asr.cpp
i guess) to get the same output as python. If that is what we need I'll try to do that myself first, otherwise you should take over this PR. Thank you
" ())) (Expr (ConstantStr " | ||
'\' | ||
" ()))] []) | ||
(Module [(Expr (ConstantStr ""\\" ())) (Expr (ConstantStr "\n\\" ())) (Expr (ConstantStr "\\" ())) (Expr (ConstantStr " \\" ())) (Expr (ConstantStr "\\ " ())) (Expr (ConstantStr "" ())) (Expr (ConstantStr "\\|" ())) (Expr (ConstantStr "I said 'Wow' " ())) (Expr (ConstantStr "I said "hi" " ())) (Expr (ConstantStr "I said \\"hi" " ())) (Expr (ConstantStr "I said '\\'hi\\hey' " ())) (Expr (ConstantStr "ab" ())) (Expr (ConstantStr "ab" ())) (Expr (ConstantStr "Text\\\\" ())) (Expr (ConstantStr "" ())) (Expr (Call (Name x Load) [(ConstantStr "class a:\n def test():\n return r"\\LaTeX"\n" ())] [])) (Expr (ConstantStr "\n "\\\\"\n" ())) (Expr (ConstantStr "\n '\\'\n" ()))] []) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything seems to be correct. Except ""\\"
, which should be '"\\'
or "\"\\"
(and "\n "\\\\"\n"
too). I think we don't have to unescape "
in this PR. Instead, Let's create a new issue or work on another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, please create a new issue to work on this.
Right now what i think is, in Lpython's ConstantStr
node we use string value
while python uses constant value
. I think this is why the strings are wrapped in double quotes instead of single quotes, which is the only difference I see here. If fixing this is simply not unescaping "
, that will be easy and I'll do it in my next commit in this PR only.
In python,
Update : Python AST has value exactly what we get using the repr() method,
- mostly wrapped in single quotes (example
'I said \\"hi" '
) - unless there is a single quote inside string in which case it wraps with double quotes. (example -
"I said 'Wow' "
) - If both single and double quotes are present inside string, it wraps with single quotes and escapes the single quotes inside. (example -
'I said "\'Wow\'" '
)
I can try implementing this but I need help with how we would wrap our string value
around with single quote
Thank you, @gptsarthak for working on this. It was very much required. |
Co-authored-by: gptsarthak <sarthakgpt95@gmail.com>
c095e02
to
98497a8
Compare
@gptsarthak I made the required changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks okay to me.
LGTM ! I did some testing and it looks fine . |
Thanks for the approval and testing. |
@gptsarthak is it okay if I merge this? |
Conflicts should be resolved here. |
I believe I resolved the conflict. Its my first time dealing with them so hopefully it went right. |
I think this PR will have to be fixed. It seems it keeps the strings escaped in ASR. I think that's not a good design, because then we need to be unescaping the strings in almost all the backends. Also, the escaping and unescaping will be different in every surface language, in particular it is different in Fortran and Python. Rather, a proper design is what we had in master before this PR, that is:
|
Closes #1473
LPython now replicates python behavior with all types of combinations of escape sequences.