Increased Support for Printing Escape Sequences #1502

gptsarthak · 2023-02-05T13:28:27Z

LPython now replicates python behavior with all types of combinations of escape sequences.

def test_new_line():
    print('I said \'Wow\' ')
    print("I said \"hi\" ")
    print("I said \\\"hi\" ")
    print("abc \\ ") 
    print("abc \\\ wow")

test_new_line()

(lp) sarthak@pop-os:~/lpython/examples$ python test.py
I said 'Wow' 
I said "hi" 
I said \"hi" 
abc \ 
abc \\ wow

(lp) sarthak@pop-os:~/lpython/examples$ lpython test.py
I said 'Wow' 
I said "hi" 
I said \"hi" 
abc \ 
abc \\ wow

faze-geek · 2023-02-05T15:21:41Z

src/lpython/parser/semantics.h

@@ -803,6 +803,15 @@ char* unescape(Allocator &al, LCompilers::Str &s) {
        } else if (s.p[idx] == '\\' && s.p[idx+1] == '\'') {
            x += "'";
            idx++;
+        } else if (s.p[idx] == '\\' && s.p[idx+1] == '\"') {


Could you recheck for this case -

A tricky one I came across while randomly trying out a bunch of strings. Although a very improbable input, I would like this pr to perfectly handle all cases.

print('I said \'\\\'hi\\''hey\' ') (lp) C:\Users\kunni\lpython>python try.py I said '\'hi\hey' (lp) C:\Users\kunni\lpython>src\bin\lpython try.py I said '\'hi\hey\'

Thanks !

I have observed that LPython generates incorrect output for the following cases:

print(' ''hey\' ') print(" " " hey\" ")

Meaning, when there are 2 strings in a print function, which have no comma or any operator between them. This is accepted behavior in python.
Like you said this is an impossible input, since a user should put an operator like a comma between two strings, in which case lpython executes correctly.
I do not yet understand why lpython outputs that escape sequence after 'hey', but will try after my classes end. But right now I think that this might be a more parser related issue.

But right now I think that this might be a more parser related issue.

Try checking the AST using --show-ast command line argument in lpython.

I was kinda right - lpython considers these both of the print statements I wrote as single strings and ignores the quotes in the middle, which somehow leads to the problem of printing the escape sequence.
I believe that in ("a" " b") or ('a' 'b'), a and b should be considered two different strings by lpython, which it does not.
Some AST results for further investigations:

print('a b\' ') [(ConstantStr "a b' " ())] // Correct print('a'' b\' ') [(ConstantStr "a b\' " ())] // Incorrect (Same AST but '\' also printed) print('a' ' b\' ') [(ConstantStr "a b\' " ())] // Incorrect (Whitespace ignored) print('a' , ' b\' ') [(ConstantStr "a" ()) (ConstantStr " b' " ())] // Correct (Two strings because of comma) print("a " " b\' ") [(ConstantStr "a b\' " ())] // Incorrect (Same issue with double quotes) print("a' ' b\' ") [(ConstantStr "a' ' b' " ())] // Correct (Nested quotes work)

I think we should open a new issue for this, if it is a problem that needs to be solved.

Yup, thanks for investigating. You can create a new issue or directly open a new PR and work towards fixing it.
If you get any doubts regarding the parser, do ping me I can help you.

czgdp1807

Please add a runnable test for this. See https://github.com/lcompilers/lpython/tree/main/integration_tests for the pattern.

Thirumalai-Shaktivel · 2023-02-10T13:08:47Z

integration_tests/test_str_03.py

+def test_escape_sequences():
+    print('I said \'Wow\' ')
+    print("I said \"hi\" ")
+    print("I said \\\"hi\" ")
+    print('I said \'\\\'hi\\''hey\' ')
+    print('a''b')
+    print("a" "b")


Can you please add this test to tests/parser/string2.py as well? (so that we test the AST changes)

Thirumalai-Shaktivel · 2023-02-10T13:30:11Z

tests/reference/ast_new-string2-44323ea.stdout

+(Module [(Expr (ConstantStr ""\" ())) (Expr (ConstantStr "
+\" ())) (Expr (ConstantStr "\" ())) (Expr (ConstantStr "   \" ())) (Expr (ConstantStr "\ " ())) (Expr (ConstantStr "\" ())) (Expr (ConstantStr "\|" ())) (Expr (ConstantStr "Text\\" ())) (Expr (ConstantStr "\" ())) (Expr (Call (Name x Load) [(ConstantStr "\class a:


Here, ""\", "\", ... seems to be wrong.

Can you cross check with Python's output and make the changes accordingly in semantics.h? you can use python -m ast tests/parser/string2.py to verify the Python's AST.

Thirumalai-Shaktivel · 2023-02-10T13:30:37Z

src/lpython/parser/semantics.h

@@ -774,9 +799,12 @@ static inline ast_t* concat_string(Allocator &al, Location &l,
            throw LCompilers::LPython::parser_local::ParserError(
                "The byte and non-byte literals can not be combined", l);
        }
-    } else {
+    } else {  


Suggested change

} else {

} else {

Thirumalai-Shaktivel · 2023-02-14T07:23:34Z

src/lpython/parser/semantics.h

+        } else if (s.p[idx] == '\\' && s.p[idx+1] == ' ') {
+            x += "\\ ";
+            idx++;
+        } else if (s.p[idx] == '\\') {
+            x += "\\";
+            idx++;


Comment this, check the output, uncomment or add new lines, check the output...; play around with this. Please make sure that you get the same output as CPython ast

gptsarthak · 2023-02-15T16:06:34Z

In the latest commit I have :

Added code to escape the strings in the AST. This required me to edit the asdl_cpp.py file to emit the correct code in the visit_ConstantStr function in the Pickle Visitor class.
Made the strings in LPython AST be same as you would get in python -m AST {file-name} by escaping newline
Fixed printing of backslash when used as a continuation character at the end of a line. We already had tests for this in string2.py and many other files, but they produced incorrect AST and output before.
Refactored code according to suggested changes.
Added test in parser/string2.py

There is one issue left to solve. After making the changes mentioned in first point, I observed that visit_ConstantBytes is also getting edited along the visit_ConstantStr function. I could not find a way to differentiate between the two inside make_visitor. If that is a problem, we can discuss and fix it asap. After that I believe we can merge this PR.

Thirumalai-Shaktivel · 2023-02-16T06:58:06Z

src/libasr/asdl_cpp.py

+                elif field.name == "value":
+                    self.emit(  's.append(\"\\\"\");',2)
+                    self.emit(  "std::string str = std::string(x.m_value);",2)
+                    self.emit(  "for (size_t idx=0; idx < str.size(); idx++) {",2)
+                    self.emit(      "if (str[idx] == \'\\n\') {",3)
+                    self.emit(          "s += \"\\\\n\";",4)
+                    self.emit(      "} else if (str[idx] == '\\\\') {",3)
+                    self.emit(          "s += \"\\\\\\\\\";",4)
+                    self.emit(      "} else {",3)
+                    self.emit(          "s += str[idx];",4)
+                    self.emit(      "}",3)
+                    self.emit(  "}",2)
+                    self.emit(  "s.append(\"\\\"\");",2)


I think we have to handle this in semantics.h itself.

I have tried handling this in the semantics alone, but could not do it because that changes the output of the code. It changes the value in the ConstantStr node. If we change that, we'll have to handle this again in the backend (python_ast_to_asr.cpp i guess) to get the same output as python. If that is what we need I'll try to do that myself first, otherwise you should take over this PR. Thank you

Thirumalai-Shaktivel · 2023-02-16T07:00:23Z

tests/reference/ast_new-string2-44323ea.stdout

-" ())) (Expr (ConstantStr "
-    '\'
-" ()))] [])
+(Module [(Expr (ConstantStr ""\\" ())) (Expr (ConstantStr "\n\\" ())) (Expr (ConstantStr "\\" ())) (Expr (ConstantStr "   \\" ())) (Expr (ConstantStr "\\ " ())) (Expr (ConstantStr "" ())) (Expr (ConstantStr "\\|" ())) (Expr (ConstantStr "I said 'Wow' " ())) (Expr (ConstantStr "I said "hi" " ())) (Expr (ConstantStr "I said \\"hi" " ())) (Expr (ConstantStr "I said '\\'hi\\hey' " ())) (Expr (ConstantStr "ab" ())) (Expr (ConstantStr "ab" ())) (Expr (ConstantStr "Text\\\\" ())) (Expr (ConstantStr "" ())) (Expr (Call (Name x Load) [(ConstantStr "class a:\n    def test():\n        return r"\\LaTeX"\n" ())] [])) (Expr (ConstantStr "\n    "\\\\"\n" ())) (Expr (ConstantStr "\n    '\\'\n" ()))] [])


Everything seems to be correct. Except ""\\", which should be '"\\' or "\"\\" (and "\n "\\\\"\n" too). I think we don't have to unescape " in this PR. Instead, Let's create a new issue or work on another PR.

Yes, please create a new issue to work on this.
Right now what i think is, in Lpython's ConstantStr node we use string value while python uses constant value. I think this is why the strings are wrapped in double quotes instead of single quotes, which is the only difference I see here. If fixing this is simply not unescaping ", that will be easy and I'll do it in my next commit in this PR only.
In python,
Update : Python AST has value exactly what we get using the repr() method,

mostly wrapped in single quotes (example 'I said \\"hi" ')

unless there is a single quote inside string in which case it wraps with double quotes. (example - "I said 'Wow' ")

If both single and double quotes are present inside string, it wraps with single quotes and escapes the single quotes inside. (example - 'I said "\'Wow\'" ')

I can try implementing this but I need help with how we would wrap our string value around with single quote

Thirumalai-Shaktivel · 2023-02-16T07:03:26Z

Thank you, @gptsarthak for working on this. It was very much required.
I have left some comments, you can work on it if you know what to do. Otherwise, I will try to work on it coming weekend.

Co-authored-by: gptsarthak <sarthakgpt95@gmail.com>

Thirumalai-Shaktivel · 2023-02-19T12:46:02Z

@gptsarthak I made the required changes.
Now, you can review and test it. If you have any doubts or find any mistakes, please ping me.

czgdp1807

Looks okay to me.

anutosh491 · 2023-02-23T04:39:16Z

LGTM ! I did some testing and it looks fine .

Thirumalai-Shaktivel · 2023-02-23T04:40:22Z

Thanks for the approval and testing.

Thirumalai-Shaktivel · 2023-02-23T04:41:43Z

@gptsarthak is it okay if I merge this?
Do you wanna test or make some other changes??

czgdp1807 · 2023-03-02T14:31:43Z

Conflicts should be resolved here.

gptsarthak · 2023-03-05T13:47:43Z

I believe I resolved the conflict. Its my first time dealing with them so hopefully it went right.
@Thirumalai-Shaktivel thanks for working on this! I did some tests and they ran fine. Your commits taught me a lot and gave an insight on how the issue was supposed to be solved. It can be merged now.
I believe I can help out with #1529 (comment) after this is merged.

certik · 2023-05-04T16:55:53Z

I think this PR will have to be fixed. It seems it keeps the strings escaped in ASR. I think that's not a good design, because then we need to be unescaping the strings in almost all the backends. Also, the escaping and unescaping will be different in every surface language, in particular it is different in Fortran and Python.

Rather, a proper design is what we had in master before this PR, that is:

Strings should be unescaped, represented as utf-8 strings, directly. I think they can be null terminated, that's probably fine.
The frontend's job is to unescape the strings, based on the surface level language rules.
The backend's job is to properly escape the string as needed, based on the language, it will in general be different for Julia, Python, Fortran, C, etc. Most lowering backends like LLVM and WASM will not be escaping the string at all.

faze-geek reviewed Feb 5, 2023

View reviewed changes

czgdp1807 marked this pull request as draft February 6, 2023 04:42

gptsarthak requested review from Thirumalai-Shaktivel and czgdp1807 and removed request for Thirumalai-Shaktivel February 9, 2023 16:03

czgdp1807 requested changes Feb 9, 2023

View reviewed changes

Thirumalai-Shaktivel reviewed Feb 10, 2023

View reviewed changes

Thirumalai-Shaktivel reviewed Feb 14, 2023

View reviewed changes

gptsarthak requested a review from Thirumalai-Shaktivel February 15, 2023 17:11

Thirumalai-Shaktivel reviewed Feb 16, 2023

View reviewed changes

Thirumalai-Shaktivel and others added 3 commits February 19, 2023 17:58

[Parser] Handle escape sequences

8a87406

Co-authored-by: gptsarthak <sarthakgpt95@gmail.com>

Fix the escape sequence handling in the backends

fbbe0e8

Update tests

98497a8

Thirumalai-Shaktivel force-pushed the fix branch from c095e02 to 98497a8 Compare February 19, 2023 12:43

Thirumalai-Shaktivel marked this pull request as ready for review February 20, 2023 13:54

czgdp1807 approved these changes Feb 22, 2023

View reviewed changes

czgdp1807 marked this pull request as draft March 2, 2023 14:31

Merge branch 'main' into fix

f491a96

Merge branch 'main' into fix

7685a56

gptsarthak and others added 4 commits March 5, 2023 19:40

Merge branch 'main' of github.com:gptsarthak/lpython into fix

8b230e5

Update references

1eb4a11

Merge branch 'lcompilers:main' into fix

79c74c1

Merge branch 'fix' of github.com:gptsarthak/lpython into fix

e3b7809

Thirumalai-Shaktivel marked this pull request as ready for review March 6, 2023 05:48

Thirumalai-Shaktivel merged commit 3c53ffb into lcompilers:main Mar 6, 2023

anutosh491 mentioned this pull request Mar 10, 2023

Extended escape sequence (tab and backspace) support #1529

Closed

gptsarthak deleted the fix branch March 21, 2023 07:47

certik mentioned this pull request May 4, 2023

ASR must keep strings unescaped #1759

Closed

Thirumalai-Shaktivel mentioned this pull request May 22, 2023

Sync libasr with LFortran #1847

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Increased Support for Printing Escape Sequences #1502

Increased Support for Printing Escape Sequences #1502

gptsarthak commented Feb 5, 2023

faze-geek Feb 5, 2023

gptsarthak Feb 6, 2023

czgdp1807 Feb 6, 2023

gptsarthak Feb 6, 2023

Thirumalai-Shaktivel Feb 7, 2023

czgdp1807 left a comment

Thirumalai-Shaktivel Feb 10, 2023

Thirumalai-Shaktivel Feb 10, 2023 •

edited

Loading

Thirumalai-Shaktivel Feb 10, 2023

Thirumalai-Shaktivel Feb 14, 2023 •

edited

Loading

gptsarthak commented Feb 15, 2023 •

edited

Loading

Thirumalai-Shaktivel Feb 16, 2023

gptsarthak Feb 16, 2023

Thirumalai-Shaktivel Feb 16, 2023 •

edited

Loading

gptsarthak Feb 16, 2023 •

edited

Loading

Thirumalai-Shaktivel commented Feb 16, 2023

Thirumalai-Shaktivel commented Feb 19, 2023

czgdp1807 left a comment

anutosh491 commented Feb 23, 2023

Thirumalai-Shaktivel commented Feb 23, 2023

Thirumalai-Shaktivel commented Feb 23, 2023

czgdp1807 commented Mar 2, 2023

gptsarthak commented Mar 5, 2023

certik commented May 4, 2023

		(Module [(Expr (ConstantStr ""\" ())) (Expr (ConstantStr "
		\" ())) (Expr (ConstantStr "\" ())) (Expr (ConstantStr " \" ())) (Expr (ConstantStr "\ " ())) (Expr (ConstantStr "\" ())) (Expr (ConstantStr "\\|" ())) (Expr (ConstantStr "Text\\" ())) (Expr (ConstantStr "\" ())) (Expr (Call (Name x Load) [(ConstantStr "\class a:

Increased Support for Printing Escape Sequences #1502

Increased Support for Printing Escape Sequences #1502

Conversation

gptsarthak commented Feb 5, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

czgdp1807 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Thirumalai-Shaktivel Feb 10, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Thirumalai-Shaktivel Feb 14, 2023 • edited Loading

Choose a reason for hiding this comment

gptsarthak commented Feb 15, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Thirumalai-Shaktivel Feb 16, 2023 • edited Loading

Choose a reason for hiding this comment

gptsarthak Feb 16, 2023 • edited Loading

Choose a reason for hiding this comment

Thirumalai-Shaktivel commented Feb 16, 2023

Thirumalai-Shaktivel commented Feb 19, 2023

czgdp1807 left a comment

Choose a reason for hiding this comment

anutosh491 commented Feb 23, 2023

Thirumalai-Shaktivel commented Feb 23, 2023

Thirumalai-Shaktivel commented Feb 23, 2023

czgdp1807 commented Mar 2, 2023

gptsarthak commented Mar 5, 2023

certik commented May 4, 2023

Thirumalai-Shaktivel Feb 10, 2023 •

edited

Loading

Thirumalai-Shaktivel Feb 14, 2023 •

edited

Loading

gptsarthak commented Feb 15, 2023 •

edited

Loading

Thirumalai-Shaktivel Feb 16, 2023 •

edited

Loading

gptsarthak Feb 16, 2023 •

edited

Loading