Handling of C++ parsing ambiguities #1

Open
certik opened this issue Dec 7, 2023 · 0 comments
certik commented Dec 7, 2023

List of parsing ambiguities and how to handle them

The general strategy is to fully disambiguate at the tokenizer level, then parse with an LALR(1) grammar using Bison to build the AST, and finally perform AST->ASR semantic analysis.

Templates and comparison

Example: func<4 > 2> and func<4 > 2 > 1 > 3 > 3 > 8 > 9 > 8 > 7 > 8>.

Solution: a comparison operator must have spaces on both sides, so a > b is a comparison. Otherwise < or > is the beginning or end of template parameters. In func<4 > 2> the middle > is therefore a comparison, and the template argument is 4 > 2.
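The whitespace rule above could be sketched roughly as follows. This is only an illustration, not the project's tokenizer: the names AngleKind and classify_angle are hypothetical, and the sketch ignores multi-character operators such as >>, >= and ->.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical token kinds for '<' and '>' (illustrative names only).
enum class AngleKind { Comparison, TemplateBracket };

// Classify the '<' or '>' at index `pos` of `src` using the whitespace rule:
// whitespace on both sides means comparison, anything else a template bracket.
AngleKind classify_angle(const std::string& src, std::size_t pos) {
    assert(pos < src.size() && (src[pos] == '<' || src[pos] == '>'));
    const bool space_before =
        pos > 0 && std::isspace(static_cast<unsigned char>(src[pos - 1]));
    const bool space_after =
        pos + 1 < src.size() &&
        std::isspace(static_cast<unsigned char>(src[pos + 1]));
    return (space_before && space_after) ? AngleKind::Comparison
                                         : AngleKind::TemplateBracket;
}
```

For func<4 > 2> this classifies the first < and the last > as template brackets and the middle > as a comparison.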

Another example: a::b<c>d; in https://stackoverflow.com/questions/1444961/is-there-a-good-python-library-that-can-parse-c/1447051#1447051.

Solution: this is a template, thus a declaration of d of type a::b<c>. We can even require a space between a type and the variable: a::b<c> d, although in this case it might not be needed.

Another example:

template <bool T1, int T2> class B;
void f(int a = B < c, 5>);         

Solution: here B < c is a comparison operator, because < has spaces on both sides, so the above results in a syntax error. One must write it as void f(int a = B<c, 5>); for it to be parsed as template parameters.

Multiplication vs pointer

Example: A * p, A *p, A* p, A*p.

Solution: * is tokenized as multiplication if it has either no spaces around it or spaces on both sides, so a*b and a * b are both multiplications. Otherwise it is a pointer. So, above, A * p and A*p are both multiplications of A and p, while A *p and A* p both declare a pointer p of type A.
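A minimal sketch of this rule, under the assumption that the tokenizer only looks at the characters immediately around the *. The names StarKind and classify_star are hypothetical, and unary dereference (for example *p at the start of an expression) is not handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical token kinds for '*' (illustrative names only).
enum class StarKind { Multiplication, Pointer };

// Classify the '*' at index `pos` of `src`: symmetric spacing (none, or on
// both sides) means multiplication; one-sided spacing means a pointer.
StarKind classify_star(const std::string& src, std::size_t pos) {
    assert(pos < src.size() && src[pos] == '*');
    const bool space_before =
        pos > 0 && std::isspace(static_cast<unsigned char>(src[pos - 1]));
    const bool space_after =
        pos + 1 < src.size() &&
        std::isspace(static_cast<unsigned char>(src[pos + 1]));
    return (space_before == space_after) ? StarKind::Multiplication
                                         : StarKind::Pointer;
}
```

Comparing the two flags for equality covers both multiplication cases in one test, which matches the four examples listed above.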

Function declaration vs object declaration

Example: int f(A); can be either a function declaration int f(A /*a*/); or a declaration of an integer variable f initialized to A.

Solution: either use extra parentheses, int f((A));, to force a variable declaration, or require int f{A}; or int f = A; for variable initialization.
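For reference, the three unambiguous spellings look like this in standard C++ when A names a value. The constant A and the variable names f1, f2, f3 are made up for this illustration.

```cpp
// Assumption for this example: A is a value, not a type.
constexpr int A = 3;

int f1((A));  // extra parentheses: (A) is an expression, so f1 is a variable
int f2{A};    // brace initialization: f2 is a variable
int f3 = A;   // copy initialization: f3 is a variable
```

None of these forms can be read as a function declaration, so the parser does not need to know whether A is a type.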

Constructor vs object

Example: T ( A ); This can either be a constructor T with an unnamed parameter of type A, or a variable of type T and name A (the parentheses are redundant).

Solution: This will be interpreted as a constructor. To declare a variable instead, remove the redundant parentheses and write T A;.

Function callbacks

One can have arbitrarily complicated function callback syntax.

Example: void (*set_new_handler(void (*)(void)))(void);

Solution: This is hard enough for a human to read that we can simply require splitting it, such as:

typedef void (*new_handler)(void);
new_handler set_new_handler(new_handler);
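As a quick check that the split form loses nothing, the following sketch asserts at compile time that both spellings declare the same function type. The suffixes _direct and _split are added here only so that both declarations can coexist in one file.

```cpp
#include <type_traits>

// Original one-line form (renamed with a hypothetical suffix).
void (*set_new_handler_direct(void (*)(void)))(void);

// Split form using a typedef, as proposed above.
typedef void (*new_handler)(void);
new_handler set_new_handler_split(new_handler);

// Both names have the same function type, so the rewrite is purely syntactic.
static_assert(std::is_same<decltype(set_new_handler_direct),
                           decltype(set_new_handler_split)>::value,
              "both declarations have the same type");
```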

Links

Last version of the Bison parser for C in GCC:

https://github.com/gcc-mirror/gcc/blob/29231b752cbc105c3158b4b45b97f8374f87cbac/gcc/c-parse.in

Last version of the Bison parser for C++ in GCC:

https://github.com/gcc-mirror/gcc/blob/a47a68100f94e7c0679ef8ec478a523bbbaced7b/gcc/cp/parse.y

Other links:

https://stackoverflow.com/questions/6319086/are-gcc-and-clang-parsers-really-handwritten

https://news.ycombinator.com/item?id=34410776

certik added a commit that referenced this issue Dec 12, 2023
* Update README
* Add a build system
* Use Clang
* Fix CMake to find clang
* Add a C