True -- indicates a function declaration, which requires a return statement.
+> - False -- indicates a procedure declaration, which does not require a return statement.
+
+---
+
+### 5.2.2 Calls
+
+**procedure**
+
+```python
+# proc_stmt : ID
+#           | SYS_PROC
+```
+
+```python
+    if root_node.type == 'proc_stmt-simple':
+        self._add_new_quad('call', children[0])
+```
+
+> Here we simply emit `call+name`, which denotes an unconditional jump to the procedure's code.
+
+```python
+# proc_stmt : ID LP args_list RP
+#           | SYS_PROC LP expression_list RP
+#           | kREAD LP factor RP
+```
+
+```python
+    if children[0] == 'read':
+        self._add_new_quad('begin_args', None)
+        args_val = self.gen_quad_list_from_expression_node(children[1])
+        self._add_new_quad('read', args_val)
+```
+
+> This statement is dedicated to the `read` function and takes the form `read args`. Because the grammar has a separate production for it, it is handled separately.
+
+```python
+# expression_list : expression_list COMMA expression
+#                 | expression
+# args_list : args_list COMMA expression
+#           | expression
+```
+
+```python
+    else:
+        self._add_new_quad('begin_args', None)
+        args_list = traverse_skew_tree_gen(children[1], 'expression')
+        for args in args_list:
+            if isinstance(args, Node):
+                ret_val = self.gen_quad_list_in_expression_node(args)
+                self._add_new_quad('args', ret_val)
+            else:
+                self._add_new_quad('args', args)
+        self._add_new_quad('call', children[0])
+```
+
+> This branch handles the first two productions. Since `args_list` and `expression_list` are essentially the same, they can be handled together. The argument values are computed first: the left-recursive syntax tree is flattened and then scanned from left to right, with each `expression` representing the computation of one argument. Finally the call is emitted; there is no return value.
+
+**function**
+
+```python
+# factor : SYS_FUNCT
+#        | ID LP args_list RP
+#        | SYS_FUNCT LP args_list RP
+```
+
+```python
+    elif expression_node.type == 'factor-func':
+        target = self.new_tmp_var
+        if len(expression_node.children) == 1:
+            self._add_new_quad('call', expression_node.children[0], target)
+        else:
+            func_name, args_list_node = expression_node.children
+            self._add_new_quad('begin_args', None)
+            args_list = traverse_skew_tree_gen(args_list_node, 'expression')
+            for args in args_list:
+                ret_val = self.gen_quad_list_in_expression_node(args)
+                self._add_new_quad('args', ret_val)
+            self._add_new_quad('call', func_name, target)
+        return target
+```
+
+> 
Function calls belong to the `expression` family because a function call has a return value and can appear on the right-hand side of an assignment. The basic approach is the same as for procedure calls, with built-in parameterless functions handled separately from functions that take arguments. The only difference is that the name of the temporary variable holding the return value must be returned at the end.
+
+---
+
+
+
+## 5.3 Assignment Statement
+
+This part is relatively simple: it only needs to distinguish a few kinds of lvalue.
+
+### 5.3.1 Ordinary Variables
+
+```python
+# ID ASSIGN expression
+```
+
+```python
+    if root_node.type == 'assign_stmt':
+        if len(children) == 2:
+            id_, maybe_expression_node = children
+            if not isinstance(maybe_expression_node, Node):
+                self._add_new_quad(None, id_, maybe_expression_node)
+            else:
+                val = self.gen_quad_list_in_expression_node(maybe_expression_node)
+                self._add_new_quad(None, id_, val)
+```
+
+> The right-hand side is passed to the `expression` handler, provided it is a node rather than a constant; the result is then assigned directly.
+
+---
+
+### 5.3.2 Array Variables
+
+```python
+# ID LB expression RB assign expression
+```
+
+```python
+    elif root_node.type == 'assign_stmt-arr':
+        id_, index_expression_node, val_expression_node = children
+        index_val = self.gen_quad_list_in_expression_node(index_expression_node)
+        assign_val = self.gen_quad_list_in_expression_node(val_expression_node)
+        self._add_new_quad(None, f'{id_}[{index_val}]', assign_val)
+```
+
+> Here too, two `expression` calls obtain the index value and the right-hand value. Following the textbook, array notation is used directly in the three-address code output.
+
+---
+
+### 5.3.3 Record Variables
+
+```python
+# assign_stmt : ID DOT ID ASSIGN expression
+```
+
+```python
+    else:
+        record_name, field_name, val_expression = root_node.children
+        address_var = self.new_tmp_var
+        self._add_new_quad('address', address_var, record_name, field_name)
+        ret_val = self.gen_quad_list_in_expression_node(val_expression)
+        self._add_new_quad(None, '*' + address_var, ret_val)
+```
+
+> The only point worth noting is that record member access follows the textbook form:
+>
+>     t = &x + field_offset(x, field_name)
+
+---
+
+
+
+## 5.4 Control Statement
+
+This is one of the core parts of code generation. It follows essentially the same patterns as assembly and has many cases to handle, since Pascal includes many kinds of loop statements. Once the ordering and logic are sorted out, however, it is not complicated. Because this part is mainly about control statements, the code is not reproduced below, as it is verbose and hard to follow; only the control flow is shown.
+
+### 5.4.1 if statement
+
+```python
+# if_stmt : kIF expression kTHEN stmt else_clause
+# else_clause : kELSE stmt
+#             | empty
+```
+
+```
+t <- if_expression
+if_false t goto else_label
+
+{if_stmt part}
+
+goto exit_label
+else_label
+
+{else_stmt part}?(if present)
+
+exit_label
+```
+
+---
+
+### 5.4.2 repeat statement
+
+```python
+# repeat_stmt : kREPEAT stmt_list kUNTIL expression
+# stmt_list : stmt_list stmt SEMICON
+#           | empty
+```
+
+```
+enter_label
+
+{repeat_stmt}
+
+t <- repeat_expression
+if_false t goto exit_label
+goto enter_label
+exit_label
+```
+
+---
+
+### 5.4.3 while statement
+
+```python
+# while_stmt : kWHILE expression kDO stmt
+```
+
+```
+judge_label
+t <- while_expression
+if_false t goto exit_label
+
+{while_stmt}
+
+goto judge_label
+exit_label
+```
+
+---
+
+### 5.4.4 for statement
+
+```python
+# for_stmt : kFOR ID ASSIGN expression direction expression kDO stmt
+# direction : kTO
+#           | kDOWNTO
+```
+
+```
+ID <- start_value_expression
+bound_var <- end_value_expression
+
+judge_label
+
+*'to': t <- ID <= bound_var
+*'downto': t <- ID >= bound_var
+
+if_false t goto exit_label
+
+{for_stmt}
+
+*'to': ID <- ID + 1
+*'downto': ID <- ID - 1
+
+goto judge_label
+exit_label
+```
+
+---
+
+### 5.4.5 case statement
+
+```python
+# case_stmt : kCASE expression kOF case_expr_list kEND
+# case_expr_list : case_expr_list case_expr
+#                | case_expr
+# case_expr : const_value COLON stmt SEMICON
+#           | ID COLON stmt SEMICON
+#           | kELSE COLON stmt SEMICON
+```
+
+```
+choose_var <- case_expression
+case_list <- flatten the case_expr_list tree
+for case_expr in case_list
+
+{judge_value <- case_expr.children[0]
+ next_label <- new_label
+ t <- choose_var == judge_value
+ if_false t goto next_label
+ {case_expr_stmt}
+ goto exit_label
+ next_label}
+
+exit_label
+```
+
+### 5.4.6 goto statement
+
+> We do not support the `goto` statement because it complicates semantic analysis.
+
+---
+
+
+
+## 5.5 Expression Statement
+
+This part is also very important: it covers all arithmetic operations and is a fundamental part of the analysis. It consists mainly of two functions:
+
+- **def gen_quad_list_in_expression_node(self, expression_node)**
+- **def gen_quad_list_from_expression_node(self, expression_node)**
+
+### 5.5.1 expression node
+
+This function deals specifically with `expression` nodes. Because our yacc rules require every expression to have a node, this function is effectively the entry point for arithmetic operations.
+
+```python
+# 
expression : expression GE expr
+#            | expression GT expr
+#            | expression LE expr
+#            | expression LT expr
+#            | expression EQUAL expr
+#            | expression UNEQUAL expr
+#            | expr
+```
+
+```python
+def gen_quad_list_in_expression_node(self, expression_node):
+    if not isinstance(expression_node, Node):
+        return expression_node
+    if len(expression_node.children) == 1:
+        return self.gen_quad_list_from_expression_node(expression_node.children[0])
+    else:
+        left_val = self.gen_quad_list_in_expression_node(expression_node.children[0])
+        right_val = self.gen_quad_list_from_expression_node(expression_node.children[2])
+        target = self.new_tmp_var
+        op = expression_node.children[1]
+        if op == '=':
+            self._add_new_quad('==', target, left_val, right_val)
+        elif op == '<>':
+            self._add_new_quad('!=', target, left_val, right_val)
+        else:
+            self._add_new_quad(op, target, left_val, right_val)
+        return target
+```
+
+> The only distinction made here is whether the node has more than one child. If not, the corresponding value or temporary variable is returned directly, depending on whether the argument is a node. If so, the function recursively calls itself for the left operand and the lower-level handler for the right operand, then emits the comparison on the two results.
+
+---
+
+### 5.5.2 expr/term/factor node
+
+This is divided into four parts: `internal expression node`, `factor node`, `expr-OR/term-AND node`, and `left node`.
+
+**internal expression node**
+
+```python
+# factor : LP expression RP
+```
+
+```python
+    if expression_node.type == 'expression':
+        return self.gen_quad_list_in_expression_node(expression_node)
+```
+
+> This is a special case. In practice the inner expression node is attached directly under the outer expression node, but the check is added here to be safe.
+
+---
+
+**factor node**
+
+This is further divided into several kinds of rvalue, similar to the lvalue cases in `assignment statement`.
+
+```python
+# kNOT factor
+# SUBSTRACT factor
+```
+
+```python
+    elif expression_node.type == 'factor':
+        children = expression_node.children
+        unary_op, right_child = children
+        right_val = self.gen_quad_list_from_expression_node(right_child)
+        target = self.new_tmp_var
+        self._add_new_quad(unary_op, target, right_val)
+        return target
+```
+
+> This handles the unary operators (not, -).
+
+
+
+```python
+# factor : ID LB expression RB
+```
+
+```python
+    elif expression_node.type == 'factor-arr':
+        arr_id, right_child_node = 
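expression_node.children
+        index_val = self.gen_quad_list_from_expression_node(right_child_node)
+        target = self.new_tmp_var
+        self._add_new_quad(None, target, f'{arr_id}[{index_val}]')
+        return target
+```
+
+> This reads an array element.
+
+
+
+```python
+# factor : ID DOT ID
+```
+
+```python
+    elif expression_node.type == 'factor-member':
+        record_name, field_name = expression_node.children
+        target = self.new_tmp_var
+        self._add_new_quad('address', target, record_name, field_name)
+        return '*' + target
+```
+
+> This reads a record member.
+
+
+
+> For `factor-func`, see the function call section.
+
+---
+
+**expr-OR/term-AND node**
+
+These nodes are treated separately because, when `and/or` operations appear in a chain, the operands must be evaluated one by one from left to right. For `and`, every condition in the chain must hold, so evaluation must stop as soon as one condition fails, since later conditions may well depend on the earlier ones holding. This is also an optimization: there is no need to evaluate the whole chain, because as soon as one condition fails (for `and`) or holds (for `or`), the result is known and control can leave or enter accordingly.
+
+```python
+# expr : expr kOR term
+```
+
+```python
+    if expression_node.type == 'expr-OR':
+        bool_list = traverse_skew_tree_bool(expression_node, 'term', 'expr-OR')
+        jump_label = self.new_label
+        for or_node in bool_list:
+            condition_value = self.gen_quad_list_from_expression_node(or_node)
+            self._add_new_quad('goto', jump_label, 'if', condition_value)
+        target = self.new_tmp_var
+        self._add_new_quad(None, target, 0)
+        exit_label = self.new_label
+        self._add_new_quad('goto', exit_label)
+        self._add_new_quad(jump_label, None)
+        self._add_new_quad(None, target, 1)
+        self._add_new_quad(exit_label, None)
+        return target
+```
+
+> The key piece here is
+>
+> - **def traverse_skew_tree_bool(node, stop_node_type, target_node_type)**
+>
+> This function was introduced in the `overview` section. Its purpose is to split the expression at consecutive `and/or` nodes into one flat sequence, which is then evaluated as described above.
+>
+> `term-AND` is handled analogously and is not repeated here.
+
+
+
+⚠ A problem arises here from the grammar itself.
+
+```pascal
+if (a = 1 and a = 2 and a = 3) then
+```
+
+For this statement, the syntax diagram under our grammar is:
+
+![CG_00](./imgs/CG_00.png)
+
+As we can see, in our grammar `expression` has higher priority than `term`, so when a `shift-reduce` conflict arises the parser prefers `shift`. For example, when `$a=1` is on the stack there are two choices: reduce to produce an `expression`, or keep shifting and later reduce to produce a `term`. Because shift is the default, the right operand is captured by the `term` node, which is not what we want; reduce should take priority here.
+
+However, the condition can be written differently, with every comparison parenthesized:
+
+```pascal
+if ((a = 1) and (a = 2) and (a = 3)) then
+```
+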
+Parenthesizing forces each comparison to be reduced through `(factor)`, which guarantees that an `expression` is produced.
+
+---
+
+**left node**
+
+This refers to the remaining node operations, which can be handled uniformly.
+
+```python
+# expr : expr ADD term
+#      | expr SUBTRACT term
+#      | term
+# term : term MUL factor
+#      | term kDIV factor
+#      | term DIV factor
+#      | term kMOD factor
+#      | factor
+```
+
+```python
+else:
+    if len(expression_node.children) == 1:
+        return self.gen_quad_list_from_expression_node(expression_node.children[0])
+    left_child, right_child = expression_node.children
+    left_val, right_val = self.gen_quad_list_from_expression_node(left_child), \
+        self.gen_quad_list_from_expression_node(right_child)
+
+    bin_op = type_to_bin_op[expression_node.type]
+    target = self.new_tmp_var
+    self._add_new_quad(bin_op, target, left_val, right_val)
+    return target
+```
+
+> The handling here is basic and needs no further explanation.
+
+---
+
+
+
+## 5.6 Optimization
+
diff --git a/report/imgs/CG_00.png b/report/imgs/CG_00.png
new file mode 100644
index 0000000..13d1fc7
Binary files /dev/null and b/report/imgs/CG_00.png differ
diff --git a/yapc/CodeGenerator.py b/yapc/CodeGenerator.py
index 7eb3af2..35ada64 100644
--- a/yapc/CodeGenerator.py
+++ b/yapc/CodeGenerator.py
@@ -69,6 +69,11 @@ def __init__(self, ast, symbol_table): self._next_tmp_var = 0 self._routine_stack = [] + def write_file(self, file_path): + out = open(file_path, 'w') + out.writelines([str(quad) + '\n' for quad in self._quad_list]) + out.close() + @property def abstract_syntax_tree(self): return self._ast
@@ -134,7 +139,7 @@ def _traverse_ast_gen_code(self, root_node): children = root_node.children if root_node.type == 'assign_stmt': # ID ASSIGN expression - if len(children) == 2: # ID ASSIGN expression + if len(children) == 2: # maybe_expression_node, because it may have been a constant folding result id_, maybe_expression_node = children if not isinstance(maybe_expression_node, Node):
diff --git a/yapc/parselog.txt b/yapc/parselog.txt
index 5bff3bd..90a6c34 100644
--- a/yapc/parselog.txt
+++ b/yapc/parselog.txt
@@ -5,471 +5,791 @@ yacc.py: 445:Action : Shift and goto 
state 3 yacc.py: 410: yacc.py: 411:State : 3 - yacc.py: 435:Stack : kPROGRAM . f"Syntax error at token `{p[3].value}` in type definition.") + f"Syntax error at token `{p[3].value}` in type definition.") def p_type_decl(p): @@ -175,11 +175,13 @@ def p_record_type_decl(p): 'record_type_decl : kRECORD field_decl_list kEND' p[0] = Node("record", p.lexer.lineno, p[2]) -#record定义出错 + +# record定义出错 def p_record_type_decl_error(p): 'record_type_decl : kRECORD error kEND' SemanticLogger.error(p[2].lineno, - f"Syntax error at token `{p[2].value}` in record definition.") + f"Syntax error at token `{p[2].value}` in record definition.") + def p_field_decl_list(p): '''field_decl_list : field_decl_list field_decl @@ -194,11 +196,13 @@ def p_field_decl(p): 'field_decl : name_list COLON type_decl SEMICON' p[0] = Node("field_decl", p.lexer.lineno, p[1], p[3]) -#record的成员变量定义出错 + +# record的成员变量定义出错 def p_field_decl_error(p): 'field_decl : error SEMICON' SemanticLogger.error(p[1].lineno, - f"Syntax error at token `{p[1].value}` in record member definition.") + f"Syntax error at token `{p[1].value}` in record member definition.") + def p_name_list(p): '''name_list : name_list COMMA ID @@ -231,7 +235,7 @@ def p_var_decl(p): p[0] = Node("var_decl", p.lexer.lineno, p[1], p[3]) -#var定义出错 +# var定义出错 def p_var_decl_error(p): 'var_decl : error SEMICON' SemanticLogger.error(p[1].lineno, @@ -263,7 +267,7 @@ def p_function_decl(p): p[0] = Node("function_decl", p.lexer.lineno, p[1], p[3]) -#函数体出错 +# 函数体出错 def p_function_decl_error(p): """ function_decl : function_head SEMICON error SEMICON @@ -271,6 +275,7 @@ def p_function_decl_error(p): SemanticLogger.error(p[3].lineno, f"Syntax error at token `{p[3].value}` in function definition.") + def p_function_head(p): 'function_head : kFUNCTION ID parameters COLON simple_type_decl' p[0] = Node("function_head", p.lexer.lineno, p[2], p[3], p[5]) @@ -280,7 +285,8 @@ def p_procedure_decl(p): 'procedure_decl : procedure_head SEMICON sub_routine SEMICON' p[0] 
= Node("procedure_decl", p.lexer.lineno, p[1], p[3]) -#procedure体出错 + +# procedure体出错 def p_procedure_decl_error(p): 'procedure_decl : procedure_head SEMICON error SEMICON' SemanticLogger.error(p[3].lineno, @@ -355,7 +361,7 @@ def p_stmt_list(p): p[0] = Node("stmt_list", p.lexer.lineno, p[1], p[2]) -#statement list出错 +# statement list出错 def p_stmt_list_error(p): 'stmt_list : stmt_list error SEMICON' SemanticLogger.error(p[2].lineno, @@ -639,18 +645,15 @@ def p_empty(p): logging.basicConfig( - level=logging.DEBUG, - filename="parselog.txt", - filemode="w", - format="%(filename)10s:%(lineno)4d:%(message)s" - ) - + level=logging.DEBUG, + filename="parselog.txt", + filemode="w", + format="%(filename)10s:%(lineno)4d:%(message)s" +) log = logging.getLogger() parser = yacc.yacc(debug=yacc.NullLogger(), debuglog=log) - if __name__ == '__main__': raise NotImplementedError("{} is just a module".format(__file__)) - diff --git a/yapc/yapc.py b/yapc/yapc.py index ba7b2d0..ce9a7fc 100755 --- a/yapc/yapc.py +++ b/yapc/yapc.py @@ -16,9 +16,10 @@ # test_file = 'test_yacc/simple.pas' test_file = 'demo_test_cases/assign_demo.pas' +out_file = './yapc.out' arg_parser = argparse.ArgumentParser() arg_parser.add_argument('--input', help='input pascal file', default=test_file) -arg_parser.add_argument('--output', help='output intermediate code path', default='./yapc.out') +arg_parser.add_argument('--output', help='output intermediate code path', default=out_file) args = arg_parser.parse_args() start = time.clock() @@ -41,7 +42,11 @@ SemanticLogger.info(None, 'producing three address code') code_generator = CodeGenerator(parse_tree_root, static_semantic_analyzer.symbol_table) code_generator.gen_three_address_code() - _ = [print(quadruple) for quadruple in code_generator.quadruple_list] + if args.output: + out_file = args.output + code_generator.write_file(out_file) + else: + _ = [print(quadruple) for quadruple in code_generator.quadruple_list] SemanticLogger.info(None, 'done') end = 
time.clock()