forked from ldct/isicp
-
Notifications
You must be signed in to change notification settings - Fork 0
/
5-5-compilation.html
1740 lines (1415 loc) · 108 KB
/
5-5-compilation.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html>
<head>
<meta charset="UTF-8">
<link rel="stylesheet" type="text/css" href="web-worker-interpreter/deps/codemirror/lib/codemirror.css" />
<link rel="stylesheet" type="text/css" href="css/isicp.css" />
<link rel="stylesheet" type="text/css" href="css/footnotes.css" />
<link rel="stylesheet" type="text/css" href="css/theme.css" />
<script src="js/helper.js"></script>
<script src="js/jquery.min.js"></script>
<script src="web-worker-interpreter/deps/codemirror/lib/codemirror.js"></script>
<script src="web-worker-interpreter/deps/codemirror/mode/scheme/scheme.js"></script>
<script src="web-worker-interpreter/coding.js"> </script>
<script>
set_interpreter_path("web-worker-interpreter/");
set_language("scheme");
</script>
<script src="js/footnotes.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {inlineMath: [['$','$']]}
});
</script>
<script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
<title> iSICP 2.4 - Multiple Representations for Abstract Data </title>
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-36868476-1']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
</head>
<body>
<div id="sidebox">
<div class="tab"></div>
<div class="content">
<p>
<a href="index.html" class="navlink"> <img src='images/home.svg' width=32 height=32> </a>
<span id="toc-link" class="navlink"> <img src='images/list.svg' width=32 height=32> </span>
<span id="currently-editing-link" class="navlink"> <img src='images/file-edit.svg' width=32 height=32> </span>
<script src="http://cdn.jotfor.ms/static/feedback2.js?3.2.310" type="text/javascript">
new JotformFeedback({
formId:'40222623177447',
base:'http://jotform.me/',
windowTitle:'Notify Me',
background:'#FFA500',
fontColor:'#FFFFFF',
type:false,
height:500,
width:700
});
</script>
<a class="lightbox-40222623177447" style="cursor:pointer;color:blue;text-decoration:underline;"><img src='images/envelope.svg' width=32 height=32></a>
<p>
<div id="currently-editing"> </div>
<script>
function hideAll() {
$("#currently-editing").hide();
$("#toc").hide();
}
$("#currently-editing-link").click(function() {
hideAll();
$("#currently-editing").show();
});
$("#toc-link").click(function() {
hideAll();
$("#toc").show();
});
</script>
<div id="toc"> </div>
<p style='font-size:12px'> (Click on the left edge of this green box to hide it!)
<script>
hideAll();
$("#toc").show();
</script>
</div>
</div>
<script>
$('#sidebox .tab').toggle(function(){
$('#sidebox').animate({'right':'0%'});
}, function(){
$('#sidebox').animate({'right':'-30%'});
});
$(document).ready(createTOC);
</script>
<div id="main">
<h2> Compilation </h2>
<p> The explicit-control evaluator of section 5-4 is a register machine whose controller interprets Scheme programs. In this section we will see how to run Scheme programs on a register machine whose controller is not a Scheme interpreter.
<p> The explicit-control evaluator machine is universal---it can carry out any computational process that can be described in Scheme. The evaluator's controller orchestrates the use of its data paths to perform the desired computation. Thus, the evaluator's data paths are universal: They are sufficient to perform any computation we desire, given an appropriate controller.@footnote{This is a theoretical statement. We are not claiming that the evaluator's data paths are a particularly convenient or efficient set of data paths for a general-purpose computer. For example, they are not very good for implementing high-performance floating-point calculations or calculations that intensively manipulate bit vectors.}
<p> Commercial general-purpose computers are register machines organized around a collection of registers and operations that constitute an efficient and convenient universal set of data paths. The controller for a general-purpose machine is an interpreter for a register-machine language like the one we have been using. This language is called the <tt>native language</tt> of the machine, or simply <tt>machine language</tt> . Programs written in machine language are sequences of instructions that use the machine's data paths. For example, the explicit-control evaluator's instruction sequence can be thought of as a machine-language program for a general-purpose computer rather than as the controller for a specialized interpreter machine.
<p> There are two common strategies for bridging the gap between higher-level languages and register-machine languages. The explicit-control evaluator illustrates the strategy of interpretation. An interpreter written in the native language of a machine configures the machine to execute programs written in a language (called the <tt>source language</tt> ) that may differ from the native language of the machine performing the evaluation. The primitive procedures of the source language are implemented as a library of subroutines written in the native language of the given machine. A program to be interpreted (called the <tt>source program</tt> ) is represented as a data structure. The interpreter traverses this data structure, analyzing the source program. As it does so, it simulates the intended behavior of the source program by calling appropriate primitive subroutines from the library.
<p> In this section, we explore the alternative strategy of <tt>compilation</tt> . A compiler for a given source language and machine translates a source program into an equivalent program (called the <tt>object program</tt> ) written in the machine's native language. The compiler that we implement in this section translates programs written in Scheme into sequences of instructions to be executed using the explicit-control evaluator machine's data paths.@footnote{Actually, the machine that runs compiled code can be simpler than the interpreter machine, because we won't use the <tt>exp</tt> and <tt>unev</tt> registers. The interpreter used these to hold pieces of unevaluated expressions. With the compiler, however, these expressions get built into the compiled code that the register machine will run. For the same reason, we don't need the machine operations that deal with expression syntax. But compiled code will use a few additional machine operations (to represent compiled procedure objects) that didn't appear in the explicit-control evaluator machine.}
<p> Compared with interpretation, compilation can provide a great increase in the efficiency of program execution, as we will explain below in the overview of the compiler. On the other hand, an interpreter provides a more powerful environment for interactive program development and debugging, because the source program being executed is available at run time to be examined and modified. In addition, because the entire library of primitives is present, new programs can be constructed and added to the system during debugging.
<p> In view of the complementary advantages of compilation and interpretation, modern program-development environments pursue a mixed strategy. Lisp interpreters are generally organized so that interpreted procedures and compiled procedures can call each other. This enables a programmer to compile those parts of a program that are assumed to be debugged, thus gaining the efficiency advantage of compilation, while retaining the interpretive mode of execution for those parts of the program that are in the flux of interactive development and debugging. In section 5-5-7, after we have implemented the compiler, we will show how to interface it with our interpreter to produce an integrated interpreter-compiler development system.
<h4> An overview of the compiler </h4>
<p> Our compiler is much like our interpreter, both in its structure and in the function it performs. Accordingly, the mechanisms used by the compiler for analyzing expressions will be similar to those used by the interpreter. Moreover, to make it easy to interface compiled and interpreted code, we will design the compiler to generate code that obeys the same conventions of register usage as the interpreter: The environment will be kept in the <tt>env</tt> register, argument lists will be accumulated in <tt>argl</tt>, a procedure to be applied will be in <tt>proc</tt>, procedures will return their answers in <tt>val</tt>, and the location to which a procedure should return will be kept in <tt>continue</tt>. In general, the compiler translates a source program into an object program that performs essentially the same register operations as would the interpreter in evaluating the same source program.
<p> This description suggests a strategy for implementing a rudimentary compiler: We traverse the expression in the same way the interpreter does. When we encounter a register instruction that the interpreter would perform in evaluating the expression, we do not execute the instruction but instead accumulate it into a sequence. The resulting sequence of instructions will be the object code. Observe the efficiency advantage of compilation over interpretation. Each time the interpreter evaluates an expression---for example, <tt>(f 84 96)</tt>---it performs the work of classifying the expression (discovering that this is a procedure application) and testing for the end of the operand list (discovering that there are two operands). With a compiler, the expression is analyzed only once, when the instruction sequence is generated at compile time. The object code produced by the compiler contains only the instructions that evaluate the operator and the two operands, assemble the argument list, and apply the procedure (in <tt>proc</tt>) to the arguments (in <tt>argl</tt>).
<p> This is the same kind of optimization we implemented in the analyzing evaluator of section 4-1-7. But there are further opportunities to gain efficiency in compiled code. As the interpreter runs, it follows a process that must be applicable to any expression in the language. In contrast, a given segment of compiled code is meant to execute some particular expression. This can make a big difference, for example in the use of the stack to save registers. When the interpreter evaluates an expression, it must be prepared for any contingency. Before evaluating a subexpression, the interpreter saves all registers that will be needed later, because the subexpression might require an arbitrary evaluation. A compiler, on the other hand, can exploit the structure of the particular expression it is processing to generate code that avoids unnecessary stack operations.
<p> As a case in point, consider the combination <tt>(f 84 96)</tt>. Before the interpreter evaluates the operator of the combination, it prepares for this evaluation by saving the registers containing the operands and the environment, whose values will be needed later. The interpreter then evaluates the operator to obtain the result in <tt>val</tt>, restores the saved registers, and finally moves the result from <tt>val</tt> to <tt>proc</tt>. However, in the particular expression we are dealing with, the operator is the symbol <tt>f</tt>, whose evaluation is accomplished by the machine operation <tt>lookup-variable-value</tt>, which does not alter any registers. The compiler that we implement in this section will take advantage of this fact and generate code that evaluates the operator using the instruction
<div id="">
(assign proc (op lookup-variable-value) (const f) (reg env))
</div>
<script>
prompt();
</script>
<p> This code not only avoids the unnecessary saves and restores but also assigns the value of the lookup directly to <tt>proc</tt>, whereas the interpreter would obtain the result in <tt>val</tt> and then move this to <tt>proc</tt>.
<p> A compiler can also optimize access to the environment. Having analyzed the code, the compiler can in many cases know in which frame a particular variable will be located and access that frame directly, rather than performing the <tt>lookup-variable-value</tt> search. We will discuss how to implement such variable access in section 5-5-6. Until then, however, we will focus on the kind of register and stack optimizations described above. There are many other optimizations that can be performed by a compiler, such as coding primitive operations ``in line'' instead of using a general <tt>apply</tt> mechanism (see Exercise 5-38); but we will not emphasize these here. Our main goal in this section is to illustrate the compilation process in a simplified (but still interesting) context.
<h3> Structure of the Compiler </h3>
<p> In section 4-1-7 we modified our original metacircular interpreter to separate analysis from execution. We analyzed each expression to produce an execution procedure that took an environment as argument and performed the required operations. In our compiler, we will do essentially the same analysis. Instead of producing execution procedures, however, we will generate sequences of instructions to be run by our register machine.
<p> The procedure <tt>compile</tt> is the top-level dispatch in the compiler. It corresponds to the <tt>eval</tt> procedure of section 4-1-1, the <tt>analyze</tt> procedure of section 4-1-7, and the <tt>eval-dispatch</tt> entry point of the explicit-control-evaluator in section 5-4-1. The compiler, like the interpreters, uses the expression-syntax procedures defined in section 4-1-2.@footnote{Notice, however, that our compiler is a Scheme program, and the syntax procedures that it uses to manipulate expressions are the actual Scheme procedures used with the metacircular evaluator. For the explicit-control evaluator, in contrast, we assumed that equivalent syntax operations were available as operations for the register machine. (Of course, when we simulated the register machine in Scheme, we used the actual Scheme procedures in our register machine simulation.)} <tt>Compile</tt> performs a case analysis on the syntactic type of the expression to be compiled. For each type of expression, it dispatches to a specialized <tt>code generator</tt> :
<div id="">
(define (compile exp target linkage)
(cond ((self-evaluating? exp)
(compile-self-evaluating exp target linkage))
((quoted? exp) (compile-quoted exp target linkage))
((variable? exp)
(compile-variable exp target linkage))
((assignment? exp)
(compile-assignment exp target linkage))
((definition? exp)
(compile-definition exp target linkage))
((if? exp) (compile-if exp target linkage))
((lambda? exp) (compile-lambda exp target linkage))
((begin? exp)
(compile-sequence (begin-actions exp)
target
linkage))
((cond? exp) (compile (cond->if exp) target linkage))
((application? exp)
(compile-application exp target linkage))
(else
(error "Unknown expression type -- COMPILE" exp))))
</div>
<script>
prompt();
</script>
<h4> Targets and linkages </h4>
<p> <tt>Compile</tt> and the code generators that it calls take two arguments in addition to the expression to compile. There is a <tt>target</tt> , which specifies the register in which the compiled code is to return the value of the expression. There is also a <tt>linkage descriptor</tt> , which describes how the code resulting from the compilation of the expression should proceed when it has finished its execution. The linkage descriptor can require that the code do one of the following three things:
<ul>
<li>
continue at the next instruction in sequence (this is specified by the linkage descriptor <tt>next</tt>),
</li>
<li>
return from the procedure being compiled (this is specified by the linkage descriptor <tt>return</tt>), or
</li>
<li>
jump to a named entry point (this is specified by using the designated label as the linkage descriptor).
</li>
</ul>
<p> For example, compiling the expression <tt>5</tt> (which is self-evaluating) with a target of the <tt>val</tt> register and a linkage of <tt>next</tt> should produce the instruction
<div id="">
(assign val (const 5))
</div>
<script>
prompt();
</script>
<p> Compiling the same expression with a linkage of <tt>return</tt> should produce the instructions
<div id="">
(assign val (const 5))
(goto (reg continue))
</div>
<script>
prompt();
</script>
<p> In the first case, execution will continue with the next instruction in the sequence. In the second case, we will return from a procedure call. In both cases, the value of the expression will be placed into the target <tt>val</tt> register.
<h4> Instruction sequences and stack usage </h4>
<p> Each code generator returns an <tt>instruction sequence</tt> containing the object code it has generated for the expression. Code generation for a compound expression is accomplished by combining the output from simpler code generators for component expressions, just as evaluation of a compound expression is accomplished by evaluating the component expressions.
<p> The simplest method for combining instruction sequences is a procedure called <tt>append-instruction-sequences</tt>. It takes as arguments any number of instruction sequences that are to be executed sequentially; it appends them and returns the combined sequence. That is, if <seq_1> and <seq_2> are sequences of instructions, then evaluating
<div id="">
(append-instruction-sequences <seq_1> <seq_2>)
</div>
<script>
prompt();
</script>
<p> produces the sequence
<div id="">
<seq_1>
<seq_2>
</div>
<script>
prompt();
</script>
<p> Whenever registers might need to be saved, the compiler's code generators use <tt>preserving</tt>, which is a more subtle method for combining instruction sequences. <tt>Preserving</tt> takes three arguments: a set of registers and two instruction sequences that are to be executed sequentially. It appends the sequences in such a way that the contents of each register in the set is preserved over the execution of the first sequence, if this is needed for the execution of the second sequence. That is, if the first sequence modifies the register and the second sequence actually needs the register's original contents, then <tt>preserving</tt> wraps a <tt>save</tt> and a <tt>restore</tt> of the register around the first sequence before appending the sequences. Otherwise, <tt>preserving</tt> simply returns the appended instruction sequences. Thus, for example,
<div id="">
(preserving (list <reg_1> <reg_2>) <seq_1> <seq_2>)
</div>
<script>
prompt();
</script>
<p> produces one of the following four sequences of instructions, depending on how <seq_1> and <seq_2> use <reg_1> and <reg_2>:
<pre>
<seq_1> | (save <reg_1>) | (save <reg_2>) | (save <reg_2>)
<seq_2> | <seq_1> | <seq_1> | (save <reg_1>)
| (restore <reg_1>) | (restore <reg_2>) | <seq_1>
| <seq_2> | <seq_2> | (restore <reg_1>)
| | | (restore <reg_2>)
| | | <seq_2>
</pre>
<p> By using <tt>preserving</tt> to combine instruction sequences the compiler avoids unnecessary stack operations. This also isolates the details of whether or not to generate <tt>save</tt> and <tt>restore</tt> instructions within the <tt>preserving</tt> procedure, separating them from the concerns that arise in writing each of the individual code generators. In fact no <tt>save</tt> or <tt>restore</tt> instructions are explicitly produced by the code generators.
<p> In principle, we could represent an instruction sequence simply as a list of instructions. <tt>Append-instruction-sequences</tt> could then combine instruction sequences by performing an ordinary list <tt>append</tt>. However, <tt>preserving</tt> would then be a complex operation, because it would have to analyze each instruction sequence to determine how the sequence uses its registers. <tt>Preserving</tt> would be inefficient as well as complex, because it would have to analyze each of its instruction sequence arguments, even though these sequences might themselves have been constructed by calls to <tt>preserving</tt>, in which case their parts would have already been analyzed. To avoid such repetitious analysis we will associate with each instruction sequence some information about its register use. When we construct a basic instruction sequence we will provide this information explicitly, and the procedures that combine instruction sequences will derive register-use information for the combined sequence from the information associated with the component sequences.
<p> An instruction sequence will contain three pieces of information:
<ul>
<li>
the set of registers that must be initialized before the instructions in the sequence are executed (these registers are said to be <tt>needed</tt> by the sequence),
</li>
<li>
the set of registers whose values are modified by the instructions in the sequence, and
</li>
<li>
the actual instructions (also called <tt>statements</tt> ) in the sequence.
</li>
</ul>
<p> We will represent an instruction sequence as a list of its three parts. The constructor for instruction sequences is thus
<div id="">
(define (make-instruction-sequence needs modifies statements)
(list needs modifies statements))
</div>
<script>
prompt();
</script>
<p> For example, the two-instruction sequence that looks up the value of the variable <tt>x</tt> in the current environment, assigns the result to <tt>val</tt>, and then returns, requires registers <tt>env</tt> and <tt>continue</tt> to have been initialized, and modifies register <tt>val</tt>. This sequence would therefore be constructed as
<div id="">
(make-instruction-sequence '(env continue) '(val)
'((assign val
(op lookup-variable-value) (const x) (reg env))
(goto (reg continue))))
</div>
<script>
prompt();
</script>
<p> We sometimes need to construct an instruction sequence with no statements:
<div id="">
(define (empty-instruction-sequence)
(make-instruction-sequence '() '() '()))
</div>
<script>
prompt();
</script>
<p> The procedures for combining instruction sequences are shown in section 5-5-4.
<div class="exercise">
<p> <b>Exercise 5.31:</b> In evaluating a procedure application, the explicit-control evaluator always saves and restores the <tt>env</tt> register around the evaluation of the operator, saves and restores <tt>env</tt> around the evaluation of each operand (except the final one), saves and restores <tt>argl</tt> around the evaluation of each operand, and saves and restores <tt>proc</tt> around the evaluation of the operand sequence. For each of the following combinations, say which of these <tt>save</tt> and <tt>restore</tt> operations are superfluous and thus could be eliminated by the compiler's <tt>preserving</tt> mechanism:
<div id="">
(f 'x 'y)
((f) 'x 'y)
(f (g 'x) y)
(f (g 'x) 'y)
</div>
<script>
prompt();
</script>
</div>
<div class="exercise">
<p> <b>Exercise 5.32:</b> Using the <tt>preserving</tt> mechanism, the compiler will avoid saving and restoring <tt>env</tt> around the evaluation of the operator of a combination in the case where the operator is a symbol. We could also build such optimizations into the evaluator. Indeed, the explicit-control evaluator of section 5-4 already performs a similar optimization, by treating combinations with no operands as a special case.
<ul>
<li>
Extend the explicit-control evaluator to recognize as a separate class of expressions combinations whose operator is a symbol, and to take advantage of this fact in evaluating such expressions.
</li>
<li>
Alyssa P. Hacker suggests that by extending the evaluator to recognize more and more special cases we could incorporate all the compiler's optimizations, and that this would eliminate the advantage of compilation altogether. What do you think of this idea?
</li>
</ul>
</div>
<h3> Compiling Expressions </h3>
<p> In this section and the next we implement the code generators to which the <tt>compile</tt> procedure dispatches.
<h4> Compiling linkage code </h4>
<p> In general, the output of each code generator will end with instructions---generated by the procedure <tt>compile-linkage</tt>---that implement the required linkage. If the linkage is <tt>return</tt> then we must generate the instruction <tt>(goto (reg continue))</tt>. This needs the <tt>continue</tt> register and does not modify any registers. If the linkage is <tt>next</tt>, then we needn't include any additional instructions. Otherwise, the linkage is a label, and we generate a <tt>goto</tt> to that label, an instruction that does not need or modify any registers.@footnote{This procedure uses a feature of Lisp called <tt>backquote</tt> (or <tt>quasiquote</tt> ) that is handy for constructing lists. Preceding a list with a backquote symbol is much like quoting it, except that anything in the list that is flagged with a comma is evaluated.
<p> For example, if the value of <tt>linkage</tt> is the symbol@* <tt>branch25</tt>, then the expression@* <tt>`((goto (label ,linkage)))</tt>@* evaluates to the list@* <tt>((goto (label branch25)))</tt>.@* Similarly, if the value of <tt>x</tt> is the list <tt>(a b c)</tt>, then@* <tt>`(1 2 ,(car x))</tt>@* evaluates to the list@* <tt>(1 2 a)</tt>.}
<div id="">
(define (compile-linkage linkage)
(cond ((eq? linkage 'return)
(make-instruction-sequence '(continue) '()
'((goto (reg continue)))))
((eq? linkage 'next)
(empty-instruction-sequence))
(else
(make-instruction-sequence '() '()
`((goto (label ,linkage)))))))
</div>
<script>
prompt();
</script>
<p> The linkage code is appended to an instruction sequence by <tt>preserving</tt> the <tt>continue</tt> register, since a <tt>return</tt> linkage will require the <tt>continue</tt> register: If the given instruction sequence modifies <tt>continue</tt> and the linkage code needs it, <tt>continue</tt> will be saved and restored.
<div id="">
(define (end-with-linkage linkage instruction-sequence)
(preserving '(continue)
instruction-sequence
(compile-linkage linkage)))
</div>
<script>
prompt();
</script>
<h4> Compiling simple expressions </h4>
<p> The code generators for self-evaluating expressions, quotations, and variables construct instruction sequences that assign the required value to the target register and then proceed as specified by the linkage descriptor.
<div id="">
(define (compile-self-evaluating exp target linkage)
(end-with-linkage linkage
(make-instruction-sequence '() (list target)
`((assign ,target (const ,exp))))))
(define (compile-quoted exp target linkage)
(end-with-linkage linkage
(make-instruction-sequence '() (list target)
`((assign ,target (const ,(text-of-quotation exp)))))))
(define (compile-variable exp target linkage)
(end-with-linkage linkage
(make-instruction-sequence '(env) (list target)
`((assign ,target
(op lookup-variable-value)
(const ,exp)
(reg env))))))
</div>
<script>
prompt();
</script>
<p> All these assignment instructions modify the target register, and the one that looks up a variable needs the <tt>env</tt> register.
<p> Assignments and definitions are handled much as they are in the interpreter. We recursively generate code that computes the value to be assigned to the variable, and append to it a two-instruction sequence that actually sets or defines the variable and assigns the value of the whole expression (the symbol <tt>ok</tt>) to the target register. The recursive compilation has target <tt>val</tt> and linkage <tt>next</tt> so that the code will put its result into <tt>val</tt> and continue with the code that is appended after it. The appending is done preserving <tt>env</tt>, since the environment is needed for setting or defining the variable and the code for the variable value could be the compilation of a complex expression that might modify the registers in arbitrary ways.
<div id="">
(define (compile-assignment exp target linkage)
(let ((var (assignment-variable exp))
(get-value-code
(compile (assignment-value exp) 'val 'next)))
(end-with-linkage linkage
(preserving '(env)
get-value-code
(make-instruction-sequence '(env val) (list target)
`((perform (op set-variable-value!)
(const ,var)
(reg val)
(reg env))
(assign ,target (const ok))))))))
(define (compile-definition exp target linkage)
(let ((var (definition-variable exp))
(get-value-code
(compile (definition-value exp) 'val 'next)))
(end-with-linkage linkage
(preserving '(env)
get-value-code
(make-instruction-sequence '(env val) (list target)
`((perform (op define-variable!)
(const ,var)
(reg val)
(reg env))
(assign ,target (const ok))))))))
</div>
<script>
prompt();
</script>
<p> The appended two-instruction sequence requires <tt>env</tt> and <tt>val</tt> and modifies the target. Note that although we preserve <tt>env</tt> for this sequence, we do not preserve <tt>val</tt>, because the <tt>get-value-code</tt> is designed to explicitly place its result in <tt>val</tt> for use by this sequence. (In fact, if we did preserve <tt>val</tt>, we would have a bug, because this would cause the previous contents of <tt>val</tt> to be restored right after the <tt>get-value-code</tt> is run.)
<h4> Compiling conditional expressions </h4>
<p> The code for an <tt>if</tt> expression compiled with a given target and linkage has the form
<div id="">
<<em>compilation of predicate, target <tt>val</tt>, linkage <tt>next</tt></em>>
(test (op false?) (reg val))
(branch (label false-branch))
true-branch
<<em>compilation of consequent with given target and given linkage or <tt>after-if</tt></em>>
false-branch
<<em>compilation of alternative with given target and linkage</em>>
after-if
</div>
<script>
prompt();
</script>
<p> To generate this code, we compile the predicate, consequent, and alternative, and combine the resulting code with instructions to test the predicate result and with newly generated labels to mark the true and false branches and the end of the conditional.@footnote{We can't just use the labels <tt>true-branch</tt>, <tt>false-branch</tt>, and <tt>after-if</tt> as shown above, because there might be more than one <tt>if</tt> in the program. The compiler uses the procedure <tt>make-label</tt> to generate labels. <tt>Make-label</tt> takes a symbol as argument and returns a new symbol that begins with the given symbol. For example, successive calls to <tt>(make-label 'a)</tt> would return <tt>a1</tt>, <tt>a2</tt>, and so on. <tt>Make-label</tt> can be implemented similarly to the generation of unique variable names in the query language, as follows:
<div id="">
(define label-counter 0)
(define (new-label-number)
(set! label-counter (+ 1 label-counter))
label-counter)
(define (make-label name)
(string->symbol
(string-append (symbol->string name)
(number->string (new-label-number)))))
</div>
<script>
prompt();
</script>
<p> } In this arrangement of code, we must branch around the true branch if the test is false. The only slight complication is in how the linkage for the true branch should be handled. If the linkage for the conditional is <tt>return</tt> or a label, then the true and false branches will both use this same linkage. If the linkage is <tt>next</tt>, the true branch ends with a jump around the code for the false branch to the label at the end of the conditional.
<div id="">
(define (compile-if exp target linkage)
(let ((t-branch (make-label 'true-branch))
(f-branch (make-label 'false-branch))
(after-if (make-label 'after-if)))
(let ((consequent-linkage
(if (eq? linkage 'next) after-if linkage)))
(let ((p-code (compile (if-predicate exp) 'val 'next))
(c-code
(compile
(if-consequent exp) target consequent-linkage))
(a-code
(compile (if-alternative exp) target linkage)))
(preserving '(env continue)
p-code
(append-instruction-sequences
(make-instruction-sequence '(val) '()
`((test (op false?) (reg val))
(branch (label ,f-branch))))
(parallel-instruction-sequences
(append-instruction-sequences t-branch c-code)
(append-instruction-sequences f-branch a-code))
after-if))))))
</div>
<script>
prompt();
</script>
<p> <tt>Env</tt> is preserved around the predicate code because it could be needed by the true and false branches, and <tt>continue</tt> is preserved because it could be needed by the linkage code in those branches. The code for the true and false branches (which are not executed sequentially) is appended using a special combiner <tt>parallel-instruction-sequences</tt> described in section 5-5-4.
<p> Note that <tt>cond</tt> is a derived expression, so all that the compiler needs to do handle it is to apply the <tt>cond->if</tt> transformer (from section 4-1-2) and compile the resulting <tt>if</tt> expression.
<h4> Compiling sequences </h4>
<p> The compilation of sequences (from procedure bodies or explicit <tt>begin</tt> expressions) parallels their evaluation. Each expression of the sequence is compiled---the last expression with the linkage specified for the sequence, and the other expressions with linkage <tt>next</tt> (to execute the rest of the sequence). The instruction sequences for the individual expressions are appended to form a single instruction sequence, such that <tt>env</tt> (needed for the rest of the sequence) and <tt>continue</tt> (possibly needed for the linkage at the end of the sequence) are preserved.
<div id="">
(define (compile-sequence seq target linkage)
(if (last-exp? seq)
(compile (first-exp seq) target linkage)
(preserving '(env continue)
(compile (first-exp seq) target 'next)
(compile-sequence (rest-exps seq) target linkage))))
</div>
<script>
prompt();
</script>
<h4> Compiling <tt>lambda</tt> expressions </h4>
<p> <tt>Lambda</tt> expressions construct procedures. The object code for a <tt>lambda</tt> expression must have the form
<em>construct procedure object and assign it to target register</em>
<linkage<
<p> When we compile the <tt>lambda</tt> expression, we also generate the code for the procedure body. Although the body won't be executed at the time of procedure construction, it is convenient to insert it into the object code right after the code for the <tt>lambda</tt>. If the linkage for the <tt>lambda</tt> expression is a label or <tt>return</tt>, this is fine. But if the linkage is <tt>next</tt>, we will need to skip around the code for the procedure body by using a linkage that jumps to a label that is inserted after the body. The object code thus has the form
<em>construct procedure object and assign it to target register</em>
<em>code for given linkage</em>> <em>or</em> <tt>(goto (label after-lambda))</tt>
<em>compilation of procedure body</em>
after-lambda
<p> <tt>Compile-lambda</tt> generates the code for constructing the procedure object followed by the code for the procedure body. The procedure object will be constructed at run time by combining the current environment (the environment at the point of definition) with the entry point to the compiled procedure body (a newly generated label).@footnote{[Footnote 38] We need machine operations to implement a data structure for representing compiled procedures, analogous to the structure for compound procedures described in section 4-1-3:
<div id="">
(define (make-compiled-procedure entry env)
(list 'compiled-procedure entry env))
(define (compiled-procedure? proc)
(tagged-list? proc 'compiled-procedure))
(define (compiled-procedure-entry c-proc) (cadr c-proc))
(define (compiled-procedure-env c-proc) (caddr c-proc))
</div>
<script>
prompt();
</script>
}
<div id="">
(define (compile-lambda exp target linkage)
(let ((proc-entry (make-label 'entry))
(after-lambda (make-label 'after-lambda)))
(let ((lambda-linkage
(if (eq? linkage 'next) after-lambda linkage)))
(append-instruction-sequences
(tack-on-instruction-sequence
(end-with-linkage lambda-linkage
(make-instruction-sequence '(env) (list target)
`((assign ,target
(op make-compiled-procedure)
(label ,proc-entry)
(reg env)))))
(compile-lambda-body exp proc-entry))
after-lambda))))
</div>
<script>
prompt();
</script>
<p> <tt>Compile-lambda</tt> uses the special combiner <tt>tack-on-instruction-sequence</tt> (section 5-5-4) rather than <tt>append-instruction-sequences</tt> to append the procedure body to the <tt>lambda</tt> expression code, because the body is not part of the sequence of instructions that will be executed when the combined sequence is entered; rather, it is in the sequence only because that was a convenient place to put it.
<p> <tt>Compile-lambda-body</tt> constructs the code for the body of the procedure. This code begins with a label for the entry point. Next come instructions that will cause the run-time evaluation environment to switch to the correct environment for evaluating the procedure body---namely, the definition environment of the procedure, extended to include the bindings of the formal parameters to the arguments with which the procedure is called. After this comes the code for the sequence of expressions that makes up the procedure body. The sequence is compiled with linkage <tt>return</tt> and target <tt>val</tt> so that it will end by returning from the procedure with the procedure result in <tt>val</tt>.
<div id="">
(define (compile-lambda-body exp proc-entry)
(let ((formals (lambda-parameters exp)))
(append-instruction-sequences
(make-instruction-sequence '(env proc argl) '(env)
`(,proc-entry
(assign env (op compiled-procedure-env) (reg proc))
(assign env
(op extend-environment)
(const ,formals)
(reg argl)
(reg env))))
(compile-sequence (lambda-body exp) 'val 'return))))
</div>
<script>
prompt();
</script>
<h3> Compiling Combinations </h3>
<p> The essence of the compilation process is the compilation of procedure applications. The code for a combination compiled with a given target and linkage has the form
<em>compilation of operator, target <tt>proc</tt>, linkage <tt>next</tt></em>
<em>evaluate operands and construct argument list in <tt>argl</tt></em>
<em>compilation of procedure call with given target and linkage</em>
<p> The registers <tt>env</tt>, <tt>proc</tt>, and <tt>argl</tt> may have to be saved and restored during evaluation of the operator and operands. Note that this is the only place in the compiler where a target other than <tt>val</tt> is specified.
<p> The required code is generated by <tt>compile-application</tt>. This recursively compiles the operator, to produce code that puts the procedure to be applied into <tt>proc</tt>, and compiles the operands, to produce code that evaluates the individual operands of the application. The instruction sequences for the operands are combined (by <tt>construct-arglist</tt>) with code that constructs the list of arguments in <tt>argl</tt>, and the resulting argument-list code is combined with the procedure code and the code that performs the procedure call (produced by <tt>compile-procedure-call</tt>). In appending the code sequences, the <tt>env</tt> register must be preserved around the evaluation of the operator (since evaluating the operator might modify <tt>env</tt>, which will be needed to evaluate the operands), and the <tt>proc</tt> register must be preserved around the construction of the argument list (since evaluating the operands might modify <tt>proc</tt>, which will be needed for the actual procedure application). <tt>Continue</tt> must also be preserved throughout, since it is needed for the linkage in the procedure call.
<div id="">
(define (compile-application exp target linkage)
(let ((proc-code (compile (operator exp) 'proc 'next))
(operand-codes
(map (lambda (operand) (compile operand 'val 'next))
(operands exp))))
(preserving '(env continue)
proc-code
(preserving '(proc continue)
(construct-arglist operand-codes)
(compile-procedure-call target linkage)))))
</div>
<script>
prompt();
</script>
<p> The code to construct the argument list will evaluate each operand into <tt>val</tt> and then <tt>cons</tt> that value onto the argument list being accumulated in <tt>argl</tt>. Since we <tt>cons</tt> the arguments onto <tt>argl</tt> in sequence, we must start with the last argument and end with the first, so that the arguments will appear in order from first to last in the resulting list. Rather than waste an instruction by initializing <tt>argl</tt> to the empty list to set up for this sequence of evaluations, we make the first code sequence construct the initial <tt>argl</tt>. The general form of the argument-list construction is thus as follows:
<em>compilation of last operand, targeted to <tt>val</tt></em>
(assign argl (op list) (reg val))
<em>compilation of next operand, targeted to <tt>val</tt></em>
(assign argl (op cons) (reg val) (reg argl))
...
<em>compilation of first operand, targeted to <tt>val</tt></em>
(assign argl (op cons) (reg val) (reg argl))
<p> <tt>Argl</tt> must be preserved around each operand evaluation except the first (so that arguments accumulated so far won't be lost), and <tt>env</tt> must be preserved around each operand evaluation except the last (for use by subsequent operand evaluations).
<p> Compiling this argument code is a bit tricky, because of the special treatment of the first operand to be evaluated and the need to preserve <tt>argl</tt> and <tt>env</tt> in different places. The <tt>construct-arglist</tt> procedure takes as arguments the code that evaluates the individual operands. If there are no operands at all, it simply emits the instruction
<div id="">
(assign argl (const ()))
</div>
<script>
prompt();
</script>
<p> Otherwise, <tt>construct-arglist</tt> creates code that initializes <tt>argl</tt> with the last argument, and appends code that evaluates the rest of the arguments and adjoins them to <tt>argl</tt> in succession. In order to process the arguments from last to first, we must reverse the list of operand code sequences from the order supplied by <tt>compile-application</tt>.
<div id="">
(define (construct-arglist operand-codes)
(let ((operand-codes (reverse operand-codes)))
(if (null? operand-codes)
(make-instruction-sequence '() '(argl)
'((assign argl (const ()))))
(let ((code-to-get-last-arg
(append-instruction-sequences
(car operand-codes)
(make-instruction-sequence '(val) '(argl)
'((assign argl (op list) (reg val)))))))
(if (null? (cdr operand-codes))
code-to-get-last-arg
(preserving '(env)
code-to-get-last-arg
(code-to-get-rest-args
(cdr operand-codes))))))))
(define (code-to-get-rest-args operand-codes)
(let ((code-for-next-arg
(preserving '(argl)
(car operand-codes)
(make-instruction-sequence '(val argl) '(argl)
'((assign argl
(op cons) (reg val) (reg argl)))))))
(if (null? (cdr operand-codes))
code-for-next-arg
(preserving '(env)
code-for-next-arg
(code-to-get-rest-args (cdr operand-codes))))))
</div>
<script>
prompt();
</script>
<h4> Applying procedures </h4>
<p> After evaluating the elements of a combination, the compiled code must apply the procedure in <tt>proc</tt> to the arguments in <tt>argl</tt>. The code performs essentially the same dispatch as the <tt>apply</tt> procedure in the metacircular evaluator of section 4-1-1 or the <tt>apply-dispatch</tt> entry point in the explicit-control evaluator of section 5-4-1. It checks whether the procedure to be applied is a primitive procedure or a compiled procedure. For a primitive procedure, it uses <tt>apply-primitive-procedure</tt>; we will see shortly how it handles compiled procedures. The procedure-application code has the following form:
<div id="">
(test (op primitive-procedure?) (reg proc))
(branch (label primitive-branch))
compiled-branch
<<em>code to apply compiled procedure with given target and appropriate linkage</em>>
primitive-branch
(assign <target>
(op apply-primitive-procedure)
(reg proc)
(reg argl))
<linkage>
after-call
</div>
<script>
prompt();
</script>
<p> Observe that the compiled branch must skip around the primitive branch. Therefore, if the linkage for the original procedure call was <tt>next</tt>, the compound branch must use a linkage that jumps to a label that is inserted after the primitive branch. (This is similar to the linkage used for the true branch in <tt>compile-if</tt>.)
<div id="">
(define (compile-procedure-call target linkage)
(let ((primitive-branch (make-label 'primitive-branch))
(compiled-branch (make-label 'compiled-branch))
(after-call (make-label 'after-call)))
(let ((compiled-linkage
(if (eq? linkage 'next) after-call linkage)))
(append-instruction-sequences
(make-instruction-sequence '(proc) '()
`((test (op primitive-procedure?) (reg proc))
(branch (label ,primitive-branch))))
(parallel-instruction-sequences
(append-instruction-sequences
compiled-branch
(compile-proc-appl target compiled-linkage))
(append-instruction-sequences
primitive-branch
(end-with-linkage linkage
(make-instruction-sequence '(proc argl)
(list target)
`((assign ,target
(op apply-primitive-procedure)
(reg proc)
(reg argl)))))))
after-call))))
</div>
<script>
prompt();
</script>
<p> The primitive and compound branches, like the true and false branches in <tt>compile-if</tt>, are appended using <tt>parallel-instruction-sequences</tt> rather than the ordinary <tt>append-instruction-sequences</tt>, because they will not be executed sequentially.
<h4> Applying compiled procedures </h4>
<p> The code that handles procedure application is the most subtle part of the compiler, even though the instruction sequences it generates are very short. A compiled procedure (as constructed by <tt>compile-lambda</tt>) has an entry point, which is a label that designates where the code for the procedure starts. The code at this entry point computes a result in <tt>val</tt> and returns by executing the instruction <tt>(goto (reg continue))</tt>. Thus, we might expect the code for a compiled-procedure application (to be generated by <tt>compile-proc-appl</tt>) with a given target and linkage to look like this if the linkage is a label
<div id="">
(assign continue (label proc-return))
(assign val (op compiled-procedure-entry) (reg proc))
(goto (reg val))
proc-return
(assign <target> (reg val)) ; included if target is not <tt>val</tt>
(goto (label <linkage>)) ; linkage code
</div>
<script>
prompt();
</script>
<p> or like this if the linkage is <tt>return</tt>.
<div id="">
(save continue)
(assign continue (label proc-return))
(assign val (op compiled-procedure-entry) (reg proc))
(goto (reg val))
proc-return
(assign <target> (reg val)) ; included if target is not <tt>val</tt>
(restore continue)
(goto (reg continue)) ; linkage code
</div>
<script>
prompt();
</script>
<p> This code sets up <tt>continue</tt> so that the procedure will return to a label <tt>proc-return</tt> and jumps to the procedure's entry point. The code at <tt>proc-return</tt> transfers the procedure's result from <tt>val</tt> to the target register (if necessary) and then jumps to the location specified by the linkage. (The linkage is always <tt>return</tt> or a label, because <tt>compile-procedure-call</tt> replaces a <tt>next</tt> linkage for the compound-procedure branch by an <tt>after-call</tt> label.)
<p> In fact, if the target is not <tt>val</tt>, that is exactly the code our compiler will generate.@footnote{Actually, we signal an error when the target is not <tt>val</tt> and the linkage is <tt>return</tt>, since the only place we request <tt>return</tt> linkages is in compiling procedures, and our convention is that procedures return their values in <tt>val</tt>.} Usually, however, the target is <tt>val</tt> (the only time the compiler specifies a different register is when targeting the evaluation of an operator to <tt>proc</tt>), so the procedure result is put directly into the target register and there is no need to return to a special location that copies it. Instead, we simplify the code by setting up <tt>continue</tt> so that the procedure will ``return'' directly to the place specified by the caller's linkage:
<em>set up <tt>continue</tt> for linkage</em>
(assign val (op compiled-procedure-entry) (reg proc))
(goto (reg val))
<p> If the linkage is a label, we set up <tt>continue</tt> so that the procedure will return to that label. (That is, the <tt>(goto (reg continue))</tt> the procedure ends with becomes equivalent to the <tt>(goto (label <linkage</tt>>)) at <tt>proc-return</tt> above.)
<div id="">
(assign continue (label <linkage>))
(assign val (op compiled-procedure-entry) (reg proc))
(goto (reg val))
</div>
<script>
prompt();
</script>
<p> If the linkage is <tt>return</tt>, we don't need to set up <tt>continue</tt> at all: It already holds the desired location. (That is, the <tt>(goto (reg continue))</tt> the procedure ends with goes directly to the place where the <tt>(goto (reg continue))</tt> at <tt>proc-return</tt> would have gone.)
<div id="">
(assign val (op compiled-procedure-entry) (reg proc))
(goto (reg val))
</div>
<script>
prompt();
</script>
<p> With this implementation of the <tt>return</tt> linkage, the compiler generates tail-recursive code. Calling a procedure as the final step in a procedure body does a direct transfer, without saving any information on the stack.
<p> Suppose instead that we had handled the case of a procedure call with a linkage of <tt>return</tt> and a target of <tt>val</tt> as shown above for a non-<tt>val</tt> target. This would destroy tail recursion. Our system would still give the same value for any expression. But each time we called a procedure, we would save <tt>continue</tt> and return after the call to undo the (useless) save. These extra saves would accumulate during a nest of procedure calls.@footnote{Making a compiler generate tail-recursive code might seem like a straightforward idea. But most compilers for common languages, including C and Pascal, do not do this, and therefore these languages cannot represent iterative processes in terms of procedure call alone. The difficulty with tail recursion in these languages is that their implementations use the stack to store procedure arguments and local variables as well as return addresses. The Scheme implementations described in this book store arguments and variables in memory to be garbage-collected. The reason for using the stack for variables and arguments is that it avoids the need for garbage collection in languages that would not otherwise require it, and is generally believed to be more efficient. Sophisticated Lisp compilers can, in fact, use the stack for arguments without destroying tail recursion. (See Hanson 1990 for a description.) There is also some debate about whether stack allocation is actually more efficient than garbage collection in the first place, but the details seem to hinge on fine points of computer architecture. (See Appel 1987 and Miller and Rozas 1994 for opposing views on this issue.)}
<p> <tt>Compile-proc-appl</tt> generates the above procedure-application code by considering four cases, depending on whether the target for the call is <tt>val</tt> and whether the linkage is <tt>return</tt>. Observe that the instruction sequences are declared to modify all the registers, since executing the procedure body can change the registers in arbitrary ways.@footnote{The variable <tt>all-regs</tt> is bound to the list of names of all the registers:
<div id="">
(define all-regs '(env proc val argl continue))
</div>
<script>
prompt();
</script>
}
<p> Also note that the code sequence for the case with target <tt>val</tt> and linkage <tt>return</tt> is declared to need <tt>continue</tt>: Even though <tt>continue</tt> is not explicitly used in the two-instruction sequence, we must be sure that <tt>continue</tt> will have the correct value when we enter the compiled procedure.
<div id="">
(define (compile-proc-appl target linkage)
(cond ((and (eq? target 'val) (not (eq? linkage 'return)))
(make-instruction-sequence '(proc) all-regs
`((assign continue (label ,linkage))
(assign val (op compiled-procedure-entry)
(reg proc))
(goto (reg val)))))
((and (not (eq? target 'val))
(not (eq? linkage 'return)))
(let ((proc-return (make-label 'proc-return)))
(make-instruction-sequence '(proc) all-regs
`((assign continue (label ,proc-return))
(assign val (op compiled-procedure-entry)
(reg proc))
(goto (reg val))
,proc-return
(assign ,target (reg val))
(goto (label ,linkage))))))
((and (eq? target 'val) (eq? linkage 'return))
(make-instruction-sequence '(proc continue) all-regs
'((assign val (op compiled-procedure-entry)
(reg proc))
(goto (reg val)))))
((and (not (eq? target 'val)) (eq? linkage 'return))
(error "return linkage, target not val -- COMPILE"
target))))
</div>
<script>
prompt();
</script>
<h3> Combining Instruction Sequences </h3>
<p> This section describes the details on how instruction sequences are represented and combined. Recall from section 5-5-1 that an instruction sequence is represented as a list of the registers needed, the registers modified, and the actual instructions. We will also consider a label (symbol) to be a degenerate case of an instruction sequence, which doesn't need or modify any registers. So to determine the registers needed and modified by instruction sequences we use the selectors
<div id="">
(define (registers-needed s)
(if (symbol? s) '() (car s)))
(define (registers-modified s)
(if (symbol? s) '() (cadr s)))
(define (statements s)
(if (symbol? s) (list s) (caddr s)))
</div>
<script>
prompt();
</script>
<p> and to determine whether a given sequence needs or modifies a given register we use the predicates
<div id="">
(define (needs-register? seq reg)
(memq reg (registers-needed seq)))
(define (modifies-register? seq reg)
(memq reg (registers-modified seq)))
</div>
<script>
prompt();
</script>
<p> In terms of these predicates and selectors, we can implement the various instruction sequence combiners used throughout the compiler.
<p> The basic combiner is <tt>append-instruction-sequences</tt>. This takes as arguments an arbitrary number of instruction sequences that are to be executed sequentially and returns an instruction sequence whose statements are the statements of all the sequences appended together. The subtle point is to determine the registers that are needed and modified by the resulting sequence. It modifies those registers that are modified by any of the sequences; it needs those registers that must be initialized before the first sequence can be run (the registers needed by the first sequence), together with those registers needed by any of the other sequences that are not initialized (modified) by sequences preceding it.
<p> The sequences are appended two at a time by <tt>append-2-sequences</tt>. This takes two instruction sequences <tt>seq1</tt> and <tt>seq2</tt> and returns the instruction sequence whose statements are the statements of <tt>seq1</tt> followed by the statements of <tt>seq2</tt>, whose modified registers are those registers that are modified by either <tt>seq1</tt> or <tt>seq2</tt>, and whose needed registers are the registers needed by <tt>seq1</tt> together with those registers needed by <tt>seq2</tt> that are not modified by <tt>seq1</tt>. (In terms of set operations, the new set of needed registers is the union of the set of registers needed by <tt>seq1</tt> with the set difference of the registers needed by <tt>seq2</tt> and the registers modified by <tt>seq1</tt>.) Thus, <tt>append-instruction-sequences</tt> is implemented as follows:
<div id="">
(define (append-instruction-sequences . seqs)
(define (append-2-sequences seq1 seq2)
(make-instruction-sequence
(list-union (registers-needed seq1)
(list-difference (registers-needed seq2)
(registers-modified seq1)))
(list-union (registers-modified seq1)
(registers-modified seq2))
(append (statements seq1) (statements seq2))))
(define (append-seq-list seqs)
(if (null? seqs)
(empty-instruction-sequence)
(append-2-sequences (car seqs)
(append-seq-list (cdr seqs)))))
(append-seq-list seqs))
</div>
<script>
prompt();
</script>
<p> This procedure uses some simple operations for manipulating sets represented as lists, similar to the (unordered) set representation described in section 2-3-3:
<div id="">
(define (list-union s1 s2)
(cond ((null? s1) s2)
((memq (car s1) s2) (list-union (cdr s1) s2))
(else (cons (car s1) (list-union (cdr s1) s2)))))
(define (list-difference s1 s2)
(cond ((null? s1) '())
((memq (car s1) s2) (list-difference (cdr s1) s2))
(else (cons (car s1)
(list-difference (cdr s1) s2)))))
</div>
<script>
prompt();
</script>
<p> <tt>Preserving</tt>, the second major instruction sequence combiner, takes a list of registers <tt>regs</tt> and two instruction sequences <tt>seq1</tt> and <tt>seq2</tt> that are to be executed sequentially. It returns an instruction sequence whose statements are the statements of <tt>seq1</tt> followed by the statements of <tt>seq2</tt>, with appropriate <tt>save</tt> and <tt>restore</tt> instructions around <tt>seq1</tt> to protect the registers in <tt>regs</tt> that are modified by <tt>seq1</tt> but needed by <tt>seq2</tt>. To accomplish this, <tt>preserving</tt> first creates a sequence that has the required <tt>save</tt>s followed by the statements of <tt>seq1</tt> followed by the required <tt>restore</tt>s. This sequence needs the registers being saved and restored in addition to the registers needed by <tt>seq1</tt>, and modifies the registers modified by <tt>seq1</tt> except for the ones being saved and restored. This augmented sequence and <tt>seq2</tt> are then appended in the usual way. The following procedure implements this strategy recursively, walking down the list of registers to be preserved:@footnote{Note that <tt>preserving</tt> calls <tt>append</tt> with three arguments. Though the definition of <tt>append</tt> shown in this book accepts only two arguments, Scheme standardly provides an <tt>append</tt> procedure that takes an arbitrary number of arguments.}
<div id="">
(define (preserving regs seq1 seq2)
(if (null? regs)
(append-instruction-sequences seq1 seq2)
(let ((first-reg (car regs)))
(if (and (needs-register? seq2 first-reg)
(modifies-register? seq1 first-reg))
(preserving (cdr regs)
(make-instruction-sequence
(list-union (list first-reg)
(registers-needed seq1))
(list-difference (registers-modified seq1)
(list first-reg))
(append `((save ,first-reg))
(statements seq1)
`((restore ,first-reg))))
seq2)
(preserving (cdr regs) seq1 seq2)))))
</div>
<script>
prompt();
</script>
<p> Another sequence combiner, <tt>tack-on-instruction-sequence</tt>, is used by <tt>compile-lambda</tt> to append a procedure body to another sequence. Because the procedure body is not ``in line'' to be executed as part of the combined sequence, its register use has no impact on the register use of the sequence in which it is embedded. We thus ignore the procedure body's sets of needed and modified registers when we tack it onto the other sequence.
<div id="">
(define (tack-on-instruction-sequence seq body-seq)
(make-instruction-sequence
(registers-needed seq)
(registers-modified seq)
(append (statements seq) (statements body-seq))))
</div>
<script>
prompt();
</script>
<p> <tt>Compile-if</tt> and <tt>compile-procedure-call</tt> use a special combiner called <tt>parallel-instruction-sequences</tt> to append the two alternative branches that follow a test. The two branches will never be executed sequentially; for any particular evaluation of the test, one branch or the other will be entered. Because of this, the registers needed by the second branch are still needed by the combined sequence, even if these are modified by the first branch.
<div id="">
(define (parallel-instruction-sequences seq1 seq2)
(make-instruction-sequence
(list-union (registers-needed seq1)
(registers-needed seq2))
(list-union (registers-modified seq1)
(registers-modified seq2))
(append (statements seq1) (statements seq2))))
</div>