CS 6983 Special Topics: The Technology of Lambda                -*- Outline -*-
Olin Shivers
Homework 5: Closure conversion part II: transform
Due Monday 2025/4/15 midnight by email

We've marked up the CPS program with all the annotations we need to transform
it to closure-converted form. Again, after closure conversion, all variables
can be viewed as if they were registers (on a machine with an arbitrary
number of registers).

Our target language is no longer lambda calculus of any kind. The target
program executes on a machine that has
- a program counter,
- a stack, which we can think of as a sequence of frames,
- a heap where we can allocate closure tuples for first-class function
  values, and
- an arbitrary number of registers.

The target language is called AM, for "abstract machine." It is a kind of SSA
form; we can regard variables as registers, and every variable has only one
lexical definition. The language has two name spaces: code labels and
var/register names.

The language has been carefully defined to hide word size and be independent
of what direction the stack grows -- from high memory downwards, or from low
memory upwards.

-------------------------------------------------------------------------------
* Grammar of AM
---------------

;;; An INSN is a straight-line computation.
;;; An INSN+ terminates in a control transfer somewhere.
;;; As each kind of code includes the ability to call a function
;;; in the middle of the computation, non-termination is also
;;; a possibility in both cases.
;;;
;;; This is essentially SSA: each variable is assigned/bound
;;; at only one spot (bound across a jump, or by an := statement).

prog ::= insn+

insn-seq  ::= (insn ...)       ; Possibly empty
insn-seq+ ::= (insn ... insn+) ; Never empty

insn ::= (BLOCK . insn-seq)       ; Compound instruction
       | (LABEL (label code) ...) ; Locally introduce code blocks
       | (:= var rhs)             ; Bind var / assign reg
       | (FRAME-STORE n var)      ; Store VAR's value into frame slot #n

insn+ ::= (BLOCK . insn-seq+)         ; Compound instruction
        | (IF var insn+ insn+)
        | (TAIL target arg ... retpc) ; Tail call: pop frame & jump to proc
        | (CALL target arg ... retpc) ; Simple call: jump to proc, no pop
        | (RETURN target arg ...)     ; Return: pop frame & jump to cont
        | (GO target arg ...)         ; Local jump, no frame-pop

retpc, target ::= var | labref ; Can jump through reg or to labelled location

labref ::= (% label)           ; Address of labelled code

code ::= (CODE (var ...) . insn-seq+)        ; Continuation
       | (PROC (var ...) kvar n . insn-seq+) ; Procedure; frame size N.

rhs ::= (prim arg ...)        ; Note: no continuation arg
      | (CALL target arg ...) ; Note: no continuation arg
      | (CLOSURE arg ...)     ; Allocate closure tuple on heap
      | memref                ; Load from stack frame or a closure tuple

arg ::= labref
      | const          ; A constant value
      | loc            ; A value source: reg ref or memory load
      | (OFFSET loc n) ; For making proc values from closure tuples

loc ::= var | memref   ; The location of a previously computed value

memref ::= (FRAME-REF n)     ; Load slot #n off the stack frame
         | (TUPLE-REF var n) ; Load slot #n off the closure tuple

-------------------------------------------------------------------------------
* AM discussion and rationale
-----------------------------

Label references are marked with the (% label) form, to distinguish them from
variable/register references.

** INSN's and INSN+'s
=====================

INSN forms include the declaration of labelled code blocks. We permit these
declarations anywhere to make it easy for tree-walking code to add these as
locally needed.
INSN forms do not include IF conditionals, to ensure that all join points in
the code are explicitly labelled (and take parameters).

The compiled body of a lambda is an INSN+, which is essentially a binary tree
of IF's with control transfers at the leaves. There are four kinds of control
transfers:

- (CALL target arg ... retpc)
  This jumps to TARGET (which can be a register or a fixed label), which
  should give the address of a PROC form. The call passes the given arguments
  plus the return-address RETPC. This form is used to make a non-tail call --
  it returns to a shared join point (RETPC) that will run using the caller's
  stack frame, so it doesn't pop the current stack frame.

  Note that most non-tail calls are *not* made this way, but are rather made
  on the right-hand side of a (:= var (CALL ...)) INSN. This INSN+ jump is
  used only when the return point is a shared join point (e.g., locally bound
  by the caller so it can be used in both arms of an IF). See a following
  section for further discussion.

- (TAIL target arg ... retpc)
  This jumps to TARGET (which can be a register or a fixed label), which
  should give the address of a PROC form. The call passes the given arguments
  plus the return-address RETPC. We use this form when we want a function to
  terminate in a tail call, so the current frame of the stack is popped as we
  jump to TARGET.

- (RETURN target arg ...)
  This jumps to TARGET, which should be the return address of some call. We
  use this form when we want a function to terminate by returning to its
  caller, so the current frame of the stack is popped as we jump to TARGET.

- (GO target arg ...)
  This jumps to TARGET, which should be the address of code local to the
  current function -- that is, the target of the jump has been compiled to
  run with the same stack frame as the source of the jump. So the stack is
  *not* popped.

Here's the table of the four INSN+ jump instructions, which are distinguished
by two properties: where we are jumping (frame-allocating PROC, or just a
CODE form that runs on a pre-existing stack frame), and whether or not we pop
the caller's frame as part of the jump:

                   stack op
                 POP     NO-POP
               +--------+------+
  jump    PROC | TAIL   | CALL |
  target  CONT | RETURN | GO   |
               +--------+------+

We write a procedure body as

    (PROC (var ...) kvar n . insn-seq+)

This stands for code that
1. Binds/loads the VAR variables/registers with the values being passed from
   the caller.
2. Binds/loads variable/register KVAR with the return address being passed
   from the caller.
3. Adds an N-word frame onto the tip of the stack.
4. Runs the code in INSN-SEQ+

A CODE form is similar, but
- no return address is passed from the source of the jump, and there is no
  corresponding KVAR to be bound, and
- no new stack frame is pushed on entry.

Named CODE blocks are used for join points in intra-procedural control-flow
graphs; the CODE's parameter list provides the equivalent of phi functions in
an SSA representation. Just as PROC forms arise from (some) user lambdas in
the CPS, CODE blocks arise from (some) continuation lambdas in the CPS
source.

A PROC form can be called from three places in the grammar:
- A (TAIL target arg ... retpc) call performed at the terminal leaf of some
  INSN+ IF-tree.
- A (CALL target arg ... retpc) call, likewise performed at the terminal leaf
  of some INSN+ IF-tree.
- A function call done on the right-hand side of an := binding/assignment
  form, e.g.,

      (:= y-small? (call < y 5))

  Note that in this case, the RHS CALL form does *not* provide a continuation
  argument.
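To make the join-point machinery concrete, here is a small sketch (not part
of the assignment's examples, and assuming a $< comparison primop) of a PROC
whose two IF arms merge at a labelled CODE block:

    (proc (x) k 0                 ; Compute |x| + 1 and return it.
      (label (join (code (y k1)   ; Explicit join point: its parameters
               (:= y+1 ($+ y 1))  ; play the role of SSA phi functions.
               (return k1 y+1))))
      (:= b ($< x 0))             ; Assumed comparison primop.
      (if b
          (block
            (:= nx ($- 0 x))
            (go (% join) nx k))   ; Both arms GO to the shared join,
          (go (% join) x k)))     ; passing the joined value and K.

Because JOIN is a CODE block, it starts an empty variable scope, so K must be
passed to it explicitly as a parameter.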
A register/variable V is assigned / bound to values in one of two ways:

- As the result of a (:= v rhs) insn. There are five kinds of right-hand
  side:
  1. A primop application, e.g., (:= x ($* y z))
  2. A non-tail call to a procedure, e.g., (:= x (call leap-year? y))
  3. The allocation of a closure tuple, e.g.,

         (:= clo (closure (% lam87) ; Slot 0 is code pointer
                          n         ; Slot 1 is free-var value
                          a))       ; Slot 2 is free-var value

  4. A load from the stack frame, e.g., (:= x (frame-ref 5))
     The frame is indexed starting from 0.
  5. A load from a closure tuple, e.g., (:= x (tuple-ref clo 3))
     The tuple is indexed starting at 0.

- As the result of a control jump: all control jumps pass values to the
  registers/variables named at the target CODE or PROC form. (For a
  comparison, this serves the purpose provided by the phi functions of SSA,
  at code join points.) The TAIL/CALL/RETURN/GO form doing the control
  transfer can pass six kinds of arguments to the variables/registers waiting
  to be bound/assigned at the jump target:
  1. A code label, e.g., (% lam87)
  2. A constant, e.g., 5.
  3. A variable/register, e.g., x.
  4. The address of a specific slot in a memory tuple, e.g., (offset clo 2)
     If variable CLO's value is the address of some tuple, then the
     (offset clo 2) form produces that address plus a 2-word offset, that is,
     the address of slot #2 in the tuple. This is used to convert the address
     of a closure tuple into a procedure, when we have multiple procedures
     sharing a single closure record.
  5. A stack-frame load, e.g., (FRAME-REF 3)
  6. A closure-tuple load, e.g., (TUPLE-REF clo 2)

Arguments passed across control jumps are restricted to be values already
computed; actual computation happens in := forms. We include memory loads as
legal arguments to set us up to handle things if we ever extend the compiler
to handle register spilling. E.g., if we perform a function call with twenty
arguments on a processor with sixteen registers, we have no chance of getting
all the arguments into physical registers before the call. We also do this
for arguments to the closure-allocating form, for similar reasons. In
general, any atomic computational form in AM that could take an arbitrary
number of inputs is defined so that the inputs can come from either variables
(registers) or memory.

When a program has a LETREC that binds multiple variables to closure lambdas,
we arrange for all of these procedures to share one closure tuple. For
example, if we have a pair of mutually recursive F and G functions

    (letrec ((f (fun (n1 k1) ... g ... x ...))  ; F & G must be closed.
             (g (fun (n2 k2) ... f ... y ...))) ; Closure includes X & Y.
      (h f g ktop))                             ; Export F & G closures to H.

then the corresponding AM code makes only *one*, shared 4-slot closure tuple:

    (:= clo (closure (% f-lam) ; <-- Procedure F points here
                     (% g-lam) ; <-- Procedure G points here
                     x
                     y))
    ;; Procedure F is (offset clo 0) = CLO
    ;; Procedure G is (offset clo 1)

Now the F-LAM code accesses X from its closure parameter CF as

    (tuple-ref cf 2)

while the G-LAM code accesses X from its closure parameter CG as

    (tuple-ref cg 1)

because CG already points one slot off from the beginning of its containing
tuple. (A fuller worked sketch of F-LAM and G-LAM appears below.)

Stack-frame slots are written with the FRAME-STORE insn, e.g.,

    (frame-store 7 X)

stores variable/register X into slot #7 of the current stack frame.

INSN terms permit labels to be declared for PROC and CODE code blocks, using
the LABEL form. These labels obey lexical scope.
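To see this offset arithmetic worked through end to end, here is a hedged
sketch of F-LAM and G-LAM, assuming (purely for illustration) that F's body
computes (g ($+ n1 x)) and G's body computes ($* n2 y):

    (label (f-lam (proc (cf n1) k1 0
             (:= x1 ($+ n1 (tuple-ref cf 2))) ; X is slot 2 relative to CF.
             (:= g-code (tuple-ref cf 1))     ; G's code ptr: slot 1 from CF.
             (tail g-code (offset cf 1)       ; G's closure arg is CF plus
                   x1 k1)))                   ; one slot.
           (g-lam (proc (cg n2) k2 0
             (:= p ($* n2 (tuple-ref cg 2)))  ; Y is absolute slot 3, which
             (return k2 p))))                 ; is slot 2 relative to CG.

Note how G, when called from F, receives (offset cf 1) as its closure
parameter, so all of its tuple indices are shifted down by one relative to
F's.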
In AM, primops no longer take continuation arguments, as they are now invoked
on the right-hand side of := definitions. So if our source CPS term is

    ($+ x y (cont (z) ...following code in scope of x, y & z...))

in AM we'd say

    (block (:= z ($+ x y))
           ...following code in scope of x, y & z...)

Note that in CPS, the continuation parameter K in a user FUN form

    (fun (x y k) ...)

is bound to a continuation value -- a closure tuple packaging up some code
with some environment. But in AM, the "continuation parameter" RET of PROC
form

    (proc (x y) ret 3 ...)

is bound to *just* a code pointer, the return address of the caller. The rest
of the continuation (the saved variable bindings that the code at RET will
use) is the stack frame at the tip of the stack on entry to the PROC. When
the PROC's body returns by jumping to RET, it will ensure that the stack is
restored back to its state when the caller jumped to the PROC form (*before*
the PROC allocated *its* three-slot frame), so the stack's tip frame when we
jump (return) to RET will be the same as when we originally jumped to the
PROC. (The necessary stack-frame popping is done by the TAIL and RETURN INSN+
forms.)

The := assignment/binding form only defines a single variable. We could have
a multi-var binder (:= (var ...) rhs) form (after all, continuations can be
called on multiple arguments), but we won't bother for this basic compiler,
as our source language doesn't have a multiple-value return primop like
Scheme's VALUES function. It would be a simple extension to add, however.
(Feel free to add it, if you like.)

** Scope
========

AM is lexically scoped, with a scoping principle tuned specifically for its
use as a low-level, pre-register-allocated IR.

A code label is visible from its point of declaration inside a PROC or CODE
block down to the end of the code block. (Unless shadowed by a following
definition of a different label using the same name.) Additionally, a label
declaration has circular scope. For example, a LABEL declaration

    (label (f (code (w x) ... (% f) ... (% g) ...))  ; Labels F & G
           (g (code (y z) ... (% f) ... (% g) ...))) ; visible in here.
    ...Labels F & G visible from here forward to end
       of containing PROC or CODE form...

makes labels F and G visible
- in the code of the innermost containing PROC or CODE form, from the point
  of the LABEL form to the end of the code.
- inside the two declared code blocks that are bound to F and G.

The circular-scope property is why the LABEL form permits you to bind more
than one label: so that you can declare a group of mutually referential code
blocks, together.

Variable scoping works as follows. Every PROC and CODE block starts a
completely empty variable scope (which means that it doesn't matter where in
the program it occurs, in terms of variables), which is then immediately
augmented by the parameters of the code block. A (:= var rhs) form defines
VAR, which is visible to the end of the code block -- unless shadowed by a
subsequent definition of a new variable with the same name. If we think of a
PROC or CODE block as being a tree of IF's with TAIL/CALL/RETURN/GO jumps at
the leaves of the tree, then a := form binds a variable at some internal node
of the IF-tree, which is then visible to the subtree rooted at the := form.

Additionally, a (:= var (CALL ...)) INSN resets the scope of following code
to *just* VAR. This is odd and worth emphasizing: a subroutine call kills all
prior register bindings. We are modelling the behavior of *registers* with
our variables.
For example:

    (block
      (:= x ($+ y 7))
      (:= z (call (% reverse) numlist))
      ;; Here, we cannot reference X or Y. Only Z.
      ...)

In short, variables are always local to (a portion of) the PROC or CODE block
in which they appear; we abandon all these local variables when execution
arrives at some leaf of the IF-tree and jumps to some other labelled code
block; all live variables must be passed to that code block as parameters of
the TAIL/CALL/RETURN/GO jump.

All this parameter-passing is not expensive because we haven't done actual
register allocation yet. When we do, we'll coalesce register assignments
across calls, so the variable/variable copies that happen across
TAIL/CALL/RETURN/GO jumps will mostly go away (that is, they are handled
statically, by the compiler).

Note that one way to look at what we've done is to say that we've made
liveness analysis for variables a simple question of local scope.

BLOCK forms are not scope boundaries. For example, this is a well-formed
procedure body:

    (proc (a b) ret-pc 0
      (block (:= c ($+ a b))) ; The C defined in this block
      (return ret-pc c))      ; is visible after/outside the block.

So you can use BLOCK forms promiscuously anywhere you want to package up a
sequence of INSN's into a single compound INSN, or turn some INSN's and one
INSN+ into a single compound INSN+.

If you give these scoping principles a little thought, you'll see:

- You can easily alphatise labels to ensure all labels in a program have
  distinct names. It's just lambda-calculus alphatisation, which you've
  already done, and which isn't very hard.

- You can merge two adjacent LABEL declarations, although you might have to
  alpha-rename one set of labels if they overlap.

- You can always pull LABEL bindings up the code tree. This works because
  variable scope is always local to its containing CODE or PROC block, so we
  can move these code blocks around without messing up variable scoping.

This all means that, if you first alphatised your program's labels, you could
simply yank all labelled code blocks from their declaration inside the code
tree, up to top level. Your program would then look like

    (BLOCK (LABEL (lab1 code1) (lab2 code2) ...) . insn-seq+)

where the INSN-SEQ+ would have no internal LABEL declarations at all. That
is, your program would be (1) a set of labelled code blocks and (2) a "main"
block to execute in the context of those labels.

Alternatively, you could implement a pass to split LABEL binding sets into
connected components, then push each binding set down the code tree as far as
possible, until we reach
- an IF split where both arms contain free references to a label in the
  binding set, or
- a primitive INSN or INSN+ that directly references a label in the binding
  set.

This will make an AM program that resembles the kind of control-flow graph
built by compilers for first-order languages such as C or Fortran. Local
loops, recursions and join points will show up as local, LABELed code blocks.
You might consider the relationship of such an AST to the dominator tree for
the program -- and what this might say about how to find loops for loop
optimisations.

This is all sensible and rigorous, but in practical terms you can use labels
and variables in a simpler regime: just ensure you always use fresh
identifiers and preserve alpha-uniqueness, and the issues of name capture
become irrelevant.

So, there are two ways you can proceed, in terms of name management:

- Operate in an alpha-unique way, worry free.
- Reuse names with abandon, exploiting the scoping rules, and be prepared to
  alpha-rename labels if you do transforms that move code blocks around.

Are AM variables assigned (by side effect) or bound (as in a functional
language)? The answer is: yes. That is to say: it doesn't matter. Think of it
whichever way you prefer.

** Other
========

Why no (IF arg insn1 insn2) form in the INSN grammar? That would introduce
implicit control joins at the end of INSN1 and INSN2 -- and all joins in AM
are explicit jumps that pass values.

Some compiler people refer to a tree of IF's where control only enters at the
root of the tree as an "extended basic block" (EBB). AM's INSN+ non-terminal
is similar, except that we permit arbitrary function calls inside the tree,
so an INSN+ is not atomic and not guaranteed to terminate. (True basic blocks
and extended basic blocks are an all-or-nothing kind of
"macro-instruction.")

-------------------------------------------------------------------------------
* Examples
----------

Here are some examples of code written in our target language. I have written
these in a fully alphatised way to make the code easier to read. Note also
that I use variable names like N-1 and A*I when I think that will make the
code easier to read.

;;; Recursive factorial
(block
  (label (fact (proc (n) k 2                 ; 2 means 2-slot frame
           (:= b ($zero? n))
           (if b
               (return k 1)
               (block
                 (frame-store 0 n)           ; Save N on frame
                 (frame-store 1 k)           ; Save ret pc
                 (:= n-1 ($- n 1))
                 (:= a1 (call (% fact) n-1)) ; a1 = (fact n-1)
                 (:= n2 (frame-ref 0))       ; Load N from frame
                 (:= a2 ($* a1 n2))          ; a2 = n * (fact n-1)
                 (:= k2 (frame-ref 1))       ; Load K from frame
                 (return k2 a2))))))         ; n! => K
  (:= m ($read))
  (:= ans (call (% fact) m))                 ; Leave answer in ANS.
  (halt))

;;; Iterative factorial, iterates by tail-calling LP.
(block
  (label (fact (proc (n) k 0                 ; 0 means empty frame
           ;; (LP I A K1) => (K1 (* I! A))
           (label (lp (proc (i a) k1 0
                    (:= b ($zero? i))
                    (if b
                        (return k1 a)
                        (block
                          (:= a*i ($* a i))
                          (:= i-1 ($- i 1))
                          (tail (% lp) i-1 a*i k1))))))
           (tail (% lp) n 1 k))))
  (:= m ($read))
  (:= ans (call (% fact) m))                 ; Leave answer in ANS.
  (halt))

;;; Iterative factorial after contifying the loop. This is
;;; better, but your compiler doesn't know how to do this
;;; transform.
(block
  (label (fact (proc (n) k 0                 ; 0 means empty frame
           ;; (LP I A K1) => (K1 (* I! A))
           (label (lp (code (i a k1)         ; Loop is now CODE.
                    (:= b ($zero? i))
                    (if b
                        (return k1 a)
                        (block
                          (:= a*i ($* a i))
                          (:= i-1 ($- i 1))
                          (go (% lp) i-1 a*i k1))))))
           (go (% lp) n 1 k))))
  (:= m ($read))
  (:= ans (call (% fact) m))                 ; Leave answer in ANS.
  (halt))

;;; Function composition. This Scheme code
;;;
;;; (let ((compose (lambda (f g)
;;;                  (lambda (x) (f (g x))))))
;;;   ... (compose fun1 fun2) ... (compose fun3 fun4) ...)
;;;
;;; compiles to the following, assuming COMPOSE is used in first-order
;;; ways. This is an example that is heavy on closure creation & unpacking.
(block
  (label (lam21 (proc (clo1 x) k1 2          ; LAM21 is (lambda (x) ...)
           (:= f1 (tuple-ref clo1 1))        ; Fetch F from closure
           (:= g1 (tuple-ref clo1 2))        ; Fetch G from closure
           (frame-store 0 f1)                ; Save F on stack
           (frame-store 1 k1)                ; Save K1 on stack
           (:= g1-code (tuple-ref g1 0))     ; Call (G X) with
           (:= tmp (call g1-code g1 x))      ; closure protocol
           (:= f2 (frame-ref 0))             ; Load F off stack
           (:= k2 (frame-ref 1))             ; Load return address
           (:= f2-code (tuple-ref f2 0))     ; Tail call (F TMP) with
           (tail f2-code f2 tmp k2)))        ; closure protocol
         (compose (proc (f g) k3 0
           (:= clo2 (closure (% lam21) f g))
           (return k3 clo2))))

  ... body of the LET, using COMPOSE, goes here ...)
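;;; One control transfer the examples above don't show is the CALL INSN+
;;; from the jump table: a non-tail call whose return point is a join
;;; shared by both arms of an IF. A hedged, illustrative sketch, assuming
;;; a one-argument, first-order F (code label F-CODE) with no extended
;;; variables: return (f 1) if X is zero, else (f x).
(proc (x) k 1
  (frame-store 0 k)                 ; K is live across the calls; save it.
  (label (ret (code (r)             ; Shared return point: both calls land
           (:= k1 (frame-ref 0))    ; here, running on the caller's frame.
           (return k1 r))))
  (:= b ($zero? x))
  (if b
      (call (% f-code) 1 (% ret))   ; Both arms return to RET, so we use
      (call (% f-code) x (% ret)))) ; the CALL INSN+, which doesn't pop.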
-------------------------------------------------------------------------------
* From CPS to AM
----------------

** Call arguments
=================

In CPS, a call argument can be a constant, a variable, or a lambda. Constants
and variables are reasonably straightforward.

If an argument is a FUN expression being used as a value (that is, not a
first-order or open lambda), however, we must first create a closure tuple
packaging up the lambda's code with the values of its free variables in a
little prelude preceding the call; then we can do the call using a register
whose value is this tuple as the actual argument. So if we had the CPS call

    (f (fun (x k1) (@ (label lam38)) ...) k2)

where F is a first-order function (whose code label is F-CODE and which, for
simplicity, needs no extended arguments), we'd produce this AM INSN+:

    (block
      (label (lam38 (proc (clo38 x) k1 ...)))
      (:= proc38 (closure (% lam38) q y z))
      (tail (% f-code) proc38 k2))

where LAM38 is the label of the (FUN (X K1) ...) code, CLO38 is the extra
argument for the closure tuple that will be passed to this code when it is
called, and PROC38 is the variable/register holding the closure when we
allocate it. The closure's first slot is the code pointer (% LAM38), and its
next three slots hold the values of free variables Q, Y and Z. We produce the
PROC form for the lambda by recursively compiling the CPS lambda.

If an argument is a closed CONT form, e.g.,

    (f x (cont (y) ...body...))

then we produce an AM := definition for Y, and carry on compiling the body of
the CONT form in following code:

    (block
      (:= y (call f x))              ; Do the call.
      ...compiled body goes here...) ; Do this after the call.

But wait! This is not enough. While F is running, before it returns to our
call site, it might recursively run *this* code, thus clobbering our local
registers, including any that are live across the (:= y (call f x)) call.
This is why they are not in scope in the code following the call. So we have
to
- first save onto the stack any variables needed by the ...BODY... that will
  run when the F call returns, and
- then make sure we compile the ...BODY... code with a symbol table
  indicating that these CPS variables correspond to AM stack-slot references,
  *not* AM variables.

We know which variables need to be saved on the stack before the F call --
that is given by the FRAME+ annotation on the CONT form. So our CPS example
above really looks like this

    (f x (cont (y) (@ (frame+ (a 3) (q 7))) ...body...))

and it is compiled to this AM code

    (block
      (frame-store 3 a')  ; Save A at slot 3 of stack frame
      (frame-store 7 q')  ; Save Q at slot 7 of stack frame
      (:= y (call f x))   ; Call F.
      ...compiled body    ; Return here. A Q ref from here
         goes here...)    ; forward is (FRAME-REF 7), not Q'.

In AM, a "call argument" is one of the expressions that appear in
TAIL/CALL/RETURN/GO control jumps, which are passed to the target PROC or
CODE block and bound to its initial parameters. These call arguments are
intended to be things already computed; the point of passing them is to give
these things new names. Here are the kinds of things we can use as arguments
in an AM call:

- Constants: either a label reference such as (% fact-code), or a simple
  constant such as 17.
- A reference to some data source: a register/variable or a memory location,
  e.g., X, (FRAME-REF 3) or (TUPLE-REF CLO 7).
- A constant offset from some address fetched from a register or memory
  location, e.g., (OFFSET CLO 2), which means the address of the tuple that
  is the value of variable CLO, plus 2 words.
The last case is how we convert a closure tuple allocated by a CPS LETREC
into the different procedures that share that tuple. The offset produces the
address of the slot in the tuple where the desired procedure's code pointer
lives. So, if we had the CPS call to first-order function F

    (letrec ((g (fun ... a ...))   ; G closes over A
             (h (fun ... b ...)))  ; H closes over B
      (f 3
         g
         (fun (x k1) ($+ x a k1))  ; Closes over A, too.
         h
         k2))

we'd produce this AM code (writing F-CODE for F's code label, and assuming F
needs no extended arguments):

    (block
      (label (g-code (proc ...))   ; Compiled code for (fun ... a ...)
             (h-code (proc ...)))  ; Compiled code for (fun ... b ...)
      ;; Make the shared closure for G & H.
      (:= gh-clo (closure (% g-code) ; Slot 0: G code pointer
                          (% h-code) ; Slot 1: H code pointer
                          a          ; Slot 2: A
                          b))        ; Slot 3: B
      ;; Now do the F call. First we have to close arg #3.
      (label (lam102 (proc (clo1 x) k1 0
               (:= tmp ($+ x (tuple-ref clo1 1)))
               (return k1 tmp))))
      (:= clo (closure (% lam102) a)) ; Close over A
      (tail (% f-code)
            3
            (offset gh-clo 0)  ; G (But, really, just use GH-CLO)
            clo                ; The (fun (x k1) ...) closure
            (offset gh-clo 1)  ; H
            k2))               ; Tail call -- F will return to K2.

** To compile a call
====================

There are multiple kinds of call. First, bear in mind that some CPS arguments
will require producing pre-call code to heap-allocate closures or save a
continuation's free vars on the stack. In the case of a FUN user lambda,
you'll also have to produce a LABEL declaration for the FUN's compiled code,
as described above.

OK, onward to the actual calls. There are seven cases: we can call six
different kinds of lambda

    {open, first-order, closed} x {user lambda, continuation}

or we can call a primop. Each case is handled differently.

- Open continuation call / beta-redex: ((cont (x) ...body...) arg)

  Here, we know that ARG must be a (FUN ...) form, not a constant or a
  variable, and that there is more than one reference to X in the body of the
  continuation (thanks to your simplifier).

  + If ARG is a first-order FUN
    In this case, compile ARG into a PROC form and name it with its label in
    a LABEL form. Compile the body of the continuation with an extended
    symbol table marking X as a first-order variable:

        (block
          (label (lab102 (proc ...arg's code goes here...)))
          ...compiled cont body here...) ; Calls to X will use (% lab102).

    Don't forget to add extra parameters to the PROC for ARG's extended free
    variables, which will be passed to the PROC from all calls to X.

  + If ARG is a closed FUN
    In this case, make a closure for ARG and assign it to a fresh
    variable/register.

        (block
          (label (lab102 (proc ...arg's code goes here...)))
          (:= clo (closure (% lab102) fv1 ...)))

    This code is followed by the code for the body of the continuation,
    translated using a symbol table that maps CPS variable X to AM variable
    CLO.

- Open user-lambda call / beta-redex: ((fun (x k) ...body...) arg carg)

  This is handled the same way as above. [Fixme: Except for the impact on
  TAIL vs. CALL jumps. This is to be written up later.]

- First-order user-procedure call: (f arg ... carg)

  + Tail call: (f x 37 k)
    A tail call is one where the continuation arg is a variable. Since tail
    calls don't return to the caller, this is the end of the local code
    sequence -- we're going to F, who will return to K. We're out of the
    picture. Thus, we're producing a leaf in the INSN+ IF-tree.

    This produces a TAIL INSN+ to F's label. Since this is a first-order
    call, we must pass F values for its extended variables, too. Note that
    when we look up CPS variable K in the symbol table, we'll get an AM
    argument that represents the return address.
    The rest of K's closure is on the stack frame under the current one; the
    TAIL form is responsible for popping the current frame so that K's frame
    will be at the tip of the stack when we jump to the continuation's code.

    Put all this together, and we might get this:

        (tail (% f-lab) x' 37 a z q k')

    where F-LAB is F's code label; X' is the AM variable we're using for X;
    A, Z & Q are extended free variables for F, and K' is the AM variable for
    K.

    How do we know that F is a first-order variable, that its AM code label
    is F-LAB, and which CPS source variables are in its EXTENDED set, so that
    we can pass their corresponding AM variables along in the call? All this
    information is stored in the symbol table as the entry for CPS variable
    F.

  + Non-tail call:

        (f x 37                       ; Call F
           (cont (y)                  ; Return here.
             (@ (frame+ (b 3) (w 7)))
             ... b ... w ...))

    This turns into a block:

        (block
          (frame-store 3 b')                   ; Close the continuation as
          (frame-store 7 w')                   ; directed by FRAME+ annotation.
          (:= y' (call (% f-lab) x' 37 a z q)) ; Do the call
          ...                                  ; Cont's code goes here
          )

    Here, we first save the continuation's variable needs on the stack, then
    do the call: B' and W' are the AM variables for the B and W CPS
    variables. Since this is a first-order call, we must pass F's extended
    free variable needs as extra parameters A, Z and Q. No return address is
    specified in the CALL form; it is implicitly the following code.

    When we compile the body of the (cont (y) ...) continuation to make the
    code following the F call, we have to use a symbol table updated to map
    B & W to stack slots (frame-ref 3) and (frame-ref 7), instead of AM
    variables B' and W'.

- First-order continuation call: (k arg ...)

  This turns into a GO to the label of K's associated CONT form. Don't forget
  to pass along all of K's extended variables as specified by its EXTENDED
  annotation:

      (go (% k-label) arg1' ... xv1 ...) ; XVi are the extended free vars

- Primop call: ($+ x 3 (cont (z) ...)) or ($+ x 3 k)

  In the first case, where the primop's continuation is an explicit CONT
  form, this turns into a (:= z' ($+ x' 3)) INSN followed by the code for the
  body of the continuation produced with an updated symbol table, where Z'
  and X' are the AM variables being used for the CPS variables Z and X.

  The second case is like the first, but we assign a fresh temporary
  variable, which is then used in an immediately following RETURN or GO to K.
  (K might be a closure or a first-order continuation -- to call one of
  these, see other entries in this bullet list.) If we assume that K is not a
  first-order variable, the code would look like this:

      (block
        (:= tmp ($+ x' 3))
        (return k' tmp))

  If K is a first-order variable, the code would be

      (block
        (:= tmp ($+ x' 3))
        (go (% k-label) tmp))

  where K-LABEL is the code label for K.

- Closed user-procedure call: (f arg ... carg)

  In this case, F is a pointer into a closure tuple. The AM code should
  extract the code address at offset 0 from the pointer and call it, passing
  the tuple as an extra, first argument. Here's a tail call (that is, one
  where CARG is a CPS variable) to the example procedure above:

      (block
        (:= f-code (tuple-ref f' 0))
        (tail f-code f' arg' ... carg'))

  where F', ARG' and CARG' are the AM translations of F and the call
  arguments.

  Alternately, the F call is non-tail, which means that CARG is some CONT
  form, (cont (x) ...).
  In this case, the call compiles into some stack saves, as indicated by the
  CONT's FRAME+ annotation, followed by an (:= x (call f-code f' arg' ...))
  INSN, followed by the AM code for the body of the continuation, using a
  symbol table mapping those saved variables to frame loads.

- Closed continuation call: (k arg ...)

  This is a no-kidding procedure return. In the corresponding AM code, the
  continuation variable will hold the return address. The AM code will be a
  RETURN INSN+:

      (return k' arg' ...)

  where K' and ARG' are the AM expressions corresponding to K and ARG. Of
  course, as usual, if any args are closed FUN forms, you'll first have to
  produce a chunk of AM code to bind labels to their compiled code and then
  allocate closures.

** Symbol table
===============

As the CPS->AM compiler recursively walks the CPS tree translating it to AM,
the translation is critically mediated by
1. the annotations your previous analysis phase sprinkled over the code, and
2. the symbol table passed down through the tree walk.

The symbol table maps in-scope CPS source variables to information needed
when we encounter references to them.

- A first-order variable's entry gives its call protocol -- that is, the
  ordered list of extended-free CPS variables that must be passed to it when
  it is called, and the label of the AM PROC form to call.

- A regular variable has these bits:

  + How to refer to it when including it in a closure record
    This is an AM ARG (see the AM grammar): an AM var, a FRAME-REF or a
    TUPLE-REF memory-load expression. It will never be a constant, a
    (% label) label reference, or an OFFSET form. Why never an OFFSET form?
    Because you'll add the OFFSET when you use the closure tuple as a
    procedure value (that is, as an arg).

  + How to refer to it as an argument in a call (or as the function being
    called)
    Again, this is an AM ARG, but if it is describing a LETREC-bound
    procedure that is part of a shared closure tuple, the AM term will
    include the specific offset needed for this particular procedure.

As an example, suppose we have three closed functions bound by a CPS LETREC
form to variables F, G and H

    (letrec ((f (fun ...))
             (g (fun ...))
             (h (fun ...)))
      ...)

The AM code allocates a single closure tuple

    (:= fgh-clo (closure (% f-code) (% g-code) (% h-code)
                         free-var1 free-var2 ...))

The symbol table that we use to compile the body of the LETREC says that CPS
variables F, G, H all map to FGH-CLO *when we want to close code over F, G,
or H* or *when we want to pass F, G, or H as an extended free var to a
first-order procedure.* But! If a function has to close over both F *and* G,
we'll just include the FGH-CLO in its closure tuple *once*. This is because
in the first case (including a variable in a closure record), we're passing
around *environment structure* -- the *bindings* of F, G and H. In the second
case (using the variable as a call argument), we want the *values* of F, G
and H that can be passed around the program (consed into lists, stored in
hash tables, passed as arguments, returned as values and, of course,
eventually called). In this case, we have to take the binding information and
convert it into a value: add the right offset to the shared closure tuple.

Because of this, your symbol table for general, non-first-order variables
needs to provide two entries.

When are [Var |--> FRAME-REF] entries added to the symbol table? Answer: when
we compile code for a closed CONT form, as directed by the CONT's FRAME+
annotation.

When are [Var |--> TUPLE-REF] entries added to the symbol table? Answer: when
we compile code for a closed FUN form, as directed by the FUN's EXTENDED
annotation.
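Here is one way you might represent such a table; a minimal Scheme sketch,
assuming an association-list representation (the entry shapes and procedure
names are illustrative, not required):

    ;; First-order entry: the call protocol -- the code label of the AM
    ;; PROC, and the ordered extended free variables passed at every call.
    (define (bind-first-order table v label extended)
      (cons (list v 'first-order label extended) table))

    ;; Regular entry: two AM ARGs -- how to refer to V when including it
    ;; in a closure record, and how to refer to it as a call argument
    ;; (the latter carries the (offset ...) for LETREC-shared closures).
    (define (bind-var table v closure-ref call-ref)
      (cons (list v 'var closure-ref call-ref) table))

    (define (lookup table v)
      (cond ((assq v table) => cdr)
            (else (error "unbound CPS variable" v))))

For the F/G/H example above, F's entry would pair FGH-CLO (the environment
structure) with (offset fgh-clo 0) (the procedure value).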
** To compile a LETREC
======================

We have three pieces to handle:
- a set of first-order bindings
- a set of closure bindings
- the body of the LETREC

1. Compile the RHS lambdas to a set of labelled PROC and CODE blocks. These
   go in a (LABEL ...) binding.

   - To compile a 1st-order FUN:
     Create fresh AM variables for the lambda's parameters *and* all its
     extended free variables -- these will be the parameters of the AM PROC
     form you produce. (But remember... just *one* AM variable for all
     extended free variables that are bound by the same LETREC.) (Including
     this LETREC itself, if it binds any closed lambdas.) Compile the body of
     the lambda with this symbol table.

     The symbol table also needs entries describing the call protocol (code
     label and extended free var set) for each first-order lambda bound by
     this LETREC.

   - To compile a closed FUN:
     Create fresh AM variables for the lambda's parameters, plus one more for
     the extra closure-tuple parameter that the caller will pass. Add entries
     to the symbol table for
     + The CPS lambda's parameters, which map to the corresponding AM
       variables.
     + The lambda's extended free variables, which map to TUPLE-REF
       expressions fetching the values from the closure-tuple parameter.

     Remember that when you fetch a value from the closure tuple CLO that is,
     itself, a closure tuple, CLO', that was created by some other, outer
     LETREC, you'll need to add an OFFSET to the CLO' value to make the
     actual procedure pointer, so it'll look something like

         (offset (tuple-ref clo 3) ; This fetches a shared closure tuple
                 2)                ; This adds the offset needed for a
                                   ; specific procedure using this tuple.

     This offset adjustment is used for variables bound to closures by *this*
     LETREC, too -- just make an offset from the FUN's own closure tuple!

     Don't forget to adjust your offsets to account for the fact that the
     closure-tuple parameter doesn't point to the first word of the closure
     tuple; it points to the slot in the tuple that contains this lambda's
     code pointer.

     First-order variables bound by this LETREC are mapped to their calling
     protocol, as above.

   - To compile a 1st-order or closed CONT:
     Similar to the FUN cases, but a closed CONT fetches closed variables
     from the stack; it doesn't need an extra parameter for the closure
     tuple. Your compiler will never have to handle a CONT form appearing on
     the RHS of a LETREC binding unless you introduce some extra transforms
     ("contification"). So this is optional.

2. Following the (LABEL ...) that provides the code for all the RHS lambdas,
   you want a piece of code that allocates the single, shared closure tuple
   for all the closed RHS FUN's, if you have any:

       (:= clo (closure (% rhs-code1) (% rhs-code2) ... ...))

   The CLOSURE form references code labels from the LABEL form you made in #1
   above.

3. Now compile the body of the LETREC in an appropriately extended symbol
   table.

-------------------------------------------------------------------------------
* BLOCK flattening
------------------

Write a simple transform that flattens nested BLOCK forms, e.g.,

    (block insn1
           (block insn2 insn3)
           (block insn4 insn+))
    =>
    (block insn1 insn2 insn3 insn4 insn+)

You want to flatten the bodies of BLOCK, PROC and CODE forms. This is an easy
pass to implement. If you use the BLOCK feature heavily for packaging up
little snippets of code that then get dropped into other snippets, dumping
all this structure after you've produced your code (or while you're doing so)
might make your AM code much less ugly to read and debug.
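If you'd like a starting point, here is a minimal sketch of the flattening
pass in Scheme, assuming INSNs are represented as s-expressions with
lowercase tags (adjust the accessors to your own representation):

    (define (flatten-seq insns)
      ;; Splice nested BLOCK contents into one flat instruction sequence.
      (apply append
             (map (lambda (i)
                    (if (and (pair? i) (eq? (car i) 'block))
                        (flatten-seq (cdr i))
                        (list (flatten-insn i))))
                  insns)))

    (define (flatten-insn i)
      ;; Recurse into compound forms; leave simple INSNs alone.
      (cond ((not (pair? i)) i)
            ((eq? (car i) 'block)
             (cons 'block (flatten-seq (cdr i))))
            ((eq? (car i) 'if)          ; (if var insn+ insn+)
             (list 'if (cadr i)
                   (flatten-insn (caddr i))
                   (flatten-insn (cadddr i))))
            ((eq? (car i) 'label)       ; (label (lab code) ...)
             (cons 'label
                   (map (lambda (binding)
                          (list (car binding)
                                (flatten-code (cadr binding))))
                        (cdr i))))
            (else i)))

    (define (flatten-code c)
      ;; (code (var ...) . body) and (proc (var ...) kvar n . body)
      (case (car c)
        ((code) `(code ,(cadr c) ,@(flatten-seq (cddr c))))
        ((proc) `(proc ,(cadr c) ,(caddr c) ,(cadddr c)
                       ,@(flatten-seq (cddddr c))))))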
Alternately, you can ensure your INSN-SEQ and INSN-SEQ+ sequences are always
flattened by writing "smart" constructors to flatten them as you create them
-- for example:

- A function (insns->block i1 ...) which makes a BLOCK form out of INSNs I1,
  I2, ..., splicing in the contents of any argument which is itself a BLOCK
  form.
- A function (insns+->block i1 ... i') which does the same with INSNs I1,
  I2, ... and INSN+ I'.
- A function (insn->seq i) which converts an INSN I into a sequence of INSNs:
  if I is a (BLOCK i1 ...), the answer is (list i1 ...); if I is not a BLOCK,
  the answer is (list i).
- A similar function (insn+->seq i), which does the same for an INSN+.

-------------------------------------------------------------------------------
* Looking back, looking forward
-------------------------------

Stop and reflect on where the program is, at this point. You're below the
level of C code. Your program is code, registers and blocks of memory. No
lambda.

What's left to do, to get something that's machine level? Here's the list:

1. Register allocation
   We're set up to do a coloring allocator, coalescing the var/var copies
   that happen at jumps to first-order lambda join points.

1a. Spilling
    We need some way of spilling, when our register allocator needs more
    physical registers than the machine has. There are two ways to do this.
    One, we can simply "create more registers," by setting aside a little
    bank of RAM as extra registers. (This block of statically managed memory
    will certainly reside in the on-chip L1 cache, so we'll get some wins
    here, even if they won't be as fast as the true registers.) Two, we can
    do the standard choose/edit/retry technique, where we choose spill
    candidates, force them out to a spill area on the stack frame, and
    rewrite all uses of the variable so that every use is preceded by a load
    and every write of the variable is followed by a store, giving the
    variable tiny live ranges that don't intersect other variable live
    ranges.

1b. Eliminating duplicate frame and closure loads
    Once a variable is saved onto the stack (because it is live across a
    call), it is reloaded from the stack each time it is needed, over and
    over again. We could eliminate these redundant, repeated loads if we had
    some form of simple "common subexpression elimination" optimisation that
    worked for frame and closure loads, so that

        (block ... (:= x1 (frame-ref 7))
               ... (:= x2 (frame-ref 7)) ...)

    is optimised by killing the X2 definition and replacing all references to
    X2 with X1. Note that this increases register pressure by extending the
    live range of X1, that is, by tying up whatever physical register we
    assign X1 across a larger span of code. So X1 may end up getting spilled
    as a result...

    This can also be done with loads from closure tuples. The frame-slot case
    is complicated slightly because we *reuse* frame slots -- when a slot
    goes dead, it will be reused. In our simple, non-circular tree-walk "flow
    analysis," an assignment to a frame slot "kills" any register copy we
    might be tracking for that slot. This is not a problem for tracking loads
    from closure tuples, as their contents are never altered after being
    allocated.

1c. Callee-saves
    If you look at your produced code, you'll notice that a function call is
    assumed to kill all registers. Any variable live across a function call
    is saved to the stack before the call. This is basically a caller-saves
    management strategy. We can do better if we manage some registers (four
    seems to be the traditional number) using a callee-saves strategy.
    Appel discusses how to do this in *Compiling with Continuations* and also
    in separate papers that he published. It amounts to a simple strategy:
    continuation closures are "spread" across six registers/arguments: the
    return-PC parameter, four others used for the first four elements of the
    closure, and SP, the implicitly passed stack register, which points to
    the rest of the closure. You would alter the PROC form to be

        (PROC (var ...) (ret-pc kv1 kv2 kv3 kv4) frame-size . insn-seq+)

    and the continuation call protocol would be altered to pass KV1 through
    KV4 as the first four arguments to the continuation code:

        (RETURN ret-pc kv1 kv2 kv3 kv4 arg1 ...)

    Now a leaf procedure, like the ADD5 function, looks like this:

        (proc (n) (ret-pc kv1 kv2 kv3 kv4) 0
          (:= n+5 ($+ n 5))
          (return ret-pc kv1 kv2 kv3 kv4 n+5))

    The KVi vars are passed from ADD5's caller through ADD5 and forward to
    the continuation code without ever going onto the stack.

2. Instruction selection for a specific processor
   This is pretty straightforward: the current IR is machine level, just not
   machine specific. You'll have to get specific about the details of your
   function-call protocol and stack management, which have been slightly
   abstracted in our current IR.

3. Contification
   If you write a tight loop in your source Scheme program, e.g., to compute
   factorial iteratively, your loop will be expressed by a LETREC-bound
   function that tail-calls itself. Your current compiler will render such an
   iteration as a tail call to a user procedure. So each time around the
   loop, the current frame will be popped and then re-pushed. It would be
   much better instead to demote the loop lambda from user procedure to
   continuation by arranging for the final return continuation not to be
   passed around and around the loop as a parameter. This is how you get
   continuation lambdas bound by LETRECs. MLton relies on this transform to
   get exactly the wins described above, for example.

   For example, here is iterative factorial before and after contification,
   in CPS:

       (fun (n ktop)
         (letrec ((lp (fun (a i k)
                        ($zero? i (cont (z?)
                          (if z?
                              (k a)
                              ($* a i (cont (a*i)
                                ($- i 1 (cont (i-1)
                                  (lp a*i i-1 k)))))))))))
           (lp 1 n ktop)))

       (fun (n ktop)
         (letrec ((lp (cont (a i)             ; No longer a FUN but a CONT,
                        ($zero? i (cont (z?)  ; so no frame pop & push
                          (if z?              ; on each iteration; we just
                              (ktop a)        ; run on the PROC's (empty)
                              ($* a i         ; frame.
                                 (cont (a*i)
                                   ($- i 1 (cont (i-1)
                                     (lp a*i i-1)))))))))))
           (lp 1 n)))

   You can use simple first-order syntactic criteria or do a full flow
   analysis to spot these opportunities. There's a paper by the MLton people
   explaining how they do it that you should read first.

4. Code linearisation
   Real machine code is a graph embedded into a linear sequence. In our
   current IR, that graph is embedded into a tree. In particular,
   machine-code conditionals either branch or fall through to the following
   instruction, while our current IR uses two-way IF forms. You need some way
   of linearising the code to use machine-code conditionals. Appel's textbook
   (chapter 8) shows a simple way to do this in the context of his Tiger
   compiler. LLVM tries harder to linearise code with better layouts, but
   it's the same problem.

5. Better IF
   Most processors have conditional branches that do more than test the
   true/false value of a register -- for example, on a RISC-V processor, you
   can compare two registers ra and rb and then branch if ra < rb, ra >= rb,
   ra <> rb, or a few other possibilities. We could produce better code if
   our CPS IF form directly reflected these possibilities.
   That is, we could extend IF to be something like

       (IF (rel arg1 arg2) prog prog)

   where REL is one of: < > <= >= <> =.

6 and up:

- You're compiling Scheme, not SML, so you need run-time arity checks on
  function calls. And you'd like to optimise away cases where you know the
  arity check is satisfied at compile time. You could extend closure tuples
  to store arity in the slot just before the code pointer. Or you could put
  arity information into the low 10 bits of the procedure pointer and shift
  them away. Or you could require CODE and PROC forms to have an initial
  header with arity information.

- Oh, and what about n-ary Scheme functions, where a call induces list
  consing? In real Scheme, this is written as

      (lambda (x1 x2 . more-xs) ...)

  This function takes at least two arguments, which are bound to X1 and X2.
  Arguments #3 and following are consed into a list which is bound to
  MORE-XS. This is the standard. It's clearly painfully inefficient. Perhaps
  you could extend the language to support more efficient n-ary call
  protocols? (Dybvig has published alternative mechanisms. Racket provides
  them.)

  Maybe you could provide programmers the ability to define extensions of
  associative operators to n-ary functions, with special beta-reduction
  rules, using a magic ASSOCIATIVE-NARY keyword, e.g.,

      (define + (associative-nary binary+ 0))
      (define * (associative-nary binary* 1))

  where the compiler will convert

      (+ x y z) => (binary+ (binary+ x y) z)
      (+ x y)   => (binary+ x y)
      (+ x)     => x
      (+)       => 0

  Oops, now you've drifted from writing a compiler into language design
  (influenced by your knowledge of how compilers work). This is how it
  starts. When you start publishing papers in ICFP, it will be time to admit
  you have a language problem and consider seeking a support group and
  possibly entering rehab.

- Your standard function-call protocol is to pass all arguments in
  registers. This is clearly impossible if you call a function with more
  parameters than the machine has registers (unless you do what Orbit did,
  and have an "extended register set" out in RAM for just this purpose).
  Change the protocol to pass the first four parameters in registers and the
  remaining ones on the stack.

- Flow analysis? Of course!

- Factor LETREC bindings into strongly connected components. You need to do
  this to prevent space leaks from shared closures.

- Web-based function-call protocol specialisation.

- You're missing out on the power of Scheme if you don't have a general,
  hygienic macro front end. Go figure out how to do that.

- Don't forget: real Scheme has call/cc... General continuations involve
  copying the entire stack -- very heavyweight. Perhaps you could do some
  analysis to spot the cheap, easy cases? (For example, should throwing out
  of the middle of a tight loop really require saving and restoring the
  *entire stack* to/from the heap? That's crazy talk.) Perhaps you could
  redesign your compilation strategies and run-time stack representations to
  make continuation capture incremental? See work by Dybvig for more.

- Your numeric-intensive codes would run a *lot* faster if you could figure
  out how to transform your loops to use MMX/SSE/AVX SIMD instructions on
  the processor. How hard could that be?

- And what about the garbage collector and tagged data representations?

- Or you could just punt lots of the above off to LLVM.

- Why are you compiling Scheme, anyway? Wouldn't you be happier with a
  polymorphic static type system?

- I could go on...