CS 6983 Special Topics: The Technology of Lambda Olin Shivers Homework 2: Getting started with CPS Due 2025/2/5 midnight by email This assignment has you go forth and work with the CPS conversion algorithm, both in a toy Scheme converter, and then with a more serious nanopass converter. * Read up on CPS ---------------- See the course bibliography for papers. --------------------------------------------------------------------------- * Scheme sexp grammar --------------------- Here is a grammar for basic Scheme. In the CFG, - keywords are in upper-case; - syntactic meta-variables are lower case; - " ..." means a sequence of zero or more s. exp ::= const | var | (IF exp exp exp) | (LETREC ((var lam) ...) exp) | (fun exp ...) fun ::= primop | exp lam ::= (LAMBDA (var ...) exp) Constants are #t, #f, numbers and strings. In our simple sexp concrete grammar, different symbols represent different things: keywords, primops, and variables. - keywords The symbols IF, LAMBDA, LETREC are used as keywords identifying different forms. These symbols may not be used for variable or primop names. - primops A primop is basically a named constant function known to the compiler. In our concrete sexp grammar, we'll use symbols beginning with a dollar sign for primops. - variables A variable is any symbol that is not a keyword or a primop. Note that we only permit primops in function-call position. (If you need a higher-order addition function, just eta-expand the primop: (lambda (x y) ($+ x y)). Here's recursive factorial, in our Scheme: (lambda (n) (letrec ((f (lambda (m) (if ($zero? m) 1 ($* m (f ($- m 1))))))) (f n))) or (letrec ((f (lambda (m) (if ($zero? m) 1 ($* m (f ($- m 1))))))) f) ** Alternate Scheme grammar designs =================================== There are a number of ways to skin the cat here. We are prioritising simple and easy to handle over syntactically "pretty." Here are some alternative ways we could have designed things. Our focus is not on these front-end syntactic design issues. - Do we really need primops in our source language? Not really. We could just have the compiler process the code assuming an initial, top-level environment binding, say, variable + to (lambda (x y) ($+ x y)) The programmer writes code using +, not $+, and the compiler will then replace all references to the variable with its value (lambda (x y) ($+ x y)). So this (f (+ q 3)) => (f ((lambda (x y) ($+ x y)) q 3)) => (f ($+ q 3)) So all first-order direct calls to + would get (easily) transformed to $+, while all remaining uses of + as an argument would be left as (lambda (x y) ($+ x y)). Or we could do something similar with our (forthcoming) CPS representation of the program -- it's still lambda calculus, we still get to do all the lambda things. I put first-order primops into the game early simply so we'd aware that they're around, and where they fit into the picture. - We could have used one namespace for primops and variables, and required the parser to use lexical scope to determine which are which. Then our compiler would simply begin the parse with an initial, top-level symbol table saying that top-level references to +, *, CONS, etc. are primops. - If we wanted general sexp constants, including lists, we could add a (QUOTE sexp) production to our grammar. (Feel free to do this, if you wish.) - We could have made the function-call form use a keyword, e.g., (call $+ 3 (call f 2 5)) instead of making it the "no keyword" default ($+ 3 (f 2 5)) This would grammatically separate the namespace of keywords and variables/primops, so we could, for example, have LAMBDA be both a keyword and a variable. It makes simple functional code tedious to write, however, so we aren't doing that. - Likewise, we could have a keyword form to mark primop calls, e.g. (call-prim * (f 8) (call-prim + 3 5)) and then have a distinct (and fixed) namespace for primops. None of these different approaches are particularly interesting from the compiler's point of view, so we aren't going to obsess this kind of syntactic-sugar fine tuning of the source language. --------------------------------------------------------------------------- * CPS sexp grammar ------------------ Here is a grammar for a "factored CPS" that includes an IF form instead of using a two-continuation conditional primop. This means all function calls take exactly one continuation, no more, no less, which simplifies things. (If we wanted to handle more ambitious control features, such as exceptions, we'd have to revisit this decision.) By "factored," we mean that the worlds of syntax and value are both split into "user" syntax (variables, calls, lambdas, primops) and values, and "continuation" syntax and values. We use the keyword FUN instead of LAMBDA (1) to make it obvious when we're looking at CPS instead of source Scheme, and (2) because it's only three characters instead of six, and we're going to have a lot of these. The keyword CONT is for writing down continuation functions. (We'll informally refer to LAMBDA, FUN and CONT terms as "lambdas.") - All user lambdas (that is, FUN terms) and calls take an arbitrary number (zero or more) of user arguments and exactly one continuation. - All continuation lambdas (that is, CONT terms) and calls take an arbitrary number (zero or more) of user arguments and no continuation. f ::= primop | uvar | lam ; Function position of call a ::= const | uvar | lam ; Arg position of call k ::= primcont | cvar | cont ; Continuation position of call cont ::= (CONT (uvar ...) prog) ; Continuation lambda lam ::= (FUN (uvar ... cvar) prog) ; User lambda prog ::= (f a ... k) ; User function call | (k a ...) ; Continuation call | (IF a prog prog) ; Conditional | (LETREC (lr-clause ...); Circular scope prog) lr-clause ::= (uvar lam) | (cvar cont) The LETREC form can bind both continuation and user functions. But now that you've seen LETREC's place in things, forget about it for a little while. Let's work first with a LETREC-free source and CPS grammar, and go back to LETREC later. So a function body (a PROG) is basically a tree of IF's terminating at function-call leaves, with LETREC's sprinkled throughout to introduce needed circular-environment function bindings. Racket symbols are used in the sexp concrete syntax for multiple things; as in our Scheme grammar, we partition these symbols into different sets for syntactic purposes: - Keywords The symbols CONT, FUN, IF, LETREC are keywords and can't be used for anything else. - Primops A symbol that begins with a dollar sign is a primop. The one continuation primop, '$halt, means "halt the program." - Continuation variables A symbol beginning with the letter "k" is a continuation variable. - User variables A symbol that is not a keyword, primop or continuation variable can be used as a user variable. ** Examples =========== Here is recursive factorial in this grammar: (fun (n ktop) (letrec ((f (fun (m k) ($zero? m (cont (z) (if z (cont () (k 1)) (cont () ($- m 1 (cont (m-1) (f m-1 (cont (a) ($* a m)))))))))))) (f n ktop))) Here is iterative factorial: (fun (n ktop) (letrec ((f (fun (m a k) ($zero? m (cont (z) (if z (cont () (k a)) (cont () ($- m 1 (cont (m-1) ($* a m (cont (a*m) (f m-1 a*m k)))))))))))) (f n 1 ktop))) Interesting: here is a variant of iterative factorial that reduces F to a continuation KF. It exploits the fact that in the F loop above, K is always bound to KTOP. So we just alter the (K A) call on line 5 to be (KTOP A) and stop passing K around the loop over and over: (fun (n ktop) (letrec ((kf (cont (m a) ; Look! No continuation parameter ($zero? m (cont (z) (if z (cont () (k a)) (cont () ($- m 1 (cont (m-1) ($* a m (cont (a*m) (kf m-1 a*m)))))))))))) (kf n 1))) Since FUN forms *must* take a continuation parameter, but CONT forms do *not*, we have to "demote" FUN F to CONT KF. How might we spot this opportunity and do the transform? We could do a flow analysis; we'll talk about this later in the semester. ** Design rationale =================== Partitioning the space of variable names into "user" and "continuation" names means that we can go keyword-free for *both* user-function and continuation-function calls: the F and K non-terminals are syntactically distinct, so the (f a ... k) ; user function call (k a ...) ; continuation call productions are therefore also distinct. ------------------------------------------------------------------------------- * A testing interpreter ----------------------- You have been provided an interpreter for the CPS language, written in Racket. It is intended for two purposes: - To help you in debugging CPS code. - To give you a simple "machine model" for CPS lambda calculus. Understand the interpreter and you have a semantics for the language. Study the interpreter. Then write the LETREC part of the code, which I have left out: function EVAL-PROG/LETREC. Hint: you will need to use side effects to either closure or environment structure to create the proper closure structure. You will see that the code as supplied marks the "env" field of both user and continuation closures to make this possible, but you can also leave closures immutable and instead mutate binding structure if you wish. --------------------------------------------------------------------------- * A first converter ------------------- Write a CPS converter that fixes the issues of the ones we did in class: - Function calls can take an arbitrary number of arguments. Hint: you'll need an auxiliary function CPS* that takes a *list* of expressions to convert. - We don't want code explosion from replicating large continuation terms down both sides of an IF. Write it in Scheme. Write some low-tech sexp term recognisers and "destructor" functions, as we did in class (lambda-exp?, lambda:vars, if:test, and so forth). Don't deal with LETREC; it will be easy to add after you've gotten this basic one written. --------------------------------------------------------------------------- * Extended converter -------------------- Add LET and LETREC to your Scheme grammar and LETREC to your CPS grammar. exp ::= ... | (LET ((var exp) ...) exp) Extend your CPS converter. By the way, note that your CPS converter will *never* produce a CPS LETREC that binds continuations; as described above, we'd need some static analysis to spot opportunities to transform CPS code to produce such bindings. (If you asked Sabry & Felleisen, they'd point out that the CPS converter will never produce *lots* of legal CPS programs from direct-style Scheme -- nor would you ever get these programs as intermediate forms as you "executed" the CPS program. So that represents some redundancy in the notation... unless you are doing fancy transformations.) * CPS in Nanopass ----------------- Define your two (extended) languages in nanopass. Implement your CPS converter in nanopass. * What's next ------------- Next, we're going to do a simple analysis (scope and free variables), and then a basic simplifier/optimiser that mostly does beta-reduction and other forms of constant folding.