CS 6983 Special Topics: The Technology of Lambda
Olin Shivers

Homework 2: Getting started with CPS
Due 2025/2/5 midnight by email

This assignment has you go forth and work with the CPS conversion
algorithm, both in a toy Scheme converter, and then with a more
serious nanopass converter.

* Read up on CPS
----------------
See the course bibliography for papers.

---------------------------------------------------------------------------
* Scheme sexp grammar
---------------------
Here is a grammar for basic Scheme. In the CFG,
- keywords are in upper-case;
- syntactic meta-variables are lower case;
- "<item> ..." means a sequence of zero or more <item>s.

exp ::= const | var
      | (IF exp exp exp)
      | (LETREC ((var lam) ...) exp)
      | (fun exp ...)

fun ::= primop | exp
lam ::= (LAMBDA (var ...) exp)

Constants are #t, #f, numbers and strings.

In our simple sexp concrete grammar, different symbols represent
different things: keywords, primops, and variables.

- keywords
  The symbols IF, LAMBDA, LETREC are used as keywords identifying
  different forms. These symbols may not be used for variable or
  primop names.

- primops
  A primop is basically a named constant function known to the
  compiler. In our concrete sexp grammar, we'll use symbols beginning
  with a dollar sign for primops.

- variables
  A variable is any symbol that is not a keyword or a primop.

Note that we only permit primops in function-call position. (If you
need a higher-order addition function, just eta-expand the primop:
(lambda (x y) ($+ x y)).

Here's recursive factorial, in our Scheme:

    (lambda (n)
      (letrec ((f (lambda (m)
                    (if ($zero? m) 1
                        ($* m (f ($- m 1)))))))
        (f n)))

or
  
    (letrec ((f (lambda (m)
                  (if ($zero? m) 1
                      ($* m (f ($- m 1)))))))
      f)

** Alternate Scheme grammar designs
===================================
There are a number of ways to skin the cat here. We are prioritising
simple and easy to handle over syntactically "pretty." Here are some
alternative ways we could have designed things. Our focus is not on
these front-end syntactic design issues.

- Do we really need primops in our source language? Not really.

  We could just have the compiler process the code assuming an
  initial, top-level environment binding, say, variable + to
    (lambda (x y) ($+ x y))
  The programmer writes code using +, not $+, and the compiler
  will then replace all references to the variable with its
  value (lambda (x y) ($+ x y)). So this
    (f (+ q 3))
    => (f ((lambda (x y) ($+ x y)) q 3))
    => (f ($+ q 3))
  So all first-order direct calls to + would get (easily) transformed 
  to $+, while all remaining uses of + as an argument would be left
  as (lambda (x y) ($+ x y)).

  Or we could do something similar with our (forthcoming) CPS
  representation of the program -- it's still lambda calculus,
  we still get to do all the lambda things.
  
  I put first-order primops into the game early simply so we'd
  aware that they're around, and where they fit into the picture.

- We could have used one namespace for primops and variables, and
  required the parser to use lexical scope to determine which are
  which. Then our compiler would simply begin the parse with an
  initial, top-level symbol table saying that top-level references
  to +, *, CONS, etc. are primops.

- If we wanted general sexp constants, including lists, we could add
  a (QUOTE sexp) production to our grammar. (Feel free to do this, if
  you wish.)

- We could have made the function-call form use a keyword, e.g.,
    (call $+ 3 (call f 2 5))
  instead of making it the "no keyword" default
    ($+ 3 (f 2 5))
  This would grammatically separate the namespace of keywords and
  variables/primops, so we could, for example, have LAMBDA be both
  a keyword and a variable. It makes simple functional code tedious
  to write, however, so we aren't doing that.

- Likewise, we could have a keyword form to mark primop calls, e.g.
    (call-prim *
               (f 8)
	       (call-prim + 3 5))
  and then have a distinct (and fixed) namespace for primops.

None of these different approaches are particularly interesting from
the compiler's point of view, so we aren't going to obsess this kind
of syntactic-sugar fine tuning of the source language.

---------------------------------------------------------------------------
* CPS sexp grammar
------------------
Here is a grammar for a "factored CPS" that includes an IF form
instead of using a two-continuation conditional primop. This means all
function calls take exactly one continuation, no more, no less, which
simplifies things. (If we wanted to handle more ambitious control
features, such as exceptions, we'd have to revisit this decision.)

By "factored," we mean that the worlds of syntax and value are both
split into "user" syntax (variables, calls, lambdas, primops) and
values, and "continuation" syntax and values.

We use the keyword FUN instead of LAMBDA (1) to make it obvious
when we're looking at CPS instead of source Scheme, and (2) because
it's only three characters instead of six, and we're going to have
a lot of these.

The keyword CONT is for writing down continuation functions. (We'll
informally refer to LAMBDA, FUN and CONT terms as "lambdas.")

- All user lambdas (that is, FUN terms) and calls take an arbitrary
  number (zero or more) of user arguments and exactly one continuation. 

- All continuation lambdas (that is, CONT terms) and calls take an
  arbitrary number (zero or more) of user arguments and no continuation.

f ::= primop   | uvar | lam	; Function position of call
a ::= const    | uvar | lam	; Arg position of call
k ::= primcont | cvar | cont	; Continuation position of call

cont ::= (CONT (uvar ...)      prog)	; Continuation lambda
lam  ::= (FUN  (uvar ... cvar) prog)	; User lambda

prog ::= (f a ... k)		; User function call
     |   (k a ...)		; Continuation call
     |   (IF a prog prog)	; Conditional
     |   (LETREC (lr-clause ...); Circular scope
	   prog)

lr-clause ::= (uvar lam)
	  |   (cvar cont)

The LETREC form can bind both continuation and user functions. But now
that you've seen LETREC's place in things, forget about it for a
little while. Let's work first with a LETREC-free source and CPS
grammar, and go back to LETREC later.

So a function body (a PROG) is basically a tree of IF's terminating
at function-call leaves, with LETREC's sprinkled throughout to
introduce needed circular-environment function bindings.

Racket symbols are used in the sexp concrete syntax for multiple
things; as in our Scheme grammar, we partition these symbols into
different sets for syntactic purposes:

- Keywords
  The symbols CONT, FUN, IF, LETREC are keywords and can't be
  used for anything else.

- Primops
  A symbol that begins with a dollar sign is a primop.
  The one continuation primop, '$halt, means "halt the program."

- Continuation variables
  A symbol beginning with the letter "k" is a continuation variable.

- User variables
  A symbol that is not a keyword, primop or continuation variable
  can be used as a user variable.

** Examples
===========
Here is recursive factorial in this grammar:

(fun (n ktop)
  (letrec ((f (fun (m k)
                ($zero? m (cont (z)
		            (if z
			        (cont () (k 1))
				(cont () ($- m 1
				             (cont (m-1)
					       (f m-1
					          (cont (a) ($* a m))))))))))))
    (f n ktop)))

Here is iterative factorial:

(fun (n ktop)
  (letrec ((f (fun (m a k)
                ($zero? m (cont (z)
		            (if z
			        (cont () (k a))
				(cont () ($- m 1
				             (cont (m-1)
					       ($* a m
                                                   (cont (a*m)
                                                     (f m-1 a*m k))))))))))))
    (f n 1 ktop)))

Interesting: here is a variant of iterative factorial that reduces F
to a continuation KF. It exploits the fact that in the F loop above, K
is always bound to KTOP. So we just alter the (K A) call on line 5 to
be (KTOP A) and stop passing K around the loop over and over:

(fun (n ktop)
  (letrec ((kf (cont (m a)	; Look! No continuation parameter
                 ($zero? m (cont (z)
		             (if z
			         (cont () (k a))
			 	 (cont () ($- m 1
				            (cont (m-1)
					      ($* a m
                                                  (cont (a*m)
                                                    (kf m-1 a*m))))))))))))
    (kf n 1)))

Since FUN forms *must* take a continuation parameter, but CONT forms
do *not*, we have to "demote" FUN F to CONT KF.

How might we spot this opportunity and do the transform? We could do
a flow analysis; we'll talk about this later in the semester.

** Design rationale
===================
Partitioning the space of variable names into "user" and
"continuation" names means that we can go keyword-free for
*both* user-function and continuation-function calls: the
F and K non-terminals are syntactically distinct, so the
   (f a ... k) ; user function call
   (k a ...)   ; continuation call
productions are therefore also distinct.

-------------------------------------------------------------------------------
* A testing interpreter
-----------------------

You have been provided an interpreter for the CPS language, written
in Racket. It is intended for two purposes:

- To help you in debugging CPS code.

- To give you a simple "machine model" for CPS lambda calculus.
  Understand the interpreter and you have a semantics for the language.

Study the interpreter. Then write the LETREC part of the code, which I
have left out: function EVAL-PROG/LETREC. Hint: you will need to use
side effects to either closure or environment structure to create the
proper closure structure. You will see that the code as supplied marks
the "env" field of both user and continuation closures to make this
possible, but you can also leave closures immutable and instead mutate
binding structure if you wish.

---------------------------------------------------------------------------
* A first converter
-------------------
Write a CPS converter that fixes the issues of the ones we did in class:

- Function calls can take an arbitrary number of arguments.
  Hint: you'll need an auxiliary function CPS* that takes a *list*
  of expressions to convert.

- We don't want code explosion from replicating large continuation terms
  down both sides of an IF.

Write it in Scheme. Write some low-tech sexp term recognisers and
"destructor" functions, as we did in class (lambda-exp?, lambda:vars,
if:test, and so forth).

Don't deal with LETREC; it will be easy to add after you've gotten
this basic one written.

---------------------------------------------------------------------------
* Extended converter
--------------------
Add LET and LETREC to your Scheme grammar and LETREC to your CPS grammar.

exp ::= ...
      | (LET ((var exp) ...) exp)

Extend your CPS converter.

By the way, note that your CPS converter will *never* produce a CPS
LETREC that binds continuations; as described above, we'd need some
static analysis to spot opportunities to transform CPS code to produce
such bindings.

(If you asked Sabry & Felleisen, they'd point out that the CPS
converter will never produce *lots* of legal CPS programs from
direct-style Scheme -- nor would you ever get these programs as
intermediate forms as you "executed" the CPS program. So that
represents some redundancy in the notation... unless you are doing
fancy transformations.)

* CPS in Nanopass
-----------------
Define your two (extended) languages in nanopass. Implement your
CPS converter in nanopass.

* What's next
-------------
Next, we're going to do a simple analysis (scope and free variables),
and then a basic simplifier/optimiser that mostly does beta-reduction
and other forms of constant folding.