CS 6983 Special Topics: The Technology of Lambda Olin Shivers Homework 3: Writing a lambda simplifier Due 2025/2/29 midnight by email * Part 1: Symbol table ---------------------- Implement a functional/persistent symbol -> number dictionary. You can implement a simple one using alists, or you can build a more efficient one using a balanced-tree (e.g., red-black tree) dictionary. Feel free to use existing dictionary implementations already built for Racket, if you like. This is not a sophomore-year class on data structures. A symbol->number dictionary has this interface: ;;; The empty symbol table (empty-symtab) ;;; A new symbol table just like SYMTAB, but symbol SYM maps to number I. (update-symtab symtab sym i) ;;; Returns the number to which SYM maps, or #f if no entry. (lookup-sym symtab sym) ;;; Note that this function doesn't provide any functionality you ;;; can't get from LOOKUP-SYMTAB, but we provide it so that you ;;; can write clearer code. (symtab-contains? symtab sym) ; Returns #f or some true value. ;;; Here are two useful utilities, written for you: ;;; Apply function F to the entry kept for SYM in the table. (define (modify-entry symtab sym f) (update-symtab symtab (f (lookup-sym symtab sym)))) ;;; Add DELTA to SYM's entry in the table. "No entry" is treated as 0. ;;; Returns the updated table. (define (incr-entry symtab sym delta) (modify-entry symtab sym (lambda (x) (+ (or x 0) delta)))) ------------------------------------------------------------------------------- * Part 2: Alphatisation ----------------------- Write a nanopass transform that assigns every variable in your program a unique name. Making unique names on demand is pretty easy. Here is code that will do it: (define *gensym-counter* 0) (define (reset-gensym-counter!) (set! *gensym-counter* 0)) (define (new-name base) (let ((name (string->symbol (string-append (symbol->string base) "." (number->string *gensym-counter*))))) (set! *gensym-counter* (add1 *gensym-counter*)) name)) But feel free to work a little more to make the names not too ugly -- e.g., it would be nice if your transform alphatised this (fun (x k) (letrec ((x (fun (y k) ($+ y 3 k)))) (x 8))) into this (fun (x.1 k.1) (letrec ((x.2 (fun (y.1 k.2) ($+ y.1 3 k.2)))) (x.2 8 k.1))) Note that the X's are numbered independently of the Y's and the K's. - You can use your symbol->number dictionary for this cosmetic dot-number trick. Or you can expend a bit more cleverness and arrange for your transform to be idempotent. That is, applying your transform to a term where the variables are *already* alpha-unique doesn't alter the term at all. This is all cosmetic stuff intended to help keep your transformed code as human-readable as possible, which is handy for a limited-effort, one-semester compiler. ------------------------------------------------------------------------------- * Part 3: Variable census ------------------------- Write a "census" analysis, which returns a dictionary mapping every variable in a program term with the number of times it is referenced. This should only be applied to a term that has been "alphatised," so that all its variables have unique names. Write an analysis that computes a list of the free variables in a term. (It's not particularly efficient, but you can make use of the LSET-... functions from the SRFI-1 list library for doing set operations on lists.) ------------------------------------------------------------------------------- * Part 4: Substitution ---------------------- Write a substitution function for lambda terms: (substitute term var new) returns the tree TERM except that every free occurence of VAR in TERM is replaced with NEW. You might think that you can exploit the alpha-uniqueness property of your terms, post-alphatisation, to simplify and speed up this function. But... it's not that simple. - Suppose TERM and NEW are both subtrees of a larger term, PROG -- that is, we're recurring down inside PROG, and have decided to do a substitution. If there is a free reference to VAR down in TERM, then when we are done, our result will have NEW where that reference occured. But NEW may still occur elsewhere in PROG. So now we have two copies of NEW in the new version of PROG we are recursively constructing. This would violate alpha-uniqueness! Every variable locally bound in NEW would now have multiple binders. In such a case, every time we drop a copy of NEW into place inside TERM, we have to sweep over NEW and make sure the copied version binds fresh local identifiers. - If we do a substitution of NEW into TERM and TERM contains *multiple* references to VAR, then, again, we'll have to "freshen" up each copy of NEW we drop into place -- each copy has to have distinct bound vars. To that end, you should write a FRESHEN function that walks a term like NEW and makes a copy whose bound vars are *globally* unique across the whole program. (So your FRESHEN function will need access to the global state you maintain for your make-a-fresh-identifier machinery in the alphatisation pass.) You might also consider writing a special-case variant of SUBSTITUTE, (substitute/once term var new) which can *only* be called if - TERM contains exactly *one* free reference to VAR, and - we know that NEW will only appear inside TERM when done. In other words, SUBSTITUTE/ONCE can be used to *move* NEW from one place in the program some other place. No freshening necessary. This can make one-ref beta reductions much simpler and faster. We have seen in class how to do all this efficiently (with the uplinked BUBS data structure), but you can do it simply and not worry about speed. ------------------------------------------------------------------------------- * Part 5: Simplify ------------------ Enough preliminaries. Now we are ready to do simplification. Write a set of nanopass transforms that will simplify a term in CPS form. Your simplifier should do the following: - Any beta redexes should be reduced if + the bound variable is referred to only 0 or 1 times, or + the redex's substituend (the "argument" part of the redex) is a variable or a constant. - Eta redexes should be reduced - Constant fold arithmetic primops - Constant fold IF expressions with constant tests. - Propagate information determined by IF tests. That is, (if a (if a 4 5) (f a 6)) should simplify to (if a (if #t 4 5) (f #f 6)) which a later pass will constant-fold to (if a 4 (f #f 6)) - Want to do more? Fee free... Your simplifier should set a mutable CHANGED? boolean any time it finds a hit; your top level simplifier should iterate all the simplification passes until no change. ------------------------------------------------------------------------------- * Part 6: Anchor pointing ------------------------- Consider this piece extra credit, as it is a little challenging. Arrange for your simplifier to do "anchor pointing" -- that is, the IF-of-an-IF transform. In non-cps, direct style, we want to transform (if (if e1 e2 e3) e4 e5) to (let ((thunk4 (lambda () e4)) (thunk5 (lambda () e5))) (if e1 (if e2 (thunk4) (thunk5)) (if e3 (thunk4) (thunk5)))) Doing this in CPS will take a little thought. Note that the anti-code-replication thunks THUNK4 & THUNK5 may be eligible for reduction if expressions E4 or E5 are small. ------------------------------------------------------------------------------- * Part 7: Demo -------------- Show off a little. Submit some tests showing the combined power of these simple optimisations on some code of your choosing. ------------------------------------------------------------------------------- * Suggestion ------------ You can test your simplifier on large terms using the CPS interpreter to run both the "before" and "after" programs.