\documentclass[letterpaper,10pt,oneside]{book}
\usepackage{fullpage}
\usepackage{scheme}
\usepackage[pdftitle="Nanopass Framework Users Guide",
pdfauthor="Andrew W. Keep",
pdfdisplaydoctitle]{hyperref}
\title{Nanopass Framework Users Guide\thanks{This documentation is largely
extracted from Chapter 2 of my dissertation~\cite{keep-phdthesis-2013}.
The user guide has been updated to reflect recent updates to the nanopass
framework.
Several example passes and languages have also been replaced with a more
recent, publicly available example compiler.}}
\author{Andrew W. Keep}
\def\TODO#1{{\textcolor{red}{#1}}}
\newcommand{\dash}[1][1em]{\raise.5ex\hbox to #1{\leaders\hrule\hfil}}
\mathchardef\mhyphen="2D
\parskip 6pt
\parindent 0pt
\begin{document}
\maketitle
\chapter{Introduction} % 2.1
The nanopass framework is an embedded DSL for writing compilers.
The framework provides two main syntactic forms: \scheme{define-language} and
\scheme{define-pass}.
The \scheme{define-language} form specifies the grammar of an intermediate
language.
The \scheme{define-pass} form specifies a pass that operates over an input
language and produces another, possibly different, output language.
\section{A Little Nanopass Framework History}
The idea of writing a compiler as a series of small, single-purpose passes
grew out of a course on compiler construction taught by Dan
Friedman in 1999 at Indiana University.
The following year, R. Kent Dybvig and Oscar Waddell joined Friedman
to refine the idea of the {\it micropass compiler} into a set of assignments
that could be used in a single semester to construct a compiler for a subset of
Scheme.
The micropass compiler uses an S-expression pattern matcher
developed by Friedman to simplify the matching and rebuilding of language terms.
Erik Hilsdale added support for
catamorphisms~\cite{Meijer:1991:FPB:645420.652535} that provides a more
succinct syntax for recurring
into sub-terms of the language, which further simplified pass development.
Passes in a micropass compiler are easy to understand, as each pass is
responsible for just one transformation.
The compiler is easier to debug when compared with a traditional compiler
composed of a few, multi-task passes.
The output from each pass can be inspected to ensure that it meets grammatical and
extra-grammatical constraints.
The output from each pass can also be evaluated in the host Scheme system to
check that it produces the same value as the initial expression.
This makes it easier to isolate broken passes and identify bugs.
The compiler is more flexible than a compiler composed of a few, multi-task passes.
New passes can easily be added between existing passes, which allows
experimentation with new optimizations.
In an academic setting, writing compilers composed of many, single-task passes
is useful for assigning extra compiler passes to
advanced students who take the course.
Micropass compilers are not without drawbacks.
First, efficiency can be a problem due to pattern-matching overhead and the
need to rebuild large S-expressions.
Second, passes often contain boilerplate code to recur through otherwise
unchanging language forms.
For instance, in a pass to remove one-armed \scheme{if} expressions, where only
the \scheme{if} form changes, other forms in the language must be
handled explicitly to locate embedded \scheme{if} expressions.
Third, the representation lacks formal structure.
The grammar of each intermediate language can be documented in comments, but
the structure is not enforced.
The \scheme{define-language} and \scheme{define-pass} syntactic forms are used
by the nanopass framework to address these problems.
A \scheme{define-language} form formally specifies the grammar of an
intermediate language.
A \scheme{define-pass} form defines a pass that operates on one language and
produces output in a possibly different language.
Formally specifying the grammar of an intermediate language and writing passes
based on these intermediate languages
allows the nanopass framework to use a record-based
representation of language terms that is more efficient than the S-expression
representation, autogenerate boilerplate code to recur
through otherwise unchanging language forms, and generate checks to verify that
the output of each pass adheres to the output-language grammar.
The summer after Dybvig, Waddell, and Friedman taught their course, Jordan
Johnson implemented an initial prototype of the nanopass framework to support
the construction of micropass compilers.
In 2004, Dipanwita Sarkar, Oscar Waddell, and R. Kent Dybvig developed a
more complete prototype nanopass framework for compiler construction and
submitted a paper on it to ICFP~\cite{Sarkar:2004:NIC:1016850.1016878}.
The initial paper focused on the nanopass framework as a tool capable of
developing both academic and commercial quality compilers.
The paper was accepted but on the condition that it be refocused only on academic
uses.
The reviewers were not convinced that the framework or nanopass construction method
was capable of supporting a commercial compiler.
In retrospect, the reviewers were right.
Sarkar implemented only a few of the passes from the compiler used in the
course on compilers.
This implementation showed that the nanopass framework was viable, but it did
not support the claim
that the nanopass framework could be used for a commercial compiler.
In fact, because the class compiler was started but never completed, it is
unclear whether the prototype was even up to the task of writing the full class
compiler.
The nanopass framework described in this guide improves on the prototype
developed by Sarkar.
In this framework, language definitions are no longer restricted to
top-level definitions.
Additionally, passes can accept more than one argument and return zero or
more values.
Passes can be defined that operate on a subset of a language instead of being
restricted to starting from the entry-point nonterminal of the language.
Passes can also autogenerate nonterminal transformers not supplied by the
compiler writer.
The new nanopass framework also defines two new syntactic forms,
\scheme{nanopass-case} and \scheme{with-output-language}, that allow language
terms to be matched and constructed outside the context of a pass.
\section{The Nanopass Framework Today}
% TODO: Update this line count to reflect the current size of
% the nanopass framework
Although the nanopass framework defines just two primary syntactic forms, the
macros that implement them are complex, with approximately 4600 lines of code.
In both the prototype and the new version of the nanopass framework, the
\scheme{define-language} macro parses a language definition and stores a
representation of it in the compile-time environment.
This representation can be used to guide the definition of derived languages
and the construction of passes.
Both also create a set of record types used to represent language terms at run
time, along with an unparser for translating the record representation to an
S-expression representation.
Finally, both create meta-parsers to parse S-expression patterns and templates.
An S-expression to record-form parser can also be created from the language
using \scheme{define-parser}.\footnote{In the prototype, this was part of
the functionality of \scheme{define-language}, but in a commercial compiler
we do not frequently need an S-expression parser, so we no longer
autogenerate one.}
The \scheme{define-pass} form, in both versions of the framework, operates
over an input-language term and produces an output-language term.
The input-language meta-parser generates code to match the specified pattern as
records, as well as a set of bindings for the variables named in the pattern.
The output-language meta-parser generates record constructors and
grammar-checking code.
Within a pass definition, a transformer is used to define a translation from an
input nonterminal to an output nonterminal.
Each transformer has a set of clauses that match an input-language term and
construct an output-language term.
The pattern matching also supports
catamorphisms~\cite{Meijer:1991:FPB:645420.652535} for recurring into language
sub-terms.
\section{Examples using the Nanopass Framework}
There are two publicly available examples of the nanopass framework.
The first is in the {\tt tests} sub-directory of the nanopass framework git
repository at
\href{https://github.com/akeep/nanopass-framework/}{github.com/akeep/nanopass-framework}.
This is part of a student compiler, originally included with the prototype
nanopass framework developed by Sarkar et al.\ and since revised to conform with the
changes made in the current nanopass framework.
The second example is available in the
\href{https://github.com/akeep/scheme-to-c/}{github.com/akeep/scheme-to-c}
repository.
This compiler is better documented and provides a complete compiler
example targeting fairly low-level C from a simplified Scheme dialect.
It was developed for a presentation at
\href{https://clojure-conj.org}{Clojure Conj 2013}, just
days before the conference started.
It is similar to the included example, but has the advantage of being a
complete end-to-end compiler that can be run from a Scheme REPL.
It uses {\tt gcc} as its back end and targets a 64-bit platform, but I hope it can
be modified to target other platforms without too much trouble, or even moved
off of C to target JavaScript, LLVM, or other back ends.
\section{Other Uses of the Nanopass Framework}
The nanopass framework was used to replace the original Chez Scheme
compiler~\cite{dybvig:csug9} with a nanopass version of the compiler.
The nanopass version has officially been released as Chez Scheme version 9.0.
Chez Scheme is a closed-source commercial compiler.
The nanopass framework is also being used as part of the
\href{https://github.com/eholk/harlan}{Harlan} compiler.
Harlan is a general-purpose language for developing programs that run on
the GPU.
Harlan uses an S-expression format that is compiled into C++ using OpenCL to
run computational kernels on the GPU.
The source code for Harlan is publicly available at
\href{https://github.com/eholk/harlan}{github.com/eholk/harlan}.
\chapter{Defining Languages and Passes} % old 2.4, new 2.3
The nanopass framework builds on the prototype, originally developed by
Sarkar et al.
The examples in this section are pulled from the Scheme to C compiler available
at \href{https://github.com/akeep/scheme-to-c}{github.com/akeep/scheme-to-c}.
\section{Defining languages}
The nanopass framework operates over a set of compiler-writer-defined
languages.
Languages defined in this way are similar to context-free grammars, in that
they are composed of a set of terminals, a set of nonterminal symbols, a set of
productions for each nonterminal, and a start symbol from the set of
nonterminal symbols.
We refer to the start symbol as the entry nonterminal of the language.
An intermediate language definition for a simple variant of the Scheme
programming language, post macro expansion, might look like:
{\small
\schemedisplay
(define-language Lsrc
  (terminals
    (symbol (x))
    (primitive (pr))
    (constant (c))
    (datum (d)))
  (Expr (e body)
    pr
    x
    c
    (quote d)
    (if e0 e1)
    (if e0 e1 e2)
    (or e* ...)
    (and e* ...)
    (not e)
    (begin e* ... e)
    (lambda (x* ...) body* ... body)
    (let ([x* e*] ...) body* ... body)
    (letrec ([x* e*] ...) body* ... body)
    (set! x e)
    (e e* ...)))
\endschemedisplay
}
\noindent
The \scheme{Lsrc} language defines a subset of Scheme suitable for our
example compiler.
It is the output language of a more general ``parser'' that
parses S-expressions into \scheme{Lsrc} language forms.
The \scheme{Lsrc} language consists of a set of terminals (listed in the
\scheme{terminals} form) and a single nonterminal \scheme{Expr}.
The terminals of the language are
\begin{itemize}
\item \scheme{symbol} (for variables),
\item \scheme{primitive} (for the subset of Scheme primitives supported
by this language),
\item \scheme{constant} (for the subset of Scheme constants), and
\item \scheme{datum} (for the subset of Scheme data supported by this language).
\end{itemize}
The compiler writer must supply a predicate corresponding to each terminal,
lexically visible where the language is defined.
The nanopass framework derives the predicate name from the terminal name by
adding a \scheme{?} to the terminal name.
In this case, the nanopass framework expects \scheme{symbol?},
\scheme{primitive?}, \scheme{constant?}, and \scheme{datum?} to be
lexically visible where \scheme{Lsrc} is defined.
Each terminal clause lists one or more meta-variables, used to refer to the
terminal in nonterminal productions.
Here, \scheme{x} refers to a \scheme{symbol}, \scheme{pr} refers to
a \scheme{primitive}, \scheme{c} refers to a \scheme{constant},
and \scheme{d} refers to a \scheme{datum}.
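The naming rule can be illustrated with a few lines of Scheme.
The helper \scheme{terminal-predicate-name} below is hypothetical, written
only for this guide; it is not part of the framework, which applies the rule
itself when it processes a \scheme{terminals} clause.
{\small
\schemedisplay
(import (rnrs))

;; Hypothetical helper: the predicate name expected for a terminal
;; is the terminal name with ? appended.
(define terminal-predicate-name
  (lambda (terminal-name)
    (string->symbol
      (string-append (symbol->string terminal-name) "?"))))

;; Apply the rule to the four terminals of Lsrc.
(define lsrc-predicate-names
  (map terminal-predicate-name '(symbol primitive constant datum)))
\endschemedisplay
}
\noindent
Applied to the terminals of \scheme{Lsrc}, this yields
\scheme{(symbol? primitive? constant? datum?)}, the names that must be
lexically visible where \scheme{Lsrc} is defined.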
For our example compiler, the host Scheme system's \scheme{symbol?} is used
to determine when an item is a variable.
The example compiler also selects a subset of primitives from Scheme and
represents these primitives as symbols.
A \scheme{primitive?} predicate like the following can be used to specify
this terminal.\footnote{In the example compiler, the primitives are specified
in separate association lists that capture the arity of each primitive and
where in the compiler each primitive is handled as it moves through the
compilation process.
This complexity has been eliminated for the discussion here.
Please refer to the source code for a more complete discussion of
primitive handling in the example compiler.}
{\small
\schemedisplay
(define primitive?
  (lambda (x)
    (memq x
      '(cons make-vector box car cdr vector-ref vector-length unbox
        + - * / pair? null? boolean? vector? box? = < <= > >= eq?
        vector-set! set-box!))))
\endschemedisplay
}
\noindent
Our example compiler also limits the constants that can be expressed to a subset of those allowed by Scheme.
The \scheme{constant?} predicate limits these to booleans (\scheme{#t} and
\scheme{#f}), null (\scheme{()}), and appropriately sized integers
(between $-2^{60}$ and $2^{60} - 1$).
{\small
\schemedisplay
(define target-fixnum?
  (lambda (x)
    (and (integer? x) (exact? x)
         (<= (- (expt 2 60)) x (- (expt 2 60) 1)))))
(define constant?
  (lambda (x)
    (or (target-fixnum? x) (boolean? x) (null? x))))
\endschemedisplay
}
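Concretely, $2^{60} = 1152921504606846976$, so \scheme{target-fixnum?}
accepts exact integers from $-1152921504606846976$ through
$1152921504606846975$.
The following sketch restates the predicate so that it runs on its own and
checks the values at, and just beyond, each boundary; \scheme{boundary-results}
is an illustrative name, not part of the example compiler.
{\small
\schemedisplay
(import (rnrs))

(define target-fixnum?
  (lambda (x)
    (and (integer? x) (exact? x)
         (<= (- (expt 2 60)) x (- (expt 2 60) 1)))))

;; Lowest accepted value, one below it, highest accepted value,
;; one above it, and an inexact integer.
(define boundary-results
  (map target-fixnum?
       (list (- (expt 2 60))
             (- (- (expt 2 60)) 1)
             (- (expt 2 60) 1)
             (expt 2 60)
             1.0)))
\endschemedisplay
}
\noindent
Here \scheme{boundary-results} is \scheme{(#t #f #t #f #f)}; the last entry is
\scheme{#f} because \scheme{1.0} is an integer but not an exact one.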
\noindent
The example compiler limits the Scheme data that can be represented to
constants, pairs, vectors, and boxes.
The \scheme{datum?} predicate can be defined as follows:
{\small
\schemedisplay
(define datum?
  (lambda (x)
    (or (constant? x)
        (and (box? x) (datum? (unbox x)))
        (and (pair? x) (datum? (car x)) (datum? (cdr x)))
        (and (vector? x)
             (let loop ([i (vector-length x)])
               (or (fx=? i 0)
                   (let ([i (fx- i 1)])
                     (and (datum? (vector-ref x i))
                          (loop i)))))))))
\endschemedisplay
}
\noindent
The \scheme{Lsrc} language also defines the nonterminal \scheme{Expr}.
Nonterminals start with a name, followed by a list of meta-variables and a set
of grammar productions.
In this case, the name is \scheme{Expr}, and two meta-variables, \scheme{e} and
\scheme{body}, are specified.
Just like the meta-variables named in the terminals clause, nonterminal
meta-variables are used to represent the nonterminal in nonterminal
productions.
Each production takes one of three forms:
a single meta-variable, an S-expression that starts with a
keyword, or an S-expression that does not start with a keyword (referred to as an
\emph{implicit} production).
The S-expression forms cannot include keywords past the initial
keyword.
In \scheme{Lsrc}, the \scheme{x}, \scheme{c}, and \scheme{pr} productions are
the single meta-variable productions and indicate that a stand-alone
\scheme{symbol}, \scheme{constant}, or \scheme{primitive} is a valid
\scheme{Expr}.
The only implicit S-expression production is the \scheme{(e e* ...)}
production, and it indicates a call that takes zero or more
\scheme{Expr}s as arguments.
(The \scheme{*} suffix on \scheme{e} is used by convention to indicate
plurality and does not have any semantic meaning: It is the \scheme{...} that
indicates that the field can take zero or more \scheme{Expr}s.)
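Because keywords never appear past the first position, the kind of a
production can be determined from its shape alone, once it is known which
symbols are meta-variable references.
The classifier below is a hypothetical sketch, not the framework's
meta-parser; the caller-supplied \scheme{meta-var?} predicate stands in for
the framework's knowledge of a language's meta-variables, and well-formed
productions are assumed.
{\small
\schemedisplay
(import (rnrs))

;; Hypothetical classifier for the three kinds of production.
;; meta-var? decides whether a symbol is a meta-variable reference.
(define production-kind
  (lambda (production meta-var?)
    (cond
      ;; a stand-alone meta-variable, such as x, c, or pr
      [(symbol? production) 'meta-variable]
      ;; an S-expression headed by a keyword, such as (if e0 e1 e2)
      [(and (pair? production)
            (symbol? (car production))
            (not (meta-var? (car production))))
       'keyword]
      ;; anything else is implicit, such as the call (e e* ...)
      [else 'implicit])))

;; Meta-variable references that appear in the sample productions.
(define lsrc-meta-var?
  (lambda (s)
    (and (memq s '(x pr c d e e0 e1 e2 e* body body*)) #t)))

(define sample-kinds
  (map (lambda (p) (production-kind p lsrc-meta-var?))
       '(x (quote d) (if e0 e1 e2) (e e* ...))))
\endschemedisplay
}
\noindent
With this sample predicate, \scheme{sample-kinds} evaluates to
\scheme{(meta-variable keyword keyword implicit)}.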
The rest of the productions are S-expression productions with keywords that
correspond to the Scheme syntax that they represent.
In addition to the star, \scheme{*}, suffix mentioned earlier in the call
productions, meta-variable references can also use a
numeric suffix (as in the productions for \scheme{if}), a question mark (\scheme{?}), or a caret (\scheme{^}).
The \scheme{?} suffix is intended for use with \scheme{maybe} meta-variables,
and the \scheme{^} is used when expressing meta-variables with a more
mathematical syntax than the numeric suffixes provide.
Suffixes can also be used in combination.
References to meta-variables in a production must be unique, and the suffixes
allow the same root name to be used more than once.
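Following the formal syntax given later in this chapter, in which any digits
come before the \scheme{*}, \scheme{?}, and \scheme{^} characters, the root
name of a reference can be recovered by stripping suffix characters from the
right.
\scheme{strip-trailing} and \scheme{meta-var-root} are hypothetical helpers
that illustrate the rule; they are not framework procedures.
{\small
\schemedisplay
(import (rnrs))

;; Hypothetical helper: drop the trailing run of characters in
;; string s that satisfy pred.
(define strip-trailing
  (lambda (pred s)
    (let loop ([n (string-length s)])
      (if (and (> n 0) (pred (string-ref s (- n 1))))
          (loop (- n 1))
          (substring s 0 n)))))

;; Hypothetical helper: remove trailing *, ?, and ^ characters, then
;; the digits that precede them, so that e4*? yields e.
(define meta-var-root
  (lambda (ref)
    (let* ([s (symbol->string ref)]
           [s (strip-trailing (lambda (c) (memv c (string->list "*?^"))) s)]
           [s (strip-trailing char-numeric? s)])
      (string->symbol s))))
\endschemedisplay
}
\noindent
For example, \scheme{e1}, \scheme{e*}, and \scheme{e4*?} all have the root
\scheme{e}, so each can appear in the same production while still referring
to \scheme{Expr}.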
Language definitions can also include more than one nonterminal, as the
following language illustrates:
{\small
\schemedisplay
(define-language L8
  (terminals
    (symbol (x a))
    (constant (c))
    (void+primitive (pr)))
  (entry Expr)
  (Expr (e body)
    x
    le
    (quote c)
    (if e0 e1 e2)
    (begin e* ... e)
    (set! x e)
    (let ([x* e*] ...) abody)
    (letrec ([x* le*] ...) body)
    (primcall pr e* ...)
    (e e* ...))
  (AssignedBody (abody)
    (assigned (a* ...) body) => body)
  (LambdaExpr (le)
    (lambda (x* ...) abody)))
\endschemedisplay
}
\noindent
This language has three nonterminals, \scheme{Expr}, \scheme{AssignedBody},
and \scheme{LambdaExpr}.
When more than one nonterminal is specified, one must be selected as the entry
point.
In language \scheme{L8}, the \scheme{Expr} nonterminal is selected as the entry
nonterminal by the \scheme{(entry Expr)} clause.
When the entry clause is not specified, the first nonterminal listed is
implicitly selected as the entry point.
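The rule for choosing the entry nonterminal can be expressed as a small
function over the clauses of a language definition, treated as plain
S-expressions.
\scheme{language-entry} is a hypothetical helper written for this guide, not a
framework procedure; it assumes that every clause other than \scheme{extends},
\scheme{entry}, and \scheme{terminals} is a nonterminal clause.
{\small
\schemedisplay
(import (rnrs))

;; Hypothetical helper: find the entry nonterminal, given the clauses
;; of a define-language form as a list of S-expressions.
(define language-entry
  (lambda (clauses)
    (let ([entry (find (lambda (c) (eq? (car c) 'entry)) clauses)])
      (if entry
          ;; an explicit (entry Nonterminal) clause takes precedence
          (cadr entry)
          ;; otherwise, the first nonterminal clause is the entry point
          (let ([nts (filter
                       (lambda (c)
                         (not (memq (car c) '(extends entry terminals))))
                       clauses)])
            (and (pair? nts) (caar nts)))))))

(define clauses-with-entry
  '((terminals (symbol (x a)))
    (entry Expr)
    (AssignedBody (abody) (assigned (a* ...) body))
    (Expr (e body) x (if e0 e1 e2))))

;; The same clauses without the entry clause.
(define clauses-without-entry
  (remp (lambda (c) (eq? (car c) 'entry)) clauses-with-entry))
\endschemedisplay
}
\noindent
With the \scheme{entry} clause the result is \scheme{Expr}; without it, the
first nonterminal listed, \scheme{AssignedBody}, is selected.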
The \scheme{L8} language uses a single terminal meta-variable production,
\scheme{x},
to indicate that a stand-alone \scheme{symbol} is a valid \scheme{Expr}.
In addition, the \scheme{L8} language uses a single nonterminal meta-variable
production, \scheme{le}, to indicate that any \scheme{LambdaExpr} production is
also a valid \scheme{Expr}.
The \scheme{LambdaExpr} is separated from \scheme{Expr} because the
\scheme{letrec} production is now limited to binding \scheme{symbol}s to
\scheme{LambdaExpr}s.
The \scheme{assigned} production of the \scheme{AssignedBody} nonterminal
uses the \scheme{=>} syntax to indicate a pretty unparsing form.
This allows the unparser that is automatically produced by
\scheme{define-language} to generate an S-expression that can be evaluated in
the host Scheme system.
In this case, the \scheme{assigned} form is not a valid Scheme form, so we
simply eliminate the \scheme{assigned} wrapper and list of assigned variables
when unparsing.\footnote{Unparsers can also produce the non-pretty form when
passed both the language form to be unparsed and \scheme{#f} to indicate that
the pretty form should not be used.}
In addition to providing syntax for specifying list structures in a language
production, the nanopass framework makes it possible to indicate that a field
of a language production might not contain a (useful) value.
The following language has an example of this:
{\small
\schemedisplay
(define-language Lopt
  (terminals
    (uvar (x))
    (label (l))
    (constant (c))
    (primitive (pr)))
  (Expr (e body)
    x
    (quote c)
    (begin e* ... e)
    (lambda (x* ...) body)
    (let ([x* e*] ...) body)
    (letrec ([x* le*] ...) body)
    (pr e* ...)
    (call (maybe l) (maybe e) e* ...))
  (LambdaExpr (le)
    (lambda (x* ...) body)))
\endschemedisplay
}
\noindent
The \scheme{(maybe l)} field indicates that either a label, \scheme{l}, or
\scheme{#f} will be provided.
Here, \scheme{#f} is a stand-in for bottom, indicating that the value is not
specified.
The \scheme{(maybe e)} field indicates that either an \scheme{Expr} or
\scheme{#f} will be provided.
Instead of using \scheme{(maybe l)} to indicate a label that might be provided,
a \scheme{maybe-label} terminal that serves the same purpose could be added.
It is also possible to eliminate the \scheme{(maybe e)} form, although it
requires the creation of a separate nonterminal that has both an \scheme{e}
production and a production to represent $\bot$, when no \scheme{Expr} is
available.
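The check a \scheme{maybe} field implies can be pictured as a predicate
combinator that accepts either \scheme{#f} or a value satisfying the
underlying predicate.
\scheme{maybe-of} and the stand-in \scheme{label?} below are hypothetical
names used only for this sketch, which assumes labels are represented as
symbols.
{\small
\schemedisplay
(import (rnrs))

;; Hypothetical combinator: build a predicate that accepts #f, the
;; stand-in for bottom, or any value satisfying pred.
(define maybe-of
  (lambda (pred)
    (lambda (x)
      (or (not x) (and (pred x) #t)))))

;; Stand-in predicate, assuming labels are represented as symbols.
(define label? symbol?)

(define maybe-label? (maybe-of label?))
\endschemedisplay
}
\noindent
Because \scheme{#f} is reserved for bottom, a \scheme{maybe} field over a
terminal whose predicate also accepts \scheme{#f}, such as a \scheme{constant}
that includes booleans, could not distinguish a missing value from the
constant \scheme{#f}.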
\section{Extending languages\label{subsec:extended-define-language}}
The first ``pass'' of the example compiler is a simple expander that produces
\scheme{Lsrc} language forms from S-expressions.
The next pass takes the \scheme{Lsrc} language and removes the one-armed-if
expressions, replacing them with a two-armed-if that produces the void value
when the test clause is false.
The output grammar of this pass differs from \scheme{Lsrc} in only two places:
the one-armed \scheme{if} production is removed, and the \scheme{primitive}
terminal is replaced by a \scheme{void+primitive} terminal which, as its name
suggests, also admits the \scheme{void} primitive used to produce the void value.
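To make the transformation concrete, and to show the boilerplate the nanopass
framework removes, the following sketch performs the same rewrite by hand on
S-expressions rather than with \scheme{define-pass}.
It is a hypothetical, simplified illustration: it assumes well-formed
\scheme{Lsrc} input in which \scheme{if} and \scheme{quote} are never rebound
as variables, and it represents the void value as the call \scheme{(void)}.
{\small
\schemedisplay
(import (rnrs))

;; Hypothetical S-expression version of the pass that removes
;; one-armed if expressions.
(define remove-one-armed-if
  (lambda (e)
    (cond
      ;; variables, primitives, and constants are unchanged
      [(not (pair? e)) e]
      ;; quoted data are not code, so leave them untouched
      [(eq? (car e) 'quote) e]
      ;; the one production that changes: (if e0 e1)
      [(and (eq? (car e) 'if) (= (length e) 3))
       (list 'if
             (remove-one-armed-if (cadr e))
             (remove-one-armed-if (caddr e))
             '(void))]
      ;; boilerplate: rebuild any other form, recurring into its parts
      [else (map remove-one-armed-if e)])))
\endschemedisplay
}
\noindent
Only the \scheme{if} clause expresses the actual transformation; the
\scheme{else} clause is the kind of recursion through otherwise unchanging
forms that the nanopass framework generates automatically.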
The compiler writer could specify the new language by copying the
\scheme{Lsrc} language and making the appropriate changes to its terminals and productions.
Rewriting each language in its full form, however, can result in verbose
source code, particularly in a compiler like the class compiler, which has
nearly 30 different intermediate languages.
Instead, the nanopass framework supports a language extension form.
The output language can be specified as follows:
{\small
\schemedisplay
(define-language L1
  (extends Lsrc)
  (terminals
    (- (primitive (pr)))
    (+ (void+primitive (pr))))
  (Expr (e body)
    (- (if e0 e1))))
\endschemedisplay
}
\noindent
The \scheme{L1} language removes the \scheme{primitive} terminal and replaces it
with the \scheme{void+primitive} terminal.
It also removes the \scheme{(if e0 e1)} production.
A language extension form is indicated by including the \scheme{extends}
clause, in this case \scheme{(extends Lsrc)}, that indicates that this is
an extension of the given base language.
In a language extension, the \scheme{terminals} form now contains
subtraction clauses, in
this case \scheme{(- (primitive (pr)))}, and addition clauses, in this case
\scheme{(+ (void+primitive (pr)))}.
These addition and subtraction clauses can contain one or more terminal
specifiers.
The nonterminal syntax is similarly modified: a subtraction clause, in
this case \scheme{(- (if e0 e1))}, indicates productions to be removed,
and an addition clause indicates productions to be added.
No productions are added in this case, so \scheme{L1} omits the addition clause.
The list of meta-variables indicated for the nonterminal form is also updated
to use the set in the extension language.
It is important to include not only the meta-variables named in the language
extension but also those for terminal and nonterminal forms that will be
maintained from the base language.
Otherwise, these meta-variables will be unbound in the extension language,
leading to errors.
Nonterminals can be removed in an extended language by removing all of the
productions of the nonterminal.
New nonterminals can be added in an extended language by adding the
productions of the new nonterminal.
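For instance, language \scheme{L15} removes the \scheme{x}, \scheme{(quote c)},
and \scheme{(label l)} productions from the \scheme{Expr} nonterminal, replacing
them with a single \scheme{se} production, and adds the \scheme{SimpleExpr}
nonterminal to hold them.
It also replaces the \scheme{primcall} and call productions with versions
whose subforms are \scheme{SimpleExpr}s.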
{\small
\schemedisplay
(define-language L15
  (extends L14)
  (Expr (e body)
    (- x
       (quote c)
       (label l)
       (primcall pr e* ...)
       (e e* ...))
    (+ se
       (primcall pr se* ...) => (pr se* ...)
       (se se* ...)))
  (SimpleExpr (se)
    (+ x
       (label l)
       (quote c))))
\endschemedisplay
}
\subsection{The {\tt define-language} form}
The \scheme{define-language} syntax has two related forms.
The first form fully specifies a new language.
The second form uses the \scheme{extends} clause to indicate that the language
is an extension of an existing base language.
Both forms of \scheme{define-language} start with the same basic syntax:
{\small
\schemedisplay
(define-language \var{language-name} \var{clause} ...)
\endschemedisplay
}
\noindent
where \var{clause} is an extension clause, an \scheme{entry} clause, a
\scheme{terminals} clause, or a nonterminal clause.
\noindent
\textbf{Extension clause.}
The extension clause indicates that the new language is an extension of an existing
language.
This clause slightly changes the syntax of the \scheme{define-language} form
and is described in Section~\ref{subsec:extended-define-language}.
\noindent
\textbf{Entry clause.}
The entry clause specifies which nonterminal is the starting point for this
language.
This information is used when generating passes to determine which nonterminal
should be expected first by the pass.
This default can be overridden in a pass definition, as described in
Section~\ref{sec:pass-syntax}.
The entry clause has the following form:
{\small
\schemedisplay
(entry \var{nonterminal-name})
\endschemedisplay
}
\noindent
where \var{nonterminal-name} corresponds to one of the nonterminals specified
in this language.
Only one entry clause can be specified in a language definition.
\noindent
\textbf{Terminals clause.}
The terminals clause specifies one or more terminals used by the language.
For instance, in the \scheme{Lsrc} example language, the terminals clause
specifies four terminal types: \scheme{symbol}, \scheme{primitive},
\scheme{constant}, and \scheme{datum}.
The terminals clause has the following form:
{\small
\schemedisplay
(terminals \var{terminal-clause} ...)
\endschemedisplay
}
\noindent
where \var{terminal-clause} has one of the following forms:
{\small
\schemedisplay
(\var{terminal-name} (\var{meta-var} ...))
(=> (\var{terminal-name} (\var{meta-var} ...)) \var{prettifier})
(\var{terminal-name} (\var{meta-var} ...)) => \var{prettifier}
\endschemedisplay
}
Here,
\partopsep=-\parskip
\begin{itemize}
\item \var{terminal-name} is the name of the terminal, and a corresponding
\scheme{\var{terminal-name}?} predicate function exists to determine whether a
Scheme object is of this type when checking the output of a pass,
\item \var{meta-var} is the name of a meta-variable used for referring to this
terminal type in language and pass definitions, and
\item \var{prettifier} is a procedure expression of one argument used
when the language unparser is called in ``pretty'' mode to produce
a pretty, S-expression representation.
\end{itemize}
The final form is syntactic sugar for the form above it.
When the \var{prettifier} is omitted, no processing is done on the terminal
when the unparser runs.
\noindent
\textbf{Nonterminal clause.}
A nonterminal clause specifies the valid productions in a language.
Each nonterminal clause has a name, a set of meta-variables, and a set of
productions.
A nonterminal clause has the following form:
{\small
\schemedisplay
(\var{nonterminal-name} (\var{meta-var} ...)
  \var{production-clause}
  ...)
\endschemedisplay
}
\noindent
where \var{nonterminal-name} is an identifier that names the nonterminal,
\var{meta-var} is the name of a meta-variable used when referring to this
nonterminal in language and pass definitions, and \var{production-clause}
has one of the following forms:
{\small
\schemedisplay
\var{terminal-meta-var}
\var{nonterminal-meta-var}
\var{production-s-expression}
(\var{keyword} . \var{production-s-expression})
\endschemedisplay
}
\noindent
Here,
\begin{itemize}
\item \var{terminal-meta-var} is a terminal meta-variable that is a stand-alone
production for this nonterminal,
\item \var{nonterminal-meta-var} is a nonterminal meta-variable that
indicates that any form allowed by the specified nonterminal is also allowed by
this nonterminal,
\item \var{keyword} is an identifier that must be matched exactly when parsing
an S-expression representation, language input pattern, or language output
template, and
\item \var{production-s-expression} is an S-expression that represents a
pattern for production and has the following form:
\end{itemize}
{\small
\schemedisplay
\var{meta-variable}
(maybe \var{meta-variable})
(\var{production-s-expression} \var{ellipsis})
(\var{production-s-expression} \var{ellipsis} \var{production-s-expression} ... . \var{production-s-expression})
(\var{production-s-expression} . \var{production-s-expression})
()
\endschemedisplay
}
\noindent
Here,
\begin{itemize}
\item \var{meta-variable} is any terminal or nonterminal meta-variable
extended with an arbitrary number of digits, followed by an arbitrary
combination of \scheme{*}, \scheme{?}, or \scheme{^} characters; for example,
if the meta-variable is \scheme{e}, then \scheme{e1}, \scheme{e*}, \scheme{e?},
and \scheme{e4*?} are all valid meta-variable expressions;
\item \scheme{(maybe \var{meta-variable})} indicates that an element in the
production is either of the type of the meta-variable or bottom (represented by
\scheme{#f}); and
\item \var{ellipsis} is the literal \scheme{...} and indicates that a list of
the \var{production-s-expression} that precedes it is expected.
\end{itemize}
Thus, a Scheme language form such as \scheme{let} can be represented as a
language production as:
{\small
\schemedisplay
(let ([x* e*] ...) body* ... body)
\endschemedisplay
}
\noindent
where \scheme{let} is the \var{keyword}, \scheme{x*} is a meta-variable that
indicates a list of variables, \scheme{e*} and \scheme{body*} are
meta-variables that each indicate a list of expressions, and \scheme{body} is a
meta-variable that indicates a single expression.
Using the \scheme{maybe} form, something similar to the named-let form could
be represented as follows:
{\small
\schemedisplay
(let (maybe x) ([x* e*] ...) body* ... body)
\endschemedisplay
}
\noindent
although this would be slightly different from the normal named-let form, in that
the non-named form would then need an explicit \scheme{#f} to indicate that no name
was specified.
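The need for an explicit \scheme{#f} shows up when ordinary Scheme
\scheme{let} forms are converted into the shape of this hypothetical
production.
\scheme{let->maybe-let} is an illustrative helper, not part of the framework,
and it assumes its input is a well-formed named or unnamed \scheme{let}.
{\small
\schemedisplay
(import (rnrs))

;; Hypothetical: rewrite a let or named let into the shape of the
;; production (let (maybe x) ([x* e*] ...) body* ... body).
(define let->maybe-let
  (lambda (form)
    (if (symbol? (cadr form))
        ;; named let: the name already fills the (maybe x) position
        form
        ;; unnamed let: insert #f to mark the missing name
        (cons 'let (cons #f (cdr form))))))
\endschemedisplay
}
\noindent
A named \scheme{let} passes through unchanged, while
\scheme{(let ([x 1] [y 2]) (+ x y))} becomes
\scheme{(let #f ([x 1] [y 2]) (+ x y))}.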
\subsection{Extensions with the {\tt define-language} form\label{subsubsec:extended-define-language}}
A language defined as an extension of an existing language has a slightly
modified syntax to indicate what should be added to or removed from
the base language to create the new language.
A compiler writer indicates that a language is an extension by using an
extension clause.
\noindent
\textbf{Extension clause.}
The extension clause has the following form:
{\small
\schemedisplay
(extends \var{language-name})
\endschemedisplay
}
\noindent
where \var{language-name} is the name of an already defined language.
Only one extension clause can be specified in a language definition.
\noindent
\textbf{Entry clause.}
The entry clause does not change syntactically in an extended language.
It can, however, name a nonterminal from the base language that is retained in
the extended language.
\noindent
\textbf{Terminals clause.}
When a language derives from a base language, the \scheme{terminals} clause has the following form:
{\small
\schemedisplay
(terminals \var{extended-terminal-clause} ...)
\endschemedisplay
}
\noindent
where \var{extended-terminal-clause} has one of the following forms:
{\small
\schemedisplay
(+ \var{terminal-clause} ...)
(- \var{terminal-clause} ...)
\endschemedisplay
}
\noindent
where the \var{terminal-clause} uses the syntax for terminals specified in the
non-extended \scheme{terminals} form.
The \scheme{+} form indicates terminals that should be added to the new language.
The \scheme{-} form indicates terminals that should be removed from the list in
the old language when producing the new language.
Terminals not mentioned in a terminals clause will be copied unchanged into the new
language.
Note that adding and removing \var{meta-var}s from a terminal currently
requires removing the terminal type and re-adding it.
This can be done in the same step with a \scheme{terminals} clause, similar to the following:
{\small
\schemedisplay
(terminals
(- (variable (x)))
(+ (variable (x y))))
\endschemedisplay
}
\noindent
\textbf{Nonterminal clause.}
When a language extends a base language, a nonterminal clause has the
following form:
{\small
\schemedisplay
(\var{nonterminal-name} (\var{meta-var} ...)
\var{extended-production-clause}
...)
\endschemedisplay
}
\noindent
where \var{extended-production-clause} has one of the following forms:
{\small
\schemedisplay
(+ \var{production-clause} ...)
(- \var{production-clause} ...)
\endschemedisplay
}
\noindent
The \scheme{+} form indicates nonterminal productions that should be added to
the nonterminal in the new language.
The \scheme{-} form indicates nonterminal productions that should not be
copied from the list of productions for this nonterminal in the base language when
producing the new language.
Productions not mentioned in a nonterminal clause will be copied unchanged into the
nonterminal in the new language.
If a nonterminal has all of its productions removed in a new language, the
nonterminal will be dropped in the new language.
Conversely, new nonterminals can be added by naming the new nonterminal and
using the \scheme{+} form to specify the productions of the new nonterminal.
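To make the \scheme{+} and \scheme{-} forms concrete, the following sketch defines a small, hypothetical base language, \scheme{Lbase}, and an extension, \scheme{Lnoif1}, that removes the one-armed \scheme{if} and adds a \scheme{(void)} form.
Neither language is part of the Scheme to C compiler; the terminal predicates are illustrative, and the sketch assumes Chez Scheme with the framework installed as the \scheme{(nanopass)} library.
{\small
\schemedisplay
(import (chezscheme) (nanopass))

;; Each terminal name needs a matching predicate (variable?, constant?).
(define variable? symbol?)
(define constant?
  (lambda (x) (or (fixnum? x) (boolean? x))))

;; Hypothetical base language with one- and two-armed if.
(define-language Lbase
  (terminals
    (variable (x))
    (constant (c)))
  (Expr (e body)
    x
    c
    (if e0 e1)
    (if e0 e1 e2)
    (begin e* ... e)))

;; Lnoif1 drops the one-armed if and adds (void); the terminals and
;; the remaining Expr productions are copied from Lbase unchanged.
(define-language Lnoif1
  (extends Lbase)
  (Expr (e body)
    (- (if e0 e1))
    (+ (void))))

;; A parser for the extended language, handy for checking the result.
(define-parser parse-Lnoif1 Lnoif1)
\endschemedisplay
}
\noindent
With these definitions, \scheme{(unparse-Lnoif1 (parse-Lnoif1 '(if #t (void) 1)))} should return \scheme{(if #t (void) 1)}, whereas \scheme{(if #t x)} no longer matches any \scheme{Expr} production of \scheme{Lnoif1} and should be rejected by the parser.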
\subsection{Products of {\tt define-language}}
The \scheme{define-language} form produces the following user-visible bindings:
\begin{itemize}
\item a language definition, bound to the specified \var{language-name};
\item an unparser (named \scheme{unparse-\var{language-name}}) that can be used
to unparse a record-based representation back into an S-expression representation; and
\item a set of predicates that can be used to identify a term of the language
or a term from a specified nonterminal in the language.
\end{itemize}
It also produces the following internal bindings:
\begin{itemize}
\item a meta-parser that can be used by the \scheme{define-pass} macro to parse
the patterns and templates used in passes; and
\item a set of record definitions that will be used to represent the language
forms.
\end{itemize}
The \scheme{Lsrc} language, for example, will bind the identifier
\scheme{Lsrc} to the language definition, produce an unparser named
\scheme{unparse-Lsrc}, and create two predicates, \scheme{Lsrc?} and
\scheme{Lsrc-Expr?}.
The language definition is used when the \var{language-name} is specified as
the base of a new language definition and in the definition of a pass.
The \scheme{define-parser} form can also be used to create a simple parser for
parsing S-expressions into language forms as follows:
{\small
\schemedisplay
(define-parser \var{parser-name} \var{language-name})
\endschemedisplay
}
\noindent
The parser does not support backtracking, so the productions of each nonterminal must be
unambiguous, either by starting with a distinct keyword or by having S-expressions of different lengths.
For instance, the following language definition cannot be parsed because all
four of the \scheme{set!} forms have the same keyword and are S-expressions of
the same length:
{\small
\schemedisplay
(define-language Lunparsable
(terminals
(variable (x))
(binop (binop))
(integer-32 (int32))
(integer-64 (int64)))
(Program (prog)
(begin stmt* ... stmt))
(Statement (stmt)
(set! x0 int64)
(set! x0 x1)
(set! x0 (binop x1 int32))
(set! x0 (binop x1 x2))))
\endschemedisplay
}
\noindent
Instead, the \scheme{Statement} nonterminal must be broken into multiple
nonterminals, as in the following language:
{\small
\schemedisplay
(define-language Lparsable
(terminals
(variable (x))
(binop (binop))
(integer-32 (int32))
(integer-64 (int64)))
(Program (prog)
(begin stmt* ... stmt))
(Statement (stmt)
(set! x rhs))
(Rhs (rhs)
x
int64
(binop x arg))
(Argument (arg)
x
int32))
\endschemedisplay
}
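A minimal sketch of how \scheme{Lparsable} might be put to use follows.
Each terminal name needs a corresponding predicate in scope (\scheme{variable?}, \scheme{binop?}, \scheme{integer-32?}, and \scheme{integer-64?}); the definitions below are hypothetical, including the operator set and the range checks, and the framework is assumed to be loaded as the \scheme{(nanopass)} library under Chez Scheme.
The language definition is repeated so that the example stands alone.
{\small
\schemedisplay
(import (chezscheme) (nanopass))

;; Hypothetical terminal predicates for Lparsable.
(define variable? symbol?)
(define binop?
  (lambda (x) (and (memq x '(+ - *)) #t)))
(define integer-32?
  (lambda (x)
    (and (integer? x) (exact? x)
         (<= (- (expt 2 31)) x (- (expt 2 31) 1)))))
(define integer-64?
  (lambda (x)
    (and (integer? x) (exact? x)
         (<= (- (expt 2 63)) x (- (expt 2 63) 1)))))

(define-language Lparsable
  (terminals
    (variable (x))
    (binop (binop))
    (integer-32 (int32))
    (integer-64 (int64)))
  (Program (prog)
    (begin stmt* ... stmt))
  (Statement (stmt)
    (set! x rhs))
  (Rhs (rhs)
    x
    int64
    (binop x arg))
  (Argument (arg)
    x
    int32))

;; define-parser builds parse-Lparsable from the grammar.
(define-parser parse-Lparsable Lparsable)

(define parsed
  (parse-Lparsable '(begin (set! a 5) (set! b (+ a 7)) (set! c b))))
\endschemedisplay
}
\noindent
Here \scheme{unparse-Lparsable} should turn \scheme{parsed} back into the original S-expression, and predicates such as \scheme{Lparsable?} and \scheme{Lparsable-Statement?}, following the naming pattern described above, recognize the resulting terms.
Within each nonterminal, every production is distinguished by a keyword (\scheme{begin}, \scheme{set!}), by being the only list-shaped production (\scheme{(binop x arg)}), or by a terminal predicate, so the parser never needs to backtrack.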
\section{Defining passes\label{sec:define-pass}}
Passes are used to specify transformations over languages defined by using
\scheme{define-language}.
Before going into the formal details of defining passes, we need to take a look
at a simple pass to convert an input program from the \scheme{Lsrc}
intermediate language to the \scheme{L1} intermediate language.
This pass removes the one-armed-if by making the
result of the \scheme{if} expression explicit when the predicate is false.
We define a pass called \scheme{remove-one-armed-if} to accomplish this
task, without using any of the
catamorphism~\cite{Meijer:1991:FPB:645420.652535} or
autogeneration features of the nanopass framework.
The later versions of the pass show how these features help eliminate boilerplate code.
{\small
\schemedisplay
(define-pass remove-one-armed-if : Lsrc (e) -> L1 ()
(Expr : Expr (e) -> Expr ()
[(if ,e0 ,e1) `(if ,(Expr e0) ,(Expr e1) (void))]
[,pr pr]
[,x x]
[,c c]
[(quote ,d) `(quote ,d)]
[(if ,e0 ,e1 ,e2) `(if ,(Expr e0) ,(Expr e1) ,(Expr e2))]
[(or ,e* ...) `(or ,(map Expr e*) ...)]
[(and ,e* ...) `(and ,(map Expr e*) ...)]
[(not ,e) `(not ,(Expr e))]
[(begin ,e* ... ,e) `(begin ,(map Expr e*) ... ,(Expr e))]
[(lambda (,x* ...) ,body* ... ,body)
`(lambda (,x* ...) ,(map Expr body*) ... ,(Expr body))]
[(let ([,x* ,e*] ...) ,body* ... ,body)
`(let ([,x* ,(map Expr e*)] ...)
,(map Expr body*) ... ,(Expr body))]
[(letrec ([,x* ,e*] ...) ,body* ... ,body)
`(letrec ([,x* ,(map Expr e*)] ...)
,(map Expr body*) ... ,(Expr body))]
[(set! ,x ,e) `(set! ,x ,(Expr e))]
[(,e ,e* ...) `(,(Expr e) ,(map Expr e*) ...)])
(Expr e))
\endschemedisplay
}
\noindent
The pass definition starts with a name (in this case,
\scheme{remove-one-armed-if})
and a signature.
The signature starts with an input-language specifier (in this case, \scheme{Lsrc}),
along with a list of formals.
Here, there is just one formal, \scheme{e}, for the input-language term.
The second part of the signature has an output-language specifier (in this case,
\scheme{L1}), as well as a list of extra return values (in this case, empty).
Following the name and signature is an optional \scheme{definitions} clause, which is not
used in this pass.
The \scheme{definitions} clause can contain any Scheme expression valid in a
definition context.
Next, a transformer from the input nonterminal \scheme{Expr} to the output
nonterminal \scheme{Expr} is defined.
The transformer is named \scheme{Expr} and has a signature similar to that
of the pass, with an input-language nonterminal and list of formals followed
by the output-language nonterminal and list of extra-return-value expressions.
The transformer has a clause that processes each production of the \scheme{Expr}
nonterminal.
Each clause consists of an input pattern, an optional \scheme{guard} clause,
and one or more expressions that specify zero or more return values based on the
signature.
The input pattern is derived from the S-expression productions specified
in the input language.
Each variable in the pattern is denoted by unquote (\scheme{,}).
For instance, the clause for the \scheme{set!} production matches the pattern
\scheme{(set! ,x ,e)}, binds \scheme{x} to the \scheme{symbol} specified by the
\scheme{set!} and \scheme{e} to the \scheme{Expr} specified by the
\scheme{set!}.
% I might do this as an asside, if I could figure out how to bend LaTeX to my
% will enough to do that.
The variable names used in pattern bindings are based on the meta-variables
listed in the language definition.
This allows the pattern to be further restricted.
For instance, if we wanted to match only \scheme{set!} forms that had a
variable reference as the right-hand side, we could specify our pattern as
\scheme{(set! ,x0 ,x1)}, which would be equivalent to using our original
pattern with the \scheme{guard} clause \scheme{(guard (symbol? e))}.
The output-language expression is constructed using the quasiquoted template \scheme{`(set! ,x ,(Expr e))}.
Here, quasiquote (\scheme{`}) is rebound to a form that constructs output-language
terms from the template, and unquote (\scheme{,}) is used to escape back
into Scheme.
The \scheme{,(Expr e)} thus puts the result of the recursive call to
\scheme{Expr} into the output-language \scheme{(set! x e)} form.
Following the \scheme{Expr} transformer is the body of the pass, which calls
\scheme{Expr} to transform the \scheme{Lsrc} \scheme{Expr} term into an \scheme{L1}
\scheme{Expr} term and returns the result.
In place of the explicit recursive calls to \scheme{Expr}, the compiler writer
can use the catamorphism syntax to indicate the recurrence, as in the
following version of the pass.
{\small
\schemedisplay
(define-pass remove-one-armed-if : Lsrc (e) -> L1 ()
(Expr : Expr (e) -> Expr ()
[(if ,[e0] ,[e1]) `(if ,e0 ,e1 (void))]
[,pr pr]
[,x x]
[,c c]
[(quote ,d) `(quote ,d)]
[(if ,[e0] ,[e1] ,[e2]) `(if ,e0 ,e1 ,e2)]
[(or ,[e*] ...) `(or ,e* ...)]
[(and ,[e*] ...) `(and ,e* ...)]
[(not ,[e]) `(not ,e)]
[(begin ,[e*] ... ,[e]) `(begin ,e* ... ,e)]
[(lambda (,x* ...) ,[body*] ... ,[body])
`(lambda (,x* ...) ,body* ... ,body)]
[(let ([,x* ,[e*]] ...) ,[body*] ... ,[body])
`(let ([,x* ,e*] ...)
,body* ... ,body)]
[(letrec ([,x* ,[e*]] ...) ,[body*] ... ,[body])
`(letrec ([,x* ,e*] ...)
,body* ... ,body)]
[(set! ,x ,[e]) `(set! ,x ,e)]
[(,[e] ,[e*] ...) `(,e ,e* ...)])
(Expr e))
\endschemedisplay
}
\noindent
Here, the square brackets that wrap the unquoted variable expression in a
pattern indicate that a catamorphism should be applied.
For instance, in the \scheme{set!} clause, the \scheme{,e} from the previous
pass becomes \scheme{,[e]}.
When the catamorphism is included on an element that is followed by an
ellipsis, \scheme{map} is used to process the elements of the list and to construct
the output list.
% another place for this to be an aside with a link down to the
% catamorphism section
Using a catamorphism changes, slightly, the meaning of the meta-variables used
in the pattern matcher.
Instead of indicating an input-language restriction that must be met, a
meta-variable in a catamorphism indicates the output type that is expected.
In the \scheme{set!} clause example, we use \scheme{e} for both, because our
input language and output language both use \scheme{e} to refer to
their \scheme{Expr} nonterminal.
The nanopass framework uses the input type and the output type, along with any
additional input values and extra expected return values, to determine which
transformer should be called.
In some cases, specifically where a single input nonterminal form is
transformed into an equivalent output nonterminal form, these transformers can
be autogenerated by the framework.
Using catamorphisms helps to make the pass more succinct, but there is still
boilerplate code in the pass that the framework can fill in for the compiler
writer.
Several clauses simply match the input-language production and generate a matching
output-language production (modulo the catamorphisms for nested \scheme{Expr} forms).
Because the input and output languages are defined, the \scheme{define-pass}
macro can automatically generate these clauses.
Thus, the same functionality can be expressed as follows:
{\small
\schemedisplay
(define-pass remove-one-armed-if : Lsrc (e) -> L1 ()
(Expr : Expr (e) -> Expr ()
[(if ,[e0] ,[e1]) `(if ,e0 ,e1 (void))]))
\endschemedisplay
}
\noindent
In this version of the pass, only the one-armed-\scheme{if} form
is explicitly processed.
The \scheme{define-pass} form automatically generates the other clauses.
Although all three versions of this pass perform the same task, the final form is the
closest to the initial intention of replacing just the one-armed-if form with a two-armed-if.
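The following self-contained sketch runs the final, one-clause form of the pass.
The languages \scheme{Lmini} and \scheme{Lmini1} are hypothetical stand-ins for \scheme{Lsrc} and \scheme{L1}, reduced to a handful of productions, and the sketch assumes Chez Scheme with the framework loaded as the \scheme{(nanopass)} library.
{\small
\schemedisplay
(import (chezscheme) (nanopass))

(define variable? symbol?)
(define constant?
  (lambda (x) (or (fixnum? x) (boolean? x))))

;; Hypothetical stand-ins for Lsrc and L1.
(define-language Lmini
  (terminals
    (variable (x))
    (constant (c)))
  (Expr (e body)
    x
    c
    (void)
    (if e0 e1)
    (if e0 e1 e2)
    (begin e* ... e)))

(define-language Lmini1
  (extends Lmini)
  (Expr (e body)
    (- (if e0 e1))))

(define-parser parse-Lmini Lmini)

;; Only the one-armed if is written out; define-pass fills in the
;; clauses for the other productions, recurring into subexpressions.
(define-pass remove-one-armed-if : Lmini (e) -> Lmini1 ()
  (Expr : Expr (e) -> Expr ()
    [(if ,[e0] ,[e1]) `(if ,e0 ,e1 (void))]))
\endschemedisplay
}
\noindent
Applied as \scheme{(unparse-Lmini1 (remove-one-armed-if (parse-Lmini '(begin (if #t x) (if #t 1 2)))))}, the pass should produce \scheme{(begin (if #t x (void)) (if #t 1 2))}: the one-armed \scheme{if} gains an explicit \scheme{(void)} alternative, while the two-armed \scheme{if} and the \scheme{begin} are rebuilt by the autogenerated clauses.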
In addition to \scheme{define-pass} autogenerating the clauses of a transformer, \scheme{define-pass} can also
autogenerate the transformers for nonterminals that must be traversed but are
otherwise unchanged in a pass.
For instance, one of the passes in the Scheme to C compiler removes complex
expressions from the right-hand side of the \scheme{set!} form.
At this point in the compiler, the language has several nonterminals:
{\small
\schemedisplay
(define-language L18
(entry Program)
(terminals
(integer-64 (i))
(effect+internal-primitive (epr))
(non-alloc-value-primitive (vpr))
(symbol (x l))
(predicate-primitive (ppr))
(constant (c)))
(Program (prog)
(labels ([l* le*] ...) l))
(SimpleExpr (se)
x
(label l)
(quote c))
(Value (v body)
(alloc i se)
se
(if p0 v1 v2)
(begin e* ... v)
(primcall vpr se* ...)
(se se* ...))
(Effect (e)
(set! x v)
(nop)
(if p0 e1 e2)
(begin e* ... e)
(primcall epr se* ...)
(se se* ...))
(Predicate (p)
(true)
(false)
(if p0 p1 p2)
(begin e* ... p)
(primcall ppr se* ...))
(LocalsBody (lbody)
(locals (x* ...) body))
(LambdaExpr (le)
(lambda (x* ...) lbody)))
\endschemedisplay
}
\noindent
The pass, however, is only interested in the \scheme{set!} form and the
\scheme{Value} form in the right-hand-side position of the \scheme{set!} form.
Relying on the autogeneration of transformers, this pass can be written as:
{\small
\schemedisplay
(define-pass flatten-set! : L18 (e) -> L19 ()
(SimpleExpr : SimpleExpr (se) -> SimpleExpr ())
(Effect : Effect (e) -> Effect ()
[(set! ,x ,v) (flatten v x)])
(flatten : Value (v x) -> Effect ()
[,se `(set! ,x ,(SimpleExpr se))]
[(primcall ,vpr ,[se*] ...) `(set! ,x (primcall ,vpr ,se* ...))]
[(alloc ,i ,[se]) `(set! ,x (alloc ,i ,se))]
[(,[se] ,[se*] ...) `(set! ,x (,se ,se* ...))]))
\endschemedisplay
}
\noindent
Here, the \scheme{Effect} transformer has just one clause for matching the
\scheme{set!} form.
The \scheme{flatten} transformer is called to produce the final \scheme{Effect}
form.
The \scheme{flatten} transformer, in turn, pushes the \scheme{set!} form into
the \scheme{if} and \scheme{begin} forms and processes the contents of these
forms, which produces a final \scheme{Effect} form.
Note that clauses for the \scheme{if} and \scheme{begin} forms do not need to be provided
by the compiler writer.
This is because the input and output languages provide enough structure that the
nanopass framework can automatically generate the appropriate clauses.
In the case of \scheme{begin}, it pushes the \scheme{set!} form into the
final, value-producing expression of the \scheme{begin} form.
In the case of \scheme{if}, it pushes the \scheme{set!} form into both
the consequent and the alternative of the \scheme{if} form, so that the variable is set by the
final, value-producing expression on both possible execution paths.
The \scheme{define-pass} macro autogenerates transformers for \scheme{Program},
\scheme{LambdaExpr}, \scheme{LocalsBody}, \scheme{Value}, and
\scheme{Predicate} that recur through the input-language forms and produce the
output-language forms.
The \scheme{SimpleExpr} transformer needs to be written only to give the
transformer a name so that it can be called by \scheme{flatten}.
It is sometimes necessary to pass more information than just
the language term to a transformer.
The transformer syntax allows extra formals to be named to support passing this information.
For example, in the pass from the Scheme to C compiler that converts the
\scheme{closures} form into explicit calls to closure primitives, the closure
pointer, \scheme{cp}, and the list of free variables, \scheme{free*}, are passed
to the \scheme{Expr} transformer.
{\small
\schemedisplay
(define-pass expose-closure-prims : L12 (e) -> L13 ()
(Expr : Expr (e [cp #f] [free* '()]) -> Expr ()
(definitions
(define handle-closure-ref
(lambda (x cp free*)
(let loop ([free* free*] [i 0])
(cond
[(null? free*) x]
[(eq? x (car free*)) `(primcall closure-ref ,cp (quote ,i))]
[else (loop (cdr free*) (fx+ i 1))]))))
(define build-closure-set*
(lambda (x* l* f** cp free*)
(fold-left
(lambda (e* x l f*)
(let loop ([f* f*] [i 0] [e* e*])
(if (null? f*)
(cons `(primcall closure-code-set! ,x (label ,l)) e*)
(loop (cdr f*) (fx+ i 1)
(cons `(primcall closure-data-set! ,x (quote ,i)
,(handle-closure-ref (car f*) cp free*))
e*)))))
'()
x* l* f**))))
[(closures ([,x* ,l* ,f** ...] ...)
(labels ([,l2* ,[le*]] ...) ,[body]))
(let ([size* (map length f**)])
`(let ([,x* (primcall make-closure (quote ,size*))] ...)
(labels ([,l2* ,le*] ...)
(begin
,(build-closure-set* x* l* f** cp free*) ...
,body))))]
[,x (handle-closure-ref x cp free*)]
[((label ,l) ,[e*] ...) `((label ,l) ,e* ...)]
[(,[e] ,[e*] ...) `((primcall closure-code ,e) ,e* ...)])
(LabelsBody : LabelsBody (lbody) -> Expr ())
(LambdaExpr : LambdaExpr (le) -> LambdaExpr ()
[(lambda (,x ,x* ...) (free (,f* ...) ,[body x f* -> body]))
`(lambda (,x ,x* ...) ,body)]))
\endschemedisplay
}
\noindent
The catamorphism and clause autogeneration facilities are also aware of the extra
formals expected by transformers.
In a catamorphism, this means that extra arguments need not be specified
when the corresponding formals are available in the transformer.
For instance, in the \scheme{Expr} transformer,
the catamorphism specifies only the binding of the output \scheme{Expr} form,
and \scheme{define-pass} matches the name of the formal to the transformer with the
expected argument.
In the \scheme{LambdaExpr} transformer, the extra arguments need to be
specified, both because they are not available as a formal of the transformer
and because the values change at the \scheme{LambdaExpr} boundary.
Autogenerated clauses in \scheme{Expr} also call the
\scheme{Expr} transformer with the extra arguments from the formals.
The \scheme{expose-closure-prims} pass also specifies default values for the
extra arguments passed to the \scheme{Expr} transformer.
It defaults the \scheme{cp} variable to \scheme{#f} and the \scheme{free*}
variable to the empty list.
The default values are used in calls to the \scheme{Expr} transformer only
when no other value is available.
Here, this happens only when the \scheme{Expr} transformer is first
called in the body of the pass.
This is consistent with the body of the program, which cannot contain any free
variables and hence does not need a closure pointer.
Once we begin processing the body of a \scheme{lambda}, we have a
closure pointer and the list of free variables, if any.
Sometimes it is also necessary for a pass to return more than one value.
The nanopass framework relies on Scheme's built-in support for
multiple return values.
To inform the nanopass framework that a given transformer returns more
than one value, we use the signature to tell the framework both how many values
we expect to return and what the default values should be when a clause
is autogenerated.
For instance, the \scheme{uncover-free} pass returns two values: the language
form and the list of free variables.
{\small
\schemedisplay
(define-pass uncover-free : L10 (e) -> L11 ()
(Expr : Expr (e) -> Expr (free*)
[(quote ,c) (values `(quote ,c) '())]
[,x (values x (list x))]
[(let ([,x* ,[e* free**]] ...) ,[e free*])
(values `(let ([,x* ,e*] ...) ,e)
(apply union (difference free* x*) free**))]
[(letrec ([,x* ,[le* free**]] ...) ,[body free*])
(values `(letrec ([,x* ,le*] ...) ,body)
(difference (apply union free* free**) x*))]
[(if ,[e0 free0*] ,[e1 free1*] ,[e2 free2*])
(values `(if ,e0 ,e1 ,e2) (union free0* free1* free2*))]
[(begin ,[e* free**] ... ,[e free*])
(values `(begin ,e* ... ,e) (apply union free* free**))]
[(primcall ,pr ,[e* free**] ...)
(values `(primcall ,pr ,e* ...) (apply union free**))]
[(,[e free*] ,[e* free**] ...)
(values `(,e ,e* ...) (apply union free* free**))])
(LambdaExpr : LambdaExpr (le) -> LambdaExpr (free*)
[(lambda (,x* ...) ,[body free*])
(let ([free* (difference free* x*)])
(values `(lambda (,x* ...) (free (,free* ...) ,body)) free*))])
(let-values ([(e free*) (Expr e)])
(unless (null? free*) (error who "found unbound variables" free*))
e))
\endschemedisplay
}
Transformers can also be written that handle terminals instead of nonterminals.
Because terminals have no structure, the body of such transformers is simply a
Scheme expression.
The Scheme to C compiler does not make use of this feature, but we could
imagine a pass where references to variables are replaced with already
specified locations, such as the following pass:
{\small
\schemedisplay
(define-pass replace-variable-references : L23 (x) -> L24 ()
(uvar-reg-fv : symbol (x env) -> location ()
(cond [(and (uvar? x) (assq x env)) => cdr] [else x]))
(SimpleExpr : SimpleExpr (x env) -> Triv ())
(Rhs : Rhs (x env) -> Rhs ())
(Pred : Pred (x env) -> Pred ())
(Effect : Effect (x env) -> Effect ())
(Value : Value (x env) -> Value ())
(LocalsBody : LocalsBody (x) -> Value ()
[(finished ([,x* ,loc*] ...) ,vbody) (Value vbody (map cons x* loc*))]))
\endschemedisplay
}
\noindent
The two interesting parts of this pass are the \scheme{LocalsBody} transformer
that creates the environment that maps variables to locations and the
\scheme{uvar-reg-fv} transformer that replaces variables with the appropriate
location.
In this pass, transformers cannot be autogenerated because extra arguments are
needed, and the nanopass framework only autogenerates transformers without extra
arguments or return values.
The autogeneration is limited to help rein in some of the unpredictable
behavior that can result from autogenerated transformers.
Passes can also be written that do not take a language form but that produce a
language form.
The initial parser for the Scheme to C compiler is a good example of this.
It expects an S-expression that conforms to an input grammar for the subset of
Scheme supported by the compiler.
{\small
\schemedisplay
(define-pass parse-and-rename : * (e) -> Lsrc ()
(definitions
(define process-body
(lambda (who env body* f)
(when (null? body*) (error who "invalid empty body"))
(let loop ([body (car body*)] [body* (cdr body*)] [rbody* '()])
(if (null? body*)
(f (reverse rbody*) (Expr body env))
(loop (car body*) (cdr body*)
(cons (Expr body env) rbody*))))))
(define vars-unique?
(lambda (fmls)
(let loop ([fmls fmls])
(or (null? fmls)
(and (not (memq (car fmls) (cdr fmls)))
(loop (cdr fmls)))))))
(define unique-vars
(lambda (env fmls f)
(unless (vars-unique? fmls)
(error 'unique-vars "invalid formals" fmls))
(let loop ([fmls fmls] [env env] [rufmls '()])
(if (null? fmls)
(f env (reverse rufmls))
(let* ([fml (car fmls)] [ufml (unique-var fml)])
(loop (cdr fmls) (cons (cons fml ufml) env)
(cons ufml rufmls)))))))
(define process-bindings
(lambda (rec? env bindings f)
(let loop ([bindings bindings] [rfml* '()] [re* '()])
(if (null? bindings)
(unique-vars env rfml*
(lambda (new-env rufml*)
(let ([env (if rec? new-env env)])
(let loop ([rufml* rufml*]
[re* re*]
[ufml* '()]
[e* '()])
(if (null? rufml*)
(f new-env ufml* e*)
(loop (cdr rufml*) (cdr re*)
(cons (car rufml*) ufml*)
(cons (Expr (car re*) env) e*)))))))
(let ([binding (car bindings)])
(loop (cdr bindings) (cons (car binding) rfml*)
(cons (cadr binding) re*)))))))
(define Expr*
(lambda (e* env)
(map (lambda (e) (Expr e env)) e*)))
(with-output-language (Lsrc Expr)
(define build-primitive
(lambda (as)
(let ([name (car as)] [argc (cdr as)])
(cons name
(if (< argc 0)
(error who
"primitives with arbitrary counts are not currently supported"
name)
(lambda (env . e*)
(if (= (length e*) argc)
`(,name ,(Expr* e* env) ...)
(error name "invalid argument count"
(cons name e*)))))))))
(define initial-env
(cons*
(cons 'quote (lambda (env d)
(unless (datum? d)
(error 'quote "invalid datum" d))
`(quote ,d)))
(cons 'if (case-lambda
[(env e0 e1) `(if ,(Expr e0 env) ,(Expr e1 env))]
[(env e0 e1 e2)
`(if ,(Expr e0 env) ,(Expr e1 env) ,(Expr e2 env))]
[x (error 'if (if (< (length x) 3)
"too few arguments"
"too many arguments")
x)]))
(cons 'or (lambda (env . e*) `(or ,(Expr* e* env) ...)))
(cons 'and (lambda (env . e*) `(and ,(Expr* e* env) ...)))
(cons 'not (lambda (env e) `(not ,(Expr e env))))
(cons 'begin (lambda (env . e*)
(process-body env e*
(lambda (e* e)
`(begin ,e* ... ,e)))))
(cons 'lambda (lambda (env fmls . body*)
(unique-vars env fmls
(lambda (env fmls)
(process-body 'lambda env body*
(lambda (body* body)
`(lambda (,fmls ...)
,body* ... ,body)))))))
(cons 'let (lambda (env bindings . body*)
(process-bindings #f env bindings
(lambda (env x* e*)
(process-body 'let env body*
(lambda (body* body)
`(let ([,x* ,e*] ...) ,body* ... ,body)))))))
(cons 'letrec (lambda (env bindings . body*)
(process-bindings #t env bindings
(lambda (env x* e*)
(process-body 'letrec env body*
(lambda (body* body)
`(letrec ([,x* ,e*] ...)
,body* ... ,body)))))))
(cons 'set! (lambda (env x e)
(cond
[(assq x env) =>
(lambda (as)
(let ([v (cdr as)])
(if (symbol? v)
`(set! ,v ,(Expr e env))
(error 'set! "invalid syntax"
(list 'set! x e)))))]
[else (error 'set! "set to unbound variable"
(list 'set! x e))])))
(map build-primitive user-prims)))
;;; App - helper for handling applications.
(define App
(lambda (e env)
(let ([e (car e)] [e* (cdr e)])
`(,(Expr e env) ,(Expr* e* env) ...))))))
(Expr : * (e env) -> Expr ()
(cond
[(pair? e)
(cond
[(assq (car e) env) =>
(lambda (as)
(let ([v (cdr as)])
(if (procedure? v)
(apply v env (cdr e))
(App e env))))]
[else (App e env)])]
[(symbol? e)
(cond
[(assq e env) =>
(lambda (as)
(let ([v (cdr as)])
(cond
[(symbol? v) v]
[(primitive? e) e]
[else (error who "invalid syntax" e)])))]
[else (error who "unbound variable" e)])]
[(constant? e) e]
[else (error who "invalid expression" e)]))
(Expr e initial-env))
\endschemedisplay
}
\noindent
The \scheme{parse-and-rename} pass is structured similarly to a simple expander with
keywords and primitives.\footnote{It could easily be extended to handle simple macros; as written, only the fixed \scheme{and},
\scheme{or}, and \scheme{not} forms are available.}
It also performs syntax checking to ensure that the input conforms to
the expected input grammar.
Finally, it produces an \scheme{Lsrc} language term that represents the Scheme
program to be compiled.
In the pass syntax, the \scheme{*} in place of the input-language name indicates
that no input-language term should be expected.
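The \scheme{Expr} transformer does not have pattern-matching clauses, as the input
could be of any form; the \scheme{App} helper in the \scheme{definitions} clause likewise
works directly on S-expressions.
Quasiquote is, however, rebound within the transformer because an output language is specified,
and the helpers in the \scheme{definitions} clause obtain the same rebinding through \scheme{with-output-language}.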
It can also be useful to create passes without an output language.
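As a smaller illustration of a pass with no output language, the following sketch counts the \scheme{Expr} nodes in a term of a hypothetical language, \scheme{Lcount}.
Each clause returns an ordinary Scheme number instead of a language term, and a clause is written out for every production.
The language, the pass name \scheme{expr-size}, and the \scheme{(nanopass)} library name under Chez Scheme are assumptions; the transformer signature mirrors the value-returning transformers of the code generator.
{\small
\schemedisplay
(import (chezscheme) (nanopass))

(define variable? symbol?)
(define constant?
  (lambda (x) (or (fixnum? x) (boolean? x))))

;; Hypothetical input language.
(define-language Lcount
  (terminals
    (variable (x))
    (constant (c)))
  (Expr (e body)
    x
    c
    (if e0 e1 e2)
    (begin e* ... e)))

(define-parser parse-Lcount Lcount)

;; The * output means no output language: each clause returns a number.
(define-pass expr-size : Lcount (e) -> * ()
  (Expr : Expr (e) -> * (n)
    [,x 1]
    [,c 1]
    [(if ,e0 ,e1 ,e2) (+ 1 (Expr e0) (Expr e1) (Expr e2))]
    [(begin ,e* ... ,e) (+ 1 (apply + (map Expr e*)) (Expr e))])
  (Expr e))
\endschemedisplay
}
\noindent
For example, \scheme{(expr-size (parse-Lcount '(if #t x (begin 1 2))))} should return 6: one node each for the \scheme{if}, \scheme{#t}, \scheme{x}, the \scheme{begin}, \scheme{1}, and \scheme{2}.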
The final pass of the Scheme to C compiler is the code generator that emits C
code.
{\small
\schemedisplay
(define-pass generate-c : L22 (e) -> * ()
(definitions
(define string-join
(lambda (str* jstr)
(cond
[(null? str*) ""]
[(null? (cdr str*)) (car str*)]
[else (string-append (car str*) jstr (string-join (cdr str*) jstr))])))
(define symbol->c-id
(lambda (sym)
(let ([ls (string->list (symbol->string sym))])
(if (null? ls)
"_"
(let ([fst (car ls)])
(list->string
(cons
(if (char-alphabetic? fst) fst #\_)
(map (lambda (c)
(if (or (char-alphabetic? c)
(char-numeric? c))
c
#\_))
(cdr ls)))))))))
(define format-function-header
(lambda (l x*)
(format "ptr ~a(~a)" l
(string-join
(map
(lambda (x)
(format "ptr ~a" (symbol->c-id x)))
x*)
", "))))
(define format-label-call
(lambda (l se*)
(format " ~a(~a)" (symbol->c-id l)
(string-join
(map (lambda (se)
(format "(ptr)~a" (format-simple-expr se)))
se*)
", "))))
(define format-general-call
(lambda (se se*)
(format "((ptr (*)(~a))~a)(~a)"
(string-join (make-list (length se*) "ptr") ", ")
(format-simple-expr se)
(string-join
(map (lambda (se)
(format "(ptr)~a" (format-simple-expr se)))
se*)
", "))))
(define format-binop
(lambda (op se0 se1)
(format "((long)~a ~a (long)~a)"
(format-simple-expr se0)
op
(format-simple-expr se1))))
(define format-set!
(lambda (x rhs)
(format "~a = (ptr)~a" (symbol->c-id x) (format-rhs rhs)))))
(emit-function-decl : LambdaExpr (le l) -> * ()
[(lambda (,x* ...) ,lbody)
(printf "~a;~%" (format-function-header l x*))])
(emit-function-def : LambdaExpr (le l) -> * ()
[(lambda (,x* ...) ,lbody)
(printf "~a {~%" (format-function-header l x*))
(emit-function-body lbody)
(printf "}~%~%")])
(emit-function-body : LocalsBody (lbody) -> * ()
[(locals (,x* ...) ,body)
(for-each (lambda (x) (printf " ptr ~a;~%" (symbol->c-id x))) x*)
(emit-value body x*)])
(emit-value : Value (v locals*) -> * ()
[(if ,p0 ,v1 ,v2)
(printf " if (~a) {~%" (format-predicate p0))
(emit-value v1 locals*)
(printf " } else {~%")
(emit-value v2 locals*)
(printf " }~%")]
[(begin ,e* ... ,v)
(for-each emit-effect e*)
(emit-value v locals*)]
[,rhs (printf " return (ptr)~a;\n" (format-rhs rhs))])
(format-predicate : Predicate (p) -> * (str)
[(if ,p0 ,p1 ,p2)
(format "((~a) ? (~a) : (~a))"
(format-predicate p0)
(format-predicate p1)
(format-predicate p2))]
[(<= ,se0 ,se1) (format-binop "<=" se0 se1)]
[(< ,se0 ,se1) (format-binop "<" se0 se1)]
[(= ,se0 ,se1) (format-binop "==" se0 se1)]
[(true) "1"]
[(false) "0"]
[(begin ,e* ... ,p)
(string-join
(fold-right (lambda (e s*) (cons (format-effect e) s*))
(list (format-predicate p)) e*)
", ")])
(format-effect : Effect (e) -> * (str)
[(if ,p0 ,e1 ,e2)
(format "((~a) ? (~a) : (~a))"
(format-predicate p0)
(format-effect e1)
(format-effect e2))]
[((label ,l) ,se* ...) (format-label-call l se*)]
[(,se ,se* ...) (format-general-call se se*)]
[(set! ,x ,rhs) (format-set! x rhs)]
[(nop) "0"]
[(begin ,e* ... ,e)
(string-join
(fold-right (lambda (e s*) (cons (format-effect e) s*))
(list (format-effect e)) e*)
", ")]
[(mset! ,se0 ,se1? ,i ,se2)
(if se1?
(format "((*((ptr*)((long)~a + (long)~a + ~d))) = (ptr)~a)"
(format-simple-expr se0) (format-simple-expr se1?)
i (format-simple-expr se2))
(format "((*((ptr*)((long)~a + ~d))) = (ptr)~a)"
(format-simple-expr se0) i (format-simple-expr se2)))])
(format-simple-expr : SimpleExpr (se) -> * (str)
[,x (symbol->c-id x)]
[,i (number->string i)]
[(label ,l) (format "(*~a)" (symbol->c-id l))]
[(logand ,se0 ,se1) (format-binop "&" se0 se1)]
[(shift-right ,se0 ,se1) (format-binop ">>" se0 se1)]
[(shift-left ,se0 ,se1) (format-binop "<<" se0 se1)]
[(divide ,se0 ,se1) (format-binop "/" se0 se1)]
[(multiply ,se0 ,se1) (format-binop "*" se0 se1)]
[(subtract ,se0 ,se1) (format-binop "-" se0 se1)]
[(add ,se0 ,se1) (format-binop "+" se0 se1)]
[(mref ,se0 ,se1? ,i)
(if se1?
(format "(*((ptr)((long)~a + (long)~a + ~d)))"
(format-simple-expr se0)
(format-simple-expr se1?) i)
(format "(*((ptr)((long)~a + ~d)))" (format-simple-expr se0) i))])
;; prints expressions in effect position into C statements
(emit-effect : Effect (e) -> * ()
[(if ,p0 ,e1 ,e2)
(printf " if (~a) {~%" (format-predicate p0))
(emit-effect e1)
(printf " } else {~%")
(emit-effect e2)
(printf " }~%")]
[((label ,l) ,se* ...) (printf " ~a;\n" (format-label-call l se*))]
[(,se ,se* ...) (printf " ~a;\n" (format-general-call se se*))]
[(set! ,x ,rhs) (printf " ~a;\n" (format-set! x rhs))]
[(nop) (if #f #f)]
[(begin ,e* ... ,e)
(for-each emit-effect e*)
(emit-effect e)]
[(mset! ,se0 ,se1? ,i ,se2)
(if se1?
(printf "(*((ptr*)((long)~a + (long)~a + ~d))) = (ptr)~a;\n"
(format-simple-expr se0) (format-simple-expr se1?)
i (format-simple-expr se2))
(printf "(*((ptr*)((long)~a + ~d))) = (ptr)~a;\n"
(format-simple-expr se0) i (format-simple-expr se2)))])
;; formats the right-hand side of a set! into a C expression
(format-rhs : Rhs (rhs) -> * (str)
[((label ,l) ,se* ...) (format-label-call l se*)]
[(,se ,se* ...) (format-general-call se se*)]
[(alloc ,i ,se)
(if (use-boehm?)
(format "(ptr)((long)GC_MALLOC(~a) + ~dl)"
(format-simple-expr se) i)
(format "(ptr)((long)malloc(~a) + ~dl)"
(format-simple-expr se) i))]
[,se (format-simple-expr se)])
;; emits a C program for our program expression
(Program : Program (p) -> * ()
[(labels ([,l* ,le*] ...) ,l)
(let ([l (symbol->c-id l)] [l* (map symbol->c-id l*)])
(define-syntax emit-include
(syntax-rules ()
[(_ name) (printf "#include <~s>\n" 'name)]))
(define-syntax emit-predicate
(syntax-rules ()
[(_ PRED_P mask tag)
(emit-c-macro PRED_P (x) "(((long)x & ~d) == ~d)" mask tag)]))
(define-syntax emit-eq-predicate
(syntax-rules ()
[(_ PRED_P rep)
(emit-c-macro PRED_P (x) "((long)x == ~d)" rep)]))
(define-syntax emit-c-macro
(lambda (x)
(syntax-case x ()
[(_ NAME (x* ...) fmt args ...)
#'(printf "#define ~s(~a) ~a\n" 'NAME
(string-join (map symbol->string '(x* ...)) ", ")
(format fmt args ...))])))
;; the following printfs output the tiny C runtime we are using
;; to wrap the result of our compiled Scheme program.
(emit-include stdio.h)
(if (use-boehm?)
(emit-include gc.h)
(emit-include stdlib.h))
(emit-predicate FIXNUM_P fixnum-mask fixnum-tag)
(emit-predicate PAIR_P pair-mask pair-tag)
(emit-predicate BOX_P box-mask box-tag)
(emit-predicate VECTOR_P vector-mask vector-tag)
(emit-predicate PROCEDURE_P closure-mask closure-tag)
(emit-eq-predicate TRUE_P true-rep)
(emit-eq-predicate FALSE_P false-rep)
(emit-eq-predicate NULL_P null-rep)
(emit-eq-predicate VOID_P void-rep)
(printf "typedef long* ptr;\n")
(emit-c-macro FIX (x) "((long)x << ~d)" fixnum-shift)
(emit-c-macro UNFIX (x) "((long)x >> ~d)" fixnum-shift)
(emit-c-macro UNBOX (x) "((ptr)*((ptr)((long)x - ~d)))" box-tag)
(emit-c-macro VECTOR_LENGTH_S (x) "((ptr)*((ptr)((long)x - ~d)))" vector-tag)
(emit-c-macro VECTOR_LENGTH_C (x) "UNFIX(VECTOR_LENGTH_S(x))")
(emit-c-macro VECTOR_REF (x i) "((ptr)*((ptr)((long)x - ~d + ((i+1) * ~d))))"
vector-tag word-size)
(emit-c-macro CAR (x) "((ptr)*((ptr)((long)x - ~d)))" pair-tag)
(emit-c-macro CDR (x) "((ptr)*((ptr)((long)x - ~d + ~d)))" pair-tag word-size)
(printf "void print_scheme_value(ptr x) {\n")
(printf " long i, veclen;\n")
(printf " ptr p;\n")
(printf " if (TRUE_P(x)) {\n")
(printf " printf(\"#t\");\n")
(printf " } else if (FALSE_P(x)) {\n")
(printf " printf(\"#f\");\n")
(printf " } else if (NULL_P(x)) {\n")
(printf " printf(\"()\");\n")
(printf " } else if (VOID_P(x)) {\n")
(printf " printf(\"(void)\");\n")
(printf " } else if (FIXNUM_P(x)) {\n")
(printf " printf(\"%ld\", UNFIX(x));\n")
(printf " } else if (PAIR_P(x)) {\n")
(printf " printf(\"(\");\n")
(printf " for (p = x; PAIR_P(p); p = CDR(p)) {\n")
(printf " print_scheme_value(CAR(p));\n")
(printf " if (PAIR_P(CDR(p))) { printf(\" \"); }\n")
(printf " }\n")
(printf " if (NULL_P(p)) {\n")
(printf " printf(\")\");\n")
(printf " } else {\n")
(printf " printf(\" . \");\n")
(printf " print_scheme_value(p);\n")
(printf " printf(\")\");\n")
(printf " }\n")
(printf " } else if (BOX_P(x)) {\n")
(printf " printf(\"#(box \");\n")
(printf " print_scheme_value(UNBOX(x));\n")
(printf " printf(\")\");\n")
(printf " } else if (VECTOR_P(x)) {\n")
(printf " veclen = VECTOR_LENGTH_C(x);\n")
(printf " printf(\"#(\");\n")
(printf " for (i = 0; i < veclen; i += 1) {\n")
(printf " print_scheme_value(VECTOR_REF(x,i));\n")
(printf " if (i + 1 < veclen) { printf(\" \"); }\n")
(printf " }\n")
(printf " printf(\")\");\n")
(printf " } else if (PROCEDURE_P(x)) {\n")
(printf " printf(\"#(procedure)\");\n")
(printf " }\n")
(printf "}\n")
(for-each emit-function-decl le* l*)
(for-each emit-function-def le* l*)
(printf "int main(int argc, char * argv[]) {\n")
(printf " print_scheme_value(~a());\n" l)
(printf " printf(\"\\n\");\n")
(printf " return 0;\n")
(printf "}\n"))]))
\endschemedisplay
}
\noindent
Again, \scheme{*} indicates that this pass has no output language.
Because the C code is printed to the current output port, the pass does not
need to return a value.
Passes can also return a value that is not a language form.
For instance, we could write the \scheme{simple?} predicate from the \scheme{purify-letrec} pass as its own pass, rather than using the \scheme{nanopass-case} form.
It would look something like the following:
{\small
\schemedisplay
(define-pass simple? : (L8 Expr) (e bound* assigned*) -> * (bool)
(simple? : Expr (e) -> * (bool)
[(quote ,c) #t]
[,x (not (or (memq x bound*) (memq x assigned*)))]
[(primcall ,pr ,e* ...)
(and (effect-free-prim? pr) (for-all simple? e*))]
[(begin ,e* ... ,e) (and (for-all simple? e*) (simple? e))]
[(if ,e0 ,e1 ,e2) (and (simple? e0) (simple? e1) (simple? e2))]
[else #f])
(simple? e))
\endschemedisplay
}
\noindent
Here, the extra return value is named \scheme{bool}, which tells
\scheme{define-pass} that the pass returns a value even though it produces no
output-language term.
Any expression can be used in this position; it is evaluated only by code that
\scheme{define-pass} autogenerates.
In this case, \scheme{bool} is an unbound identifier, but it is never
evaluated: the body of the pass is specified explicitly, and the \scheme{else}
clause prevents any clauses from being autogenerated in the transformer.
\subsection{The {\tt define-pass} syntactic form\label{sec:pass-syntax}}
The \scheme{define-pass} form has the following syntax.
{\small
\schemedisplay
(define-pass \var{name} : \var{lang-specifier} (\var{fml} ...) -> \var{lang-specifier} (\var{extra-return-val-expr} ...)
\var{definitions-clause}
\var{transformer-clause} ...
\var{body-expr} ...)
\endschemedisplay
}
\noindent
where \var{name} is an identifier to use as the name for the procedure
definition.
The \var{lang-specifier} has one of the following forms:
{\small
\schemedisplay
*
\var{lang-name}
(\var{lang-name} \var{nonterminal-name})
\endschemedisplay
}
\noindent
where
\begin{itemize}
\item \var{lang-name} refers to a language defined with the
\scheme{define-language} form, and
\item \var{nonterminal-name} refers to a nonterminal named within the language
definition.
\end{itemize}
When the \scheme{*} form is used as the input \var{lang-specifier}, it indicates
that the pass does not expect an input-language term.
When there is no input language, the transformers within the pass do not have
clauses with pattern matches because, without an input language, the \scheme{define-pass} macro
does not know what the structure of the input term will be.
When the \scheme{*} form is used as the output \var{lang-specifier}, it
indicates that the pass does not produce an output-language term, so its result
is not checked.
When there is no output language, the transformers within the pass do not bind
\scheme{quasiquote}, and there are no templates on the right-hand side of the
transformer matches.
It is possible to use the \scheme{*} specifier for both the input and output
\var{lang-specifier}.
This effectively turns the pass, and the transformers contained within it, into an
ordinary Scheme function.
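As a minimal sketch of a pass whose output \var{lang-specifier} is \scheme{*},
the following self-contained program defines a hypothetical toy language,
\scheme{Ltoy}, and a \scheme{count-nodes} pass that returns a node count rather
than a language term.
It assumes that the framework is available as the \scheme{(nanopass)} library;
\scheme{Ltoy}, \scheme{parse-Ltoy}, and \scheme{count-nodes} are illustrative
names only.
Because there is no output language, the transformer recurs by calling itself
explicitly, just as the \scheme{simple?} pass shown earlier does.
{\small
\schemedisplay
(import (rnrs) (nanopass))

;; Hypothetical toy language: variables, fixnums, and binary addition.
(define-language Ltoy
  (terminals
    (symbol (x))
    (fixnum (n)))
  (Expr (e)
    x
    n
    (add e0 e1)))

(define-parser parse-Ltoy Ltoy)

;; The output lang-specifier is *, so the pass returns an ordinary
;; Scheme value (a count of nodes) instead of an Ltoy term.
(define-pass count-nodes : Ltoy (e) -> * (count)
  (Expr : Expr (e) -> * (count)
    [,x 1]
    [,n 1]
    [(add ,e0 ,e1) (+ 1 (Expr e0) (Expr e1))])
  (Expr e))
\endschemedisplay
}
\noindent
For example, \scheme{(count-nodes (parse-Ltoy '(add x (add 1 2))))} returns
\scheme{5}.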
When the \var{lang-name} form is used as the input \var{lang-specifier}, it
indicates that the pass expects an input-language term that is one of the
productions from the entry nonterminal.
When the \var{lang-name} form is used as the output \var{lang-specifier}, it
indicates that the pass produces an output-language term, which is checked to
be a production of the entry nonterminal.
When the (\var{lang-name} \var{nonterminal-name}) form is used as the
input-language specifier, it indicates that the input-language term will be a
production from the specified nonterminal in the specified input language.
When the (\var{lang-name} \var{nonterminal-name}) form is used as the
output-language specifier, it indicates that the pass will produce an output
production from the specified nonterminal of the specified output language.
The \var{fml} is a Scheme identifier, and if the input \var{lang-specifier} is
not \scheme{*}, the first \var{fml} refers to the input-language term.
The \var{extra-return-val-expr} is any Scheme expression that is valid in value context.
These expressions are scoped within the binding of the identifiers named as
\var{fml}s.
The optional \var{definitions-clause} has the following form:
{\small
\schemedisplay
(definitions \var{scheme-definition} ...)
\endschemedisplay
}
\noindent
where \var{scheme-definition} is any Scheme form that can be used in a
definition context.
Definitions in the \var{definitions-clause} are in the same lexical scope as
the transformers, which means that procedures and macros defined in the
\var{definitions-clause} can refer to any transformer named in a
\var{transformer-clause}.
The \var{definitions-clause} is followed by zero or more
\var{transformer-clause}s of the following form:
{\small
\schemedisplay
(\var{name} : \var{nt-specifier} (\var{fml-expr} ...) -> \var{nt-specifier} (\var{extra-return-val-expr} ...)
\var{definitions-clause}?
\var{transformer-body})
\endschemedisplay
}
\noindent
where \var{name} is a Scheme identifier that can be used to refer to the transformer within the pass.
The input \var{nt-specifier} is one of the following two forms:
{\small
\schemedisplay
*
\var{nonterminal-name}
\endschemedisplay
}
\noindent
When the \scheme{*} form is used as the input nonterminal, it indicates that no
input nonterminal form is expected and that the \var{transformer-body} does not
contain pattern-matching clauses.
When the \scheme{*} form is used as the output nonterminal, \scheme{quasiquote}
will not be rebound, and no output-language templates are available.
When both the input and output \var{nt-specifier} are \scheme{*}, the
transformer is effectively an ordinary Scheme procedure.
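The following sketch assumes the \scheme{(nanopass)} library and a hypothetical
toy language, \scheme{Ltoy}.
It uses a helper transformer, \scheme{Double} (an illustrative name), whose
input \var{nt-specifier} is \scheme{*}.
Its body contains no pattern-matching clauses, but because its output
nonterminal is \scheme{Expr}, \scheme{quasiquote} still builds \scheme{Ltoy}
terms.
{\small
\schemedisplay
(import (rnrs) (nanopass))

;; Hypothetical toy language: variables, fixnums, and binary addition.
(define-language Ltoy
  (terminals
    (symbol (x))
    (fixnum (n)))
  (Expr (e)
    x
    n
    (add e0 e1)))

(define-parser parse-Ltoy Ltoy)

;; Rewrites every variable x into (add x x).
(define-pass double-vars : Ltoy (e) -> Ltoy ()
  ;; Input nt-specifier is *: the body is plain Scheme code, not clauses,
  ;; and quasiquote builds Expr terms because the output is Expr.
  (Double : * (x) -> Expr ()
    `(add ,x ,x))
  ;; Only the variable case is written; the add clause is autogenerated.
  (Expr : Expr (e) -> Expr ()
    [,x (Double x)])
  (Expr e))
\endschemedisplay
}
\noindent
Applied to \scheme{(add x 1)}, this pass produces \scheme{(add (add x x) 1)}.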
The \var{fml-expr} has one of the following two forms:
{\small
\schemedisplay
\var{fml}
[\var{fml} \var{default-val-expr}]
\endschemedisplay
}
\noindent
where \var{fml} is a Scheme identifier and \var{default-val-expr} is a Scheme
expression.
The \var{default-val-expr} is used when an argument is not specified in a
catamorphism or when a matching \scheme{fml} is not available in the calling
transformer.
All arguments must be explicitly provided when the transformer is called as an
ordinary Scheme procedure.
Using the catamorphism syntax, the arguments can be supplied explicitly, as
discussed on page~\pageref{cata:syntax}.
They can also be supplied implicitly.
Arguments are filled in implicitly in catamorphisms that do not explicitly
provide the arguments and in autogenerated clauses when the nonterminal
elements of a production are processed.
These implicitly supplied formals are handled by looking for a formal in the
calling transformer that has the same name as the formal expected by the target
transformer.
If no matching formal is found, and the target transformer specifies a default
value, the default value will be used in the call; otherwise, another target
transformer must be found, a new transformer must be autogenerated, or an
exception must be raised to indicate that no transformer was found and none can
be autogenerated.
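The following sketch, which assumes the \scheme{(nanopass)} library and a
hypothetical toy language, \scheme{Ltoy}, shows a formal being supplied
implicitly.
The illustrative \scheme{subst} pass writes a clause only for variables; the
clause for \scheme{add} is autogenerated, and its recursive calls to
\scheme{Expr} receive \scheme{env} because the calling transformer has a formal
with the same name.
{\small
\schemedisplay
(import (rnrs) (nanopass))

;; Hypothetical toy language: variables, fixnums, and binary addition.
(define-language Ltoy
  (terminals
    (symbol (x))
    (fixnum (n)))
  (Expr (e)
    x
    n
    (add e0 e1)))

(define-parser parse-Ltoy Ltoy)

;; Replaces each variable bound in env (an association list from
;; symbols to Ltoy Exprs) with its value.
(define-pass subst : Ltoy (e env) -> Ltoy ()
  (Expr : Expr (e env) -> Expr ()
    [,x (let ([p (assq x env)]) (if p (cdr p) x))])
  ;; The autogenerated add clause passes env along implicitly.
  (Expr e env))
\endschemedisplay
}
\noindent
For example, substituting \scheme{((x . 1))} into \scheme{(add x (add y 2))}
produces \scheme{(add 1 (add y 2))}.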
The \var{extra-return-val-expr} can be any Scheme expression.
These expressions are scoped within the \var{fml}s bound by the transformer.
This allows the autogenerated clauses to return an input formal implicitly as
an extra return value, which can be useful for threading values through a
transformer.
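As a sketch of an extra return value, assuming the \scheme{(nanopass)} library
and a hypothetical toy language, \scheme{Ltoy}, the illustrative
\scheme{check-closed} pass below returns its input unchanged, while its
\scheme{Expr} transformer also returns the list of variables in the term (all
of which are free, since \scheme{Ltoy} has no binding forms).
The catamorphisms in the \scheme{add} clause bind both results, and the pass
body receives them with \scheme{let-values}.
The extra-return-value expression \scheme{fv} is never evaluated, because every
clause is written explicitly.
{\small
\schemedisplay
(import (rnrs) (nanopass))

;; Hypothetical toy language: variables, fixnums, and binary addition.
(define-language Ltoy
  (terminals
    (symbol (x))
    (fixnum (n)))
  (Expr (e)
    x
    n
    (add e0 e1)))

(define-parser parse-Ltoy Ltoy)

(define-pass check-closed : Ltoy (e) -> Ltoy ()
  ;; Returns two values: the rebuilt term and its list of variables.
  (Expr : Expr (e) -> Expr (fv)
    [,x (values x (list x))]
    [,n (values n '())]
    [(add ,[e0 fv0] ,[e1 fv1])
     (values `(add ,e0 ,e1) (append fv0 fv1))])
  (let-values ([(e fv) (Expr e)])
    (unless (null? fv)
      (error who "free variables found" fv))
    e))
\endschemedisplay
}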
The optional \var{definitions-clause} can include any Scheme form that
can be placed in a definition context.
These definitions are scoped within the transformer.
When an output nonterminal is specified, \scheme{quasiquote} is also rebound
within the body of the \scheme{definitions} clause to allow language term
templates to be included in the body of the definitions.
When the input \var{nt-specifier} is not \scheme{*}, the
\var{transformer-body} has one of the following forms:
{\small
\schemedisplay
[\var{pattern} \var{guard-clause} \var{body*} ... \var{body}]
[\var{pattern} \var{body*} ... \var{body}]
[else \var{body*} ... \var{body}]
\endschemedisplay
}
\noindent
where the \scheme{else} clause must be the last one listed in a transformer and
prevents autogeneration of missing clauses (because the \scheme{else} clause is
used in place of the autogenerated clauses).
The \var{pattern} is an S-expression pattern, based on the S-expression
productions used in the language definition.
Patterns can be arbitrarily nested.
Variables bound by the pattern are preceded by an \scheme{unquote} and are
named based on the meta-variables named in the language definition.
The variable name can be used to restrict the pattern by using a meta-variable
that is more specific than the one specified in the language definition.
The \var{pattern} can also contain catamorphisms that have one of the
following forms:
{\small
\label{cata:syntax}
\schemedisplay
[\var{Proc-expr} : \var{input-fml} \var{arg} ... -> \var{output-fml} \var{extra-rv-fml} ...]
[\var{Transformer-name} : \var{output-fml} \var{extra-rv-fml} ...]
[\var{input-fml} \var{arg} ... -> \var{output-fml} \var{extra-rv-fml} ...]
[\var{output-fml} \var{extra-rv-fml} ...]
\endschemedisplay
}
\noindent
In the first form, the \var{Proc-expr} is an explicitly specified procedure
expression, the \var{input-fml} and all arguments to the procedure are explicitly specified, and the results of calling the \var{Proc-expr} are bound by the \var{output-fml} and \var{extra-rv-fml}s.
Note that the \var{Proc-expr} may be a \var{Transformer-name}.
In the second form, the \var{Transformer-name} is an identifier that refers to
a transformer named in this pass.
The \scheme{define-pass} macro determines, based on the signature of the
transformer referred to by the \var{Transformer-name}, what arguments should be
supplied to the transformer.
In the last two forms, the transformer is determined automatically.
In the third form, the nonterminal type associated with the \var{input-fml},
the \var{arg}s, the output nonterminal type based on the \var{output-fml}, and
the \var{extra-rv-fml}s are used to determine the transformer to call.
In the final form, the nonterminal type for the field within the production,
along with the formals to the calling transformer, the output nonterminal type
based on the \var{output-fml}, and the \var{extra-rv-fml}s are used to
determine the transformer to call.
In the two forms where the transformer is not explicitly named, a new
transformer can be autogenerated when no \var{arg}s and no \var{extra-rv-fml}s
are specified.
This limitation is in place to avoid creating a transformer with extra formals
whose use is unspecified and extra return values with potentially dubious
return-value expressions.
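The following sketch, assuming the \scheme{(nanopass)} library and a
hypothetical toy language, \scheme{Ltoy}, shows two of the catamorphism forms
in one clause of an illustrative constant-folding pass.
In \scheme{,[e0]} the transformer is determined automatically, while
\scheme{,[Expr : e1]} names it explicitly; both call \scheme{Expr} here.
Because the catamorphisms run before the clause body, the body sees
already-folded subterms.
{\small
\schemedisplay
(import (rnrs) (nanopass))

;; Hypothetical toy language: variables, fixnums, and binary addition.
(define-language Ltoy
  (terminals
    (symbol (x))
    (fixnum (n)))
  (Expr (e)
    x
    n
    (add e0 e1)))

(define-parser parse-Ltoy Ltoy)

;; Folds additions of constants bottom-up.
(define-pass fold-add : Ltoy (e) -> Ltoy ()
  (Expr : Expr (e) -> Expr ()
    [(add ,[e0] ,[Expr : e1])
     (if (and (fixnum? e0) (fixnum? e1))
         (fx+ e0 e1)
         `(add ,e0 ,e1))]))
\endschemedisplay
}
\noindent
With this pass, \scheme{(add (add 1 2) 3)} folds to \scheme{6}, and
\scheme{(add x (add 1 2))} becomes \scheme{(add x 3)}.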
The \var{input-fml} is a Scheme identifier with a name based on the
meta-variables named in the input-language definition.
The specification of a more restrictive meta-variable name can be used to further
restrict the pattern.
The \var{output-fml} is a Scheme identifier with a name based on the
meta-variables named in the output-language definition.
The \var{extra-rv-fml} is a Scheme identifier.
The \var{input-fml}s named in the fields of a pattern must be unique.
The \var{output-fml}s and \var{extra-rv-fml}s must also be unique, although they
can reuse the names of \var{input-fml}s; in that case, the \var{output-fml} or
\var{extra-rv-fml} shadows the \var{input-fml} of the same name in the body.
Only the \var{input-fml}s are visible within the optional \var{guard-clause}.
This is because the \var{guard-clause} is evaluated before the catamorphisms
recur on the fields of a production.
The \var{guard-clause} has the following form:
{\small
\schemedisplay
(guard \var{guard-expr} ...)
\endschemedisplay
}
\noindent
where \var{guard-expr} is a Scheme expression.
The \var{guard-clause} has the same semantics as \scheme{and}: the clause
applies only if every \var{guard-expr} evaluates to a true value.
The \var{body*} and \var{body} are arbitrary Scheme expressions.
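The following sketch assumes the \scheme{(nanopass)} library and a
hypothetical toy language, \scheme{Ltoy}.
It uses a \var{guard-clause} in an illustrative pass that folds only additions
of two literal fixnums.
Since the guard sees only the input fields, before any catamorphism has run,
nested additions are folded one level at a time.
{\small
\schemedisplay
(import (rnrs) (nanopass))

;; Hypothetical toy language: variables, fixnums, and binary addition.
(define-language Ltoy
  (terminals
    (symbol (x))
    (fixnum (n)))
  (Expr (e)
    x
    n
    (add e0 e1)))

(define-parser parse-Ltoy Ltoy)

(define-pass fold-literals : Ltoy (e) -> Ltoy ()
  (Expr : Expr (e) -> Expr ()
    ;; Applies only when both input fields are already fixnums.
    [(add ,e0 ,e1) (guard (fixnum? e0) (fixnum? e1)) (fx+ e0 e1)]
    ;; Otherwise, recur on both fields and rebuild the addition.
    [(add ,[e0] ,[e1]) `(add ,e0 ,e1)]))
\endschemedisplay
}
\noindent
Applied to \scheme{(add (add 1 2) 3)}, the guard fails for the outer addition
because its first field is not a fixnum, so the result is \scheme{(add 3 3)}
rather than \scheme{6}.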
When the output \var{nt-specifier} is not \scheme{*},
\scheme{quasiquote} is rebound to a macro that interprets \scheme{quasiquote}
expressions as templates for productions in the output nonterminal.
Additionally, \scheme{in-context} is a macro that can be used to rebind
\scheme{quasiquote} to a different nonterminal.
Templates are specified as S-expressions based on the productions specified by
the output language.
In templates, \scheme{unquote} is used to indicate that the expression in the
\scheme{unquote} should be used to fill in the given field of the production.
Within an \scheme{unquote} expression, \scheme{quasiquote} is rebound to the
appropriate nonterminal based on the expected type of the field in the
production.
If the template includes items that are not \scheme{unquote}d where a field
value is expected, the item found there is automatically quoted.
This makes it easy to include constants such as symbols, booleans, and numbers
in templates.
A list of items can be specified in a field that expects a list, using an
ellipsis.
%More than one ellipsis can be specified to flatten out a list of lists.
Although the syntax of a language production is specified as an S-expression,
the record representation used for the language term separates each variable
specified into a separate field.
This means that the template syntax expects a separate value or list of values for
each field in the record.
For instance, in the \scheme{(letrec ([x* e*] ...) body)} production,
a template of the form
\scheme{(letrec (,bindings ...) ,body)} cannot be used
because the nanopass framework will not attempt to break up the
\scheme{bindings} list into its \scheme{x*} and \scheme{e*} component parts.
The template
\scheme{(letrec ([,(map car bindings) ,(map cadr bindings)] ...) ,body)}
accomplishes the same goal, explicitly separating the variables from the expressions.
The nanopass framework could be extended to split up the \scheme{bindings}
list automatically, but it currently does not, partly to avoid hiding the cost
of deconstructing the \scheme{bindings} list and constructing the \scheme{x*}
and \scheme{e*} lists.
The \scheme{in-context} expression within the body has the following form:
{\small
\schemedisplay
(in-context \var{nonterminal-name} \var{body*} ... \var{body})
\endschemedisplay
}
The \scheme{in-context} form rebinds \scheme{quasiquote} to allow
productions from the named nonterminal to be constructed in a context where
they are not otherwise expected.
\chapter{Working with language forms}
\section{Constructing language forms outside of a pass}
In addition to creating language forms using a parser defined with
\scheme{define-parser} or through a pass defined with \scheme{define-pass},
language forms can also be created using the
\scheme{with-output-language} form.
The \scheme{with-output-language} form binds the \scheme{in-context}
form for the language specified and, if a nonterminal is also specified,
binds the \scheme{quasiquote} form.
This allows the same template syntax used in the body of a transformer to be
used outside of the context of a pass.
In a commercial compiler, such as Chez Scheme, it is often convenient to use
functional abstraction to centralize the creation of a language term.
For instance, in the \scheme{convert-assignments} pass, the
\scheme{with-output-language} form is wrapped around the \scheme{make-boxes} and
\scheme{build-let} procedures.
This is done so that primitive calls to \scheme{box} along with the \scheme{let} form of the \scheme{L10} language can be constructed with quasiquoted expressions.
{\small
\schemedisplay
(with-output-language (L10 Expr)
(define make-boxes
(lambda (t*)
(map (lambda (t) `(primcall box ,t)) t*)))
(define build-let
(lambda (x* e* body)
(if (null? x*)
body
`(let ([,x* ,e*] ...) ,body)))))
\endschemedisplay
}
\noindent
This rebinds both the \scheme{quasiquote} keyword and the \scheme{in-context} keyword.
The \scheme{with-output-language} form has one of the following forms:
{\small
\schemedisplay
(with-output-language \var{lang-name} \var{expr*} ... \var{expr})
(with-output-language (\var{lang-name} \var{nonterminal-name}) \var{expr*} ... \var{expr})
\endschemedisplay
}
\noindent
In the first form, the \scheme{in-context} form is bound and can be used to
specify a \var{nonterminal-name}, as described at the end of
Section~\ref{sec:define-pass}.
In the second form, both \scheme{in-context} and \scheme{quasiquote} are bound.
The \scheme{quasiquote} form is bound in the context of the specified
\var{nonterminal-name}, and templates can be defined just as they are on the
right-hand side of a transformer clause.
The \scheme{with-output-language} form is a splicing form, similar to \scheme{begin}
or \scheme{let-syntax}, allowing multiple definitions or expressions
that are all at the same scoping level as the
\scheme{with-output-language} form to be contained within the form.
This is convenient when writing a set of definitions that all construct some
piece of a language term from the same nonterminal.
This flexibility means that the \scheme{with-output-language} form cannot be
defined as syntactic sugar for the \scheme{define-pass} form.
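As a minimal sketch, assuming the \scheme{(nanopass)} library and a
hypothetical toy language, \scheme{Ltoy}, the following program uses
\scheme{with-output-language} to define an ordinary helper procedure,
\scheme{make-sum} (an illustrative name), that builds \scheme{Ltoy} terms with
templates outside of any pass.
Because the form is splicing, \scheme{make-sum} is defined at the same level as
the \scheme{with-output-language} form itself.
{\small
\schemedisplay
(import (rnrs) (nanopass))

;; Hypothetical toy language: variables, fixnums, and binary addition.
(define-language Ltoy
  (terminals
    (symbol (x))
    (fixnum (n)))
  (Expr (e)
    x
    n
    (add e0 e1)))

(with-output-language (Ltoy Expr)
  ;; Builds a left-nested sum from a nonempty list of Ltoy Exprs;
  ;; quasiquote here constructs Ltoy Expr records.
  (define make-sum
    (lambda (e*)
      (fold-left (lambda (acc e) `(add ,acc ,e)) (car e*) (cdr e*)))))
\endschemedisplay
}
\noindent
For example, \scheme{(make-sum '(x 1 y))} builds the term
\scheme{(add (add x 1) y)}.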
\section{Matching language forms outside of a pass}
In addition to the \scheme{define-pass} form, it is possible to match a
language term using the \scheme{nanopass-case} form.
This can be useful when creating functional abstractions, such as predicates that
ask a question based on matching a language form.
For instance, suppose we write a \scheme{lambda?} predicate for the
\scheme{L8} language as follows:
{\small
\schemedisplay
(define lambda?
(lambda (e)
(nanopass-case (L8 Expr) e
[(lambda (,x* ...) ,abody) #t]
[else #f])))
\endschemedisplay
}
\noindent
The \scheme{nanopass-case} form has the following syntax:
{\small
\schemedisplay
(nanopass-case (\var{lang-name} \var{nonterminal-name}) \var{expr}
\var{matching-clause} ...)
\endschemedisplay
}
\noindent
where \var{matching-clause} has one of the following forms:
{\small
\schemedisplay
[\var{pattern} \var{guard-clause} \var{expr*} ... \var{expr}]
[\var{pattern} \var{expr*} ... \var{expr}]
[else \var{expr*} ... \var{expr}]
\endschemedisplay
}
\noindent
where the \var{pattern} and \var{guard-clause} forms have the same syntax as in
the \var{transformer-body} of a pass.
Similar to \scheme{with-output-language}, \scheme{nanopass-case} provides a
more succinct syntax for matching a language form than does the general
\scheme{define-pass} form.
Unlike the \scheme{with-output-language} form, however, the
\scheme{nanopass-case} form can be implemented in terms of the
\scheme{define-pass} form.
For example, the \scheme{lambda?} predicate also could have been written as:
{\small
\schemedisplay
(define-pass lambda? : (L8 Expr) (e) -> * (bool)
(Expr : Expr (e) -> * (bool)
[(lambda (,x* ...) ,abody) #t]
[else #f])
(Expr e))
\endschemedisplay
}
\noindent
This is, in fact, how the \scheme{nanopass-case} macro is implemented.
\chapter{Working with languages}
\section{Displaying languages}
The \scheme{language->s-expression} form returns the full definition of a
language, as an S-expression, when supplied the name of the language.
This can be helpful when working with extended languages, such as in the case of
\scheme{L1}:
{\small
\schemedisplay
(language->s-expression L1)
\endschemedisplay
}
\noindent
which returns:
{\small
\schemedisplay
(define-language L1
(entry Expr)
(terminals
(void+primitive (pr))
(symbol (x))
(constant (c))
(datum (d)))
(Expr (e body)
pr
x
c
(quote d)
(if e0 e1 e2)
(or e* ...)
(and e* ...)
(not e)
(begin e* ... e)
(lambda (x* ...) body* ... body)
(let ([x* e*] ...) body* ... body)
(letrec ([x* e*] ...) body* ... body)
(set! x e)
(e e* ...)))
\endschemedisplay
}
\section{Differencing languages}
The extension that relates any two languages can also be derived by using the
\scheme{diff-languages} form.
For instance, we can get the differences between the \scheme{Lsrc} and
\scheme{L1} language (giving us back the extension) with:
{\small
\schemedisplay
(diff-languages Lsrc L1)
\endschemedisplay
}
\noindent
which returns:
{\small
\schemedisplay
(define-language L1
(extends Lsrc)
(entry Expr)
(terminals
(- (primitive (pr)))
(+ (void+primitive (pr))))
(Expr (e body)
(- (if e0 e1))))
\endschemedisplay
}
\section{Viewing the expansion of passes and transformers}
The \scheme{define-pass} form autogenerates both transformers and clauses
within transformers.
In simple passes, these are generally straightforward to reason about, but in
more complex passes, particularly those that make use of different arguments
for different transformers or include extra return values, it can become more
difficult to determine what code will be generated.
In particular, the experience of developing a full commercial compiler has
shown that the \scheme{define-pass} form can autogenerate transformers that
shadow those defined by the compiler writer.
To help the compiler writer determine what code is being generated,
there is a variation of the \scheme{define-pass} form, called
\scheme{echo-define-pass}, that will echo the expansion of \scheme{define-pass}.
For instance, we can echo the \scheme{remove-one-armed-if} pass to get the
following:
{\small
\schemedisplay
(echo-define-pass remove-one-armed-if : Lsrc (e) -> L1 ()
(Expr : Expr (e) -> Expr ()
[(if ,[e0] ,[e1]) `(if ,e0 ,e1 (void))]))
;=>
pass remove-one-armed-if expanded into:
(define remove-one-armed-if
(lambda (e)
(define who 'remove-one-armed-if)
(define-nanopass-record)
(define Expr
(lambda (e)
(let ([g221.159 e])
(let-syntax ([quasiquote '#<procedure tmp>]
[in-context '#<procedure>])
(begin
(let ([rhs.160 (lambda (e0 e1) `(if ,e0 ,e1 (void)))])
(cond
[(primitive? g221.159) g221.159]
[(symbol? g221.159) g221.159]
[(constant? g221.159) g221.159]
[else
(let ([tag (nanopass-record-tag g221.159)])
(cond
[(eqv? tag 4)
(let* ([g222.161 (Lsrc:if:Expr.387-e0 g221.159)]
[g223.162 (Lsrc:if:Expr.387-e1 g221.159)])
(let-values ([(e0) (Expr g222.161)]
[(e1) (Expr g223.162)])
(rhs.160 e0 e1)))]
[(eqv? tag 2)
(make-L1:quote:Expr.400
'remove-one-armed-if
(Lsrc:quote:Expr.386-d g221.159)
"d")]
[(eqv? tag 6)
(make-L1:if:Expr.401 'remove-one-armed-if
(Expr (Lsrc:if:Expr.388-e0 g221.159))
(Expr (Lsrc:if:Expr.388-e1 g221.159))
(Expr (Lsrc:if:Expr.388-e2 g221.159)) "e0" "e1"
"e2")]
[(eqv? tag 8)
(make-L1:or:Expr.402
'remove-one-armed-if
(map (lambda (m) (Expr m))
(Lsrc:or:Expr.389-e* g221.159))
"e*")]
[(eqv? tag 10)
(make-L1:and:Expr.403
'remove-one-armed-if
(map (lambda (m) (Expr m))
(Lsrc:and:Expr.390-e* g221.159))
"e*")]
[(eqv? tag 12)
(make-L1:not:Expr.404
'remove-one-armed-if
(Expr (Lsrc:not:Expr.391-e g221.159))
"e")]
[(eqv? tag 14)
(make-L1:begin:Expr.405 'remove-one-armed-if
(map (lambda (m) (Expr m))
(Lsrc:begin:Expr.392-e* g221.159))
(Expr (Lsrc:begin:Expr.392-e g221.159)) "e*"
"e")]
[(eqv? tag 16)
(make-L1:lambda:Expr.406 'remove-one-armed-if
(Lsrc:lambda:Expr.393-x* g221.159)
(map (lambda (m) (Expr m))
(Lsrc:lambda:Expr.393-body* g221.159))
(Expr (Lsrc:lambda:Expr.393-body g221.159)) "x*"
"body*" "body")]
[(eqv? tag 18)
(make-L1:let:Expr.407 'remove-one-armed-if
(Lsrc:let:Expr.394-x* g221.159)
(map (lambda (m) (Expr m))
(Lsrc:let:Expr.394-e* g221.159))
(map (lambda (m) (Expr m))
(Lsrc:let:Expr.394-body* g221.159))
(Expr (Lsrc:let:Expr.394-body g221.159)) "x*"
"e*" "body*" "body")]
[(eqv? tag 20)
(make-L1:letrec:Expr.408 'remove-one-armed-if
(Lsrc:letrec:Expr.395-x* g221.159)
(map (lambda (m) (Expr m))
(Lsrc:letrec:Expr.395-e* g221.159))
(map (lambda (m) (Expr m))
(Lsrc:letrec:Expr.395-body* g221.159))
(Expr (Lsrc:letrec:Expr.395-body g221.159)) "x*"
"e*" "body*" "body")]
[(eqv? tag 22)
(make-L1:set!:Expr.409 'remove-one-armed-if
(Lsrc:set!:Expr.396-x g221.159)
(Expr (Lsrc:set!:Expr.396-e g221.159)) "x" "e")]
[(eqv? tag 24)
(make-L1:e:Expr.410 'remove-one-armed-if
(Expr (Lsrc:e:Expr.397-e g221.159))
(map (lambda (m) (Expr m))
(Lsrc:e:Expr.397-e* g221.159))
"e" "e*")]
[else
(error 'remove-one-armed-if
"unexpected Expr"
g221.159)]))])))))))
(let ([x (Expr e)])
(unless ((lambda (x)
(or (L1:Expr.399? x)
(constant? x)
(symbol? x)
(void+primitive? x)))
x)
(error 'remove-one-armed-if
(format "expected ~s but got ~s" 'Expr x)))
x)))
\endschemedisplay
}
\noindent
This exposes the code generated by \scheme{define-pass} but does not expand
the language form construction templates.
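The autogenerated clauses, such as the one that handles \scheme{set!}, have
roughly the following form (simplified: the pass-name and field-name arguments
to the record constructor are omitted, and the tag and record names differ from
the expansion above):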
{\small
\schemedisplay
[(eqv? tag 7)
(make-L1:set!:Expr.18
(Lsrc:set!:Expr.8-x g0.14)
(Expr (Lsrc:set!:Expr.8-e g0.14)))]
\endschemedisplay
}
\noindent
Here, the tag of the record is checked, the \scheme{Expr} transformer is
applied to the \scheme{e} field, and a new output-language record is
constructed.
The autogenerated body of the pass also checks that the output of the pass is a
valid \scheme{L1} \scheme{Expr}.
In addition to echoing the output of the entire pass, it is also possible to
echo just the expansion of a single transformer by prefixing the transformer
with the \scheme{echo} keyword.
{\small
\schemedisplay
(define-pass remove-one-armed-if : Lsrc (e) -> L1 ()
(echo Expr : Expr (e) -> Expr ()
[(if ,[e0] ,[e1]) `(if ,e0 ,e1 (void))]))
;=>
Expr in pass remove-one-armed-if expanded into:
(define Expr
(lambda (e)
(let ([g442.303 e])
(let-syntax ([quasiquote '#<procedure tmp>]
[in-context '#<procedure>])
(begin
(let ([rhs.304 (lambda (e0 e1) `(if ,e0 ,e1 (void)))])
(cond
[(primitive? g442.303) g442.303]
[(symbol? g442.303) g442.303]
[(constant? g442.303) g442.303]
[else
(let ([tag (nanopass-record-tag g442.303)])
(cond
[(eqv? tag 4)
(let* ([g443.305 (Lsrc:if:Expr.770-e0 g442.303)]
[g444.306 (Lsrc:if:Expr.770-e1 g442.303)])
(let-values ([(e0) (Expr g443.305)]
[(e1) (Expr g444.306)])
(rhs.304 e0 e1)))]
[(eqv? tag 2)
(make-L1:quote:Expr.783
'remove-one-armed-if
(Lsrc:quote:Expr.769-d g442.303)
"d")]
[(eqv? tag 6)
(make-L1:if:Expr.784 'remove-one-armed-if
(Expr (Lsrc:if:Expr.771-e0 g442.303))
(Expr (Lsrc:if:Expr.771-e1 g442.303))
(Expr (Lsrc:if:Expr.771-e2 g442.303)) "e0" "e1"
"e2")]
[(eqv? tag 8)
(make-L1:or:Expr.785
'remove-one-armed-if
(map (lambda (m) (Expr m))
(Lsrc:or:Expr.772-e* g442.303))
"e*")]
[(eqv? tag 10)
(make-L1:and:Expr.786
'remove-one-armed-if
(map (lambda (m) (Expr m))
(Lsrc:and:Expr.773-e* g442.303))
"e*")]
[(eqv? tag 12)
(make-L1:not:Expr.787
'remove-one-armed-if
(Expr (Lsrc:not:Expr.774-e g442.303))
"e")]
[(eqv? tag 14)
(make-L1:begin:Expr.788 'remove-one-armed-if
(map (lambda (m) (Expr m))
(Lsrc:begin:Expr.775-e* g442.303))
(Expr (Lsrc:begin:Expr.775-e g442.303)) "e*" "e")]
[(eqv? tag 16)
(make-L1:lambda:Expr.789 'remove-one-armed-if
(Lsrc:lambda:Expr.776-x* g442.303)
(map (lambda (m) (Expr m))
(Lsrc:lambda:Expr.776-body* g442.303))
(Expr (Lsrc:lambda:Expr.776-body g442.303)) "x*"
"body*" "body")]
[(eqv? tag 18)
(make-L1:let:Expr.790 'remove-one-armed-if (Lsrc:let:Expr.777-x* g442.303)
(map (lambda (m) (Expr m))
(Lsrc:let:Expr.777-e* g442.303))
(map (lambda (m) (Expr m))
(Lsrc:let:Expr.777-body* g442.303))
(Expr (Lsrc:let:Expr.777-body g442.303)) "x*" "e*"
"body*" "body")]
[(eqv? tag 20)
(make-L1:letrec:Expr.791 'remove-one-armed-if
(Lsrc:letrec:Expr.778-x* g442.303)
(map (lambda (m) (Expr m))
(Lsrc:letrec:Expr.778-e* g442.303))
(map (lambda (m) (Expr m))
(Lsrc:letrec:Expr.778-body* g442.303))
(Expr (Lsrc:letrec:Expr.778-body g442.303)) "x*" "e*"
"body*" "body")]
[(eqv? tag 22)
(make-L1:set!:Expr.792 'remove-one-armed-if (Lsrc:set!:Expr.779-x g442.303)
(Expr (Lsrc:set!:Expr.779-e g442.303)) "x" "e")]
[(eqv? tag 24)
(make-L1:e:Expr.793 'remove-one-armed-if
(Expr (Lsrc:e:Expr.780-e g442.303))
(map (lambda (m) (Expr m))
(Lsrc:e:Expr.780-e* g442.303))
"e" "e*")]
[else
(error 'remove-one-armed-if
"unexpected Expr"
g442.303)]))])))))))
\endschemedisplay
}
\section{Tracing passes and transformers}
Echoing the code generated by \scheme{define-pass} can help compiler writers
to understand what is happening at expansion time, but it does not help in determining
what is happening at run time.
To facilitate this type of debugging, passes and transformers can be
traced at run time.
The tracing system, similar to Chez Scheme's \scheme{trace-define-syntax},
unparses the input-language term and output-language term of the pass using the language unparsers to
provide the S-expression representation of the language term that is being transformed.
The \scheme{trace-define-pass} form works just like the \scheme{define-pass}
form but adds tracing for the input-language term and output-language term of the pass.
For instance, if we want to trace the processing of the input:
{\small
\schemedisplay
(let ([x 10])
(if (= (* (/ x 2) 2) x) (set! x (/ x 2)))
(* x 3))
\endschemedisplay
}
\noindent
the pass can be defined as a tracing pass, as follows:
{\small
\schemedisplay
(trace-define-pass remove-one-armed-if : Lsrc (e) -> L1 ()
(Expr : Expr (e) -> Expr ()
[(if ,[e0] ,[e1]) `(if ,e0 ,e1 (void))]))
\endschemedisplay
}
\noindent
Running the class compiler with \scheme{remove-one-armed-if} traced produces the following:
{\small
\schemedisplay
> (my-tiny-compile
'(let ([x 10])
(if (= (* (/ x 2) 2) x) (set! x (/ x 2)))
(* x 3)))
|(remove-one-armed-if
(let ([x.12 10])
(if (= (* (/ x.12 2) 2) x.12) (set! x.12 (/ x.12 2)))
(* x.12 3)))
|(let ([x.12 10])
(if (= (* (/ x.12 2) 2) x.12) (set! x.12 (/ x.12 2)) (void))
(* x.12 3))
15
\endschemedisplay
}
\noindent
The tracer prints the \emph{pretty} (i.e., S-expression) form of the language,
rather than the record representation, to allow the compiler writer to read the
terms more easily.
Tracing a pass does not trace the internal transformations that happen within
the transformers of the pass.
Transformers can be traced by adding the \scheme{trace} keyword in front of the
transformer definition.
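As a minimal sketch of the \scheme{trace} keyword, assuming the
\scheme{(nanopass)} library and a hypothetical toy language, \scheme{Ltoy},
the following program traces the \scheme{Expr} transformer of an illustrative
constant-folding pass.
Each call to \scheme{Expr} and its result are printed in their unparsed
S-expression form; the exact layout of the trace output may vary.
{\small
\schemedisplay
(import (rnrs) (nanopass))

;; Hypothetical toy language: variables, fixnums, and binary addition.
(define-language Ltoy
  (terminals
    (symbol (x))
    (fixnum (n)))
  (Expr (e)
    x
    n
    (add e0 e1)))

(define-parser parse-Ltoy Ltoy)

(define-pass traced-fold : Ltoy (e) -> Ltoy ()
  ;; The trace keyword traces every call to this transformer.
  (trace Expr : Expr (e) -> Expr ()
    [(add ,[e0] ,[e1])
     (if (and (fixnum? e0) (fixnum? e1))
         (fx+ e0 e1)
         `(add ,e0 ,e1))]))
\endschemedisplay
}
\noindent
Calling \scheme{(traced-fold (parse-Ltoy '(add 1 2)))} prints a trace of the
three calls to \scheme{Expr} and returns \scheme{3}.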
We can run the same test with a \scheme{remove-one-armed-if} that traces the
\scheme{Expr} transformer, as follows:
{\small
\schemedisplay
> (my-tiny-compile
'(let ([x 10])
(if (= (* (/ x 2) 2) x) (set! x (/ x 2)))
(* x 3)))
|(Expr
(let ([x.0 10]) (if (= (* (/ x.0 2) 2) x.0) (set! x.0 (/ x.0 2))) (* x.0 3)))
| (Expr (* x.0 3))
| |(Expr x.0)
| |x.0
| |(Expr 3)
| |3
| |(Expr *)
| |*
| (* x.0 3)
| (Expr (if (= (* (/ x.0 2) 2) x.0) (set! x.0 (/ x.0 2))))
| |(Expr (= (* (/ x.0 2) 2) x.0))
| | (Expr (* (/ x.0 2) 2))
| | |(Expr (/ x.0 2))
| | | (Expr x.0)
| | | x.0
| | | (Expr 2)
| | | 2
| | | (Expr /)
| | | /
| | |(/ x.0 2)
| | |(Expr 2)
| | |2
| | |(Expr *)
| | |*
| | (* (/ x.0 2) 2)
| | (Expr x.0)
| | x.0
| | (Expr =)
| | =
| |(= (* (/ x.0 2) 2) x.0)
| |(Expr (set! x.0 (/ x.0 2)))
| | (Expr (/ x.0 2))
| | |(Expr x.0)
| | |x.0
| | |(Expr 2)
| | |2
| | |(Expr /)
| | |/
| | (/ x.0 2)
| |(set! x.0 (/ x.0 2))
| (if (= (* (/ x.0 2) 2) x.0) (set! x.0 (/ x.0 2)) (void))
| (Expr 10)
| 10
|(let ([x.0 10])
(if (= (* (/ x.0 2) 2) x.0) (set! x.0 (/ x.0 2)) (void))
(* x.0 3))
15
\endschemedisplay
}
\noindent
Here, too, the traced forms are the pretty representation and not
the record representation.
\bibliographystyle{abbrv}
\bibliography{user-guide}
\end{document}