<?xml version="1.0"?>
<article class="llncs">

<title>A Pattern Language for Language Implementation</title>

<date>
	<month>Jan</month>
	<year>2004</year>
	</date>

<institution>
	<addr>University of Alabama, Computer Science</addr>
	<addr>Tuscaloosa, AL USA 35487-0290</addr>
	</institution>

<author>
	<name>Joel Jones</name>
	<email>jones@cs.ua.edu</email>
	</author>

<author>
	<name>Crutcher Dunnavant</name>
	<email>crutcher@samedi-studios.com</email>
	</author>

<author>
	<name>Trevor Jay</name>
	<email>aka1@samedi-studios.com</email>
	</author>


<abstract>
<para>
Programming languages are typically considered to be difficult to implement.
However, a programming language tailored to an application domain can be an
extremely powerful productivity enhancer.  We present a pattern language for
implementing languages that gathers together many ideas that are known in the
language implementation community that should be more widely known.  By applying
this pattern language, productivity can be increased by the blossoming of more
programming languages tailored to a specific purpose.
	</para>
<para>
We begin by discussing some of the dichotomies that shape the language design
process.  The pattern language itself consists of patterns describing various
language flavors, and patterns for doing syntactic recognition, evaluation, and
source production.
	</para>
	</abstract>

<section numbered='false'><title>Note to Shepherd</title>
<para>
During the first round of comments we are more interested in comments on whether
or not this is the right set of patterns, rather than comments on the individual
patterns. Also, we would like some feedback as to whether or not an Alexandrian
style would be more appropriate.
	</para>
	</section>

<section><title>Introduction</title>
<para>
A specification or programming language can be a very powerful tool for
expression.  However, as with any powerful tool, it is not uniformly applicable.
While there have been many discussions of when introducing a new language is
appropriate, most of these discussions have been cursory and merely served as a
brief introduction to an explication of the language that the authors have
designed and its benefits.
	</para>
<para>
To decide whether or not to apply this pattern language, it helps to understand
the reasons one might have for implementing a language at all.  While this
is not a complete enumeration, it should serve to bring together in one place
some of the better known reasons for implementing a language as a means of
solving problems in software systems.
	</para>
<ul>
	<li>SPOT: single point of truth (Kernighan: Practice) (XP: Once &amp; only
	once) (Jones: tabular code) (ESR: Art of Unix Programming)</li>
	<li>Firm Theoretical/Conventions: RegEx, Drawing, SQL, tables, chem,
	grap, Parsers, equations, logic, document formatting</li>
	<li>Configuration: change behavior w/o GUI -> automation -> text editor
	simpler than anything else</li>
	<li>Glue: Perl, AWK, JSP, ASP, (embedded: JavaScript, AppleScript, TCL,
	elisp, lua, gdb)</li>
</ul>
	</section><!-- very rough introduction  -->


<section>
<title>Design Dichotomies</title>

<para>
Psychology, psychiatry, and sociology deal frequently with long standing
dichotomies of thought. Most famously discussed is the "Madonna-Whore
Complex" <b>[XXX find a citation]</b>, a dichotomy in western
culture's view of women. The human mind appears readily capable of handling
complex domains by splitting them into a pair of opposing, but
inter-related, paradigms.  At any time, its opinion will be dominated by
one side of a dichotomy, but will be informed by the other. The eastern
concepts of Yin and Yang are idealized expressions of this tendency.
	</para>

<section><title>Flavor</title>
<para>
The oral, newsgroup, and email tradition of programming talks about languages
in terms of their 'flavor' or their 'feel'. It is full of discussions of what
a given language 'wants to be', or of what a given language 'really is'.
Computer languages are processed by computers, but they are written, read, and
debugged by people. They are vehicles for expression, tools for understanding,
and shackles that must be worn for nearly 50 thousand hours over the life of a
programmer. Perhaps they should be comfortable? But what patterns need we use
to achieve a given flavor?
	</para>

<para>
A good pattern should always be a name for a means of resolving design
tensions which the community already knew, but didn't have a name for. For
some patterns, the community will need to identify previously unnamed
tensions, so that the pattern may be properly captured. We believe that we
have identified a pair of design dichotomies which exert tension on the flavor
of computer languages.
	</para>
	</section>

<section>
<title>The Definitional Dichotomy</title>
<para>
We have identified the Definitional dichotomy, a dichotomy which covers a
continuum of language semantics, holding at one extreme languages which
speak in terms of actions, and at the other languages which speak in terms
of truths. The primary semantic content of 'Algorithmic' languages is
instructional, specifying which actions to take, and how to take them. C,
Java, Scheme, Smalltalk, PostScript, and most other 'programming' languages
are dominated by the algorithmic side of the definitional dichotomy.  The
primary semantic content of 'Constraint' languages is descriptive,
specifying truths, constraints, and the nature of acceptable processing
results. VHDL, HTML, CSS, Prolog, SVG, and most 'configuration' or 'data'
languages are dominated by the constraint side of the definitional
dichotomy. Thus we have the definitional dichotomy of Algorithmic versus
Constraint languages.
	</para>
	</section>

<section>
<title>The Structural Dichotomy</title>
<para>
We have also identified the Structural dichotomy, a dichotomy which covers
another continuum of language semantics, holding at one extreme languages
which speak in terms of hierarchies, and at the other languages which speak
in terms of expressions. The primary semantic content of 'Hierarchy'
languages is expressed by their containment relationships, wherein elements
derive their meaning relative to their containing parents and their
contained children. XML and Scheme are dominated by the hierarchy side of
the structural dichotomy. The primary semantic content of 'Expression'
languages is expressed through the interaction of operators and peer
sequence, often evolving into very complex sequential expressions. C,
Java, SQL, and most 'imperative' languages are dominated by the expression
side of the structural dichotomy.
	</para>
	</section>

	</section><!-- end "Design Dichotomies" -->

<section><title>The Pattern Language</title>

<section><title>Strictly Contained Language</title>
<!-- Context, <b>Problem</b>, Discussion, <b>Prescription</b>, related -->
<!-- Trevor rewrites -->
<para>
One valuable thing to recognize regarding the dichotomies is the fact that
languages that fall to an extreme are often very easy to process. One can
consciously take advantage of this fact when designing a language. A good example
is a domain that lends itself to being described in a very hierarchical
manner. 
	</para>
<para>
<b>For languages or domains that are essentially hierarchical, processing models
centered around this fact would be ideal. What sort of models would be useful?</b>
	</para>
<para>
Contained languages are, obviously enough, languages that feature one or more
container syntaxes. There is a good, or even direct, mapping between containment
and most hierarchies, which makes this type of language extremely useful in those
situations. If we restrict things even further, calling for <i>all</i>
expressions to be contained and all containers to be contained, with the
exception of the root, then we have a language that is very easy to parse
because it can only be a tree, not a complex graph.
	</para>
<para>
<b>Structure your language as a Strictly Contained Language. A Strictly
Contained Language consists of one or more container types. With the exception
of the "root" container, all content and containers are themselves contained.</b>
	</para>
<para>
Strict containment allows for some parsing models that are very easy to
implement. For example, the tree might be walked recursively, with every
container being processed by a function that handles its content and calls
other functions to handle its children.
	</para>
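<para>
The recursive walk described above might be sketched as follows, in Python.
The tuple representation of containers, the container kinds (doc, section,
para), and the rendering each handler produces are all assumptions made for
illustration; they are not part of the pattern itself.
	</para>

```python
# Minimal sketch: a container is modelled as (kind, children) and plain
# strings are leaf content. The kinds and their output are hypothetical.

def handle_doc(children):
    return "".join(process(child) for child in children)

def handle_section(children):
    return "== " + " ".join(process(child) for child in children) + " ==\n"

def handle_para(children):
    return " ".join(process(child) for child in children) + "\n"

# One handler function per container type.
HANDLERS = {"doc": handle_doc, "section": handle_section, "para": handle_para}

def process(node):
    """Return leaf content as is; hand a container to its handler."""
    if isinstance(node, str):
        return node
    kind, children = node
    return HANDLERS[kind](children)

tree = ("doc", [("section", ["Intro"]), ("para", ["Hello", "world"])])
rendered = process(tree)
```

<para>
Each handler deals with its own content and calls back into the walker for
its children, so adding a container type means adding one function and one
table entry.
	</para>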
<para>
Strictly contained languages can be characterized by the kinds and number of
containers they feature. At their simplest, such languages only have a single
type of container. At their most complex, as seen in languages like SGML, they
may have an arbitrary number of containers which are considered to have
different types.
	</para>
<para>
In a language like C, the container types are fixed by the language 
definition.
The top container in C is the compilation unit (file), which in turn 
contains type, data, and function declarations.
Each of these serves as a container as well.
        </para>
<para>
It is worth reiterating that the important dichotomy concerned with <b>Strictly
Contained Language</b>s is the structural one. <b>Strictly Contained
Language</b>s may be constraint or algorithm based.
	</para>
<para>
Two popular forms of <b>Strictly Contained Language</b>s are <b>XML
Language</b>s and <b>Parenthesis Language</b>s.
	</para>
	</section><!-- end 'Strictly Contained Language' -->

<section><title>Parenthesis Language</title>
<!-- Context, <b>Problem</b>, Discussion, <b>Prescription</b>, related -->
<!-- Trevor rewrites -->
<para>
Even though a <b>Strictly Contained Language</b> is a rather restricted context,
there are still a number of syntactic choices that can be made based on the
nature of the domain the language is attempting to cover. Once a containment
based language has been chosen as appropriate, the remainder of decisions tend
to concern the level of complexity of that containment.
	</para>
<para>
<b>Strictly Contained Languages intended for domains which have very simple
containment or hierarchical models need to be able to take advantage of this fact
so as to ease processing and allow the language to map easily to the domain.</b>
	</para>
<para>
For simple hierarchical structures a unilateral containment model is appropriate and
only one "type" of container is needed. There needn't be any distinction between
the root and all its children save that the root is first. Parentheses are a
logical and popular container. 
	</para>
<para>
<b>Use a Parenthesis Language. A Parenthesis Language is a Strictly Contained
Language where there is only one kind of container, the parenthesis.</b>
	</para>
<para>
While few utilities and libraries exist for explicitly handling parenthesis
languages, the process is quite straightforward. The usual approach is to
recursively "resolve" each container's content. This can be done quite easily
with handwritten or generated code. Not just constraint languages but relatively
robust <b>Parenthesis Language</b>s, such as large subsets of LISP, can be handled
this way.
	</para>
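<para>
As a rough illustration of recursive resolution, the following Python sketch
reads a parenthesized expression into nested lists and then resolves each
container from the inside out. The tiny prefix-arithmetic language, with only
the + and * operators, is an assumption chosen for brevity, and the sketch
does no error handling for unbalanced input.
	</para>

```python
import math

def tokenize(text):
    """Split the input into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def read(tokens):
    """Consume tokens from the front; return one atom or one nested list."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(read(tokens))
        tokens.pop(0)  # discard the closing parenthesis
        return items
    try:
        return int(token)
    except ValueError:
        return token  # a symbol such as "+"

# Hypothetical operator table for this sketch.
OPERATORS = {"+": sum, "*": math.prod}

def resolve(expr):
    """Resolve a container by first resolving everything it contains."""
    if isinstance(expr, list):
        operator, *operands = expr
        return OPERATORS[operator]([resolve(item) for item in operands])
    return expr

result = resolve(read(tokenize("(+ 1 (* 2 3) 4)")))
```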
<para>
When designing a <b>Parenthesis Language</b> it is worth looking into
<b>Embedding a Language</b> within some of the containers for an increased
amount of flexibility. Because of the recursive nature of processing this does
not add exorbitantly to the complexity.
	</para>
	</section><!-- end 'Parenthesis Language' -->

<section><title>XML Language</title>
<!-- Context, <b>Problem</b>, Discussion, <b>Prescription</b>, related -->
<!-- Trevor rewrites -->
<para>
Sometimes, instead of a straightforward containment model, a very complex hierarchy is
appropriate. Even seemingly simple domains can quickly grow in ways that make
representing them a complex series of containments, best served by multiple
container types.
	</para>
<para>
In cases where the structure is likely to be extremely complex it
may be best not to "reinvent the wheel" but instead use a preexisting
meta-language. 
	</para>
<para>
<b>Use an XML Language, that is a markup language that is valid XML.</b>
	</para>
<para>
There are several advantages to using an <b>XML Language</b>. Perhaps foremost
among these is the fact that XML libraries already exist, such as Xerces[cite] for
Java or libxml[cite] for C. These libraries mean that even lower level implementation
languages, such as C, won't require programmers to write their own parsers, or
in some cases even data structure access code.
	</para>
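<para>
To give a feel for how little code a library leaves to the implementer, here
is a small sketch using Python's standard xml.etree.ElementTree module, which
stands in for the Java and C libraries named above. The element and attribute
names (config, server, name, port) are hypothetical. The document is built
and serialized by the library and then parsed back, so no hand-written
parsing code appears anywhere.
	</para>

```python
import xml.etree.ElementTree as ET

# Build a small document with the library rather than by hand.
root = ET.Element("config")
server = ET.SubElement(root, "server", name="alpha")
ET.SubElement(server, "port").text = "8080"
document = ET.tostring(root, encoding="unicode")

# Parsing and data access both come from the library as well.
parsed = ET.fromstring(document)
ports = {}
for element in parsed.iter("server"):
    ports[element.get("name")] = int(element.findtext("port"))
```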
<para>
XML is by far the most popular data interchange language and as a result is
likely to be familiar to various possible users.  Tools for its use are
available on almost any system environment. XML is particularly well suited for
constraint languages, but it has been successfully used in more algorithmic ways
as well.
	</para>
<para>
Since XML content is so easy to transform, lexically and otherwise, <b>Lexical
Transformation</b>s and <b>Tree Transformation</b>s are definitely an option for
processing when an <b>XML Language</b> is used. 
	</para>
	</section><!-- end 'XML Language' -->

<section><title>Record Language</title>
<!-- Context, <b>Problem</b>, Discussion, <b>Prescription</b>, related -->
<!-- Trevor rewrites -->
<para>
Many times you need a description language for little more than being a form of
non-interactive input, a way of shoving data into a machine processable
environment. This comes up a lot in contexts like program configuration or in
certain kinds of data-driven programs.
	</para>
<para>
<b>Some languages are needed only for simple data entry or configuration-like
purposes. For these, a large amount of processing is overkill and undesirable. A
simpler language paradigm is needed.</b>
	</para>
<para>
<b>Record Language</b>s are based on the idea that your needed input is mostly
constraint based and can be processed atomically in "records". <b>Record
Language</b>s are usually syntactically very simple and used for such tasks as
setting a series of values. Stereotypically, the records of a <b>Record
Language</b> each occupy a single line, but many forms have more complex record
structures.
	</para>
<para>
<b>Structure your language's semantics and syntax into a record language.  A
record language has little to no hierarchic structure, and is processed one
record at a time.</b>
	</para>
<para>
A few common forms of <b>Record Language</b>s include: <b>Key-value Pairs</b>,
<b>Delimiter-separated</b> and <b>Stanza Records</b>.
	</para>
	</section><!-- end 'Record Languages' -->

<section><title>Key-value Pairs</title>
<!-- Context, <b>Problem</b>, Discussion, <b>Prescription</b>, related -->
<!-- Trevor rewrites -->
<para>
You would think that the decision to use a <b>Record Language</b> would simplify
data processing enough, but there are domain specific syntaxes that can save
time and effort, even within such a restricted environment.
	</para>
<para>
Imagine for a moment that the main purpose of a dataset to be input as a
<b>Record Language</b> is to set properties or attributes on some structure or
process. One could have a set order for the properties to be set and have
whatever processes the records assume this order and assign values accordingly,
but this is far from human readable, and prone to error. What then can be done?
	</para>
<para>
<b>Some Record Languages are meant to set attributes or properties. Such a
language should take advantage of this fact so that it maps easily to this
domain for its human users.</b>
	</para>
<para>
A <b>Key-value Pair</b> syntax may be just what is needed in this situation.
Key-value pairs typically consist of a key (the actual attribute or property
name), a delimiter (the colon and the equals sign are popular), and a value for
that property or attribute.
	</para>
<para>
There are a number of advantages to such a syntax in certain situations. Order
no longer need be important, a great boon to human editors. Further, defaults
can be assumed for unlisted key-value pairs, again a boon to those using the
language.
	</para>
<para>
Processing of <b>Key-value Pair</b> like languages should, if desired, be
possible almost completely with string manipulation. 
	</para>
<para>
<b>Use a key-value syntactic structure.  Each key-value pair consists of a
descriptive key that identifies which attribute's value is being set, some
separator, and the value the attribute is being set to.</b>
	</para>
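<para>
A minimal sketch of such processing, done with nothing but string
manipulation, might look like the following Python. The key names, the
choice of the equals sign as separator, and the convention that lines
starting with a hash mark are comments are all assumptions made for the
example.
	</para>

```python
# Hypothetical defaults, used for any key the input does not list.
DEFAULTS = {"host": "localhost", "port": "8080"}

def parse_key_values(text, defaults):
    """Return the defaults overridden by every key = value record in text."""
    settings = dict(defaults)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumed convention: blank lines and comments are skipped
        key, separator, value = line.partition("=")
        if not separator:
            raise ValueError("missing '=' in record: " + line)
        settings[key.strip()] = value.strip()
    return settings

config = parse_key_values("# demo\nport = 9090\nname = demo\n", DEFAULTS)
```

<para>
Because each record names its own key, the records may appear in any order,
and unlisted keys simply keep their default values.
	</para>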
<para>
It is worth noting that some standardized versions of <b>Key-value Pair</b>
languages exist, and that editors called property editors may be present on some
systems allowing for a great amount of power in editing these types of files.
	</para>
	</section><!-- end 'Key-value Pairs' -->

<section><title>Delimiter-separated</title>
<!-- Context, <b>Problem</b>, Discussion, <b>Prescription</b>, related -->
<!-- Trevor rewrites -->
<para>
A main characteristic of a <b>Record Language</b> is its simplicity. Sometimes
however, a degree of flexibility is required. A common situation is one where a
<b>Record Language</b> seems appropriate because a property or attribute is
being set but this attribute/property may be more like a list, or a collection
of properties.
	</para>
<para>
<b>You have a record language and each record has a fixed number of attributes.
However, each attribute may have a varying amount of data associated with
it.</b>
	</para>
<para>
This kind of problem can often be solved with creative use of delimiters. Server
type programs often need lists of users to allow or deny. Such chores can be
handled by having each list appear as a "record" on a separate line with the
usernames separated by commas.  
	</para>
<para>
<b>Use a delimiter-separated syntax.  Each record is placed on a single line by
itself and the attributes of each record are separated by some delimiter,
typically a single character.</b>
	</para>
<para>
A language such as this can be processed largely by a tokenizer working on
individual records. Some minor complications may arise depending on whether
there are a variable number of records or if record order might need to change.
Many of these problems can be addressed by embedding <b>Delimiter-separated</b>
languages inside <b>Stanza formatted records</b>. <b>Stanza formatted
records</b> are also worth looking into in that they can solve similar
problems.
	</para>
<para>
Though it isn't a good idea to take it further than two levels, it should be
noted that this pattern can be applied somewhat recursively by using multiple
delimiters.
	</para>
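<para>
The allow and deny lists mentioned above can serve as a sketch of the
pattern applied at two levels. In the Python below, a colon separates the
record name from its list and commas separate the list items; both
delimiter choices and the record names are assumptions for illustration.
	</para>

```python
def parse_record(line, field_delimiter=":", item_delimiter=","):
    """Split one record into its name and its list of items."""
    name, _, items = line.partition(field_delimiter)
    values = [item.strip() for item in items.split(item_delimiter)]
    return name.strip(), [value for value in values if value]

def parse_records(text):
    """Tokenize each non-blank line as an independent record."""
    return dict(parse_record(line) for line in text.splitlines() if line.strip())

access = parse_records("allow: alice, bob\ndeny: mallory\n")
```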
	</section><!-- end 'Delimiter separated' -->

<section><title>Stanza formatted records</title>
<!-- Context, <b>Problem</b>, Discussion, <b>Prescription</b>, related -->
<!-- Trevor rewrites -->
<para>
A <b>Record Language</b> is marked as much by its atomic nature as by its
simplicity. For some problems a solution that is atomic but further along in
complexity is necessary. Records that may have too much information to fit on a
single line or data that may need to be grouped together are examples of this
kind of need.
	</para>
<para>
<b>Use a Stanza formatted record. Stanza records are multi-line records
separated by some delimiter. Usually this delimiter is accompanied by a label
for the record as well. </b>
	</para>
<para>
Most traditional Unix configuration files use some form of Stanza format.
Usually this is of the labeled variety. A very common delimiter is a pair of
percent signs enclosing the record label. One of the easiest ways to implement a
stanza record reader is a two-state state machine: one state recognizes
delimiters and the other processes the records themselves.
	</para>
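<para>
A two-state reader of this kind could be sketched in Python as follows. The
sketch assumes labeled delimiters written as a label enclosed in percent
signs on a line of their own, and further assumes that a blank line ends a
stanza; the state names and sample labels are hypothetical.
	</para>

```python
def is_delimiter(line):
    """A delimiter line is a label enclosed in percent signs."""
    return len(line) > 2 and line.startswith("%") and line.endswith("%")

def read_stanzas(lines):
    """Return a dict mapping each stanza label to its list of body lines."""
    stanzas = {}
    state = "DELIMITER"  # the other state is "RECORD"
    body = None
    for raw in lines:
        line = raw.strip()
        if is_delimiter(line):
            # A delimiter opens a new stanza whichever state we are in.
            body = stanzas.setdefault(line[1:-1], [])
            state = "RECORD"
        elif state == "RECORD":
            if line:
                body.append(line)
            else:
                state = "DELIMITER"  # assumed: a blank line ends the stanza
        # In the DELIMITER state, any other line is ignored.
    return stanzas

stanzas = read_stanzas(
    ["%server%", "host=example.org", "port=80", "", "stray", "%client%", "retries=3"]
)
```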
<para>
Stanza formatting can be combined with other forms of <b>Record Language</b>s to
allow for a variable number of <b>Key-value Pairs</b> or <b>Delimiter-separated</b>
lists, for example. 
	</para>
<para>
<b>Stanza formatted records</b> have been around a long time [cite! ESR] and
probably represent as far as <b>Record Language</b>s should be taken before more
serious processing should be considered.
	</para>
	</section><!-- end 'Stanza records' -->

<section><title>Record Consumer</title>
<!-- Context, <b>Problem</b>, Discussion, <b>Prescription</b>, related -->
<!-- Trevor rewrites -->
<para>
For many kinds of input, especially that which might be found in a <b>Record
Language</b>, even the simplest of "processing" methods is overkill. You are
simply looking for a way to shovel data into a machine processable environment.
An AST would clearly be overkill in this case, and in fact so would most
primitive forms of parsing.
	</para>
<para>
<b>A Record Language has a unique need for simplicity in processing in order to
preserve the labor savings its use is intended to win.</b>
	</para>
<para>
Imagine a bash script that removes batches of users from a system. Its input
might be a file containing a list of said users, separated by newlines. All the
script really does is consume a line at a time taking the string, the username,
and rewriting it into a command that it then runs.  
	</para>
<para>
Such a script is a good example of a <b>Record Consumer</b>. Its input file is
clearly a <b>Record Language</b> based on a <b>Delimiter-separated</b> list, it
consumes one record at a time and it does so with very little processing. 
	</para>
<para>
Many <b>Record Consumer</b>s take an immediate action as they consume a record.
They use it as an argument in a system command or a function call, they
initialize a variable to that value, or they might create a data object based on
the record. If anything but the most primitive of validity checking need be
done, perhaps you are not looking for a <b>Record Consumer</b>.
	</para>
<para>
<b>A Record Consumer should take in one record at a time, utilize it immediately
and then consume the next record. The word "utilize" is used here to exaggerate
how little processing must occur.</b>
	</para>
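<para>
The user-removal script described above can be sketched in Python rather
than bash. To keep the sketch harmless, the action only collects the command
that would be run instead of executing it; the use of userdel as the command
and the sample user names are assumptions.
	</para>

```python
def consume(records, action):
    """Take in one record at a time and use it immediately."""
    for record in records:
        name = record.strip()
        if name:  # the only validity check: skip empty records
            action(name)

commands = []

def queue_removal(name):
    # A real script would run the command here; this sketch only records it.
    commands.append(["userdel", name])

consume(["alice\n", "bob\n", "\n"], queue_removal)
```

<para>
Nothing is kept between records and no structure is built, which is what
distinguishes a consumer from even a primitive parser.
	</para>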
<para>
Naturally a <b>Record Consumer</b> must process a <b>Record Language</b>, flavors
of which include <b>Key-value Pairs</b>, <b>Delimiter-separated</b> lists and
the now classic <b>Stanza formatted records</b>. <b>Record Consumer</b>s often
serve to perform tasks such as reading in configuration records, but they may
also be used as part of a more complex processing environment.
	</para>
	</section><!-- end pattern 'Record Consumer' -->


<section><title>Parsers</title>

<para>
In processing most languages, sooner or later you will need a Parser.
There is no One True Way in parser design, and so there are many parallel
patterns for Parsers. The largest split lies between the Generated Parsers,
and the Hand-written Parsers, though there are many distinctions at lower
levels.
  </para>


<section><title>Generated Parser</title>
<para>
Many languages have a structure which is easily describable in grammars.
The family of LALR languages, for example, is capable of some fairly complex and
articulated structures, while still being describable in terms of a
context-free grammar.
  </para>

<para><b>
When a stable grammar exists for a language, and when error recovery is of
minimal importance, a class of generated solutions becomes available for
constructing the Parser.
  </b></para>

<para>
Use a parser generator for your implementation language to produce a 
recognizer for the input language.
A parser generator takes a specification of the grammar of a language 
and generates a parser for this language.
Typically one also uses a lexer generator in conjunction with the use 
of a parser generator.
	</para>

<para>
The first generation of parser generator tools provides support for 
generating the tables and engine for a table driven parser and 
allows the firing off of action code when a production in the source 
language is recognized.
Also, most of these tools can utilize an associated lexer generator 
and generate a set of token type definitions from the grammar for the 
source language.
This class of tools includes yacc and lex <cite>Levine</cite>, Bison and 
flex <cite>Levine</cite>, java_cup and jlex, javacc <cite>javacc</cite>, and yacc++ 
<cite>yaccPlusPlus</cite>.
	</para>

<para>
The second generation of parser generator tools adds to the first 
generation by supporting other aspects of the translation process.
Sly <cite>SLY</cite> and LPT <cite>LPT</cite> generate code for doing 
source-to-source translation and the implicit generation of AST 
definitions. TXL <cite>TXL</cite> and eli <cite>eli</cite> also provide
tools for generating such additional pieces.
	</para>
<para>Therefore:</para>

<para><b>
When working with context-free languages, and when error recovery is of minimal
importance, use a Generated Parser if tools for doing so are available in
the implementation environment.
  </b></para>
	</section><!-- 'Generated Parser' -->



<section><title>Hand-written Parser</title>
<para>
It is not always possible or desirable to use a Generated Parser. Sometimes
the complexity of a language's grammar is so low that the development cost
of using a parser generator is greater than that of writing the parser by hand;
and sometimes the semantics of the language demand extensibility mechanisms
which are not achievable with a Generated Parser.
  </para>

<para><b>
When a language's parser cannot be readily handled by parser generation
tools, a substantially different class of parsers is needed.
  </b></para>

<para>
Extensible grammars and sophisticated error reporting and recovery are
difficult to achieve with Generated Parsers; and the grammars for some
languages are so simple as to not require the use of such tools.
  </para>

<para>Therefore:</para>
<para><b>
When a language's costs or processing semantics demand it, use a parser
written directly in the implementation language.  There are many ways to
implement such a parser.  The choice is driven by the implementation
language and the complexity of the language.
  </b></para>

<para>
You may be able to implement a <b>Hand-written Parser</b> using a
<b>Cascade Parser</b>, <b>Per-type Parser</b>, or
<b>Recursive-descent Parser</b>.
  </para>
	</section><!-- end 'Hand-written Parser' -->





<section><title>Cascade Parser</title>
<para>
In many environments, such as parsing command lines, tokenization is
unnecessary, or rather, already provided.
  </para>

<para><b>
When lexing and first level structure are provided by a language's
environment, and when a language need not be extendable, the semantic
actions of the parser dominate the cost dynamics.
  </b></para>

<para>
The high cost of development of most parsers lies in the need to structure
an input stream. In some situations though (such as command lines), the
input is already structured into tokenized records of some form. In this
situation, all that the parser need do is detect the different
configurations a record may be in, and effect the semantics of the
language. This can be accomplished by a simple cascade of if-else statements,  
or a switch.
  </para>

<para>Therefore:</para>
<para><b>
When an environment provides token statements, and a language does not
require extensibility, use a Cascade Parser. Construct a cascade of
if-else statements or switch statements to handle the language's semantics.
  </b></para>

<para>
The dominant feature of a Cascade Parser is the cascade, which detects and
handles the various record configurations of the input. If the language
requires extensibility, consider a Per-type Parser instead.
  </para>
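<para>
A cascade over an already tokenized command line might be sketched like this
in Python. The options -v and -o and the shape of the result dictionary are
hypothetical; the point is only that the if/elif chain is the whole parser.
  </para>

```python
def parse_command_line(args):
    """Interpret an already tokenized command line with an if/elif cascade."""
    options = {"verbose": False, "output": None, "inputs": []}
    remaining = iter(args)
    for arg in remaining:
        if arg == "-v":
            options["verbose"] = True
        elif arg == "-o":
            value = next(remaining, None)  # -o takes the following token
            if value is None:
                raise ValueError("-o requires a file name")
            options["output"] = value
        elif arg.startswith("-"):
            raise ValueError("unknown option: " + arg)
        else:
            options["inputs"].append(arg)
    return options

options = parse_command_line(["-v", "-o", "out.txt", "main.src"])
```

<para>
Supporting a new option means editing the cascade itself, which is why this
form suits languages that do not need to be extended.
  </para>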
	</section><!-- end 'Cascade Parser' -->



<section><title>Per-type Parser</title>
<para>
In many environments, such as parsing command lines, tokenization is
unnecessary, or rather, already provided. Some languages for these
environments require extensibility, often during execution; a common way to
achieve this is by defining the semantics of records which begin with some
identifying keyword.
  </para>

<para><b>
When lexing and first level structure are provided by a language's
environment, and when a language is statement based, and structured in the
form KEYWORD ARGS*, but the language requires extensibility, handler dispatch
dominates the cost dynamics.
  </b></para>

<para>
The high cost of development of most parsers lies in the need to structure
an input stream. In some situations though (such as command lines), the
input is already structured into tokenized records of some form. In this
situation, all that the parser need do is detect the different
configurations a record may be in, and effect the semantics of the
language. If the language requires extensibility, a dispatcher will be
necessary to associate keywords with the appropriate handler for that
type.
  </para>

<para>Therefore:</para>
<para><b>
When an environment provides token statements, and a language is keyword
based, and requires extensibility, use a Per-type Parser.
  </b></para>

<para>
The dominant feature of a Per-type Parser is the dispatcher, which looks up
the appropriate Per-type handler for a given statement based upon its
keyword, and then dispatches the statement to the handler.  If the language
does not require extensibility, consider a Cascade Parser instead.
  </para>
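<para>
A dispatcher of this kind can be sketched in a few lines of Python. The
class name, the register method, and the sample keywords (add, echo) are
assumptions for illustration; statements are taken to arrive already
tokenized in the form KEYWORD ARGS*.
  </para>

```python
class PerTypeParser:
    """Dispatch each tokenized statement to the handler for its keyword."""

    def __init__(self):
        self.handlers = {}

    def register(self, keyword, handler):
        # Handlers can be added at any time, even during execution.
        self.handlers[keyword] = handler

    def dispatch(self, statement):
        keyword, *args = statement
        handler = self.handlers.get(keyword)
        if handler is None:
            raise KeyError("no handler for keyword: " + keyword)
        return handler(args)

parser = PerTypeParser()
parser.register("add", lambda args: sum(int(arg) for arg in args))
parser.register("echo", lambda args: " ".join(args))
```

<para>
Extending the language is a matter of registering another handler; the
dispatcher itself never changes.
  </para>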
	</section><!-- end 'Per-type Parser' -->





<section><title>Recursive-descent Parser</title>
<para>
In some situations, error reporting and recovery is a very important
factor. With the Generated Parsers, it is often difficult to know where an
error happened in the parse tree.
  </para>

<para><b>
When error reporting and recovery is important, a parser must provide means
of passing through known states.
  </b></para>

<para>
LL(1) grammars can be parsed by Recursive-descent Parsers, and it is always
possible to know "where you are" deterministically in the parse tree of a
Recursive-descent Parser, so it is always possible to provide good error
semantics.
  </para>

<para>Therefore:</para>
<para><b>
When a language must provide high quality error reporting and recovery,
and the grammar for the language is LL(1) and does not require
extensibility, use a Recursive-descent Parser.
  </b></para>

<para>
Build the parser by manually translating the grammar rules into recursive
code in the implementation language which recognizes the language by
identifying tokens and recursively identifying productions of the
grammar<cite>Fall2001JavaCompilerBook</cite>.
  </para>
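<para>
The manual translation can be illustrated with a small sketch in Python for
an assumed LL(1) grammar of integer arithmetic, in which each grammar rule
becomes one method. The grammar, the class and method names, and the wording
of the error messages are all hypothetical; characters the tokenizer does
not recognize are silently dropped, which a real implementation would not
want.
  </para>

```python
import re

class ParseError(Exception):
    pass

class Parser:
    def __init__(self, text):
        self.tokens = re.findall(r"\d+|[-+*()]", text)
        self.position = 0

    def peek(self):
        if self.position == len(self.tokens):
            return None
        return self.tokens[self.position]

    def fail(self, rule, expected):
        # The rule name says exactly where in the grammar the parse stopped.
        raise ParseError("in %s: expected %s at token %d, found %r"
                         % (rule, expected, self.position, self.peek()))

    def expr(self):  # expr := term (("+" | "-") term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            operator = self.peek()
            self.position += 1
            right = self.term()
            value = value + right if operator == "+" else value - right
        return value

    def term(self):  # term := factor ("*" factor)*
        value = self.factor()
        while self.peek() == "*":
            self.position += 1
            value *= self.factor()
        return value

    def factor(self):  # factor := NUMBER | "(" expr ")"
        token = self.peek()
        if token is not None and token.isdigit():
            self.position += 1
            return int(token)
        if token == "(":
            self.position += 1
            value = self.expr()
            if self.peek() != ")":
                self.fail("factor", "a closing parenthesis")
            self.position += 1
            return value
        self.fail("factor", "a number or an opening parenthesis")

def parse(text):
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() is not None:
        parser.fail("input", "end of input")
    return value
```

<para>
Because every method corresponds to one rule, an error can always be
reported in terms of the rule being recognized and the token position
reached.
  </para>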

	</section>


	</section><!-- end 'Parsers' -->


<section><title>High Order Semantics</title>


<section><title>High Order Features</title>
<para>
Most articulated languages possess many features which could be expressed as
applications of simpler features of the same language.
  </para>

<para><b>
When language features can be defined in terms of more primitive features
of the same language, these definitions can guide and reduce the cost of
implementation.
  </b></para>

<para>
Many language features can easily be defined as relatively simple
applications of other features of the language. 'Syntactic sugar' consists of
such features, but some features go deeper. In most languages,
<b>for</b>-loops can be defined in terms of
<b>while</b>-loops and <b>if</b>-statements.
These features, defined in terms of other features, are <b>High Order
Features</b>.
  </para>

<para>Therefore:</para>
<para><b>
Seek to identify those portions of your language which can be defined as
high order features, and keep these definitions available at later design
stages.
  </b></para>

<para>
If you have enough high order features, implementing them using some
transformation pattern, such as <b>Lexical Transformation</b>,
<b>Deterministic Tree Transformation</b>, or <b>Non-Deterministic Tree
Transformation</b> may become cost effective.
  </para>
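<para>
As one possible illustration, the following Python sketch rewrites a
for-loop node into a while-loop by a simple tree transformation. The tuple
encoding of the syntax tree and the node names (for, while, block, assign,
lt, incr, call) are assumptions, and the rewrite ignores complications such
as a continue statement inside the loop body.
  </para>

```python
def desugar(node):
    """Rewrite every ("for", init, cond, step, body) node into primitives."""
    if not isinstance(node, tuple):
        return node  # names and constants are left untouched
    kind, *parts = node
    parts = [desugar(part) for part in parts]  # rewrite children first
    if kind == "for":
        init, cond, step, body = parts
        # for (init; cond; step) body  ==>  init; while (cond) { body; step }
        return ("block", init, ("while", cond, ("block", body, step)))
    return (kind, *parts)

loop = ("for",
        ("assign", "i", 0),
        ("lt", "i", 10),
        ("incr", "i"),
        ("call", "print", "i"))
lowered = desugar(loop)
```

<para>
After this pass the rest of the implementation only needs to understand
while-loops and blocks.
  </para>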
	</section><!-- end pattern 'High Order Features' -->

<section><title>Language Composition</title>
<para>
The semantics of many richly articulated languages are most properly seen
as compositions of simpler languages.
  </para>

<para><b>
When a language's features resist decomposition into directly processable forms,
the language's semantics are often compositional.
  </b></para>

<para>
We are quite good, as people, at dealing with compositions of language
structures and semantics, mixing and matching special purpose languages,
such as those of medicine or mathematics, with our general purpose language. There is
no reason why we cannot build our processing environments to do the same.
  </para>

<para>
Some languages have rich semantic features, and you may find yourself
having difficulty planning an architecture for processing them. This
problem is frequently solved by composing the semantics of several
languages, and processing each layer's language into the next layer's.
	</para>

<para>Therefore:</para>
<para><b>
Implement rich and subtle languages by transforming them into simpler
languages. Your processing environment should first resolve any <b>High
Order Features</b> in your input language, and should then apply some
transformation technique to produce the composed language for processing.
  </b></para>
	</section><!-- end pattern 'Language Composition' -->


<section><title>Embedded Language</title>
<para>
You can't lick your elbow, but sometimes it is the best solution.
  </para>

<para><b>
Frequently, a given language is almost ideal, save for a small and critical
piece of missing functionality.
  </b></para>

<para>
You may find that for the most part one particular pattern/paradigm for your
language's syntax will work well, except for one or two specific and
self-contained needs. For example, the Pro*C language is a superset of C
that a pre-processor resolves into C, providing database
functionality to C programmers.
	</para>

<para>Therefore:</para>
<para><b>
Rather than try to "fix" a language in which it is awkward to express
certain semantics, embed another language inside it.
  </b></para>

<para>
Write pre-processors which "resolve" the Embedded Language into
expressions in the base language, before further language processing. In
languages eventually destined to be interpreted or converted to some
"generic" programming language such as C, embedding this higher order
language can be almost trivial. At worst two parsers and language
processing steps must be written, but this cost is often mitigated by the
fact that the base language is often a popular language for
which many processing tools exist.
	</para>
<para>
Lex and Yacc's treatment of action code, and C programs which allow for
scripting in some higher order language such as Perl, are good examples of
this technique.
	</para>
	</section><!-- 'Embedded Language' -->

<section><title>Language Extension</title>
<para>
The forces which drive Language Composition and Embedded Languages are
common. As a result, some languages have been designed to permit
extension directly in their existing semantics.
  </para>

<para><b>
When a language provides extension mechanisms, it is natural to
implement additional semantics using those mechanisms.
  </b></para>

<para>
ML provides very robust mechanisms for type and operator extension;
Scheme provides powerful macros; C++ provides mechanisms for operator
overloading; and even C provides a reasonably flexible lexical macro
system.
  </para>

<para>Therefore:</para>
<para><b>
When you need features which represent a superset of the semantics of an
existing language (which you are comfortable with), and the language provides
mechanisms for extending its semantics, produce those features directly in
the language's extension mechanisms.
  </b></para>
	</section><!-- 'Language Extension' -->
	</section><!-- end 'High Order Semantics' -->


<section><title>AST</title>
<!-- Joel rewrites -->
<!-- introduction, problem statement, discussion, solution, [other patterns] -->
<para>
You are implementing a language, and the language is sufficiently
complex that direct execution or source-to-source translation is not
desirable.  How do you represent the essential characteristics of the
structure of the input and avoid making errors in constructing this 
representation?
  </para>

<para>
Parsing a non-trivial language usually involves the implicit or explicit 
creation and traversal of a tree structure.
This tree is a consequence of the context-free language that is
being parsed.
However, the tree induced by parsing contains extraneous information and
may have a structure that is not convenient to deal with.
  </para>

<para>
<b>Therefore, implement an abstract syntax tree (AST) using implementation
language specific idioms.  </b>
An abstract syntax tree (AST) captures the essential structure of the 
input in a tree form, while omitting unnecessary syntactic details.
ASTs can be distinguished from concrete syntax trees by their omission
of tree nodes to represent punctuation marks such as semi-colons to
terminate statements or commas to separate function arguments.
ASTs also omit tree nodes that represent unary productions in the 
grammar.
Such information is directly represented in ASTs by the structure of 
the tree.
An AST is a tree that is specific to the language being 
represented, rather than a generic tree structure consisting of information 
about the node represented as a reference to "Object" and a collection of 
"Tree" nodes to represent the children.
The use of a specialized representation allows the implementation system to 
detect errors at translator build time, through the use of type-checking 
on the elements of the AST.
  </para>

<para>
When designing the nodes of the tree, a common design choice is 
determining the granularity of the representation of the AST.
That is, the choice is whether every construct of the source language is 
represented by its own type of AST node, or whether some constructs 
of the source language are represented with a common type of AST node 
and differentiated using a value.
One example of choosing the granularity of representation is
determining how to represent binary arithmetic operations.
One choice is to have a single binary operation tree node, which has 
as one of its attributes the operation, e.g. ``$+$''.
The other choice is to have a tree node for every binary operation.
In an object-oriented language, this would result in classes like: 
AddBinary, SubtractBinary, MultiplyBinary, etc. with an abstract 
super class of Binary.
The second form is preferred if there will be behavior associated with 
the tree nodes.
More information on how to implement ASTs can be found in 
<cite>JonesImpAST</cite>.
	</para>
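<para>
The coarser of the two granularity choices can be sketched in C as follows.
The type and function names (arithExp_ty, mkBinary, evalArith) are our own
and are not taken from the cited report. A single BINARY node kind carries the
operator as a value, and a comment indicates how the finer-grained alternative
would differ.
<verbatim><![CDATA[
```c
#include <assert.h>
#include <stdlib.h>

typedef struct arithExp *arithExp_ty;

/* Coarse granularity: one node kind for all binary operations. */
enum arithExp_type { NUMBER, BINARY };

struct arithExp {
    enum arithExp_type kind;
    union {
        struct { int value; } number;
        struct { char op; arithExp_ty left; arithExp_ty right; } binary;
    } u;
};

/* The finer-grained alternative would enumerate one kind per operator,
 * e.g. NUMBER, ADD, SUBTRACT, MULTIPLY, each with its own variant. */

arithExp_ty mkNumber(int value) {
    arithExp_ty p = malloc(sizeof(*p));
    assert(p != NULL);
    p->kind = NUMBER;
    p->u.number.value = value;
    return p;
}

arithExp_ty mkBinary(char op, arithExp_ty left, arithExp_ty right) {
    arithExp_ty p = malloc(sizeof(*p));
    assert(p != NULL);
    p->kind = BINARY;
    p->u.binary.op = op;
    p->u.binary.left = left;
    p->u.binary.right = right;
    return p;
}

/* With the coarse form, behavior is selected by a second dispatch
 * on the operator value rather than on the node kind alone. */
int evalArith(arithExp_ty e) {
    int l, r;
    if (e->kind == NUMBER) return e->u.number.value;
    l = evalArith(e->u.binary.left);
    r = evalArith(e->u.binary.right);
    switch (e->u.binary.op) {
    case '+': return l + r;
    case '-': return l - r;
    case '*': return l * r;
    default:  assert(!"unknown operator"); return 0;
    }
}
```
]]></verbatim>
The inner switch on the operator is the price of the coarse representation.
When much behavior is attached to the nodes, that second dispatch recurs in
every traversal, which is one reason the finer-grained form is preferred in
that case.
</para>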

<para>
ASTs can be implemented in one of the following ways---<b>Handwritten AST</b>,
<b>Generated AST</b>, or <b>Commodity AST</b>.
	</para>

	</section><!-- end 'AST' -->



<section><title>Handwritten AST</title>
<!-- Joel rewrites -->
<para>
You've decided that you need to implement an abstract syntax tree (AST) 
because the translation or execution process was too complex to perform 
directly from the input source.
  </para>

<para> 
How do you implement an AST? 
There are several factors to consider.
First, if the parser is produced by a parser generator, then the parser 
generator may have mechanisms for generating an AST.
In that case, consider <b>Generated AST</b>.
Second, if your implementation language is supported by an AST generator 
tool, then the code for the AST can be generated.
Again, consider <b>Generated AST</b>.
Third, if you intend to do fairly commonplace transformations on the tree, 
then an existing tree rewrite system like XSLT may be useful.
In that case, consider <b>Commodity AST</b>.
If none of these apply, then another approach to implementing the AST must 
be taken.
  </para>

<para>
<I>Therefore, produce the AST implementation directly by writing the
code that implements the desired structure and function.</I>
Use appropriate idioms for your implementation language for 
implementing the AST.
For imperative languages, such as Pascal or C, use a variant record 
structure with a variant for each AST node type.
In ML or Haskell, use a datatype declaration.
In an object-oriented language, use a class hierarchy with a class for 
each AST node type and an abstract base class for representing the 
general AST node.
	</para>

<para>
To produce the AST during parsing, the AST is built by having nodes 
added to the tree when a complete production is recognized.
        </para>

<para>
As an illustration of how to implement a handwritten AST in an imperative
language like C, we use the example from the discussion of
<B>Interpreter</B> from <cite>GoF</cite>.
First, the <pre>.h</pre> file:
<figure>
<verbatim>
typedef struct booleanExp *booleanExp_ty;
typedef char* identifier;
typedef int boolean;

enum booleanExp_type {
        VARIABLE, CONSTANT, OREXP, ANDEXP, NOTEXP
} ;

struct booleanExp {
        enum booleanExp_type kind;
        union {
            struct { identifier id; } variable;
            struct { boolean b; } constant;
            struct { 
                booleanExp_ty left; 
                booleanExp_ty right;
            } orExp;
            struct {
                booleanExp_ty left;
                booleanExp_ty right;
            } andExp;
            struct { booleanExp_ty exp; } notExp;
        } u;
};
</verbatim>
</figure>
Next, the <pre>.c</pre> file:
<figure>
<verbatim>
#include &lt;stdlib.h&gt;
#include "booleanExp.h"
booleanExp_ty 
mkVariable(identifier id) {
        booleanExp_ty p;
    
        p = (booleanExp_ty) malloc(sizeof(*p));
        p->kind = VARIABLE;
        p->u.variable.id = id;
        return p;
}
booleanExp_ty 
mkConstant(boolean b) { /* ... */ }
/* ... */
</verbatim>
</figure>
  </para>

	</section><!-- end 'Handwritten AST' -->




<section><title>Generated AST</title>
<!-- Joel rewrites -->

<para>
You've decided that you need to implement an abstract syntax tree (AST)
because the translation or execution process was too complex to perform
directly from the input source.
Also, either your parser generator supports the generation of ASTs or your 
implementation language has tools for generating ASTs.
  </para>

<para>
How do you implement an AST?
As in any project, you want to reduce the implementation effort for your
AST.
The implementation of an AST is mostly rote: once the structure of the 
desired AST is determined, the implementation code is straightforward.
Most node types of the AST have an invariant number of children, with the 
AST nodes represented as record structures and the children 
represented as references to the appropriate node type.
Some nodes will have a variable number of children, e.g. argument lists to 
method calls.
These are represented using collections of references.
Given the rote nature of this process, you want to proceed through its 
implementation as quickly as possible.
  </para>

<para>
<I>Therefore, use an AST generator.</I>
There are two kinds of AST generators.
The first takes a specification of the AST and generates the 
necessary code for implementing the AST.
The second generates an AST from its implicit specification in a
grammar specification.
	</para>

<para>
The input language for the first kind of AST generator will typically
contain the name of the generic node type, the names of all specific
node types, and member names and types for the specific nodes.
One such tool is the generator for the Zephyr Abstract Syntax Description Language 
(ASDL) <cite>Wang:1997:ZAS</cite>, which includes C as one of its output 
languages.
It is also not hard to build a simple version of such a tool using a 
scripting language such as AWK or Perl.
In addition to generating the data type declarations, an AST code 
generator should also create ``constructor'' functions which take as 
arguments the children of the node and return initialized instances 
of the specific nodes of the AST.
	</para>
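<para>
To suggest how little machinery a simple generator of the first kind needs, the
following sketch emits the prototype of one ``constructor'' function from an
in-memory description of a node type. The structures node_spec and field, the
function emit_prototype, and the mk/_ty naming scheme are all assumptions of
ours; this is not the output format of the ASDL generator.
<verbatim><![CDATA[
```c
#include <assert.h>
#include <string.h>

#define MAX_FIELDS 4

/* Hypothetical in-memory form of one alternative in an AST specification. */
struct field { const char *type; const char *name; };
struct node_spec {
    const char *name;                 /* e.g. "OrExp"        */
    int nfields;                      /* number of children  */
    struct field fields[MAX_FIELDS];  /* children, in order  */
};

/* Append text at out[used] and return the new length. */
static size_t append(char *out, size_t size, size_t used, const char *text) {
    size_t len = strlen(text);
    assert(used + len < size);        /* leave room for the NUL */
    memcpy(out + used, text, len + 1);
    return used + len;
}

/* Write the prototype of the constructor for one node type into out.
 * base is the name of the generic node type. Returns the length written. */
size_t emit_prototype(char *out, size_t size, const char *base,
                      const struct node_spec *spec) {
    size_t used = 0;
    assert(spec->nfields <= MAX_FIELDS);
    used = append(out, size, used, base);
    used = append(out, size, used, "_ty mk");
    used = append(out, size, used, spec->name);
    used = append(out, size, used, "(");
    for (int i = 0; i < spec->nfields; i++) {
        if (i > 0) used = append(out, size, used, ", ");
        used = append(out, size, used, spec->fields[i].type);
        used = append(out, size, used, " ");
        used = append(out, size, used, spec->fields[i].name);
    }
    return append(out, size, used, ");");
}
```
]]></verbatim>
A complete generator would apply the same table-driven approach to the type
declarations and to the bodies of the constructors, and would read the table
from a specification file rather than from initialized C data.
</para>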

<para>
As an example of what the specification for an AST is like, we take the 
example from the discussion of <B>Interpreter</B> from <cite>GoF</cite>.
We use the Zephyr ASDL format.
<figure>
<verbatim>
booleanexp = Variable(identifier id)
           | Constant(boolean b)
           | OrExp(booleanexp left, booleanexp right)
           | AndExp(booleanexp left, booleanexp right)
           | NotExp(booleanexp exp)
</verbatim>
</figure>
	</para>
	</section><!-- end 'Generated AST' -->

<section><title>Commodity AST</title>
<!-- Joel rewrites -->
<para>
You've decided that you need to implement an abstract syntax tree (AST)
because the translation or execution process was too complex to perform
directly from the input source.
Also, the source language is in XML or easily lexically transformed into 
XML.
	</para>

<para>
How do you implement an AST?
If the source language is XML or in another format for which there is a 
pre-existing tree format, then the desired AST structure is likely very 
similar to the input structure.
There are many network protocols, such as SOAP, which are based upon XML.
Writing an AST implementation in such a situation is a waste of effort, as 
there are already parsers for XML for various environments.
        </para>

<para>
<I>Therefore, use a commodity AST implementation, most commonly one 
supporting XML.</I>
Use a library for implementing tree structures.
Such a library will support node creation, iteration over children,
and access to a dictionary-like structure to store node attributes
with name-value access patterns.
The library may provide memory management, validation, pickling, 
pretty-printing, etc.
        </para>
	</section><!-- end 'Commodity AST' -->


<section><title>Lexical Transformation</title>
<!-- Context, <b>Problem</b>, Discussion, <b>Prescription</b>, related -->
<!-- Trevor rewrites -->
<para>
Some transforms on a language, even useful ones, do not require deep or even
shallow semantic understanding. Macro expansion, many forms of syntactic sugar
and sometimes even rewriting of <b>High Order Feature</b>s can often be
accomplished with simpler non-AST based transforms. 
	</para>
<para>
<b>Many languages require some degree of transformation or rewrite. In cases
where deeper understanding, such as that represented by ASTs, is not needed,
another method is desirable to avoid unneeded work and complexity.</b>
	</para>
<para>
Commonly, <b>Lexical Transformation</b> is handled by some sort of regular
expression engine. This might take the form of a Perl
script run before other stages of language processing.
	</para>
<para>
<b>Lexical Transformation</b> isn't necessarily an end in itself. Many
transforms simply make matters easier for other portions of the processing
toolchain. While it performs other functions as well, the C pre-processor is a good
example of such a situation. 
	</para>
<para>
<b>Use a Lexical Transformation based purely on less than shallow semantic
understanding such as string manipulation through regular expressions.</b>
	</para>
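<para>
A transformation of this purely textual kind can be sketched in C with the
POSIX regular expression functions regcomp, regexec, and regfree. The function
rewrite_all and the example rewriting (a Pascal-style := assignment into a
C-style =) are our own illustration; the sketch assumes a POSIX system and
does no more than substitute a fixed replacement for each match.
<verbatim><![CDATA[
```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Replace every non-empty match of the POSIX extended regular expression
 * `pattern` in `input` with `replacement`, writing the result to `out`.
 * Returns the number of replacements made. */
int rewrite_all(const char *pattern, const char *replacement,
                const char *input, char *out, size_t size) {
    regex_t re;
    regmatch_t m;
    size_t used = 0;
    size_t rlen = strlen(replacement);
    size_t rest;
    int count = 0;
    int rc = regcomp(&re, pattern, REG_EXTENDED);

    assert(rc == 0);                     /* pattern must compile */
    while (*input != '\0'
           && regexec(&re, input, 1, &m, 0) == 0
           && m.rm_eo > m.rm_so) {
        size_t before = (size_t)m.rm_so; /* text preceding the match */
        assert(used + before + rlen < size);
        memcpy(out + used, input, before);
        used += before;
        memcpy(out + used, replacement, rlen);
        used += rlen;
        input += m.rm_eo;                /* continue after the match */
        count++;
    }
    rest = strlen(input);
    assert(used + rest < size);
    memcpy(out + used, input, rest + 1); /* also copies the NUL */
    regfree(&re);
    return count;
}
```
]]></verbatim>
With the pattern " *:= *" and the replacement " = ", the input
"x:=1; y  :=  x;" is rewritten without any knowledge of what an assignment
means. The same lack of understanding is the limitation: the sketch would also
rewrite a := that appears inside a string literal, and patterns anchored with ^
would need the REG_NOTBOL flag on all but the first search.
</para>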
<para>
While the technique should only be stretched so far, it should be kept in mind
that if a <b>Lexical Transformation</b> is being applied in a somewhat ad-hoc
manner, for example through a Perl script, then slightly more advanced
features requiring mild understanding, such as brace counting, may be easy to
implement.
	</para>
<para>
If more advanced transformation is required, a full <b>AST</b>, as in a <b>Tree
Transformation</b>, may be required.
	</para>
	</section><!-- end pattern 'Lexical Transformation' -->

<section><title>Tree Transformation</title>
<!-- Context, <b>Problem</b>, Discussion, <b>Prescription</b>, related -->
<!-- Trevor rewrites -->
<para>
Quite often the best way to view language processing, or just a portion of
processing, is as a transformation. This transformation may be simple, requiring
little semantic knowledge as is the case with a <b>Lexical Transformation</b> or
it may be more complex.
	</para>
<para>
<b>Some language processing is best achieved as a transformation, and some kinds
of transformation require a reconfiguration of the structure of the language
itself.</b>
	</para>
<para>
In order to restructure a language, the structure itself must first be captured;
this is often done using an <b>AST</b>. Once some semantic statements are in
<b>AST</b> form, transformations on that <b>AST</b> become an option.
	</para>
<para>
<b>Use Tree Transformation. Read your input language into an AST and then
restructure it using some form of transformation engine.</b>
	</para>
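<para>
A very small tree transformation can be sketched on the boolean expression AST
used in <b>Handwritten AST</b>. The rewriting rule (removal of double
negation), the function simplify, and the helper mkNode are our own choices
for illustration; the node layout follows the earlier example but is repeated
so that the sketch is self-contained.
<verbatim><![CDATA[
```c
#include <assert.h>
#include <stdlib.h>

typedef struct booleanExp *booleanExp_ty;
enum booleanExp_type { VARIABLE, CONSTANT, OREXP, ANDEXP, NOTEXP };
struct booleanExp {
    enum booleanExp_type kind;
    union {
        struct { const char *id; } variable;
        struct { int b; } constant;
        struct { booleanExp_ty left; booleanExp_ty right; } orExp;
        struct { booleanExp_ty left; booleanExp_ty right; } andExp;
        struct { booleanExp_ty exp; } notExp;
    } u;
};

static booleanExp_ty mkNode(enum booleanExp_type kind) {
    booleanExp_ty p = malloc(sizeof(*p));
    assert(p != NULL);
    p->kind = kind;
    return p;
}
booleanExp_ty mkVariable(const char *id) {
    booleanExp_ty p = mkNode(VARIABLE);
    p->u.variable.id = id;
    return p;
}
booleanExp_ty mkAnd(booleanExp_ty left, booleanExp_ty right) {
    booleanExp_ty p = mkNode(ANDEXP);
    p->u.andExp.left = left;
    p->u.andExp.right = right;
    return p;
}
booleanExp_ty mkNot(booleanExp_ty exp) {
    booleanExp_ty p = mkNode(NOTEXP);
    p->u.notExp.exp = exp;
    return p;
}

/* Rewrite the tree bottom-up, replacing NotExp(NotExp(x)) by x.
 * Returns the root of the transformed tree. */
booleanExp_ty simplify(booleanExp_ty e) {
    switch (e->kind) {
    case OREXP:
        e->u.orExp.left = simplify(e->u.orExp.left);
        e->u.orExp.right = simplify(e->u.orExp.right);
        return e;
    case ANDEXP:
        e->u.andExp.left = simplify(e->u.andExp.left);
        e->u.andExp.right = simplify(e->u.andExp.right);
        return e;
    case NOTEXP:
        e->u.notExp.exp = simplify(e->u.notExp.exp);
        if (e->u.notExp.exp->kind == NOTEXP) {
            booleanExp_ty inner = e->u.notExp.exp->u.notExp.exp;
            free(e->u.notExp.exp);   /* discard both negation nodes */
            free(e);
            return inner;
        }
        return e;
    default:                         /* VARIABLE and CONSTANT are leaves */
        return e;
    }
}
```
]]></verbatim>
Because the children are transformed before their parent is examined, a chain
of three negations collapses to one in a single pass. A general transformation
engine separates the rules from the traversal; here both are written by hand.
</para>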
<para>
While <b>Tree Transformation</b> may become a rather complicated enterprise, as
when it is used for tasks such as optimization, it has simpler, more approachable
forms as well. There are generic tree transformation engines available, such as
the XSLT language for transforming XML data structures. This level of
power alone makes it possible to perform a great number of transformation tasks
such as expansion, collapsing, reordering, and a great deal of restructuring. 
<b>Tree Transformation</b> forms the core of traditional non-optimizing compilers.
	</para>
<para>
By itself or in combination with <b>Lexical Transformation</b>, <b>Tree
Transformation</b> can be used as a step in a more complex processing scheme or
may be all that is necessary to perform <b>Language Output</b> or to realize an
<b>Embedded Language</b>. <b>Tree Transformation</b> is <i>real</i> language processing and
can accomplish most of what needs to be done with an input language.
	</para>
	</section><!-- end pattern 'Tree Transformation' -->

<section><title>Intermediate Representation Builder</title>
<!-- Joel rewrites -->

<para>
Only in the simplest cases can a program be executed directly as it is 
parsed.
These simple languages can be likened to a four-function calculator, doing
calculations one step at a time.
Most languages do not fit into this category and need a way to build an 
intermediate structure which is subsequently used to produce another source 
program or executed.
Such a form is called an intermediate representation (IR).
What sort of intermediate representation should be used and how is it 
initialized?
    </para>

<para>
<I>Use an intermediate representation that is tailored to the kind of 
<B>Record Language</B> that is being recognized and use the appropriate method for 
building it.</I>
    </para>

<para>
These IR builder techniques are commonly used with scripting languages such 
as AWK or Perl.
These languages are interpreted and loosely typed, traits which are well 
suited to text-to-text translators.
In addition to regular expression matching and string manipulation, both 
AWK and Perl have associative arrays <footnote>Dictionaries to the Smalltalk 
and Java programmer</footnote> which map values to values.
Our examples will be in AWK.
AWK in its simplest mode of operation reads standard input one line at a 
time, tokenizes the current line, and places the values of each column 
into the variables $1, $2, etc.
Then, the user supplied sequence of pattern/action pairs is evaluated, 
with the action fired when the pattern matches the current input line.
A pattern/action pair with no pattern is fired for every input line.
    </para>
<para>
For <b>Key-value Pairs</b>, build a dictionary of string to string mappings.
This separates the parsing from use, but has the disadvantage of moving
error-checking away from immediate notification.
    </para>

<para>
The following awk code:
<verbatim>
    { keyValueMap[$1] = $2; cnt++ }   # no pattern: matches every line
END { 
        for (i in keyValueMap) { print "char *" i ";" }
        for (i in keyValueMap) { print i "=" "\"" keyValueMap[i] "\";" }
    }
</verbatim>
When given the input:
<verbatim>
a b
c d
</verbatim>
Produces the output:
<verbatim>
char *a;
char *c;
a="b";
c="d";
</verbatim>
	</para>

<para>
For <b>Delimiter-separated Values</b>, each record is stored as an object with
an array of Strings.  The client does the conversion to the needed structure.
In an object-oriented implementation language, typically a constructor will take
an argument of either an array of strings or a record structure holding the
delimiter separated values.
In a language like awk or FORTRAN, parallel arrays for each column are used.
	</para>
<para>
The following awk code:
\verbatiminput{delim.awk}
When given the input:
<verbatim>
Joel Jones
Trevor Jay
Crutcher Dunnavant
</verbatim>
Produces the output:
<verbatim>
Jones, Joel
Jay, Trevor
Dunnavant, Crutcher
</verbatim>
    </para>

<para>
For <b>Stanza Records</b>, processing can be done in one of two ways.  
The first approach is to process each record one at a time.  This has to be
done piece-meal, as more than one line must be read for each record.
<footnote>Neither of the following examples is idiomatic AWK usage, q.v. 
<cite>PracticeOfProgramming</cite></footnote>
	</para>

<para>
Except for the first record, the detection of the start of a new record triggers
the generation of a client record from gathered information.
The following AWK code illustrates.
\verbatiminput{stanza1.awk}
	</para>

<para>
The other approach is to collect all of the information and create the records
after the entire file has been processed.  In a scripting language, use a
counter to generate a record identifier and a collection of associative arrays, one
per attribute, with the record identifier as the key and the attribute value as
the indexed value.
This technique is useful if duplicates need to be removed or if records are 
filtered based upon aggregate information.
The following AWK code illustrates.
\verbatiminput{stanza2.awk}
	</para>
	</section><!-- end 'IR Builder' -->

<section><title>Execution Techniques</title>
<para>
In the continuum in this domain, measured on the extent of semantic
analysis, we will find that <b>Interpreters</b> lie roughly in the middle,
<b>VM</b>s lie towards the side of low semantic analysis, and <b>Semantic
Evaluators</b> lie towards the side of high semantic analysis.  These
evaluators are the portion of a language processing system which
<i>understands</i> the meaning of a language and provides its
interpretation.
        </para>
        </section><!-- end Execution Techniques -->

<section><title>Immediate Execution</title>
<!-- Joel rewrites -->
<para>
You need a processing paradigm underneath a <textbf>Record Consumer</textbf> or
similar mechanism.
Your language is mostly atomic and directive in nature. 
No further complex processing of the language is required save to carry out
a series of desired commands.
The language is not forward referenced.
Advanced semantic features such as side effects are either not present or
constrained such that they permit serial evaluation.
A source program in this simplest case can be recognized and executed directly.
    </para>

<para>
<textbf>Therefore, evaluate the source program directly, without 
translation to an intermediate form.</textbf>
<textbf>Immediate Execution</textbf> is often used in situations such as
when records have a direct mapping to function calls within an API or some
other mechanism that needs to be "driven".
Input is parsed, and as soon as an executable portion is recognized, it is 
executed.
BASIC and dc are prototypical examples of this pattern.
        </para>
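<para>
The following sketch shows the pattern for a dc-like postfix notation: each
token is executed as soon as it is recognized, and no intermediate form of the
program is ever built. The function run_postfix and the restriction to
non-negative integers and three operators are our own simplifications; this is
not a reimplementation of dc.
<verbatim><![CDATA[
```c
#include <assert.h>
#include <ctype.h>

#define STACK_MAX 64

/* Execute a postfix program such as "2 3 + 4 *" while scanning it.
 * Returns the value left on top of the stack. */
long run_postfix(const char *src) {
    long stack[STACK_MAX];
    int depth = 0;

    while (*src != '\0') {
        if (isdigit((unsigned char)*src)) {
            /* A number is recognized and immediately pushed. */
            long n = 0;
            while (isdigit((unsigned char)*src)) {
                n = n * 10 + (*src - '0');
                src++;
            }
            assert(depth < STACK_MAX);
            stack[depth++] = n;
        } else if (*src == '+' || *src == '-' || *src == '*') {
            /* An operator is recognized and immediately applied. */
            long left, right;
            assert(depth >= 2);
            right = stack[--depth];
            left = stack[--depth];
            if (*src == '+')      stack[depth++] = left + right;
            else if (*src == '-') stack[depth++] = left - right;
            else                  stack[depth++] = left * right;
            src++;
        } else {
            assert(*src == ' ');  /* only blanks separate tokens */
            src++;
        }
    }
    assert(depth >= 1);
    return stack[depth - 1];
}
```
]]></verbatim>
The only state that survives from one token to the next is the operand stack,
which plays the role of a very small <textbf>Runtime</textbf>. This works
because the notation has no forward references: everything an operator needs
has already been seen when the operator is read.
</para>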
<para>
The action code will mostly consist of calls into a 
<textbf>Runtime</textbf>.
	</para>
	</section><!-- end pattern 'Immediate Execution' -->

<section><title>VM</title>
<!-- Joel rewrites -->
<para>
You have an algorithmic language and need to evaluate it.
Your desired language has features which are not easily realized by your
implementation language.
Your desired language has a well-defined, clean, and fairly invariant processing
model.
This model is better defined than is common for an <b>Interpreter</b>.
	</para>

<para>
<b>Therefore, use a virtual machine.</b>
A virtual machine is a simulated architecture realized within an
implementation environment.
It is much like an emulator except that the architecture it "emulates" may
not exist, or even be possible.
	</para>
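<para>
A minimal stack-based virtual machine can be sketched as follows. The
four-instruction set, the opcode names, and the function vm_run are
hypothetical and chosen only to show the shape of the fetch-decode-execute
loop; they do not correspond to any existing machine.
<verbatim><![CDATA[
```c
#include <assert.h>

#define VM_STACK_MAX 64

/* A hypothetical instruction set; OP_PUSH is followed by its operand. */
enum opcode { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Run a program until OP_HALT and return the top of the stack. */
int vm_run(const int *code) {
    int stack[VM_STACK_MAX];   /* the machine's own memory   */
    int sp = 0;                /* stack pointer              */
    int pc = 0;                /* program counter            */

    for (;;) {
        switch (code[pc++]) {              /* fetch and decode */
        case OP_PUSH:
            assert(sp < VM_STACK_MAX);
            stack[sp++] = code[pc++];      /* operand follows opcode */
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(!"unknown opcode");
            return 0;
        }
    }
}
```
]]></verbatim>
The program { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT }
computes (2 + 3) * 4 on this machine. The instruction set is fixed and flat,
which is what distinguishes it from the less sharply defined ``instructions''
of an <b>Interpreter</b>; producing such programs from source text is the
job of a separate translation step.
</para>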
<para>
By virtue of providing its own memory and processing environment, the VM may be
able to provide features not native to the implementation language. The Java VM,
for example, is able to offer garbage collection to the Java language even
though C does not provide it. Further, as a result of the "virtual"
nature of the VM, environment code inside it is well insulated from the rest of
the system. This may be a cognitive interface advantage, or a security and
performance advantage as in sandboxes.
	</para>
<para>
Virtual machines have a substantial cost.
You should have a great deal of resources to commit to processing
development.
Platform portability should be important.
Complex features are required yet speed of language programs is important.
A virtual machine almost by definition is tied to some form of
<b>Runtime</b> and may require a <b>Semantic Evaluator</b>, usually in the
form of a compiler, because VMs are often based on bytecode or some other
form of atomic instruction set.
	</para>
	</section><!-- end pattern 'VM' -->

<section><title>Semantic Evaluator</title>
<!-- Joel rewrites -->
<para>
Constraint languages are composed of elements that are not directly mapped 
to a von Neumann architecture.
For example, Prolog programs do not explicitly specify an execution order.
<B>How do you provide impetus to constraint languages?</B>
	</para>
<para>
Processing of <b>Constraint Languages</b> requires a
<b>Semantic Evaluator</b>, as there are few instructions to drive either an
<b>Interpreter</b>, or a <b>VM</b>.
	</para>

<para>
How do you process a constraint language?
Constraint languages contain semantic constructs which cross-cut, and
in general defy a greedy or local context processing model.
The execution model involves search or matching some constraint and there 
is some freedom in choosing execution order.
This is not true of other execution techniques.
        </para>
<para>
<B>Therefore, construct a Semantic Evaluator for your language, and place the code
which must <i>understand</i> your language there.</B>
As a result of parsing the input, a <b>Semantic Evaluator</b> builds data 
structures that represent the program in a form that is closer to a 
semantics of variable assignment, function calls, and explicit ordering.
The function calls will include calls into a <b>Runtime</b> that contains 
a significant encoding of the semantics of the input language, i.e. the 
program is translated towards the algorithmic language side of the definitional 
dichotomy.
For example, in Prolog, a function call might be a call to a routine to 
unify variable assignments in a single term.
In SQL, a function call might be made to find all rows of a table with a 
column matching a given value.
These data structures may be evaluated directly using an <b>Interpreter</b>, or 
translated into a more sequential form and evaluated by a <b>Virtual 
Machine</b>.
SQL evaluators use the <b>Interpreter</b> approach using a query plan, a 
tree structure that is evaluated bottom up.
The <b>Virtual Machine</b> approach is the most common execution technique 
for Prolog.
The Warren Abstract Machine is the virtual machine model most used by 
Prolog implementors<cite>WAM</cite>.
	</para>
	</section><!-- end pattern 'Semantic Evaluator' -->

<section><title>Runtime</title>
<!-- Joel rewrites -->
<section numbered='false'><title>Problem</title>
<para>
You have an algorithmic language and need to evaluate it.
	</para>
	</section>

<section numbered='false'><title>Context</title>
<para>
You are developing a language which possesses features or paradigms difficult to
realize in existing implementation languages due to low level limitations such
as memory allocation handling or stack treatment.
	</para>
	</section>

<section numbered='false'><title>Forces</title>
<ul>
	<li>You have a good number of development resources to devote to language
	processing.</li>
	<li>The existing runtime environments lack desired robustness.</li>
</ul>
	</section>

<section numbered='false'><title>Solution</title>
<para>
Write your own <b>Runtime</b> environment. A <b>Runtime</b>
handles much of the lower level details of program execution such as memory
management, stack and binary layout, and library loading. 
	</para>
<para>
<b>Runtime</b>s require a great deal of programmer expertise and time
but they are one of the only methods for providing certain categories of
language features such as very specialized memory management or unusual
execution methods.
	</para>
	</section>
	</section><!-- end pattern 'Runtime' -->

<section><title>Interpreter</title>
<!-- Joel rewrites -->
<section numbered='false'><title>Problem</title>
<para>You have an algorithmic language and need to evaluate it.
	</para>
	</section>

<section numbered='false'><title>Context</title>
<para>The source program has been transformed into some internal
representation and it is to be executed immediately, rather than translated
into another form.
	</para>
	</section>

<section numbered='false'><title>Forces</title>
<ul>

    <li>The execution speed or memory space is not tightly constrained.</li>
    <li>Portability is important.</li>
    <li>The developers don't have knowledge of the processor.</li>
    <li>The problem domain of the language is straightforward and does not
    require semantic evaluation.</li>
    <li>The definition of the ``instructions'' of the interpreter is less
well-defined than that of a <bf>Virtual Machine</bf>.</li>
</ul>
	</section>

<section numbered='false'><title>Solution</title>
<para>Use an interpreter to evaluate the language.
An interpreter treats the intermediate representation (IR) as the
``instructions'' for an execution engine, rather than translating the IR
into another language.
These instructions are not necessarily linear, but can take other forms.
        </para>
<para>
This pattern was first described in <cite>GoF</cite> and elaborated on in
<cite>BrownSmalltalkCompanion</cite>.  However, in both cases, the definition of
interpreter is narrower than is usually meant in the language implementation
community.  The emphasis on ASTs in the pattern version slights other
techniques, such as linear forms <cite>HOCunixProgrammingEnvironment</cite> for
evaluating a language.
	</para>
	</section>
	</section><!-- end 'Interpreter' -->

<section><title>Language Output</title>
<!-- Joel rewrites -->
<section numbered='false'><title>Problem</title>
<para>
	</para>
	</section><!-- end 'Problem' -->

<section numbered='false'><title>Context</title>
<para>
<b>Language Output</b> is frequently used in combination with <b>Lexical
Transformation</b> in order to realize <b>Embedded Languages</b>. Many
multi-stage compilers use <b>Language Output</b> several times; some C
compilers, for example, go from C (with macros) to C (without macros) to
assembly. In addition, the natural results of some systems, such as those
which produce PostScript or PDF documents, are language-based.
	</para>
	</section><!-- end 'Context' -->

<section numbered='false'><title>Forces</title>
<para>
<ul>
	<li>Your language processing environment needs to produce output in
some other language.</li>
</ul>
        </para>
	</section><!-- end 'Forces' -->

<section numbered='false'><title>Solution</title>
<para>
If the output language supports some form of comment feature, add a note
that the code was generated, and if possible, by what processing system,
and from what input files.
	</para>
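<para>
When the output language is C, such a note can be produced by a routine like
the following sketch. The function emit_banner, the wording of the comment, and
the tool and file names used in the example are all hypothetical.
<verbatim><![CDATA[
```c
#include <assert.h>
#include <stdio.h>

/* Write a C comment recording which tool generated the code and from
 * which input file. Returns the number of characters written. */
int emit_banner(char *out, size_t size,
                const char *tool, const char *input_file) {
    int n = snprintf(out, size,
                     "/* Generated by %s from %s. Do not edit by hand. */\n",
                     tool, input_file);
    assert(n >= 0 && (size_t)n < size);   /* output was not truncated */
    return n;
}
```
]]></verbatim>
A call such as emit_banner(buf, sizeof buf, "exampletool", "grammar.spec")
yields a single comment line to place at the top of the generated file. The
sketch does not guard against an input file name that itself contains the
comment terminator; a generator for another output language would substitute
that language's comment syntax.
</para>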
	</section><!-- end 'Solution' -->
	</section><!-- end pattern 'Language Output' -->

	</section><!-- end "The Pattern Language" -->


<bibliography>
	</bibliography>

</article>
