NAME
acd - a compiler driver
SYNOPSIS
acd -v[n] -vn[n] -name name -descr descr -T dir [arg ...]
DESCRIPTION
Acd is a compiler driver, a program that calls the several
passes that are needed to compile a source file. It keeps
track of all the temporary files used between the passes.
It also defines the interface of the compiler, the options
the user gets to see.
This text only describes acd itself; it says nothing about
the different options the C-compiler accepts. (Acd has nothing
to do with any language, other than being a tool to give
a compiler a user interface.)
OPTIONS
Acd itself takes five options:
-v[n]
Sets the diagnostic level to n (by default 2). The
higher n is, the more output acd generates: -v0 does
not produce any output. -v1 prints the basenames of
the programs called. -v2 prints names and arguments of
the programs called. -v3 shows the commands executed
from the description file too. -v4 shows the program
read from the description file too. Levels 3 and 4 use
backspace overstrikes that look good when viewing the
output with a smart pager.
-vn[n]
Like -v except that no command is executed. The driver
is just play-acting.
-name name
Acd is normally linked to the name the compiler is to
be called with by the user. The basename of this, say
cc, is the call name of the driver. It plays a role in
selecting the proper description file. With the -name
option one can change this. Acd -name cc has the same
effect as calling the program as cc.
-descr descr
Allows one to choose the pass description file of the
driver. By default descr is the same as name, the call
name of the program. If descr doesn't start with /,
./, or ../ then the file /usr/lib/descr/descr will be
used for the description, otherwise descr itself. Thus
cc -descr newcc calls the C-compiler with a different
description file without changing the call name.
Finally, if descr is "-", standard input is read. (The
default lib directory, /usr/lib, may be changed to dir
at compile time by -DLIB=\"dir\". The default descr
may be set with -DDESCR=\"descr\" for simple installations
on a system without symlinks.)
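The lookup rule for descr can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not code taken from acd; the function name resolve_descr and the "<stdin>" placeholder are assumptions made for the example.

```python
import posixpath

# Default lib directory named in the text; acd can be built with another one.
LIB = "/usr/lib"


def resolve_descr(descr, lib=LIB):
    """Return the description file that would be read for a -descr argument."""
    if descr == "-":
        # Hypothetical marker meaning "read the description from standard input".
        return "<stdin>"
    if descr.startswith(("/", "./", "../")):
        # An explicit path is used as given.
        return descr
    # Anything else is looked up in the descr directory under lib.
    return posixpath.join(lib, "descr", descr)


print(resolve_descr("newcc"))
print(resolve_descr("./my.descr"))
```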
-T dir
Temporary files are made in /tmp by default, which may
be overridden by the environment variable TMPDIR, which
may be overridden by the -T option.
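The order of precedence can be written down as a small Python sketch. The function name temp_dir and its parameters are hypothetical; only the precedence itself (-T over TMPDIR over /tmp) comes from the text.

```python
def temp_dir(environ, t_option=None):
    """Pick the directory for temporary files.

    environ  -- a mapping such as os.environ
    t_option -- the argument of -T, or None if -T was not given
    """
    if t_option is not None:
        return t_option                  # -T wins over everything
    return environ.get("TMPDIR", "/tmp")  # TMPDIR wins over the default


print(temp_dir({"TMPDIR": "/var/tmp"}))
print(temp_dir({"TMPDIR": "/var/tmp"}, t_option="/scratch"))
```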
THE DESCRIPTION FILE
The description file is a program interpreted by the driver.
It has variables, lists of files, argument parsing commands,
and rules for transforming input files.
Syntax
There are four simple objects:
Words, Substitutions, Letters, and Operators.
And there are two ways to group objects:
Lists, forming sequences of anything but letters,
Strings, forming sequences of anything but Words and
Operators.
Each object has the following syntax:
Words
They are sequences of characters, like cc,
-I/usr/include, /lib/cpp. No whitespace and no special
characters. The backslash character (\) may be used to
strip special characters, except whitespace, of their
special meaning. A backslash followed by whitespace is
completely removed from the input. The sequence \n is
changed to a newline.
Substitutions
A substitution (henceforth called 'subst') is formed
with a $, e.g. $opt, $PATH, ${lib}, $*. The variable
name after the $ is made of letters, digits and under-
scores, or any sequence of characters between
parentheses or braces, or a single other character. A
subst indicates that the value of the named variable
must be substituted in the list or string when fully
evaluated.
Letters
Letters are the single characters that would make up a
word.
Operators
The characters =, +, -, *, <, and > are the operators.
The first four must be surrounded by whitespace if they
are to be seen as special (they are often used in argu-
ments). The last two are always special.
Lists
One line of objects in the description file forms a
list. Put parentheses around it and you have a sub-
list. The values of variables are lists.
Strings
Anything that is not yet a word is a string. All it
needs is that the substs in it are evaluated, e.g.
$LIBPATH/lib$key.a. A single subst doesn't make a
string, it expands to a list. You need at least one
letter or other subst next to it. Strings (and words)
may also be formed by enclosing them in double quotes.
Only \ and $ keep their special meaning within quotes.
Evaluation
One thing has to be carefully understood: Substitutions are
delayed until the last possible moment, and description
files make heavy use of this. Only if a subst is tainted,
either because its variable is declared local, or because a
subst in its variable's value is tainted, is it immediately
substituted. So if a list is assigned to a variable then
this list is only checked for tainted substs. Those substs
are replaced by the value of their variable. This is called
partial evaluation.
Full evaluation expands all substs and flattens the list,
i.e. all parentheses are removed from sublists.
Implosive evaluation is the last that has to be done to a
list before it can be used as a command to execute. The
substs within a string have been evaluated to lists after
full expansion, but a string must be turned into a single
word, not a list. To make this happen, a string is first
exploded to all possible combinations of words choosing one
member of the lists within the string. These words are
tried one by one to see if they exist as a file. The first
one that exists is taken; if none exists then the first
choice is used. As an example, assume LIBPATH equals (/lib
/usr/lib), key is (c) and key happens to be local. Then we
have:
"$LIBPATH/lib$key.a"
before evaluation,
"$LIBPATH/lib(c).a"
after partial evaluation,
"(/lib/libc.a /usr/lib/libc.a)"
after full evaluation, and finally
/usr/lib/libc.a
after implosion, if the file exists.
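The implosion step of this example can be modelled in Python. This is a sketch under stated assumptions, not acd's implementation: a string is represented as a sequence of non-empty lists of alternatives (a literal is a one-element list), the names explode and implode are invented for the illustration, and the order in which combinations are tried when several lists have more than one member is a guess.

```python
import os.path
from itertools import product


def explode(parts):
    """Build every word that picks one member from each list in the string."""
    return ["".join(choice) for choice in product(*parts)]


def implode(parts, exists=os.path.exists):
    """Return the first exploded word that exists as a file, else the first word."""
    candidates = explode(parts)
    for word in candidates:
        if exists(word):
            return word
    return candidates[0]


# "$LIBPATH/lib$key.a" with LIBPATH = (/lib /usr/lib) and key = (c)
parts = [["/lib", "/usr/lib"], ["/lib"], ["c"], [".a"]]
print(explode(parts))
# Pretend that only /usr/lib/libc.a exists.
print(implode(parts, exists=lambda path: path == "/usr/lib/libc.a"))
```

Passing the existence test in as a parameter keeps the sketch independent of the files on the machine it runs on.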
Operators
The operators modify the way evaluation is done and perform
a special function on a list:
* Forces full evaluation on all the list elements follow-
ing it. Use it to force substitution of the current
value of a variable. This is the only operator that
forces immediate evaluation.
+ When a + exists in a list that is fully evaluated, then
all the elements before the + are imploded and all ele-
ments after the + are imploded and added to the list if
they are not already in the list. So this operator can
be used either for set addition, or to force implosive
expansion within a sublist.
- Like +, except that elements after the - are removed
from the list.
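The effect of + and - on lists of already imploded words can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that the order of the elements before the operator is kept as it is.

```python
def set_add(before, after):
    """Model 'before + after': append the words of after not already present."""
    result = list(before)
    for word in after:
        if word not in result:
            result.append(word)
    return result


def set_remove(before, after):
    """Model 'before - after': drop every word that also occurs in after."""
    return [word for word in before if word not in after]


print(set_add(["-O", "-g"], ["-g", "-s"]))
print(set_remove(["-O", "-g"], ["-g"]))
```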
The set operators can be used to gather options that exclude
each other or for their side effect of implosive expansion.
You may want to write:
cpp -I$LIBPATH/include
to call cpp with an extra include directory, but $LIBPATH is
expanded using a filename starting with -I so this won't
work. Given that any problem in Computer Science can be
solved with an extra level of indirection, use this instead:
cpp -I$INCLUDE
INCLUDE = $LIBPATH/include +
Special Variables
There are three special variables used in a description
file: $*, $<, and $>. These variables are always local and
mostly read-only. They will be explained later.
A Program
The lists in a description file form a program that is
executed from the first to the last list. The first word in
a list may be recognized as a builtin command (only if the
first list element is indeed simply a word.) If it is not a
builtin command then the list is imploded and used as a UNIX
command with arguments.
Indentation (by tabs or spaces) is not just makeup for a
program, but is used to group lines together. Some builtin
commands need a body. These bodies are simply lines at a
deeper indentation.
Empty lines are not ignored either; they have the same
indentation level as the line before them. Comments (starting
with a # and ending at end of line) have an indentation of
their own and can be used as null commands.
Acd will complain about unexpected indentation shifts and
empty bodies. Commands can share the same body by placing
them at the same indentation level before the indented body.
They are then "guards" to the same body, and are tried one
by one until one succeeds, after which the body is executed.
Semicolons may be used to separate commands instead of new-
lines. The commands are then all at the indentation level
of the first.
Execution phases
The driver runs in three phases: Initialization, Argument
scanning, and Compilation. Not all commands work in all
phases. This is further explained below.
The Commands
The commands accept arguments that are usually generic
expressions that implode to a word or a list of words. When
var is specified, then a single word or subst needs to be
given, so an assignment can be either name = value, or $name
= value.
var = expr ...
The partially evaluated list of expressions is assigned
to var. During the evaluation var is marked as local,
and after the assignment it is set from undefined to defined.
unset var
Var is set to null and is marked as undefined.
import var
If var is defined in the environment of acd then it is
assigned to var. The environment variable is split
into words at whitespace and colons. Empty space
between two colons (::) is changed to a dot.
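The splitting rule can be illustrated with a short Python sketch. The function name split_env_value is hypothetical. How acd treats a colon at the very start or end of the value is not stated here, so the sketch simply drops such empty fields; that choice is an assumption.

```python
def split_env_value(value):
    """Split an environment value into words at whitespace and colons."""
    words = []
    for chunk in value.split():            # split at whitespace first
        fields = chunk.split(":")
        for index, field in enumerate(fields):
            if field:
                words.append(field)
            elif 0 < index < len(fields) - 1:
                words.append(".")          # empty space between two colons
            # Assumption: an empty field at either end is ignored.
    return words


print(split_env_value("/bin::/usr/bin /opt/bin"))
```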
mktemp var [suffix]
Assigns to var the name of a new temporary file, usu-
ally something like /tmp/acd12345x. If suffix is
present then it will be added to the temporary file's
name. (Use it because some programs require it, or
just because it looks good.) Acd remembers this file,
and will delete it as soon as you stop referencing it.
temporary word
Mark the file named by word as a temporary file. You
have to make sure that the name is stored in some list
in imploded form, and not just temporarily created when
word is evaluated, because then it will be immediately
removed and forgotten.
stop suffix
Sets the target suffix for the compilation phase.
Something like stop .o means that the source files must
be compiled to object files. At least one stop command
must be executed before the compilation phase begins.
It may not be changed during the compilation phase.
(Note: There is no restriction on suffix, it need not
start with a dot.)
treat file suffix
Marks the file as having the given suffix for the com-
pile phase. Useful for sending a -l option directly to
the loader by treating it as having the .a suffix.
numeric arg
Checks if arg is a number. If not then acd will exit
with a nice error message.
error expr ...
Makes the driver print the error message expr ... and
exit.
if expr = expr
If tests if the two expressions are equal using set
comparison, i.e. each expression should contain all the
words in the other expression. If the test succeeds
then the if-body is executed.
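Set comparison means that word order and repeated words do not matter. A minimal Python model, with a hypothetical function name and assuming both sides have already been evaluated to lists of words:

```python
def lists_equal(left, right):
    """True if each list contains all the words of the other."""
    return set(left) == set(right)


print(lists_equal(["-O", "-g"], ["-g", "-O", "-g"]))
print(lists_equal(["-O"], ["-O", "-g"]))
```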
ifdef var
Executes the ifdef-body if var is defined.
ifndef var
Executes the ifndef-body if var is undefined.
iftemp arg
Executes the iftemp-body if arg is a temporary file.
Use it when a command has the same file as input and
output and you don't want to clobber the source file:
transform .o .o
    iftemp $*
        $> = $*
    else
        cp $* $>
    optimize $>
ifhash arg
Executes the ifhash-body if arg is an existing file
with a '#' as the very first character. This usually
indicates that the file must be pre-processed:
transform .s .o
    ifhash $*
        mktemp ASM .s
        $CPP $* > $ASM
    else
        ASM = $*
    $AS -o $> $ASM
    unset ASM
else
Executes the else-body if the last executed if, ifdef,
ifndef, iftemp, or ifhash was unsuccessful. Note that
else need not immediately follow an if, but you are
advised not to make use of this. It is a "feature"
that may not last.
apply suffix1 suffix2
Executed inside a transform rule body to transform the
input file according to another transform rule that has
the given input and output suffixes. The file under $*
will be replaced by the new file. So if there is a .c
.i preprocessor rule then the example of ifhash can be
replaced by:
transform .s .o
    ifhash $*
        apply .c .i
    $AS -o $> $*
include descr
Reads another description file and replaces the include
with it. Execution continues with the first list in
the new program. The search for descr is the same as
used for the -descr option. Use include to switch in
different front ends or back ends, or to call a shared
description file with a different initialization. Note
that descr is only evaluated the first time the include
is called. After that the include has been replaced
with the included program, so changing its argument
won't get you a different file.
arg string ...
Arg may be executed in the initialization and scanning
phase to post an argument scanning rule; that is all the
command itself does. Like an if that fails it allows
more guards to share the same body.
transform suffix1 suffix2
Transform, like arg, only posts a rule to transform a
file with the suffix suffix1 into a file with the suf-
fix suffix2.
prefer suffix1 suffix2
Tells that the transformation rule from suffix1 to suf-
fix2 is to be preferred when looking for a transforma-
tion path to the stop suffix. Normally the shortest
route to the stop suffix is used. Prefer is ignored on
a combine, because the special nature of combines does
not allow ambiguity.
The two suffixes on a transform or prefer may be the
same, giving a rule that is only executed when pre-
ferred.
combine suffix-list suffix
Combine is like transform except that it allows a list
of input suffixes to match several types of input files
that must be combined into one.
scan
The scanning phase may be run early from the initialization
phase with the scan command. Use it if you need
to make choices based on the arguments before posting
the transformation rules. After running this, scan and
arg become no-ops.
compile
Move on to the compilation phase early, so that you
have a chance to run a few extra commands before exit-
ing. This command implies a scan.
Any other command is seen as a UNIX command. This is where
the < and > operators come into play. They redirect stan-
dard input and standard output to the file mentioned after
them, just like the shell. Acd will stop with an error if
the command is not successful.
The Initialization Phase
The driver starts by executing the program once from top to
bottom to initialize variables and post argument scanning
and transformation rules.
The Scanning Phase
In this phase the driver makes a pass over the command line
arguments to process options. Each arg rule is tried one by
one in the order they were posted against the front of the
argument list. If a match is made then the matched argu-
ments are removed from the argument list and the arg-body is
executed. If no match can be made then the first argument
is moved to the list of files waiting to be transformed and
the scan is restarted.
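The control flow of this loop can be sketched in Python. The representation of a rule as a (match, body) pair is an assumption made for the illustration: match looks at the front of the argument list and returns how many arguments it matched (0 for no match), and body is called with the matched arguments.

```python
def scan(args, rules):
    """Run the scanning loop; return the list of files left for compilation."""
    args = list(args)
    files = []
    while args:
        for match, body in rules:          # rules are tried in posting order
            count = match(args)
            if count:
                matched, args = args[:count], args[count:]
                body(matched)              # execute the arg-body
                break
        else:
            # No rule matched: the first argument becomes a file to transform.
            files.append(args.pop(0))
    return files


seen = []
rules = [(lambda a: 1 if a[0] == "-c" else 0, seen.append)]
print(scan(["-c", "main.c", "util.c"], rules))
print(seen)
```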
The match is done as follows: Each of the strings after arg
must match one argument at the front of the argument list.
A character in a string must match a character in an argu-
ment word, a subst in a string may match 1 to all remaining
characters in the argument, preferring the shortest possible
match. The hyphen in an argument starting with a hyphen
cannot be matched by a subst. Therefore:
arg -i
matches only the argument -i.
arg -O$n
matches any argument that starts with -O and is at least
three characters long. Lastly,
arg -o $out
matches -o and the argument following it, unless that argu-
ment starts with a hyphen.
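The matching rules above can be approximated with regular expressions in Python. This is a sketch, not acd's matcher: the function names are hypothetical, substs are assumed to be written as $name with an identifier-like name, and each subst is translated to a non-greedy group that needs at least one character.

```python
import re


def match_string(pattern, arg):
    """Match one arg string against one argument; return bindings or None."""
    parts = re.split(r"\$(\w+)", pattern)   # literal, name, literal, name, ...
    if parts[0] == "" and len(parts) > 1 and arg.startswith("-"):
        return None                         # a subst may not match a leading hyphen
    regex = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            regex += re.escape(part)        # literal characters match themselves
        else:
            regex += "(?P<%s>.+?)" % part   # subst: shortest match, 1 or more chars
    found = re.fullmatch(regex, arg)
    return found.groupdict() if found else None


def match_rule(strings, args):
    """Match all strings of an arg rule against the front of the argument list."""
    if len(args) < len(strings):
        return None
    bindings = {}
    for pattern, arg in zip(strings, args):
        found = match_string(pattern, arg)
        if found is None:
            return None
        bindings.update(found)
    return bindings


print(match_string("-O$n", "-O2"))
print(match_rule(["-o", "$out"], ["-o", "prog", "main.c"]))
print(match_rule(["-o", "$out"], ["-o", "-c"]))
```

A rule without substs that matches yields an empty dictionary, so callers should compare the result with None rather than test its truth value.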
The variable $* is set to all the matched arguments before
the arg-body is executed. All the substs in the arg strings
are set to the characters they match. The variable $> is
set to null. All the values of the variables are saved and
the variables marked local. All variables except $> are
marked read-only. After the arg-body is executed the
value of $> is concatenated to the file list. This allows one
to stuff new files into the transformation phase. These
added names are not evaluated until the start of the next
phase.
The Compilation Phase
The files gathered in the file list in the scanning phase
are now transformed one by one using the transformation
rules. The shortest, or preferred route is computed for
each file all the way to the stop suffix. Each file is
transformed until it lands at the stop suffix, or at a com-
bine rule. After a while all files are either fully
transformed or at a combine rule.
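Finding the shortest route can be illustrated with a breadth-first search in Python. The sketch is a simplification under stated assumptions: it knows nothing about prefer or combine, the function name shortest_route is invented, and ties between routes of equal length are broken by the order of the rules, which may differ from what acd does.

```python
from collections import deque


def shortest_route(rules, start, stop):
    """Return the shortest list of suffixes from start to stop, or None.

    rules -- (input suffix, output suffix) pairs, one per transform rule
    """
    if start == stop:
        return [start]
    queue = deque([[start]])
    seen = {start}                 # also keeps cycles such as .m -> .m harmless
    while queue:
        path = queue.popleft()
        for source, target in rules:
            if source != path[-1] or target in seen:
                continue
            if target == stop:
                return path + [target]
            seen.add(target)
            queue.append(path + [target])
    return None


rules = [(".c", ".i"), (".c", ".E"), (".c", ".m"), (".i", ".m"),
         (".m", ".m"), (".m", ".s"), (".s", ".o")]
print(shortest_route(rules, ".c", ".o"))
```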
The driver chooses a combine rule that is not on a path from
another combine rule and executes it. The file that results
is then transformed until it again lands at a combine rule
or the stop suffix. This continues until all files are at
the stop suffix and the program exits.
The paths through transform rules may be ambiguous and have
cycles; these will be resolved. But paths through combines
must be unambiguous, because of the many paths from the dif-
ferent files that meet there. A description file will usu-
ally have only one combine rule for the loader. However if
you do have a combine conflict then put a no-op transform
rule in front of one to resolve the problem.
If a file matches a long and a short suffix then the long
suffix is preferred. By putting a null input suffix ("") in
a rule one can match any file that no other rule matches.
You can send unknown files to the loader this way.
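Choosing the suffix for a file name can be sketched in Python as picking the longest input suffix that the name ends with. The function name matching_suffix is hypothetical; the null suffix is modelled as the empty string.

```python
def matching_suffix(filename, suffixes):
    """Return the longest suffix that filename ends with, or None."""
    best = None
    for suffix in suffixes:
        # The null suffix "" matches every name, but loses to anything longer.
        if filename.endswith(suffix):
            if best is None or len(suffix) > len(best):
                best = suffix
    return best


print(matching_suffix("lib.tar.gz", [".gz", ".tar.gz", ""]))
print(repr(matching_suffix("README", [".c", ""])))
```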
The variable $* is set to the file to be transformed or the
files to be combined before the transform or combine-body is
executed. $> is set to the output file name; it may again
be modified. $< is set to the original name of the first
file of $* with the leading directories and the suffix
removed. $* will be made up of temporary files after the
first rule. $> will be another temporary file or the name
of the target file ($< plus the stop suffix), if the stop
suffix is reached.
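How $< and the name of the target file follow from the original file name can be shown with a small Python sketch. The function names are hypothetical, and the suffix is assumed to be the one that matched the file.

```python
import posixpath


def base_name(original, suffix):
    """Model $<: strip the leading directories and the suffix."""
    name = posixpath.basename(original)
    if suffix and name.endswith(suffix):
        name = name[:-len(suffix)]
    return name


def target_name(original, suffix, stop):
    """Model the target file: $< plus the stop suffix."""
    return base_name(original, suffix) + stop


print(base_name("/home/user/src/main.c", ".c"))
print(target_name("src/main.c", ".c", ".o"))
```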
$> is passed to the next rule; it is imploded and checked to
be a single word. This driver does not store intermediate
object files in the current directory like most other com-
pilers, but keeps them in /tmp too. (Who knows if the
current directory can have files created in?) As an exam-
ple, here is how you can express the "normal" method:
transform .s .o
    if $> = $<.o
        # Stop suffix is .o
    else
        $> = $<.o
        temporary $>
    $AS -o $> $*
Note that temporary is not called if the target is already
the object file, or you would lose the intended result! $>
is known to be a word, because $< is local. (Any string
whose substs are all expanded changes to a word.)
Predefined Variables
The driver has three variables predefined: PROGRAM, set to
the call name of the driver, VERSION, the driver's version
number, and ARCH, set to the name of the default output
architecture. The latter is optional, and only defined if
acd was compiled with -DARCH=\"arch-name\".
EXAMPLE
As an example a description file for a C compiler is given.
It has a front end (ccom), an intermediate code optimizer
(opt), a code generator (cg), an assembler (as), and a
loader (ld). The compiler can pre-process, but there is
also a separate cpp. If -D and options like it are
changed to look like -o then this example even meets
the requirements of POSIX.
# The compiler support search path.
C = /lib /usr/lib /usr/local/lib
# Compiler passes.
CPP = $C/cpp $CPP_F
CCOM = $C/ccom $CPP_F
OPT = $C/opt
CG = $C/cg
AS = $C/as
LD = $C/ld
# Predefined symbols.
CPP_F = -D__EXAMPLE_CC__
# Library path.
LIBPATH = $USERLIBPATH $C
# Default transformation target.
stop .out
# Preprocessor directives.
arg -D$name
arg -U$name
arg -I$dir
    CPP_F = $CPP_F $*
# Stop suffix.
arg -c
    stop .o
arg -E
    stop .E
# Optimization.
arg -O
    prefer .m .m
    OPT = $OPT -O1
arg -O$n
    numeric $n
    prefer .m .m
    OPT = $OPT $*
# Add debug info to the executable.
arg -g
    CCOM = $CCOM -g
# Add directories to the library path.
arg -L$dir
    USERLIBPATH = $USERLIBPATH $dir
# -llib must be searched in $LIBPATH later.
arg -l$lib
    $> = $LIBPATH/lib$lib.a
# Change output file.
arg -o$out
arg -o $out
    OUT = $out
# Complain about a missing argument.
arg -o
    error "argument expected after '$*'"
# Any other options (like -s) are for the loader.
arg -$any
    LD = $LD $*
# Preprocess C-source.
transform .c .i
    $CPP $* > $>
# Preprocess C-source and send it to standard output or $OUT.
transform .c .E
    ifndef OUT
        $CPP $*
    else
        $CPP $* > $OUT
# Compile C-source to intermediate code.
transform .c .m
transform .i .m
    $CCOM $* $>
# Intermediate code optimizer.
transform .m .m
    $OPT $* > $>
# Intermediate to assembly.
transform .m .s
    $CG $* > $>
# Assembler to object code.
transform .s .o
    if $> = $<.o
        ifdef OUT
            $> = $OUT
    $AS -o $> $*
# Combine object files and libraries to an executable.
combine (.o .a) .out
    ifndef OUT
        OUT = a.out
    $LD -o $OUT $C/crtso.o $* $C/libc.a
FILES
/usr/lib/descr/descr - compiler driver description file.
SEE ALSO
cc(1).
ACKNOWLEDGEMENTS
Even though the end result doesn't look much like it, many
ideas were nevertheless derived from the ACK compiler driver
by Ed Keizer.
BUGS
POSIX requires that if compiling one source file to an
object file fails then the compiler should continue with the
next source file. There is no way acd can do this; it
always stops after an error. It doesn't even know what an
object file is! (The requirement is stupid anyhow.)
If you don't think that tabs are 8 spaces wide, then don't
mix them with spaces for indentation.
AUTHOR
Kees J. Bot (kjb@cs.vu.nl)