As a general rule, it's a good idea to avoid keeping
automatically generated files under version control. Instead,
it's often best to have the generation of those files be part of
the build process.
Sometimes, however, you might find yourself in a situation where
the tool used to create the generated file may not always be
available. Perhaps it's a specialized tool that requires
separate installation or licensing but whose output is generally
usable. In cases such as this, it would be helpful if the build
system would cache the generated files and use the cached files
if all the input files are up to date. This is the functionality
provided by codegen-wrapper, located in abuild's util directory,
and accessible through use of the $(CODEGEN_WRAPPER) variable
within user-supplied make rules.
The codegen-wrapper command handles only relatively simple cases of the situation described above, but it is likely to be good enough for many situations. For details on its syntax, please run it with no options to get a summary. It works as follows:
The codegen-wrapper command takes the following inputs:
a cache directory, which must exist in advance
a list of input files
a list of output files
a command to generate the output files from the input files
The codegen-wrapper checks the following prerequisites:
For each input file infile, see if the file infile.md5 exists in the cache directory and contains the md5 checksum of infile. You may pass the --normalize-line-endings flag to codegen-wrapper to have it disregard differences in line endings (carriage return + newline vs. newline) when computing checksums.
For each output file outfile, see if a file called outfile exists in the cache directory.
If all of the above prerequisites are satisfied, codegen-wrapper copies the output files from the cache directory into the output directory. Otherwise, codegen-wrapper runs the specified command. If the command succeeds and generates all the expected output files, codegen-wrapper updates the checksums of the input files and copies all the generated files into the cache directory. Note that the cache directory is expected to be a controlled directory that is part of your source tree. As such, codegen-wrapper is likely to update files in the cache directory, and you will subsequently have to check those files into your version control system.
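The prerequisite check and the cache-hit path described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the actual codegen-wrapper implementation: the function names are hypothetical, and the sketch assumes that cache entries are named after the base name of each input and output file and that each .md5 file holds a single hexadecimal checksum.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path, normalize_line_endings=False):
    """Return the hex md5 of a file, optionally treating CRLF as LF."""
    data = Path(path).read_bytes()
    if normalize_line_endings:
        data = data.replace(b"\r\n", b"\n")
    return hashlib.md5(data).hexdigest()


def cache_is_current(cache, inputs, outputs, normalize=False):
    """True if every input checksum matches and every output is cached."""
    cache = Path(cache)
    for infile in map(Path, inputs):
        # Assumption: the checksum lives in <cache>/<basename>.md5
        stamp = cache / (infile.name + ".md5")
        if not stamp.is_file():
            return False
        if stamp.read_text().strip() != md5_of(infile, normalize):
            return False
    # Assumption: cached outputs are stored under their base names
    return all((cache / Path(o).name).is_file() for o in outputs)


def restore_from_cache(cache, outputs):
    """Copy each cached output file to its expected output location."""
    cache = Path(cache)
    for outfile in map(Path, outputs):
        shutil.copyfile(cache / outfile.name, outfile)
```

A caller would use restore_from_cache only when cache_is_current returns true, and would otherwise fall back to running the generator.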
Let's now look at an example that provides a simple code generator. This generator reads an input file and, based on annotations in the file, repeats some input lines into an output file. However, its exact functionality is not important; for purposes of this example, all we need to care about is that it generates some output file from an input file.
To use this code generator, we'll adopt a convention that any
input file passed to the code generator will generate a file by
the same name appended with the .rpt suffix. The code generator
build item will require that any input files be named in the
variable INPUT. For each file named in $(INPUT), it will
generate the corresponding .rpt file using the code generator.
If the variable REPEATER_CACHE is defined, the build item will
use that as the cache directory. We implement that with the
following rule fragment:
codegen-wrapper/repeater/rules/all/repeater.mk
_UNDEFINED := $(call undefined_vars,\
    INPUT)
ifneq ($(words $(_UNDEFINED)),0)
$(error The following variables are undefined: $(_UNDEFINED))
endif

all:: $(foreach I,$(INPUT),$(I).rpt)

define rpt_command
perl $(abDIR_repeater)/repeater.pl -i $< -o $@
endef

$(INPUT:%=%.rpt): %.rpt: %
	@$(PRINT) Generating $@ from $< with repeater
ifdef REPEATER_CACHE
	$(CODEGEN_WRAPPER) --cache $(REPEATER_CACHE) \
	    --input $< --output $@ --command $(rpt_command)
else
	$(rpt_command)
endif
There's a lot here, so let's go through it line by line. At the
beginning, we see the normal check for undefined variables. We
want to make sure that the INPUT variable is defined.
(Obviously, a real build item would have to come up with a
better, less generic name than this.) Next, we add all the .rpt
files to the all target, as usual, by adding them as
dependencies of all specified with two colons, indicating that
there are multiple all targets. So far, there's nothing
different from any other code generator.
Next, we define a macro rpt_command which actually runs the
command to generate the files. Note that, in this case, the code
generator lives right in the build item, so there's really not
much reason to use codegen-wrapper with it. But our purpose here
is to demonstrate codegen-wrapper, so we'll use it! When
defining this macro, we make use of the variables $< and $@.
These are predefined make variables that, when evaluated in the
context of a rule, refer to the first prerequisite and the
target respectively. They aren't valid at the point where the
macro is defined, but they are valid at the point where it is
expanded, which is what's relevant. We don't really have to
define a macro for this, but doing so helps us to avoid having
to repeat the invocation of the code generator, which might be
involved in some cases.
Finally, there's the rule itself. This is a typical GNU Make
pattern rule that generates a .rpt file from an input file
without the suffix. The complete rule is prefixed with the list
of output files (the form GNU Make calls a static pattern rule),
thus restricting it to apply only to those files. Within the
rule definition itself, we make the generation step conditional
upon whether the REPEATER_CACHE variable is defined. The effect
of the ifdef is applied at the time the file is read, not at the
time the rule is run, but this is okay because the rule
implementation file is always loaded after Abuild.mk. When
REPEATER_CACHE is not defined, we just run the repeater command
normally. When it is defined, we run it with
$(CODEGEN_WRAPPER), specifying the cache directory, the input
files, the output files, and the command as arguments to the
codegen-wrapper command.
Let's look at two build items that use these rules. They both
set their RULES variable to include repeater. Both build items
set the INPUT variable. Only the second one sets the
REPEATER_CACHE variable. Here are the Abuild.mk files:
codegen-wrapper/user1/Abuild.mk
INPUT := file1 file2
RULES := repeater
codegen-wrapper/user2/Abuild.mk
REPEATER_CACHE := cache
INPUT := file1 file2
RULES := repeater
Assuming that we start off with an empty cache directory, here is what the first build from scratch with abuild -b all would generate:
repeater-pass1.out
abuild: build starting
abuild: user1 (abuild-indep): all
make: Entering directory `--topdir--/codegen-wrapper/user1/abuild-indep'
Generating file1.rpt from ../file1 with repeater
Generating file2.rpt from ../file2 with repeater
make: Leaving directory `--topdir--/codegen-wrapper/user1/abuild-indep'
abuild: user2 (abuild-indep): all
make: Entering directory `--topdir--/codegen-wrapper/user2/abuild-indep'
Generating file1.rpt from ../file1 with repeater
codegen-wrapper: generation succeeded; cache updated
Generating file2.rpt from ../file2 with repeater
codegen-wrapper: generation succeeded; cache updated
make: Leaving directory `--topdir--/codegen-wrapper/user2/abuild-indep'
abuild: build complete
Note that, for the build item user1, we just saw the messages
that the output files were generated from the input files. For
user2, you can see messages from codegen-wrapper indicating that
generation succeeded and that it has updated the cache.
If we built again right away, the output files would already
exist and be newer than the input files, so the rule wouldn't
even trigger. Therefore we have to first clean everything with
abuild -c all to demonstrate the cache functionality. If you're
following along, you'll notice that the directory
codegen-wrapper/user2/cache now contains four files: file1.md5,
file1.rpt, file2.md5, and file2.rpt. Here's the output of a
second build from clean with abuild -b all:
repeater-pass2.out
abuild: build starting
abuild: user1 (abuild-indep): all
make: Entering directory `--topdir--/codegen-wrapper/user1/abuild-indep'
Generating file1.rpt from ../file1 with repeater
Generating file2.rpt from ../file2 with repeater
make: Leaving directory `--topdir--/codegen-wrapper/user1/abuild-indep'
abuild: user2 (abuild-indep): all
make: Entering directory `--topdir--/codegen-wrapper/user2/abuild-indep'
Generating file1.rpt from ../file1 with repeater
codegen-wrapper: files are up to date; using cached output files
Generating file2.rpt from ../file2 with repeater
codegen-wrapper: files are up to date; using cached output files
make: Leaving directory `--topdir--/codegen-wrapper/user2/abuild-indep'
abuild: build complete
This time, the build of user1 looks the same, but the build of
user2 is different. Instead of actually running the command to
generate the output, we see codegen-wrapper telling us that
files are up to date and that it is using the cached files.
The best part about this is that if we modify one of the input
files, the cache will get automatically updated. Without doing a
clean, we can add some line to the end of
codegen-wrapper/user2/file2 and run another build with
abuild -b all. That generates the following output:
repeater-mod-pass1.out
abuild: build starting
abuild: user1 (abuild-indep): all
abuild: user2 (abuild-indep): all
make: Entering directory `--topdir--/codegen-wrapper/user2/abuild-indep'
Generating file2.rpt from ../file2 with repeater
codegen-wrapper: generation succeeded; cache updated
make: Leaving directory `--topdir--/codegen-wrapper/user2/abuild-indep'
abuild: build complete
Nothing happened in build item user1 at all since everything
was up to date. Likewise, we see no mention of file1 in user2.
However, for file2 in user2, we once again see the output from
codegen-wrapper indicating that generation succeeded and that it
has updated the cache. After another clean build (abuild -c all
followed by abuild -b all), we once again see that files from
the cache are used:
repeater-pass2.out
abuild: build starting
abuild: user1 (abuild-indep): all
make: Entering directory `--topdir--/codegen-wrapper/user1/abuild-indep'
Generating file1.rpt from ../file1 with repeater
Generating file2.rpt from ../file2 with repeater
make: Leaving directory `--topdir--/codegen-wrapper/user1/abuild-indep'
abuild: user2 (abuild-indep): all
make: Entering directory `--topdir--/codegen-wrapper/user2/abuild-indep'
Generating file1.rpt from ../file1 with repeater
codegen-wrapper: files are up to date; using cached output files
Generating file2.rpt from ../file2 with repeater
codegen-wrapper: files are up to date; using cached output files
make: Leaving directory `--topdir--/codegen-wrapper/user2/abuild-indep'
abuild: build complete
There's a lot to swallow here, but you will hopefully recognize the power and usefulness of such an approach. With luck, the codegen-wrapper tool will meet some of your needs; even if it doesn't, it may provide a starting point. Here are a few things to take away from this example:
Writing code generators is always going to require some advanced make coding. The incremental complexity added by codegen-wrapper is relatively low, so for simple code generators, enhancing them to use this utility should be reasonably straightforward.
The codegen-wrapper tool doesn't do anything fancy with respect to knowing how to generate output file names from input file names. Instead, we just pass the actual names to it on the command line. Using the make variables $< and $@ makes this easy. Sometimes there may be multiple input files and/or multiple output files. Handling multiple input files is fairly easy. The make variable $^ contains all the prerequisites for a given target while $< contains the first prerequisite. Using $< or $^ for your input files and $@ for your output files is nice when you can get away with it because all the handling of finding input files in .. (through make's VPATH feature) is handled for you automatically.
Handling multiple output files may be a bit trickier, but it can still be done. You may need to experiment a little. Often you will find that make will pick whichever target it tries to create first as $@ and that the rule will be invoked only one time. In this case, you may have to generate your output file names yourself. Sometimes you can do this by defining them relative to $@, which you should do if at all possible. For an example of this, you can look at make/standard-code-generators.mk in your abuild distribution. This code uses codegen-wrapper for flex and bison. The bison rules generate multiple output files from a single input file and generate the multiple output names from $@ in this way.
In our little example, the code generator was always available, so when we modified the input file, everything worked. If the code generator were not available or if it failed, codegen-wrapper would fail with the same exit status and would not update the cache.
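The failure behavior just described can be illustrated with a short Python sketch of the cache-miss path. This is not the actual codegen-wrapper implementation: the function name generate_and_cache is hypothetical, the cache naming scheme (base names plus a .md5 suffix) is an assumption based on the files seen in the example, and the status values 127 (generator not found) and 1 (generator ran but an expected output is missing) are assumptions chosen for the illustration.

```python
import hashlib
import shutil
import subprocess
from pathlib import Path


def generate_and_cache(cache, inputs, outputs, command):
    """Run the generator; refresh the cache only on complete success."""
    try:
        status = subprocess.run(command).returncode
    except FileNotFoundError:
        return 127  # assumed status when the generator is unavailable
    if status != 0:
        return status  # propagate the failure; the cache is untouched
    if not all(Path(o).is_file() for o in outputs):
        return 1  # assumed status when an expected output is missing
    cache = Path(cache)
    for infile in map(Path, inputs):
        # Record the checksum of each input under its base name
        digest = hashlib.md5(infile.read_bytes()).hexdigest()
        (cache / (infile.name + ".md5")).write_text(digest + "\n")
    for outfile in map(Path, outputs):
        # Store each generated file in the cache under its base name
        shutil.copyfile(outfile, cache / outfile.name)
    return 0
```

The important property is the ordering: nothing is written to the cache until the command has exited successfully and every expected output file exists, so a failed or missing generator leaves the previously cached files intact.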