22.6. Caching Generated Files


As a general rule, it's a good idea to avoid putting automatically generated files under version control. Instead, it's often best to have the generation of those files be part of the build process. Sometimes, however, you might find yourself in a situation where the tool used to create the generated file may not always be available. Perhaps it's a specialized tool that requires separate installation or licensing but whose output is generally usable. In cases such as this, it would be helpful if the build system would cache the generated files and use the cached files if all the input files are up to date. This is the functionality provided by codegen-wrapper, located in abuild's util directory, and accessible through use of the $(CODEGEN_WRAPPER) variable within user-supplied make rules.

The codegen-wrapper command handles only relatively simple cases of the situation described above, but that is likely to be good enough for many situations. For details on its syntax, run it with no options to get a summary. It works as follows:

The codegen-wrapper command takes the following inputs:
- a cache directory, which must exist in advance
- a list of input files
- a list of output files
- a command to generate the output files from the input files
The codegen-wrapper checks the following prerequisites:
- For each input file infile, see if the file infile.md5 exists in the cache directory and contains the md5 checksum of infile. You may pass the --normalize-line-endings flag to codegen-wrapper to have it disregard differences in line endings (carriage return + newline vs. newline) when computing checksums.
- For each output file outfile, see if a file called outfile exists in the cache directory.

If all of the above prerequisites are satisfied, codegen-wrapper copies the output files from the cache directory into the output directory. Otherwise, codegen-wrapper runs the specified command. If the command succeeded and generated all the expected output files, codegen-wrapper updates the checksums of the input files and copies all the generated files into the cache directory. Note that the cache directory is expected to be a controlled directory that is part of your source tree. As such, it is likely that codegen-wrapper will actually update files in the cache directory which you will subsequently have to check into your version control system.
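
The steps above can be sketched in a few lines of Python. This is only an illustration of the logic as described here, not the actual codegen-wrapper implementation. The function names are hypothetical, and several details are assumptions: that each .md5 file holds a hex digest as text, that cache entries are named after the base name of each file, that line-ending normalization simply replaces carriage return + newline with newline, and that a missing output is reported with exit status 1.

```python
import hashlib
import shutil
import subprocess
import sys
from pathlib import Path


def md5_of(path, normalize_line_endings=False):
    """Return the hex MD5 digest of a file, optionally treating CRLF as LF."""
    data = Path(path).read_bytes()
    if normalize_line_endings:
        data = data.replace(b"\r\n", b"\n")
    return hashlib.md5(data).hexdigest()


def cache_is_current(cache, inputs, outputs, normalize=False):
    """Check both prerequisites: matching checksums and cached outputs."""
    cache = Path(cache)
    for infile in inputs:
        md5_file = cache / (Path(infile).name + ".md5")
        if not md5_file.is_file():
            return False
        if md5_file.read_text().strip() != md5_of(infile, normalize):
            return False
    return all((cache / Path(out).name).is_file() for out in outputs)


def run_cached(cache, inputs, outputs, command, normalize=False):
    """Use cached outputs when valid; otherwise run command and refresh cache.

    Returns 0 on success, or a nonzero status if generation failed.
    """
    cache = Path(cache)
    if cache_is_current(cache, inputs, outputs, normalize):
        for out in outputs:
            shutil.copyfile(cache / Path(out).name, out)
        return 0
    status = subprocess.run(command).returncode
    if status != 0:
        return status  # generator failed: leave the cache untouched
    if not all(Path(out).is_file() for out in outputs):
        return 1  # assumption: a missing expected output counts as failure
    for infile in inputs:
        md5_file = cache / (Path(infile).name + ".md5")
        md5_file.write_text(md5_of(infile, normalize) + "\n")
    for out in outputs:
        shutil.copyfile(out, cache / Path(out).name)
    return 0
```

Note that the cache is only written after the command has succeeded and every expected output exists, so a failed or unavailable generator never disturbs previously cached files.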

22.6.1. Caching Generated Files Example

Let's now look at an example that provides a simple code generator. This generator reads an input file and, based on annotations in the file, repeats some input lines into an output file. However, its exact functionality is not important; for purposes of this example, all we need to care about is that it generates some output file from an input file.

To use this code generator, we'll adopt a convention that any input file passed to the code generator will generate a file by the same name appended with the .rpt suffix. The code generator build item will require that any input files be named in the variable INPUT. For each file named in $(INPUT), it will generate the corresponding .rpt file using the code generator. If the variable REPEATER_CACHE is defined, the build item will use that as the cache directory. We implement that with the following rule fragment:

codegen-wrapper/repeater/rules/all/repeater.mk

_UNDEFINED := $(call undefined_vars,\
                INPUT)
ifneq ($(words $(_UNDEFINED)),0)
$(error The following variables are undefined: $(_UNDEFINED))
endif

all:: $(foreach I,$(INPUT),$(I).rpt)

define rpt_command
        perl $(abDIR_repeater)/repeater.pl -i $< -o $@
endef

$(INPUT:%=%.rpt): %.rpt: %
        @$(PRINT) Generating $@ from $< with repeater
ifdef REPEATER_CACHE
        $(CODEGEN_WRAPPER) --cache $(REPEATER_CACHE) \
            --input $< --output $@ --command $(rpt_command)
else
        $(rpt_command)
endif

There's a lot here, so let's go through it line by line. At the beginning, we see the normal check for undefined variables. We want to make sure that the INPUT variable is defined. (Obviously, a real build item would have to come up with a better, less generic name than this.) Next, we add all the .rpt files to the all target, as usual, by adding them as dependencies of all specified with two colons, indicating that there are multiple all targets. So far, there's nothing different from any other code generator.

Next, we define a macro rpt_command which actually runs the command to generate the files. Note that, in this case, the code generator lives right in the build item, so there's really not much reason to use codegen-wrapper with it. But our purpose here is to demonstrate codegen-wrapper, so we'll use it! When defining this macro, we make use of the variables $< and $@. These are predefined make variables that, when evaluated in the context of a rule, refer to the first prerequisite and the target respectively. They aren't valid at the point where the macro is defined, but they are valid at the point where it is expanded, which is what's relevant. We don't really have to define a macro for this, but doing so helps us to avoid having to repeat the invocation of the code generator, which might be involved in some cases.

Finally, there's the rule itself. This is a typical GNU Make pattern rule that generates a .rpt file from an input file without the suffix. The complete rule is prefixed with the list of output files, thus restricting it to apply only to those files. Within the rule definition itself, we make the generation step conditional upon whether the REPEATER_CACHE variable is defined. The effect of the ifdef is applied at the time the file is read, not at the time the rule is run, but this is okay because the rule implementation file is always loaded after Abuild.mk. When REPEATER_CACHE is not defined, we just run the repeater command normally. When it is defined, we run it with $(CODEGEN_WRAPPER), specifying the cache directory, the input files, the output files, and the command using arguments to the codegen-wrapper command as invoked through the $(CODEGEN_WRAPPER) variable.

Let's look at two build items that use these rules. They both set their RULES variable to include repeater. Both build items set the INPUT variable. Only the second one sets the REPEATER_CACHE variable. Here are the Abuild.mk files:

codegen-wrapper/user1/Abuild.mk

INPUT := file1 file2
RULES := repeater

codegen-wrapper/user2/Abuild.mk

REPEATER_CACHE := cache
INPUT := file1 file2
RULES := repeater

Assuming that we start off with an empty cache directory, here is what the first build from scratch with abuild -b all would generate:

repeater-pass1.out

abuild: build starting
abuild: user1 (abuild-indep): all
make: Entering directory `--topdir--/codegen-wrapper/user1/abuild-indep'
Generating file1.rpt from ../file1 with repeater
Generating file2.rpt from ../file2 with repeater
make: Leaving directory `--topdir--/codegen-wrapper/user1/abuild-indep'
abuild: user2 (abuild-indep): all
make: Entering directory `--topdir--/codegen-wrapper/user2/abuild-indep'
Generating file1.rpt from ../file1 with repeater
codegen-wrapper: generation succeeded; cache updated
Generating file2.rpt from ../file2 with repeater
codegen-wrapper: generation succeeded; cache updated
make: Leaving directory `--topdir--/codegen-wrapper/user2/abuild-indep'
abuild: build complete

Note that, for the build item user1, we just saw the messages that the output files were generated from the input files. For user2, you can see messages from codegen-wrapper indicating that generation succeeded and that it has updated the cache.

If we built again right away, the output files would already exist and be newer than the input files, so the rule wouldn't even trigger. Therefore we have to first clean everything with abuild -c all to demonstrate the cache functionality. If you're following along, you'll notice that the directory codegen-wrapper/user2/cache now contains four files: file1.md5, file1.rpt, file2.md5, and file2.rpt. Here's the output of a second build from clean with abuild -b all:

repeater-pass2.out

abuild: build starting
abuild: user1 (abuild-indep): all
make: Entering directory `--topdir--/codegen-wrapper/user1/abuild-indep'
Generating file1.rpt from ../file1 with repeater
Generating file2.rpt from ../file2 with repeater
make: Leaving directory `--topdir--/codegen-wrapper/user1/abuild-indep'
abuild: user2 (abuild-indep): all
make: Entering directory `--topdir--/codegen-wrapper/user2/abuild-indep'
Generating file1.rpt from ../file1 with repeater
codegen-wrapper: files are up to date; using cached output files
Generating file2.rpt from ../file2 with repeater
codegen-wrapper: files are up to date; using cached output files
make: Leaving directory `--topdir--/codegen-wrapper/user2/abuild-indep'
abuild: build complete

This time, the build of user1 looks the same, but the build of user2 is different. Instead of actually running the command to generate the output, we see codegen-wrapper telling us that files are up to date and that it is using the cached files.
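
Two separate up-to-date checks are at work in these passes. Make's timestamp comparison decides whether the rule runs at all, and only then does codegen-wrapper's checksum comparison decide whether the generator itself runs. The first of these can be sketched as follows. This is a simplified illustration of make's basic rule for file targets, written in Python with a hypothetical function name; real make behavior has more cases than this.

```python
from pathlib import Path


def rule_would_run(target, prerequisites):
    """Mimic make's basic test for a file target.

    The rule runs if the target is missing or if any prerequisite has a
    newer modification time than the target.
    """
    target = Path(target)
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    return any(Path(p).stat().st_mtime > target_mtime for p in prerequisites)
```

After a clean, the target is missing, so the rule runs and codegen-wrapper gets the chance to consult its cache. Without a clean, an unchanged input is older than its output, so the rule is skipped entirely.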

The best part about this is that if we modify one of the input files, the cache will get automatically updated. Without doing a clean, we can add some line to the end of codegen-wrapper/user2/file2 and run another build with abuild -b all. That generates the following output:

repeater-mod-pass1.out

abuild: build starting
abuild: user1 (abuild-indep): all
abuild: user2 (abuild-indep): all
make: Entering directory `--topdir--/codegen-wrapper/user2/abuild-indep'
Generating file2.rpt from ../file2 with repeater
codegen-wrapper: generation succeeded; cache updated
make: Leaving directory `--topdir--/codegen-wrapper/user2/abuild-indep'
abuild: build complete

Nothing happened in build item user1 at all since everything was up to date. Likewise, we see no mention of file1 in user2. However, for file2 in user2, we once again see the output from codegen-wrapper indicating that generation succeeded and that it has updated the cache. After another clean and rebuild (abuild -c all followed by abuild -b all), we once again see that files from the cache are used:

repeater-pass2.out

abuild: build starting
abuild: user1 (abuild-indep): all
make: Entering directory `--topdir--/codegen-wrapper/user1/abuild-indep'
Generating file1.rpt from ../file1 with repeater
Generating file2.rpt from ../file2 with repeater
make: Leaving directory `--topdir--/codegen-wrapper/user1/abuild-indep'
abuild: user2 (abuild-indep): all
make: Entering directory `--topdir--/codegen-wrapper/user2/abuild-indep'
Generating file1.rpt from ../file1 with repeater
codegen-wrapper: files are up to date; using cached output files
Generating file2.rpt from ../file2 with repeater
codegen-wrapper: files are up to date; using cached output files
make: Leaving directory `--topdir--/codegen-wrapper/user2/abuild-indep'
abuild: build complete

There's a lot to swallow here, but you will hopefully recognize the power and usefulness of such an approach. The codegen-wrapper tool may meet some of your needs. Even if it doesn't, it may provide a starting point. Here are a few things to take away from this example:

- Writing code generators is always going to require some advanced make coding. The incremental complexity added by codegen-wrapper is relatively low, so for simple code generators, enhancing them to use this utility should be reasonably straightforward.
- The codegen-wrapper tool doesn't do anything fancy with respect to knowing how to generate output file names from input file names. Instead, we just pass the actual names to it on the command line. Using the make variables $< and $@ makes this easy. Sometimes there may be multiple input files and/or multiple output files. Handling multiple input files is fairly easy. The make variable $^ contains all the prerequisites for a given target while $< contains the first prerequisite. Using $< or $^ for your input files and $@ for your output files is nice when you can get away with it because finding input files in .. (through make's VPATH feature) is handled for you automatically.
- Handling multiple output files may be a bit trickier, but it can still be done. You may need to experiment a little. Often you will find that make will pick whichever target it tries to create first as $@ and that the rule will be invoked only one time. In this case, you may have to generate your output file names yourself. Sometimes you can do this by defining them relative to $@, which you should do if at all possible. For an example of this, you can look at make/standard-code-generators.mk in your abuild distribution. This code uses codegen-wrapper for flex and bison. The bison rules generate multiple output files from a single input file and generate the multiple output names from $@ in this way.
- In our little example, the code generator was always available, so when we modified the input file, everything worked. If the code generator were not available or if it failed, codegen-wrapper would fail with the same exit status and would not update the cache.
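
To make the idea of defining output names relative to $@ concrete, here is a minimal sketch in Python. The function name and the suffixes in the usage line are hypothetical and are not taken from make/standard-code-generators.mk. The point is only that every output name is computed from the single target name that make hands to the rule.

```python
def outputs_from_target(target, primary_suffix, all_suffixes):
    """Derive every output name from the one target make hands the rule.

    target is the name make chose as $@; primary_suffix is the suffix that
    target is known to carry; all_suffixes lists the suffix of each file
    the generator writes.
    """
    if not target.endswith(primary_suffix):
        raise ValueError(f"{target!r} does not end with {primary_suffix!r}")
    stem = target[: len(target) - len(primary_suffix)]
    return [stem + suffix for suffix in all_suffixes]


# Hypothetical generator that writes a source file and a header together.
print(outputs_from_target("gram.tab.cc", ".tab.cc", [".tab.cc", ".tab.hh"]))
```

Because the names are all derived from the target, the same list can be passed to the generator and to codegen-wrapper without being written out twice.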
