20.4. Parsing Output

A principal goal of adding output capture modes, output prefixes, and error prefixes to abuild was to make abuild's output easier to parse programmatically. By combining these features, you can run abuild in batch mode and then unambiguously associate each line of its output with the specific platform build of the specific build item that produced that line.

This section describes how such a parser could be implemented. You can also find an example parser implementation in misc/parse-build-output relative to the top of your abuild distribution. (You can always find the top of the abuild distribution by running abuild --print-abuild-top.) Since a Perl script is worth a thousand words (as they say), and since the parse-build-output script is actually tested in abuild's test suite, it can serve as a tool for helping you understand the details of abuild's output as well as being a great starting point for writing your own parser.
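As a small illustration of locating the sample parser, the following Python sketch runs abuild --print-abuild-top and joins the result with misc/parse-build-output. The function names are hypothetical, and the sketch assumes that abuild is on your PATH and that the command prints nothing but the directory on standard output.

```python
import os
import subprocess


def abuild_top():
    """Ask abuild where the top of its distribution is."""
    # Assumption: `abuild` is on PATH and prints only the directory name.
    result = subprocess.run(
        ["abuild", "--print-abuild-top"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def sample_parser_path(top):
    """Build the path of the sample parser relative to the distribution top."""
    return os.path.join(top, "misc", "parse-build-output")
```

With abuild installed, sample_parser_path(abuild_top()) would give the location of the script to read alongside this section.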

When abuild performs a build, the overall build consists of a check phase, a build phase, and a summary phase. In the check phase, abuild reads and validates Abuild.conf files, performs integrity checks, and so forth. Under normal conditions, the check phase doesn't produce any output. If everything is in order at the end of the check phase, the build phase begins. Immediately before beginning the build phase, abuild always outputs the line

abuild: build starting

Immediately following the build phase, abuild outputs the line

abuild: build complete

After the build phase is complete, abuild will output a summary of any failures that may have occurred as well as a report of the total duration of the build. Parsers may use the build starting and build complete lines as shown above to demarcate the build phase.
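The demarcation described above could be sketched in Python roughly as follows. The function name split_phases is hypothetical, and the sketch assumes that any output or error prefixes have already been stripped from each line, so the two marker lines appear exactly as shown.

```python
BUILD_START = "abuild: build starting"
BUILD_END = "abuild: build complete"


def split_phases(lines):
    """Split abuild output into (check, build, summary) lists of lines.

    The two marker lines themselves are not included in any list.
    """
    check, build, summary = [], [], []
    current = check
    for line in lines:
        text = line.rstrip("\n")
        if current is check and text == BUILD_START:
            current = build        # everything after this is build output
        elif current is build and text == BUILD_END:
            current = summary      # failures and total duration follow
        else:
            current.append(text)
    return check, build, summary
```

Only the first occurrence of each marker switches phases, so a job that happened to print one of the marker strings after the build had started would not confuse the split.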

Within the build phase, output can be associated with a build item/platform pair (referred to here as a job) in the following way:

- If output/error prefixes are specified, they always precede any job prefixes generated in interleaved mode. Strip them from the beginning of each line. For this to work unambiguously, it is easiest if you use output and error prefixes of the same length.
- In interleaved mode, all lines of output that are part of a build start with a number enclosed in square brackets and followed by a single space. It is possible for some lines not to start this way, but such cases indicate an unusual error or failure condition and are discussed later in this section.
- The first line of output from a build of a given item on a given platform will always start with
  ```
  abuild: item-name (abuild-output-directory)
  ```
  possibly followed by other text or punctuation. This will always be at the beginning of the line, after removing any output, error, or job prefixes.
- In interleaved mode, the above can be parsed the first time a line appears with a given job prefix to associate the job prefix with the job.
- In buffered mode, if a line that matches the above pattern is the first line to mention a specific item/platform pair, it marks the beginning of output for that job, and all subsequent lines until a line that indicates the start of a different job or the end of the build phase belong to that job.
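The rules above might be turned into code along the following lines. This is a minimal Python sketch, not the logic of the parse-build-output script: all function names and both regular expressions are hypothetical, the prefixes are whatever you passed to abuild, and the job-start pattern assumes that item names contain no whitespace and that output directory names contain no closing parenthesis. Input is expected to be the lines of the build phase only.

```python
import re

JOB_PREFIX = re.compile(r"^\[(\d+)\] ")                 # e.g. "[12] "
JOB_START = re.compile(r"^abuild: (\S+) \(([^)]+)\)")   # item, output directory


def strip_stream_prefix(line, out_prefix, err_prefix):
    """Remove an output/error prefix; return (stream, remaining text)."""
    if out_prefix and line.startswith(out_prefix):
        return "out", line[len(out_prefix):]
    if err_prefix and line.startswith(err_prefix):
        return "err", line[len(err_prefix):]
    return None, line


def parse_interleaved(lines, out_prefix="", err_prefix=""):
    """Group interleaved build-phase lines by (item, output directory)."""
    jobs, output, unmatched = {}, {}, []
    for raw in lines:
        stream, text = strip_stream_prefix(raw.rstrip("\n"), out_prefix, err_prefix)
        m = JOB_PREFIX.match(text)
        if not m:
            unmatched.append(raw)          # no job prefix: keep for error handling
            continue
        number, body = m.group(1), text[m.end():]
        if number not in jobs:
            start = JOB_START.match(body)  # first line for this job prefix
            if not start:
                unmatched.append(raw)
                continue
            jobs[number] = (start.group(1), start.group(2))
        output.setdefault(jobs[number], []).append((stream, body))
    return output, unmatched


def parse_buffered(lines, out_prefix="", err_prefix=""):
    """Group buffered build-phase lines by (item, output directory)."""
    output, unmatched, current = {}, [], None
    for raw in lines:
        stream, text = strip_stream_prefix(raw.rstrip("\n"), out_prefix, err_prefix)
        start = JOB_START.match(text)
        if start:
            job = (start.group(1), start.group(2))
            if job not in output:          # first mention starts a new job
                current = job
        if current is None:
            unmatched.append(raw)
        else:
            output.setdefault(current, []).append((stream, text))
    return output, unmatched
```

Both functions return the lines they could not attribute to a job rather than discarding them, so that a caller can report them as general errors.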

There are a few exceptions to the above rules, but they happen only in cases of serious errors, and most parsers can safely ignore them as long as they treat unexpected input as a general error condition. (The sample parser does take these cases into consideration.) Specifically, in both buffered and interleaved mode, certain major errors from the java builder process, such as abnormal termination or “rogue output” from the java backend, can result in asynchronous output from the java builder.^[41] In interleaved builds with multiple threads, this output is prefixed with the string “[JavaBuilder] ”. In buffered builds, it is not marked in any way, but it will always appear between the uninterrupted outputs of individual jobs. Most parsers would probably associate such output with the most recently completed job, which would probably be wrong, but again, this is a very rare case. In a single-threaded build, any rogue output from the java builder process must be related to the job that is in progress, so the fact that it is unmarked doesn't pose any problems.

In any case, any line of output that doesn't match what the parser expects should be treated as a general error from abuild. Such a line indicates either a serious problem with abuild itself (such as an assertion failure or abnormal termination, probably indicating a bug in abuild or a system error) or a bug in the parser. Either way, the output should be preserved.
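For interleaved mode, one way to apply this policy is to classify every line after its output or error prefix has been removed. The Python sketch below is illustrative only: classify_line and the category names are hypothetical, while the “[JavaBuilder] ” prefix and the bracketed job number are the ones described in this section.

```python
import re

JOB_PREFIX = re.compile(r"^\[(\d+)\] ")
JAVA_BUILDER_PREFIX = "[JavaBuilder] "


def classify_line(text):
    """Return (kind, job number or None, remaining text) for one line."""
    if text.startswith(JAVA_BUILDER_PREFIX):
        # Asynchronous output from the java builder: not tied to any job.
        return "java-builder", None, text[len(JAVA_BUILDER_PREFIX):]
    m = JOB_PREFIX.match(text)
    if m:
        return "job", int(m.group(1)), text[m.end():]
    # Anything else is preserved verbatim and reported as a general error.
    return "general-error", None, text
```

Returning the unrecognized text unchanged keeps the original output available for whoever has to diagnose the failure.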

^[41] The java builder process may run multiple ant jobs in separate threads. It separates the output of different projects by creating each ant thread in a separate thread group and associating a job identifier with the thread group. There are two ways the java builder process could create rogue output: one is for an ant task to create a thread in a separate thread group and to have that thread write something to standard output or standard error, and the other is for the java builder process itself to generate output. The former case is very unlikely, and the latter case would indicate a bug in the java builder process, or a severe error such as failure of the JVM. Additionally, if the java builder process crashes, abuild will generate a message that indicates this, and that message would not be associated with any build.
