Building code (Part II) – dependency generation
Wednesday 27th June 2007
Automatic dependency generation can make a huge difference to productivity. If you have a large project, then building every source file every time round a code-and-fix cycle can grind the process to a halt. Likewise, if your build process doesn’t rebuild any object file that already exists, or only rebuilds it when the corresponding source file has been updated without taking into account updated header files, then you can end up chasing phantom bugs due to incompatible object files. To get around this without taking the expensive hit of a full rebuild, you tend to end up manually deleting the groups of object files that you think are affected and then attempting an incremental build.
A working autodependency system should make incremental builds as minimal as possible, but no more minimal than that. Every time you hit make, everything that should be rebuilt is, and there is no manual upkeep of complex source file dependencies.
For some time now, many compilers have provided an alternative preprocessing switch which, instead of outputting the normal preprocessed code, outputs a makefile fragment which describes the object file dependencies on the source file and all the header files which are included, both directly and indirectly. This fragment, which contains dependency only rules (i.e. they do not specify a build command) can then be included in a larger makefile to form a functioning makefile with complete dependencies.
gcc has the -M switch, which works as described, and the -MM switch, which works similarly but omits system header files. I tend to favour the latter since system header files change infrequently and you usually know when they have (e.g. a major system upgrade). When such an event occurs, usually every file in the project is outdated anyway, so a manual clean is no particular hardship. The generated makefiles without the system header files are usually a lot more compact.
For a file test.c that includes test.h but no other non-system header files, you usually get a rule in the generated test.d makefile which is something like this:
test.o: test.c test.h
This is exactly what is required so usually you place a rule in the project makefile along the lines of:
test.d: test.c
	gcc -M -o test.d test.c

include test.d
Due to the magic way make works, make will spot that while it can’t directly include test.d as it doesn’t yet exist, there is a rule to make it. Make will then make it – and any other included makefile that is not up to date that it has a rule for making – and restart the original makefile parsing step so that it can now include this makefile.
This is all well and good: when you change test.h to #include “test2.h”, make knows that test.o is out of date and needs to be rebuilt. The problem is that test.o now depends indirectly on test2.h, but nothing has caused test.d to be regenerated to reflect this, so a later change to test2.h alone will not trigger a rebuild. Previously, in the dark ages, the standard technique was to change the rule so that, instead of generating the test.d file directly, it passed the compiler output through a really ugly sed script. The script would replace the occurrences of ‘test.o’ with ‘test.d test.o’ so that test.d had the same dependencies as test.o and was correctly updated when the dependencies of test.o were updated. (What makes the sed script really ugly is usually the fact that it is defined in a pattern rule such as %.d: %.c and has to work with the make automatic variables like $@ and $< as well as regular expression syntax to do its magic. Sometimes the rule uses a temporary file for the original compiler dependency output; sometimes you are able to get the compiler to pipe it straight into sed.)
The final thing that always used to irritate me about the sed script is that usually the compiler generates a makefile where the lines are all kept neatly to 80 columns and line continuation syntax (‘\’ newline) is used to avoid line wrapping. Passing this through a sed script that adds a dependency almost always causes the first line to wrap. Who cares? Nobody reads automatically generated dependency makefiles and it doesn’t affect their functionality! I know it shouldn’t matter, and I don’t need to look at them, but I know that they’re badly formatted makefiles sitting there and it irritates me.
Fortunately, with modern gcc, there is a better way. You can use -MF to specify the output file to write the dependency rule to, and then you can use either -MT or -MQ to specify what you want to appear as the targets of the generated dependency rule in the makefile that is written out. In our case we want the following:
gcc -MM -MF test.d -MT test.o -MT test.d test.c
The -MQ option does the same thing as -MT but automatically escapes any characters that are special to make, so that when the resulting makefile is parsed by make everything reads correctly. Sounds like a really useful feature, but none of my source files have a $ in their name, so I haven’t really found a need for it.
The final icing on the cake with gcc is that you can use the -MD and -MMD options. These do exactly the same thing as the -M and -MM options except that they don’t stop the rest of the processing, so you can go on and complete the compilation of the source file in question. In effect, they make the dependency generation a side effect of the compile step.
test.o test.d: test.c
	gcc -MMD -MT test.d -c -o test.o test.c
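In this case, gcc automatically includes an implied -MQ with whatever the output file is, so any other -MQ and -MT options are added to this. This may depend on the gcc version: some releases appear to add the implied target only when no -MT or -MQ option is given, so naming every target explicitly is the safer choice.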
In practice, I don’t really like this last form of dependency generation. Every time you run make, all the *.d files are made up to date (they are included in the overall makefile so they must be remade before any specified targets are built). As the *.d files are a side effect of compilation, this means that all the out of date object files are automatically remade. In effect, compilation becomes a side effect of dependency generation. This has the slightly bizarre effect that if you have no generated dependencies and no object files (a build from ‘distclean’), then running a ‘make clean’ – just to be sure – compiles all your object files and generates all your dependencies and then immediately deletes all the object files again.
So I use the explicit “dependency only” method to generate dependency makefiles.
[Note: For simplicity I have only really described explicit rules. Typically dependency makefiles are made through pattern rules of the form %.d: %.c using automatic variables such as $@ in the command syntax. Also for simplicity I have left out the compiler preprocessor and include options. For reliable dependency generation these should be identical to the options used in the compile step itself.]