restrict, part II = x86_64 Thursday 26th July 2007

Just to make some use of my cross compiler, here’s how the assembler for x86_64 stacks up for the restrict example which I posted before.

Without restrict:

    movl    (%rdi), %eax  # eax = *a    A
    addl    (%rsi), %eax  # eax += *b   A + B
    movl    %eax, (%rdi)  # *a = eax    A + B
    movl    (%rsi), %edx  # edx = *b    B
    subl    %eax, %edx    # edx -= eax  -A
    movl    %edx, (%rsi)  # *b = edx    -A
    movl    (%rdi), %eax  # eax = *a    A + B
    subl    %edx, %eax    # eax -= edx  2A + B
    movl    %eax, (%rdi)  # *a = eax    2A + B
    addl    %eax, (%rsi)  # *b += eax   A + B

6 mov, 2 sub (2 register only), 2 add

With restrict:

    movl    (%rdi), %eax  # eax = *a    A
    movl    %eax, %edx    # edx = eax   A
    addl    (%rsi), %edx  # edx += *b   A + B
    negl    %eax          # eax = -eax  -A
    movl    %eax, (%rsi)  # *b = eax    -A
    subl    %eax, %edx    # edx -= eax  2A + B
    addl    %edx, (%rsi)  # *b += edx   A + B
    movl    %edx, (%rdi)  # *a = edx    2A + B

4 mov (1 register only), 1 sub (1 register only), 2 add, 1 neg (1 register only)

Alternate algorithm:

    movl    (%rdi), %edx       # edx = *a           A
    leal    (%rdx,%rdx), %eax  # eax = edx + edx    2A
    addl    (%rsi), %eax       # eax += *b          2A + B
    addl    %edx, (%rsi)       # *b += edx          A + B
    movl    %eax, (%rdi)       # *a = eax           2A + B

2 mov, 2 add, 1 lea.

So the instruction counts stay exactly the same for the algorithms, but as the x86_64 calling convention uses registers to pass arguments, we have no additional overhead from stack access instructions.

More obscure places to find the template keyword

Consider the following ill-formed C++ code:

class A
{
public:
    template< class U > U f();
};

template< class T , class U >
class B
{
public:
    T g()
    {
        // error: expected primary-expression before ‘>’ token
        return u.f< T >();
    }

private:
    U u;
};

template<> int A::f<int>() { return 0; }

int main()
{
    B< int, A > b;
    return b.g();
}

The class template B is “obviously” designed for use with a class (such as class A) which has a member function template f() taking no parameters. Because the template argument for f cannot be deduced, we need to provide it explicitly with the f<type> syntax. However, if we try to do that, then as f is dependent on the template parameter U of B, the compiler cannot tell whether f names a template or not. If not, the token < really could just mean “less than”. To disambiguate this, the template keyword must be used in the definition of B::g, thus:

template< class T , class U >
class B
{
public:
    T g()
    {
        return u.template f< T >();
    }

private:
    U u;
};

A lot of compilers accept the ill-formed code without the template keyword, so you might not have noticed this. Fortunately most compilers do accept the correct code as well, so there should be no reason not to be correct.

Note that it is illegal to use the template keyword in this context (member function specifier) outside of a template declaration. Outside of a template definition there are no dependent types so the compiler can always resolve whether an identifier refers to a template or a non-template, so there can be no ambiguity.

Cross-compiling gcc, glibc and all that. (Part II – the script) Wednesday 18th July 2007

So here it is, the script that does it all. Although it is a shell script, I thoroughly recommend not running it, but cutting and pasting it, section by section, into a terminal session so that you can fix up any environmental issues that cause it to fall over. There are some paths at the top of the script that you will probably want to fix before starting. I ran the whole script on a PIII system and it churned through the entire process in just under an hour.

The basic procedure is to install a “cross” binutils (gas, ld, etc.), install kernel headers and glibc headers into a fake x86_64 system root, then compile a minimal gcc cross compiler without making any target libraries, then use this to compile a glibc for the target platform, then use the resulting glibc to build a full gcc with c++ support and the required target libraries. The reason for making two gcc is that making a full gcc requires some parts of the target glibc to be built with the cross-compiler that you haven’t yet built.
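In outline, the steps look something like this (a sketch only; the paths, versions and configure options here are illustrative placeholders, not the actual script):

```shell
# Illustrative placeholders only; the real script sets its own paths.
TARGET=x86_64-unknown-linux-gnu
PREFIX=$HOME/cross
SYSROOT=$PREFIX/$TARGET/sys-root
# 1. cross binutils:  configure --target=$TARGET --prefix=$PREFIX; make; make install
# 2. headers:         install kernel and glibc headers into $SYSROOT
# 3. stage-one gcc:   C only, no target libraries (--enable-languages=c)
# 4. glibc:           configured for $TARGET, built with the stage-one gcc
# 5. full gcc:        --enable-languages=c,c++ against the freshly built glibc
echo "$TARGET"
```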

The script has six steps and three “fudges”, which I consider to be quite an achievement on the fudge count.

Important note: building glibc with nptl (the native POSIX threads library for linux) does not work with the “stage one” minimal gcc. This is a big problem as glibc-2.6 only comes with nptl, as far as I’m aware, so you are stuck. glibc-2.5 has an add-on for the “old” linuxthreads support, and you can build this with the “stage one” gcc. Once you’ve built the full gcc against the linuxthreads glibc-2.5, you can go back and build a full version of glibc-2.6 with nptl using this gcc. Once you’ve done this, if you’re feeling uncertain, you can go back and completely rebuild a full gcc on top of this glibc-2.6, just in case it makes a difference. Once you’ve done this, if you’re feeling very uncertain, you can completely rebuild glibc-2.6 with the brand new gcc, just in case it makes a difference. Once you’ve done this…

I’d really love to find a fix for getting the whole process to work without having to fall back to an old version of glibc. It feels (ha!) even more hacky than the rest of the process.

You’ll need:


Like all such documents, this list will seem out of date before I’ve posted. By all means try the process with gcc 4.4, linux 2.8, glibc 2.7 and binutils 2.18, but the fudges will probably all have to be updated before it works.

Cross-compiling gcc, glibc and all that. Monday 16th July 2007

Yippee! I finally managed to get a working x86 -> x86_64 cross-compiler set up on one of my linux boxes, with c++ and shared library support. gcc, the kernel and glibc have the most annoying set of interdependencies, so doing a ground-up build is horribly painful, consisting of several rounds of bootstrapping. Like most people who manage this, I’m so fed up with the whole process that I can’t be bothered to document what I did right now… however, I want to do it all again so that I can verify that I can do it from clean. When I do, I shall attempt to document the cleanest way to do this.

Just so this post contains one useful thing: this was the best resource that I found, so thank you to “vapier” @ “gentoo”.

make – thank you for the music Wednesday 11th July 2007

Make doesn’t have to be just for complex code-building jobs.

I have my own postgres database of the albums that I own and have ripped, mainly because I’m a bit picky about the formatting of tag metadata.

I have a script that you can point at a musicbrainz album entry; it downloads the data in xml format and inserts it into my metadata repository (utf-8, of course), from where I can tweak it as I like in a perl-based webapp which has one really cool killer feature. You can download a zip file containing a windows .cmd file and a bunch of metadata text files which, when unzipped in the same directory as a bunch of Track??.wav files, will losslessly compress all of the wav files into my music directory (under appropriately named artist and album sub-directories), tag them all with the correct metadata and finally run a “flac -t” test decompression over all the newly compressed files.

The cool thing is that I can rip a newly arrived album and start listening to it in wav format in the temporary directory into which I’ve ripped it, take the time to sort out the metadata (spacing, capitalization, special characters) at my leisure, and compress, tag and store replaygain data when I’m ready.

The one thing that has been bugging me since I set up my dual core machine is that the compression process uses only one core: flac runs as a single thread and the cmd file just runs commands sequentially. I’ve popped the lid off the flac libraries before, so I vaguely considered hacking in some multithread, multifile feature, but this seemed like hard work. I thought about getting the cmd script to spawn two separate command processors, each with half the files, but then one might finish early if the track lengths or complexity happened to be radically different. Perhaps I could spawn each compression instance in the background and run them all in parallel, but spawning a dozen processes all at once doesn’t seem very neat, and I’d have no neat way of waiting for them all to finish before running the test. Finally I realised that this is really a job for make.

So I tweaked the cmd-file-generating webapp to create a makefile instead, and now I run “make -j 2” from the working directory. Hey presto, both cores are used and the compression takes about half the time. Here’s a quick sample:

.PHONY: all test

all: test

$(MYMUSIC)/Artist/Album/01_TrackOneTitle.flac: Track01.wav
    flac $< -o $@
    metaflac --remove-all-tags --no-utf8-convert --import-tags-from=track01.txt $@

$(MYMUSIC)/Artist/Album/02_TrackTwoTitle.flac: Track02.wav
    flac $< -o $@
    metaflac --remove-all-tags --no-utf8-convert --import-tags-from=track02.txt $@

# etc, etc, ...

OUTFILES := $(MYMUSIC)/Artist/Album/01_TrackOneTitle.flac \
# etc, etc, ...

test: $(OUTFILES)
    flac -t $(OUTFILES)

You can add tags with the flac command line, but I use metaflac and text files because the command line doesn't handle utf-8 so well; it's more reliable to generate tag import files and tell metaflac to import them straight into the vorbis comment block of the flac file without translation. It just works, and now at twice the speed.

Building code (Part II) – dependency generation Wednesday 27th June 2007

Automatic dependency generation can make a huge difference to productivity. If you have a large project, then building every source file, every time, in a code-and-fix cycle can grind the process to a halt. Likewise, if your build process doesn’t rebuild any object file that already exists, or only rebuilds it when the corresponding source file has been updated, without taking into account updated header files, then you can end up chasing phantom bugs due to incompatible object files. To get around this, without taking the expensive hit of a full rebuild, you tend to end up manually deleting the groups of object files which you think are affected and attempting an incremental build.

A working autodependency system should make incremental builds as minimal as possible, but no more minimal than that. Every time you hit make, everything that should be rebuilt is, and there is no manual upkeep of complex source file dependencies.

For some time now, many compilers have provided an alternative preprocessing switch which, instead of outputting the normal preprocessed code, outputs a makefile fragment describing the object file’s dependencies on the source file and all the header files it includes, both directly and indirectly. This fragment, which contains dependency-only rules (i.e. rules that do not specify a build command), can then be included in a larger makefile to form a functioning makefile with complete dependencies.

gcc has the -M switch, which works as described, and the -MM switch, which works similarly but omits system header files. I tend to favour the latter, since system header files change infrequently and you usually know when they have changed (e.g. after a major system upgrade). When such an event occurs, usually every file in the project is outdated anyway, so a manual clean is no particular hardship. The generated makefiles without the system header files are usually a lot more compact.

For a file test.c that includes test.h but no other non-system header files, you usually get a rule in the generated test.d makefile which is something like this:

test.o: test.c test.h

This is exactly what is required so usually you place a rule in the project makefile along the lines of:

test.d: test.c
   gcc -MM -o test.d test.c

include test.d

Due to the magic way make works, make will spot that while it can’t directly include test.d as it doesn’t yet exist, there is a rule to make it. Make will then make it – and any other included makefile that is not up to date that it has a rule for making – and restart the original makefile parsing step so that it can now include this makefile.
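This restart behaviour is easy to see in isolation. A minimal sketch (assumes GNU make; the makefile here is hypothetical, not part of the project template):

```shell
# A makefile that includes a fragment which does not exist yet,
# but which also contains a rule for generating that fragment.
printf 'include generated.mk\n\nall:\n\t@echo built $(VALUE)\n\ngenerated.mk:\n\t@echo VALUE := 42 > $@\n' > Makefile
# First run: make warns that generated.mk is missing (on stderr),
# makes it, restarts the parse, and only then builds "all".
make -s all 2>/dev/null
# prints: built 42
```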

This is all well and good: when you change test.h to #include “test2.h”, make knows that test.o is out of date and needs to be rebuilt. The problem is that there is now an indirect dependency from test.o to test2.h, but test.d has not been rebuilt to reflect this. Previously, in the dark ages, the standard technique was to change the rule from generating the test.d file directly to passing the output through a really ugly sed script instead. The script would replace the occurrences of ‘test.o’ with ‘test.d test.o’, so that test.d had the same dependencies as test.o and was correctly updated whenever the dependencies of test.o were updated. (What makes the sed script really ugly is usually the fact that it is defined in a pattern rule such as %.d: %.c and has to work with the make automatic variables like $@ and $< as well as regular expression syntax to do its magic. Sometimes the rule uses a temporary file for the original compiler dependency output; sometimes you are able to get the compiler to filter it straight into sed.)

The final thing that always used to irritate me about the sed script is that usually the compiler generates a makefile where the lines are all kept neatly to 80 column lines and line continuation syntax (‘\’ newline) is used to avoid line wrapping. Passing this through a sed script adding a dependency almost always causes the first line to wrap. Who cares? Nobody reads automatically generated dependency makefiles and it doesn’t affect their functionality! I know it shouldn’t matter, but I don’t need to look at them, I know that they’re badly formatted makefiles sitting there and it irritates me.

Fortunately, with modern gcc, there is a better way. You can use -MF to specify the output file to write the dependency rule and then you can use either -MT or -MQ to specify what you want to appear as targets to the generated dependency rule in the makefile that is written out. In our case we want the following:

gcc -MM -MF test.d -MT test.o -MT test.d test.c

The -MQ option does the same thing as -MT but automatically escapes any characters that are special to make, so that when the resulting makefile is parsed by make everything reads correctly. Sounds like a really useful feature, but none of my source files have a $ in their name, so I haven’t really found a need for it.

The final icing on the cake with gcc is that you can use the -MD and -MMD options. These do exactly the same thing as the -M and -MM options except that they don’t stop the rest of the processing, so you can go on and complete the compilation of the source file in question. In effect, they make the dependency generation a side effect of the compile step.

test.o test.d: test.c
    gcc -MMD -MT test.d -c -o test.o test.c

In this case, gcc automatically includes an implied -MQ with whatever the output file is, so any other -MQ and -MT options are added to this.

In practice, I don’t really like this last form of dependency generation. Every time you run make all the *.d files are made up to date (they are included in the overall makefile so they must be remade before any specified targets are built). As the *.d files are a side effect of compilation, this means that all the out of date object files are automatically remade. In effect, compilation becomes a side effect of dependency generation. This has the slightly bizarre effect that if you have no generated dependencies and no object files (a build from ‘distclean’), then running a ‘make clean’ – just to be sure – compiles all your object files and generates all your dependencies and then immediately deletes all the object files again.

So I use the explicit “dependency only” method to generate dependency makefiles.

[Note: For simplicity I have only really described explicit rules. Typically dependency makefiles are made through pattern rules of the form %.d: %.c, using automatic variables such as $@ in the command syntax. Also for simplicity I have left out the compiler preprocessor and include options. For reliable dependency generation these should be identical to the options used by the compile step itself.]
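Putting that note into practice, the explicit rule from earlier generalises to a pattern rule along these lines (a sketch; CPPFLAGS stands in for whatever preprocessor and include options the compile step uses):

```make
%.d: %.c
	gcc -MM $(CPPFLAGS) -MF $@ -MT $*.o -MT $@ $<
```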

Building Code (Part I) – Makefiles Tuesday 26th June 2007

Makefiles are one of the things that I don’t like spending a lot of time on: they’re there to support writing software, so they should be easy and simple to work with, and they should “just work”.

So naturally, I’ve spent a lot of time working on a setup that “just works”.

There are a number of approaches to make systems which have various strengths and weaknesses.

One approach is to have the makefiles generated by a higher-level system, usually written in a scripting language. Makefile syntax isn’t a programming language, and if you attempt to do too much programming with it you can end up fighting against the tool rather than working with it, so this initially seems like an attractive idea. The downside is that you have to “regenerate” the makefiles, which can be slower than necessary and, being outside of make, it is not easy to check when the makefiles need to be regenerated or to regenerate them incrementally. Some systems also generate makefiles which have poor dependency support, or which use make in a recursive fashion. Most systems are more restrictive than using make directly, as they enforce a pattern. This can be considered a strength, as not following a set pattern can lead to a maintenance nightmare, but the downside is that you can end up fighting the system to do anything slightly out of the ordinary.

Oh, and if you haven’t read Peter Miller’s paper “Recursive Make Considered Harmful”, then I thoroughly recommend it.

The alternative is to manage one or more makefiles manually. Admittedly, makefiles can grow into unmaintainable monsters if they are not carefully managed, so the usual way to manage this is by using a project “template” with well-defined places where component-specific files and settings are specified. This is my preferred approach, but getting the template correct is the trick.

There are a number of features that I would like to see in a makefile system.

  • Dependencies for C and C++ sources should be generated automatically.
  • It must be easy to add new projects and new files to existing projects.
  • It needs to support alternative source files and generated source files (e.g. assembler files, [f]lex and yacc/bison files).
  • It must be possible to build different configurations from the same source. Which implies…
  • It must build automatically into a separate build directory, so .o files shouldn’t appear in the same directories as .c files.

Is it possible to do all this with a “templated” make system?

(Hint: The answer is yes.)

Ramble on Monday 25th June 2007

Well the upgrade to WordPress 2.2.1 seemed to go relatively harmlessly and while I was at it I set up a cron job to backup my mysql database (not my choice, no postgresql option) to my home computer.

ssh is a very powerful tool. My backup across the network is a really simple shell script along the lines of:

FNAME=${HOME}/mysqlbackup_`date +%F_%H_%M_%S`.sql.bz2
ssh myispaccount "mysqldump -h ispdbserver -u ispusername -pmindyourown \
    mydbname | bzip2 -9" >${FNAME} 2>/dev/null

ssh makes things really easy: when you specify a command, the things you want to happen to stdout and stdin actually happen. So in this example, bzip2 is part of the command sent to the server, so the bzipping happens at the remote end, and ssh forwards the stdout of the remote command back to the local machine, where I redirect it straight to a file. If I wanted, I could have bunzipped it at the local end, but there’s no need. Having the backup saved as a timestamped bzip2ed sql file suits my needs.

Try this for funky mirroring:

ssh remotehost "tar -c -C remotefolder . | bzip2 -9" | tar -x -j -C localfolder

There’s a slight issue with tar and ssh which means that when you use a compression filter as a tar option (-j or -z) on the remote side it seems to send some trailing garbage at the end, rounding the communication up to some minimum block size. The untar deals quite happily with it, but bzip2 does warn. If you use a pipe at the remote end, this doesn’t seem to happen.
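The pipe form is easy to exercise locally, without ssh (a sketch; `-f -` is spelled out explicitly here rather than relying on tar’s default archive device):

```shell
# Build a tiny source tree, then mirror it through a tar | bzip2 pipe.
mkdir -p srcdir dstdir
echo hello > srcdir/file.txt
tar -c -C srcdir . | bzip2 -9 | tar -x -j -f - -C dstdir
cat dstdir/file.txt
# prints: hello
```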

restrict, yes it does make a difference Friday 22nd June 2007

restrict is a new keyword in C99 (ISO/IEC 9899:1999) designed to enable the compiler to make more optimizations in response to programmers’ guarantees. You can read the full meaning of the keyword in the standard but an oversimplification would be that if a pointer is marked as restrict then it is a guarantee that the objects referred to through that pointer are not accessible in any other way in the block containing the pointer’s scope.

Here’s an example. Suppose we have a function that takes a pointer to two ints which are consecutive members of the Fibonacci sequence and the function is required to change the values of the two ints to the next distinct pair of numbers in the sequence. In other words, if the initial terms are A and B, with A being the later member of the sequence, then the next two terms are A + B and (A + B) + A = 2A + B. You might do this as follows:

void fibincr2(int *a, int *b)
{
    *a += *b; // *a contains A + B
    *b -= *a; // *b contains -A
    *a -= *b; // *a contains 2A + B
    *b += *a; // *b contains A + B
}

Here is the assembly generated by gcc -O3 -S -std=c99 -pedantic -Wall -fomit-frame-pointer with my comments added. I’ve taken A and B to be the initial values of *a and *b and put the value of the result of each instruction in the right hand column in terms of A and B.

Note where we reload eax from *a, despite the fact that eax was stored to *a earlier and eax was not changed in the intervening code.

    pushl   %ebx            # Save ebx register
    movl    8(%esp), %ebx   # ebx is a (type int*)
    movl    12(%esp), %ecx  # ecx is b
    movl    (%ebx), %eax    # eax = *a      A
    addl    (%ecx), %eax    # eax += *b     A + B
    movl    %eax, (%ebx)    # *a = eax      A + B
    movl    (%ecx), %edx    # edx = *b      B
    subl    %eax, %edx      # edx -= eax    -A
    movl    %edx, (%ecx)    # *b = edx      -A
    movl    (%ebx), %eax    # eax = *a      A + B (so long as a and b are distinct)
    subl    %edx, %eax      # eax -= edx    2A + B
    movl    %eax, (%ebx)    # *a = eax      2A + B
    addl    %eax, (%ecx)    # *b += eax     A + B
    popl    %ebx            # restore ebx register

Of course, this all breaks down if a and b point at the same int, but the function is useless in that case anyway, so we may as well state the restriction.

void fibincr2(int * restrict a, int * restrict b)
{
    *a += *b; // *a contains A + B
    *b -= *a; // *b contains -A
    *a -= *b; // *a contains 2A + B
    *b += *a; // *b contains A + B
}

And now the new assembly:

    pushl   %ebx            # Save ebx register
    movl    8(%esp), %ebx   # ebx is a (type int*)
    movl    12(%esp), %ecx  # ecx is b
    movl    (%ebx), %eax    # eax = *a      A
    movl    %eax, %edx      # edx = eax     A
    negl    %eax            # eax = -eax    -A
    addl    (%ecx), %edx    # edx += *b     A + B
    movl    %eax, (%ecx)    # *b = eax      -A
    subl    %eax, %edx      # edx -= eax    2A + B
    addl    %edx, (%ecx)    # *b += edx     A + B
    movl    %edx, (%ebx)    # *a = edx      2A + B
    popl    %ebx            # restore ebx register

Note that, now the pointers are marked restrict, the compiler no longer has to re-read *a after a write to *b if it already has the result of a previous read of (or write to) *a in a register, as the compiler now knows that the write to *b cannot have affected *a.

So previously, ignoring the initial load of parameters, we had six “mov” (one register only), two “add” (both involving memory) and two “sub” (both register only).

After adding restrict we now have four “mov” (one register only), two “add” (both involving memory), one “sub” and one “neg” (both register only). In other words, we’ve saved two memory access instructions in a ten instruction algorithm. Go us.

Of course, in this case, the use of a temporary would produce even more efficient code.

void fibincr2(int * restrict a, int * restrict b)
{
    int c = *a;   //  c contains A
    *a += c + *b; // *a contains 2A + B
    *b += c;      // *b contains A + B
}

And the generated assembly:

    pushl   %ebx               # save ebx register
    movl    8(%esp), %ebx      # ebx is a (type int*)
    movl    12(%esp), %edx     # edx is b
    movl    (%ebx), %ecx       # ecx = *a           A
    leal    (%ecx,%ecx), %eax  # eax = ecx + ecx    2A
    addl    (%edx), %eax       # eax += *b          2A + B
    addl    %ecx, (%edx)       # *b += ecx          A + B
    movl    %eax, (%ebx)       # *a = eax           2A + B
    popl    %ebx               # restore ebx register

Two “mov”, two “add” and a “lea”. Go us.

The example function and original algorithm was chosen for this entry because it exhibits behaviour that is affected by using the restrict keyword and because the function performs a readily comprehensible action that isn’t stretching the bounds of “plausibly useful” too much. The final algorithm shows that this particular problem can be solved in a more efficient way, but I was struggling to find an example that was succinct enough without being overly artificial.

Two things. Oh, and one other thing. Wednesday 20th June 2007

There are two things you need to remember about smart pointers. First, they are not pointers, and second, they are not smart.

One thing you should know about typedefs. They don’t define new types.