restrict, yes it does make a difference Friday 22nd June 2007
restrict
is a new keyword in C99 (ISO/IEC 9899:1999) designed to enable the compiler to make more optimizations in response to programmers’ guarantees. You can read the full meaning of the keyword in the standard but an oversimplification would be that if a pointer is marked as restrict then it is a guarantee that the objects referred to through that pointer are not accessible in any other way in the block containing the pointer’s scope.
Here’s an example. Suppose we have a function that takes a pointer to two ints which are consecutive members of the Fibonacci sequence and the function is required to change the values of the two ints to the next distinct pair of numbers in the sequence. In other words, if the initial terms are A and B, with A being the later member of the sequence, then the next two terms are A + B and (A + B) + A = 2A + B. You might do this as follows:
void fibincr2(int *a, int *b) { *a += *b; // *a contains A + B *b -= *a; // *b contains -A *a -= *b; // *a contains 2A + B *b += *a; // *b contains A + B }
Here is the assembly generated by gcc -O3 -S -std=c99 -pedantic -Wall -fomit-frame-pointer
with my comments added. I’ve taken A and B to be the initial values of *a and *b and put the value of the result of each instruction in the right hand column in terms of A and B.
Note where we reload eax from *a despite the fact that eax was stored to *a earlier and eax was not change in the intervening code.
fibincr2: pushl %ebx # Save ebx register movl 8(%esp), %ebx # ebx is a (type int*) movl 12(%esp), %ecx # ecx is b movl (%ebx), %eax # eax = *a A addl (%ecx), %eax # eax += *b A + B movl %eax, (%ebx) # *a = eax A + B movl (%ecx), %edx # edx = *b B subl %eax, %edx # edx -= eax -A movl %edx, (%ecx) # *b = edx -A movl (%ebx), %eax # eax = *a A + B (so long as a and b are distinct) subl %edx, %eax # eax -= edx 2A + B movl %eax, (%ebx) # *a = eax 2A + B addl %eax, (%ecx) # *b += eax A + B popl %ebx # restore ebx register ret
Of course, this all breaks down when a and b are pointing at the same int, but then the function is useless in this case so we may as well state this.
void fibincr2(int * restrict a, int * restrict b) { *a += *b; // *a contains A + B *b -= *a; // *b contains -A *a -= *b; // *a contains 2A + B *b += *a; // *b contains A + B }
And now the new assembly:
fibincr2: pushl %ebx # Save ebx register movl 8(%esp), %ebx # ebx is a (type int*) movl 12(%esp), %ecx # ecx is b movl (%ebx), %eax # eax = *a A movl %eax, %edx # edx = eax A negl %eax # eax = -eax -A addl (%ecx), %edx # edx += *b A + B movl %eax, (%ecx) # *b = eax -A subl %eax, %edx # edx -= eax 2A + B addl %edx, (%ecx) # *b += edx A + B movl %edx, (%ebx) # *a = edx 2A + B popl %ebx # restore ebx register ret
Note that now the pointers are marked as restrict
the compiler no longer has to generate a read for the *a after write to *b if it already has the result of a previous read (or write to) *a in a register, as the compiler now knows that the write to *b cannot have affected *a.
So previously, ignoring the initial load of parameters, we had six “mov” (one register only), two “add” (both involving memory) and two “sub” (both register only).
After adding restrict we now have four “mov” (one register only), two “add” (both involving memory), one “sub” and one “neg” (both register only). In other words we’ve save two memory access instructions in a ten instruction algorithm. Go us.
Of course, in this case, the use of a temporary would produce even more efficient code.
void fibincr2(int * restrict a, int * restrict b) { int c = *a; // c contains A *a += c + *b; // *a contains 2A + B *b += c; // *b contains A + B }
fibincr2: pushl %ebx # save ebx register movl 8(%esp), %ebx # ebx is a (type int*) movl 12(%esp), %edx # edx is b movl (%ebx), %ecx # ecx = *a A leal (%ecx,%ecx), %eax # eax = ecx + ecx 2A addl (%edx), %eax # eax += *b 2A + B addl %ecx, (%edx) # *b += ecx A + B movl %eax, (%ebx) # *a = eax 2A + B popl %ebx # restore ebx register ret
Two “mov”, two “add” and a “lea”. Go us.
Addendum
The example function and original algorithm was chosen for this entry because it exhibits behaviour that is affected by using the restrict keyword and because the function performs a readily comprehensible action that isn’t stretching the bounds of “plausibly useful” too much. The final algorithm shows that this particular problem can be solved in a more efficient way, but I was struggling to find an example that was succinct enough without being overly artificial.