Implementation of operator+
Monday 18th May 2009

Previously I’ve written about the performance implications of passing and returning values. It turns out that it needn’t be too expensive and that passing temporaries or initialising objects from function returns need not entail object copies.

class Num
{
public:
    Num( const Num& );
    Num& operator=( const Num& );
    Num& operator+=( const Num& );

private:
    long _data[4];
};

Here’s a test class that might represent some sort of number. As discussed, here’s one possible implementation of operator+.

Num operator+( Num l, const Num& r )
{
        return l += r;
}

It’s probably about the shortest implementation in number of characters and, superficially, it looks quite good on keeping object copies to a minimum. There’s a single return statement so the compiler should be able to use the ‘return value optimization’ to avoid the copy on return. We also need to copy (or at least construct a new object) at least once, and that happens when we copy the parameter l.

There are, however, some subtleties.

Here’s a function that uses operator+ in a reasonably simple way that shouldn’t add any extra copies:

void TakeNum( const Num& r ) throw();

void F( const Num& l, const Num& r )
{
        TakeNum( l + r );
}

I’ll show the assembler for the calling routine and for operator+ so that we can account for all the copies necessary in calling operator+. I’ve ‘demangled’ the function names so while it’s no longer valid assembler, it is at least human-readable. I also took the liberty of pruning some alignment directives and exception handling labels. They’re very important, but not relevant to this discussion.

Caller:

F(Num const&, Num const&):
        movq    %rbx, -24(%rsp)
        movq    %rbp, -16(%rsp)
        movq    %r12, -8(%rsp)
        subq    $88, %rsp
        movq    %rsi, %r12
        leaq    32(%rsp), %rbp
        movq    %rdi, %rsi
        movq    %rbp, %rdi
        call    Num::Num(Num const&)
        movq    %r12, %rdx
        movq    %rbp, %rsi
        movq    %rsp, %rdi
        call    operator+(Num, Num const&)
        movq    %rsp, %rdi
        call    TakeNum(Num const&)
        movq    64(%rsp), %rbx
        movq    72(%rsp), %rbp
        movq    80(%rsp), %r12
        addq    $88, %rsp
        ret

OK, so we’re calling a copy constructor and then calling operator+, then we can call ‘TakeNum’ on the return value. This makes sense as the first parameter is taken by value so we need to copy it.

Callee:

operator+(Num, Num const&):
        pushq   %rbx
        movq    %rdi, %rbx
        movq    %rsi, %rdi
        movq    %rdx, %rsi
        call    Num::operator+=(Num const&)
        movq    %rbx, %rdi
        movq    %rax, %rsi
        call    Num::Num(Num const&)
        movq    %rbx, %rax
        popq    %rbx
        ret

Well, we have a second copy inside operator+, in addition to the one made outside the function to construct the first parameter. This isn’t ideal. We know that we just want to return the first parameter as the return value, but the caller arranged the parameter copy and doesn’t know that we want it constructed in the return value’s storage, so we can’t avoid this extra copy.
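One way to check this count without reading assembler is to instrument the copy constructor. The following is a minimal sketch, not the code that was measured above: CountedNum, its copies counter and the two helper functions are hypothetical names, and a single long stands in for the data held by Num.

```cpp
// Hypothetical stand-in for Num that counts copy constructions.
struct CountedNum
{
    static int copies;      // copy-constructor calls since the last reset
    long value;

    explicit CountedNum( long v ) : value( v ) {}
    CountedNum( const CountedNum& o ) : value( o.value ) { ++copies; }
    CountedNum& operator+=( const CountedNum& r )
    {
        value += r.value;
        return *this;
    }
};

int CountedNum::copies = 0;

// Same shape as the first operator+: left operand taken by value.
CountedNum operator+( CountedNum l, const CountedNum& r )
{
    return l += r;  // copy-constructs from the reference operator+= returns
}

// Copies made by one addition when the left operand is a named object.
// Returns -1 if the sum itself is wrong.
int CopiesForLvalue()
{
    CountedNum a( 1 ), b( 2 );
    CountedNum::copies = 0;
    const CountedNum& sum = a + b;  // bind the result without copying it
    return sum.value == 3 ? CountedNum::copies : -1;
}

// Copies made by one addition when the left operand is a temporary.
// Returns -1 if the sum itself is wrong.
int CopiesForTemporary()
{
    CountedNum b( 2 );
    CountedNum::copies = 0;
    const CountedNum& sum = CountedNum( 1 ) + b;
    return sum.value == 3 ? CountedNum::copies : -1;
}
```

CopiesForLvalue() should report 2: one copy for the by-value parameter and one for the return. CopiesForTemporary() typically reports 1, because the parameter can be constructed directly from the temporary (this is guaranteed from C++17 onwards and commonly done by compilers before that), leaving only the copy on return.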

Perhaps we should go back to taking the first parameter by const reference again so that we can at least control where we make the copy.

Take 2:

Num operator+( const Num& l, const Num& r )
{
        return Num( l ) += r;
}

Caller (same code, different signature for operator+):

F(Num const&, Num const&):
        pushq   %rbx
        movq    %rsi, %rdx
        movq    %rdi, %rsi
        subq    $32, %rsp
        movq    %rsp, %rdi
        call    operator+(Num const&, Num const&)
        movq    %rsp, %rdi
        call    TakeNum(Num const&)
        addq    $32, %rsp
        popq    %rbx
        ret

That’s good: we now have no copies occurring outside operator+, just the call to operator+ and the TakeNum call.

Callee:

operator+(Num const&, Num const&):
        movq    %rbx, -24(%rsp)
        movq    %rbp, -16(%rsp)
        movq    %rdi, %rbx
        movq    %r12, -8(%rsp)
        subq    $56, %rsp
        movq    %rdx, %r12
        movq    %rsp, %rdi
        call    Num::Num(Num const&)
        movq    %r12, %rsi
        movq    %rsp, %rdi
        call    Num::operator+=(Num const&)
        movq    %rbx, %rdi
        movq    %rax, %rsi
        call    Num::Num(Num const&)
        movq    %rbx, %rax
        movq    40(%rsp), %rbp
        movq    32(%rsp), %rbx
        movq    48(%rsp), %r12
        addq    $56, %rsp
        ret

OK, we’ve still got two copies going on. First the new temporary is constructed from l, then, after calling +=, we’re copying the return value again. What’s going on here?

Look carefully at the definition of operator+=. It returns a reference to a Num. We know it returns a reference to its left-hand operand, or it really ought to, but nothing in the function signature actually specifies this. For all the information in the signature, it might return a reference to a completely unrelated Num, and the compiler is not allowed to ‘take a chance’.
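To see why the compiler has to be cautious, here is a deliberately perverse sketch. Odd and g_unrelated are hypothetical names that exist only for this illustration; no sensible class would behave like this, but nothing in the signature of operator+= forbids it.

```cpp
// Hypothetical type whose operator+= returns a reference to an
// unrelated object instead of *this.
struct Odd
{
    long value;
    Odd& operator+=( const Odd& r );
};

Odd g_unrelated = { -1 };   // hypothetical global, for illustration only

Odd& Odd::operator+=( const Odd& r )
{
    value += r.value;
    return g_unrelated;     // legal: matches the signature, but is not *this
}

Odd operator+( const Odd& l, const Odd& r )
{
    // The return value has to be copy-constructed from whatever object
    // operator+= refers to. Here that is g_unrelated, not the temporary,
    // so constructing the temporary in the return slot would be wrong.
    return Odd( l ) += r;
}
```

With this definition, adding an Odd holding 1 to an Odd holding 2 yields an Odd holding -1. The compiler can only rule this out if it can see the body of operator+=, which it could not in the listing above, where operator+= is an out-of-line call.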

We know that we just want to return the copy of the parameter that we just made, so let’s make that explicit.

Take 3:

Num operator+( const Num& l, const Num& r )
{
    Num n( l );
    n += r;
    return n;
}

The calling code remains the same as we haven’t changed the signature of operator+ this time.

Callee:

operator+(Num const&, Num const&):
        movq    %rbx, -16(%rsp)
        movq    %rbp, -8(%rsp)
        movq    %rdi, %rbx
        subq    $24, %rsp
        movq    %rdx, %rbp
        call    Num::Num(Num const&)
        movq    %rbp, %rsi
        movq    %rbx, %rdi
        call    Num::operator+=(Num const&)
        movq    %rbx, %rax
        movq    16(%rsp), %rbp
        movq    8(%rsp), %rbx
        addq    $24, %rsp
        ret

That’s better! We just have the single copy. In the use of operator+ we’ve only had to construct one new Num object which is as minimal as it gets, having been passed two const references to the incoming Num values.
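The difference between the last two versions can also be observed by counting copy-constructor calls rather than reading assembler. This is a rough sketch under assumptions: CountedNum, AddTake2, AddTake3 and CountCopies are hypothetical names (the two versions share a signature, so they cannot both be called operator+), and a single long stands in for the data held by Num.

```cpp
// Hypothetical stand-in for Num that counts copy constructions.
struct CountedNum
{
    static int copies;      // copy-constructor calls since the last reset
    long value;

    explicit CountedNum( long v ) : value( v ) {}
    CountedNum( const CountedNum& o ) : value( o.value ) { ++copies; }
    CountedNum& operator+=( const CountedNum& r )
    {
        value += r.value;
        return *this;
    }
};

int CountedNum::copies = 0;

// Take 2: returns the reference produced by operator+=, so the return
// value is always copy-constructed from it.
CountedNum AddTake2( const CountedNum& l, const CountedNum& r )
{
    return CountedNum( l ) += r;
}

// Take 3: returns a named local, which lets the compiler construct n
// directly in the return value (named return value optimization).
CountedNum AddTake3( const CountedNum& l, const CountedNum& r )
{
    CountedNum n( l );
    n += r;
    return n;
}

// Counts the copies made by one call of the given function.
// Returns -1 if the sum itself is wrong.
int CountCopies( CountedNum (*add)( const CountedNum&, const CountedNum& ) )
{
    CountedNum a( 1 ), b( 2 );
    CountedNum::copies = 0;
    const CountedNum& sum = add( a, b );  // bind the result without copying
    return sum.value == 3 ? CountedNum::copies : -1;
}
```

CountCopies( AddTake2 ) should report 2. CountCopies( AddTake3 ) reports 1 when the compiler applies the named return value optimization, which mainstream compilers normally do; the optimization is permitted rather than required, so a result of 2 is also conforming.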

Sometimes it pays to be explicit over being succinct.

2 Comments
Andy Balaam May 19th, 2009

I must say I much prefer Take 3 over the original implementation, just for its clarity (to me). I could easily miss an & and be confused.

Andy Balaam May 19th, 2009

Or rather, miss the lack of an &.