Don’t try for exception safety Thursday 20th December 2007
Achieve it!
“Do or do not, there is no try” – Yoda, on strong exception safety.
I’ve decided that I like exceptions a lot more now that I know how to use them to my advantage. Take the following piece of code:
#include <cstring>

using std::memcmp;

size_t DoConversion( char* dst, size_t dst_len, const char* src, size_t src_len );
size_t DoConversion2( char* dst, size_t dst_len, const char* src, size_t src_len );

void test_conv( const char* test_data, size_t test_len )
{
    size_t bufsize = DoConversion( NULL, 0, test_data, test_len );
    char* buffer = new char[ bufsize ];
    DoConversion( buffer, bufsize, test_data, test_len );

    size_t bufsize2 = DoConversion2( NULL, 0, buffer, bufsize );
    if( bufsize2 != bufsize )
        throw TestFailed();

    char* buffer2 = new char[ bufsize2 ];
    DoConversion2( buffer2, bufsize2, buffer, bufsize );

    if( memcmp( buffer, buffer2, bufsize ) != 0 )
        throw TestFailed();

    delete[] buffer2;
    delete[] buffer;
}
Assume that DoConversion and DoConversion2 are “traditional” character string conversion functions. They take a source and destination buffer which they don’t memory manage and convert one to the other. If you supply a null destination buffer then they will tell you how big the destination buffer would have to be to complete the conversion without actually performing the conversion. Assume that they are less traditional in that they may throw a BadThing exception if something doesn’t work.
The test_conv function is obviously not exception safe, and in multiple ways. Trying to make it exception safe in a naive way – by adding some try/catch pairs – is verbose and error prone. I came up with this, but I have low confidence in the result (and I actually know of one definite reason why it is not exception safe).
void test_conv( const char* test_data, size_t test_len )
{
    size_t bufsize = DoConversion( NULL, 0, test_data, test_len );
    char* buffer = new char[ bufsize ];
    size_t bufsize2;
    try
    {
        DoConversion( buffer, bufsize, test_data, test_len );
        bufsize2 = DoConversion2( NULL, 0, buffer, bufsize );
    }
    catch( ... )
    {
        delete[] buffer;
        throw;
    }

    if( bufsize2 != bufsize )
    {
        delete[] buffer;
        throw TestFailed();
    }

    char* buffer2 = new char[ bufsize2 ];
    try
    {
        DoConversion2( buffer2, bufsize2, buffer, bufsize );
    }
    catch( ... )
    {
        delete[] buffer2;
        delete[] buffer;
        throw;
    }

    int res = memcmp( buffer, buffer2, bufsize );
    delete[] buffer2;
    delete[] buffer;

    if( res != 0 )
        throw TestFailed();
}
This is ugly in so many ways. buffer is allocated in one place and deallocated in one of four places (or even not at all!), depending on the particular path followed; bufsize2 now has to be declared before we can sensibly initialize it; and the result of memcmp is cached so that deallocation can take place before deciding whether to throw or not (this was mildly shorter than duplicating the two delete statements yet again).
So here’s the answer: write a new class.
class AutoCharArray
{
public:
    AutoCharArray( size_t s ) : _buffer( new char[s] ) {}
    ~AutoCharArray() { delete[] _buffer; }
    operator char*() const { return _buffer; }

private:
    // No copying
    AutoCharArray( const AutoCharArray& );
    AutoCharArray& operator=( const AutoCharArray& );

    char* _buffer;
};

void test_conv( const char* test_data, size_t test_len )
{
    size_t bufsize = DoConversion( NULL, 0, test_data, test_len );
    AutoCharArray buffer( bufsize );
    DoConversion( buffer, bufsize, test_data, test_len );

    size_t bufsize2 = DoConversion2( NULL, 0, buffer, bufsize );
    if( bufsize2 != bufsize )
        throw TestFailed();

    AutoCharArray buffer2( bufsize2 );
    DoConversion2( buffer2, bufsize2, buffer, bufsize );

    if( memcmp( buffer, buffer2, bufsize ) != 0 )
        throw TestFailed();
}
AutoCharArray just manages the lifetime of the dynamically allocated char array as a C++ object. Because of this, we never have to worry about catching and rethrowing foreign exceptions. Because it is a C++ object, if it has been successfully constructed as a local object then it will be destroyed when the function exits, whether conventionally or via an exception. We don’t even have to worry about “new” failing. If new throws, the constructor will not have completed so the destructor will not be called on a bad pointer.
As well as all these advantages, the control flow for the optimistic ‘success’ use case is obvious and easy to follow. It is not cluttered with a ton of “but in case that didn’t work” catch blocks. Overall, including the class definition, the entire code is no longer than the long-winded “try/catch” quagmire of the first attempt.
A simple “AutoArray” class is very useful for this type of application, although I tend to prefer it as a template:
template< class T >
class AutoArray
{
public:
    AutoArray( size_t s ) : _buffer( new T[s] ) {}
    ~AutoArray() { delete[] _buffer; }
    operator T*() const { return _buffer; }

private:
    // No copying
    AutoArray( const AutoArray& );
    AutoArray& operator=( const AutoArray& );

    T* _buffer;
};
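For illustration, here is a quick sketch of the template in use. AutoCharArray from the earlier listing collapses to a typedef, and the same pattern works for any element type; fill_squares is just a made-up example, not part of any real code.

#include <cstddef>
using std::size_t;

// With the template above, the earlier AutoCharArray is just a typedef.
typedef AutoArray<char> AutoCharArray;

// The same pattern works for other element types, e.g. a scratch buffer
// of ints. The array is deleted automatically when 'squares' goes out of
// scope, whether the function returns normally or via an exception.
void fill_squares( size_t n )
{
    AutoArray<int> squares( n );
    for( size_t i = 0; i < n; ++i )
        squares[i] = static_cast<int>( i * i );
}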
git backups interacting with git Wednesday 5th December 2007
This is really important!
git as a generalized backup utility interacts with any git repositories that it finds in an ‘interesting’ way.
It treats them as a submodule, so instead of backing up the git repository, it just records a reference to the current HEAD of the submodule.
I believe that this is “by design”, but if you don’t set up the submodule configuration your backup repository won’t know where to find the correct repository with the recorded commit.
It also means that you need to be git pushing your precious repository data somewhere safe in any case.
git backup is also about three times slower than my tar based incremental backup, although incrementally saving the backups to a remote machine is quicker and backup browsing and recovery is a little easier.
git as a general purpose backup utility Monday 3rd December 2007
When it was first suggested to me that you could just use git for backup I was not convinced. You would have these massive .git directories in high level places on your filesystem for one.
Now that I’ve had some time to reflect on the possibility I think that perhaps it isn’t such a crazy idea. It’s not actually true that you have to have a .git directory in the place that you want to back up. In fact, I am even trialling git alongside my regular “tar” based backup.
Here’s what I do. Suppose, for the sake of example, that I’m going to backup /home onto a separate backup partition called /backup.
Step 1 – Create a git repository for the backup
mkdir /backup/home.git
git --git-dir=/backup/home.git --work-tree=/home init
[
I used to do this as follows before I discovered the --work-tree option to git. It has the same effect.
git --bare init
git config core.bare false
git config core.worktree /home
]
Step 2 – Initial backup
cd /home
git --git-dir=/backup/home.git add .
git --git-dir=/backup/home.git commit -m "Initial /home backup"
Step 3 – Copy backups to a safe remote machine
Assuming that you have a second machine where you want to store your backups, to which you have ssh access and which has git installed, you can initialize a new empty git repository there for this purpose.
Suppose that this machine is called other-machine and the repository is located at /backup/first-machine/home.git.
The initial remote backup is performed thus.
cd /backup/home.git
git remote add other-machine ssh://other-machine/backup/first-machine/home.git
git gc
git push other-machine master
The git gc seems fairly important. At this stage you have a massive git repository that hasn’t yet been packed. When you attempt to push it, git will want to perform a big “Deltifying” step to create a pack for the remote side. If you run git gc on the local side first, it performs that big “Deltifying” step once and effectively stores the result as a pack locally. The git push can then use this pack and, having done the gc, subsequent local operations can also take advantage of it, whereas just letting the push do the packing would throw that work away as far as the local side is concerned.
Step 4 – Incremental backup
cd /home
git --git-dir=/backup/home.git add .
git --git-dir=/backup/home.git commit -a -m "Incremental /home backup"
Performing both an “add” and a “commit -a” looks repetitive but is required as “commit -a” does not add new untracked files and “add” doesn’t ‘add’ file deletions to the index.
Step 5 – Push incremental backup to remote machine
cd /backup/home.git
git push other-machine master
Well, that was easy.
Disadvantages
The initial “git gc” step can be very slow.
git does not store owner/group information, or atime and mtime information. The backup is content only.
“git add .” is not robust against files that disappear while git’s looking at them (e.g. lock files). It tends to fail with a “cannot stat” message when you really want it to not bother with that file and carry on.
extern arrays and pointers Saturday 17th November 2007
Quick, what’s the difference!
extern char data[];
extern char *data;
Well, the first one’s an array and the second one is a pointer. You can treat them in the same way, but they are completely different things.
Here’s a function which looks the same with both declarations of data:
char f(int a)
{
    return data[a];
}
But if data is an array it compiles to the following. (gcc -O3 -fomit-frame-pointer)
f:
        # move the function parameter from the stack into %eax
        movl    4(%esp), %eax
        # access the byte at data + %eax and store into %eax
        movsbl  data(%eax),%eax
        ret
And if it’s a pointer it compiles to this.
f:
        # move the pointer value at the memory location data into %eax
        movl    data, %eax
        # move the function parameter from the stack into %edx
        movl    4(%esp), %edx
        # access the byte at %edx + %eax and store into %eax
        movsbl  (%eax,%edx),%eax
        ret
It’s an easy enough mistake to make but I find it unintuitive the way that the change in external declaration silently changes the effect of the function without any change to the function definition itself.
It’s important to get external declarations consistent and this can be a problem if the array data lives in a non-C (or non-C++) file, e.g. test data included via an assembly file.
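One way to keep the declarations consistent is to put the single extern declaration in a shared header and include it everywhere, including in the file that defines the array (when that file is C or C++), so the compiler can catch a mismatch. A sketch, with made-up file names:

/* data.h - the one authoritative declaration, included everywhere */
extern char data[];            /* an array of unknown bound, not a pointer */

/* data.c - the definition (or the symbol may come from an assembly file,
   in which case only the declaration in data.h exists on the C side) */
#include "data.h"
char data[] = { 1, 2, 3, 4 };

/* user.c - compiles to the array-indexing form shown above */
#include "data.h"
char f( int a )
{
    return data[a];
}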
A mini test framework in a single header file Wednesday 31st October 2007
After trying it on a number of projects, I’m now very enthusiastic about test driven development. At home I’ve rather missed the minimal support code that I had at my old job, so I’ve rewritten a miniature test framework in a single header file.
Only this time, it’s better. Framework implies something rather big. This isn’t. The design goal was to make something really simple and lightweight that just makes the process of writing tests as simple as possible with as few overheads as possible.
In this framework a test function is just a void function returning void. If the function doesn’t throw an exception when it is called then it has passed.
Here are some example tests, which show the three different types of assert macros provided. (Yes, all the tests are broken!)
void strlen_test()
{
    HSHG_ASSERT( strlen( "1245" ) == 5 );
}

void sum_test()
{
    int total = 0;
    for( int i = 1; i <= 15; ++i )
    {
        total += 1;
        total += 7;
    }
    for( int i = 2; i <= 7; ++i )
    {
        total += i;
    }
    HSHG_ASSERT_DESC( total == 148, "Maximum break is 148" );
}

class MyThrowable
{
public:
    static void ThrowMe()
    {
        throw MyThrowable();
    }
};

void MyThrowable_test()
{
    HSHG_ASSERT_THROWS( true ? 0 : (MyThrowable::ThrowMe(), 0), MyThrowable );
}
The test namespace is HSHGTest and there is a struct TestFn for a test function which contains a char* for the function name and a pointer to the function itself.
There is then an inline function called RunTests that takes an array of these structs (terminated by one with a null function pointer); it runs each test in the array and reports to a given std::ostream; it then returns EXIT_SUCCESS if it ran some tests and they all passed, and EXIT_FAILURE otherwise. This makes it suitable for returning the result directly from a main function. Here is an example report.
testtest.cc:6: Test strlen_test failed. ( strlen( "1245" ) == 5 )
testtest.cc:24: Test sum_test failed. ( Maximum break is 148 )
testtest.cc:38: Test MyThrowable_test failed. ( Exception MyThrowable expected. )
If this sounds a bit laborious then there are some helper macros to set a suitable array up, and even a macro for a default main function:
HSHG_BEGIN_TESTS
HSHG_TEST_ENTRY( strlen_test )
HSHG_TEST_ENTRY( sum_test )
HSHG_TEST_ENTRY( MyThrowable_test )
HSHG_END_TESTS

HSHG_TEST_MAIN
These macros translate (roughly) into:
namespace {
    HSHGTest::TestFn tests[] = {
        { "strlen_test", strlen_test },
        { "sum_test", sum_test },
        { "MyThrowable_test", MyThrowable_test },
        { NULL, NULL }
    };
}

int main()
{
    return HSHGTest::RunTests( tests, std::cout );
}
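For the curious, RunTests itself doesn’t need to be anything more than a loop over the array. Here is a rough sketch of how such a function could be written; it is not the exact code from the header, which also reports the file, line and assertion text from the assert macros.

#include <cstdlib>    // EXIT_SUCCESS, EXIT_FAILURE
#include <ostream>

namespace HSHGTest
{
    struct TestFn
    {
        const char* name;    // test name, used in the failure report
        void (*fn)();        // the test: a void function returning void
    };

    inline int RunTests( const TestFn* tests, std::ostream& log )
    {
        bool ran_any = false;
        bool all_passed = true;
        for( const TestFn* t = tests; t->fn != NULL; ++t )
        {
            ran_any = true;
            try
            {
                t->fn();     // a test passes if it returns without throwing
            }
            catch( ... )
            {
                all_passed = false;
                log << "Test " << t->name << " failed.\n";
            }
        }
        return ( ran_any && all_passed ) ? EXIT_SUCCESS : EXIT_FAILURE;
    }
}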
The "framework" is available for download here. HSHGTest frawework
Address Space Monitor vs Google Sunday 28th October 2007
In an unusual coincidence, googlebot visited my homepage the day after I put the link to Address Space Monitor live, and the next day it was the top search result for the query: ‘Address Space Monitor’. It stayed there for a few days but now the asm homepage is not in Google’s index at all. I’ve logged in to Google’s Webmaster Tools but so far they haven’t shed any light on the situation.
[Edit: Monday morning] And now it’s back in at #1. It must be a glitch in the google…
Address Space Monitor Tuesday 23rd October 2007
Finally, I manage to release software on the unsuspecting world!
I wrote this tool in response to spending some painful time debugging a process which seemed unable to allocate a chunk of memory when most conventional tools were showing that the process wasn’t at the limit in terms of memory usage and the system hadn’t run out of swap space. The problem was virtual address space fragmentation.
Address Space Monitor is a Windows tool that shows graphically how a process’ address space has been carved up, and where the biggest blobs of contiguous free memory in the address space are and how big they are.
Naturally, if you are using the tool in earnest, the process which is giving you trouble will inevitably be resource heavy and slow. Hence, Address Space Monitor (ASM henceforth, not to be confused with assembly language source files) has been written to minimise its own resource usage while retaining boredom-alleviating features such as fun colours and a bouncy CPU meter. You cannot yet use it to read mail, so it is still classed as “in development”. Oh yes, and the ‘a’ at the end of 0.5a means that it is alpha software.
Oh, where is it?
Template Metaprogramming Errors Friday 14th September 2007
I’ve been having a very limited play with template metaprogramming and at one point I managed to end up with what appeared to be an infinitely recurring error message. It turns out that it wasn’t, as I was able to redirect all of the stderr output from the compiler to a file. It was, however, 22 megabytes.
$ g++ -Wall -pedantic -std=c++98 -O2 metatest.cc 2>dmp
$ wc -c dmp
22163606 dmp
Not bad for a 2k source file.
$ wc -c metatest.cc
2280 metatest.cc
I think a sensible compiler limit prevented the error message from getting “too” large.
metatest.cc:81: error: template instantiation depth exceeds maximum of 500 (use -ftemplate-depth-NN to increase the maximum)
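I won’t reproduce metatest.cc here, but a much smaller, made-up example that provokes the same instantiation depth error is a recursive template with no terminating specialisation:

// Each Count<N> requires Count<N-1>, and there is no specialisation to
// stop the recursion, so the compiler keeps instantiating templates
// until it hits its depth limit and reports the error quoted above.
template< int N >
struct Count
{
    static const int value = 1 + Count< N - 1 >::value;
};

int main()
{
    return Count< 0 >::value;
}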
bashrc magic vs. terminfo Tuesday 28th August 2007
A while ago, I was fiddling around with a custom terminfo entry to try and get 256 colour mode working for angband with PuTTY. (No, I don’t have time to play; I do sometimes have enough time to compile the latest version and fiddle around with terminfo settings.)
So I managed to get it to work (the culprit was incorrect initc settings), but at the same time I seemed to lose the feature whereby the current user, machine and working directory were shown in the titlebar. It worked when putty declared itself as being “xterm”, but not when it declared itself as “putty-256color”.
I thought there must be something missing from the putty-256color terminfo entry that I had created that existed in the xterm terminfo entry. So I trawled through and copied across every setting that was even vaguely related, but without success.
As far as I could work out, the settings that should be being used were to do with the “status line”. This is an extra information line that some terminals have that is not part of the main terminal display area. If the terminal has this feature it should have the boolean feature hs (also hs in termcap speak). Then there are three other related features: tsl (or ts in termcap) moves the output location to the status line (to status line), fsl (fs) moves the output location back to the main terminal area (from status line) and dsl (ds) should clear the status line (disable status line). xterm uses the “status line” feature to refer to its window title.
These were all set identically in both the xterm and putty-256color terminfo entries, and yet for some reason it wasn’t working in the putty-256color setup. A check of the bash variables revealed that the PROMPT_COMMAND just wasn’t being set in the putty case, despite the terminal supporting all the necessary features.
Finally, I found the source (literally!) of my woes. For some completely unfathomable reason the bashrc supplied with current Fedora releases includes this horrible kludge:
# are we an interactive shell?
if [ "$PS1" ]; then
    case $TERM in
    xterm*)
        if [ -e /etc/sysconfig/bash-prompt-xterm ]; then
            PROMPT_COMMAND=/etc/sysconfig/bash-prompt-xterm
        else
            PROMPT_COMMAND='echo -ne "\033]0;${USER}@${HOSTNAME%%.*}:\
${PWD/#$HOME/~}"; echo -ne "\007"'
        fi
        ;;
    # etc, etc.
(backslash newline continuations added for blog readability.)
Argghhh! Hardcoded ANSI escape sequences and terminal names when there is a perfectly reasonable alternative.
I have since replaced the offending part of the script with this more flexible, if mildly less readable, version:
# are we an interactive shell?
if [ "$PS1" ]; then
    if tput hs; then
        PROMPT_COMMAND="echo -n \"$(tput tsl)${USER}@${HOSTNAME%%.*}:\
\${PWD/#${HOME//\//\\/}/~}$(tput fsl)\""
    else
    # etc, etc.
It also has the added advantage of only evaluating one shell variable (PWD) each time, the theory being that if you change machines, user or home directory you are probably going to be spawning a new shell in any case.
Spam, now assassinated Friday 10th August 2007
I hadn’t quite realised how much spam was irritating me until I got rid of it all. Previously I had a fixed procmail filter for the info@ and sales@ addresses for my domain, and then let thunderbird perform its adaptive junk mail filtering on anything left. Unfortunately, my primary email address must have been exposed a few months ago as I started receiving a lot more spam.
I thought that I was relatively happy with thunderbird performing the spam filtering task. It correctly detected 95% of spam, and didn’t generate any false positives. However, it is slow. I run thunderbird from a number of different locations, some over slow connections. When an email arrives thunderbird flags up a new mail icon, then downloads it and decides whether it’s spam. Too late! I’ve already seen the new mail icon; I have been disturbed by spam.
I installed spam assassin on my mail handling box, satisfying as many of the optional perl modules as I reasonably could.
I pointed sa-learn at my collection of 3,213 spam messages and at my last year’s worth of legitimate email for comparison.
procmail now puts all of the messages detected as spam by spam-assassin in my “assassinated-spam” folder. I’ve switched off thunderbird’s junk detection and told it not to check the spam folder for new messages. So far spam-assassin has got every single message correct and I am no longer disturbed by spam. I still go into the “assassinated-spam” folder once in a while to check for false positives and admire how much annoyance I have been spared.
spam-assassin rules, go spam-assassin.