Fail Fast

One of the more generally applicable ideas that comes from programming is “fail fast”.

“Fail fast” is pretty simple. If your program can’t handle some state it ended up in, immediately bailing out with relevant data will make debugging and recovering from the error much easier. In case you haven’t encountered this before, your other option is to return some known-good default since you can’t do anything meaningful. Here’s a quick example.


def divide(x, y):

    ''' Divide x by y. '''

    if y == 0:
        # Return a known good default so that we don't crash the program
        return 0
    else:
        return x / y


def divide(x, y):
    ''' Divide x by y. '''

    # Fail immediately rather than continiuing if we get bad input.
    return x / y

If you think of what code might be calling these functions, you can see why there’s merit to the fail-fast case. If we return the zero, the program continues as if the correct answer to the division was 0, possibly causing more subtle errors in the future that could have been prevented. In the fail-fast case, the program crashes when it supplies y = 0 to this function. Crucially, it’s already done something wrong by that point - we’ve already executed past the bug that we need to fix. We should never be calling this function in a way that causes a crash. By failing here, we are left with a stack trace showing what function calls led to the incorrect one, giving us a finite list of places to look to find our bug. In general, failing fast is a good way to write code.

There are exceptions to this. The most obvious is in situations where we can never tolerate a crash: in firmwares for traffic lights, engine control modules, industrial systems, and other places where safety is on the line. Notably though, writing code to fail fast is going to get your code more resilient faster - you’ll discover and fix bugs faster - so there may be a case for prototyping such applications using this style of programming.

Another thing - you are less likely to see code that doesn’t fail fast in the pattern above. More often, you’ll see it as suppression of an exception:


def divide(x, y):

    ''' Divide x by y. '''

    try:
        return x / y
    except Exception, e:
        return 0

This function will never crash, but it’s also unafraid to return incorrect results for incorrect input rather than telling the caller it’s wrong.

If you’re not a programmer and you made it this far - or you’re skimming but you’ve got the general idea - here’s where it gets practical for you. I also think failing fast is applicable to individuals working on a team towards a common goal. Problems don’t get smaller over time; the schedule won’t carefully get back on track. Making others aware there is a problem, as early as possible, will let them work around the smaller, easier to handle early version of the problem rather than the massive overrun right at the end of the project. To support this we need to work towards a blame-free culture (or if there is blame placed, it must be placed on the methods & systems being used - I’ll write on this another time). Team members can’t be afraid to be blamed for the issues they bring up - if they are, they won’t bring them up until it’s too late. Help your team fail fast so you can fix what’s wrong early and get back on track.