Fuzz Testing vs Property-Based Testing

Posted on July 3, 2021 by Brian Jaress

Tags: code, advice, reference

When I see property-based testing and fuzz testing discussed online, there’s confusion and debate over what they are and how they differ. I’ve used and liked both of them, so I’m adding my thoughts here.

What Fuzz Testing and Property-Based Testing Are

It’s easiest to think about property-based testing and fuzz testing at the level of individual tests: a given test is a property test or a fuzz test depending on what it expects and what range of inputs it considers.

Fuzz Tests: expect that some disaster never occurs on any input.
Property Tests: expect that the output has some desirable property for some broad range of input.

A few key points:

The range of inputs considered by a property test can be all inputs, but sometimes narrowing it down supports more helpful properties.
Narrowing down the inputs considered in a fuzz test doesn’t improve the test, unless you’re OK with disasters for some inputs.
Random (or partly random) sampling is a practical substitute for trying all the inputs considered.

An Example

Suppose you want to test a lifespan predictor, which takes as input someone’s age and outputs the age at which they are expected to die.

Fuzz Test

For a fuzz test, we have to define what disaster we are trying to avoid. I’ll say that the lifespan predictor is a Python function that we are trying to unit test, and that this function is expected to reject invalid input by raising a ValueError. Based on that, I’ll also decide that a disaster means any error other than a ValueError.

def test_with_fuzz():
    # In real life, testing tools handle repetition for you
    for _repeat in range(100):
        try:
            lifespan(any_input())
        except ValueError:
            # Rejecting the input is fine; any other exception
            # propagates and fails the test.
            pass

I also have to decide what I mean by “any input.” Python is flexible about what you can pass to a function, and I’ll include not just numbers but also some inputs of other built-in types that are obviously not ages.

import random

def any_input():
    """A crude random input generator, for demonstration purposes only.

    In real life, testing tools provide something better.
    """
    options = [
        # integer
        random.randint(-2**70, 2**70),

        # number with decimal point
        random.random(),

        # text
        ''.join(chr(random.randint(0, 0x10FFFF))
            for _ in range(random.randrange(100))),

        # the usual suspects
        random.choice([None, 1, 0, -1, "", []]),
    ]
    return random.choice(options)

If I were testing a whole program, “any input” might mean any byte sequence and “disaster” might mean a crash, or data loss.
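For a sense of what that could look like, here is a minimal sketch of a whole-program fuzz test. Everything in it is an assumption made for illustration: the names random_bytes, is_disaster, and fuzz_program are made up, the program is assumed to read its input from standard input, and “disaster” is narrowed to a hang or, on POSIX systems, death by signal. Checking for data loss would need knowledge of how the program stores data and is left out.

```python
import random
import subprocess
import sys


def random_bytes(max_length=200):
    """Return an arbitrary byte sequence shorter than max_length."""
    length = random.randrange(max_length)
    return bytes(random.randrange(256) for _ in range(length))


def is_disaster(command, data, timeout=10):
    """Run command with data on stdin and report whether a disaster occurred.

    Assumption of this sketch: a disaster is a hang (timeout) or, on POSIX,
    death by signal, which subprocess reports as a negative return code.
    An ordinary non-zero exit counts as the program rejecting its input.
    """
    try:
        result = subprocess.run(
            command, input=data, capture_output=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return True
    return result.returncode < 0


def fuzz_program(command, repeats=100):
    """Feed random byte sequences to the program and fail on any disaster."""
    for _repeat in range(repeats):
        data = random_bytes()
        assert not is_disaster(command, data), f"disaster on input {data!r}"
```

For example, fuzz_program([sys.executable, "my_script.py"]) would feed 100 random byte sequences to a hypothetical my_script.py.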

Property Test

For a property test, I’ll have to choose a property and the range of input considered. There aren’t many interesting properties shared by both a ValueError and an acceptable lifespan prediction, so I’ll narrow things down to valid input and assert some property related to the correctness of the output.

To help define what is valid input and what is correct output, I’ll decide that the lifespan function is for people who are alive at the age used as input. Based on that, I’ll decide that valid input is a non-negative integer, and the predicted lifespan should be at least as high as the input age. People aren’t alive yet at negative ages, and anyone alive at a certain age must be at least that old when they die.

We could imagine designing things differently and saying that someone with a very high age is probably already dead or that people born a thousand years in the future ought to live to a hundred and forty-nine. But based on the decisions I’ve made, we can write a property test that supplies non-negative integers as input and expects the output to be greater than or equal to the input.

import random

def test_a_property():
    # In real life, testing tools handle repetition for you
    for _repeat in range(100):
        age = random.randint(0, 2**70)
        prediction = lifespan(age)
        assert prediction >= age

The Contrast Between Fuzz Tests and Property Tests

To contrast these two tests, consider implementations of lifespan that pass one but not the other. Here’s an implementation that always raises a ValueError:

def lifespan(age):
    raise ValueError()

This passes the fuzz test, and it should: It never causes a disaster by raising an error that is not a ValueError. It will fail the property test because it raises a ValueError for valid inputs, incorrectly rejecting them as invalid.

Here is an implementation that passes the property test, but fails the fuzz test:

def lifespan(age):
    return age + 1

It’s a very pessimistic lifespan predictor, predicting that everyone will die soon by just adding one to the age to get the predicted lifespan. But pessimism is not the reason that it fails the fuzz test. It fails because some invalid inputs, such as None or the empty string, cannot be added to one. Trying to do so causes a TypeError, which is not a ValueError.

It’s worth noticing that the fuzz test caught a failure to safeguard against violated assumptions (that age is a number) while the property test caught a failure to produce correct answers, even when basic assumptions are met. Also, an even more pessimistic implementation that returns the age itself as the prediction, expecting everyone to die before their next birthday, passes both tests.
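The claims in this section can be checked mechanically. The sketch below is one way to do it, under a few assumptions: the two tests are rewritten to take the implementation as a parameter (the originals call a global lifespan), the generator is a trimmed version of any_input, and the names fuzz_check, property_check, passes, summarize, and the three implementation names are made up for illustration.

```python
import random


def any_input():
    """Trimmed stand-in for the crude generator shown earlier."""
    options = [
        random.randint(-2**70, 2**70),
        random.random(),
        random.choice([None, 1, 0, -1, "", []]),
    ]
    return random.choice(options)


def fuzz_check(lifespan, repeats=200):
    """Same expectation as test_with_fuzz: only ValueError may be raised."""
    for _repeat in range(repeats):
        try:
            lifespan(any_input())
        except ValueError:
            pass


def property_check(lifespan, repeats=200):
    """Same expectation as test_a_property: prediction >= age."""
    for _repeat in range(repeats):
        age = random.randint(0, 2**70)
        assert lifespan(age) >= age


def passes(check, lifespan):
    """Run a check and report whether it finished without any exception."""
    try:
        check(lifespan)
    except Exception:
        return False
    return True


def always_rejects(age):
    raise ValueError()


def adds_one(age):
    return age + 1


def returns_age(age):
    return age


def summarize():
    """Map each implementation name to (passes fuzz, passes property)."""
    implementations = [always_rejects, adds_one, returns_age]
    return {
        impl.__name__: (passes(fuzz_check, impl), passes(property_check, impl))
        for impl in implementations
    }


if __name__ == "__main__":
    for name, (fuzz_ok, property_ok) in summarize().items():
        print(f"{name}: fuzz={fuzz_ok} property={property_ok}")
```

Because the inputs are sampled, the adds_one fuzz result depends on at least one non-numeric input turning up, which is all but certain over 200 repetitions.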

Is This Distinction Real?

If you think of “not being a disaster” as a desirable property, you can think of fuzz tests as a special type of property test, but even people who agree with that are unlikely to give you a fuzz test as an example of a property test. There’s at least a sense that the distinction matters.
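To make the “special type of property test” reading concrete, here is a minimal sketch. It assumes a hypothetical generic runner, check_property, that takes an input generator and a predicate. The helper names (no_disaster, at_least_input, valid_age, adds_one) are also made up, and the generator is a trimmed version of any_input.

```python
import random


def check_property(generate, holds, repeats=200):
    """Hypothetical runner: assert that holds(x) is true for sampled inputs."""
    for _repeat in range(repeats):
        value = generate()
        assert holds(value), f"property failed for {value!r}"


def no_disaster(function):
    """Build the fuzz-style property: return normally or raise ValueError."""
    def holds(value):
        try:
            function(value)
        except ValueError:
            return True
        except Exception:
            return False
        return True
    return holds


def at_least_input(function):
    """Build the correctness property: the output is >= the input."""
    def holds(age):
        return function(age) >= age
    return holds


def any_input():
    """Trimmed stand-in for the crude generator shown earlier."""
    options = [
        random.randint(-2**70, 2**70),
        random.random(),
        random.choice([None, 1, 0, -1, "", []]),
    ]
    return random.choice(options)


def valid_age():
    """Valid input only: a non-negative integer."""
    return random.randint(0, 2**70)


def adds_one(age):
    """The pessimistic implementation from the previous section."""
    return age + 1
```

With these definitions, check_property(valid_age, at_least_input(adds_one)) completes, while check_property(any_input, no_disaster(adds_one)) fails with an AssertionError once a non-numeric input is drawn. Both tests fit the same runner, but no_disaster can be written without knowing what the function is for, and at_least_input cannot.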

I’ve been using the loose term “disaster” to emphasize that the line defining a disaster is usually practical, cultural, and very straightforward. Python has special language constructs for handling and distinguishing among errors. When testing a whole program, a crash usually shows up differently from a wrong answer and is a qualitatively different kind of failure. Fuzz tests tend to enforce simple, bright lines, drawn without using detailed knowledge of the code’s purpose.

A typical property test includes carefully chosen expectations based on understanding what a correct answer looks like for the specific code being tested. That can be difficult, but it allows you to confirm things that you couldn’t with a fuzz test.

Those practical differences between fuzz tests and property tests can translate into different concerns in research or the development of testing tools.

A Gotcha Around Combining Property Tests and Fuzz Tests

I sometimes see people creating a misleading combination of a property test and a fuzz test, like so:

import random

def test_that_is_misleading():
    # In real life, testing tools handle repetition for you
    for _repeat in range(100):
        age = any_input()
        try:
            prediction = lifespan(age)
        except ValueError:
            continue  # skip to next random age
        assert prediction >= age

That might appear to be a replacement for both test_a_property and test_with_fuzz, but it is not. For example, this test is passed by the implementation that always raises a ValueError, even though that implementation fails test_a_property: every call raises, so the loop always skips ahead and the assertion never runs.

It is possible to correctly combine test_with_fuzz and test_a_property, but it involves explicitly stating the distinction between valid and invalid input, which people who combine the tests in a misleading way are usually trying to avoid.
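One way such a combination could look is sketched below. It assumes a helper, is_valid_age (a hypothetical name), that spells out the earlier decision that valid input is a non-negative integer. The check takes the implementation as a parameter so it can be run against several candidates. The generator is a trimmed version of any_input, and the validating implementation with its floor of 80 is a placeholder, not a real predictor.

```python
import random


def is_valid_age(value):
    """Hypothetical helper: valid input is a non-negative integer.

    Excluding bool is a choice made for this sketch; Python treats bool
    as a subclass of int.
    """
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


def any_input():
    """Trimmed stand-in for the crude generator shown earlier."""
    options = [
        random.randint(-2**70, 2**70),
        random.random(),
        random.choice([None, 1, 0, -1, "", []]),
    ]
    return random.choice(options)


def combined_check(lifespan, repeats=200):
    for _repeat in range(repeats):
        age = any_input()
        try:
            prediction = lifespan(age)
        except ValueError:
            # Rejection is acceptable only for invalid input.
            assert not is_valid_age(age), f"rejected valid input {age!r}"
            continue
        # Any other exception propagates and fails the check (the fuzz part).
        if is_valid_age(age):
            assert prediction >= age  # the property part


def passes(check, lifespan):
    """Run a check and report whether it finished without any exception."""
    try:
        check(lifespan)
    except Exception:
        return False
    return True


def always_rejects(age):
    raise ValueError()


def adds_one(age):
    return age + 1


def validating(age):
    """Placeholder implementation: reject invalid input, then predict."""
    if not is_valid_age(age):
        raise ValueError(f"not a valid age: {age!r}")
    return max(age, 80)
```

Here passes(combined_check, always_rejects) and passes(combined_check, adds_one) are both False, while passes(combined_check, validating) is True. Like test_with_fuzz, this check does not require invalid input to be rejected, only that nothing other than a ValueError is raised.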

Conclusion

The distinction between a property test and a fuzz test is mostly in what is expected by the test and the associated differences in what range of input is considered. Although fuzz tests can be seen as a type of property test, there are practical differences related to the designation of some outcomes as disastrous and never acceptable.