Sorting Colors by Similarity

Posted on November 27, 2012 by Brian Jaress

I’ve written a small script to sort colors by visual similarity. Here is the output as tab-separated values and as HTML when sorting the results of the color survey by xkcd.

(He showed people random colors and asked them to name the color, then he averaged together colors given the same name. The results are striking. Many of the blues, for example, look more blue than a numerically pure blue.)

Why?

The original reason for this was to make differences between similar colors more visible. If you have a list of colors several pages long, it’s hard to tell the difference between a yellow on page one and a yellow on page three, so it’s much easier if all the yellows are together.

The way I’ve actually ended up using it is to word-search through the results for similar colors. For example, if you search for “off white,” similar colors like pale grey and eggshell will be right there with it.

When you’re squeezing something as complex as color into a flat list, perfection is probably not possible. There are places where the list switches abruptly from one group of similar colors to another very different group of similar colors, such as from navy to burnt yellow.

Overall, though, I’m pleased with the results. Pretty much every color is part of a continuous group of similar colors, and most of the groups are near similar groups.

It’s also interesting to look at which human-assigned names ended up near each other. For example, there’s an area with a lot of wine-related names like “merlot” and “bordeaux” and another area with intense green names like “radioactive green” and “poison green.” There’s also an area with a lot of disgusting names, like “vomit green,” “booger,” and “bile” (though that same area contains “pea soup green” and “avocado”).

Code

The code is actually pretty simple, thanks to some nice libraries.

The main goal here is to avoid doing clearly wrong things on longer lists, ones that would be too much work for a human to organize. We’re not looking for the one true best order, just something pretty good.

The solution I finally stumbled on has two parts:

1. Put the colors in a format where their numerical differences are at least somewhat a reflection of how they differ visually. For this, I used Juan Quiroz’s code to put the colors in CIELAB format.
2. Do hierarchical cluster analysis and walk the tree, both using the amazing SciPy.
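
To give a feel for what the first part involves, here is a minimal sketch of a standard sRGB-to-CIELAB conversion. It is not Juan Quiroz’s code. The function name srgb_to_lab, the choice of the D65 white point, and the 8-bit input range are assumptions made for illustration.

```python
def srgb_to_lab(r, g, b):
    """Convert an 8-bit sRGB color to CIELAB, assuming a D65 white point."""

    def to_linear(channel):
        # Undo the sRGB gamma curve to get linear light in the range 0..1.
        c = channel / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = to_linear(r), to_linear(g), to_linear(b)

    # Linear sRGB -> CIE XYZ, using the usual sRGB/D65 matrix.
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    # XYZ of the D65 reference white.
    xn, yn, zn = 0.95047, 1.0, 1.08883

    def f(t):
        # Cube root with a linear segment near zero, as in the CIELAB definition.
        delta = 6.0 / 29.0
        if t > delta ** 3:
            return t ** (1.0 / 3.0)
        return t / (3.0 * delta ** 2) + 4.0 / 29.0

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)

    lightness = 116.0 * fy - 16.0
    a = 500.0 * (fx - fy)
    b_star = 200.0 * (fy - fz)
    return (lightness, a, b_star)


# White should come out close to L=100 with a and b near zero.
print(srgb_to_lab(255, 255, 255))
```

The point of the conversion is that straight-line distance between two Lab triples tracks perceived difference much better than distance between raw RGB triples, which is what the clustering step relies on.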

Part one is just grabbing colors from the input and calling Juan’s function, but part two is actually even simpler:

from numpy import array
from scipy.cluster.hierarchy import leaves_list, linkage


def cluster(points, descriptions):
    """Sort descriptions based on point similarity.

    points should be a list of multidimensional points.
    descriptions should be a list with the same length
    as the points list.

    The descriptions will be returned in an order based
    on which corresponding points are similar --
    descriptions of points that are similar will be near
    each other in the returned order.

    """
    # linkage builds the cluster tree; leaves_list walks it and
    # returns the indices of the original points in leaf order.
    for index in leaves_list(linkage(array(points),
            method='weighted', metric='mahalanobis')):
        yield descriptions[index]

SciPy’s scipy.cluster.hierarchy module has a linkage function to do the hierarchical cluster analysis and a leaves_list function that walks the hierarchy tree and gives you an order. Easy as pie, except I didn’t even think of cluster analysis at first and tried a lot of dead ends.
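
As a rough illustration of what those two calls hand back, the sketch below runs them on a handful of points. The coordinates and labels are made up for the example and are not taken from the survey data. Note that the mahalanobis metric needs more points than dimensions, so a three-dimensional input needs at least four points.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage

# Hypothetical labels and made-up 3-D coordinates, for illustration only.
names = ["light-a", "light-b", "red-a", "red-b",
         "green-a", "green-b", "blue-a"]
points = np.array([
    [95.0,   0.0,   5.0],
    [92.0,   2.0,  10.0],
    [40.0,  60.0,  30.0],
    [35.0,  55.0,  20.0],
    [70.0, -60.0,  50.0],
    [65.0, -50.0,  55.0],
    [30.0,  20.0, -60.0],
])

# linkage returns an (n - 1) x 4 array: one row per merge in the tree.
tree = linkage(points, method='weighted', metric='mahalanobis')

# leaves_list returns the indices of the original points in the order
# the leaves appear when the tree is walked.
order = leaves_list(tree)

ordered_names = [names[i] for i in order]
print(tree.shape)
print(ordered_names)
```

The order is a permutation of the input indices, so indexing the description list with it is all that is left to do.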

What about weighted and mahalanobis? Well, the linkage function gives you a bunch of options for how it does the analysis, and the defaults were unfortunately not the best of the bunch. Those two are the best combination I found, but someone with actual expertise in clustering (or a lot more patience for trying different combinations) could probably do better.
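
For anyone who wants to try other combinations, here is one hypothetical way to compare them. The post does not say how the combinations were judged, so the scoring below (total straight-line distance between neighbours in the final order, smaller being smoother) is an assumption, as are the random stand-in data and the function name path_length. SciPy’s defaults for linkage are method='single' and metric='euclidean'. The centroid, median and ward methods are left out because they only accept the euclidean metric.

```python
import itertools

import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage


def path_length(points, order):
    """Sum of Euclidean distances between neighbours in the given order."""
    ordered = points[order]
    steps = np.diff(ordered, axis=0)
    return float(np.linalg.norm(steps, axis=1).sum())


# Stand-in data: 40 random 3-D points, not real colors.
rng = np.random.default_rng(0)
points = rng.uniform(0.0, 100.0, size=(40, 3))

methods = ['single', 'complete', 'average', 'weighted']
metrics = ['euclidean', 'cityblock', 'mahalanobis']

scores = {}
for method, metric in itertools.product(methods, metrics):
    order = leaves_list(linkage(points, method=method, metric=metric))
    scores[(method, metric)] = path_length(points, order)

# Print the combinations from smoothest to roughest ordering.
for (method, metric), score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{method:9s} {metric:12s} {score:8.1f}")
```

A single number like this is only a crude proxy for how the list looks to a person, so it is best used to narrow the options before checking the results by eye.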

Update 2015-12-12: Split off the color list into separate files and rewrote the first half of the post to be clearer while I was at it. Also changed the title.