Saturday, 13 April 2013

A pure Python kd-tree implementation

kd-trees are an efficient way to store data that is associated with a location in any number of dimensions up to twenty or so. Specifically, kd-trees allow for nearest neighbor searches in O(log n) time, something I desperately needed for my Blender tree generation add-on. In this article I highlight some of the design decisions that that shaped my pure Python implementation of a kd-tree module.

kd-trees

There exist quite a few implementations in C or C++ that are part of robust and well tested libraries but for my Blender add-on I needed a platform neutral (= pure python) implementation that I could ship with the add-on. There are a few Python implementations but because I could not determine how well tested they were or because they did not quite meet my requirements I decided to develop an implementation myself.

Requirements

My requirements might not be everybody's requirements so besides supporting adding nodes and providing a nearest neighbor search what were the issues that were important to me?

    Comes with test-suite
    It might seem strange to make this the first requirement but because it's a part that is so essential to my add-on and because the code is quite tricky, I opted for test driven development. Adding unit tests to a python module is quite simple and although I can't claim that the tests cover all edge cases, they are nevertheless quite comprehensive. There is also a small performance test included.
    Works with any iterable as position
    A node in a kd-tree consists besides other things primarily of a position attribute. For use in Blender this typically would be a Vector instance but in other situations you might want to use other vector implementations. This kd-tree implementation assumes very little about its position attributes, just that it is subscriptable and that it supports a .dot() member function that returns the sum of the pairwise multiplication of its items (in other words, the dot product which is equal to the distance squared). Individual items should be floats (or anything that supports substraction, addition and comparisons). Probably any vector implementation will meet these requirements. The test-suite implements its own, based on Python's list class.
    Allows easy deletion
    Deleting nodes from a kd-tree is tricky and costly because of the rebalancing that has to be performed. I therefor opted for a much simpler solution: Instead of deleting a node we set its data attribute to None and add an option to the .nearest() member function to ignore nodes without data. Of course this won't make the tree any smaller but even when we delete a third of the nodes in random fashion (a typical use case for my tree add-on) the performance impact on nearest neighbor searches is minimal.

Availability

The module is licensed as GPL and available on Github as part of the spacetree add-on for Blender. It is contained in a single source file that can be downloaded separately (click Raw to download).