Open sourcing Zoba's Julia geohashing package

by Evan Fields, lead data scientist at Zoba. Evan holds a PhD in Operations Research from MIT, believes aggressively in cities, bakes enthusiastically, and can be found on Twitter at @evanjfields.

Zoba provides demand forecasting and optimization tools to shared mobility companies, from micromobility to car shares and beyond.

At Zoba, we deal with a lot of data about events that occur at discrete points: a user unlocks a scooter, a vehicle has its battery swapped, a user ends a one-way car-share trip, and so on. When analyzing these data, it’s often useful to group events spatially. This grouping is typically done with a grid system. A grid is a set of non-overlapping polygons that covers a region, usually the whole world. Each point is therefore associated with a single polygon (a “grid cell”), and we can compute statistics like “the average number of rides on a Wednesday” for each grid cell. Many grid systems also support varying granularity, so the world may be divided into a few large polygons or many small ones.
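Here is a minimal Python sketch that makes the grouping concrete. It assigns event locations to cells of a hypothetical uniform grid of 0.01-degree squares and counts the events in each cell. The grid, the cell size, and the helper names `cell_of` and `count_events_by_cell` are assumptions for illustration and are not part of any Zoba tool.

```python
import math
from collections import Counter


def cell_of(lng, lat, cell_deg=0.01):
    """Return the (column, row) index of the grid cell containing a point.

    Hypothetical uniform grid: cells are cell_deg x cell_deg degree squares
    anchored at (-180, -90). floor() sends every point to exactly one cell;
    a point lying on a shared edge goes to the cell to its east/north.
    """
    col = math.floor((lng + 180.0) / cell_deg)
    row = math.floor((lat + 90.0) / cell_deg)
    return col, row


def count_events_by_cell(events, cell_deg=0.01):
    """Group (lng, lat) event locations by grid cell and count them."""
    return Counter(cell_of(lng, lat, cell_deg) for lng, lat in events)


# Three hypothetical scooter unlocks in Boston.
unlocks = [(-71.0589, 42.3601), (-71.0585, 42.3605), (-71.0890, 42.3398)]
counts = count_events_by_cell(unlocks)
```

The first two unlocks fall in the same cell, so `counts` has two keys. Per-cell statistics such as average rides per weekday are then aggregations over these keys.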

The most common grid systems — also known as geocoding systems and closely related to spatial indices — include Uber’s H3, Google’s S2, and variations on the geohash system. Every grid system involves trade-offs between desiderata such as simplicity, speed, low spatial distortion, equally sized cells, consistent distances between neighboring cells, and so forth. Stay tuned for a blog post exploring the trade-offs between these systems!

Geohash-based systems are among the simplest grid systems; they “unwrap” the surface of the Earth to a longitude-latitude rectangle and then subdivide that rectangle. This procedure introduces nontrivial spatial distortion, especially over large regions. Nonetheless, at Zoba we’re big fans of geohash systems because they’re so simple and work well in most cases where the cells are small. In particular, geohash grid cells are intuitive and easy to explain: they’re just latitude-longitude rectangles. Geohash cells at different granularities also nest perfectly (i.e., a granularity-n cell can be perfectly divided into granularity-(n+1) cells), a useful property not shared by all grid systems.
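The nesting property follows directly from the rectangular subdivision. The sketch below assumes that each level splits every cell into a 2 × 2 block, so level n divides the longitude-latitude rectangle into 2^n columns and 2^n rows. Real geohash implementations may subdivide differently per character. The function names `cell_at`, `cell_bounds`, `parent`, and `children` are hypothetical.

```python
def cell_at(lng, lat, level):
    """(col, row) of the level-`level` cell containing (lng, lat).

    Assumed scheme: [-180, 180] x [-90, 90] is split into 2**level columns
    and 2**level rows, so each level halves cell width and height.
    """
    n = 2 ** level
    # Clamp so points on the east edge (lng == 180) or north edge (lat == 90)
    # land in the last column/row instead of falling off the grid.
    col = min(int((lng + 180.0) / 360.0 * n), n - 1)
    row = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return col, row


def cell_bounds(col, row, level):
    """(min_lng, min_lat, max_lng, max_lat) of a cell: just a lat-lng rectangle."""
    n = 2 ** level
    w, h = 360.0 / n, 180.0 / n
    return (-180.0 + col * w, -90.0 + row * h,
            -180.0 + (col + 1) * w, -90.0 + (row + 1) * h)


def parent(col, row):
    """The level-(n-1) cell that contains level-n cell (col, row)."""
    return col // 2, row // 2


def children(col, row):
    """The four level-(n+1) cells that exactly tile level-n cell (col, row)."""
    return [(2 * col + dc, 2 * row + dr) for dr in (0, 1) for dc in (0, 1)]
```

A level-(n+1) column index is the level-n index with one more low-order bit, so dropping that bit with `// 2` always recovers the enclosing level-n cell. This is the perfect nesting described above.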

Accordingly, many of our analyses have used Hilbert-curve-flavored geohashing¹ as provided by a small open-source Python library. In addition to Python, we use Julia heavily, especially for computationally intensive data science work. So we decided to write a Julia implementation of Hilbert-curve geohashing, which lets us use geohashing in Julia without relying on language bridges. Further, we’re big fans of open source (much of our tech stack, from LibGEOS up to Django, is open-source software), and we wanted to give something back to the community.

To that end, today we’re delighted to announce the initial release of GeohashHilbert.jl, a small, pure-Julia package providing Hilbert-curve geohashing. The package is fully interoperable with the aforementioned Python package: encoding a location as a geohash cell, or decoding a cell back to a location, produces the same result in either package. We welcome use, feedback, and contributions, and we hope to release more Zoba-built open-source geospatial tools in the future.
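The following is a minimal sketch of what encoding and decoding mean for Hilbert-curve geohashing. It is not the API or implementation of GeohashHilbert.jl or the Python library. It assumes that each level splits every cell into four, so level n is a 2^n × 2^n grid, and it returns integer codes rather than the character strings the real packages produce. The cell ordering uses the textbook Hilbert-curve xy-to-index conversion; the real packages may orient the curve differently. `encode`, `decode`, and the helper names are hypothetical.

```python
def _rot(n, x, y, rx, ry):
    """Rotate/flip (x, y) within an n-by-n block so each sub-curve is oriented correctly."""
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y


def xy_to_hilbert(n, x, y):
    """Position of cell (x, y) along the Hilbert curve on an n-by-n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)  # which quadrant, in curve order
        x, y = _rot(n, x, y, rx, ry)
        s //= 2
    return d


def hilbert_to_xy(n, d):
    """Inverse of xy_to_hilbert: the (x, y) cell at position d along the curve."""
    x = y = 0
    s = 1
    while s < n:
        rx = 1 & (d // 2)
        ry = 1 & (d ^ rx)
        x, y = _rot(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        d //= 4
        s *= 2
    return x, y


def encode(lng, lat, level):
    """Hypothetical: integer Hilbert code of the level-`level` cell containing a point."""
    n = 2 ** level
    # Clamp so lng == 180 or lat == 90 stays on the grid.
    x = min(int((lng + 180.0) / 360.0 * n), n - 1)
    y = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return xy_to_hilbert(n, x, y)


def decode(code, level):
    """Hypothetical: (min_lng, min_lat, max_lng, max_lat) of the cell with this code."""
    n = 2 ** level
    x, y = hilbert_to_xy(n, code)
    w, h = 360.0 / n, 180.0 / n
    return (-180.0 + x * w, -90.0 + y * h,
            -180.0 + (x + 1) * w, -90.0 + (y + 1) * h)
```

For example, `decode(encode(-71.0589, 42.3601, 12), 12)` returns a small rectangle around that point. Consecutive codes always name edge-adjacent cells, which is why nearby cells tend to have similar identifiers.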


[1] A Hilbert curve is used to order the grid cells in the subdivided longitude-latitude rectangle; this has the nice property that nearby grid cells tend to have similar identifiers.

Zoba is developing the next generation of spatial analytics in Boston. If you are interested in spatial data, urban tech, or mobility, reach out at zoba.com/careers.