Geohub

Sat, May 22, 2021 tags: [ programming rust ]

Geohub on GitHub

A couple of months ago I finally realized an idea that I’d had for several years. Not a breakthrough idea, mind you, but something I thought would be nice to have and that, as far as I could tell, didn’t exist yet.

Specifically, I wanted a system that e.g. my phone could send location data to, where it would be stored in a structured form for later analysis and also be available live to tracking applications. Thus Geohub was born, which I (somewhat sarcastically) call a real-time geo data framework.

It basically does what I set out to create: it presents a simple REST API for ingestion that is compatible with existing applications such as the GPSLogger Android app and the Overland app, and that is simple enough for a client to be implemented in a few lines of Python code. The API is described in detail in the README.md file.

I wrote Geohub in Rust, using the Rocket 0.4 framework. In hindsight, this was not the perfect choice, as Rocket 0.4 has some limits regarding scalability: it uses a static pool of worker threads, not the hot new Tokio asynchronous event loop model. I will explain below why this matters, especially for the live tracking feature. Otherwise, Rocket is very pleasant to work with, so it hasn’t proved to be the worst choice either.

Ingested data points are stored in a PostgreSQL database. Here, too, I went for utmost simplicity: note that Geohub doesn’t use PostGIS, simply because it doesn’t need any advanced geospatial features yet. It’s definitely a good extension to consider for the future, though.

> \d geodata
+----------+--------------------------+-------------------------------------------------------+
| Column   | Type                     | Modifiers                                             |
|----------+--------------------------+-------------------------------------------------------|
| id       | integer                  |  not null default nextval('geodata_id_seq'::regclass) |
| client   | text                     |  not null                                             |
| secret   | bytea                    |                                                       |
| t        | timestamp with time zone |                                                       |
| lat      | double precision         |                                                       |
| long     | double precision         |                                                       |
| spd      | double precision         |                                                       |
| ele      | double precision         |                                                       |
| accuracy | double precision         |                                                       |
| note     | text                     |                                                       |
+----------+--------------------------+-------------------------------------------------------+

For ingestion, Geohub currently provides two APIs: one to ingest a single point, encoded as URL parameters, and another to ingest points in bulk as a JSON document. Depending on the client application, either one can be used. The single-point ingestion API is called like this, where most fields are self-explanatory:

  POST /geo/<client>/log?lat=<latitude>&longitude=<longitude>&time=<time>&s=<speed>&ele=<elevation>&secret=<secret>
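
For illustration, here is a minimal Python sketch of a single-point ingestion client using requests. The parameter names mirror the endpoint above; the base URL and the exact timestamp format are assumptions:

  from datetime import datetime, timezone
  import requests

  GEOHUB = "http://localhost:8000"  # placeholder: your Geohub instance
  CLIENT, SECRET = "alice", "morning-run"

  def log_point(lat, lon, speed=None, elevation=None):
      # Send a single point via the URL-parameter ingestion API.
      params = {
          "lat": lat,
          "longitude": lon,
          "time": datetime.now(timezone.utc).isoformat(),  # assumed timestamp format
          "secret": SECRET,
      }
      if speed is not None:
          params["s"] = speed
      if elevation is not None:
          params["ele"] = elevation
      requests.post(f"{GEOHUB}/geo/{CLIENT}/log", params=params).raise_for_status()

  log_point(48.137, 11.575, speed=1.2)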

An important aspect of the API that you can see in this endpoint – which by itself only deals in geographical points, making it very simple – is the client/secret system. In order to separate individual users, and the different recordings of each user, every ingested point is tagged with a client and a secret field. The client is any alphanumeric ID identifying a user, for example alice. The secret protects ingested points from being publicly available, and it also allows alice to keep different recordings under her name apart. The idea is very simple, but effective (and useful for building abstractions on top of): the secret for each point is hashed and stored along with the geographical information. Anyone wanting to retrieve a point needs to know its client and secret; Geohub then filters the points by those two fields, only returning points with a matching client and secret. Points can also be stored without a secret, making them available to anyone who knows the client ID.
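
To make the idea concrete, here is a small Python sketch of the concept – not Geohub’s actual Rust code; the hash function and the storage layout shown here are assumptions:

  import hashlib

  def hash_secret(secret):
      # Hash the secret before storing it; Geohub's actual hashing scheme may differ.
      return hashlib.sha256(secret.encode()).digest() if secret else None

  # Storing a point: only the hashed secret is kept next to the coordinates.
  stored = {"client": "alice", "secret": hash_secret("morning-run"),
            "lat": 48.137, "long": 11.575}

  # Retrieval: a point is only returned if both client and (hashed) secret match.
  def authorized(point, client, secret):
      return point["client"] == client and point["secret"] == hash_secret(secret)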

On the consumer side, Geohub presents an export API, allowing download of geographical data as GeoJSON or GPX, as well as a live tracking API:

  # EXPORT
  GET /geo/<client>/retrieve/gpx?secret=<secret>&from=<from_timestamp>&to=<to_timestamp> \
        &limit=<maximum number of entries returned>&last=<id of last known entry>

  # LIVE
  GET /geo/<client>/retrieve/live?secret=<secret>&timeout=<timeout in sec>
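
To give a feel for the consumer side, here is a small Python sketch that fetches a GPX export and then hangs on the live endpoint in a loop. The base URL is a placeholder, and the handling of the live response assumes it returns JSON, which may not match Geohub’s exact output format:

  import requests

  GEOHUB = "http://localhost:8000"  # placeholder: your Geohub instance
  CLIENT, SECRET = "alice", "morning-run"

  # One-off export: download the recorded track as GPX.
  gpx = requests.get(f"{GEOHUB}/geo/{CLIENT}/retrieve/gpx",
                     params={"secret": SECRET, "limit": 1000})
  with open("track.gpx", "wb") as f:
      f.write(gpx.content)

  # Live tracking: hang on the live endpoint; the server answers as soon as a new
  # point arrives (or the timeout expires), after which we immediately re-poll.
  while True:
      resp = requests.get(f"{GEOHUB}/geo/{CLIENT}/retrieve/live",
                          params={"secret": SECRET, "timeout": 30}, timeout=60)
      if resp.ok and resp.content:
          print(resp.json())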

Both APIs adhere to the client/secret system, only returning “authorized” points. The export API is conceptually simple, as it only renders the stored points as GPX or GeoJSON. The live API is slightly trickier to implement efficiently: Geohub uses a background listener thread, occupying one PostgreSQL connection. Every time a new point is added, the web serving thread that ingested it issues a NOTIFY command on a special channel for the given client and secret. Conversely, as long as a client is hanging on a retrieve/live API call, the background thread keeps a LISTEN on the appropriate channel. As soon as a NOTIFY arrives on that channel, the background thread notifies the web serving thread handling the live tracking request, causing it to return the new point. As a result, updates are delivered almost instantly.

Essentially, there is a client/server architecture inside Geohub to enable live tracking, with frontend threads acting as clients of a background server thread that dispatches notifications to them. All notifications are routed through PostgreSQL: frontend threads ingesting points issue a notification through the database. This could also have been solved within Geohub (eliminating database roundtrips), but routing through the database lets us deploy multiple Geohub instances sharing the same database – for example, separate ingestion and live tracking instances – while still delivering updates consistently. Also, I like PostgreSQL, and obviously want to make use of its features.
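
The underlying PostgreSQL mechanism can be illustrated with a short psycopg2 sketch – a conceptual illustration only: the channel name and payload are made up, and Geohub’s background thread implements this in Rust rather than Python:

  import select
  import psycopg2

  conn = psycopg2.connect("dbname=geohub")  # placeholder connection string
  conn.autocommit = True
  cur = conn.cursor()

  # Listener side: subscribe to a (hypothetical) per-client channel.
  cur.execute('LISTEN "geohub_alice";')

  # The ingesting side would run, on its own connection:
  #   NOTIFY "geohub_alice", '<id of the new point>';

  # Block until a notification arrives or the timeout expires.
  if select.select([conn], [], [], 30) != ([], [], []):
      conn.poll()
      while conn.notifies:
          note = conn.notifies.pop(0)
          print("new point available:", note.payload)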

The live tracking API is also where the limitations of Rocket come into play: since this API works with hanging requests for simplicity, each live tracking client occupies one thread. This scales to hundreds of clients (threads are not as inefficient as often assumed – it turns out the kernel is not bad at doing what Tokio does, namely I/O event multiplexing and dispatching), but probably not to tens of thousands, mostly due to the memory requirements of threads. I haven’t really benchmarked this so far, as the limitations are quite clear. Another issue with this approach is that live tracking clients can starve ingestion clients once no free threads are left – this would be a good first issue to tackle before enhancing scalability.

In any case, Geohub has been working very reliably for me and some friends. No crashes have been reported, and I don’t have to worry about day-to-day maintenance.

Ingestion

Now with all the technical stuff out of the way, let’s take a look at how Geohub is actually used. I mentioned above that various apps already support ingestion, which is pleasant, as I didn’t have to write my own. Apart from those, I wrote two Python clients, which can be found in the examples directory.

One is called track_ICE, and does just what the name says: German high-speed trains (ICE, InterCity Express) and most IC (InterCity) trains expose a location API on their on-board WiFi. Using it, even with poor GPS reception, we can log reliable location updates to a Geohub instance. Due to the coarse sampling (the reported location is only updated every couple of seconds), this is slightly worse than an actual GPS receiver – but obviously much better than the spotty GPS signal inside a train.
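
A rough sketch of what such a script can look like – the ICE portal endpoint and its field names are assumptions from memory and may not match the actual track_ICE example:

  import time
  import requests

  GEOHUB = "http://localhost:8000"  # placeholder: your Geohub instance
  CLIENT, SECRET = "alice", "train-trip"
  ICE_STATUS = "https://iceportal.de/api1/rs/status"  # assumed on-board API

  while True:
      status = requests.get(ICE_STATUS).json()
      requests.post(f"{GEOHUB}/geo/{CLIENT}/log", params={
          "lat": status["latitude"],
          "longitude": status["longitude"],
          "s": status.get("speed"),  # unit conversion (km/h vs. m/s) may be needed
          "secret": SECRET,
      })
      time.sleep(5)  # the reported position only changes every few seconds anyway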

The other one connects to gpsd and enables high-frequency, high-accuracy ingestion of the points gpsd reports from NMEA data. We can either attach a GPS receiver to the computer and report the gpsd data directly, or use a gpsd forwarding app on a phone that sends NMEA UDP packets to a gpsd instance, which in turn makes them available to the gpsd.py script.
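
A rough sketch of the gpsd path, speaking gpsd’s JSON watch protocol directly over TCP; the real gpsd.py example in the repository may be structured quite differently:

  import json
  import socket
  import requests

  GEOHUB = "http://localhost:8000"  # placeholder: your Geohub instance
  CLIENT, SECRET = "alice", "walk"

  # Connect to the local gpsd and switch it into JSON watch mode.
  sock = socket.create_connection(("localhost", 2947))
  sock.sendall(b'?WATCH={"enable":true,"json":true}\n')

  for line in sock.makefile():
      report = json.loads(line)
      # TPV ("time-position-velocity") reports carry the actual position fix.
      if report.get("class") == "TPV" and "lat" in report:
          requests.post(f"{GEOHUB}/geo/{CLIENT}/log", params={
              "lat": report["lat"],
              "longitude": report["lon"],
              "time": report.get("time"),
              "s": report.get("speed"),
              "secret": SECRET,
          })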

Both scripts are relatively short, showing how easy it is to ingest points to Geohub: all that’s needed is import json and import requests!

There is also a web-based ingestion script called trackme.html, which uses XHR requests to deliver the browser’s Geolocation API data to the ingestion endpoint. I mostly used this for debugging, as it’s not very reliable for long-term data recording.

TrackMe

UI

Finally, we want to make the ingested data available to our friends. For that, I wrote a bare-bones UI which has proven quite reliable: built without any framework, it loads quickly on all devices and has in general been quite useful. It issues XHR requests to the live tracking endpoint described above, offering real-time location updates. I chose Mapbox as map provider, as I like their style; the map is embedded using Leaflet, so other map providers can be integrated without trouble too. A tooltip shows the most recent location, along with timestamp and speed. The location accuracy is indicated with a circle, and the complete track can be downloaded as either a GPX or a GeoJSON file.

LiveMap

Conclusion

Even though this project is far from highly polished, I’ve had quite some fun with it. Both backend and UI took only a couple of lines of code to write, and the whole thing has proven very stable so far.

Of all the potential improvements, the one I’m planning to look into next is a more asynchronous model, to allow for better scalability. On the same note, it would be nice to use WebSockets instead of hanging HTTP requests for the live updates – with modern Rust web frameworks, I’m sure a solution for this already exists. Rocket is supposed to come out with a Tokio-based 0.5 release soon-ish (?), and with Tokio-based PostgreSQL libraries already available, there shouldn’t be much stopping me from migrating. At that point, nothing should stand in the way of scaling to thousands of clients.