Geohub
A couple of months ago, I finally realized an idea I’d had for several years. Not a breakthrough idea, mind you, but something I thought would be nice to have and that, as far as I knew, didn’t exist yet.
Specifically, I wanted a system that, say, my phone could send location data to, where it would be stored in a structured form for later analysis and also be available live to tracking applications. Thus Geohub was born, which I (somewhat sarcastically) call a real-time geo data framework.
It does essentially what I set out to create: it presents a simple REST API for ingestion that is compatible with existing applications such as the GPSLogger Android app and the Overland app, and that is simple enough to write a client for in a few lines of Python. The API is described in detail in the README.md I provide.
I wrote Geohub in Rust, using the Rocket 0.4 framework. In hindsight, this was not the perfect choice, as Rocket has some limits regarding scalability: it uses static worker threads rather than the hot new Tokio asynchronous event loop model. I will explain below why this matters, especially for the live tracking feature. Otherwise, Rocket is very pleasant to work with, so it hasn’t been a bad choice either.
Ingested data points are stored in a PostgreSQL database. Here, too, I went for utmost simplicity; note that Geohub doesn’t use PostGIS, for the simple reason that it doesn’t make use of any advanced geospatial features yet. For the future, it’s definitely a good extension to consider, though.
> \d geodata
+----------+--------------------------+-------------------------------------------------------+
| Column | Type | Modifiers |
|----------+--------------------------+-------------------------------------------------------|
| id | integer | not null default nextval('geodata_id_seq'::regclass) |
| client | text | not null |
| secret | bytea | |
| t | timestamp with time zone | |
| lat | double precision | |
| long | double precision | |
| spd | double precision | |
| ele | double precision | |
| accuracy | double precision | |
| note | text | |
+----------+--------------------------+-------------------------------------------------------+
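For illustration, a stored point can be read back with a plain query against this table. The following is just a sketch (the connection string and client name are made up), since normal consumers would go through the HTTP API described below.

import psycopg2  # assumed client library; Geohub itself talks to PostgreSQL from Rust

conn = psycopg2.connect("dbname=geohub")  # hypothetical connection string
with conn, conn.cursor() as cur:
    # Most recent point for a given client; the secret check is omitted here.
    cur.execute(
        "SELECT t, lat, long, spd, ele, accuracy, note"
        " FROM geodata WHERE client = %s ORDER BY t DESC LIMIT 1",
        ("alice",),
    )
    print(cur.fetchone())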
For ingestion, Geohub currently provides two APIs: one to ingest a single point, encoded as URL parameters, and another to ingest points in bulk as a JSON document. Depending on the client application, either one can be used. The single-point ingestion API is generally called like this, where most fields are self-explanatory:
POST /geo/<client>/log?lat=<latitude>&longitude=<longitude>&time=<time>&s=<speed>&ele=<elevation>&secret=<secret>
An important aspect of the API that you can see in this endpoint – which by itself only deals in geographical points, making it very simple – is the client/secret system. In order to provide separation between individual users, and between different recordings of each user, every ingested point is tagged with a client and a secret field. The client is any alphanumeric ID identifying a user; for example, alice. The secret is used to protect ingested points from being publicly available, and also allows alice to record different activities in her name. The idea is very simple, but effective (and useful for building abstractions on top of): the secret for each point is hashed and stored with the geographical information. Anyone wanting to retrieve this point needs to know the client and secret of that point; Geohub then filters the points by those two fields, only returning points with matching client and secret. Points can also be stored without a secret, making them available to anyone knowing the client ID.
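To make this concrete, a minimal ingestion client needs little more than the requests library. This is a sketch based on the endpoint above; the instance URL, the example values, and the ISO 8601 timestamp format are assumptions.

import requests

GEOHUB = "https://geohub.example.com"  # hypothetical Geohub instance

def log_point(client, secret, lat, lon, time, speed=None, ele=None):
    """Send a single point via the URL-parameter ingestion API."""
    params = {"lat": lat, "longitude": lon, "time": time, "secret": secret,
              "s": speed, "ele": ele}  # requests drops parameters that are None
    r = requests.post(f"{GEOHUB}/geo/{client}/log", params=params)
    r.raise_for_status()

log_point("alice", "hiking2021", 47.42, 10.98, "2021-06-01T12:00:00Z", speed=1.4)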
On the consumer side, Geohub presents an export API, allowing download of geographical data as GeoJSON or GPX, as well as a live tracking API:
# EXPORT
GET /geo/<client>/retrieve/gpx?secret=<secret>&from=<from_timestamp>&to=<to_timestamp> \
&limit=<maximum number of entries returned>&last=<id of last known entry>
# LIVE
GET /geo/<client>/retrieve/live?secret=<secret>&timeout=<timeout in sec>
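A consumer of the live API can then be a simple loop of hanging requests. The following is a sketch assuming a hypothetical instance URL and that the response is GeoJSON; the exact response fields depend on the actual API.

import requests

GEOHUB = "https://geohub.example.com"  # hypothetical Geohub instance

def follow(client, secret):
    """Long-poll the live endpoint: each request hangs until a new point
    arrives or the server-side timeout expires, then we ask again."""
    while True:
        r = requests.get(f"{GEOHUB}/geo/{client}/retrieve/live",
                         params={"secret": secret, "timeout": 30},
                         timeout=40)  # client timeout > server-side timeout
        r.raise_for_status()
        print(r.json())  # assumed to be GeoJSON with the newest point(s)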
Both APIs adhere to the client/secret system, only returning “authorized” points. The export API is conceptually simple, as it only renders the geographical points as GPX or GeoJSON. The live API is slightly trickier to implement efficiently: Geohub uses a background listener thread, occupying one PostgreSQL connection. Every time a new point is added, a web serving thread issues a NOTIFY command on a special channel for the given client and secret. Conversely, as long as a client is hanging on a retrieve/live API call, the background thread issues a LISTEN command on the appropriate channel. As soon as a NOTIFY is sent on that channel, the background notifier thread notifies the web serving thread handling the live tracking request, causing it to return the new point. This results in updates being delivered almost instantly.
Essentially, inside Geohub there is a client/server architecture to enable live tracking, with frontend threads being the clients to a background server thread dispatching notifications to them. The background thread routes all notifications through PostgreSQL; frontend threads ingesting points issue a notification through the database. This could also have been solved within Geohub (eliminating database round trips), but this architecture allows deploying multiple Geohub instances sharing the same database – for example, separate ingestion and live tracking instances – while delivering updates consistently. Also, I like PostgreSQL, and obviously want to make use of its features.
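The underlying PostgreSQL mechanism is plain LISTEN/NOTIFY. The following Python sketch shows the listener half of the pattern; Geohub does this in Rust, and the channel name and connection string here are assumptions.

import select
import psycopg2

conn = psycopg2.connect("dbname=geohub")  # hypothetical connection string
conn.autocommit = True  # LISTEN/NOTIFY should happen outside explicit transactions

channel = "geohub_update_alice"  # hypothetical per-client/secret channel name
with conn.cursor() as cur:
    cur.execute(f"LISTEN {channel};")

while True:
    # Block until the connection becomes readable, i.e. a notification arrived.
    if select.select([conn], [], [], 30.0)[0]:
        conn.poll()
        while conn.notifies:
            note = conn.notifies.pop(0)
            print("new point announced on", note.channel)

On the ingestion side, a thread that has just inserted a point simply executes NOTIFY on the same channel, which is what wakes this loop up and, in Geohub, ultimately completes the hanging live request.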
The live tracking API is also where the limitations of Rocket come into play: as this API operates with hanging requests for simplicity, each live-tracking client occupies one worker thread. This will scale to hundreds of clients (threads are not as inefficient as often assumed – it turns out the kernel is quite good at what Tokio does, namely I/O event multiplexing and dispatching), but probably not to tens of thousands, mostly due to the memory requirements of threads. I haven’t really benchmarked this so far, as the limitation is quite clear. Another issue with this approach is that live-tracking clients can starve ingestion clients once no worker threads are left over – this would be a good first issue to address before working on further scalability.
In any case, Geohub has been working very reliably for me and some friends. No crashes have been reported, and it requires no day-to-day maintenance from me.
Ingestion
Now, with all the technical stuff out of the way, let’s take a look at how Geohub is actually used. I mentioned above that there are various apps already allowing ingestion, which is pleasant, as I didn’t have to write my own. Apart from that, I wrote two Python clients that can be found in the examples directory.
One is called track_ICE, and does just that: German high-speed trains (ICE, InterCity Express) and most IC (InterCity) trains present a location API on their on-board WiFi. That way, even with poor GPS reception, we can log reliable location updates to a Geohub instance. Due to some sampling issues (the reported location is only updated every couple of seconds), this is slightly worse than an actual GPS receiver – but obviously much better than the spotty GPS signal inside a train.
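The idea boils down to polling the train’s JSON endpoint and forwarding the position. This sketch uses an assumed portal URL and assumed field names; the real track_ICE script may differ.

import requests

GEOHUB = "https://geohub.example.com"               # hypothetical Geohub instance
ICE_STATUS = "https://iceportal.de/api1/rs/status"  # assumed on-board status endpoint

def relay_once(client, secret):
    """Read the train's reported position and forward it to Geohub."""
    status = requests.get(ICE_STATUS, timeout=5).json()
    params = {
        "lat": status["latitude"],        # field names are assumptions
        "longitude": status["longitude"],
        "s": status.get("speed"),
        "secret": secret,
    }
    requests.post(f"{GEOHUB}/geo/{client}/log", params=params).raise_for_status()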
The other one connects to gpsd and enables high-frequency, high-accuracy ingestion of points reported via NMEA. We can either use a computer-attached GPS receiver and report the gpsd data, or use a gpsd forwarding app on a phone that sends NMEA UDP packets to a gpsd instance, which in turn makes them available to the gpsd.py script.
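A stripped-down version of that can talk to gpsd’s JSON socket protocol directly. This is only a sketch – the real gpsd.py may do it differently, and the instance URL is again made up.

import json
import socket
import requests

GEOHUB = "https://geohub.example.com"  # hypothetical Geohub instance

def relay_gpsd(client, secret, host="127.0.0.1", port=2947):
    """Subscribe to gpsd's JSON stream and forward TPV fixes to Geohub."""
    sock = socket.create_connection((host, port))
    sock.sendall(b'?WATCH={"enable":true,"json":true}\n')
    for line in sock.makefile("r"):
        msg = json.loads(line)
        if msg.get("class") != "TPV" or "lat" not in msg:
            continue  # skip version/device reports and fixes without a position
        requests.post(f"{GEOHUB}/geo/{client}/log", params={
            "lat": msg["lat"], "longitude": msg["lon"],
            "time": msg.get("time"), "s": msg.get("speed"),
            "ele": msg.get("alt"), "secret": secret,
        })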
Both scripts are relatively short, showing how easy it is to ingest points to Geohub: all that’s needed is import json and import requests!
There is also a web-based ingestion script called trackme.html, which uses XHR requests to deliver the browser’s geolocation data to the ingestion API. I mostly used this for debugging, as it’s not very reliable for long-term data recording.
UI
Finally, we want to make the ingested data available to our friends. For that, I wrote a bare-bones UI which has proven quite reliable. Without any framework, it loads quickly on all devices and has been genuinely useful in practice. It issues XHR requests to the live tracking endpoint described above, offering real-time location updates. I chose Mapbox as the map provider, as I like their style; the map is embedded using Leaflet, so other map providers can be integrated without trouble too. A tooltip shows the most recent location, along with timestamp and speed. The location accuracy is indicated with a circle, and the complete track can be downloaded as either a GPX or a GeoJSON file.
Conclusion
Even though this project is far from highly polished, I’ve had quite some fun with it. It took relatively little code to write both backend and UI, and both have proven very stable so far.
Of all the potential improvements, I’m planning to look into a more asynchronous model next, to allow for better scalability. Relatedly, it would be nice to use WebSockets instead of hanging HTTP requests for the live updates – with modern Rust web frameworks, I’m sure a solution for this already exists. Rocket is supposed to come out with a Tokio-based 0.5 release soon-ish (?), and with Tokio-based PostgreSQL libraries already available, there shouldn’t be much stopping me from migrating. At that point, nothing should be in the way of scaling to thousands of clients.