sondehub

WebSockets

Added 2021-06-21 00:09:04 +0000 UTC

In my last post I mentioned that one of the improvements we've made is reducing the cost of the WebSockets feed. I want to spend some time explaining what it is, what it's for, why it's important, and how our current implementation works.

In a traditional HTTP setup your client (e.g. a browser) has to request data. This is called polling, as the client has to ask, or poll, for new data. Because you need to choose how often to poll, the client is usually several seconds out of date. It's also resource intensive, and requires decisions about how much data is sent back to the client on each request.

WebSockets resolve some of these issues by providing a communication channel between the server and the client that's always open. The server can send data to the client without the client requesting it. This means that the client no longer polls and all we need to do is send new data to any WebSocket that happens to be open.
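To put a rough number on the polling delay described above, here is a minimal sketch. It assumes a client that polls on a fixed interval and only sees a message at its next poll; the function name and the sample arrival times are made up for illustration.

```python
import math

def polling_delay(arrival_time: float, poll_interval: float) -> float:
    """Seconds a message waits before a polling client next asks for data."""
    # Polls happen at 0, poll_interval, 2 * poll_interval, ...
    next_poll = math.ceil(arrival_time / poll_interval) * poll_interval
    return next_poll - arrival_time

# Hypothetical arrival times (in seconds) seen by a client polling every 10 seconds.
arrivals = [1.0, 4.0, 12.5, 19.0]
delays = [polling_delay(t, 10.0) for t in arrivals]
average_delay = sum(delays) / len(delays)

print(delays)
print(average_delay)
```

With a pushed feed there is no polling interval at all, so the remaining delay is just the time it takes the message to travel over the open connection.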

SondeHub is pretty unique in that we collect as much data as possible with a fairly quick upload rate. The SondeHub network is huge, with over 300 receivers.

A lot of the usefulness of the data comes from having such a large network of receivers (thank you!). However, a large network is only as useful as the applications built on top of its data. If we were to lock down access so the data could only be used through the SondeHub website there would be no innovation, and no way for users to build and develop things using the data they have uploaded to us. This is why providing the data back to the community is so important to us. We want to share the data that people upload.

So how does this fit into WebSockets? Well, one of the great things about SondeHub is being able to process live data as it's being received. Anyone should have the same ability to process that data as we do. Our WebSocket implementation allows anyone with a compatible MQTT WebSocket client to connect and access that data. As WebSockets work in the browser, the data can be processed in the browser or within a server application, making it quite accessible.
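As a rough idea of what such a client could look like, here is a minimal sketch using the paho-mqtt library (version 2.x assumed). The hostname, port and topic are placeholders rather than the real feed details, and treating each payload as UTF-8 JSON is also an assumption.

```python
import json

# Placeholder values: substitute the real endpoint and topic for the feed.
BROKER_HOST = "mqtt.example.org"  # hypothetical hostname
BROKER_PORT = 443                 # assumes WebSockets over TLS on the HTTPS port
TOPIC = "sondes/#"                # hypothetical topic filter

def decode_payload(payload: bytes) -> dict:
    """Turn one MQTT message body into a dict, assuming it is UTF-8 JSON."""
    return json.loads(payload.decode("utf-8"))

def on_connect(client, userdata, flags, reason_code, properties):
    # Subscribe inside on_connect so the subscription is renewed after a reconnect.
    client.subscribe(TOPIC, qos=0)

def on_message(client, userdata, message):
    print(message.topic, decode_payload(message.payload))

def run():
    # Imported here so the helpers above can be used without paho-mqtt installed.
    import paho.mqtt.client as mqtt

    client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2, transport="websockets")
    client.tls_set()  # use the system CA certificates for the TLS connection
    client.on_connect = on_connect
    client.on_message = on_message
    client.connect(BROKER_HOST, BROKER_PORT, keepalive=60)
    client.loop_forever()
```

Calling `run()` blocks and prints each message as it arrives. The same idea works in a browser with any MQTT-over-WebSockets client library.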

Our implementation

The original implementation we used was AWS IoT. As payloads arrived on our endpoint we would unwrap them, parse them and publish them to AWS IoT. To give users access to the WebSocket endpoint we would hand out presigned URLs for it.

While this approach worked, its biggest problem was cost. AWS IoT provides an MQTT endpoint, which saves you from running a broker for these messages, and has a bunch of useful features for managing IoT devices. However, we didn't use any of the special features of AWS IoT, so they went to waste, and I can only assume that a large proportion of the cost went into funding them.

So we've abandoned AWS IoT for our own home-grown solution. For this we are using an AWS Fargate container configured as a service. We process hundreds of messages per second, rather than millions per second, so we don't actually need any fancy multi-server architecture here and can settle for a single server. If we start processing many more messages we can scale up the size of the instance.

For our MQTT broker we are using Mosquitto, an open-source MQTT broker that supports MQTT over WebSockets. We are using the prebuilt Docker container that can be found on Docker Hub. One of the trickier bits is getting a config into the container. There are a couple of ways to do this, such as baking your own container image with the config built in, but I opted for a sidecar approach that uses the aws-cli to copy the config files into a volume before Mosquitto starts. The sidecar configuration looks a bit like this:

It's not the most elegant solution, but it certainly works for us, and means we don't need to worry all that much about a proper docker container build chain.
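For readers who haven't seen the sidecar pattern before, here is a hedged sketch of the general shape, written as a Python structure that mirrors ECS container definitions. It is not our actual task definition: the bucket, volume and container names are hypothetical, and the `amazon/aws-cli` and `eclipse-mosquitto` image names are assumptions.

```python
import json

# All names below (bucket, volume, container names) are hypothetical.
CONFIG_VOLUME = "mosquitto-config"
CONFIG_MOUNT = {"sourceVolume": CONFIG_VOLUME, "containerPath": "/mosquitto/config"}

container_definitions = [
    {
        # Sidecar: copies the config files from S3 into the shared volume, then exits.
        "name": "config-loader",
        "image": "amazon/aws-cli",
        "essential": False,  # allowed to exit without stopping the whole task
        "command": ["s3", "cp", "s3://example-config-bucket/mosquitto/",
                    "/mosquitto/config/", "--recursive"],
        "mountPoints": [CONFIG_MOUNT],
    },
    {
        # Broker: only starts once the sidecar has exited successfully.
        "name": "mosquitto",
        "image": "eclipse-mosquitto",
        "essential": True,
        "dependsOn": [{"containerName": "config-loader", "condition": "SUCCESS"}],
        "mountPoints": [CONFIG_MOUNT],
        "portMappings": [{"containerPort": 8080}, {"containerPort": 8883}],
    },
]

# Both containers mount the same task-level volume.
volumes = [{"name": CONFIG_VOLUME}]

print(json.dumps({"containerDefinitions": container_definitions, "volumes": volumes}, indent=2))
```

The key parts are the shared volume and the `dependsOn` condition, which together make sure the config files exist before the broker reads them.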

The actual configuration of Mosquitto is fairly basic:

max_qos 0
persistence false
listener 8883 0.0.0.0
protocol mqtt
listener 8080 0.0.0.0
protocol websockets
allow_anonymous true
password_file /mosquitto/config/passwd
acl_file /mosquitto/config/acl
http_dir /mosquitto/config/html

As we aren't doing anything special with Mosquitto we can keep the config short and basic. You'll notice that we haven't configured HTTPS or TLS certificates here. This is because the application load balancer performs the TLS termination for us, which saves a bunch of time and effort.
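One detail of the config that is easy to miss is that each `protocol` line applies to the `listener` line before it. The small sketch below makes that pairing explicit by reading the same config text; the helper name is made up, and it only understands these two directives rather than the full Mosquitto config format.

```python
CONFIG = """\
max_qos 0
persistence false
listener 8883 0.0.0.0
protocol mqtt
listener 8080 0.0.0.0
protocol websockets
allow_anonymous true
password_file /mosquitto/config/passwd
acl_file /mosquitto/config/acl
http_dir /mosquitto/config/html
"""

def listener_protocols(config_text: str) -> dict:
    """Map each listener port to the protocol configured for it."""
    protocols = {}
    current_port = None
    for line in config_text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue  # skip blank lines and comments
        if parts[0] == "listener":
            current_port = int(parts[1])
            protocols[current_port] = "mqtt"  # Mosquitto's default when no protocol is given
        elif parts[0] == "protocol" and current_port is not None:
            protocols[current_port] = parts[1]
    return protocols

print(listener_protocols(CONFIG))  # {8883: 'mqtt', 8080: 'websockets'}
```

So port 8883 speaks plain MQTT and port 8080 speaks MQTT over WebSockets, both unencrypted inside the container, with TLS handled at the load balancer.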

And that's about it. If you're curious about building an application using the WebSockets endpoint, you can check out the example pysondehub project: https://github.com/projecthorus/pysondehub