sondehub

History API improvements

Added 2021-08-06 10:46:14 +0000 UTC

Last weekend I was able to get a significant amount of work done improving the history endpoints. The APIs have been updated, and you can see the improvement in the latest version of the SondeHub Python library, along with the sondehub.org/card endpoint. As the screenshot shows, the old sondehub client would take minutes to download all sonde data, while the new client took all of 4 seconds. These improvements mean that we are now only keeping a few months of data in ElasticSearch.

To achieve this performance we break the task into two parts. First, roughly every 6 hours a scheduled task queries ElasticSearch for radiosondes seen in the last 24 hours and adds them to a queue.

A queue worker then queries ElasticSearch and the S3 archive for all the data available for that particular serial number, merges all the data and re-uploads to S3. Having a queue made it simple to backfill all the old radiosondes.
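As a rough illustration of the merge step, here is a minimal sketch in Python. It is not the actual worker code: the function name `merge_frames` is made up for this example, and it assumes each frame is a dict with an ISO 8601 `datetime` string in a consistent format, so that plain string sorting gives chronological order. It also assumes that when both sources hold a frame for the same timestamp, the newer copy from ElasticSearch should win.

```python
from typing import Any, Dict, Iterable, List

Frame = Dict[str, Any]


def merge_frames(archived: Iterable[Frame], recent: Iterable[Frame]) -> List[Frame]:
    """Merge archived and recent frames for one serial number.

    Hypothetical helper: assumes every frame has a "datetime" key holding
    an ISO 8601 string in one consistent format.
    """
    by_time: Dict[str, Frame] = {}

    # Start with what is already in the archive.
    for frame in archived:
        by_time[frame["datetime"]] = frame

    # Recent frames overwrite archived ones that share a timestamp.
    for frame in recent:
        by_time[frame["datetime"]] = frame

    # Sorting the timestamp strings gives chronological order
    # (only true because of the consistent-format assumption above).
    return [by_time[key] for key in sorted(by_time)]
```

Because the merge is keyed on the timestamp, running it again over the same radiosonde gives the same result, which is the kind of property that makes reprocessing and backfilling from a queue safe.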

This approach is cost-effective and bandwidth-efficient. The only downside is that there may be several hours of lag before radiosonde data lands in S3, but for most use cases this isn't a problem.

Bucket

Of course, the SDK isn't the only way to consume the data. The S3 bucket is also open to anyone who wants to process the data. You can use the bucket explorer to browse it: https://sondehub-history.s3.amazonaws.com/index.html

One of the neat things we've done in the bucket design for this iteration is that the date prefix only includes a small subset of frames: the first frame, the highest frame, and the last frame. This allows us to use S3 like a mini database of radiosonde launch metadata. I look forward to seeing what comes out of that :)
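To make the "first, highest, last" idea concrete, here is a small Python sketch of how such a subset could be picked from a flight's frames. This is only an illustration, not the code that builds the bucket: `summary_frames` is a made-up name, and it assumes each frame is a dict with a `datetime` string (ISO 8601, consistent format) and a numeric `alt` altitude.

```python
from typing import Any, Dict, List

Frame = Dict[str, Any]


def summary_frames(frames: List[Frame]) -> List[Frame]:
    """Return the first, highest, and last frame of a flight.

    Hypothetical helper: assumes "datetime" and "alt" keys on every frame.
    """
    if not frames:
        return []

    # Chronological order, relying on consistently formatted timestamps.
    ordered = sorted(frames, key=lambda f: f["datetime"])

    first = ordered[0]
    last = ordered[-1]
    highest = max(ordered, key=lambda f: f["alt"])

    # A short flight may have the same frame in more than one role,
    # so only keep each frame once.
    summary: List[Frame] = []
    for frame in (first, highest, last):
        if not any(frame is kept for kept in summary):
            summary.append(frame)
    return summary
```

Three frames are enough to answer questions like roughly where and when a sonde was first heard, how high it got, and where it was last heard, without downloading the full flight.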

That's all for the moment, have a great weekend!
~ Michaela.