bcachefs


Telemetry

Recently, I added a new superblock section for tracking counts of every distinct filesystem error (i.e. fsck error) since filesystem creation, as well as the date of the most recent error.

The idea is that inconsistencies that fsck is able to repair often go unreported, but they still need to be fixed. And I won't know to go bug hunting if I don't know they're happening.
So, I'd like to add some telemetry, opt-in of course. I'm thinking of a weekly cron job that uploads the superblock errors section (or perhaps full superblocks, which shouldn't contain anything sensitive but would include filesystem configuration that would be helpful in correlating errors).
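To make the cron job idea concrete, here is a minimal sketch of the upload side, assuming the error counts have already been read out of the superblock by some other step. The endpoint URL, the payload field names, the `opted_in` flag and the placeholder error names are all hypothetical; nothing here reflects an existing bcachefs tool or service.

```python
import json
from datetime import datetime, timezone
from urllib import request

# Hypothetical endpoint; no such service is named in the post.
UPLOAD_URL = "https://telemetry.example.org/v1/superblock-errors"


def build_payload(error_counts, last_error_time, fs_options=None):
    """Assemble the report body from already-extracted superblock data.

    error_counts maps an error name to its count since filesystem creation.
    Only non-zero counters are included, and nothing identifying the user
    is added to the report.
    """
    return {
        "schema": 1,  # assumed version field so the format can evolve
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "errors": [
            {"name": name, "count": count}
            for name, count in sorted(error_counts.items())
            if count > 0
        ],
        "last_error_time": last_error_time,
        "options": fs_options or {},
    }


def make_request(payload, opted_in):
    """Return a POST request for the payload, or None when not opted in."""
    if not opted_in:
        return None
    body = json.dumps(payload).encode("utf-8")
    return request.Request(
        UPLOAD_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A weekly job would call `request.urlopen(req)` only when `make_request` returns something other than `None`, so the opt-in check sits in one place and the default is to send nothing.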

Any suggestions on easy ways to implement telemetry, or useful library code?

Comments

The Syncthing project may give you some inspiration:
- usage data: https://docs.syncthing.net/users/security.html#usage-reporting , visualized at https://data.syncthing.net/
- crash reporting: https://docs.syncthing.net/users/crashrep.html

franknord23

Would something like Cloudflare's ebpf_exporter be useful here [1]? My first thought was presenting the error info in a format compatible with Prometheus [2] [3].

I think telemetry is a pretty controversial subject, but I understand the desire for it in identifying and fixing problems. On the user end, I would like to see a properly secured, permissions-limited (seccomp?) binary/daemon that scrapes and organizes the information locally, and optionally uploads it. This way, these metrics can be useful locally to the user and can be attached to bug reports. Additionally, the output can be further massaged into visualizations of the user's choosing. For uploads, the user can configure optional identifiable info (e.g. email) that further enriches the data.

I have no idea what a good analytics infrastructure would look like. I was thinking AWS or Google Cloud, but that's another layer of complexity altogether, raises privacy concerns (maybe needing privacy policies and legal advice) and is possibly a huge distraction. I would start with getting the metrics useful in a local capacity first.

EDIT: I know that Homebrew for sure leverages Google Analytics and seems pretty transparent (with the major controversy being that users are opted in by default) [4].

[1] https://github.com/cloudflare/ebpf_exporter
[2] https://prometheus.io/docs/instrumenting/writing_exporters/
[3] https://prometheus.io/docs/instrumenting/clientlibs/
[4] https://docs.brew.sh/Analytics
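As a rough illustration of the "Prometheus-compatible format" idea, the sketch below renders per-error counters in the Prometheus text exposition format. The metric name `bcachefs_fsck_errors_total`, the label names and the example error names are assumptions made up for this example, not something bcachefs exports today.

```python
def escape_label(value):
    """Escape a label value for the Prometheus text format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(fs_uuid, error_counts):
    """Render error counters as Prometheus text exposition lines.

    fs_uuid and the keys of error_counts are plain strings; the counts are
    the since-creation totals, which map naturally onto a counter.
    """
    lines = [
        "# HELP bcachefs_fsck_errors_total Errors recorded in the superblock, by type.",
        "# TYPE bcachefs_fsck_errors_total counter",
    ]
    for name, count in sorted(error_counts.items()):
        lines.append(
            'bcachefs_fsck_errors_total{fs="%s",error="%s"} %d'
            % (escape_label(fs_uuid), escape_label(name), count)
        )
    return "\n".join(lines) + "\n"
```

Output like this could be written to a file and picked up locally (for instance by node_exporter's textfile collector), which keeps the data useful on the machine itself whether or not anything is ever uploaded.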

Stanley Chan

Mozilla has a pretty well developed system for pushing up crashes, as well as other telemetry. Maybe parts of that are reusable?

Scrivener

I have no idea about implementing it, but this seems like a really cool idea for detecting ongoing issues with a server or hardware. It's similar to (but more detailed than) what I end up getting with `zpool status` to monitor for disk issues that aren't obvious until something actually fails.

Ryan Voots

