Your irregular status update
Added 2023-03-29 14:57:29 +0000 UTC
Let's see, what is there to talk about since the last:
Mostly, it's been a whole lot of bug fixing and stabilizing, grinding away at test failures in the CI: https://evilpiepirate.org/~testdashboard/ci?branch=bcachefs
We're currently at ~50 test failures, which at first doesn't seem like progress compared to 6 months ago - except that since then I've added the xfstests-nocow tests, and also added new things for our existing tests to check (e.g., tests now check the number of slowpath events and fail if that number is over some threshold). Most of the test failures we're seeing (as well as user bug reports) are looking quite a bit less serious than they were six months ago, too.
Backpointers have pretty well settled down, we got all the known bugs in snapshots fixed (including deletion, except for snapshots + quotas), and erasure coding has seen a lot of progress too.
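To make the slowpath check concrete, here is a minimal sketch of the idea: compare event counters collected during a test against per-event limits and fail the test if any limit is exceeded. The counter names, limits, and function name below are hypothetical and are not taken from the actual test suite.

```python
def check_slowpath_events(counters, thresholds):
    """Return (name, count, limit) for every counter above its limit."""
    failures = []
    for name, limit in thresholds.items():
        count = counters.get(name, 0)  # an event that never fired counts as 0
        if count > limit:
            failures.append((name, count, limit))
    return failures


# Hypothetical event names and limits, for illustration only.
counters = {"slowpath_event_a": 1200, "slowpath_event_b": 3}
thresholds = {"slowpath_event_a": 1000, "slowpath_event_b": 10}

failures = check_slowpath_events(counters, thresholds)
test_passed = not failures  # the test fails if any threshold was exceeded
```

A check like this turns a performance regression into an ordinary test failure, which is why adding it can raise the failure count without anything actually getting worse.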
The next big thing for erasure coding is a patch set that switches the data move path to not initially replicate writes, as with foreground writes, but instead wait for a stripe to be completed before doing an index update. With that, erasure coding survives a copygc torture test - so that's a big milestone.
Large folio support was just merged! This means data in the page cache is no longer stored as fixed size 4k pages; instead, page cache folios are a variable (power of two) number of pages. It's a big performance boost for all the buffered IO paths.
Special mention to Matthew Wilcox for implementing folios and large folios in the Linux kernel; it's been a huge multi-year effort by him.
And, there's been a lot of performance work done over the past ~5 months. Red Hat uncovered some major performance regressions, so I went over the entire git history with a fine-tooth comb and new automated performance testing tooling, hunting for regressions. The big performance issue was from backpointers - but the new btree write buffer (initially created for the LRU btree) has solved that.
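As a rough illustration of why large folios help buffered IO, the arithmetic can be sketched as follows. A folio of order n covers 2^n pages; the 4 KiB base page size matches the figure above, while the 1 MiB write size and the choice of order 4 are assumptions made up for the example (the kernel picks folio sizes itself).

```python
PAGE_SIZE = 4096  # base page size in bytes (4k, as in the text above)


def folio_size(order):
    """Size in bytes of a folio covering 2**order pages."""
    return PAGE_SIZE << order


def entries_needed(nbytes, order):
    """Number of folios of the given order needed to cover nbytes."""
    size = folio_size(order)
    return -(-nbytes // size)  # ceiling division


write_size = 1 << 20  # assumed 1 MiB buffered write

with_small_pages = entries_needed(write_size, 0)    # fixed 4k pages
with_large_folios = entries_needed(write_size, 4)   # 64 KiB folios (assumed order)
```

Fewer, larger units mean fewer page cache lookups and less per-page bookkeeping for the same amount of data.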
Oh, and there's now a fragmentation LRU: this means copygc no longer has to scan the alloc btree to pick buckets to evacuate - another big scalability improvement.
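The benefit of keeping buckets indexed by fragmentation can be sketched with a toy in-memory structure. This is not how bcachefs stores the fragmentation LRU on disk; the class, its methods, and the use of live sector counts as the ordering key are all assumptions made for illustration.

```python
import heapq


class FragmentationIndex:
    """Toy index that hands back the bucket with the least live data first."""

    def __init__(self):
        self._heap = []   # (live_sectors, bucket) entries, possibly stale
        self._live = {}   # bucket -> current live_sectors

    def update(self, bucket, live_sectors):
        """Record the current amount of live data in a bucket."""
        self._live[bucket] = live_sectors
        heapq.heappush(self._heap, (live_sectors, bucket))

    def pop_most_fragmented(self):
        """Return (bucket, live_sectors) for the emptiest bucket, or None."""
        while self._heap:
            live, bucket = heapq.heappop(self._heap)
            if self._live.get(bucket) == live:  # skip entries made stale by updates
                del self._live[bucket]
                return bucket, live
        return None


idx = FragmentationIndex()
idx.update(1, 100)
idx.update(2, 10)
idx.update(3, 50)
idx.update(2, 80)  # bucket 2 filled up again; its old entry is now stale

picked = [idx.pop_most_fragmented() for _ in range(4)]
```

Picking the next bucket to evacuate costs a few heap operations here rather than a pass over every bucket, which is the same kind of saving as not having to scan the alloc btree.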
LSF is coming up in May, maybe we'll have something to talk about on the upstreaming front then...
Comments
Development is like the weather. Sometimes it is sunshine and ice cream, other times it is foggy and miserable. Keep at it. Just take a break sometimes to clear your head.
veritanuda
2023-03-29 16:42:25 +0000 UTC