Encryption
Added 2016-08-07 16:10:52 +0000 UTC
I've been studying random papers, RFCs, and Dan Bernstein's code, and figuring out the plan for adding encryption to bcachefs... doing crypto right is hard. In storage land, I'm not sure anyone really gets it right - if you're doing block storage (e.g. dm-crypt), or if you're adding encryption to an existing filesystem, you're kind of screwed, since you have no place to stick a nonce. Unless I missed it when I was reading the code, ext4 doesn't even try to use the block or file offset or anything - it's just AES(key, data) in ECB mode, which is really bad.
Nonces: I'm not going to go into why they're important for your encryption, just what they mean for the filesystem. A nonce is some additional input to your encryption function - 96 bits is typical. A nonce doesn't have to be secret - an incrementing counter is generally fine - but the important thing is that you must never, ever use the same nonce twice with the same key.
Using the physical offset on disk as a nonce is the typical solution when you can't store a real nonce somewhere - if an attacker only gets to see your disk image once, they won't see any blocks that have been encrypted with the same nonce. But if they can look more than once and see blocks that have been overwritten with new data, then they can see multiple messages (blocks) that have been encrypted with the same key and nonce - which is really bad. Or worse, if you're using an SSD, overwriting data doesn't really overwrite it - the old data is still there until the SSD's internal garbage collection reclaims it, and a sophisticated enough attacker could certainly find it. So implementing nonces correctly is emphatically worth it, if we can.
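To make the overwrite problem concrete, here is a minimal sketch of what an attacker learns from two snapshots of the same block when the nonce is reused. Everything in it is an assumption for illustration: the keystream is a toy built from SHA-256 in counter mode (a stand-in for a real stream cipher such as ChaCha20, not something to use in practice), and the key, the offset-derived nonce, and the block contents are made up.

```python
import hashlib

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode); a stand-in for a real stream cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "little")).digest()
        counter += 1
    return bytes(out[:length])

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    return xor(data, keystream(key, nonce, len(data)))

key = b"k" * 32
# Hypothetical scheme: the nonce is just the physical offset of the block.
nonce = (4096).to_bytes(12, "little")

old_block = b"account balance: 0000100"
new_block = b"account balance: 9999999"  # overwrites the same physical block

snapshot_1 = encrypt(key, nonce, old_block)
snapshot_2 = encrypt(key, nonce, new_block)

# The attacker never sees the key, but XORing the two ciphertexts cancels
# the keystream and leaves the XOR of the two plaintexts.
leaked = xor(snapshot_1, snapshot_2)
print(leaked == xor(old_block, new_block))
```

The zero bytes in `leaked` show exactly which parts of the block did not change, and any known or guessable plaintext in one version reveals the corresponding bytes of the other.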
Anyways, it's the filesystem's responsibility to generate these nonces, and store them somewhere so we have them when we go to decrypt. Conveniently, keys in bcache have a version number field - I'm going to expand that to 96 bits (which is a good thing to do for other reasons anyways), and then we just have to make sure that when encryption is enabled we're generating unique version numbers.
There's a corner case with partially overwritten extents and copygc: when we rewrite a partially overwritten extent, we can't generate a new version number - that would break other uses of version numbers. So that means when we go to rewrite and reencrypt the data, we're actually encrypting different data (we start encrypting from a different position than before) with the same version number - but we can solve that issue by also incorporating the offset field into our nonce.
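The post doesn't spell out how the offset gets folded into the nonce, so the following is only one plausible sketch: treat the version number as the nonce proper, and use the offset into the extent to pick the position in the keystream. The keystream is again a toy (SHA-256 in counter mode standing in for ChaCha20), and `crypt`, the 32-byte block size, and the sample extent are all hypothetical.

```python
import hashlib

BLOCK = 32  # toy keystream block size (one SHA-256 digest); ChaCha20 uses 64 bytes

def keystream(key: bytes, version: int, start: int, length: int) -> bytes:
    """Toy seekable keystream: nonce = 96-bit version, position = byte offset in extent."""
    nonce = version.to_bytes(12, "little")
    first = start // BLOCK
    last = (start + length + BLOCK - 1) // BLOCK
    stream = b"".join(
        hashlib.sha256(key + nonce + i.to_bytes(8, "little")).digest()
        for i in range(first, last)
    )
    skip = start - first * BLOCK
    return stream[skip:skip + length]

def crypt(key: bytes, version: int, start: int, data: bytes) -> bytes:
    """Encrypt or decrypt (the same XOR) data that begins `start` bytes into the extent."""
    ks = keystream(key, version, start, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))

key = b"k" * 32
version = 7                 # assumed unique per write
extent = bytes(range(200))  # original extent contents

original_ct = crypt(key, version, 0, extent)

# The first 72 bytes get overwritten; copygc rewrites the surviving tail
# and must keep the same version number.
tail = extent[72:]
rewritten_ct = crypt(key, version, 72, tail)

# With the offset in the nonce, the tail lands on the same keystream bytes as before.
print(rewritten_ct == original_ct[72:])

# Restarting at offset 0 would apply keystream bytes 0..127 to different data.
naive_ct = crypt(key, version, 0, tail)
print(naive_ct == original_ct[72:])
```

Under this assumption the rewritten tail is byte-for-byte the old ciphertext, so no keystream position is ever applied to two different plaintexts. A real implementation would presumably work in units of cipher blocks or sectors rather than arbitrary byte offsets.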
So normal data extents are easy. We need to encrypt metadata too, though - btree nodes and journal entries. They already have fields we can use for nonces, fortunately.
Algorithms: I'm primarily interested in supporting ChaCha20 and Poly1305. ChaCha20 beats AES in software (by a lot), and even if you have hardware AES, ChaCha20 is quite fast. AES will almost certainly be supported, as well as other MACs.
Using a cryptographic MAC is also generally considered to be quite important when you're using encryption - there are real-world attacks that MACs protect against. I don't know if any of those attacks practically apply to a filesystem (vs. networked applications), but I don't particularly care to test that - anyways, if you're paying the cost in CPU for encryption you can afford a MAC. Checksumming with just crc32c will probably be allowed, but discouraged.
edit: crc32c will not be allowed with encryption; poly1305 will be used. There are some nasty attacks on stream ciphers that you really need a MAC to deal with. Also, I'm no longer planning on adding AES support - it's too vulnerable to side channel attacks when implemented in software. Fortunately, chacha20 is plenty fast - I measure roughly 4 GB/second on my Haswell machine.
Also, getting nonces right is turning out to be ever so much fun.
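As a rough illustration of why a stream cipher needs a MAC and a plain checksum isn't enough: flipping a bit in the ciphertext flips the same bit in the plaintext, without the attacker knowing the key. The sketch below is not the bcachefs scheme. It uses a toy SHA-256 counter-mode keystream in place of ChaCha20 and HMAC-SHA256 from the Python standard library in place of Poly1305, and `seal`, `unseal`, and the message are hypothetical.

```python
import hashlib
import hmac

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode); a stand-in for ChaCha20."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "little")).digest()
        counter += 1
    return bytes(out[:length])

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def seal(enc_key: bytes, mac_key: bytes, nonce: bytes, plaintext: bytes):
    """Encrypt, then MAC the nonce and ciphertext (HMAC-SHA256 standing in for Poly1305)."""
    ct = xor(plaintext, keystream(enc_key, nonce, len(plaintext)))
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    return ct, tag

def unseal(enc_key: bytes, mac_key: bytes, nonce: bytes, ct: bytes, tag: bytes) -> bytes:
    """Verify the tag before decrypting; refuse anything that was modified."""
    expected = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("MAC mismatch: ciphertext was modified")
    return xor(ct, keystream(enc_key, nonce, len(ct)))

enc_key, mac_key = b"e" * 32, b"m" * 32
nonce = (1).to_bytes(12, "little")
ct, tag = seal(enc_key, mac_key, nonce, b"pay alice 0100 dollars")

# The attacker guesses the layout and flips '0' to '9' at byte 10, key unknown.
tampered = bytearray(ct)
tampered[10] ^= ord("0") ^ ord("9")
tampered = bytes(tampered)

# Without a MAC, the forged ciphertext decrypts to the attacker's chosen text.
forged = xor(tampered, keystream(enc_key, nonce, len(tampered)))
print(forged)

# With a MAC, the modification is rejected before any plaintext is returned.
try:
    unseal(enc_key, mac_key, nonce, tampered, tag)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

A crc32c over the ciphertext would not help here, since an attacker who can modify the data can simply recompute the checksum; a keyed MAC can't be recomputed without the key.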
Comments
None of those links are remotely convincing; they assert that cryptanalysis has been done without saying anything about who, where, or what the results were. Your first link claims to describe a weakness in scrypt, but I don't see how you can honestly call what he's describing a weakness. I'm also well aware of the importance of password hashing functions, and secure crypto in general - which is why I'm hesitant to switch to something newer without some solid justification. Argon2 is _quite_ new still, and scrypt is still considered quite secure. I will make the key derivation function pluggable (which I need to do anyways), but I don't expect to be changing the default.
Kent Overstreet
2016-09-21 06:54:56 +0000 UTC
I'm not really sure what you are asking for? Argon2 has been subjected to a lot of analysis and it's survived better than Scrypt: https://131002.net/data/talks/argon2_bsides16.pdf Sure, being better reviewed and being (relatively) future proof against improvements in memory bandwidth are mostly theoretical advantages. But Infosec is a branch of safety engineering and a different cost/benefit analysis applies. I do usability-oriented security research and the weakest link in FDE is the user's password: there's only so much you can do with ~40 bits of entropy. Even if Argon2 only keeps the NSA/FSB/IRGC locked out for an extra day or two ... people's lives are literally at risk. I know that's a bit dramatic, but activists and journalists alike routinely have their equipment confiscated or cloned. I personally interviewed for a job in which one of the interviewers was an Iranian academic phoning in over VoIP and the police raided my friend for running a Tor exit node. The threat model is very real.
2016-09-20 22:26:41 +0000 UTC
Check out password-hashing.net! While Scrypt is more secure against ASICs, GPUs see a big speedup: http://blog.ircmaxell.com/2014/03/why-i-dont-recommend-scrypt.html The competition led to a lot of great research, including the first formal models of TMTO. Two papers have come out on Argon2, which has led to tweaks. Here's a slide deck covering current status on research regarding Argon2: https://131002.net/data/talks/argon2_bsides16.pdf But even after all of this analysis, Argon2 is still decidedly better than Scrypt. The analogy is imperfect, but Scrypt is akin to SHA-1 (which is practically secure) and Argon2 is akin to SHA-2 (solid barring major breakthroughs). You should probably reach out to real cryptographers/security analysts about what settings to use.
2016-09-20 18:59:13 +0000 UTC
Can you link me anything to back that up? After reading up myself, I'm inclined to stick with scrypt, but if you've got more detailed justification I'll consider it...
Kent Overstreet
2016-09-16 04:36:52 +0000 UTC
You really should be using Argon2 for your KDF. Scrypt has poor TMTO and because the design is overly complex there isn't nearly as much confidence in its construction. Argon2 was designed by professional cryptographers and vetted in a SHA3 style competition. Argon2's memory/time requirements can be adjusted independently, so if memristors come along we can jack up memory usage without worrying about CPU performance.
2016-09-14 21:13:47 +0000 UTC
Well whether you store the IV or the nonce, it still needs to be stored in an extent, so no. (I was offering a nonceless solution so you don't need to keep track of counters since you said it was "ever so much fun", presumably sarcastic.) This makes it so you don't need a nonce, you need an IV. The HMAC is the IV and it also gives you a hash to deduplicate with if you plan to implement deduplication.
2016-08-26 17:27:06 +0000 UTC
If I follow, you're trying to solve the problem of what to do when you've got no place to stick a real nonce, as in a conventional non-COW filesystem or something like dm-crypt. But we don't have that problem - we do have a place to stick a real nonce, with the extent pointer (same place we store data checksums). That means we're able to use a stream cipher + MAC, and the encryption itself gets a whole lot simpler than in other schemes. I need to finish up the design doc; I actually have encryption fully implemented and working now.
Kent Overstreet
2016-08-26 14:51:42 +0000 UTC