Part one in a series on porting the Node.js crypto library to the browser, this one focuses on porting the functionality of crypto.createCipher, crypto.createCipheriv, crypto.createDecipher, and crypto.createDecipheriv, which was done in the library browserify-aes.
Some background on the Node.js crypto library
crypto is one of the first Node.js standard libraries to have been written, so it is not quite idiomatic with the others. The ciphers are streams, but since they predate the stream library, they also have synchronous update and final methods that work similarly to write and end, but simply return any data they produce instead of passing it into the stream.
Almost all crypto functions are CPU-bound instead of IO-bound, so simply making them non-blocking wouldn’t be much help on a single core machine. That, and making the operations async would add latency.
The fact that the cipher operations are all synchronous also rules out any possibility of using the WebCrypto API. I will leave it as an exercise to the reader to write a crypto library with its own API which efficiently makes use of Node.js and WebCrypto in an isomorphic API.
Lastly, the Node.js crypto library is really a thin wrapper around OpenSSL. Yes, this is the same OpenSSL that has been in the news a lot lately, and I don’t want to really pile onto that train. But let me just say that OpenSSL is a typical crypto library, and crypto libraries as a universal rule (with a few exceptions) have terrible documentation. So I had to go and spelunk through the OpenSSL code much more then I would have liked, in order to figure out how things worked.
XOR
In case you aren’t familiar with the term, XOR or xoring refers to the eXclusive OR operation to combine two pieces of data. Represented sometimes by a caret (^), it means that for a ^ b, if both a and b are false, then the result is false. However, if both a and b are true, then the result is also false. Only in the case where exactly one of a and b are true, but the other is false, does XOR evaluate to true.
We can construct a truth table as follows:
|t|f|
t|f|t|
f|t|f|
We can do this with numbers by turning them into binary representations and doing the XOR operation individually on the bits. So to XOR 240 and 170, we’d get 70 because:
1111 0000 (240)
^ 1010 1010 (170)
= 0101 1010 (70)
This is very helpful in cryptography, because when you take a piece of binary data a, and XOR it with another piece b, you go through b, and anywhere there is a 1, you flip the bit in a (and don’t if there is a zero in b). You can think of b as being a mask over a (though masking refers to something else), which hides the contents of a behind b by mixing them together.
AES
Implementing the AES block cipher was actually really easy. There are quite a few open source JavaScript AES implementations out there, but if there is one thing that all JavaScript crypto libraries have in common, it is that they all have their own hand-rolled buffer implementation. I ended up basing my version on the one from triple sec, which is based on the one from crypto-js.
AES is a block cipher which takes blocks of 128 bits (16 bytes) and a key with a length of 128/192/256 bits (16/24/32 bytes) and outputs 16 bytes of encrypted data. And it can also do the opposite, i.e. take a key with 16 bytes of encrypted data and output plaintext. If you want more information, check out out the Wikipedia article on AES.
Modes of operation
A block cipher by itself is actually rather useless, because it is determanistic. In other words, the same block of 16 bytes and the same key will always output the same cipher text. Why this is bad is illustrated quite well on the Wikipedia page on modes of operation. Basically if you just encrypt with AES (the technical term for which is encrypting using the “Electronic Code Book,” or ECB, mode of operation), it leaks information about the structure of the data, which can be used to make powerful inferences about the message.
To do this, you need a mode of operation and a second piece of data, besides a key, called either a nonce or an initialization vector. Both nonces and IVs are used in the same way, but they differ in that the security of an IV depends on it being random, while the security of a nonce only depends on it being never reused with the same key.
Before we get to modes of operation, a brief digression into the first big roadblock I hit.
Key Derivation
Disclaimer: human readable passwords are generally an a very bad way of creating keys for ciphers, you almost certainly want to use some other way to create a key such as something based on public key encryption, I will discuss some of those in other installments.
So AES takes a key of exactly 16, 24, or 32 bytes and an IV of 12-16 bytes (or zero for Electronic Code Book). But crypto.createCipher takes a string password of arbitrary length, so how does Node.js create the key it needs? Simple: it uses the OpenSSL function EVP_BytesToKey, and it does so with MD5 as the digest algorithm, no salt, and an iteration count of 1. First off, even without knowing how EVP_BytesToKey works internally, just by knowing the parameters that Node sends to it, you can take a guess that the keys it derives are very weak.
What do I mean by weak keys? Basically, if somebody wanted to crack data stored with keys derived this way, a few things would make it very easy.
First off, the keys are not derived with a salt. A salt is a random piece of data that is not secret, but used to prevent the derived key from being deterministic. Essentially, you generate a random number which you use, in addition to the password, to derive the key.
The salt doesn’t need to be secret, and you can send it to the recipient in the clear. The benefit of this is that, without the salt, an adversary can simply hash a dictionary of common passwords ahead of time and then just go through them one at a time and try to decrypt your text. This is called a rainbow table, and constructing one wouldn’t be too much of a problem for an attacker, unless you also had the foresight to increase the number of iterations.
The iterations means that, instead of hashing a piece of data once, you do it 10 times, or 100, or 1000 times. This technique, combined with the salt, means that suddenly going through a dictionary of common passwords requires much more work.
So after some trial and error and looking at the code, it turned out the correct formula is basically:
This is simplified; the full version is here. If you’re interested, PBKDF2 (which is much much more secure) works like this:
This is available in Node as crypto.pbkdf2. So if you ever encrypt things in Node with a password, you should never use createCipher; instead you should use createCipheriv and use PBKDF2 (or a similar algorithm like scrypt) to generate the key and IV. The correct IV length for all the modes of operation is 16 bytes, except for Galois/Counter Mode, or GCM, which is 12, and ECB, which is 0. (We’ll discuss these in the next section.) If you don’t want to roll your own, I have a library that does it for you.
Modes Of Operation, continued
The modes of operation can be divided broadly into three groups:
- Block cipher
- Self-synchronizing
- Stream cipher
Let’s go through them one by one.
Block cipher modes of operations
These modes only work on plaintext, where the length is an exact multiple of 16 bytes. In order to deal with messages that are not, the end of the message is padded up to a multiple of 16 bytes using a padding standard known as PKCS7, which is defined here and is very simple.
PKCS7 just requires you to take the number of bytes of padding you would need to add in order to get a multiple of 16 (and if it is already a multiple of 16, you add a full 16 bytes of padding) and then you add that number of bytes each containing that number. For instance, if you have 15 bytes of data, you’d add a single byte containing the number 1, if you had 25 bytes of data you’d add 7 bytes each containing the number 7, and if you had 32 bytes of data you’d add 16 more each containing the number 16. Block cipher modes that Node supports are:
- Electronic Code Book (ECB)
- Cipher Block Chaining (CBC)
Self-synchronizing stream cipher modes of operations
Stream ciphers are ciphers that can work on arbitrarily-sized inputs without padding. So stream cipher modes of operation are actually modes that transform AES into a stream cipher. The self-synchronizing ones can lose a certain amount of the encrypted message and still decrypt (most of) the rest of it. The only one that Node supports is called Cipher Feedback (FB), but it has three variants:
- Cipher Feedback (CFB)
- Cipher Feedback 8 bit (CFB8)
- Cipher Feedback 1 bit (CFB1)
Stream cipher modes of operation
These transform AES into a normal (synchronous) stream cipher, which is basically just a deterministic random number generator which can be XORed with the plaintext to create the cipher text. Node has three:
- Output feedback (OFB)
- Counter (CTR)
- Galois/Counter Mode (GCM)
Electronic Code Book (ECB)
We’ve discussed this before, but ECB is less of a mode of operation then a lack of mode of operation. It is good for three things:
- testing
- getting yourself hacked
- implementing other modes
For all intents and purposes, you should just pretend this mode doesn’t exist and move along.
Cipher Block Chaining
This is a very simple mode of operation which can be implemented entirely from the Wikipedia entry – heck, entirely from the picture. It takes the previous cipher text, XORs it to the plaintext and then encrypts that. For the first block, which doesn’t have a previous piece of cipher text, it is XORed with the IV.
This method has some advantages:
- It is really simple.
- If you lose a block of encrypted text or a block of cipher text gets curruped, the subsequent block will decrypt wrong but the the rest will be fine.
- You can parallelize decryption of the message.
This method has a couple downsides:
- You can’t parallelize encryption.
- It is vulnerable to a known plaintext attack in certain circumstances. When used in a streaming context, where an attacker can know the cipher text of a block before the plaintext for the next block is sent to be decrypted, then the attacker can figure out if a specific plaintext encrypts into a specific cipher text. In other words, an attacker can ask, “Does this block decrypt to this?” That might not sound like much, but this formed the basis of the BEAST attack, so you should avoid this mode if possible.
Cipher feedback (CFB)
This mode is very similar to CBC, but instead of XORing the plaintext to the previous cipher text and then encrypting, it encrypts the previous ciphertext and then XORs it to plaintext to create the next cipher text.
The first benefit of this over CBC is that you don’t need to have a full block of plaintext to encrypt it. You have the 16 bytes of data you got from encrypting the last block or IV. If you just get 10 bytes of plaintext, you XOR that with the first 10 bytes of the encrypted data (and save it). If you then get 10 more, you XOR the first 6 bytes with the last 6 bytes of the encrypted data, concat that to the end of the 10 bytes of cipher text you already made, encrypt that and XOR the first 4 bytes with the last 4 bytes of the previous data.
Cipher feedback 8 bit (CFB8)
Regular Cipher feedback is like CBC in that you can lose a block of cipher text and, while the next block would decrypt incorrectly, the one after wouldn’t. CFB8 allows you to lose any number of bytes of cipher text and still eventually decipher things correctly.
The way it works is that you encrypt the IV and create a 16-byte buffer, which we will call your “pad,” then for every byte of data you want to:
- XOR the byte to the first byte of the pad,
- take that output byte and shift it into the last position of the pad, shifting all the other bytes over one position, shifting the first byte out,
- encrypt that to make the new pad.
So if you have a pad:
data: a, b, c, d, e, f, g
position: 1, 2, 3, 5, 6, 7, 8
And you get a piece of data:
data: h, e, l, l, o
position: 1, 2, 3, 5, 6
You XOR h and a to get z. Then you shift the buffer to the left and shift in z:
before: a, b, c, d, e, f, g
<---a <---z
after: b, c, d, e, f, g, z
position: 1, 2, 3, 5, 6, 7, 8
Then encrypt that to get the next pad. The upside is that for mangled data, you are much more resilient. But the downside is that you have to encrypt once per byte instead of once per block – 16 times more often.
Cipher feedback 1 bit (CFB1)
This is like CFB8 except instead of per-byte, it is per-bit. You XOR the first bit, shift the result in on the right and the old one out on the left, and then encrypt again. While this is obviously even more resilient, it means you are encrypting 128 times more often than most other ciphers, which has obvious performance implications.
Output feedback (OFB)
This is a simple stream cipher that works by encrypting the nonce, XORing that with the first 16 bytes of plaintext, then reencrypting the nonce and XORing that with the second block. This is another one you can implement entirly from the picture on Wikipedia.
OFB tends not to to be used a lot compared to other streaming modes, because unlike the next two streaming modes, you can’t do random access. If you know some data is 16 * 500 bytes in, you have to encrypt the nonce 500 times to then decrypt that data.
Note that, for this mode and the rest, security depends on the nonce not being reused, but it doesn’t matter on it being predictable. So if you have to encrypt a series of messages with OFB, you can just increment the nonces for each message and this would be more secure for a large number of messages because you’d avoid accidentally randomly getting the same nonce twice (see birthday problem).
Counter (CTR)
There are multiple ways of doing a couter mode of operation (and the next one, Galois/Counter Mode, does it differently). I was able to figure out how OpenSSL did it with the following code:
Basically it takes the IV and encrypts it, XORs that with the first 16 bytes of the plaintext. Then it adds 1 (treating the IV as an unsigned 128-bit number and wrapping around) and does it again. It’s dead simple, and allows random access to decrypt data. Although, you can’t use the trick of adding one to the IV for each message to avoid accidentally repeating IVs. That being said, if all of your messages are less then 256 GB, then you can just get away with treating the IV as a list of 4 32-bit numbers and increment the 3rd one (instead of the 4th).
Galois/Counter Mode (GCM)
As suggested by the slash, GCM is actually two different things: it is a mode of operation in counter mode and a Galois message authenticator.
The counter mode in GCM is slightly different from how it is in CTR mode. Instead of just taking a 16-byte IV and incrementing it like a 128-bit number, it takes a 12-byte IV and then adds a 32-bit counter (starting at 2). In other words, it acts like CTR if all the IVS had to end in 0002. This is so you can encrypt subsequent messages with incrementing IVs.
GCM is not just a regular mode of operation, but an “authenticated” mode. This means that not only does it make sure that nobody can read your message, but it also makes sure that nobody has tampered with your message.
It does this by creating a Message Authentication Code (MAC). A MAC is similar to a hash, but requires a key to calculate it. You can also think of it as an encryption scheme that can’t decrypt the cipher text it produces, which is always 16 bytes.
In addition to authenticating the text it encrypted, it also authenticates other data, such as metadata or the destination in a network request. For instance, authenticating the time a message was sent would be useful, as it would prevent somebody from taking that message and later resending it.
So GCM first takes a 16-byte block that is all zeroes, and then encrypts this and uses the result as the key for the MAC it calculates using the GHASH algorithm. It then appends 0002 to the end of the IV and encrypts the message just like CTR mode, except after it encrypts it it updates the GHASH with the encrypted data.
When the message is done, if it wasn’t an even multiple of 16, the GHASH is updated with zeroes to get it up to an even multiple (although it doesn’t pad with a full block if it was already an even multiple). Then the GHASH is updated with a block containing the length of the additional authenticated data and the length of the encrypted data as 64-bit numbers, and then the result of the GHASH is retrieved.
Remember how we started with the counter set to 2? We take the original IV and append 0001, encrypt that and XOR it with the result of the GHASH. This is our “tag” that somebody can send along with the encrypted message to verify that it has not been tampered with.
The GHASH algorithm involves taking that initial encrypted block of zeroes and calling it our key, then creating a 16-byte buffer filled with zeroes and calling it our state. When we update with a block, we XOR that block to the state and then create a new state by multiplying the current state to our key over a Galois field. This is kind of complicated; I don’t 100% understand it myself, so I based mine on the nicely BSD-licensed version in the Stanford Javascript Crypto Library (note the homebrewed buffer object). If you really want to know how it works, check out section 6.3 of the spec.
The GHASH function used for message authentication is both the best and worst part of GCM. Apparently Galois field multiplication is very fast when implemented in hardware (supported by Intel chips) but not so fast in software.
Side note: I have a JavaScript library for Chacha20/poly1305, which means that if you are encrypting and decrypting on a computer powered by an Intel chip (i.e. not your phone) and you are using a low-level language, then it will be very fast. In all high-level languages (e.g. decrypting it in your browser) or computers with non-Intel chips (e.g. your ARM-powered phone), it will be much slower.
There are two other downsides of the GCM mode, the first being that it only works in Node 0.11 or higher. Lower version of Node can use HMACs, which are message authentication codes made out of a hash algorithm.
The last issue is that authenticated encryption doesn’t really work with the Node streaming model of ciphers. When using GCM to decrypt, you have access to the decrypted data immediately, but don’t know if it’s valid until you get all of the ciphertext.
To work around those issues, here is a simple authenticated encryption mode that runs on multiple versions of Node and prevents you from using invalid data:
As for stream-friendly authenticated encryption, that is a whole different discussion for another day, but it seems to be an outstanding problem, I’ve been working on ideas in this library.
Other Modes
There is actually one more cipher that I didn’t include called XTS. I didn’t include it because I had trouble finding documentation on it, and what I could find about it mainly discussed how it wasn’t very good. Feel free to open an issue if you can find documentation (specifically about how the tweaking works), or even better, send a pull request.
Wrapping Up
Overall if there is one thing I got out of this is that a block cipher is only about 10% of what you need to successfully build an encryption system. In order to take plaintext and a password/secret, turns it into cipher text, and allow it to be round-tripped, it also needs:
- A method to come up with a key of the appropriate length and an IV. In addition to using algorithms like PBKDF2 and a salt to derive a key from a password, there are other methods of securely coming up with a key and IV. For instance, public key techniques like Diffie-Hellman and RSA avoid the need for a password entirely. Getting this wrong will cause a crypto system to be hackable; see the Debian key bug and WEP for systems where predictable keys led to hacking.
- A mode of operation to make it actually useful for encrypting data. Picking the wrong one can lead to hacks like BEAST. Stream ciphers like RC4 (broken for other reasons), as well as ChaCha20 and Salsa20, avoid this problem entirely.
- An authentication method for making sure the data that was received was also the data sent. Getting this wrong, e.g. by using or decrypting data before you authenticate it, usually does not cause breaches by itself, but instead opens up other breaches that would have been avoided if the validity of the data was checked earlier. For instance, the POODLE attack, which strictly speaking was caused by a bug with bad padding, wouldn’t have been possible if the validity of the data had been checked.
I’d like to thank Nolan Lawson for proof reading this and Dominic Tarr for starting crypto-browserify (and putting up with all my pulls). If you’ve managed to stick around this long, feel free to ping me on Twitter if you find any errors. Also if you find any problems with browserify-aes, open an issue. Join me next time for Diffie-Hellman!