Monday 23 March 2015

Codegate 2015 Good_Crypto: Advanced WEP Cracking

In the Codegate CTF there was challenge based on a wireless network capture. Since I do research on wireless security, I had to solve this one! The challenge was:

We recently intercepted some kind of encrypted traffic, can you help us recover the password?

Update: Due to a crappy javascript programmer there's one line of code missing, but I'm sure you can figure out which one

Step 1: Decrypting the Traffic

Inspection of a few packets in Wireshark hint that there is only one network, and that it uses WEP. To verify this we go to "Statistics > WLAN traffic":


We have a network called cgnetwork which uses WEP. The only client is an Apple device, which communicates with the EfmNetwo router. Let's try to decrypt the traffic using aircrack-ng:


Thanks to all the research on RC4, that wasn't hard :) Note that we got the 40-bit HEX key A4:3D:F6:F3:74. This is different from the WEP passphrase! In practice the WEP passphrase is converted to a HEX key, hence users only need to remember their passphrase. Though there is no official standard, there are quite popular algorithms to convert the WEP passphrase to a 40-bit HEX key. We will discuss this algorithm later!

Using the HEX key we can decrypt all the traffic:


This creates the file file-dec.pcap containing all the decrypted packets. The decrypted traffic can now be further inspected. There is a lot of traffic to m.reddit.com, imgur.com, and phisme.com. It was likely generated so we were able to crack the WEP encryption, since that requires many packets.

The challenge description mentioned "Due to a crappy javascript programmer there's one line of code missing, but I'm sure you can figure out which one". Unfortunately, using NetworkMiner, we were not able to find a useful JavaScript file. With Wireshark we see most of the traffic is TCP. Using "Statistics > Conversation List > TCP" we can list all TCP connections:


As expected port 80 and 443 are heavily used. More interestingly, a few connections also use the non-standard port 5000! Manual inspection reveals that this is HTTP traffic towards a custom router page. On this webpage the user can change the WEP passphrase. The page contains the following JavaScript code (I simplified it a little bit):


So we need to find the WEP passphrase, which consists of lowercase and uppercase characters. And we're given the prefix of the SHA1 of this passphrase. We tried to brute-force small passphrases using the SHA1 hash alone, but without success.


Step 2: Obtaining the Passphrase

Our goal is to find the WEP passphrase. We know its derived 40-bit HEX key, and are given the prefix of its SHA1 hash. Unfortunately we can't simply invert the algorithm which converts a passphrase to a 40-bit HEX key. The problem is that one HEX key corresponds to many passphrases. This is why we are also given the SHA1 prefix: we can iterative over all passphrases corresponding to the HEX key, until we find one with the correct hash.

First we try to generate all passphrases which lead to our HEX key. For a detailed background on how the passphrase is converted to a key, see this presentation, and this stackoverflow question. Essentially the characters of the passphrase are XORed together in four groups, resulting in a seed for a PRNG. The key is constructed by letting the PRNG output five numbers, and taking the third significant byte of each output. An illustration makes this easier to understand:


Notice that the seed of the PRNG defines the relations between the passphrase characters which must hold. So we want to recover the seed value, and to do this we need to know precisely which PRNG is being used. In this case a linear congruential generator (LCG) is used. At each iteration it outputs a 32-bit word, from which the third most significant byte is added to the key. So the most significant byte of the outputs is not used, which is equivalent to generating only 24-bits words at each iteration. Hence the most significant byte of the seed is also unused (and can be any value). Using the appropriate constants for the linear congruential generator we get:


The initial value of X is the seed. We know the third most significant byte of the first five outputs (i.e., the next five values of X after setting the seed). In a sense, each output is truncated to one byte. Hence this is actually a truncated linear congruential generator. While it's possible to efficiently break these, and to determine the seed from the truncated outputs, it would take some time to implement such attacks. Instead, we simply try all possible 2^24 seeds until we get the correct one:


This prints out the value 12766B, so the seed is 6B:76:12:??. Recall that we don't know the last byte because the most significant byte of the PRNG outputs are never used. We know that each byte of the seed is the XOR of an unknown amount of [A-Za-z] characters. By looking at the ASCII table, we can derive that the byte 0x12 can only be generated by an even number of [A-Za-z] characters. Indeed, if an uneven number of [A-Za-z] characters were used, the resulting XOR value would always have more of its high order bits set. Similarly, the other two bytes 0x6B and 0x76 correspond to an uneven number of characters. This leaks information about the length of the password (look at the 40-bit key generator illustration to realize this). In particular the length can only be 10, 18, 26, and so on. Any length higher than 10 can't be brute-forced, so we assume the length is 10.

Using the seed value, and that the length of the passphrase P is 10, we get the following relations must hold between the differen characters:


This means we only need to brute-force 7 characters out of 10. The inferred characters (according to the relations above) must also be either uppercase or lowercase characters. We calculate the SHA1 of every guess, and compare it to the hash in the JavaScript code. Using 52 parallel instances this took less than an hour to brute-force. The passphrase was:

cgwepkeyxz

My final remark is that the bytes 0x6B and 0x76 implied that there were at least two lowercase characters. In hindsight, this could have been used to guess that all other characters were lowercase too. But I don't like guessing, and we had some powerful servers that were idle anyway... ;)