Wednesday, 24 September 2014

CSAW 2014: xorcise challenge

Last weekend our CTF team participated in CSAW. One of the challenges was particularly interesting, and this blog post gives a (somewhat) detailed overview on how to solve it. The source code and binary executable were given. This write-up is also available on the awesome write-ups repository.

Length checking vulnerabilities: Part One

By inspecting the source code we see the binary is a service which listens on port 24001. New clients are handled in int process_connection(int sockfd). This function reads one single packet, which is expected to be in the following format:
struct cipher_data {
    /** Header: Unencrypted and unauthenticated */
    uint8_t length; /** Length of the bytes array */
    uint8_t key[8];
    /** Payload: Encrypted and (partly) authenticated */
    uint8_t bytes[128];
As the comments indicate, the header is sent unencrypted. In particular this header includes an 8-byte key (which is chosen by the client). Roughly speaking the bytes array is XORed with the key in blocks of 8 bytes. The function doing this decryption is decipher(data, output), which will be investigated in more detail later on (because it contains a vulnerability). The length field the header contains the actual size of the bytes array. This must be smaller or equal to 128. An attempt is made to assure the given length is valid:
cipher_data encrypted;
ssize_t bytes_read = recv(sockfd, (uint8_t *)&encrypted, sizeof(encrypted), 0);
if (encrypted.length > bytes_read)
    return -1;
However this check is flawed. Variable bytes_read is the length of the packet including the header, while the program will treat the length field as the size of the bytes array. This means that as an attacker we can force the length to be bigger than 128.
The second vulnerability is in decipher(data, output). This function decrypts the bytes array using the key. Somewhat simplified we have the following code:
uint32_t decipher(cipher_data *data, uint8_t *output)
    uint8_t buf[MAX_BLOCKS * BLOCK_SIZE];    
    uint32_t loop, block_index;

    memcpy(buf, data->bytes, sizeof(buf));

    if ((data->length / BLOCK_SIZE) > MAX_BLOCKS)
        data->length = BLOCK_SIZE * MAX_BLOCKS;

    // Block-decryption loop
    for (loop = 0; loop < data->length; loop += 8)
        for (block_index = 0; block_index < 8; ++block_index)
            buf[loop+block_index] ^= (0x8F ^ data->key[block_index]);

    memcpy(output, buf, sizeof(buf));
Note that data->bytes is copied to a local buffer, and this local buffer is then processed. The first if-test is an attempt to assure that data->length is not bigger than the local buffer. However this checked is flawed because in (data->length / BLOCK_SIZE) the intermediate result will be rounded down. In particular, the following value will pass the length check:
data->length = BLOCK_SIZE * MAX_BLOCKS + (BLOCK_SIZE - 1) = 135
And due to the first length check vulnerability, we know that data->length can indeed contain such values. The consequence of both vulnerabilities means that we can force the block-decryption loop to decrypt an extra block of 8-bytes. Since the local buffer is too small for this, we are capable of modifying local variables placed after the buffer. In particular we can overwrite loop and block_index.

Length checking vulnerabilities: Part Two?

Interestingly, decipher(data, output) is the only function that uses the length field in the header. All other functions are coded in such a way that knowing this length is not required. This causes another peculiar observation: when setting length to zero, no decryption takes place, and the packet is processed as-is. This observation is not required to solve the challenge though (but it does make it easier to construct packets).

Authentication and Commands

The above two vulnerabilities are sufficient to exploit the challenge. However we had some problems doing this, and instead opted for another approach. Our approach relies on additional functionality of the challenge: letting it execute commands. We can execute three commands: (1) getting the current server time; (2) reading an arbitrary file; (3) executing a system command. The latter two commands required the packet to be authenticated.

The client must provide a checksum equal to H(H(password||key) || H(data||password)) for the packet to be authenticated. Here password is a secret loaded at startup, and key and data are chosen by the client. The function H() seems to be a custom (and insecure?) hash function.


We use both length-check vulnerabilities to read the secret password. Once we have this, we can execute arbitrary system commands on the server, and exploitation becomes trivial.
In the decipher(data, output) function we overwrite the return address. This is done by first overwriting the local variable loop, such that the next xor-decrypt operation will point to the return address. We prevent other variables from being overwritten by XORing with zero. Recall that we can XOR 8 bytes in total, and in particular we XOR these bytes with the key in the header of the packet.
Where will we point the return address to? First observe that, because of the last mempcy call, the eax register contains a pointer to the local buffer when returning from the function. Hence the content pointed to by eax is under our control. If we now exploit the binary for interesting gadgets, we spot the following:
.text:08049290                 mov     eax, [ebp+packet]
.text:08049293                 add     eax, 8
.text:08049296                 sub     esp, 8
.text:08049299                 push    eax             ; filename
.text:0804929A                 push    [ebp+fd]        ; fd
.text:0804929D                 call    read_file
This is the code that calls read_file(sockfd, name). When returning from decipher the ebp register is restored, hence it correctly points to the local variables. However, variable packet is not yet initialized by the program, hence we can't use it. Instead we redirect code to push eax; push [ebp+fd]. Since eax points to the beginning of the buffer this code will successfully be executed. Putting the stringpassword.txt in the start of the buffer will now make the service print out the password pass123. We can now send authenticated commands and execute arbitrary shell commands. The flag was in the fileflag.txt in the same directory as password.txt (i.e. the current working directory).
Remark: the problem we encountered when directly calling system(cmd) was that it would overwrite the local buffer used in decipher (since it was saved on the stack). Recall that the pointer to this buffer was stored in eax. Luckily, in the read_file function, the stack is used in such a way so this didn't occur.

Friday, 30 May 2014

ApBleed: Heartbleed over WPA1/2 Enterprise

Tl;dr: ApBleed is my proof-of-concept to test heartbleed against wireless networks. Patches welcome.

Once the heartbleed vulnerability in OpenSSL was made public, most focused on its applicability to web servers. Other targets such as SVN servers, VPNs, etc. were also mentioned. However, there was little public discussion about the impact of heartbleed on wireless networks.

Nevertheless, it was clear heartbleed was also exploitable against WPA1/2 enterprise networks, even if it wasn't discussed as publicly as other heartbleed stories. Normally enterprise networks use one of the many EAP methods inside an SSL tunnel to authenticate users. If OpenSSL is being used, this SSL tunnel might be vulnerable to heartbleed.

Of particular interest are networks like eduroam. Eduroam is a world-wide roaming access service between research institutions. This means that I can go to a different country, see an eduroam hotspot, and connect to it using the credentials of my home institution. What's interesting is how my credentials are checked. It's done by setting up an SSL tunnel to the RADIUS server of my own institution. The eduroam network will take care of the necessary packet routing. The image below illustrates this (taken from eduroam website):

Let's assume I'm a student at and currently visiting an institution of If I now connect to an eduroam hotspot, an SSL tunnel will be set up between my device and the RADIUS server of (i.e. my own institution). If my credentials are valid, will notify that I should be allowed on the network.

Why is this interesting for an attacker? Because the SSL tunnel is set up before user authentication, and even before you are assigned an IP. The only thing known about you is your MAC address and your "home" institution (i.e. the realm defining your home institution). Of course, an attacker can spoof both the MAC address and the home institution. The attacker only has to be within range of an eduroam hotspot, and he or she can pick any eduroam hotspot at will. The eduroam network will then forward packets to the RADIUS server the attacker specified (i.e. the realm spoofed by the attacker). This is what allows a user to directly set up an SSL tunnel with the radius server of their home institution. However, this means we can anonymously connect to any institution we want!

The guys from eduroam quickly responded to this. Hours after heartbleed got public they posted a warning on their website. One day after heartbleed, with some help from the HostApd mailing list, they had a working proof of concept to test whether institutions were vulnerable. To quote eduroam:
Following up on the heartbleed vulnerability in OpenSSL: it is confirmed that EAP-authentication in RADIUS servers is vulnerable to the attack. It is therefore extremely important to upgrade OpenSSL and restart RADIUS services as soon as possible.
The attack is feasible from any public eduroam hotspot, not just your RADIUS peers.
Federation-level admins in Europe will receive notice from the eduroam OT with a list of vulnerable realms. [..]
While the tools are not publicly available for security reasons, we're providing the service of scanning a realm list for global federation admins on their request. [..]
Though I focus on eduroam, every networking using enterprise authentication with an EAP method inside an SSL tunnel might be vulnerable to heartbleed. Nevertheless, eduroam is more interesting than your ordinary network, because it allows you to anonymously connect to any institution which has joined the eduroam federation.

Few public proof-of-concepts were available (if any at all). I can think of three reasons for this. First, not everyone was aware enterprise authentication was vulnerable to heartbleed. Second, it's a little bit harder to make a working proof-of-concept against wireless networks. Third reason might have to do with the impact of public exploits. Because authentication (and hence the SSL tunnel) takes place before getting access to a network, it's not possible to consult a certificate revocation list (CRL) or query the certificate status online (with OCSP) [eduroam]. So revoking certificates is not possible.

However, recently proof-of-concepts are being released. So it's time to release my own proof-of-concept as well :) I modified wpa_supplicant so, once connected to an AP and ready to talk to the radius server, it opens a local socket. This means you can now connect to this socket and pretend it's the RADIUS server. The big advantage of this approach is that you can use any existing heartbleed tool to test the radius server. Simply let the heartbleed tool connect to the socket created by my proof-of-concept. The code has not been tested thoroughly, but worked in all my small experiments. Get a copy of my PoC (called ApBleed) at github. Patches are welcome!

Some example outputs of running my tool:

Using existing heartbleed tools to connect to the radius server:

Sunday, 15 December 2013

Reversing and Exploiting ARM Binaries: rwthCTF Trafman

As ARM is becoming more and more popular, the need to reverse engineer ARM binaries is increasing as well. For example, nearly all mobile phones have at least one ARM processor. In this post I show how to set up a virtual ARM environment using Qemu, give an introduction to ARM assembly (while highlighting the differences with x86), show how to reverse ARM binaries, and finally demonstrate how to write basic exploits for ARM. We will use the trafman challenge of rwthCTF as an example.

Virtual ARM Environment

To start we need an environment capable of running ARM binaries. Since I didn't have an ARM machine I created a virtual ARM environment using Qemu. Qemu is similar to VirtualBox or VMWare, except that it can support multiple architectures. This allows you the emulate ARM on your default x86 or x64 machine.

First we need to know which ARM architecture to pick. Most Linux distributions support two architectures: armel and armhf. Armel supports the ARMv4 instruction set and emulates floating point operations in software, while armhf supports the ARMv7 instruction set and uses hardware floating point operations. At least that's the case for Debian, Ubuntu uses the term "armel" differently [Ubuntu FAQ, ARM FOSDEM]. In this post I will stick to Debian. Though I haven't tested this myself, it should be possible to run armel binaries on an armhf system (this should be true for both Ubuntu and Debian). For completeness, there were also the 'arm' and 'armeb' architectures, but they are no longer supported and don't appear to be used anymore. We will use the armhf architecture, in particular Debian Wheezy armhf.

Unlike x86/PC systems, ARM systems typically don't have a BIOS. This means nothing initializes the hardware, reads the first sector of the disk, or finds where to execute code from. Instead, Qemu allows you to start a system using a "-kernel" option. It loads the given binary into memory at 0x6001000 and uses a kernel calling convention to pass the commandline and the location of initrd [#IO ARM Challenge]. In practice this means we'll always have to start Qemu using a -kernel and -initrd option.

With that background we're ready to install Debian Wheezy armhf using Qemu. First create an empty directory for the virtual environment, create a virtual harddisk for Qemu, and download the appropriate initrd and vmlinux from the Debian FTP:
Note that vexpress stands for the system we will emulate, namely a Versatile Express board. We will emulate it with a cortex-a9 processor, so we pass vexpress-a9 as an argument to Qemu. Now start the installer under Qemu as follows:
  • qemu-system-arm -M vexpress-a9 -kernel vmlinuz-3.2.0-4-vexpress -initrd initrd.gz -append "root=/dev/mmcblk0" -drive if=sd,cache=unsafe,file=hda.img
Follow the instructions. A graphical environment is not required. After the installation we have to extract the installed initrd image, otherwise (if you use the downloaded initrd) the installer will again boot. To mount the virtual filesystem we first have to find the proper offset. Execute "file hda.img". The output should include something like "startsector 2048". The offset is now 512*2048. In case your offset is different than 2048 you will have to update the numbers in the next commands:
  • mkdir mountdir
  • mount -o loop,offset=$((512*2048)) hda.img mountdir/
  • cp mountdir/initrd.img-3.2.0-4-vexpress .
  • umount mountdir/
And now you can boot your virtual ARM environment with:

  • qemu-system-arm -M vexpress-a9 -kernel vmlinuz-3.2.0-4-vexpress -initrd initrd.img-3.2.0-4-vexpress -append "root=/dev/mmcblk0p2" -drive if=sd,cache=unsafe,file=hda.img
You should now have a working Qemu environment.

One warning: Qemu is still an emulator. When testing rare edge cases of exotic ARM instructions (which do not occur much in practice) there's always a chance the emulator is wrong.

Network Interface in Qemu

If the program you want to reverse engineer is a network service, it is convenient to be able to connect to it from your host system. This allows you to use tools on your host system to connect and exploit the binary (while debugging it in the guest system). This is especially convenient if Qemu is slow.

To enable network support we must specify a network device (NIC) that will be emulated. And we must specify the network backend that Qemu should use to interact with the emulated NIC. By default Qemu emulates an Intel e1000 PCI card with a user-mode network stack. User mode networking is great for allowing access to network resources, including the Internet. By default, however, it acts as a firewall and does not permit any incoming traffic. It also doesn't support protocols other than TCP and UDP. For example, ping won't work [WikiBooks].

The first option to allow incoming traffic is port redirection. This allows you to redirect a port on the host OS to a port on the guest OS. For debugging a simple service this is sufficient. As an example, redirecting TCP port 6666 on the host to port 8080 on the guest is done as follows:
qemu-system-arm -M vexpress-a9 -kernel vmlinuz-3.2.0-4-vexpress -initrd initrd.img-3.2.0-4-vexpress -append "root=/dev/mmcblk0p2" -drive if=sd,cache=unsafe,file=hda.img -redir tcp:6666::8080
You can now run a service on port 8080 in the Qemu guest, and connect to it from the host using port 6666. Multiple ports can be redirected. Port redirection can also be used to share folders between the guest and the host [WikiBooks]. The downside is that you cannot dynamically add new redirects once Qemu is running. Additionally, only TCP and UDP are supported.

The second option is to create a TAP interface [Tun/Tap IntroQemuWiki]. It offers very good performance and can be configured to create virtually any type of network topology. This creates a new virtual interface on your host system, and you can use this interface to communicate with the guest. To start Qemu using a TAP interface as network backend, use the following command:
qemu-system-arm -M vexpress-a9 -kernel vmlinuz-3.2.0-4-vexpress -initrd initrd.img-3.2.0-4-vexpress -append "root=/dev/mmcblk0p2" -drive if=sd,cache=unsafe,file=hda.img -net nic -net tap,ifname=qtap0
Unfortunately you will have to manually configure IP addresses (e.g. setup a DHCP server, configure DNS, configure IP forwarding/NAT, etc). I will not go into detail here. If you don't need an internet connection and only need basic communication between the host and guest, execute the following commands in the guest:
  • ip addr add dev eth0
  • route add default gw eth0
I prefer port redirection because it's easier to configure and is usually sufficient to get the job done.

Introduction to ARM Assembly

Simplified, an ARM processor can be in two modes: ARM mode and Thumb mode. When the CPU is in ARM mode it executes 32 bit ARM instructions, and when it is in Thumb mode it executes mixed 16- and 32-bit Thumb instructions. The purpose of Thumb mode is to improve code density. How to switch between ARM and Thumb mode will be explained shortly.

In both ARM and Thumb mode we have access to 16 32-bit registers called r0..r15. Special registers are:
  • r12: intra procedure register (ip)
  • r13: stack pointer (sp)
  • r14: link register (lr)
  • r15: program counter (pc)
Register r12 is sometimes used as an Intra Procedure (ip) call scratch register, meaning it holds intermediate values when a subroutine is being called. So if you see the ip register in a disassembly listing, it does not stand for instruction pointer. Finally, subroutines must preserve the contents of r4..r11 and the stack pointer r13. A few short examples:
sub sp, #8        ; sp -= 8  // allocate stack memory
add r7, sp, #0    ; r7 = sp + 0
add r7, #0        ; r7 += 0
str r0, [r7, #4]  ; *(r7 + 4) = r0 // dereference ptr
str r0, [r7, #0]  ; *(r7 + 0) = r0
nop               ; No Operation
movw r0, #33848   ; r0[0:16]  = 0x8438 // move to bottom
movt r0, #1       ; r0[16:32] = 0x1    // move to top
                  ; Now r0 contains 0x18438
Here r7 is used as a frame pointer to access local variables or function arguments. Also note that certain instructions (like add) can have three arguments (unlike the common two arguments in x86). In order to move a 32bit word in a register we need two instructions: one to set the bottom half of the register, and one to set the upper half of the register. This is a consequence of the RISC architecture of ARM: we have simple instructions that execute quickly, but must be more verbose to get stuff done.

What we called jumps in x86 lingo are called branches in ARM. For example "jmp offset" on x86 becomes "b addr" in ARM. Function calls from x86 are also called branches, in particular "call func" from x86 becomes "bl addr" in ARM. Here BL stands for Branch and Link, meaning the program counter r15 is set to addr and the link register r14 is set to the return address. This highlights an important difference with x86: in ARM the return address is not pushed on the stack by the caller. Instead, it it always saved in the link register (lr). The callee is responsible for saving the return address (generally the callee will save it on the stack).

The instructions bxblx, and bjx can be used to switch processor mode from ARM to Thumb (or reverse). There are two use cases: the target is a label or the target is a register [ARM]. In the case the target is a label, the instruction is of the form "blx addr", and the mode is always switched. In case the target is specified using a register, e.g. "blx lr", the mode will be set according to the least significant bit (LSB) of the register. Note that all ARM and Thumb instructions must be 16-bit (two bytes) aligned, thus the least significant bit (LSB) of function addresses is in fact always zero. An even address means the target code is an ARM instruction, an uneven address means it is Thumb code. For example, when the register contains the value 0x8001 the processor will switch to Thumb mode and start executing at 0x8000. The same is true when returning from a function: if the return address is even it will switch to ARM, if it is uneven it will switch to Thumb mode. Returning can be done using "pop {pc}" or "bx lr". Both instructions will set the processor context accordingly. Note that branch instructions without an "X" in them do not change processor mode.

One practical consequence of ARM vs. Thumb addresses is that we often see references to "symbol+1", indicating that the symbol is referring to a function in Thumb mode. Additionally we have to use correct addresses in our exploit: Your debugger might say a function starts at address 0x8000, but if it is a function using Thumb instructions we must actually use the address 0x8001 (depending on context). Finally we note that we can inspect the current mode in gdb by using the "i r cpsr" command. Bit 0x20 will be set if we are in Thumb mode, i.e. Thumb mode is set if (cpsr & 0x20) != 0.

Let's end with the second half of the previous example:
blx 0x82ec      ; branch to 0x82ec (and switch mode)
bl 0x8398       ; branch to 0x8298 (don't switch mode)
add r12, r12, #8, 20     ; r12 = 0x8000
                ; 0x8000 = RotateRight(8, 2*20) [ref]
mov sp, r7      ; set stack pointer to original value
pop {r7, pc}    ; restore r7 and return to caller
The first instruction branches to address 0x82ec and switches the current processor mode, with the return address saved in the lr register. In case we were executing Thumb code this return address will be uneven, otherwise it will be even. The second instruction branches to 0x8398 without switching mode (there is no x in the branch instruction). Hence the function will be executed in the mode the processor is currently in. We also have an interesting add instruction: it adds RotateRight(8, 2*20) to r12 (which is equal to 0x8000). For more background on how this constant is encoded read the ARM documentation.

If you want to understand ARM in more detail, a fun way to learn it is by making an ARM intro challenge. Essentially that means displaying cool graphics using only ARM code. The full background on how to get started is on #IO STS. Perhaps luckily, a detailed understanding is not required to reverse engineer a program and follow this guide.

Reversing the Trafman Challenge

As an example we will reverse engineer the Trafman challenge from rwthCTF (for the binary to work properly you need to create a directory called "db" next to the binary). We open the binary in IDA and see that it failed to disassemble the first instructions:

This is because start is located at the uneven address 0x880D. In other words the binary starts executing in Thumb mode at address 0x880C, but IDA thinks it starts executing at 0x880D. We fix this by selecting address 0x880C, pressing ALT+G, and changing the value to 1 (this tells IDA that Thumb code is located at this address). Now again select address 0x880C and press C to directly convert the section to code.

We see a call which initializes libc. The pointer to the main function is stored in r0. Hence main() is located at address 0x87E0 and is Thumb code. So go to address 0x87E0, press ALT+G, set value to 1, and repeat until all instructions are in Thumb. Now select all the instructions belonging to the main function and press P to create a function of this code block. We have to manually select all the instructions in the function because otherwise IDA fails to create the function for us. Rename the function to main.

With this starting point you can analyze what the binary is doing. Also remember to interact as much as possible with the application before looking at assembler code. In fact, I found one vulnerability by simply entering unusual inputs.

The binary gets its input and output using stdin and stdout, respectively. During the CTF a new executable was spawned for every new connection and its input/output was forwarded. On startup the binary asks for a username. Our first goal is to find a valid username.

We want to locate the code responsible for checking the username. To do this we find where the error message is referenced. So, open the Strings subview and double click on the "ERR Invalid User" string. Now select the name of the string (aErrInvalidUser) and press X to find all references to the string. Only one function is referencing this string, open it. After a little bit of reverse engineering we get the following:

The ASCII string "traffic_operator" is loaded and compared to the string entered by the user. Based on the result of this comparison we either get the error message or are successfully logged in. That solves our first problem: the username is traffic_operator.

Once logged in the user has three options:
  1. Get command: the user enters a file name of exactly 40 alphanumeric characters. If this file was previously added (see next point) its content is loaded and displayed to the user.
  2. Execute command: the user enters a file name of at least 40 alphanumeric characters. It creates a file with as name the 40 first characters and allows to user to write arbitrary content to it.
  3. Exit.
After testing these functions we find a buffer overflow when getting a command (i.e. when reading a file). To trigger the overflow we write a large file using the second option, and then read it using the first option. In my virtual ARM environment I used the following commands to test and trigger the vulnerability:
  • ARM host: nc -c ./trafman -l -p 8000
  • Guest OS: perl -e 'print "traffic_operator\n2\n"."a"x40 ."\n"."A"x400 ."\n1\n"."a"x40 ."\n"' | nc localhost 8000
We will analyze the crash in more detail in the next section. First, we reverse engineer the option menu. It's easy to locate this code because it's contained in the only function that main() calls. So open main and double click on the only non-library function being called. The text-based interface shows only three options. However, when looking at the code, we see there is a hidden option. When entering 23 the address of printf is displayed:

Notice that r0 is set to 0x8FC4 which is a pointer to the ASCII string "> %p". Register r7 is initialized in the beginning of the function to 0x12010. This address contains the dynamically loaded address of printf. Hence, it will tell us where libc is located in memory.

Getting Control of the Program Counter

We know what to do to crash the program. But we don't yet know what is causing the crash, which code is responsible, and whether it's exploitable. To answer these questions we run the binary in a debugger. Executing the following commands allows us to connect to the service from our host OS yet still debug the trafman binary itself:
  • gbc nc
  • set follow-fork-mode child
  • r -c ./trafman -l -p 8000
We now trigger the crash with:
  • perl -e 'print "traffic_operator\n2\n"."a"x40 ."\n"."A"x400 ."\n1\n"."a"x40 ."\n"' | nc localhost 8000
This results in a segmentation fault because it's trying to execute code at 0x41414141. Executing "x/20x $sp" in gdb to dump the stack shows that it's likely a stackoverflow vulnerability. With this alone we know enough to write a functional exploit. But since we're learning ARM we will take a look at the code that is responsible for the buffer overflow. We open the function that is executed when the user selects option 1 in the menu and reverse it.

What happens here is that each byte in the file gets read using fgetc until we are at the end of the file. In every loop the new byte is saved to an array on the stack, which will eventually overflow. The function returns to the caller using a "pop {pc}" instruction. An interesting instruction here is "adds r0, #1". It adds one to the register r0, and updates the status (a.k.a. condition) flags according to the resulting value. In particular, if the result is zero (meaning r0 was -1), the zero flag will be set. Using the BNE instruction we test for the zero flag. If it is set we exit the loop. Note that more instructions can set the status flag this way, and they can be recognized using the S suffix.

So we indeed have a classic buffer overflow (without a stack canary). The return address is saved on the stack and popped on the function exit. Our first step in exploiting the binary is to find out where we have to put the return address in our file (remember that the content of the file is written to the stack and parts of it will overflow the saved return address). To quickly accomplish this we use the pattern_create tool from metasploit. We again start the binary in Qemu as follows (I won't repeat this again):
  • gbc nc
  • set follow-fork-mode child
  • r -c ./trafman -l -p 8000
On our host we execute the following commands:
  • PATRN=$(/usr/share/metasploit-framework/tools/pattern_create.rb 400)
  • perl -e 'print "traffic_operator\n2\n"."a"x40 ."\n"."'$PATRN'" ."\n1\n"."a"x40 ."\n"' | nc localhost 8000
We let pattern_create.rb generate a unique pattern of 400 characters. As an example, a unique pattern of 20 characters would be "Aa0Aa1Aa2Aa3Aa4Aa5Aa". If we now know which bytes in this pattern get loaded in the program counter, we can let pattern_offset.rb calculate the offset where the return address is located in our file. In gdb we see it segfaults with the program counter at 0x41326a40. So we find the offset with the following command:
  • /usr/share/metasploit-framework/tools/pattern_offset.rb 41326a40
This returns 276. So we have to put the return address 276 bytes into the payload. As a quick test, execute the following command:
  • perl -e 'print "traffic_operator\n2\n"."a"x40 ."\n"."A"x276 ."ABCD\n1\n"."a"x40 ."\n"' | nc localhost 8000
This will segfault with the program counter at "ABCD". We now need to point the program counter to code that will get us a shell. Unfortunately the binary uses both ASLR and NX, meaning it uses random addresses, and we can't execute data on the stack and/or heap.

Defeating ASLR and NX

Our goal is a return-to-libc attack. In particular we will attempt to execute system("/bin/sh"). Defeating ASLR is trivial due to the hidden menu option which prints the address of printf. Based on the address of printf we can locate the addresses of other variables and functions within the libc library. To get the address of printf and system for a specific run we execute "p printf" and "p system" in gdb. This gives:
  • system: 0x76f2aa38
  • printf: 0x76f347d4
There is one catch: gdb says these functions are located at an even address. However, the functions are actually Thumb code! So make sure that the correct processor mode is used when executing these functions (if necessary make the addresses uneven). Anyway, the hidden menu option returns an uneven address, hence to get the address of system we take the address of printf and substract 0x9D9C from it.

So we can call system, but still need to pass an argument to it. There are two steps to accomplish this, first we need to find out where in memory the string "/bin/sh" is located, then we need to find a way to put a pointer to this string into register r0 (recall that r0 contains the first argument of a function).

Finding a "/bin/sh" string it easy. Internally the system() function uses it as well, so it's located in the libc library. To quick find it execute the following in gdb:
  • find &system, +99999999, "/bin/sh"
In my particular instance this returned 0x76fc6528. So to dynamically get the address of "/bin/sh" we take the address of printf and add 0x91D53 to it (again remember that the hidden option returns the uneven address). To load this pointer into r0 with NX enabled we will use return-oriented-programming (ROP).

We will use ROP gadgets located in the libc library, because that's the only library we know the addresses of. Within the library we have to find gadgets which set r0 and let us return to the system function. To do this I transferred to the host and executed:
  • arm-linux-objdump -d | grep "pop.*r0.*pc"
To install the arm-linux-* tools see IO SmashTheStack. There are likely better tools to find usable gadgets, but this quick and dirty method got the job done. Important is to search for both Thumb and ARM gadgets! Forcing Thumb mode can be done using -Mforce-thumb. The most interesting gadget found was:
  • 5a7bc: pop  {r0, r4, pc}
Remark that this is ARM instruction. In objdump printf is located at 0x387d4. I found this by executing:
  • arm-linux-objdump -d | grep "printf>:"
So to dynamically get the location of the gadget we take the location of printf and add 0x21FE8 to it. Good, we now have everything in place to trigger the overflow and make it execute system("/bin/sh"). Combining all findings results in the following exploit, where I wrote the nclib myself and its functionality should be self-explanatory. When executing the exploit it looks like this: