Friday 8 February 2013

Understanding the Heap & Exploiting Heap Overflows

This post will begin with a high level description of the heap and slowly builds up untill you able to write your own heap-based exploits. We assume we have non-root access to a computer but are able to run the following program as root (meaning it's a suid binary):

There's a blatant buffer overflow in line 10 which we will be exploiting. First we need to know how the heap is managed (we focus on Linux).

Basic Heap and Chunk Layout

Every memory allocation a program makes (say by calling malloc) is internally represented by a so called "chunk". A chunk consists of metadata and the memory returned to the program (i.e., the memory actually returned by malloc). All these chunks are saved on the heap, which is a memory region capable of expanding when new memory is requested. Similarly, the heap can shrink once a certain amount of memory has been freed. A chunk is defined in the glibc source as follows:

Assuming no memory chunks have been freed yet, new memory allocations are always stored right after the last allocated chunk. So if a program were to call malloc(256), malloc(512), and finally malloc(1024), the memory layout of the heap is as follows:
Meta-data of chunk created by malloc(256)
The 256 bytes of memory return by malloc
Meta-data of chunk created by malloc(512)
The 512 bytes of memory return by malloc
Meta-data of chunk created by malloc(1024)
The 1024 bytes of memory return by malloc
Meta-data of the top chunk
The dash line "---" is an imaginary boundary between the chunks, in reality they are placed right next to each other (example program illustrating the layout). Anyway, you're probably wondering why I included the meta data of the "top chunk" in the layout. Well, the top chunk represents the remaining available memory on the heap, and it is the only chunk that can grow in size. When a new memory request is made, the top chunk is split into two: the first part becomes the requested chunk, and the second part is the new the top chunk (so the "top chunk" shrunk in size). If the top chunk is not large enough to fulfill the memory allocation, the program asks the operating system to expand the top chunk (making the heap grow in size).

Interpretation of the Chunk Structure

The interpretation of the chunk structure depends on the current state of the chunk. For example, the only metadata present in an allocated chunk are the prev_size and size fields. The buffer returned to the program starts at the fd field. This means an allocated chunk always has 8 bytes of metadata, after which the actual buffer starts. Also surprising is that the prev_size field isn't used by allocated chunks! In a minute we will see that an unallocated (free) chunk does use the additional fields.

Another important observation is that in glibc chunks are always 8-bytes aligned. And to simplify memory management the size of chunks is always a multiple of 8 bytes. This means that the last 3 bits of the size field can be used for other purposes (these 3 bits would always be zero otherwise). Only the first (least significant) bit is important for us. If it's set it means that the previous chunk is in use. If it's not set the previous chunk is not in use. A chunk is not in used if the corresponding memory has been freed (by a call to free).

So how can we check whether the current chunk is in use? Simple really: we navigate to the next chunk by adding size to the pointer of the current chunk, resulting in the location of the next chunk. In this next chunk we read the least significant bit of the size field to see if the chunk is in use.

Managing Free Chunks

We know that when a chunk is freed, the least significant bit of the size field in the meta data of the next chunk must be cleared. Additionally, the prev_size field of this next chunk will be set to the size of the chunk we are freeing.

A freed chunk also uses the fd and bk fields. These are the fields we can abuse in our exploit. The fd field points to the previous free chunk, and the bk field to the next free chunk. This means free chunks are saved in a doubly linked list. However there isn't just one list containing all free chunks. There are actually multiple lists of free chunks. Each list contains free chunks of a specific size. This makes searching for a free chunk of a certain size a lot faster, since it only has to search in the list supporting the particular size. We now need to correct an earlier statement: When a memory allocation request is made, it first searches for a free chunk that has the same size (or a bit larger), and will reuse that memory. Only if no appropriate free chunk was found will the top chunk be used.

Finally we get to the functionality we will exploit: freeing a chunk. When a chunk is freed (e.g. by a call to free) it checks whether the chunk before it has already been freed. In case the previous chunk is not in use, it's coalesced with the chunk being freed. However this increases the size of the chunk. As a result the chunk has to be placed in a different list of free chunks. To do this the already freed chunk is first removed from the linked list, and then the coalesced chunk is added to the appropriate list. The code of removing a chunk from a linked list is:

Here argument P represents the chunk being removed from the list. Arguments BK and FD are output arguments representing the previous and next chunk, respectively. This is very typical code to remove an element from a linked list, and its operation is illustrated below [Source]:

Here the element containing "Lam Larry" is being removed from the list. As you can see, to remove an element from a linked list two operators must be performed:
  1. The backward (bk) pointer of the next element (FD->bk) is set to the pointer of the previous element (BK). This corresponds to line 6 of the unlink function.
  2. The forward (fd) pointer of the previous element (BK->fd) is set to the pointer of the next element (FD). This corresponds to line 7 of the unlink function.
From an attacker perspective, the important thing is that two write operations to the memory are performed. The goal is now to manipulate the metadata so that we can control the value being written, and control where it's written. This allows us to write an arbitrary value to an arbitrary location. With this we can overwrite the function pointer of a destructor, and make it point to our own code.

Note: The article Vudo malloc tricks [7] also contains a more detailed discussion about how malloc works internally.

More Recent Techniques

Unfortunately the technique explained above is no longer possible against newer versions of glibc. The unlink function has been hardened and includes the following runtime check: the forward pointer of the previous chunk must point toward the current chunk. Similarly the previous pointer of the next chunk must also point toward the current chunk:

For the more advanced techniques we will base ourselves on the techniques outlined in the Malloc Maleficarum [5] and the Malloc Des-Maleficarum [6]. The author introduced several techniques and gave them rather special names. The requirements of these techniques are:
  • The House of Prime: Requires two free's of chunks containing attacker controlled size fields, followed by a call to malloc.
  • The House of Mind: Requires the manipulation of the program into repeatedly allocating new memory.
  • The House of Force: Requires that we can overwrite the top chunk, that there is one malloc call with a user controllable size, and finally requires another call to malloc.
  • The House of Lore: Again not applicable to our example program.
  • The House of Spirit: One assumption is that the attacker controls a pointer given to free, so again this technique cannot be used.
  • The House of Chaos: This isn't actually a technique, just a section in the article :)
To conclude: the techniques in the Malloc Maleficarum are not applicable to our very simple example. All techniques require a sufficient (and controllable) amount of calls to free and/or malloc. Now, in sufficiently large programs there should be enough methods to trigger calls to free/malloc, possible making them exploitable. So the real problem is that our example is not realistic: it's just too simple! Therefore we will update the example so it matches a possible scenario we can actually exploit. The updated example is:

It might seem like a big requirement that we can control the size given to the malloc call. This is not necessarily true. Imagine a (networked) service capable of caching objects. To cache an object first the size of the object is sent to the service, after which the object itself is transmitted. In such a program one can easily control the size given to a malloc call. Anyway, this example is something we can exploit, even when compiled against the latest glibc!

Exploitation Technique: The House of Force

Our example program nicely matches the requirements of the House of Force technique. This technique abuses the code responsible for allocating memory from the top chunk. This code is in _int_malloc:

The goal is to overwrite av->top with a user controllable value. The av->top variable always points to the top chunk. During a call to malloc this variable is used to get a reference to the top chunk (in case no other chunks could fulfill the request). This means that if we control the value of av->top, and we can force a call to malloc which uses the top chunk, we control where the next chunk will be allocated (i.e. we control the return value of malloc in line 15 of the example). Consequently we can write arbitrary bytes to any address using line 16.

To make sure the appropriate code gets executed the if-test in line 15 must evaluate to true. This test checks whether the top chunk is large enough to contain the requested chunk (and if enough space remains for the metadata in the top chunk). We want to assure that any request (of arbitrary large size) will use the top chunk. To accomplish this we abuse the overflow in line 11 to overwrite the metadata of the top chunk. First we write 256 bytes to fill up the allocated space, then 4 bytes to overwrite prev_size, and we finally overwrite the size with the largest possible (unsigned) integer:
LARGETOPCHUNK=$(perl -e 'print "A"x260 . "\xFF\xFF\xFF\xFF"')
./example $LARGETOPCHUNK 1 2
The above won't actually exploit the program, since more steps are needed. But feel free to use a debugger and confirm that the size of the top chunk is indeed modified. We continue by overwriting the variable av->top. Our goal is to make it point 8 bytes before the GOT entry of free. The GOT table is where pointers to dynamically loaded functions are stored. So if we're able to overwrite the pointer to free, we can make the program jump to an arbitrary location (and in particular to our shellcode). To find out the address of the got.plt entry execute:
readelf --relocs example
In my case free was located at 0804a008, which subtracted by 8 becomes 0x804a000. The value being written to av->top is calculated by chunk_at_offset:
/* Treat space at ptr + offset as a chunk */
#define chunk_at_offset(p, s)  ((mchunkptr)(((char*)(p)) + (s)))
We control the second argument: nb (in the #define called s). By debugging the program I found that victim (the older value for av->top) was 0x804b110. Hence the value passed to malloc should be 0x804a000 - 0x804b110 =  FFFFEEF0. We now get:
LARGETOPCHUNK=$(perl -e 'print "A"x260 . "\xFF\xFF\xFF\xFF"')
Resulting in a segfault with eip set to 0x41414141 (= the byte encoding of AAAA)! The final step is to point eip to a location under our control. Let's assume ASLR is disabled, so the stack is always at a fixed location. In my case it starts at 0xBFFFFFFF. All that remains is to construct a NOP slide, inject the shellcode, and make eip point towards the NOP slide:
LARGETOPCHUNK=$(perl -e 'print "A"x260 . "\xFF\xFF\xFF\xFF"')
NOPS=$(perl -e 'print "\x90"x 0x10000')
Hell yeah! We are now greeted by a nice sh shell! =)

Ahhh, the joy of success!

A few final remarks are in place. First, we used a large NOP slide. This makes it a lot easier to get the stack address (the third argument) correct. Second, the first byte of $STACKADDR starts with 1. This is done to avoid if from containing a NULL byte (which would terminate the string). Finally we placed the NOP slide and shell code into environment variables, and used the env command to clear all other environment variables (resulting in more predictable behaviour of the exploit).

Additionally we could've beaten ASLR by a simple brute force, as we are working on a 32 bit platform. In case DEP is enabled our shellcode on the stack would not be executable, and we would have to fall back on Return Oriented Programming (ROP). This requires controlling the value of esp, which can be done by overwriting the value of a saved frame-pointer (saved ebp value).


We have successfully exploited the second example program. Additionally it was explained how both DEP and ASLR can be bypassed.


[1] Justin N. Ferguson, Understanding the heap by breaking it. Blackhat USA, 2007.
[2] J. Koziol, D. Litch´Čüeld, D. Aitel, C. Anley, S. Eren, N. Mehta, and R. Hassell. The Shellcoder’s Handbook: Discovering and Exploiting Security Holes. Wiley, 2003.
[3] Doug Lea, Design of dlmalloc: A Memory Allocator. Personal website.
[4] Andries Brouwer, Hackers Hut: Exploiting the heap.
[5] Phantasmal Phantasmagoria, The Malloc Maleficarum. bugtraq mailing list, 2005.
[6] blackngel, Malloc Des-Maleficarum. Phrack Volume 0x0d, Issue 0x42, 2009.
[7] MaXX, Vudo malloc tricksPhrack Volume 0x0b, Issue 0x39, 2001.