Basics of Binary Exploitation
Intro to assembly
Each personal computer has a microprocessor that manages the computer’s arithmetical, logical, and control activities.
Each family of processors has its own set of instructions for handling operations like getting user input, displaying info on screen, etc. This set of instructions is called ‘machine language instructions’. A processor can only understand these machine language instructions, which are basically 0’s and 1’s. Writing those by hand is impractical, and this is where low-level assembly comes in: it gives each machine instruction a human-readable name.
Assembly can be intimidating, so I will sum it up for you. This is pretty much enough to start pwning some binaries.
- In assembly you are given 8-32 global variables of fixed size to work with which are called “registers”.
- There are also some special registers. The most important is the “program counter”, which tells the CPU which instruction to execute next. It is the same thing as the IP (instruction pointer), so don’t get confused.
- Technically, all computation is executed on registers. A 64-bit processor has 64-bit registers, which let it work on 64-bit values and hold 64-bit memory addresses. A 64-bit processor in the x86 family can also run the older 32-bit instruction set, so most programs written for 32-bit processors run on 64-bit computers, while 64-bit programs do not run on 32-bit machines.
- But big programs need more space than the registers offer, so they use memory. Memory is accessed either directly by address or through push and pop operations on a stack.
- Control flow is handled via altering program counter directly using jumps, branches, or calls. These inst. are called “GOTOs”.
- Status flags are generally 1 bit each. Each one records whether a condition is set or cleared, for example whether the last arithmetic operation produced zero.
- Branches are just GOTOs that are predicated on a status flag, like, “GOTO this address only if the last arithmetic operation resulted in zero”.
- A CALL is just an unconditional GOTO that pushes the next address on the stack, so a RET instruction can later pop it off and keep going where the CALL left off.
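The points above can be sketched as a tiny interpreter. This is only an illustration: the instruction set (OP_MOV, OP_SUB, OP_JNZ, OP_CALL, OP_RET, OP_HALT), the struct layouts and the demo program are all made up for this example and do not correspond to any real processor. It does not bounds-check register indexes, so it is not meant for untrusted input.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy instruction set, invented for this sketch. */
enum opcode { OP_MOV, OP_SUB, OP_JNZ, OP_CALL, OP_RET, OP_HALT };

struct insn {
    enum opcode op;
    size_t a;    /* register index, or jump/call target */
    int64_t imm; /* immediate operand for MOV and SUB */
};

struct cpu {
    int64_t reg[4];   /* fixed-size "global variables" */
    size_t pc;        /* program counter: index of the next instruction */
    int zf;           /* zero flag: 1 if the last SUB produced zero */
    size_t stack[16]; /* return addresses pushed by CALL */
    size_t sp;        /* number of entries currently on the stack */
};

/* Execute prog until OP_HALT (or max_steps, as a safety net); return reg[0]. */
int64_t run(const struct insn *prog, size_t max_steps)
{
    struct cpu c = {0};

    for (size_t step = 0; step < max_steps; step++) {
        struct insn i = prog[c.pc++]; /* fetch, then point pc at the next one */

        switch (i.op) {
        case OP_MOV:
            c.reg[i.a] = i.imm;
            break;
        case OP_SUB:
            c.reg[i.a] -= i.imm;
            c.zf = (c.reg[i.a] == 0);
            break;
        case OP_JNZ: /* branch: a GOTO predicated on a status flag */
            if (!c.zf)
                c.pc = i.a;
            break;
        case OP_CALL: /* unconditional GOTO that first saves the return address */
            if (c.sp == 16)
                return -1; /* stack overflow */
            c.stack[c.sp++] = c.pc;
            c.pc = i.a;
            break;
        case OP_RET: /* pop the saved address and continue after the CALL */
            if (c.sp == 0)
                return -1; /* nothing to return to */
            c.pc = c.stack[--c.sp];
            break;
        case OP_HALT:
            return c.reg[0];
        }
    }
    return c.reg[0];
}

/* Counts reg[1] down from 3 to 0, then calls a routine that sets reg[0]. */
const struct insn demo[] = {
    { OP_MOV,  1, 3  }, /* 0: reg1 = 3 */
    { OP_SUB,  1, 1  }, /* 1: reg1 -= 1, updates zf */
    { OP_JNZ,  1, 0  }, /* 2: go back to 1 while reg1 != 0 */
    { OP_CALL, 5, 0  }, /* 3: push 4, jump to 5 */
    { OP_HALT, 0, 0  }, /* 4: stop */
    { OP_MOV,  0, 42 }, /* 5: reg0 = 42 */
    { OP_RET,  0, 0  }, /* 6: pop 4 into pc */
};
```

Calling run(demo, 100) loops three times through the SUB/JNZ pair, makes the call, and returns 42 once RET has brought the program counter back to the HALT that follows the CALL.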
I think this is enough info about assembly and you’re ready to dive into binary exploitation. If you want to learn more, this book is awesome - here
Let’s start pwning binaries
To start you will need a disassembler(converts 0’s & 1’s [machine code] into assembly) like radare2, IDA, objdump etc. and a debugger(used to debug programs) like gdb, OllyDbg etc.
Let’s get started: Here is the code that I wrote and we will try to exploit it. It’s a simple license checker which compares two strings. Source will be available on my GitHub.
crackme1.c
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[]) {
    if (argc == 2) {
        printf("Checking Licence: %s\n", argv[1]);
        if (strcmp(argv[1], "hello_stranger") == 0) {
            printf("Access Granted!\n");
            printf("Your are 1337 h4xx0r\n");
        }
        else {
            printf("Wrong!\n");
        }
    }
    else {
        fprintf(stderr, "Usage: %s <name>\n", argv[0]);
        return 1;
    }
    return 0;
}
This code is pretty simple and I hope you can understand it. So let’s compile it.
$ gcc crackme1.c -o crackme1
Now we will use gdb to debug our program
$ gdb crackme1
We know that every C program has a main function, so let’s disassemble it.
(gdb) disassemble main
It will print this:
Dump of assembler code for function main:
0x0000000000001169 <+0>: push %rbp
0x000000000000116a <+1>: mov %rsp,%rbp
0x000000000000116d <+4>: sub $0x10,%rsp
0x0000000000001171 <+8>: mov %edi,-0x4(%rbp)
0x0000000000001174 <+11>: mov %rsi,-0x10(%rbp)
0x0000000000001178 <+15>: cmpl $0x2,-0x4(%rbp)
0x000000000000117c <+19>: jne 0x11e3 <main+122>
0x000000000000117e <+21>: mov -0x10(%rbp),%rax
0x0000000000001182 <+25>: add $0x8,%rax
0x0000000000001186 <+29>: mov (%rax),%rax
0x0000000000001189 <+32>: mov %rax,%rsi
0x000000000000118c <+35>: lea 0xe71(%rip),%rdi # 0x2004
0x0000000000001193 <+42>: mov $0x0,%eax
0x0000000000001198 <+47>: callq 0x1040 <printf@plt>
0x000000000000119d <+52>: mov -0x10(%rbp),%rax
0x00000000000011a1 <+56>: add $0x8,%rax
0x00000000000011a5 <+60>: mov (%rax),%rax
0x00000000000011a8 <+63>: lea 0xe6b(%rip),%rsi # 0x201a
0x00000000000011af <+70>: mov %rax,%rdi
0x00000000000011b2 <+73>: callq 0x1050 <strcmp@plt>
0x00000000000011b7 <+78>: test %eax,%eax
0x00000000000011b9 <+80>: jne 0x11d5 <main+108>
0x00000000000011bb <+82>: lea 0xe67(%rip),%rdi # 0x2029
0x00000000000011c2 <+89>: callq 0x1030 <puts@plt>
0x00000000000011c7 <+94>: lea 0xe6b(%rip),%rdi # 0x2039
0x00000000000011ce <+101>: callq 0x1030 <puts@plt>
0x00000000000011d3 <+106>: jmp 0x120c <main+163>
0x00000000000011d5 <+108>: lea 0xe72(%rip),%rdi # 0x204e
0x00000000000011dc <+115>: callq 0x1030 <puts@plt>
0x00000000000011e1 <+120>: jmp 0x120c <main+163>
0x00000000000011e3 <+122>: mov -0x10(%rbp),%rax
0x00000000000011e7 <+126>: mov (%rax),%rdx
0x00000000000011ea <+129>: mov 0x2e6f(%rip),%rax # 0x4060 <stderr@@GLIBC_2.2.5>
0x00000000000011f1 <+136>: lea 0xe5d(%rip),%rsi # 0x2055
0x00000000000011f8 <+143>: mov %rax,%rdi
0x00000000000011fb <+146>: mov $0x0,%eax
0x0000000000001200 <+151>: callq 0x1060 <fprintf@plt>
0x0000000000001205 <+156>: mov $0x1,%eax
0x000000000000120a <+161>: jmp 0x1211 <main+168>
0x000000000000120c <+163>: mov $0x0,%eax
0x0000000000001211 <+168>: leaveq
0x0000000000001212 <+169>: retq
End of assembler dump.
This looks ugly, right? That’s because it is AT&T syntax. Change it to Intel syntax using:
(gdb) set disassembly-flavor intel
To make the change permanent, create ~/.gdbinit and add
set disassembly-flavor intel
Disassemble main again and you will get more readable code:
Dump of assembler code for function main:
0x0000000000001169 <+0>: push rbp
0x000000000000116a <+1>: mov rbp,rsp
0x000000000000116d <+4>: sub rsp,0x10
0x0000000000001171 <+8>: mov DWORD PTR [rbp-0x4],edi
0x0000000000001174 <+11>: mov QWORD PTR [rbp-0x10],rsi
0x0000000000001178 <+15>: cmp DWORD PTR [rbp-0x4],0x2
0x000000000000117c <+19>: jne 0x11e3 <main+122>
0x000000000000117e <+21>: mov rax,QWORD PTR [rbp-0x10]
0x0000000000001182 <+25>: add rax,0x8
0x0000000000001186 <+29>: mov rax,QWORD PTR [rax]
0x0000000000001189 <+32>: mov rsi,rax
0x000000000000118c <+35>: lea rdi,[rip+0xe71] # 0x2004
0x0000000000001193 <+42>: mov eax,0x0
0x0000000000001198 <+47>: call 0x1040 <printf@plt>
0x000000000000119d <+52>: mov rax,QWORD PTR [rbp-0x10]
0x00000000000011a1 <+56>: add rax,0x8
0x00000000000011a5 <+60>: mov rax,QWORD PTR [rax]
0x00000000000011a8 <+63>: lea rsi,[rip+0xe6b] # 0x201a
0x00000000000011af <+70>: mov rdi,rax
0x00000000000011b2 <+73>: call 0x1050 <strcmp@plt>
0x00000000000011b7 <+78>: test eax,eax
0x00000000000011b9 <+80>: jne 0x11d5 <main+108>
0x00000000000011bb <+82>: lea rdi,[rip+0xe67] # 0x2029
0x00000000000011c2 <+89>: call 0x1030 <puts@plt>
0x00000000000011c7 <+94>: lea rdi,[rip+0xe6b] # 0x2039
0x00000000000011ce <+101>: call 0x1030 <puts@plt>
0x00000000000011d3 <+106>: jmp 0x120c <main+163>
0x00000000000011d5 <+108>: lea rdi,[rip+0xe72] # 0x204e
0x00000000000011dc <+115>: call 0x1030 <puts@plt>
0x00000000000011e1 <+120>: jmp 0x120c <main+163>
0x00000000000011e3 <+122>: mov rax,QWORD PTR [rbp-0x10]
0x00000000000011e7 <+126>: mov rdx,QWORD PTR [rax]
0x00000000000011ea <+129>: mov rax,QWORD PTR [rip+0x2e6f] # 0x4060 <stderr@@GLIBC_2.2.5>
0x00000000000011f1 <+136>: lea rsi,[rip+0xe5d] # 0x2055
0x00000000000011f8 <+143>: mov rdi,rax
0x00000000000011fb <+146>: mov eax,0x0
0x0000000000001200 <+151>: call 0x1060 <fprintf@plt>
0x0000000000001205 <+156>: mov eax,0x1
0x000000000000120a <+161>: jmp 0x1211 <main+168>
0x000000000000120c <+163>: mov eax,0x0
0x0000000000001211 <+168>: leave
0x0000000000001212 <+169>: ret
End of assembler dump.
Now make an assumption about how this binary works. When you run it without any argument, it displays the usage message. If you pass two arguments, where the first one is the program name itself and the second one is the license key, it displays an access granted or a wrong key message. Now apply that assumption to the assembly code.
For exploitation, we can ignore most of the stuff. At 0x1178 there is a cmp instruction which compares the 4-byte value stored at [rbp-0x4] with 0x2 (which is 2 in decimal). That slot was filled from edi at 0x1171, and edi carries the first argument of main, so this is the argc check. Just below it, 0x117c has a jne (jump if not equal): if argc is not 2, control flow jumps to 0x11e3, which prints the usage message. At 0x1198 it calls printf, which prints “Checking Licence:” when you run the binary. The next interesting address is 0x11b2, which calls strcmp (string compare). It compares our key with the correct key and returns 0 in eax if they match. Next, at 0x11b7, test eax,eax ANDs eax with itself and sets the zero flag only if eax is 0, that is, only if the strings matched. The jne at 0x11b9 then jumps to 0x11d5 (the “Wrong!” branch) if the zero flag is not set, meaning the strings are not equal. After that, 0x11c2 and 0x11ce call puts (it just prints a string), which prints “Access Granted!” and the second message if we give the correct key. The source used printf there, but gcc swaps in puts when the string is constant, has no format specifiers and ends with a newline. Next is 0x11d3, which jumps to 0x120c, where eax is set to 0 as the return value before main returns. Now let’s exploit it using gdb to print access granted without using the key.
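To make the mapping between the disassembly and the source easier to follow, here is a rough goto-style rendering of main in C. It is a hand-written sketch, not decompiler output: the function name crackme_flow and the label names are made up for illustration, with each label named after the address it stands for in the dump above.

```c
#include <stdio.h>
#include <string.h>

/* Goto-style rendering of main(); labels are named after dump addresses. */
int crackme_flow(int argc, char *argv[])
{
    int eax;

    if (argc != 2)                              /* cmp [rbp-0x4],0x2 */
        goto addr_11e3;                         /* jne 0x11e3 */

    printf("Checking Licence: %s\n", argv[1]);  /* call printf@plt */
    eax = strcmp(argv[1], "hello_stranger");    /* call strcmp@plt */
    if (eax != 0)                               /* test eax,eax */
        goto addr_11d5;                         /* jne 0x11d5 */

    puts("Access Granted!");                    /* call puts@plt */
    puts("Your are 1337 h4xx0r");               /* call puts@plt */
    goto addr_120c;                             /* jmp 0x120c */

addr_11d5:
    puts("Wrong!");
    goto addr_120c;

addr_11e3:
    fprintf(stderr, "Usage: %s <name>\n", argv[0]);
    eax = 1;                                    /* mov eax,0x1 */
    goto addr_1211;

addr_120c:
    eax = 0;                                    /* mov eax,0x0 */
addr_1211:
    return eax;                                 /* leave ; ret */
}
```

Note that both the correct and the wrong key end up at addr_120c and return 0. Only the printed text differs, which is why the branch at 0x11b9 is the single decision worth attacking.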
First set a breakpoint at main. A breakpoint is an address at which the debugger stops execution.
(gdb) break *main
Now run the program and watch the control flow. You can use pen and paper for better understanding.
(gdb) run
(gdb) ni
ni (short for nexti) executes the next instruction. After that, just press Enter: gdb repeats the last command and executes the following instruction. Now try running the program with a key.
(gdb) run random_key
(gdb) ni
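Carefully watch the control flow this time. According to our assumption, if we set eax to 0 just before the test eax,eax at 0x11b7 executes, we are telling the program that the strings matched, and it will print the access granted message. So set breakpoint 2 at the address of test eax,eax. Once the program has been started, gdb shows runtime addresses instead of the file offsets seen earlier, so disassemble main again and use the address shown there (0x00005555555551b7 in my session).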
(gdb) disass main
(gdb) break *0x00005555555551b7
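The breakpoint address is just the offset from the static dump plus the address at which the binary was loaded. A minimal sketch of that arithmetic follows. LOAD_BASE is inferred from this session (0x5555555551b7 minus 0x11b7) and may differ on other systems, and runtime_address is a hypothetical helper name.

```c
#include <stdint.h>

/* Load base implied by this session; treat it as an assumption elsewhere. */
#define LOAD_BASE UINT64_C(0x555555554000)

/* Runtime address of an instruction, given its offset in the static dump. */
uint64_t runtime_address(uint64_t base, uint64_t offset)
{
    return base + offset;
}
```

With these values, runtime_address(LOAD_BASE, 0x11b7) gives 0x5555555551b7, the address of test eax,eax, and offset 0x1169 gives 0x555555555169, the start of main.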
Run the program again with a random key.
(gdb) run random_key
After hitting the first breakpoint, type continue to run until the next breakpoint.
(gdb) continue
(gdb) info registers
(gdb) set $eax=0
(gdb) ni
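Here I set the value of eax to 0 and then ran the program instruction by instruction. With eax=0, test eax,eax sets the zero flag, so the jne at 0x00005555555551b9 is executed but not taken, and execution falls through to the access granted branch. Use ni to keep executing the next instruction. Here is the full session: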
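Why overwriting eax is enough can be modelled in a few lines of C. This is a simplified sketch of the test/jne pair, not how the CPU is implemented. The helper names zero_flag_after_test, jne_taken and access_granted, and the force_eax_zero parameter, are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* test eax,eax computes eax AND eax and sets ZF when the result is zero. */
int zero_flag_after_test(uint32_t eax)
{
    return (eax & eax) == 0;
}

/* jne (jump if not equal) is taken only when ZF is clear. */
int jne_taken(int zf)
{
    return !zf;
}

/* Returns 1 if execution would reach the "Access Granted!" branch. */
int access_granted(const char *key, int force_eax_zero)
{
    uint32_t eax = (uint32_t)strcmp(key, "hello_stranger");

    if (force_eax_zero)
        eax = 0; /* the effect of "set $eax=0" at the breakpoint */

    /* Falling through the jne means we reach the success messages. */
    return !jne_taken(zero_flag_after_test(eax));
}
```

With a wrong key, access_granted("random_key", 0) is 0, while access_granted("random_key", 1) is 1. That is the same outcome the gdb session produces by patching the register.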
(gdb) run random_key
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: crackme1 random_key
Breakpoint 1, 0x0000555555555169 in main ()
(gdb) continue
Continuing.
Checking Licence: random_key
Breakpoint 2, 0x00005555555551b7 in main ()
(gdb) info registers
rax 0x3 3
rbx 0x0 0
rcx 0xfff7fdff 4294442495
rdx 0x68 104
rsi 0x55555555601a 93824992239642
rdi 0x7fffffffe563 140737488348515
rbp 0x7fffffffe170 0x7fffffffe170
rsp 0x7fffffffe160 0x7fffffffe160
r8 0xffffffff 4294967295
r9 0x1d 29
r10 0xfffffffffffff1a9 -3671
r11 0x7ffff7f36140 140737353310528
r12 0x555555555070 93824992235632
r13 0x7fffffffe250 140737488347728
r14 0x0 0
r15 0x0 0
rip 0x5555555551b7 0x5555555551b7 <main+78>
eflags 0x206 [ PF IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
(gdb) set $eax=0
(gdb) info registers
rax 0x0 0
rbx 0x0 0
rcx 0xfff7fdff 4294442495
rdx 0x68 104
rsi 0x55555555601a 93824992239642
rdi 0x7fffffffe563 140737488348515
rbp 0x7fffffffe170 0x7fffffffe170
rsp 0x7fffffffe160 0x7fffffffe160
r8 0xffffffff 4294967295
r9 0x1d 29
r10 0xfffffffffffff1a9 -3671
r11 0x7ffff7f36140 140737353310528
r12 0x555555555070 93824992235632
r13 0x7fffffffe250 140737488347728
r14 0x0 0
r15 0x0 0
rip 0x5555555551b7 0x5555555551b7 <main+78>
eflags 0x206 [ PF IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
(gdb) ni
0x00005555555551b9 in main ()
(gdb)
0x00005555555551bb in main ()
(gdb)
0x00005555555551c2 in main ()
(gdb)
Access Granted!
0x00005555555551c7 in main ()
(gdb)
0x00005555555551ce in main ()
(gdb)
Your are 1337 h4xx0r
0x00005555555551d3 in main ()
(gdb)
Voila! You have cracked the program without knowing the correct key.
This one is just a basic intro into binary exploitation and enough to get you started.