Stack overflow when passing a vector to memcpy

Stack overflow when passing a vector to memcpy - c++

I am trying to pass a vector of unsigned char (shellcode) to virtualalloc but keep getting stack overflow when trying to execute
std::vector<unsigned char> decrypted = {0x00, 0x15, ...} ; // shell code decrypted to unsigned char
cout << &decrypted.front() << endl;
size_t shellcodesize = decrypted.size();
void *exec = VirtualAlloc(0, shellcodesize, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
memcpy(exec, &decrypted.front(), shellcodesize);
((void(*)())exec)();
The shellcode is not the problem because I used different shellcodes all cause the same problem
The encryption/decryption works as intended because it is tested in other projects before and works flawlessly which leaves me with the last 4-5 lines shown above
when compiling no errors are shown but when running in windbg preview I get this
(3cd4.3664): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
WARNING: Stack overflow detected. The unwound frames are extracted from outside normal stack bounds.
00000000`00170003 385c7838 cmp byte ptr [rax+rdi*2+38h],bl ds:00000000`01f13078=??
WARNING: Stack overflow detected. The unwound frames are extracted from outside normal stack bounds.
I think that when using unsigned char buf[] = "\x00\x15"; it is automatically null-terminated but as far as I know a vector is not which I think causes the stack overflow issue (please correct me if wrong )

There is one slight issue with your code:
void *exec = VirtualAlloc(0, shellcodesize, MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);
is the correct call to VirtualAlloc here. Most Windows versions actually return allocated memory, if the first parameter is 0, but there is no guarantee for this. If the first parameter is non-0 (and MEM_COMMIT is requested w/o MEM_RESERVE), the call will fail.
Other than that, your code is correct (you might check the return value of VirtualAlloc to be non-zero) and I would suspect, that the loaded code contains the stack overflow. Especially, as 0x00 0x15 does not make any sense at all when disassembled and will certainly crash your application.
EDIT: If your architecture is X64, you can test the following shellcode: 0xb8, 0x01, 0x00, 0x00, 0x00, 0xc3 (it simply returns 1 from the subroutine)

Related

What's the deal with changing pointers?

I just have a quick question on the extraneous number (1684096032) outputted to the screen. I expected the integer ASCII value (97) to be outputted, as that is the ASCII value of lowercase 'a'. The console gave me a ginormous number instead...
using namespace std;
int main(){
char dadum[50] = "Papa Dadi";
char* wicked = dadum;
int* baboom = (int*)wicked;
cout << baboom[1] << endl;
cout<<"Hello World";
return 0;
}

A Word of Warning
Casting a a pointer to int* that wasn’t derived from an int* is undefined behavior, and the compiler is allowed to do literally anything. This isn’t just theoretical: on some workstations I’ve coded on, an int* must be aligned on a four-byte boundary, whereas a char* might not be, trying to load a misaligned 32-bit value would cause a CPU fault, and therefore code like this might or might not crash at runtime with a “bus error.”
Where 1684096032 Comes From
The compiler here is doing the “common-sense” thing for a desktop computer in 2019. It’s letting you “shoot yourself in the foot,” so to speak, by emitting a load instruction from the address you cast to int*, no questions asked. What you get is the garbage that CPU instruction gives you.
What’s going on is that "Papa Dadi" is stored as an array of bytes in memory, including the terminating zero byte. It’s equivalent to, {'P','a','p','a',' ','D','a','d', 'i', '\0'}. ("Papa Dadi" is just syntactic sugar. You could write it either way) These are stored as their ASCII values1 { 0x50, 0x61, 0x70, 0x61, 0x20, 0x44, 0x61, 0x64, 0x69, 0x00 }.
You happen to be compiling on a machine with four-byte int, so when you alias wicked to the int* baboom, baboom[0] aliases bytes 0–3 of wicked and baboom[1] bytes 4–7. Therefore, on the implementation you tested, you happened to get back the bytes " Dad", or in Hex, 20 44 61 64.
Next, you happen to be compiling for a little-endian machine, so that gets loaded from memory in “back-words order,” 0x64614420.
This has the decimal value 1684096032.
What You Meant to Do
Based on your remarks:2
cout << (int)wicked[1];
1 Evidently you aren’t running on the sole exception, an IBM mainframe compiler. These still use EBCDIC by default.
2 The consensus here is never to write using namespace std;. Personally, I learned to use <iostream.h> and the C standard library long before namespace std and I still prefer to omit std for those, so you can tell my code by the block of commands like using std::cout;.

You are trying to print out the second value of integer array which points to the string "Papa Dadi"
You should know that the integer is 4-bytes and char is 1-byte. so each one element in the integer array will skip 4-bytes(characters)
the print out you see is "0x64614420" in hex, which when you think about big-endian you will see the following
0x20 is space character.
0x44 is 'D'
0x61 is 'a'
0x64 is 'd'
so you need to convert big-endian to little-endian if you care about order. however I am not sure what are you trying to achieve.

How to alloc a executable memory buffer?

I would like to alloc a buffer that I can execute on Win32 but I have an exception in visual studio cuz the malloc function returns a non executable memory zone. I read that there a NX flag to disable... My goal is convert a bytecode to asm x86 on fly with keep in mind performance.
Does somemone can help me?

You don't use malloc for that. Why would you anyway, in a C++ program? You also don't use new for executable memory, however. There's the Windows-specific VirtualAlloc function to reserve memory which you then mark as executable with the VirtualProtect function applying, for instance, the PAGE_EXECUTE_READ flag.
When you have done that, you can cast the pointer to the allocated memory to an appropriate function pointer type and just call the function. Don't forget to call VirtualFree when you are done.
Here is some very basic example code with no error handling or other sanity checks, just to show you how this can be accomplished in modern C++ (the program prints 5):
#include <windows.h>
#include <vector>
#include <iostream>
#include <cstring>
int main()
{
std::vector<unsigned char> const code =
{
0xb8, // move the following value to EAX:
0x05, 0x00, 0x00, 0x00, // 5
0xc3 // return what's currently in EAX
};
SYSTEM_INFO system_info;
GetSystemInfo(&system_info);
auto const page_size = system_info.dwPageSize;
// prepare the memory in which the machine code will be put (it's not executable yet):
auto const buffer = VirtualAlloc(nullptr, page_size, MEM_COMMIT, PAGE_READWRITE);
// copy the machine code into that memory:
std::memcpy(buffer, code.data(), code.size());
// mark the memory as executable:
DWORD dummy;
VirtualProtect(buffer, code.size(), PAGE_EXECUTE_READ, &dummy);
// interpret the beginning of the (now) executable memory as the entry
// point of a function taking no arguments and returning a 4-byte int:
auto const function_ptr = reinterpret_cast<std::int32_t(*)()>(buffer);
// call the function and store the result in a local std::int32_t object:
auto const result = function_ptr();
// free the executable memory:
VirtualFree(buffer, 0, MEM_RELEASE);
// use your std::int32_t:
std::cout << result << "\n";
}
It's very unusual compared to normal C++ memory management, but not really rocket science. The hard part is to get the actual machine code right. Note that my example here is just very basic x64 code.

Extending the above answer, a good practice is:
Allocate memory with VirtualAlloc and read-write-access.
Fill that region with your code
Change that region's protection with VirtualProtectto execute-read-access
jump to/call the entry point in this region
So it could look like this:
adr = VirtualAlloc(NULL, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
// write code to the region
ok = VirtualProtect(adr, size, PAGE_EXECUTE_READ, &oldProtection);
// execute the code in the region

As stated in documentation for VirtualAlloc
flProtect [in]
The memory protection for the region of pages to be allocated. If the pages are being committed, you can specify any one of the memory protection constants.
one of them is:
PAGE_EXECUTE
0x10
Enables execute access to the committed region of pages. An attempt to write to the committed region results in an access violation.
This flag is not supported by the CreateFileMapping function.
PAGE_EXECUTE_READ
0x20
Enables execute or read-only access to the committed region of pages. An attempt to write to the committed region results in an access violation.
Windows Server 2003 and Windows XP: This attribute is not supported by the CreateFileMapping function until Windows XP with SP2 and Windows Server 2003 with SP1.
PAGE_EXECUTE_READWRITE
0x40
Enables execute, read-only, or read/write access to the committed region of pages.
Windows Server 2003 and Windows XP: This attribute is not supported by the CreateFileMapping function until Windows XP with SP2 and Windows Server 2003 with SP1.
and so on from here

C version based off of Christian Hackl's answer
I think SIZE_T dwSize of VirtualAlloc should be the size of the code in bytes, not system_info.dwPageSize (what if sizeof code is bigger than system_info.dwPageSize?).
I don't know C enough to know if sizeof(code) is the "correct" way of getting the size of the machine code
this compiles under c++ so I guess it's not off topic lol
#include <Windows.h>
#include <stdio.h>
int main()
{
// double add(double a, double b) {
// return a + b;
// }
unsigned char code[] = { //Antonio Cuni - How to write a JIT compiler in 30 minutes: https://www.youtube.com/watch?v=DKns_rH8rrg&t=118s
0xf2,0x0f,0x58,0xc1, //addsd %xmm1,%xmm0
0xc3, //ret
};
LPVOID buffer = VirtualAlloc(NULL, sizeof(code), MEM_COMMIT, PAGE_READWRITE);
memcpy(buffer, code, sizeof(code));
//protect after write, because protect will prevent writing.
DWORD oldProtection;
VirtualProtect(buffer, sizeof(code), PAGE_EXECUTE_READ, &oldProtection);
double (*function_ptr)(double, double) = (double (*)(double, double))buffer; //is there a cleaner way to write this ?
// double result = (*function_ptr)(2, 234); //NOT SURE WHY THIS ALSO WORKS
double result = function_ptr(2, 234);
VirtualFree(buffer, 0, MEM_RELEASE);
printf("%f\n", result);
}

At compile time, the linker will organize your program's memory footprint by allocating memory into data sections and code sections. The CPU will make sure that the program counter (the hard CPU register) value remains within a code section or the CPU will throw a hardware exception for violating the memory bounds. This provides some security by making sure your program only executes valid code. Malloc is intended for allocating data memory. Your application has a heap and the heap's size is established by the linker and is marked as data memory. So at runtime malloc is just grabbing some of the virtual memory from your heap which will always be data.
I hope this helps you have a better understanding what's going on, though it might not be enough to get you where you need to be. Perhaps you can pre-allocate a "code heap" or memory pool for your runtime-generated code. You will probably need to fuss with the linker to accomplish this but I don't know any of the details.

How to call machine code stored in char array?

I'm trying to call native machine-language code. Here's what I have so far (it gets a bus error):
char prog[] = {'\xc3'}; // x86 ret instruction
int main()
{
typedef double (*dfunc)();
dfunc d = (dfunc)(&prog[0]);
(*d)();
return 0;
}
It does correctly call the function and it gets to the ret instruction. But when it tries to execute the ret instruction, it has a SIGBUS error. Is it because I'm executing code on a page that is not cleared for execution or something like that?
So what am I doing wrong here?

One first problem might be that the location where the prog data is stored is not executable.
On Linux at least, the resulting binary will place the contents of global variables in the "data" segment or here, which is not executable in most normal cases.
The second problem might be that the code you are invoking is invalid in some way. There's a certain procedure to calling a method in C, called the calling convention (you might be using the "cdecl" one, for example). It might not be enough for the called function to just "ret". It might also need to do some stack cleanup etc. otherwise the program will behave unexpectedly. This might prove an issue once you get past the first problem.

You need to call memprotect in order to make the page where prog lives executable. The following code does make this call, and can execute the text in prog.
#include <unistd.h>
#include <stdio.h>
#include <malloc.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/mman.h>
char prog[] = {
0x55, // push %rbp
0x48, 0x89, 0xe5, // mov %rsp,%rbp
0xf2, 0x0f, 0x10, 0x05, 0x00, 0x00, 0x00,
//movsd 0x0(%rip),%xmm0 # c <x+0xc>
0x00,
0x5d, // pop %rbp
0xc3, // retq
};
int main()
{
long pagesize = sysconf(_SC_PAGE_SIZE);
long page_no = (long)prog/pagesize;
int res = mprotect((void*)(page_no*pagesize), (long)page_no+sizeof(prog), PROT_EXEC|PROT_READ|PROT_WRITE);
if(res)
{
fprintf(stderr, "mprotect error:%d\n", res);
return 1;
}
typedef double (*dfunc)(void);
dfunc d = (dfunc)(&prog[0]);
double x = (*d)();
printf("x=%f\n", x);
fflush(stdout);
return 0;
}

As everyone already said, you must ensure prog[] is executable, however the proper way to do it, unless you're writing a JIT compiler, is to put the symbol in an executable area, either by using a linker script or by specifying the section in the C code if the compiler allows , e.g.:
const char prog[] __attribute__((section(".text"))) = {...}

Virtually all C compilers will let you do this by embedding regular assembly language in your code. Of course it's a non-standard extension to C, but compiler writers recognise that it's often necessary. As a non-standard extension, you'll have to read your compiler manual and check how to do it, but the GCC "asm" extension is a fairly standard approach.
void DoCheck(uint32_t dwSomeValue)
{
uint32_t dwRes;
// Assumes dwSomeValue is not zero.
asm ("bsfl %1,%0"
: "=r" (dwRes)
: "r" (dwSomeValue)
: "cc");
assert(dwRes > 3);
}
Since it's easy to trash the stack in assembler, compilers often also allow you to identify registers you'll use as part of your assembler. The compiler can then ensure the rest of that function steers clear of those registers.
If you're writing the assembler code yourself, there is no good reason to set up that assembler as an array of bytes. It's not just a code smell - I'd say it is a genuine error which could only happen by being unaware of the "asm" extension which is the right way to embed assembler in your C.

Essentially this has been clamped down on because it was an open invitation to virus writers. But you can allocate and buffer and set it up with native machinecode in straight C - that's no problem. The issue is calling it. Whilst you can try setting up a function pointer with the address of the buffer and calling it, that's highly unlikely to work, and highly likely to break on the next version of the compiler if somehow you do manage to coax it into doing what you want. So the best bet is to simply resort to a bit of inline assembly, to set up the return and jump to the automatically generated code. But if the system protects against this, you'll have to find methods of circumventing the protection, as Rudi described in his answer (but very specific to one particular system).

One obvious error is that \xc3 is not returning the double that you claim it's returning.

How does Visual Studio 2013 detect buffer overrun

Visual Studio 2013 C++ projects have a /GS switch to enable buffer security check validation at runtime. We are encountering many more STATUS_STACK_BUFFER_OVERRUN errors since upgrading to VS 2013, and suspect it has something to do with improved checking of buffer overrun in the new compiler. I've been trying to verify this and better understand how buffer overrun is detected. I'm befuddled by the fact that buffer overrun is reported even when the memory updated by a statement only changes the contents of another local variable on the stack in the same scope! So it must be checking not only that the change doesn't corrupt memory not "owned" by a local variable, but that the change doesn't affect any local variable other than that allocated to the one referenced by the individual update statement. How does this work? Has it changed since VS 2010?
Edit:
Here's an example illustrating a case that Mysticial's explanation doesn't cover:
void TestFunc1();
int _tmain(int argc, _TCHAR* argv[])
{
TestFunc1();
return 0;
}
void TestFunc1()
{
char buffer1[4] = ("123");
char buffer2[4] = ("456");
int diff = buffer1 - buffer2;
printf("%d\n", diff);
getchar();
buffer2[4] = '\0';
}
The output is 4 indicating that the memory about to be overwritten is within the bounds of buffer1 (immediately after buffer2), but then the program terminates with a buffer overrun. Technically it should be considered a buffer overrun, but I don't know how it's being detected since it's still within the local variables' storage and not really corrupting anything outside local variables.
This screenshot with memory layout proves it. After stepping one line the program aborted with the buffer overrun error.
I just tried the same code in VS 2010, and although debug mode caught the buffer overrun (with a buffer offset of 12), in release mode it did not catch it (with a buffer offset of 8). So I think VS 2013 tightened the behavior of the /GS switch.
Edit 2:
I managed to sneak past even VS 2013 range checking with this code. It still did not detect that an attempt to update one local variable actually updated another:
void TestFunc()
{
char buffer1[4] = "123";
char buffer2[4] = "456";
int diff;
if (buffer1 < buffer2)
{
puts("Sequence 1,2");
diff = buffer2 - buffer1;
}
else
{
puts("Sequence 2,1");
diff = buffer1 - buffer2;
}
printf("Offset: %d\n", diff);
switch (getchar())
{
case '1':
puts("Updating buffer 1");
buffer1[diff] = '!';
break;
case '2':
puts("Updating buffer 2");
buffer2[diff] = '!';
break;
}
getchar(); // Eat enter keypress
printf("%s,%s\n", buffer1, buffer2);
}

You are seeing an improvement to the /GS mechanism, first added to VS2012. Originally /GS could detect buffer overflows but there's still a loop-hole where attacking code can stomp the stack but bypass the cookie. Roughly like this:
void foo(int index, char value) {
char buf[256];
buf[index] = value;
}
If the attacker can manipulate the value of index then the cookie doesn't help. This code is now rewritten to:
void foo(int index, char value) {
char buf[256];
buf[index] = value;
if (index >= 256) __report_rangefailure();
}
Just plain index checking. Which, when triggered, instantly terminates the app with __fastfail() if no debugger is attached. Backgrounder is here.

From the MSDN page on /GS in Visual Studio 2013 :
Security Checks
On functions that the compiler recognizes as subject to buffer overrun problems, the compiler allocates space on the stack before the return address. On function entry, the allocated space is loaded with a security cookie that is computed once at module load. On function exit, and during frame unwinding on 64-bit operating systems, a helper function is called to make sure that the value of the cookie is still the same. A different value indicates that an overwrite of the stack may have occurred. If a different value is detected, the process is terminated.
for more details, the same page refers to Compiler Security Checks In Depth:
What /GS Does
The /GS switch provides a "speed bump," or cookie, between the buffer and the return address. If an overflow writes over the return address, it will have to overwrite the cookie put in between it and the buffer, resulting in a new stack layout:
Function parameters
Function return address
Frame pointer
Cookie
Exception Handler frame
Locally declared variables and buffers
Callee save registers
The cookie will be examined in more detail later. The function's execution does change with these security checks. First, when a function is called, the first instructions to execute are in the function’s prolog. At a minimum, a prolog allocates space for the local variables on the stack, such as the following instruction:
sub esp, 20h
This instruction sets aside 32 bytes for use by local variables in the function. When the function is compiled with /GS, the functions prolog will set aside an additional four bytes and add three more instructions as follows:
sub esp,24h
mov eax,dword ptr [___security_cookie (408040h)]
xor eax,dword ptr [esp+24h]
mov dword ptr [esp+20h],eax
The prolog contains an instruction that fetches a copy of the cookie, followed by an instruction that does a logical xor of the cookie and the return address, and then finally an instruction that stores the cookie on the stack directly below the return address. From this point forward, the function will execute as it does normally. When a function returns, the last thing to execute is the function’s epilog, which is the opposite of the prolog. Without security checks, it will reclaim the stack space and return, such as the following instructions:
add esp,20h
ret
When compiled with /GS, the security checks are also placed in the epilog:
mov ecx,dword ptr [esp+20h]
xor ecx,dword ptr [esp+24h]
add esp,24h
jmp __security_check_cookie (4010B2h)
The stack's copy of the cookie is retrieved and then follows with the XOR instruction with the return address. The ECX register should contain a value that matches the original cookie stored in the __security_cookie variable. The stack space is then reclaimed, and then, instead of executing the RET instruction, the JMP instruction to the __security_check_cookie routine is executed.
The __security_check_cookie routine is straightforward: if the cookie was unchanged, it executes the RET instruction and ends the function call. If the cookie fails to match, the routine calls report_failure. The report_failure function then calls __security_error_handler(_SECERR_BUFFER_OVERRUN, NULL). Both functions are defined in the seccook.c file of the C run-time (CRT) source files.

Segfault occurs, GDB breaks on line, but running line after segfault shows no violation

I have a program which is segfaulting at a specific line:
uint64_t smaller_num = *(uint64_t*)(smaller_base+index);
GDB successfully catches the segfault allowing me to debug the issue. However, if I run the line from the GDB prompt, no memory access violation occurs:
(gdb) p smaller_num = *(uint64_t*)(smaller_base+index)
Can anyone provide some suggestions on how to debug this issue? I am at a loss of words and ideas because I verified that the memory at smaller_base+index exists. Can it be something with the casting?
Thanks in advance.
EDIT: providing more code but it really is that simple. I have heavily edited code to show the point of the indexing.
uint64_t ** find_difference(unsigned char * larger_base,
uint64_t size,
unsigned char * smaller_base,
uint64_t tmap_size)
{
uint64_t len = size < tmap_size?size:tmap_size;
uint64_t index=0;
while(index<len)
{
uint64_t larger_num = *(uint64_t*)(larger_base+index);
uint64_t smaller_num = *(uint64_t*)(smaller_base+index);
if(larger_num > smaller_num)
{
... do stuff
}
index++;
}
...
}
Edit#2: Now that I am thinking about it, is it poissible that the pointer dereference goes beyond len? It was my understanding that x86 numbers are stored from high to low addresses. Thus in memory, a number 0x01020304 is stored as 0x04 0x03 0x02 0x01. Is this correct? If this is not true, the deference would go beyond the end of the buffer. However, in GDB I verified the address is accessible.

I do not know how you use find_difference() function and what values of parameters passed to the function, but I suspect that your pointer arithmetic is wrong.
You are incrementing large_base and smaller_base by 1 and casting resulting address as u64*.
If size is in bytes, then you should check that large_base+index+8 < size.

Your original pointer arithmetic added index to a (unsigned char *) before casting it to uint64_t causing an unaligned memory load, i.e. trying to dereference an (uint64_t *) from an address that is not a multiple of 8. This works in the debugger but causes a SEGFAULT in your program.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js