I have read from Google that it is used for controlling buffer overruns at application level and it is called by CRT.
It also says that
" Essentially, on entry to an overrun-protected function, the cookie is put on the stack, and on exit, the value on the stack is compared against the global cookie. Any difference between them indicates that a buffer overrun has occurred and results in immediate termination of the program."
But I could not really understand how it works. Please help.
The "cookie" is basically nothing more than an arbitrary value.
So, the basic idea is that you write the chosen value on the stack before calling a function. Although it's probably not a very good value, let's arbitrarily choose 0x12345678 as the value.
Then it calls the function.
When the function returns, it goes back to the correct spot on the stack, and compares that value to 0x12345678. If the value has changed, this indicates that the function that was called wrote outside the area of the stack where it was allowed to write, so it (and that process in general) are deemed untrustworthy, and shut down.
In this case, instead of always using 0x12345678, the system chooses a different value on a regular basis, such as every time the program is started. This means errant code is less likely to hit the correct value by accident: it might happen to do so once, but if it's writing a specific value there, then when the chosen value changes, it'll end up writing the wrong value, and the problem will be detected.
It's probably also worth noting that this basic idea isn't particularly new. Just for example, back in the MS-DOS days, both Borland's and Microsoft's compilers would write some known value at the very bottom of the stack before calling main in your program. After main returned, they'd re-check that value. It would then print out an error message (right as the program exited) if the value didn't match what was expected.
It's exactly what the explanation says, but you can replace "cookie" with "some value". When the function is called, it puts some value on the stack. When the function returns, it checks it again to see if it changed.
The normal behavior of the function is to not touch the memory location. If the value there changed, it means that function code somehow overwrote it, and this means there was a buffer overflow.
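To make the mechanism concrete, here is a minimal sketch in portable C. It does not touch the real stack: the "frame" is an ordinary byte array with a buffer followed by a cookie slot, and the names call_with_cookie, global_cookie, BUF_SIZE and FRAME_SIZE are made up for this illustration. A real compiler emits the equivalent checks itself.

```c
#include <stdint.h>
#include <string.h>

#define BUF_SIZE   16                               /* bytes the "function" may legally write */
#define FRAME_SIZE (BUF_SIZE + sizeof(uint32_t))    /* buffer followed by the cookie slot */

/* Hypothetical stand-in for the global cookie; a real runtime picks it at startup. */
static uint32_t global_cookie = 0x12345678u;

/* Simulates one protected call.
   Returns 1 if the cookie survived, 0 if an overrun was detected. */
int call_with_cookie(const char *input, size_t input_len)
{
    unsigned char frame[FRAME_SIZE];
    uint32_t on_stack;

    /* On entry: place the cookie just past the buffer. */
    memcpy(frame + BUF_SIZE, &global_cookie, sizeof global_cookie);

    /* Function body: a copy that does not check against BUF_SIZE.
       It is clamped to the simulated frame so the demo itself stays well-defined. */
    if (input_len > FRAME_SIZE)
        input_len = FRAME_SIZE;
    memcpy(frame, input, input_len);

    /* On exit: compare the value in the frame with the global cookie. */
    memcpy(&on_stack, frame + BUF_SIZE, sizeof on_stack);
    return on_stack == global_cookie;
}
```

Passing 6 bytes leaves the cookie intact and the function returns 1. Passing 20 bytes of 'A' spills into the cookie slot and the function returns 0, which is the point where a real runtime would terminate the process.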
Related
I have been reading up on HeapValidate, which is used in some existing code, and trying to figure out what it does. The documentation says that it checks whether the heap control structures are in a consistent state. What does that mean?
The heap is a data structure like any other, and in its metadata-state-variables there are certain conditions that should always be true. As a made-up example, the number of children in a heap-tree-node should always be a non-negative number; so if HeapValidate() reads a child-count variable and sees that it is negative, it knows something has gone badly wrong and can flag that block as broken.
You might wonder, assuming Microsoft’s heap code does not have any bugs, how the heap’s metadata might get into an invalid/“impossible” state like that in the first place. Since the heap’s metadata structures live in the same address space that the user code has access to, it’s usually the result of buggy user code writing some other data through an invalid pointer that happens to point at a memory location where a heap metadata field resides, silently overwriting and corrupting the metadata.
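As a rough illustration of what "checking invariants" means, here is a toy validator in C. The block_header layout, BLOCK_MAGIC, ARENA_SIZE and toy_heap_validate are all invented for this sketch; they are not the real Windows heap structures, which are not documented at this level.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_MAGIC 0xB10CB10Cu   /* hypothetical marker stored in every header */
#define ARENA_SIZE  1024u         /* hypothetical total size of the toy heap */

/* Hypothetical per-block metadata; not the real Windows heap layout. */
struct block_header {
    uint32_t magic;    /* must always equal BLOCK_MAGIC */
    uint32_t size;     /* payload size in bytes; must be nonzero and fit in the arena */
    uint32_t in_use;   /* must be 0 or 1 */
};

/* Returns 1 if every header satisfies its invariants, 0 otherwise. */
int toy_heap_validate(const struct block_header *blocks, size_t count)
{
    uint32_t total = 0;
    for (size_t i = 0; i < count; i++) {
        if (blocks[i].magic != BLOCK_MAGIC)
            return 0;                              /* marker was overwritten */
        if (blocks[i].in_use > 1)
            return 0;                              /* flag holds an impossible value */
        if (blocks[i].size == 0 || blocks[i].size > ARENA_SIZE - total)
            return 0;                              /* size cannot fit in what remains */
        total += blocks[i].size;
    }
    return 1;
}
```

If a stray write through a bad pointer changes a size field to 0xFFFFFFFF or clobbers the marker, the validator returns 0 even though the heap code itself never produced that state.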
I'm reading a document on safe coding that describes something called a terminator canary. As I understand it, a terminator canary is a piece of padding placed in a function's stack frame to prevent the return address on the stack from being maliciously overwritten. It acts as a safety band that makes it hard for attackers to compute the size of the frame or to know where the return address is. That is my understanding, at least, and it may not be accurate.
After the doc, there's a test, in which one true-or-false question is as below:
"The source of entropy for a terminator canary can be attacked"
My question is:
I have no idea what "the source of entropy" for something is, or what the question means.
The canary is typically a random value placed before the return address on the stack. A buffer overrun that reaches to and overwrites the return address, would also overwrite the canary. The compiler inserts a check, right before returning from the function, that the canary is unmodified, that it still contains the original random value. If it has been modified, the check usually terminates the process (rather than jumping to a compromised return address and giving control to the attacker).
In information theory, "entropy" is, roughly, another term for "randomness". The source of entropy is basically the random number generator; in this context, the one that is used to set up the canary. If the attacker can predict the random values produced by that generator, then they can arrange the buffer overrun to keep the canary intact, thus bypassing the safety check.
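A small sketch of why a weak entropy source matters, in C. The function make_canary is hypothetical and deliberately naive: it seeds the standard rand() generator from a value an attacker might be able to guess, such as a start time. Real canary implementations are expected to use a much stronger source.

```c
#include <stdlib.h>

/* Hypothetical, deliberately weak canary generator: the whole value is
   determined by the seed, so anyone who knows the seed knows the canary. */
unsigned make_canary(unsigned seed)
{
    srand(seed);              /* same seed always restarts the same sequence */
    return (unsigned)rand();  /* first value of that sequence */
}
```

Calling make_canary twice with the same seed returns the same value, so an attacker who can learn or guess the seed can compute the canary in advance and write it back into place during the overrun.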
I have the following code:
char *func(char *a)
{
    char b[1000];
    strcpy(b, a);
    return b;
}
(I know that the code is bad, because I return the address of an array that no longer exists once the function exits.) My question is, what will be deleted/override, if I put in "a", an array of 2000 chars, and "b" is only 1000 chars array. I read this question somewhere, and they said that from this code I can tell what will be overwritten.
It seems you are not familiar with the idea of the stack. When program control enters a function, the function is given a pointer to the stack, and all of its local variables are allocated on that stack. When the function returns, the stack pointer is restored to its original value. Therefore b is on the stack and 1000 bytes are allocated for it. When the program returns from func, nothing is actually deleted or overwritten until some other function uses that area of the stack. If you access 'b' just after returning from the function, it will often appear to work, although this is undefined behaviour and must not be relied on. But suppose that after calling 'func' you call another function 'func1' which has some local variables: updating those variables will overwrite the memory that the returned pointer to b refers to.
what will be deleted/override, if I put in "a", an array of 2000 chars, and "b" is only 1000 chars array.
The behaviour is undefined. From the standard's perspective, nothing is guaranteed. Some memory could be overwritten, or it might not be. In practice, it's likely that some memory would be overwritten.
correct me if I wrong, that this overflow, will rewrite the value of the pointer. right?
It could. It might not. The behaviour is undefined.
What will happen depends on the compiler, the version of the compiler, the cpu architecture, the compilation options, how the rest of the program is defined and possibly other factors.
In a typical implementation, the return value fits in a register and is computed as a fixed offset from the stack or frame pointer, so in practice it is unlikely to be stored on the stack, in which case it would not be affected by the overflow. There are much more dangerous potential side effects, such as the function returning to a completely different place than where it was called from.
If the C-string in a that you pass is bigger than the space you allocated for b (1000 bytes), strcpy will happily write past the end of b and over whatever sits next to b on the stack. A C-string has no defined length: strcpy(b,a) will keep copying bytes into b until it finds a \0 inside a. In func's stack frame, the compiler will typically reserve 1000 bytes for b, with the return address to whatever called func stored nearby. If the copy overflows b far enough, you'll write over the return address, and when func returns you'll jump to some random address with horrible results. Each compiler is free to put the return address from the function wherever it likes. Maybe it puts the return address at the top of its stack. But even in that scenario you'll be writing over stuff that you should not be writing over. If you're lucky you'll get an access violation.
To protect against this you can use strncpy(b,a,sizeof(b)-1) and put a \0 at the end of b. Best to check strlen(a) and handle the error sanely if strlen(a) > sizeof(b)-1.
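A minimal sketch of the length-check approach in C. It uses strlen plus memcpy rather than strncpy, and the helper name copy_checked and its return convention are my own choices for this example, not a standard API.

```c
#include <string.h>

/* Copies src into dst, where dst_size is the full capacity of dst
   including room for the terminating \0.
   Returns 0 on success, -1 if src does not fit (dst is left untouched). */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);

    if (dst_size == 0 || len > dst_size - 1)
        return -1;               /* too long: report it instead of overflowing */

    memcpy(dst, src, len + 1);   /* len characters plus the terminator */
    return 0;
}
```

In func from the question, the call would become copy_checked(b, sizeof(b), a), with the caller deciding what to do when it returns -1.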
This is exactly the "buffer overrun" technique that hackers used to break Windows security restrictions. You call a Windows function that expects a C-string but does not check the length of the C-string that is passed in. The string that is passed in guesses the length of the input buffer and eventually it guesses correctly and overwrites the return address of the function to point into the string that was passed in. The remainder of the input string contains machine code instructions that then operate under the security permissions of the Windows function. It can then do whatever it wants.
Microsoft closed this security loophole long ago, but it remains as a good lesson to check the length of C-strings that you accept as input parameters.
I recently used the /FAsu Visual C++ compiler option to output the source + assembly of a particularly long member function definition. In the assembly output, after the stack frame is set up, there is a single call to a mysterious _chkstk() function.
The MSDN page on _chkstk() does not explain the reason why this function is called. I have also seen the Stack Overflow question Allocating a buffer of more a page size on stack will corrupt memory?, but I do not understand what the OP and the accepted answer are talking about.
What is the purpose of the _chkstk() CRT function? What does it do?
Windows pages in extra stack for your thread as it is used. At the end of the stack, there is one guard page mapped as inaccessible memory -- if the program accesses it (because it is trying to use more stack than is currently mapped), there's an access violation. The OS catches the fault, maps in another page of stack at the same address as the old guard page, creates a new guard page just beyond the old one, and resumes from the instruction that caused the violation.
If a function has more than one page of local variables, then the first address it accesses might be more than one page beyond the current end of the stack. Hence it would miss the guard page and trigger an access violation that the OS doesn't realise is because more stack is needed. If the total stack required is particularly huge, it could perhaps even reach beyond the guard page, beyond the end of the virtual address space assigned to stack, and into memory that's actually in use for something else.
So, _chkstk ensures that there is enough space for the local variables. You can imagine that it does this by touching the memory for the local variables at page-sized intervals, in order of increasing distance from the current top of the stack, to ensure that it doesn't miss the guard page (so-called "stack probes"). I don't know whether it actually does that, though; possibly it takes a more direct route and instructs the OS to map in a certain amount of stack. Either way, if the total required is greater than the virtual address space available for stack, then the OS can complain about it instead of doing something undefined.
I looked at the code for __chkstk and it does do the repeated stack probes at one-page intervals. So this way, it doesn't need to make any calls to the OS. The parameter in rax is size of data you want to add. It ensures that the target address (current rsp - rax) is accessible. If rax > rsp, it does this for address 0. As an interesting shortcut, it first compares the address with gs:[10h], which is the current lowest page that is mapped; if the target address >= this, then it does nothing.
By the way, for 64-bit code at least, it is spelled with two leading underscores: __chkstk.
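The probing logic described above can be modelled in portable C. This is only an arithmetic sketch, not the real routine: it touches no memory, count_probes and its parameters are hypothetical names, and the 4096-byte page size is an assumption. The committed_low argument plays the role of the lowest currently mapped stack page.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumed page size */

/* Models the probe loop: given the current stack pointer, the lowest
   committed stack page, and the number of bytes to reserve, returns how
   many pages would have to be touched, one page at a time, going down. */
size_t count_probes(uintptr_t sp, uintptr_t committed_low, size_t size)
{
    /* Target address is sp - size, or 0 if the request exceeds sp. */
    uintptr_t target = (size > sp) ? 0 : sp - size;
    uintptr_t page = committed_low & ~(uintptr_t)(PAGE_SIZE - 1);
    size_t probes = 0;

    if (target >= committed_low)
        return 0;             /* shortcut: the target is already mapped */

    while (page > target) {
        page -= PAGE_SIZE;    /* the real routine would touch this page here */
        probes++;
    }
    return probes;
}
```

With sp = 0x100000 and committed_low = 0xFF000, a request of 0x800 bytes needs no probes, while a request of 0x3800 bytes steps through three pages so the guard page is never skipped.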
I'm currently making one of those game trainers as a small project. I've already run into a problem: when you "go into a different level", the addresses for things such as fuel, cash and bullets change. This would also happen if, say, you were to restart the application.
How can I re-locate these addresses?
I feel like it's a fairly basic question, but it's one of those "it is or is not possible" questions to me. Should I just stop looking and forget the concept entirely? "Too hard?"
It's a bit hard to describe exactly how to do this since it depends heavily on the program you're studying and whether the author went out of their way to make your life difficult. Note that I've only done this once, but it worked reasonably well even though I only knew a little assembly.
What is probably happening is that the values are allocated on the heap using a call to malloc/new, and every time you change level they are cleaned up and re-allocated somewhere else. So the idea is to look at the assembly code of the program to find where the pointer returned by malloc is stored and figure out a way to reliably read the content of the pointer and find the value you're looking for.
First thing you'll want is a debugger like OllyDbg and a basic knowledge of assembly. After that, start by setting a read and write breakpoint on the variable you want to examine. Since you said that you can't tell exactly where the variable is, you'll have to pause the process while it's running and search the program's memory for the value. Hopefully you'll end up with only a few results to sift through but be suspicious of anything that is on the stack since it might just be a copy for a function call or for local use.
Once the breakpoint is set just run the program until a break occurs. Now all you have to do is look at the code and examine how the variable is being accessed. If it's being passed as a parameter, go examine the call site of the function. If it's being accessed through a pointer, make a note of it and start examining the pointer. If it's being accessed as an offset of a pointer, that means it's part of a data structure so make a note of it and start examining the other variable. And so on.
Stay focused on your variable and just keep examining the code until you eventually find the root which can be one of two things:
A global variable that has a static address. This is the easiest scenario since you have a static address hardcoded straight into the code that you can use to reliably walk through the data structures.
A stack-allocated variable. This is trickier and I'm not entirely sure how to deal with this scenario reliably. It's possible that its address will have the same offset from the beginning of the stack most of the time, but it might not. You could also walk the stack to find the corresponding function and its parameters, but this is a bit tricky to get right.
Once you have an address all that's left to do is use ReadProcessMemory to locate your variable using the information you found. For example, if the address you have represents a pointer to a data structure where at offset 0x40 your fuel value is stored, then you'll have to read the value at the address, add 0x40 to it and do another read on the result.
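The two-step read from that example can be sketched in portable C. To keep it runnable anywhere, this version reads its own memory: read_mem is a stand-in for one cross-process read (a real trainer on Windows would call ReadProcessMemory at that point), and struct player, FUEL_OFFSET and read_fuel are hypothetical names. It also assumes a pointer and a uintptr_t have the same size and representation, which holds on common platforms.

```c
#include <stdint.h>
#include <string.h>

#define FUEL_OFFSET 0x40   /* offset taken from the example in the text */

/* Hypothetical game-side structure with the fuel value at offset 0x40. */
struct player {
    unsigned char other_fields[FUEL_OFFSET];
    int32_t fuel;
};

/* Stand-in for one read of the target's memory. In a real trainer this
   would be a ReadProcessMemory call on the game's process handle. */
static void read_mem(uintptr_t address, void *out, size_t size)
{
    memcpy(out, (const void *)address, size);
}

/* Follows the chain: static pointer -> structure base -> base + 0x40. */
int32_t read_fuel(uintptr_t static_pointer_address)
{
    uintptr_t base;
    int32_t fuel;

    read_mem(static_pointer_address, &base, sizeof base);   /* first read: the pointer */
    read_mem(base + FUEL_OFFSET, &fuel, sizeof fuel);       /* second read: the value */
    return fuel;
}
```

Because the chain is re-read on every call, it keeps working after the game frees the structure and allocates a new one, as long as the static pointer itself is updated by the game.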
Note that the address is only valid as long as the executable doesn't change in any way. If it's recompiled or patched then you have to start over. I believe you'll also have to be careful about Windows' ASLR which might change the address around every time you start the program.
Comment box was too small to fit this so I'll put it here.
If it's esp plus a constant then I believe that this is a parameter and not a local variable (do confirm by checking the layout of the calling convention). If that's the case, then you should step the program until it returns to its caller, figure out how the parameter is being set (look for push instructions before the call instruction) and continue exploring from there. When I did this I had to unwind the stack once or twice before I found the global pointer to the data structure.
Also the esi register is not related to the stack (I had to look it up) so I'd check how it's being set. It could be that it contains the address of the data structure and the constant is the offset to the variable. If you figure out how the register is set you'll be that much closer to the pointer.