I have been reading up on HeapValidate in some existing code and trying to figure out what it does. The documentation says it checks whether the heap control structures are in a consistent state. What does that mean?
The heap is a data structure like any other, and there are certain conditions its metadata (its internal state variables) should always satisfy. As a made-up example, the number of children in a heap-tree node should always be non-negative; so if HeapValidate() reads a child-count variable and sees that it is negative, it knows something has gone badly wrong and can flag that block as broken.
You might wonder how, assuming Microsoft's heap code has no bugs, the heap's metadata could get into an invalid/"impossible" state like that in the first place. Since the heap's metadata structures live in the same address space the user code has access to, it's usually the result of buggy user code writing some other data through an invalid pointer that happens to point at a memory location where a heap metadata field resides, silently overwriting/corrupting that metadata.
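As a minimal sketch of how HeapValidate might be called to spot-check the process heap (the strings and error handling here are purely illustrative):

    #include <windows.h>
    #include <cstdio>

    int main()
    {
        HANDLE heap = GetProcessHeap();
        char *p = static_cast<char*>(HeapAlloc(heap, 0, 16));

        // Pass NULL as the third argument to validate every block in the heap,
        // or a specific block's address to validate just that block's control structures.
        if (!HeapValidate(heap, 0, NULL))
            std::printf("heap metadata is corrupt somewhere in this heap\n");

        if (!HeapValidate(heap, 0, p))
            std::printf("the control structures around this particular block are corrupt\n");

        HeapFree(heap, 0, p);
        return 0;
    }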
Would it be at all possible (I don't care about practicality or usefulness) to write a C or C++ program that monitored memory usage in the following, very basic way?
Given that declaring a variable without assigning it a value results in it having the value of whatever is already at its memory location, one could create a large array (thousands or millions of elements) and leave all the values unassigned. Then to see if any of these elements have been overwritten, we would simply need to repeatedly compare their current values to a previous value.
I highly doubt this would be as simple as I posited above. Assuming my doubt is well-founded, where would the problem lie and, more importantly, would it be something we could circumvent with some creative or esoteric code? I imagine the problem would be attributable to something along the lines of the declared, uninitialized elements not allowing other system processes to write to their memory addresses. Please give me some pointers! (heehee) Thanks.
Let's say your program is in C.
How large an array you can create is limited by how much free memory is available and by whatever limits the OS imposes on your process.
So let's say you created a pretty large array (uninitialized).
That memory is now given to your process (the program you ran), and no other process can access it! (It's the OS's job to prevent that; it's a basic requirement of memory virtualization.)
So, since no other process can access it, its contents won't be changed by anyone else once it's allocated to you.
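To make that concrete, here is a rough sketch of the experiment from the question (the array size and loop count are arbitrary). On any modern OS the comparison never reports a change, because the pages are private to this process; note also that freshly mapped pages are normally zero-filled by the OS rather than containing another process's leftovers, and strictly speaking reading uninitialized memory is undefined behaviour, so treat this purely as a thought experiment:

    #include <cstdlib>
    #include <cstring>
    #include <cstdio>

    int main()
    {
        const size_t N = 1000000;
        unsigned char *arr = static_cast<unsigned char*>(std::malloc(N));       // deliberately uninitialized
        unsigned char *snapshot = static_cast<unsigned char*>(std::malloc(N));
        if (!arr || !snapshot)
            return 1;

        std::memcpy(snapshot, arr, N);           // remember whatever happened to be there

        for (int round = 0; round < 100; ++round) {
            if (std::memcmp(snapshot, arr, N) != 0)
                std::printf("something changed!\n");   // never happens: no other process can touch these pages
        }

        std::free(arr);
        std::free(snapshot);
        return 0;
    }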
Stash library in "Thinking in C++" by Bruce Eckel:
Basically he seems to be setting up an array-index-addressable interface (via fetch) to a set of entities that are actually stored at random memory locations, without actually copying those data entities, in order to simulate for the user the existence of an actual contiguous-memory data block. In short, a contiguous, indexed address map. Do I have this right? Also, his mapping is on a byte-by-byte basis; if it were not for this requirement (and I am unsure of its importance), I believe that there may be simpler ways to generate such a data structure in C++. I looked into memcpy, but do not see how to actually copy data on a byte-by-byte basis to create such an indexed structure.
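For what it's worth, copying an element's bytes into a contiguous, index-addressable buffer is straightforward with memcpy. This is not Eckel's code, just a minimal sketch of the idea; the Stash/add/fetch names echo the book's interface, but the implementation here is illustrative only:

    #include <cstring>
    #include <cstdio>

    struct Stash {
        int elementSize;          // size of one element in bytes
        int count;                // number of elements stored so far
        int capacity;             // number of elements the buffer can currently hold
        unsigned char *storage;   // one contiguous block owned by the Stash
    };

    // Copy the bytes of *element into the buffer, so the Stash owns its own copy of the data.
    void add(Stash &s, const void *element)
    {
        if (s.count == s.capacity) {
            int newCap = s.capacity ? s.capacity * 2 : 8;
            unsigned char *bigger = new unsigned char[newCap * s.elementSize];
            if (s.count)
                std::memcpy(bigger, s.storage, s.count * s.elementSize);
            delete[] s.storage;
            s.storage = bigger;
            s.capacity = newCap;
        }
        std::memcpy(s.storage + s.count * s.elementSize, element, s.elementSize);
        ++s.count;
    }

    // Return the address of element i inside the Stash's own buffer.
    void *fetch(Stash &s, int i)
    {
        return s.storage + i * s.elementSize;
    }

    int main()
    {
        Stash s = { sizeof(int), 0, 0, nullptr };
        for (int i = 0; i < 5; ++i)
            add(s, &i);                            // each int's bytes are copied into s.storage
        std::printf("%d\n", *(int*)fetch(s, 3));   // prints 3
        delete[] s.storage;
        return 0;
    }

The point of the sketch is that the data really does end up inside the Stash's own storage; fetch just computes an offset into that one block.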
Prior posting:
This library appears to create a pointer assemblage, not a data-storage assemblage.
Is this true? (Applies to both the C and C++ versions.) Thus the name "stash" might be a little misleading, as nothing but pointers to data stashed elsewhere is put into a "stash," and Eckel states that "the data is copied."
Background: Looking at “add” and “inflate,” the so-called “copying” is equating pointers to other pointers (“storage” to “e” in “add” and “b” to “storage” in “inflate”). The use of “new” in this case is strange to me, because storage for data is indeed allocated but “b” is set to the address of the data, and no data assignments seem to take place in the entire library. So I am not sure what the point of the “allocation” by “new” is when the allocated space is apparently never written into or read from in the library. The “element” to be added exists elsewhere in memory already, and seemingly all we are doing is creating a sequential pointer structure to each byte of every “element” desired to be reference-able through CStash. Do I understand this library correctly?
Similarly, "stack" in the section "Nested structures" appears actually to work only with addresses of data, not with the data itself. I wrote my own linked-list stack successfully, which actually stores data in the stack nodes.
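For contrast, a node-based stack that stores each element by value inside its nodes (rather than only keeping pointers to data living elsewhere) can be sketched like this; again, this is not code from the book:

    #include <cstdio>

    struct Node {
        int value;    // the data lives inside the node itself
        Node *next;
    };

    struct Stack {
        Node *head = nullptr;

        void push(int v) { head = new Node{ v, head }; }   // v is copied into the new node

        int pop()                                           // assumes the stack is not empty
        {
            Node *n = head;
            int v = n->value;
            head = n->next;
            delete n;
            return v;
        }
    };

    int main()
    {
        Stack s;
        s.push(1);
        s.push(2);
        int a = s.pop();
        int b = s.pop();
        std::printf("%d %d\n", a, b);   // prints "2 1"
        return 0;
    }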
I am assigning std::ostream *pout_ = output_.get();
where output_ is scoped_ptr<std::stringstream> output_;
After assigning, I am filling it with *pout_ << "large strings"; approximately 1,172,528 characters in total.
But after a certain point I am no longer able to insert characters into *pout_. I searched the net for the maximum size of this class but couldn't find it.
Can someone please tell me the maximum number of characters I can store in *pout_? Is there any function that can tell me the maximum size of this class?
There are several possible causes for output to an ostream to fail. The most obvious is that the underlying supporting media (memory, in the case of an std::ostringstream) is full. Another is that you've reached some internal limit: a lot of systems have (or had) file size limits which would hit you long before the disk was full, and some systems have had similar constraints on single objects in memory (and the std::stringbuf class typically keeps its data in a single object). (There's also the possibility of a hardware error, but if this occurs with std::stringbuf, i.e. a memory error, the hardware probably won't detect it.)
All of these mean that there is no hard limit as to how much you can write to a stream, string or otherwise. It all depends, and one time you might succeed in writing 2 GB, and the next fail after 1 MB or less. Practically speaking, in most cases, you should be aware of the fact that writes can fail, test the results (after a final flush) and be prepared to do something reasonable if they do.
In the specific case of string streams, of course, about the only failure you're likely to be able to detect is out of memory. (An ostream does not propagate an exception from the streambuf; it sets the badbit when one occurs, and if exceptions have been activated for badbit, it will throw its own exception, not rethrow the original one. Which means that an std::bad_alloc will not propagate out.)
A lot of applications don't handle out of memory, and should logically have set the new handler to abort. If you've set the new handler to abort, then you can probably forego such error checking on string outputs.
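A minimal sketch of the "test the results" advice above, using std::ostringstream; the 1 MB chunk size and the iteration count are arbitrary, and on most systems the loop simply runs until memory is exhausted:

    #include <sstream>
    #include <string>
    #include <iostream>

    int main()
    {
        std::ostringstream out;
        std::string chunk(1 << 20, 'x');            // 1 MB of filler per write

        for (int i = 0; i < 100000; ++i) {
            out << chunk;
            if (!out) {                             // badbit set: the underlying stringbuf could not grow
                std::cerr << "write failed after roughly " << i << " MB\n";
                break;
            }
        }

        out.flush();                                // final flush, then check the state one last time
        if (!out)
            std::cerr << "stream ended in a failed state\n";
        return 0;
    }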
The answer depends on your compiler, platform etc. It also depends on the manner in which you write stuff to the stringstream (since it may need to reallocate memory for its buffer, potentially causing memory fragmentation and therefore under-utilisation).
The only practical way to get an idea of the limit is by conducting realistic experiments in your target environment. Even then, you should only use the results of such experiments as a rough guide.
A stringstream stores the data in a memory buffer. It will grow the buffer as needed, as long as it can.
You can continue to write to it until you run out of memory. There is no fixed limit.
It depends on a huge number of factors, such as
what load addresses are chosen for the shared objects you used
how much memory the rest of your program uses
how badly fragmented your address space has become
That's why you couldn't find any documented limit on the web.
I'm currently making one of those game trainers as a small project. I've already run into a problem: when you "go into a different level", the addresses for things such as fuel, cash, and bullets change. This would also happen, say, if you were to restart the application.
How can I re-locate these addresses?
I feel like it's a fairly basic question, but it's one of those "it is or is not possible" questions to me. Should I just stop looking and forget the concept entirely? "Too hard?"
It's a bit hard to describe exactly how to do this since it heavily depends on the program you're studying and whether the author went out of his way to make your life difficult. Note that I've only done this once, but it worked reasonably well even though I only knew a little assembly.
What is probably happening is that the values are allocated on the heap using a call to malloc/new, and every time you change level they are cleaned up and re-allocated somewhere else. So the idea is to look at the assembly code of the program to find where the pointer returned by malloc is stored, and figure out a way to reliably read the contents of the pointer and find the value you're looking for.
First thing you'll want is a debugger like OllyDbg and a basic knowledge of assembly. After that, start by setting a read and write breakpoint on the variable you want to examine. Since you said that you can't tell exactly where the variable is, you'll have to pause the process while it's running and search the program's memory for the value. Hopefully you'll end up with only a few results to sift through but be suspicious of anything that is on the stack since it might just be a copy for a function call or for local use.
Once the breakpoint is set just run the program until a break occurs. Now all you have to do is look at the code and examine how the variable is being accessed. If it's being passed as a parameter, go examine the call site of the function. If it's being accessed through a pointer, make a note of it and start examining the pointer. If it's being accessed as an offset of a pointer, that means it's part of a data structure so make a note of it and start examining the other variable. And so on.
Stay focused on your variable and just keep examining the code until you eventually find the root which can be one of two things:
A global variable that has a static address. This is the easiest scenario since you have a static address hardcoded straight into the code that you can use to reliably walk through the data structures.
A stack allocated variable. This is trickier and I'm not entirely sure how to deal with this scenario reliably. It's possible that its address will have the same offset from the beginning of the stack most of the time, but it might not. You could also walk the stack to find the corresponding function and its parameters, but this is a bit tricky to get right.
Once you have an address all that's left to do is use ReadProcessMemory to locate your variable using the information you found. For example, if the address you have represents a pointer to a data structure where at offset 0x40 your fuel value is stored, then you'll have to read the value at the address, add 0x40 to it and do another read on the result.
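As a rough sketch of those two reads (the process handle, the static pointer address, and the 0x40 offset are placeholders from the example above; it also assumes the trainer and the game use the same pointer size):

    #include <windows.h>

    // The static address holds a pointer to a structure, and the fuel value
    // lives at offset 0x40 inside that structure.
    bool ReadFuel(HANDLE process, LPCVOID pointerAddress, int *fuelOut)
    {
        void *structAddress = nullptr;
        SIZE_T bytesRead = 0;

        // Step 1: read the pointer stored at the static address.
        if (!ReadProcessMemory(process, pointerAddress,
                               &structAddress, sizeof(structAddress), &bytesRead))
            return false;

        // Step 2: read the int stored at (pointer + 0x40).
        LPCVOID fuelAddress = static_cast<const char*>(structAddress) + 0x40;
        return ReadProcessMemory(process, fuelAddress,
                                 fuelOut, sizeof(*fuelOut), &bytesRead) != 0;
    }

The process handle would come from OpenProcess with at least PROCESS_VM_READ access.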
Note that the address is only valid as long as the executable doesn't change in any way. If it's recompiled or patched then you have to start over. I believe you'll also have to be careful about Windows' ASLR which might change the address around every time you start the program.
Comment box was too small to fit this so I'll put it here.
If it's esp plus a constant then I believe that this is a parameter and not a local variable (do confirm by checking the layout of the calling convention). If that's the case, then you should step the program until it returns to its caller, figure out how the parameter is being set (look for push instructions before the call instruction) and continue exploring from there. When I did this I had to unwind the stack once or twice before I found the global pointer to the data structure.
Also the esi register is not related to the stack (I had to look it up) so I'd check how it's being set. It could be that it contains the address of the data structure and the constant is the offset to the variable. If you figure out how the register is set you'll be that much closer to the pointer.
I just came to know that there are data breakpoints. I have worked for the last 5 years in C++ using Visual Studio, and I have never used data breakpoints.
Can someone throw some light on what data breakpoints are, when to use them and how to use them with VS?
As per my understanding we can set a data breakpoint when we want to check for changes to a variable's value. In this case, we can set a data breakpoint with a condition on the variable value.
Any other examples?
Good ol' Daniel LeCheminant has a solid answer on what a data breakpoint does, so I'll toss in some anecdotes that highlight useful uses:
Any scenario where you know what will change, but have little or no idea where the code changing it lives (since otherwise you could simply use a conditional breakpoint). Specifically,
"Impossible" scenarios - program is crashing, because variable X is NULL, when variable X should never be NULL because no code anywhere ever sets variable X to NULL. Put a normal breakpoint in the code that initializes X, and when it is hit, set up a data breakpoint to watch for the change to NULL. Somewhat more common is the case where memory is released too early, and there are still pointers to it hanging around: use data breakpoints to find out who's releasing the memory.
Tedious scenarios - a 3rd-party library is doing bad, nasty, horrible things to your data structures. You know it's happening, because someone is trashing your data and obviously your code is perfect. But you don't know where, or when. Sure, you could single-step through a megabyte of disassembled DLL... but why bother, when you can set a data breakpoint on your data, sit back, and wait for it to get trashed!
Heisenbugs - similar to the impossible scenario, but they go away when you watch too closely, such that normal breakpoints - even conditional breakpoints - are useless. Timing and user-input sensitive logic is particularly vulnerable to this sort of thing. Since data breakpoints don't require the debugger to actually break at all until the time is right, assuming you can come up with a memory location that will only change when that elusive bug actually occurs you can use data breakpoints to set a trap for the Heisenbug and catch it in flagrante delicto.
Spaghetti scenarios - common in old, rotten code bases where global data is accessed everywhere. Yeah, you could use plain ol' conditional breakpoints... but you'd need hundreds of them. Data breakpoints make it easy.
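To make the "impossible" scenario concrete, here is a contrived sketch: no line of code ever assigns NULL to the pointer, yet it ends up NULL because an out-of-bounds write on the neighbouring buffer tramples it. A data breakpoint on &w.p (4 or 8 bytes, depending on pointer size) would fire exactly on the memset line:

    #include <cstring>

    struct Widget {
        char name[16];
        int *p;              // "never set to NULL anywhere", supposedly
    };

    int main()
    {
        int value = 42;
        Widget w;
        w.p = &value;

        // Buggy code "elsewhere": zeroes the whole Widget through a pointer to its
        // 16-byte name buffer, silently nulling the adjacent pointer. A data
        // breakpoint on &w.p fires right here, pointing straight at the culprit.
        std::memset(w.name, 0, sizeof(w));

        return *w.p;         // crash: w.p is now NULL, and nothing "ever set it"
    }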
Definition:
Data breakpoints allow you to break execution when the value stored at a specified memory location changes.
From MSDN: How to: Set a Data Breakpoint:
How to Set a Memory Change Breakpoint
From the Debug Menu, choose New Breakpoint and click New Data Breakpoint
—or—
in the Breakpoints window Menu, click the New dropdown and choose New Data Breakpoint.
The New Breakpoint dialog box appears.
In the Address box, enter a memory address or expression that evaluates to a memory address. For example, &foo to break when the contents of variable foo change.
In the Byte Count box, enter the number of bytes you want the debugger to watch. For example, if you enter 4, the debugger will watch the four bytes starting at &foo and break if any of those bytes change value.
Click OK.
So far we've got a great definition and a bunch of great theoretical explanations.
Let's have a concrete example!
I'm currently working on a rather large and convoluted codebase. I made a small safe change to one bit of code and started getting - in a completely unrelated chunk of the codebase - crashes in the memory allocator. This is generally a sign that you're doing something Very Wrong with memory management - either double-deletion or writing out-of-bounds.
Thankfully, we have an option to turn on a debug memory manager that checks for things like this. I turned it on and it immediately started reporting a memory block guard violation, which means that something wrote out of bounds. The problem is that this report shows up only once the memory is deallocated - essentially saying "hey, something was broken. Hope you can figure out what!"
Unfortunately this particular chunk of memory, at the point of deallocation, is completely indistinguishable from literally thousands of other chunks of memory. Fortunately, our debug framework tags each allocation with a consecutive ID, and the memory that got corrupted had a consistent ID (#9667, if you're curious.) One quick breakpoint in the memory manager later and I was able to find where that memory was allocated. Which, as it turned out, wasn't immediately helpful either.
But at that point, I had several important components:
I knew the address of a block of memory
I knew the intended length of that memory
I knew that, at some point in the future, a specific byte past the intended length of that memory would be overwritten
Given this, I could set up a data breakpoint on that specific byte, then hit "go" and find out where the corruption occurred.
Which I did - it led to an off-by-one error which I am now in the process of fixing.
And that's a concrete example of how data breakpoints can be useful. :)
I believe data breakpoints are breakpoints which will occur when some memory is set to a certain value. For example, you can set a breakpoint when i == 10 in a typical for loop to stop after the 10th iteration. You can also watch for changes to variables on the heap, like wait for a member of a class to be modified.