What are data breakpoints? - c++

I just came to know that there are data breakpoints. I have worked for the last 5 years in C++ using Visual Studio, and I have never used data breakpoints.
Can someone shed some light on what data breakpoints are, when to use them and how to use them with VS?
As I understand it, we can set a data breakpoint when we want to check for changes to a variable's value. In this case, we can set a data breakpoint with a condition on the variable's value.
Any other examples?

Good ol' Daniel LeCheminant has a solid answer on what a data breakpoint does, so I'll toss in some anecdotes that highlight where they're useful:
Any scenario where you know what will change, but have little or no idea where the code changing it lives (since otherwise you could simply use a conditional breakpoint). Specifically,
"Impossible" scenarios - program is crashing, because variable X is NULL, when variable X should never be NULL because no code anywhere ever sets variable X to NULL. Put a normal breakpoint in the code that initializes X, and when it is hit, set up a data breakpoint to watch for the change to NULL. Somewhat more common is the case where memory is released too early, and there are still pointers to it hanging around: use data breakpoints to find out who's releasing the memory.
Tedious scenarios - a 3rd-party library is doing bad, nasty, horrible things to your data structures. You know it's happening, because someone is trashing your data and obviously your code is perfect. But you don't know where, or when. Sure, you could single-step through a megabyte of disassembled DLL... but why bother, when you can set a data breakpoint on your data, sit back, and wait for it to get trashed!
Heisenbugs - similar to the impossible scenario, but they go away when you watch too closely, so that normal breakpoints, even conditional breakpoints, are useless. Timing- and user-input-sensitive logic is particularly vulnerable to this sort of thing. Because data breakpoints don't make the debugger break at all until the time is right, you can use them to set a trap for the Heisenbug and catch it in flagrante delicto, assuming you can come up with a memory location that only changes when that elusive bug actually occurs.
Spaghetti scenarios - common in old, rotten code bases where global data is accessed everywhere. Yeah, you could use plain ol' conditional breakpoints... but you'd need hundreds of them. Data breakpoints make it easy.
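If you want to see the "I know what changes, but not where" idea in code, a rough software stand-in is to route every write through a wrapper and put an ordinary breakpoint in its change hook. This is a different technique from a real (hardware) data breakpoint and is only a sketch: `Watched` is a hypothetical name, and it cannot see writes that bypass the wrapper (stray pointers, memset, buffer overruns), which are exactly the writes a data breakpoint catches.

```cpp
#include <functional>
#include <utility>

// Software stand-in for a data breakpoint (hypothetical helper, not a debugger feature).
// Every assignment goes through operator=, which calls onChange only when the value
// actually changes. Writes through raw pointers to the stored value are NOT seen.
template <typename T>
class Watched {
public:
    using Callback = std::function<void(const T& oldValue, const T& newValue)>;

    Watched(T initial, Callback onChange)
        : value_(std::move(initial)), onChange_(std::move(onChange)) {}

    Watched& operator=(const T& newValue) {
        if (!(newValue == value_)) {
            T old = value_;
            value_ = newValue;
            if (onChange_) onChange_(old, value_);  // put a normal breakpoint here
        }
        return *this;
    }

    const T& get() const { return value_; }
    operator const T&() const { return value_; }

private:
    T value_;
    Callback onChange_;
};
```

A normal breakpoint inside the callback then shows the call stack of whoever made the change, for example whoever set a pointer to null.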

Definition: Data breakpoints allow you to break execution when the value stored at a specified memory location changes.
From MSDN: How to: Set a Data Breakpoint:
How to Set a Memory Change Breakpoint
From the Debug menu, choose New Breakpoint and click New Data Breakpoint, or, in the Breakpoints window menu, click the New dropdown and choose New Data Breakpoint.
The New Breakpoint dialog box appears.
In the Address box, enter a memory address or expression that evaluates to a memory address. For example, &foo to break when the contents of variable foo change.
In the Byte Count box, enter the number of bytes you want the debugger to watch. For example, if you enter 4, the debugger will watch the four bytes starting at &foo and break if any of those bytes change value.
Click OK.
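Why the Byte Count is limited: on x86 and x64, debuggers typically implement data breakpoints with the CPU's debug registers, four address registers (DR0 to DR3), each watching a naturally aligned 1, 2, 4 or (in 64-bit mode) 8 byte region. The sketch below is an assumption-labelled illustration of that constraint, not how Visual Studio itself is coded: it greedily splits an arbitrary range into aligned pieces, each of which would need its own register. `WatchRange` and `split_for_debug_registers` are made-up names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One hardware-watchable piece (illustrative type).
struct WatchRange {
    std::uintptr_t address;
    std::size_t length;  // 1, 2, 4 or 8
};

// Greedily splits [address, address + byteCount) into naturally aligned
// 1/2/4/8-byte pieces. Each piece would occupy one debug register (x86 has four).
std::vector<WatchRange> split_for_debug_registers(std::uintptr_t address,
                                                  std::size_t byteCount) {
    std::vector<WatchRange> pieces;
    while (byteCount > 0) {
        std::size_t len = 8;
        // Shrink until the piece is aligned and fits in what is left.
        while (len > 1 && (address % len != 0 || len > byteCount)) len /= 2;
        pieces.push_back({address, len});
        address += len;
        byteCount -= len;
    }
    return pieces;
}
```

So watching an aligned `int` costs one register, while the same 4 bytes starting at an odd address would need three.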

So far we've got a great definition and a bunch of great theoretical explanations.
Let's have a concrete example!
I'm currently working on a rather large and convoluted codebase. I made a small safe change to one bit of code and started getting - in a completely unrelated chunk of the codebase - crashes in the memory allocator. This is generally a sign that you're doing something Very Wrong with memory management - either double-deletion or writing out-of-bounds.
Thankfully, we have an option to turn on a debug memory manager that checks for things like this. I turned it on and it immediately started reporting a memory block guard violation, which means that something wrote out of bounds. The problem is that this report shows up only once the memory is deallocated - essentially saying "hey, something was broken. Hope you can figure out what!"
Unfortunately this particular chunk of memory, at the point of deallocation, is completely indistinguishable from literally thousands of other chunks of memory. Fortunately, our debug framework tags each allocation with a consecutive ID, and the memory that got corrupted had a consistent ID (#9667, if you're curious.) One quick breakpoint in the memory manager later and I was able to find where that memory was allocated. Which, as it turned out, wasn't immediately helpful either.
But at that point, I had several important components:
I knew the address of a block of memory
I knew the intended length of that memory
I knew that, at some point in the future, a specific byte past the intended length of that memory would be overwritten
Given this, I could set up a data breakpoint on that specific byte, then hit "go" and find out where the corruption occurred.
Which I did - it led to an off-by-one error which I am now in the process of fixing.
And that's a concrete example of how data breakpoints can be useful. :)
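The workflow above (find the allocation by its ID, then watch the first byte past its intended length) can be sketched with a toy debug allocator. Everything here (`dbgheap`, `alloc`, `first_guard_byte`, four trailing guard bytes of 0xFD) is hypothetical and only loosely modelled on real debug heaps; MSVC's debug CRT, for instance, has a real break-on-allocation-number hook, `_CrtSetBreakAlloc`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <unordered_map>

// Toy debug allocator (hypothetical): consecutive IDs plus trailing guard bytes.
namespace dbgheap {

constexpr std::size_t kGuardSize = 4;
constexpr unsigned char kGuardByte = 0xFD;

struct BlockInfo {
    std::uint64_t id;  // consecutive allocation ID
    std::size_t size;  // size the caller asked for
};

std::unordered_map<void*, BlockInfo>& blocks() {
    static std::unordered_map<void*, BlockInfo> table;
    return table;
}

std::uint64_t& next_id() {
    static std::uint64_t id = 1;
    return id;
}

void* alloc(std::size_t size) {
    auto* p = static_cast<unsigned char*>(std::malloc(size + kGuardSize));
    if (p == nullptr) return nullptr;
    std::memset(p + size, kGuardByte, kGuardSize);  // guard bytes after the block
    blocks()[p] = BlockInfo{next_id()++, size};
    return p;
}

std::uint64_t id_of(void* p) { return blocks().at(p).id; }

// First byte past the intended length: the byte to put a data breakpoint on.
unsigned char* first_guard_byte(void* p) {
    return static_cast<unsigned char*>(p) + blocks().at(p).size;
}

bool guard_intact(void* p) {
    const unsigned char* g = first_guard_byte(p);
    for (std::size_t i = 0; i < kGuardSize; ++i)
        if (g[i] != kGuardByte) return false;
    return true;
}

// Frees the block; returns false (after freeing) if the guard was overwritten,
// which is the "something was broken, good luck" report described above.
bool release(void* p) {
    bool ok = guard_intact(p);
    blocks().erase(p);
    std::free(p);
    return ok;
}

}  // namespace dbgheap
```

In a debugger you would stop when the suspicious ID is allocated, evaluate `first_guard_byte(p)`, and set a 1-byte data breakpoint on that address.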

I believe data breakpoints are breakpoints that trigger when some memory is written, and you can add a condition so they only stop when it is set to a certain value. For example, you can break when i == 10 in a typical for loop to stop after the 10th iteration. You can also watch for changes to variables on the heap, for example to wait for a member of a class to be modified.

Related

What does HeapValidate in windows do?

I have been reading up on HeapValidate in some existing code and trying to figure out what it does. The documentation says that it checks whether the heap control structures are in a consistent state. What does that mean?
The heap is a data structure like any other, and in its metadata-state-variables there are certain conditions that should always be true. As a made-up example, the number of children in a heap-tree-node should always be a non-negative number; so if HeapValidate() reads a child-count variable and sees that it is negative, it knows something has gone badly wrong and can flag that block as broken.
You might wonder, assuming Microsoft's heap code does not have any bugs, how the heap's metadata could get into an invalid/"impossible" state like that in the first place. Since the heap's metadata structures live in the same address space that the user code has access to, it's usually the result of buggy user code writing some other data via an invalid pointer that happens to point at the memory location where a heap metadata field resides, silently overwriting/corrupting the metadata.
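To make "heap control structures in a consistent state" concrete, here is a toy arena whose block headers carry invariants (a magic value, a positive size, and blocks that tile the used region exactly), plus a `validate()` that walks them. This illustrates the idea only. It is not how the Windows heap or HeapValidate is implemented, and `ToyHeap`, `Header` and `kMagic` are made-up names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy arena layout: [Header][payload][Header][payload]...
constexpr std::uint32_t kMagic = 0xABCD1234u;

struct Header {
    std::uint32_t magic;  // invariant: always equals kMagic
    std::uint32_t size;   // invariant: payload bytes that follow, > 0
};

class ToyHeap {
public:
    explicit ToyHeap(std::size_t capacity) : arena_(capacity) {}

    // Appends a block; returns the payload offset, or -1 if it doesn't fit.
    std::ptrdiff_t allocate(std::uint32_t size) {
        if (size == 0 || used_ + sizeof(Header) + size > arena_.size()) return -1;
        Header h{kMagic, size};
        std::memcpy(&arena_[used_], &h, sizeof h);
        auto payload = static_cast<std::ptrdiff_t>(used_ + sizeof(Header));
        used_ += sizeof(Header) + size;
        return payload;
    }

    unsigned char* data() { return arena_.data(); }

    // Walks every header and checks the invariants; false means corrupt metadata.
    bool validate() const {
        std::size_t pos = 0;
        while (pos < used_) {
            if (pos + sizeof(Header) > used_) return false;
            Header h;
            std::memcpy(&h, &arena_[pos], sizeof h);
            if (h.magic != kMagic || h.size == 0) return false;
            pos += sizeof(Header) + h.size;
        }
        return pos == used_;  // blocks must tile the used region exactly
    }

private:
    std::vector<unsigned char> arena_;
    std::size_t used_ = 0;
};
```

Overrunning one block's payload by a few bytes lands on the next block's header, which is the kind of silent metadata corruption described above.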

Special Pointer Value 0xFEEEFEF6

I came across an access violation while reading location 0xFEEEFEF6 in Visual Studio 2012 (using Nov 2012 CTP compiler) somewhere deep inside the C++11 concurrency implementation.
Does this value have any special meaning? Looking in Wikipedia I found similar entries (0xFEEEFEEE and 0xFEEDFACE).
0xFEEEFEF6 itself doesn't have a special meaning, but it's likely based on one of the "guard bytes" segments that MSVC likes to put around heap allocations. As Jan Dvorak noted, it's probably 8 bytes past the end of something, quite likely 2 pointers past the end of an array.
The concept is that memory you're likely to accidentally access, but shouldn't, is marked with very obvious patterns. Some of the most common are 0xCDCDCDCD and 0xFDFDFDFD, although 0xDDDDDDDD and 0xFEEEFEEE are also easy to run into. Classic compilers (not sure if any still use it) liked 0xDEADBEEF. Here's a pretty good write-up of the cases and positions you'll see guard bytes in.
The two most common causes of seeing these values in a segfault (access violation) are accessing memory that has already been freed and overrunning your bounds, especially in an array of pointers. Most of the values used for guard data aren't valid addresses if they show up anywhere else in the app (you're not going to get a block of memory at 0x00000000 or 0xCDCDCDCD; those are well outside the virtual address space your heap lives in). Knowing the common ones off the top of your head can save a lot of time debugging.
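The "8 bytes past something" reasoning can be mechanized: compare the faulting value against the well-known fill patterns and report the offset. The meanings in the table are the usual MSVC/Windows debug-build conventions as I understand them; the 64-byte search window and the name `classify` are arbitrary choices for this sketch.

```cpp
#include <cstdint>
#include <optional>

struct PatternMatch {
    std::uint32_t pattern;  // fill value the faulting value is closest to
    std::int64_t offset;    // value - pattern, in bytes
    const char* meaning;    // usual meaning in debug builds
};

// Returns the known fill pattern within maxDistance bytes of value, if any.
std::optional<PatternMatch> classify(std::uint32_t value, std::uint32_t maxDistance = 64) {
    struct Known { std::uint32_t pattern; const char* meaning; };
    static const Known kPatterns[] = {
        {0xCDCDCDCDu, "debug CRT: allocated heap memory never written"},
        {0xDDDDDDDDu, "debug CRT: freed heap memory"},
        {0xFDFDFDFDu, "debug CRT: guard bytes around an allocation"},
        {0xFEEEFEEEu, "Windows debug heap: memory released by HeapFree"},
        {0xCCCCCCCCu, "uninitialized stack memory (/RTC)"},
        {0xDEADBEEFu, "classic 'dead' marker"},
    };
    const auto window = static_cast<std::int64_t>(maxDistance);
    for (const Known& k : kPatterns) {
        std::int64_t diff = static_cast<std::int64_t>(value) - static_cast<std::int64_t>(k.pattern);
        if (diff >= -window && diff <= window) return PatternMatch{k.pattern, diff, k.meaning};
    }
    return std::nullopt;
}
```

For the value in the question, `classify(0xFEEEFEF6)` reports 0xFEEEFEEE with an offset of 8, i.e. something read through freed heap memory, 8 bytes in.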
Note that, with very few or no exceptions, these guard bytes only show up in debug builds. Writing memory with special patterns every time it's allocated or deallocated (in fact, writing significantly more memory than has been allocated, since most of the guard patterns occur on the boundaries of allocated chunks) is fairly expensive and shouldn't be done in release builds. If you have issues like this in your debug builds, you're likely to get seemingly random (undefined) addresses in a release build. You may also get unlucky enough that a legitimate address ends up being grabbed by mistake, which can lead to all sorts of heap corruption.
Because guard bytes don't show up in release builds, you can't check for them the way you might check for NULL and use that as a condition in your code. Instead, smart pointers and containers can help you manage memory correctly and avoid bad accesses in the first place. While occasionally annoying, smart pointers very much help avoid issues like this. Note that some types of access violation, particularly buffer overruns, are considered an entire class of security vulnerability because of how often they appear.
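A minimal sketch of the smart pointer and container advice: `std::unique_ptr` gives the object a single owner that frees it exactly once, and `std::array::at()` checks the index and throws `std::out_of_range` instead of silently reading past the end. `Widget`, `make_widget` and `checked_get` are illustrative names.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <stdexcept>

struct Widget {
    int value = 0;
};

// Single owner: the Widget is deleted exactly once, when the unique_ptr is destroyed,
// so there is no second raw owner left to free or read it later.
std::unique_ptr<Widget> make_widget(int v) {
    auto w = std::make_unique<Widget>();
    w->value = v;
    return w;
}

// at() validates the index and throws std::out_of_range if it is past the end.
int checked_get(const std::array<int, 10>& values, std::size_t index) {
    return values.at(index);
}
```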
Without a VM or runtime forcing you to stay within certain memory constraints, it can be very easy to access memory you ought not to. For example:
int values[10] = {};
int output = 0;
int length = 10;
int i = 0;
while (i < length) {
    output += values[++i]; // BUG: prefix increment reads values[1]..values[10]
}
The prefix-increment will cause you to run off the end of the array and access values[10], an invalid index. Sometimes that may immediately cause an access violation and halt your program, other times it may be memory you're allowed to access and the value will be added to output, which could cause an overflow on output and unexpected behavior through the rest of the app.
Guard bytes exist so that your segfaults or increments will, while debugging, have repeatable values and be as obvious as possible.
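For comparison, a version of that loop that stays in bounds: the index is used first and incremented afterwards, and the condition is `i < length`, so the largest index read is `length - 1`. `sum` is just an illustrative name.

```cpp
#include <cstddef>

// Sums values[0] .. values[length - 1]; values[length] is never read.
int sum(const int* values, std::size_t length) {
    int output = 0;
    for (std::size_t i = 0; i < length; ++i) {
        output += values[i];
    }
    return output;
}
```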

Find all members of all classes/structs N bytes offset from the start of the class/struct?

OK, this is kind of a complicated problem. I've been trying to track down a bug in a fairly large codebase for the past 4 months, and it only happens on a platform where Valgrind isn't available.
What's happening is that a single byte, 0x01, is getting written in a weird spot (when it happens is random, but it always seems to be written in a small collection of possible spots, regardless of debug/release or which compiler is used). I found out that the stray byte is always 80 bytes from the start of the object it corrupts.
Anyway, is there any tool, trick or plugin for Visual Studio that can scan the entire codebase and list all members that are at an 80-byte offset from the start of their class?
If it's always in a consistent place, i.e. in a particular instance of a struct, you could put a breakpoint where that instance has been initialised, then set a breakpoint on the particular address that fires when the memory changes (I forgot the actual name for those breakpoints in VS; they're the data breakpoints discussed above).
This is a really handy technique for finding that elusive screwy operation that writes to a location it shouldn't!
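Short of a tool that scans the whole codebase, you can check a candidate type directly with `offsetof`, and compilers can also dump record layouts (for example, Clang's `-Xclang -fdump-record-layouts`). The sketch below uses a hypothetical `Record` type; the member table has to be written by hand because C++ has no built-in reflection to enumerate members.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical type standing in for "the object that gets corrupted".
struct Record {
    char name[64];
    std::uint64_t id;
    std::uint32_t flags;
    std::uint32_t count;
    std::uint8_t dirty;
};

struct MemberInfo {
    std::string_view name;
    std::size_t offset;
    std::size_t size;
};

// Hand-written member table for Record.
constexpr MemberInfo kRecordMembers[] = {
    {"name",  offsetof(Record, name),  sizeof(Record::name)},
    {"id",    offsetof(Record, id),    sizeof(Record::id)},
    {"flags", offsetof(Record, flags), sizeof(Record::flags)},
    {"count", offsetof(Record, count), sizeof(Record::count)},
    {"dirty", offsetof(Record, dirty), sizeof(Record::dirty)},
};

// Returns the member containing byte `offset`, or an empty view for padding.
std::string_view member_at(std::size_t offset) {
    for (const MemberInfo& m : kRecordMembers)
        if (offset >= m.offset && offset < m.offset + m.size) return m.name;
    return {};
}

// Compile-time guard: fails the build if the layout ever moves `dirty`.
static_assert(offsetof(Record, dirty) == 80, "Record layout changed");
```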

Interprocess Memory Editing - Finding changed addresses

I'm currently making one of those game trainers as a small project. I've already run into a problem: when you "go into a different level", the addresses of things such as fuel, cash and bullets change. The same happens if you restart the application.
How can I re-locate these addresses?
I feel like it's a fairly basic question, but it's one of those "it is or is not possible" questions to me. Should I just stop looking and forget the concept entirely? "Too hard?"
It's a bit hard to describe exactly how to do this, since it depends heavily on the program you're studying and on whether the author went out of his way to make your life difficult. Note that I've only done this once, but it worked reasonably well even though I only knew a little assembly.
What is probably happening is that the values are allocated on the heap using a call to malloc/new, and every time you change level they are cleaned up and re-allocated somewhere else. So the idea is to look at the assembly code of the program to find where the pointer returned by malloc is stored, and figure out a way to reliably read the content of the pointer and find the value you're looking for.
First thing you'll want is a debugger like OllyDbg and a basic knowledge of assembly. After that, start by setting a read and write breakpoint on the variable you want to examine. Since you said that you can't tell exactly where the variable is, you'll have to pause the process while it's running and search the program's memory for the value. Hopefully you'll end up with only a few results to sift through but be suspicious of anything that is on the stack since it might just be a copy for a function call or for local use.
Once the breakpoint is set just run the program until a break occurs. Now all you have to do is look at the code and examine how the variable is being accessed. If it's being passed as a parameter, go examine the call site of the function. If it's being accessed through a pointer, make a note of it and start examining the pointer. If it's being accessed as an offset of a pointer, that means it's part of a data structure so make a note of it and start examining the other variable. And so on.
Stay focused on your variable and just keep examining the code until you eventually find the root which can be one of two things:
A global variable that has a static address. This is the easiest scenario since you have a static address hardcoded straight into the code that you can use to reliably walk through the data structures.
A stack-allocated variable. This is trickier and I'm not entirely sure how to deal with this scenario reliably. It's possible that its address will have the same offset from the beginning of the stack most of the time, but it might not. You could also walk the stack to find the corresponding function and its parameters, but this is a bit tricky to get right.
Once you have an address all that's left to do is use ReadProcessMemory to locate your variable using the information you found. For example, if the address you have represents a pointer to a data structure where at offset 0x40 your fuel value is stored, then you'll have to read the value at the address, add 0x40 to it and do another read on the result.
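The "read the pointer, add 0x40, read again" step generalizes to a chain of offsets. In this sketch the memory reader is injected as a callback so the logic can be tested in-process; in a real trainer on Windows it would wrap `ReadProcessMemory` on a handle obtained from `OpenProcess`. `resolve_chain`, `MemoryReader` and the 0x40 offset are illustrative assumptions, and with ASLR the static base address would likely need to be recomputed from the module's load address on each run.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <optional>
#include <vector>

// Copies `size` bytes at `address` in the target into `out`; returns false on failure.
using MemoryReader = std::function<bool(std::uintptr_t address, void* out, std::size_t size)>;

// Starting at `base` (the static address of a pointer), repeatedly reads a pointer
// and adds the next offset. The result is the address of the value itself.
std::optional<std::uintptr_t> resolve_chain(const MemoryReader& read,
                                            std::uintptr_t base,
                                            const std::vector<std::uintptr_t>& offsets) {
    std::uintptr_t address = base;
    for (std::uintptr_t offset : offsets) {
        std::uintptr_t pointer = 0;
        if (!read(address, &pointer, sizeof pointer)) return std::nullopt;
        address = pointer + offset;
    }
    return address;
}
```

With a single offset of 0x40 this is exactly the two reads described above: read the pointer at the static address, add 0x40, then read the fuel value at the result.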
Note that the address is only valid as long as the executable doesn't change in any way. If it's recompiled or patched then you have to start over. I believe you'll also have to be careful about Windows' ASLR which might change the address around every time you start the program.
Comment box was too small to fit this so I'll put it here.
If it's esp plus a constant then I believe that this is a parameter and not a local variable (do confirm by checking the layout of the calling convention). If that's the case, then you should step the program until it returns to its caller, figure out how the parameter is being set (look for push instructions before the call instruction) and continue exploring from there. When I did this I had to unwind the stack once or twice before I found the global pointer to the data structure.
Also the esi register is not related to the stack (I had to look it up) so I'd check how it's being set. It could be that it contains the address of the data structure and the constant is the offset to the variable. If you figure out how the register is set you'll be that much closer to the pointer.

Reading off the end of an array: running in terminal vs. debugger

I encountered an error in my code where an if() statement was checking a value off the end of an array, i.e.,
int arrayX[2];
if (arrayX[2])    // index 2 is past the end: valid indices are 0 and 1
    FunctionCall();
This was leading to a function call that, for reasons related to the length of the above array, tried to subscript a vector with an out-of-bounds index, causing the error. However, the error only occurred when running under the Xcode debugger; whenever I ran in the terminal it didn't happen. This leads me to suspect that when I run in the terminal, memory outside the array is being zeroed or tends to be zero for some other reason. The if statement gets tested for 80 different 'faulty' arrays per cycle, so it seems unlikely that it's a coincidence that it never pops up in the terminal.
Just to be clear, my question is: why would unallocated or unrelated memory hold zeroes when run in the terminal but not when run under a debugger?
Many debuggers fill unused memory with some distinct pattern, so that exactly the behaviour you describe happens.
What exactly is the question?
Whatever the question, the answer is likely... The program generator can do that if it wants to. The behavior of the sample code is undefined so the resulting program's behavior is wholly unpredictable.
You can't really tell what data is outside the array. If any debugger zeroed that part of memory, I'd expect it to be the Xcode debugger, not the terminal. So I find it very strange that you had no problems in the terminal!
You said "The if statement gets tested for 80 different 'faulty' arrays per cycle". Consider this: are you sure those "different" faulty arrays actually reside in different areas of RAM? (If it's static data, the compiler may put it in one place in RAM and reuse it.) Also, the compiler (or interpreter) may optimize your code and change how memory is laid out.
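One way to make the original check well defined is to test the index before reading, so the branch never depends on whatever happens to sit after the array (zeros in the terminal, a fill pattern under the debugger, or anything else). `check_and_call` is an illustrative name, and `FunctionCall` here is a stub standing in for the question's function. Building with Clang's AddressSanitizer (`-fsanitize=address`, also available as a diagnostics option in Xcode) should report the original out-of-bounds read instead of leaving it to chance.

```cpp
#include <array>
#include <cstddef>

// Stub standing in for the question's FunctionCall(); counts calls for testing.
int g_calls = 0;
void FunctionCall() { ++g_calls; }

// Reads arrayX[index] only when index is in range.
void check_and_call(const std::array<int, 2>& arrayX, std::size_t index) {
    if (index < arrayX.size() && arrayX[index] != 0) {
        FunctionCall();
    }
}
```

Note that the original `int arrayX[2];` was also never initialized, so even the in-range elements should be given values (for example `std::array<int, 2> arrayX{};`) before they are tested.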