How to Drill Down on Apparent Corruption - c++

I've been working with C and C++ for a fairly long time. I have a computer science minor. I'm familiar with the pitfalls intrinsic to the low level access to process memory these languages provide. I've spent days and weeks in them.
Learning to use valgrind about a decade ago was a lifesaver in terms of catching minor access errors and such. Currently, I also use ASAN with clion, and mistakes of this sort are usually caught and dealt with quickly.
I presume nothing is bulletproof, however, and a recent problem has me completely stumped.
I have an object that includes a non-public sockaddr_storage field named from. This can be accessed via:
const sockaddr_storage* getSockAddr () {
return &from;
}
But the address returned is wrong. Starting from a breakpoint on the return line in gdb:
Breakpoint 3, socketeering::Socket::getSockAddr (this=0x617000000400) at Socket.hpp:81
81 return &from;
(gdb) p this
$1 = (socketeering::UDPsocket * const) 0x617000000400
(gdb) p &from
$2 = (sockaddr_storage *) 0x617000000600
(gdb) p (const sockaddr_storage*)&from
$3 = (const sockaddr_storage *) 0x617000000600
Seems pretty clear the value returned has to be 0x617000000600. But no:
(gdb) fin
Run till exit from #0 socketeering::Socket::getSockAddr (this=0x617000000400) at Socket.hpp:81
0x00000000004290ab in udpHandler::dataReady (this=0x631000014810, iod=0x617000000400, con=0x60e0000249b0) at /opt/cogware/C++/Socketeering2/demo/echo_server.cpp:66
66 auto sa = sock->getSockAddr();
Value returned is $4 = (const sockaddr_storage *) 0x617000000618
^^
(gdb) p sock
$5 = (socketeering::UDPsocket *) 0x617000000400
That's no good -- it is 18 bytes inside the structure. Even worse, I CANNOT reproduce it with a simple SSCCE:
class foo {
sockaddr_storage ss;
public:
foo () { cout << &ss << "\n"; }
const sockaddr_storage* getSockAddr () { return &ss; }
};
Meaning it's not some misunderstanding of the rules, etc. It's obviously not a logic error either.
It has to be corruption, right?
This is a single threaded process, and if instead of fin I just keep stepping to see what's happening, there is literally nothing to see. One step to the function close, and the next one is at the assignment with the wrong value. Neither valgrind nor ASAN indicate any hijinx.
What can I look at to find out what is happening? Obviously something is going wrong here in between:
return &from;
And the actual return of a value. Is looking at assembly dumps for clues the only route left to me (presuming that would help at all, I'm no ASM guy)?
The answer I dread is there's nothing beyond scouring the code for mistakes that valgrind and ASAN didn't catch. Finding out under what circumstances they would not catch corruption is a starting place for that.
Which I did raise earlier in a now deleted question. All anyone could say was exactly what I would say if I read a question like that: We need an SSCCE, and the corruption could be in other parts of the code. Point being, there's nothing in the information I have to show which explains the problem, but, sans inviting everyone onto a 10-20K LOC project, that's all I can do. So what I am asking now is not what's wrong, but "How can I determine what's wrong?"

Is looking at assembly dumps for clues the only route left to me
Yes, using a disas command is the appropriate approach here.
(presuming that would help at all, I'm no ASM guy)?
Even if you can't write assembly, it's often pretty easy to read assembly. Especially if it's something like x86_64 and doesn't involve complicated bit twiddling or floating point. And it's a skill that will serve you well.
Usually a problem of this sort is the result of an ODR violation: somewhere in your program you have a different definition of socketeering::Socket, one in which from sits 24 bytes further into the object (it's not 18 bytes, it's 0x18 bytes!). In your gdb session, &from is this + 0x200 inside getSockAddr, but the caller receives this + 0x218.
Often such an ODR violation comes from using different #defines in different parts of the code, e.g.
class Socket {
#if defined(TRACING_ON)
char trace_buf[24];
#endif
sockaddr_storage from;
};
Compile the above class in one .cc file with -DTRACING_ON, compile another .cc without it, link them together into a single binary and BOOM: you may see exactly the bug you've described.
Sometimes, the problem comes from not recompiling all code (e.g. you may have an old object file or a shared library lying around).
It could also come from linking together code built by different compilers, though this is rare (usually if the compilers are not ABI-compatible, they use different name mangling to preclude the program from linking).
Note: if Socket inherits from some other class, the difference may be coming from the superclass and not the Socket itself.

Related

How to return the name of a variable stored at a particular memory address in C++

first time posting here after having so many of my Google results come up from this wonderful site.
Basically, I'd like to find the name of the variable stored at a particular memory address. I wrote a memory-editing application that edits a single value. The problem is that every time the application holding this value is patched, I have to hardcode the new memory address into my application and recompile, which takes so much time to keep up that it's almost not worthwhile.
What I'd like to do is grab the name of the variable stored at a certain memory address, that way I can then find its address at runtime and use that as the memory address to edit.
This is all being written in C++.
Thanks in advance!
Edit:
Well I've decided I'd like to stream the data from a .txt file, but I'm not sure how to convert the string into an LPVOID for use as the memory address in WriteProcessMemory(). This is what I've tried:
string fileContents;
ifstream memFile("mem_address.txt");
getline(memFile, fileContents);
memFile.close();
LPVOID memAddress = (LPVOID)fileContents.c_str();
//Lots of code..
WriteProcessMemory(WindowsProcessHandle, memAddress, &BytesToBeWrote, sizeof(BytesToBeWrote), &NumBytesWrote);
The code is all correct in terms of syntax, it compiles and runs, but the WriteProcessMemory errors and I can only imagine it has to do with my faulty LPVOID variable. I apologize if extending the use of my question is against the rules, I'll remove my edit if it is.
Compile and generate a so-called map file. This can be done easily with Visual C++ (/MAP linker option). There you'll see the symbols (functions, ...) with their starting addresses. Using this map file (caution: it has to be updated each time you recompile) you can match the addresses to names.
This is actually not so easy because the addresses are relative to the preferred load address, and probably will (randomization) be different from the actual load address.
Some old hints on retrieving the right address can be found here: http://home.hiwaay.net/~georgech/WhitePapers/MapFiles/MapFiles.htm
In general, the names of variables are not kept around when the program is compiled. If you are in control of the compilation process, you can usually configure the linker and compiler to produce a map file listing the locations in memory of all global variables. However, if this is the case, you can probably achieve your goals more easily by not using direct memory accesses, but rather creating a proper command protocol that your external program can call into.
If you do not have control of the compilation process of the other program, you're probably out of luck, unless the program shipped with a map file or debugging symbols, either of which can be used to derive the names of variables from their addresses.
Note that for stack variables, deriving their names will require full debugging symbols and is a very non-trivial process. Heap variables have no names, so you will have no luck there, naturally. Further, as mentioned in @jdehaan's answer, map files can be a bit tricky to work with in the best of times. All in all, it's best to have a proper control protocol you can use to avoid any dependence on the contents of the other program's memory at all.
Finally, if you have no control over the other program, then I would recommend putting the variable location into a separate datafile. This way you would no longer need to recompile each time, and could even support multiple versions of the program being poked at. You could also have some kind of auto-update service pulling new versions of this datafile from a server of yours if you like.
Unless you actually own the application in question, there is no standard way to do this. If you do own the application, you can follow @jdehaan's answer.
In any case, instead of hardcoding the memory address into your application, why not host a simple feed somewhere that you can update at any time with the memory address you need to change for each version of the target application? This way, instead of recompiling your app every time, you can just update that feed when you need to be able to manipulate a new version.
You cannot directly do this; variable names do not actually exist in the compiled binary. You might be able to do that if the program was written in, say, Java or C#, which do store information about variables in the compiled binary.
Further, this wouldn't in general be possible, because it's always possible that the most up to date copy of a value inside the target program is located inside of a CPU register rather than in memory. This is more likely if the program in question is compiled in release mode, with optimizations turned on.
If you can ensure the target program is compiled in debug mode you should be able to use the debugging symbols emitted by the compiler (the .pdb file) in order to map addresses to variables, but in that case you would need to launch the target process as if it were being debugged -- the plain ReadProcessMemory and WriteProcessMemory functions would not work.
Finally, your question ignores a very important consideration -- there need not be a variable corresponding to a particular address even if such information is stored.
If you have the source to the app in question and optimal memory usage is not a concern, then you can declare the interesting variables inside a debugging-friendly structure similar to:
typedef struct {
const char head_tag[15] = "VARIABLE_START";
char var_name[32];
int value;
const char tail_tag[13] = "VARIABLE_END";
} debuggable_int;
Now, your app should be able to search through the memory space for the program and look for the head and tail tags. Once it locates one of your debuggable variables, it can use the var_name and value members to identify and modify it.
If you are going to go to this length, however, you'd probably be better off building with debugging symbols enabled and using a regular debugger.
Billy O'Neal started to head in the right direction, but didn't (IMO) quite get to the real target. Assuming your target is Windows, a much simpler way would be to use the Windows Symbol handler functions, particularly SymFromName, which will let you supply the symbol's name, and it will return (among other things) the address for that symbol.
Of course, to do any of this you will have to run under an account that's allowed to do debugging. At least for global variables, however, you don't necessarily have to stop the target process to find symbols, addresses, etc. In fact, it works just fine for a process to use these on itself, if it so chooses (quite a few of my early experiments getting to know these functions did exactly that). Here's a bit of demo code I wrote years ago that gives at least a general idea (though it's old enough that it uses SymGetSymFromName, which is a couple of generations behind SymFromName). Compile it with debugging information and stand back -- it produces quite a lot of output.
#define UNICODE
#define _UNICODE
#define DBGHELP_TRANSLATE_TCHAR
#include <windows.h>
#include <imagehlp.h>
#include <iostream>
#include <ctype.h>
#include <iomanip>
#pragma comment(lib, "dbghelp.lib")
int y;
int junk() {
return 0;
}
struct XXX {
int a;
int b;
} xxx;
BOOL CALLBACK
sym_handler(wchar_t const *name, ULONG_PTR address, ULONG size, void *) {
if (name[0] != L'_')
std::wcout << std::setw(40) << name
<< std::setw(15) << std::hex << address
<< std::setw(10) << std::dec << size << L"\n";
return TRUE;
}
int
main() {
char const *names[] = { "y", "xxx"};
IMAGEHLP_SYMBOL info;
SymInitializeW(GetCurrentProcess(), NULL, TRUE);
SymSetOptions(SYMOPT_UNDNAME);
SymEnumerateSymbolsW(GetCurrentProcess(),
(ULONG64)GetModuleHandle(NULL),
sym_handler,
NULL);
info.SizeOfStruct = sizeof(IMAGEHLP_SYMBOL);
for (int i=0; i<sizeof(names)/sizeof(names[0]); i++) {
if ( !SymGetSymFromName(GetCurrentProcess(), names[i], &info)) {
std::wcerr << L"Couldn't find symbol '" << names[i] << L"'\n";
return 1;
}
std::wcout << names[i] << L" is at: " << std::hex << info.Address << L"\n";
}
SymCleanup(GetCurrentProcess());
return 0;
}
WinDbg has a particularly useful command: ln.
Given a memory location, it will give the name of the symbol at that location. With the right debug information, it is a boon to the debugger (I mean the person doing the debugging :)).
Here is a sample output on my system (XP SP3)
0:000> ln 7c90e514
(7c90e514)   ntdll!KiFastSystemCallRet   |  (7c90e520)   ntdll!KiIntSystemCall
Exact matches:
ntdll!KiFastSystemCallRet ()

Does this have anything to do with endian-ness?

For this code:
#include<stdio.h>
void hello() { printf("hello\n"); }
void bye() { printf("bye\n"); }
int main() {
printf("%p\n", hello);
printf("%p\n", bye);
return 0;
}
output on my machine:
0x80483f4
0x8048408
[second address is bigger in value]
on Codepad
0x8048541
0x8048511
[second address is smaller in value]
Does this have anything to do with endian-ness of the machines? If not,
Why the difference in the ordering of the addresses?
Also, Why the difference in the difference?
0x8048541 - 0x8048511 = 0x30
0x8048408 - 0x80483f4 = 0x14
Btw, I just checked. This code (taken from here) says that both the machines are Little-Endian
#include<stdio.h>
int main() {
int num = 1;
if(*(char *)&num == 1)
printf("Little-Endian\n");
else
printf("Big-Endian\n");
return 0;
}
No, this has nothing to do with endianness. It has everything to do with compilers and linkers being free to order function definitions in memory pretty much as they see fit, and different compilers choosing different memory layout strategies.
It has nothing to do with endianness, but with the C++ standard. C++ isn't required to write functions to disk in the order you see them (think about cross-file linking and even linking other libraries; that's just not feasible). It can write them in any order it wishes.
About the difference between the actual values, one compiler might add guards around a block to prevent memory overrides (or other related stuff, usually only in debug mode). And there's nothing preventing the compiler from writing other functions between your 2 functions. Keep in mind even a simple hello world application comes with thousands of bytes of executable code.
The bottom line is: never assume anything about how things are positioned in memory. Your assumptions will almost always be wrong. And why even assume? There's nothing to be gained over writing normal, safe, structured code anyway.
The location and ordering of functions is extremely specific to platform, architecture, compiler, compiler version and even compiler flags (especially those).
You are printing function addresses. That's purely in the domain of the linker; the compiler isn't involved in creating the binary image of the program, other than generating the blobs of machine code for each function. The linker arranges those blobs in the final image. Some linkers have command-line options that affect the order; otherwise it rarely matters.
Endian-ness cannot affect the output of printf() here. It knows how to interpret the bytes correctly if the pointer value was generated on the same machine.

Remove never-run call to templated function, get allocation error on run-time

I have a piece of templated code that is never run, but is compiled. When I remove it, another part of my program breaks.
First off, I'm a bit at a loss as to how to ask this question. So I'm going to try throwing lots of information at the problem.
Ok, so, I went to completely redesign my test project for my experimental core library thingy. I use a lot of template shenanigans in the library. When I removed the "user" code, the tests gave me a memory allocation error. After quite a bit of experimenting, I narrowed it down to this bit of code (out of a couple hundred lines):
void VOODOO(components::switchBoard &board) {
board.addComponent<using_allegro::keyInputs<'w'> >();
}
Fundamentally, what's weirding me out is that it appears that the act of compiling this function (and the template function it then uses, and the template functions those then use...), makes this bug not appear. This code is not being run. Similar code (the same, but for different key vals) occurs elsewhere, but is within Boost TDD code.
I realize I certainly haven't given enough information for you to solve it for me; I tried, but it more-or-less spirals into most of the code base. I think I'm most looking for "here's what the problem could be", "here's where to look", etc. There's something that's happening during compile because of this line, but I don't know enough about that step to begin looking.
Sooo, how can a (presumably) compiled, but never actually run, bit of templated code, when removed, cause another part of code to fail?
Error:
Unhandled exception at 0x6fe731ea (msvcr90d.dll) in Switchboard.exe:
0xC0000005: Access violation reading location 0xcdcdcdc1.
Callstack:
operator delete(void * pUserData)
allocator< class name related to key inputs callbacks >::deallocate
vector< same class >::_Insert_n(...)
vector< " " >::insert(...)
vector<" ">::push_back(...)
It looks like maybe the vector isn't valid, because _MyFirst and similar data members are showing values of 0xcdcdcdcd in the debugger. But the vector is a member variable...
Update: The vector isn't valid because it's never made. I'm getting a channel ID value stomp, which is making me treat one type of channel as another.
Update:
Searching through with the debugger again, it appears that my method for giving each "channel" its own unique ID isn't giving me a unique ID:
inline static const char channel<template args>::idFunction() {
return reinterpret_cast<char>(&channel<CHANNEL_IDENTIFY>::idFunction);
};
Update2: These two are giving the same:
slaveChannel<switchboard, ALLEGRO_BITMAP*, entityInfo<ALLEGRO_BITMAP*>
slaveChannel<key<c>, char, push<char>
Sooo, having another compiled channel type changing things makes sense, because it shifts around the values of the idFunctions? But why are there two idFunctions with the same value?
You seem to be returning the address of the function as a character? That looks weird. A char has far fewer bits than a pointer, so it's quite possible to get the same value for two different functions. That could be the reason why changing the code layout fixes or breaks your program.
As a general answer (though aaa's comment alludes to this): When something like this affects whether a bug occurs, it's either because (a) you're wrong and it is being run, or (b) the way that the inclusion of that code happens to affect your code, data, and memory layout in the compiled program causes a heisenbug to change from visible to hidden.
The latter generally occurs when something involves undefined behavior. Sometimes a bogus pointer value will cause you to stomp on a bit of your code (which might or might not be important depending on the code layout), or sometimes a bogus write will stomp on a value in your data stack that might or might not be a pointer that's used later, or so forth.
As a simple example, supposing you have a stack that looks like:
float data[10];
int never_used;
int *important_pointer;
And then you erroneously write
data[10] = 0;
Then, assuming that stack got allocated in linear order, you'll stomp on never_used, and the bug will be harmless. However, if you remove never_used (or change something so the compiler knows it can remove it for you -- maybe you remove a never-called function call that would use it), then it will stomp on important_pointer instead, and you'll now get a segfault when you dereference it.

Why is this a memory copying error - Insure++ false positive?

I've been trying to run Insure++ with some scientific code and it reports many errors, although to be fair it officially does not support K&R C and I don't know what having a lot of K&R functions has done to its evaluation process. The C and C++ code it is testing is being run in a DLL invoked from a WPF application.
One error report that puzzles me is the following. I'm confident the code is safe (it does work), but I am trying to work out why Insure++ thinks it is an error. I'd be interested if anyone has an insight into why this might be an error condition.
[MacImagePlot.c:984] **READ_OVERFLOW**
SetCursorQD(*GetCursorQD(watchCursor));
Reading overflows memory: GetCursorQD(watchCursor)
          bbbbb
          |    4    |    4    |
                    rrrrr
Reading (r) : 0x5639d164 thru 0x5639d167 (4 bytes)
From block (b) : 0x5639d160 thru 0x5639d163 (4 bytes)
gWatchCursor, declared at WPFMacGraphics.cpp, 418
for some very simple code.
typedef int Cursor;
typedef Cursor* CursPtr;
typedef CursPtr* CursHandle;
CursHandle GetCursorQD (short cursorID);
void SetCursorQD (const Cursor *crsr);
enum {
....
watchCursor = 4
};
// file globals
Cursor gWatchCursor=watchCursor;
CursPtr gWatchCursorPtr = &gWatchCursor;
CursHandle GetCursorQD (short cursorID)
{
if (cursorID==watchCursor) // this is actually the only case ever called
return &gWatchCursorPtr;
return 0;
}
I'm not familiar at all with the tools you're talking about, but have you verified that your GetCursorQD function is returning the pointer you expect and not NULL/0?
Perhaps something wonky happened with your enum definition for watchCursor (such as it being declared differently elsewhere, or it picking up a local variable instead of the enum).
I hate to say it but I suspect your problem is going to be the lack of some arcane function modifiers needed to ensure that data on the stack isn't getting munged when crossing the DLL boundary. I'd suggest writing a simple app that replicates the code but does it all in one module and see if Insure++ still detects an error. If it doesn't, get ready to wade through __declspec documentation.
I assume that the following line is the Problem:
if (cursorID==watchCursor)
cursorID is defined as short (usually 2 Bytes)
watchCursor is part of a enum and thus of type int (4 Bytes on a 32Bit OS)
This actually is not a problem. The compiler will convert one of the two operands correctly, as long as the enum value does not exceed the 2-byte range.
In my experience, all static (as well as runtime) code analysis tools report many false positives (I tried some of them). They do help, of course, but it takes quite a while to separate the false positives from the real bugs.
Like Soapbox, I am not familiar with Insure++.
But looking at the code, it is admittedly a bit confusing...so
That typedef chain makes CursHandle effectively a pointer to a pointer to int:
CursHandle is a pointer to CursPtr
CursPtr is a pointer to Cursor
Cursor is typedef'd to int
Yet in GetCursorQD you are taking a 'double address-of' an int. I say 'double address' because the function returns the address of gWatchCursorPtr (&gWatchCursorPtr), of type CursHandle, and gWatchCursorPtr is itself a global variable holding the address of gWatchCursor (&gWatchCursor), which is of type Cursor.
Your definition of the return type for the function does not match up with the global variable name's typeof despite the typedef's...that's what my thinking is...
Hope this helps,
Best regards,
Tom.

Does an arbitrary instruction pointer reside in a specific function?

I have a very difficult problem I'm trying to solve: Let's say I have an arbitrary instruction pointer. I need to find out if that instruction pointer resides in a specific function (let's call it "Foo").
One approach to this would be to try to find the start and ending bounds of the function and see if the IP resides in it. The starting bound is easy to find:
void *start = &Foo;
The problem is, I don't know how to get the ending address of the function (or how "long" the function is, in bytes of assembly).
Does anyone have any ideas how you would get the "length" of a function, or a completely different way of doing this?
Let's assume that there is no SEH or C++ exception handling in the function. Also note that I am on a win32 platform, and have full access to the win32 api.
This won't work. You're presuming functions are contiguous in memory and that one address will map to one function. The optimizer has a lot of leeway here and can move code from functions around the image.
If you have PDB files, you can use something like the dbghelp or DIA APIs to figure this out. For instance, SymFromAddr. There may be some ambiguity here as a single address can map to multiple functions.
I've seen code that tries to do this before with something like:
#pragma optimize("", off)
void Foo()
{
}
void FooEnd()
{
}
#pragma optimize("", on)
And then FooEnd-Foo was used to compute the length of function Foo. This approach is incredibly error prone and still makes a lot of assumptions about exactly how the code is generated.
Look at the *.map file which can optionally be generated by the linker when it links the program, or at the program's debug (*.pdb) file.
OK, I haven't done assembly in about 15 years. Back then, I didn't do very much. Also, it was 680x0 asm. BUT...
Don't you just need to put a label before and after the function, take their addresses, subtract them for the function length, and then just compare the IP? I've seen the former done. The latter seems obvious.
If you're doing this in C, look first for debugging support --- ChrisW is spot on with map files, but also see if your C compiler's standard library provides anything for this low-level stuff -- most compilers provide tools for analysing the stack etc., for instance, even though it's not standard. Otherwise, try just using inline assembly, or wrapping the C function with an assembly file and an empty wrapper function with those labels.
The simplest solution is maintaining a state variable:
volatile int FOO_is_running = 0;
int Foo( int par ){
FOO_is_running = 1;
/* do the work */
FOO_is_running = 0;
return 0;
}
Here's how I do it, but it's using gcc/gdb.
$ gdb ImageWithSymbols
gdb> info line * 0xYourEIPhere
Edit: Formatting is giving me fits. Time for another beer.