Value of pointer different from what it is assigned to - c++

I have two empty functions called TestFunc and TestFunc2, and I assigned their addresses to two variables.
void TestFunc()
{
}
__declspec(naked) void TestFunc2()
{
}
int main()
{
DWORD* test = (DWORD*)TestFunc;
DWORD* test2 = (DWORD*)TestFunc2;
printf("TestFunc is %p at test is %p\n", TestFunc, test);
printf("TestFunc2 is %p at test2 is %p\n", TestFunc2, test2);
getchar();
}
After the assignment, the values of the two variables differ from what they were assigned.
However, the output of the printf statements shows that they are the same. Any idea why this is happening?

This is caused by incremental linking in Visual Studio. Microsoft's documentation lists the following among its drawbacks:
An incrementally linked program is functionally equivalent to a program that is non-incrementally linked. However, because it is prepared for subsequent incremental links, an incrementally linked executable, static library, or dynamic-link library file:
Is larger than a non-incrementally linked program because of padding of code and data. Padding enables the linker to increase the size of functions and data without recreating the file.
May contain jump thunks to handle relocation of functions to new addresses.
Those jump thunks are what you have observed.
If you disable this option (VS2015):
Linker -> All Options -> Enable Incremental Linking -> No
then the addresses will be equal.
If you look in the disassembly at the address you assigned to the DWORD*, you will find a jump to your function:
TestFunc:
000000014001117C jmp TestFunc (01400116D0h)
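To make the thunk concrete, here is a minimal sketch that follows such a jump by hand. It assumes the thunk is encoded as an x86 near jump (opcode E9 followed by a 32-bit relative displacement); the helper name resolve_jmp_thunk is made up for illustration. The displacement is relative to the end of the 5-byte instruction, so for the listing above it would be 0x1400116D0 - (0x14001117C + 5) = 0x54F.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: if `code` starts with `jmp rel32` (opcode E9), compute
// the address the jump lands on. `address` is where the first byte of `code`
// lives in the process. Returns false if the bytes are not such a jump.
bool resolve_jmp_thunk(const unsigned char* code, std::size_t size,
                       std::uint64_t address, std::uint64_t& target) {
    if (size < 5 || code[0] != 0xE9) {
        return false;  // not the thunk shape assumed here
    }
    // The displacement is stored little-endian in the four bytes after the opcode.
    const std::uint32_t raw = static_cast<std::uint32_t>(code[1])
                            | (static_cast<std::uint32_t>(code[2]) << 8)
                            | (static_cast<std::uint32_t>(code[3]) << 16)
                            | (static_cast<std::uint32_t>(code[4]) << 24);
    // It is signed and measured from the end of the 5-byte instruction.
    const std::int64_t displacement = static_cast<std::int32_t>(raw);
    target = address + 5 + static_cast<std::uint64_t>(displacement);
    return true;
}
```

In a real program you would pass the bytes found at the value stored in test; the result should then match the address the debugger shows for the function body.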

It has to do with the platform-specific environment your code is compiled for. A C++ implementation is free to add indirection behind the scenes, so the value of a function pointer need not be the address of the function body. It is dangerous to write code that relies on this behaviour.
If you really want to know, look at the disassembly at that address. My guess is that an extra jump table is used, perhaps for the Edit and Continue debugging feature, which is improved or changed with each release of Visual Studio.

Related

How to return the name of a variable stored at a particular memory address in C++

first time posting here after having so many of my Google results come up from this wonderful site.
Basically, I'd like to find the name of the variable stored at a particular memory address. I have a memory editing application I wrote that edits a single value. The problem is that every time the application holding this value is patched, I have to hardcode the new memory address into my application and recompile, which takes so much time to keep up that it's almost not worthwhile.
What I'd like to do is grab the name of the variable stored at a certain memory address, that way I can then find its address at runtime and use that as the memory address to edit.
This is all being written in C++.
Thanks in advance!
Edit:
Well I've decided I'd like to stream the data from a .txt file, but I'm not sure how to convert the string into an LPVOID for use as the memory address in WriteProcessMemory(). This is what I've tried:
string fileContents;
ifstream memFile("mem_address.txt");
getline(memFile, fileContents);
memFile.close();
LPVOID memAddress = (LPVOID)fileContents.c_str();
//Lots of code..
WriteProcessMemory(WindowsProcessHandle, memAddress, &BytesToBeWrote, sizeof(BytesToBeWrote), &NumBytesWrote);
The code is all correct in terms of syntax; it compiles and runs, but the WriteProcessMemory call fails, and I can only imagine it has to do with my faulty LPVOID variable. I apologize if extending the use of my question is against the rules; I'll remove my edit if it is.
Compile and generate a so-called map file. This is easy with Visual C++ (the /MAP linker option). There you'll see the symbols (functions, ...) with their starting addresses. Using this map file (caution: it has to be regenerated each time you recompile) you can match addresses to names.
This is actually not so easy, because the addresses are relative to the preferred load address, which (because of address randomization) will probably differ from the actual load address.
Some old hints on retrieving the right address can be found here: http://home.hiwaay.net/~georgech/WhitePapers/MapFiles/MapFiles.htm
In general, the names of variables are not kept around when the program is compiled. If you are in control of the compilation process, you can usually configure the linker and compiler to produce a map file listing the locations in memory of all global variables. However, if this is the case, you can probably achieve your goals more easily by not using direct memory accesses, but rather creating a proper command protocol that your external program can call into.
If you do not have control of the compilation process of the other program, you're probably out of luck, unless the program shipped with a map file or debugging symbols, either of which can be used to derive the names of variables from their addresses.
Note that for stack variables, deriving their names will require full debugging symbols and is a very non-trivial process. Heap variables have no names, so you will have no luck there, naturally. Further, as mentioned in @jdehaan's answer, map files can be a bit tricky to work with at the best of times. All in all, it's best to have a proper control protocol you can use to avoid any dependence on the contents of the other program's memory at all.
Finally, if you have no control over the other program, then I would recommend putting the variable location into a separate datafile. This way you would no longer need to recompile each time, and could even support multiple versions of the program being poked at. You could also have some kind of auto-update service pulling new versions of this datafile from a server of yours if you like.
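As for the edit in the question: the cast (LPVOID)fileContents.c_str() yields the address of the string's own character buffer, not the number written in the file. The text has to be parsed as a number first. A minimal sketch, assuming the data file holds one hexadecimal address per line (with or without a 0x prefix); the helper name parse_address is made up for illustration:

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <string>

// Hypothetical helper: turn text such as "0x00A3F2C4" into a pointer-sized
// integer. Returns false if the text is empty, has trailing characters, or
// does not fit in a pointer.
bool parse_address(const std::string& text, std::uintptr_t& out) {
    try {
        std::size_t used = 0;
        const unsigned long long value = std::stoull(text, &used, 16);
        if (used != text.size()) {
            return false;  // something other than hex digits followed the number
        }
        if (value > UINTPTR_MAX) {
            return false;  // too large for this process's pointer size
        }
        out = static_cast<std::uintptr_t>(value);
        return true;
    } catch (const std::exception&) {
        return false;  // std::stoull throws on empty or non-numeric input
    }
}
```

The parsed value would then be converted with reinterpret_cast<LPVOID>(address) before being handed to WriteProcessMemory. Whether that address is still valid for the current build of the target is a separate problem, as the answers here point out.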
Unless you actually own the application in question, there is no standard way to do this. If you do own the application, you can follow @jdehaan's answer.
In any case, instead of hardcoding the memory address into your application, why not host a simple feed somewhere that you can update at any time with the memory address you need to change for each version of the target application? This way, instead of recompiling your app every time, you can just update that feed when you need to be able to manipulate a new version.
You cannot directly do this; variable names do not actually exist in the compiled binary. You might be able to do that if the program was written in, say, Java or C#, which do store information about variables in the compiled binary.
Further, this wouldn't in general be possible, because it's always possible that the most up to date copy of a value inside the target program is located inside of a CPU register rather than in memory. This is more likely if the program in question is compiled in release mode, with optimizations turned on.
If you can ensure the target program is compiled in debug mode, you should be able to use the debugging symbols emitted by the compiler (the .pdb file) to map addresses to variables, but in that case you would need to launch the target process as if it were being debugged -- the plain ReadProcessMemory and WriteProcessMemory functions would not work.
Finally, your question ignores a very important consideration -- there need not be a variable corresponding to a particular address even if such information is stored.
If you have the source to the app in question and optimal memory usage is not a concern, then you can declare the interesting variables inside a debugging-friendly structure similar to:
typedef struct {
const char head_tag[15] = "VARIABLE_START";
char var_name[32];
int value;
const char tail_tag[13] = "VARIABLE_END";
} debuggable_int;
Now, your app should be able to search through the memory space for the program and look for the head and tail tags. Once it locates one of your debuggable variables, it can use the var_name and value members to identify and modify it.
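A rough sketch of that search, run over an ordinary byte buffer rather than another process's memory. The struct below mirrors the layout above but drops the default initializers so it can be copied with memcpy; find_debuggable is a made-up name, and the buffer is assumed to have been filled already (for example by ReadProcessMemory).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Same layout as the tagged structure above, as plain data.
struct debuggable_int {
    char head_tag[15];
    char var_name[32];
    int value;
    char tail_tag[13];
};

// Hypothetical helper: return the offset of the first record in
// [data, data + size) whose tags are intact and whose name matches,
// or -1 if there is none.
std::ptrdiff_t find_debuggable(const unsigned char* data, std::size_t size,
                               const char* wanted_name) {
    static const char head[15] = "VARIABLE_START";
    static const char tail[13] = "VARIABLE_END";
    const unsigned char* const end = data + size;
    const unsigned char* pos = data;
    for (;;) {
        // Look for the head tag, including its terminating NUL.
        pos = std::search(pos, end, head, head + sizeof head);
        if (pos == end) {
            return -1;
        }
        if (static_cast<std::size_t>(end - pos) >= sizeof(debuggable_int)) {
            debuggable_int candidate;
            std::memcpy(&candidate, pos, sizeof candidate);  // avoids alignment issues
            if (std::memcmp(candidate.tail_tag, tail, sizeof tail) == 0 &&
                std::strncmp(candidate.var_name, wanted_name,
                             sizeof candidate.var_name) == 0) {
                return pos - data;
            }
        }
        ++pos;  // false hit, keep scanning
    }
}
```

Checking the tail tag as well as the head tag guards against a stray copy of the head string elsewhere in memory. The offset of the value member within the record is offsetof(debuggable_int, value).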
If you are going to go to this length, however, you'd probably be better off building with debugging symbols enabled and using a regular debugger.
Billy O'Neal started to head in the right direction, but didn't (IMO) quite get to the real target. Assuming your target is Windows, a much simpler way would be to use the Windows Symbol handler functions, particularly SymFromName, which will let you supply the symbol's name, and it will return (among other things) the address for that symbol.
Of course, to do any of this you will have to run under an account that's allowed to do debugging. At least for global variables, however, you don't necessarily have to stop the target process to find symbols, addresses, etc. In fact, it works just fine for a process to use these on itself, if it so chooses (quite a few of my early experiments getting to know these functions did exactly that). Here's a bit of demo code I wrote years ago that gives at least a general idea (though it's old enough that it uses SymGetSymFromName, which is a couple of generations behind SymFromName). Compile it with debugging information and stand back -- it produces quite a lot of output.
#define UNICODE
#define _UNICODE
#define DBGHELP_TRANSLATE_TCHAR
#include <windows.h>
#include <imagehlp.h>
#include <iostream>
#include <ctype.h>
#include <iomanip>
#pragma comment(lib, "dbghelp.lib")
int y;
int junk() {
return 0;
}
struct XXX {
int a;
int b;
} xxx;
BOOL CALLBACK
sym_handler(wchar_t const *name, ULONG_PTR address, ULONG size, void *) {
if (name[0] != L'_')
std::wcout << std::setw(40) << name
<< std::setw(15) << std::hex << address
<< std::setw(10) << std::dec << size << L"\n";
return TRUE;
}
int
main() {
char const *names[] = { "y", "xxx"};
IMAGEHLP_SYMBOL info;
SymInitializeW(GetCurrentProcess(), NULL, TRUE);
SymSetOptions(SYMOPT_UNDNAME);
SymEnumerateSymbolsW(GetCurrentProcess(),
(ULONG64)GetModuleHandle(NULL),
sym_handler,
NULL);
info.SizeOfStruct = sizeof(IMAGEHLP_SYMBOL);
for (size_t i=0; i<sizeof(names)/sizeof(names[0]); i++) {
if ( !SymGetSymFromName(GetCurrentProcess(), names[i], &info)) {
std::wcerr << L"Couldn't find symbol '" << names[i] << L"'\n";
return 1;
}
std::wcout << names[i] << L" is at: " << std::hex << info.Address << L"\n";
}
SymCleanup(GetCurrentProcess());
return 0;
}
WinDbg has a particularly useful command: ln (list nearest symbols).
Given a memory location, it will give the name of the symbol at that location. With the right debug information, it is a boon for the debugger (I mean the person doing the debugging :)).
Here is a sample output on my system (XP SP3)
0:000> ln 7c90e514 (7c90e514)
ntdll!KiFastSystemCallRet |
(7c90e520) ntdll!KiIntSystemCall
Exact matches:
ntdll!KiFastSystemCallRet ()

Does this have anything to do with endian-ness?

For this code:
#include<stdio.h>
void hello() { printf("hello\n"); }
void bye() { printf("bye\n"); }
int main() {
printf("%p\n", hello);
printf("%p\n", bye);
return 0;
}
output on my machine:
0x80483f4
0x8048408
[second address is bigger in value]
on Codepad
0x8048541
0x8048511
[second address is smaller in value]
Does this have anything to do with endian-ness of the machines? If not,
Why the difference in the ordering of the addresses?
Also, Why the difference in the difference?
0x8048541 - 0x8048511 = 0x30
0x8048408 - 0x80483f4 = 0x14
Btw, I just checked. This code (taken from here) says that both the machines are Little-Endian
#include<stdio.h>
int main() {
int num = 1;
if(*(char *)&num == 1)
printf("Little-Endian\n");
else
printf("Big-Endian\n");
return 0;
}
No, this has nothing to do with endianness. It has everything to do with compilers and linkers being free to order function definitions in memory pretty much as they see fit, and different compilers choosing different memory layout strategies.
It has nothing to do with endianness, but with the C++ standard. C++ isn't required to write functions to disk in the order you see them (and think about cross-file linking and even linking other libraries; that's just not feasible). It can write them in any order it wishes.
About the difference between the actual values: one compiler might add guards around a block to prevent memory overwrites (or other related stuff, usually only in debug mode). And there's nothing preventing the compiler from writing other functions between your two functions. Keep in mind that even a simple hello world application comes with thousands of bytes of executable code.
The bottom line is: never assume anything about how things are positioned in memory. Your assumptions will almost always be wrong. And why even assume? There's nothing to be gained over writing normal, safe, structured code anyway.
The location and ordering of functions is extremely specific to platform, architecture, compiler, compiler version and even compiler flags (especially those).
You are printing function addresses. That is purely the domain of the linker; the compiler is not involved in laying out the binary image of the program, other than generating the blobs of machine code for each function. The linker arranges those blobs in the final image. Some linkers have command-line options that affect the order, but it otherwise rarely matters.
Endian-ness cannot affect the output of printf() here. It knows how to interpret the bytes correctly if the pointer value was generated on the same machine.
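A small sketch of that last point, using one of the values from the question as a plain 32-bit integer. The function names are made up for illustration. Byte order only shows up when you look at the individual bytes in memory; formatting the value as a number gives the same text on any machine.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Format the value as a number, the way printf would print it.
std::string printed_value(std::uint32_t v) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "0x%x", static_cast<unsigned>(v));
    return buf;
}

// Show the same value as the sequence of bytes actually stored in memory.
std::string bytes_in_memory(std::uint32_t v) {
    unsigned char raw[sizeof v];
    std::memcpy(raw, &v, sizeof v);
    char buf[16];
    std::snprintf(buf, sizeof buf, "%02x %02x %02x %02x",
                  static_cast<unsigned>(raw[0]), static_cast<unsigned>(raw[1]),
                  static_cast<unsigned>(raw[2]), static_cast<unsigned>(raw[3]));
    return buf;
}
```

For 0x8048408, printed_value gives "0x8048408" everywhere, while bytes_in_memory gives "08 84 04 08" on a little-endian machine and "08 04 84 08" on a big-endian one.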

Calling a non-exported function in a DLL

I have a program which loads DLLs and I need to call one of the non-exported functions it contains. Is there any way I can do this, via searching in a debugger or otherwise? Before anyone asks, yes I have the prototypes and stuff for the functions.
Yes there is, at least sort of, but it isn't a good idea.
In C/C++, all a function pointer is, is an address in memory. So if you were somehow able to find the address of this function, you could call it.
Let me ask some questions though, how do you know this DLL contains this function? Do you have the source code? Otherwise I don't know how you could know for certain that this function exists or if it is safe to call. But if you have the source code, then just expose the function. If the DLL writer didn't expose this function, they never expect you to call it and can change/remove the implementation at any time.
Warnings aside, if you have debug symbols or a MAP file, you can use them to find the function's offset in the DLL. If you don't have anything but the DLL, then there is no way to know where that function exists in the DLL - it is not stored in the DLL itself.
Once you have the offset you can then insert that into the code like so:
const DWORD_PTR funcOffset = 0xDEADBEEF; // placeholder: offset taken from the MAP file or debug symbols
typedef void (*UnExportedFunc)();
....
void CallUnExportedFunc() {
    // This will get the DLL base address (which can vary)
    HMODULE hMod = GetModuleHandleA("My.dll");
    if (hMod == NULL)
        return; // the DLL is not loaded
    // Calculate the actual address
    DWORD_PTR funcAddress = (DWORD_PTR)hMod + funcOffset;
    // Cast the address to a function pointer
    UnExportedFunc func = (UnExportedFunc)funcAddress;
    // Call the function
    func();
}
Also realize that the offset of this function WILL CHANGE EVERY TIME the DLL is rebuilt so this is very fragile and let me say again, not a good idea.
I realize this question is rather old, but shf301 has the right idea here. The only thing I would add is to implement a pattern search on the target library. If you have IDA or OllyDbg, you can search for the function and view the binary/hex data which surrounds that function's starting address.
In most cases, there will be some sort of binary signature which rarely changes. The signature may hold wildcards which may change between builds, but ultimately there should be at least one successful hit while searching for this pattern, unless extremely drastic changes have occurred between builds (at which point, you could just figure out the new signature for that particular version).
The way that you would implement a binary pattern search is like so:
// Needs <windows.h>, <psapi.h> and <string.h>
bool bCompare(const BYTE* pData, const BYTE* bMask, const char* szMask)
{
    // Compare byte by byte; a '?' in the mask matches anything
    for (; *szMask; ++szMask, ++pData, ++bMask)
        if (*szMask == 'x' && *pData != *bMask)
            return false;
    return true;
}
// Returns the absolute address of the first match, or 0 if there is none
DWORD_PTR FindPattern(DWORD_PTR dwAddress, DWORD dwLen, const BYTE* bMask, const char* szMask)
{
    const size_t patternLen = strlen(szMask);
    if (patternLen == 0 || patternLen > dwLen)
        return 0;
    // Stop early enough that the comparison never reads past the end of the range
    for (DWORD_PTR i = 0; i + patternLen <= dwLen; i++)
        if (bCompare((const BYTE*)(dwAddress + i), bMask, szMask))
            return dwAddress + i;
    return 0;
}
Example usage:
typedef void (*UnExportedFunc)();
//...
void CallUnExportedFunc()
{
    // This will get the DLL base address (which can vary)
    HMODULE hMod = GetModuleHandleA("My.dll");
    if (hMod == NULL)
        return; // the DLL is not loaded
    // Get module info
    MODULEINFO modinfo = { 0 };
    if (!GetModuleInformation(GetCurrentProcess(), hMod, &modinfo, sizeof(modinfo)))
        return;
    // This will search the module for the address of a given signature
    DWORD_PTR funcAddress = FindPattern(
        (DWORD_PTR)hMod, modinfo.SizeOfImage,
        (const BYTE*)"\xC7\x06\x00\x00\x00\x00\x89\x86\x00\x00\x00\x00\x89\x86",
        "xx????xx????xx"
    );
    if (funcAddress == 0)
        return; // signature not found
    // FindPattern already returns an absolute address, so the base is not added again
    // Cast the address to a function pointer
    UnExportedFunc func = (UnExportedFunc)funcAddress;
    // Call the function
    func();
}
The way that this works is by passing in the base address of the loaded library via GetModuleHandle, specifying the length (in bytes) to search, the binary data to search for, and a mask which specifies which bytes of the binary string are valid ('x') and which are to be overlooked ('?'). The function will then walk through the memory space of the loaded module, searching for a match. In some cases there may be more than one match; in that case, it's wise to make your signature more specific so that there is only one match.
Again, you would need to do the initial binary search in a disassembly application in order to know what this signature is, but once you have that then this method should work a little better than manually finding the function offset every time the target is built. Hope this helps.
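If you want to try the matching logic without a loaded DLL, the same idea can be exercised on an ordinary buffer. This is a sketch with made-up names (find_pattern and its parameters), not the code above: it returns an offset into the buffer rather than an address, and it stops before the pattern would run off the end.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: find the first position in [data, data + size) where
// `pattern` matches under `mask`. In the mask, 'x' means the byte must match
// and '?' means any byte is accepted. Returns the offset, or -1 if not found.
std::ptrdiff_t find_pattern(const unsigned char* data, std::size_t size,
                            const unsigned char* pattern, const char* mask) {
    const std::size_t length = std::strlen(mask);
    if (length == 0 || length > size) {
        return -1;
    }
    for (std::size_t i = 0; i + length <= size; ++i) {
        std::size_t j = 0;
        while (j < length && (mask[j] == '?' || data[i + j] == pattern[j])) {
            ++j;
        }
        if (j == length) {
            return static_cast<std::ptrdiff_t>(i);  // every position matched
        }
    }
    return -1;
}
```

Returning an offset keeps the base address arithmetic in one place: the caller adds the module base exactly once.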
If the function you want isn't exported, then it won't be in the export address table. Assuming Visual Studio was used to produce this DLL and you have its associated PDB (program database) file, then you can use Microsoft's DIA (debug interface access) APIs to locate the desired function either by name or, approximately, by signature.
Once you have the function (symbol) from the PDB, you will also have its RVA (relative virtual address). You can add the RVA to the loaded module's base address to determine the absolute virtual address in memory where the function is stored. Then, you can make a function call through that address.
Alternatively, if this is just a one-off thing that you need to do (i.e. you don't need a programmatic solution), you can use windbg.exe in the Debugging Tools for Windows toolkit to attach to your process and discover the address of the function you care about. In WinDbg, you can use the x command to "examine symbols" in a module.
For example, you can do x mymodule!*foo* to see all functions whose name contains "foo". As long as you have symbols (PDB) loaded for your module, this will show you the non-export functions as well. Use .hh x to get help on the x command.
Even if you can find the function address, it's not in general safe to call a function created by a compiler that thought it was making a "private" internal-use-only function.
Modern compilers with link-time-optimization enabled may make a specialized version of a function that only does what the specific callers need it to do.
Don't assume that a block of machine code that looks like the function you want actually follows the standard ABI and implements everything the source code says.
In gcc's case, it does use special names for specialized versions of a function that aren't inlined but take advantage of a special case (like constant propagation) from multiple callers.
e.g. in this objdump -drwC output (where -C is demangle):
42944c: e8 cf 13 0e 00 call 50a820
429451: 48 8b 7b 48 mov rdi,QWORD PTR [rbx+0x48]
429455: 48 89 ee mov rsi,rbp
429458: e8 b3 10 0e 00 call 50a510
gcc emits code that calls two different clones of the same function, specialized for two different compile-time-constants. (This is from http://endless-sky.github.io/, which desperately needs LTO because even trivial accessor functions for its XY position class are in Point.cpp, not Point.h, so they can only be inlined by LTO.)
LTO can even make .lto_priv static versions of data: like
mov rcx,QWORD PTR [rip+0x412ff7] # 83dbe0 <_ZN12_GLOBAL__N_116playerGovernmentE.lto_priv.898>
So even if you find a function that looks like what you want, calling it from a new place might violate the assumptions that Link-Time-Optimization took advantage of.
I'm afraid there is no "safe" way to do so if the referenced library does not explicitly export its object (class/func), because you will have no idea where the required object is mapped in code memory.
However, by using RE tools, you can find the offset of the object you are interested in within the library, then add it to the address of any known exported object to obtain the "real" memory location. After that, prepare a function prototype etc. and cast the address to it for use.
The most general way to do this (and it's still a bad idea, as everyone else pointed out already) is to scan the DLL code at runtime after it's loaded, look for a known, unique section of code in that function, and then use code similar to that in shf301's answer to call it. If you know that the DLL won't ever change, then any solution based on determining the offset in the DLL should work.
To find that unique section of code, disassemble the DLL using a disassembler that can show you the machine code in addition to the assembly language mnemonics (I can't think of anything that won't do that) and watch out for call and jmp instructions.
I actually had to do something similar once to apply a binary patch to a DOS exe; it was a bug fix, and the code wasn't under revision control so that was the only way to fix it.
I'd be really curious to know why you need this, by the way.

Does an arbitrary instruction pointer reside in a specific function?

I have a very difficult problem I'm trying to solve: Let's say I have an arbitrary instruction pointer. I need to find out if that instruction pointer resides in a specific function (let's call it "Foo").
One approach to this would be to try to find the start and ending bounds of the function and see if the IP resides in it. The starting bound is easy to find:
void *start = &Foo;
The problem is, I don't know how to get the ending address of the function (or how "long" the function is, in bytes of assembly).
Does anyone have any ideas how you would get the "length" of a function, or a completely different way of doing this?
Let's assume that there is no SEH or C++ exception handling in the function. Also note that I am on a win32 platform, and have full access to the win32 api.
This won't work. You're presuming functions are contiguous in memory and that one address will map to one function. The optimizer has a lot of leeway here and can move code from functions around the image.
If you have PDB files, you can use something like the dbghelp or DIA APIs to figure this out. For instance, SymFromAddr. There may be some ambiguity here as a single address can map to multiple functions.
I've seen code that tries to do this before with something like:
#pragma optimize("", off)
void Foo()
{
}
void FooEnd()
{
}
#pragma optimize("", on)
And then FooEnd-Foo was used to compute the length of function Foo. This approach is incredibly error prone and still makes a lot of assumptions about exactly how the code is generated.
Look at the *.map file which can optionally be generated by the linker when it links the program, or at the program's debug (*.pdb) file.
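A sketch of how map-file data could be used for the lookup, assuming you have already parsed the file into a list of (start address, name) pairs sorted by address and adjusted for the actual load address. The type and function names are made up for illustration. It treats each function as running up to the next symbol, which is exactly the contiguity assumption the first answer warns about.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// One entry parsed from the map file (hypothetical representation).
struct MapSymbol {
    std::uint64_t start;  // address where the symbol begins
    std::string name;
};

// Hypothetical helper: return the name of the symbol with the greatest start
// address that is <= `address`, or an empty string if `address` lies before
// the first symbol. `symbols` must be sorted by `start`.
std::string symbol_containing(const std::vector<MapSymbol>& symbols,
                              std::uint64_t address) {
    // Find the first symbol that starts after the address...
    auto it = std::upper_bound(
        symbols.begin(), symbols.end(), address,
        [](std::uint64_t addr, const MapSymbol& s) { return addr < s.start; });
    if (it == symbols.begin()) {
        return "";
    }
    // ...the one just before it is the nearest preceding symbol.
    return (it - 1)->name;
}
```

To answer the original question you would compare the result with "Foo". Note that the last symbol in the table has no upper bound here, so any address past it is attributed to it.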
OK, I haven't done assembly in about 15 years. Back then, I didn't do very much. Also, it was 680x0 asm. BUT...
Don't you just need to put a label before and after the function, take their addresses, subtract them for the function length, and then just compare the IP? I've seen the former done. The latter seems obvious.
If you're doing this in C, look first for debugging support --- ChrisW is spot on with map files, but also see if your C compiler's standard library provides anything for this low-level stuff -- most compilers provide tools for analysing the stack etc., for instance, even though it's not standard. Otherwise, try just using inline assembly, or wrapping the C function with an assembly file and an empty wrapper function with those labels.
The most simple solution is maintaining a state variable:
volatile int FOO_is_running = 0;
int Foo( int par ){
FOO_is_running = 1;
/* do the work */
FOO_is_running = 0;
return 0;
}
Here's how I do it, but it's using gcc/gdb.
$ gdb ImageWithSymbols
gdb> info line * 0xYourEIPhere
Edit: Formatting is giving me fits. Time for another beer.

Incorrect function addresses in Visual Studio MAP-file

The function addresses (Rva+Base) in my MAP file from Visual Studio don't match the ones I see in the debugger (or when I manually inspect my stack frame).
What could be causing this?
/A.B.
Is the problem in an executable or a DLL?
If it's a DLL what is its preferred load address? If this clashes with any other DLL then it will be rebased by the loader, and this can lead to what you're seeing.
As part of your build process, you should ensure that all your DLLs are rebased (there's a tool to do this) so that their address spaces don't clash (this frees up some page file space as well as improving load time).
Both an exe and a DLL can be relocated, unless you specify the /FIXED option when linking. I use the following approach to determine where my exe was actually loaded, so that I can compute the offset against the map file.
static void KnownFunctionAddress(){}
...
// check an address of a known function
// and compare this to the value read from the map file
intptr_t CheckRelocationOffset(MapFile map)
{
intptr_t mapAddress = map.PhysicalAddress("?KnownFunctionAddress@@YAXXZ");
intptr_t realAddress = (intptr_t)KnownFunctionAddress;
return realAddress-mapAddress;
}
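The arithmetic behind this is small enough to write out. A sketch with made-up names and example numbers: a map-file address (Rva+Base) is only valid if the module loads at its preferred base, so the address seen at run time is the same RVA applied to the actual base.

```cpp
#include <cstdint>

// Hypothetical helper: translate an address from the map file (Rva+Base,
// computed for the preferred load address) into the address it has in a
// process where the module was loaded at `actual_base`.
std::uint64_t runtime_address(std::uint64_t map_address,
                              std::uint64_t preferred_base,
                              std::uint64_t actual_base) {
    const std::uint64_t rva = map_address - preferred_base;  // offset inside the module
    return actual_base + rva;
}
```

For example, a function listed at 0x00401230 with a preferred base of 0x00400000 would be found at 0x00A71230 if the module was actually loaded at 0x00A70000.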
When you are in the debugger and stepping into the code, can you check if the code address is within the range that you see in the "Modules" window? Sometimes the same piece of code may exist in several modules of same / different names.
Once you identify the "Module" which contains the code, use the base address from the Modules window to arrive at (by subtracting) the DLL entry point address.
Finally, there is also the effect of entry jump tables (trampoline), which is a kind of function call indirection that can be added at compile time or at runtime. Thus, the "entry point" address may be a smoke screen and doesn't match the address for the function body.
(My understanding of DLL structure is limited, so there may be inaccuracies in my answer.)
I'm not able to reply directly to Suma's reply for some reason, but you can also just do the following:
extern "C" struct IMAGE_DOS_HEADER __ImageBase; // On platforms other than Win32/Win64, this MAY be a different header type...
...
printf_s("base: %p", &__ImageBase);
__ImageBase is defined by the linker (VC++ at least) and taking its address will give you the base address of the module (EXE/DLL), even if it is relocated at runtime.
There's also
printf_s("calling module's base: %p\n", GetModuleHandle(NULL));
which can give you the same base address value...but there are more caveats to GetModuleHandle (plus it requires windows.h) so I recommend just sticking to __ImageBase.
As others have mentioned, your problem is probably related to Windows relocating your module. If there is no .reloc section in the module file, then the file isn't relocatable, in which case it is likely you're running into trampolines or the like, as rwong suggested.