I am profiling an application using VerySleepy 0.7. The application is written in C++ with Qt 4.6.x, compiled with VS 2005 and is running on Windows 7 Ultimate x64.
The highest usage by far is a call to RtlPcToFileHeader:
Exclusive  Inclusive  %Exclusive  %Inclusive  Module
33.67s     33.67s     15.13%      15.13%      ntdll
It is not clear to me from the documentation what RtlPcToFileHeader is. Because it is listed under "Error Handling Functions", it seems like something that should not be there. That being said, it was used throughout my profiling capture, so it could also be some very basic function call (e.g. something like main) or a side effect of the profiling itself.
What is the purpose of the RtlPcToFileHeader function?
Update: Based on Mike's suggestion, I broke into the running process. The couple of times RtlPcToFileHeader appeared in the stack trace, it seemed somehow tied to a dynamic_cast. I have also changed the question to better reflect that I am trying to determine what RtlPcToFileHeader actually does.
The following MSDN post implies that Microsoft's x64 implementation of the RTTI routines, invoked during dynamic_cast, is slower than the x86 one.
http://blogs.msdn.com/b/junfeng/archive/2006/10/17/dynamic-cast-is-slow-in-x64.aspx
On 64-bit systems these RTTI pointers seem to be only offsets from the base address of the module. To dereference them, you need the module base address, which is retrieved by the API function ::RtlPcToFileHeader().
If this is correct, it seems that you can't do anything about it except refactor your code to minimize the use of dynamic_cast and rely more on virtual methods. Or could it just be an imperfection of the profiler, which gets lost in dynamic_casts?
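Here is a minimal sketch of the refactoring mentioned above. The Shape, Circle and Square classes are hypothetical. A type test done with dynamic_cast can often be replaced by a virtual query, which costs one indirect call and needs no RTTI lookup at all:

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical class hierarchy, used only for illustration.
struct Shape {
    virtual ~Shape() = default;
    // Virtual query that replaces "dynamic_cast<const Circle*>(p) != nullptr".
    virtual bool isRound() const { return false; }
};

struct Circle : Shape {
    bool isRound() const override { return true; }
};

struct Square : Shape {};

// Before: one dynamic_cast (an RTTI walk) per element.
int countRoundWithCast(const std::vector<std::unique_ptr<Shape>>& shapes) {
    int n = 0;
    for (const auto& s : shapes)
        if (dynamic_cast<const Circle*>(s.get()) != nullptr) ++n;
    return n;
}

// After: one virtual call per element, no RTTI involved.
int countRoundVirtual(const std::vector<std::unique_ptr<Shape>>& shapes) {
    int n = 0;
    for (const auto& s : shapes)
        if (s->isRound()) ++n;
    return n;
}
```

This only helps where the cast is really a type test. Casts used to reach subclass-specific interfaces usually need a virtual method on the base class for that operation instead.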
RtlPcToFileHeader is a function that takes an arbitrary address (PcValue) and looks up the base address of the module, mapped into the current process's address space, that contains it. It is very similar to calling:
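HMODULE hModule = NULL;
// pArbitraryAddr is any address inside the module of interest; with
// GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS it is passed in place of a module name.
GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS |
                  GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
                  reinterpret_cast<LPCTSTR>(pArbitraryAddr),
                  &hModule);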
To do the search, it walks the address ranges of all modules mapped into the current process and finds them through the process's PEB/TEB structures in memory. Before doing that, it acquires a shared reader/writer lock (similar to calling AcquireSRWLockShared), and it releases the lock when the search is done.
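The sketch below is a portable model of the lookup just described, not ntdll's actual code. It takes a reader/writer lock in shared mode and then walks the module address ranges linearly. ModuleTable and ModuleRange are hypothetical names. If each x64 dynamic_cast does perform such a lookup, as the MSDN post above suggests, then every cast pays for the lock plus the walk:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Hypothetical stand-in for the loader's module list (the real one is
// reached through the PEB and is not exposed as a C++ type).
struct ModuleRange {
    std::uintptr_t base;  // address the module is mapped at
    std::size_t size;     // size of the mapped image in bytes
};

class ModuleTable {
public:
    void add(std::uintptr_t base, std::size_t size) {
        assert(size > 0);
        std::unique_lock<std::shared_mutex> lock(mutex_);  // writers: exclusive
        modules_.push_back({base, size});
    }

    // Returns the base of the module containing pc, or 0 if none does.
    // Readers take the lock in shared mode, like the SRW lock described above.
    std::uintptr_t baseFromPc(std::uintptr_t pc) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        for (const ModuleRange& m : modules_)  // linear walk over every module
            if (pc >= m.base && pc - m.base < m.size)
                return m.base;
        return 0;
    }

private:
    mutable std::shared_mutex mutex_;
    std::vector<ModuleRange> modules_;
};
```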
Just pause it under the debugger. Study the call stack to understand what it was doing and why. Then repeat several times. That will tell you where the time is going and the reasons why.
If that's too low-tech for you, try out LTProf, or any other wall-time stack sampler that reports line-level percent, preferably with a butterfly viewer.
The kind of numbers you are puzzling over is precisely the legacy of gprof.
I was thinking about some RPC code that I have to implement in C++, and I wondered whether it's safe (and under which assumptions) to send a function pointer over the network to the same binary code, assuming the binary is exactly the same and both ends run on the same architecture. I guess virtual memory should make the difference here.
I'm asking just out of curiosity, since it's a bad design in any case. Still, I would like to know if it's theoretically possible, and whether it extends to other kinds of pointers to static data (besides functions) that the program may include.
In general, it's not safe for many reasons, but there are limited cases in which it will work. First of all, I'm going to assume you're using some sort of signing or encryption in the protocol that ensures the integrity of your data stream; if not, you have serious security issues already that are only compounded by passing around function pointers.
If the exact same program binary is running on both ends of the connection, if the function is in the main program (or in code linked from a static library) and not in a shared library, and if the program is not built as a position-independent executable (PIE), then the function pointer will be the same on both ends and passing it across the network should work. Note that these are very stringent conditions that would have to be documented as part of using your program, and they're very fragile; for instance if somebody upgrades the software on one side and forgets to upgrade the version on the other side of the connection at the same time, things will break horribly and dangerously.
I would avoid this type of low-level RPC entirely in favor of a higher-level command structure or abstract RPC framework, but if you really want to do it, a slightly safer approach would be to pass function names and use dlsym or equivalent to look them up. If the symbols reside in the main program binary rather than libraries, then depending on your platform you might need -rdynamic (GCC) or a similar option to make them available to dlsym. libffi might also be a useful tool for abstracting this.
Also, if you want to avoid depending on dlsym or libffi, you could keep your own "symbol table" hard-coded in the binary as a static const linear table or hash table mapping symbol names to function pointers. The hash table format used in ELF for this purpose is very simple to understand and implement, so I might consider basing your implementation on that.
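Here is a minimal sketch of such a hard-coded table. It uses a linear array rather than the ELF-style hash table suggested above, which is fine for a handful of entries. The function names and the RpcFn signature are hypothetical. Only the name travels over the network, and an unknown name yields nullptr instead of a jump to an arbitrary address:

```cpp
#include <cassert>
#include <string>

// Hypothetical remotely callable functions.
int addOne(int x) { return x + 1; }
int square(int x) { return x * x; }

using RpcFn = int (*)(int);

struct SymbolEntry {
    const char* name;
    RpcFn fn;
};

// Table compiled into the binary; each side resolves names locally, so the
// actual addresses never need to match between machines.
static const SymbolEntry kSymbols[] = {
    {"addOne", &addOne},
    {"square", &square},
};

// Returns nullptr for unknown names, so a mismatched or hostile peer
// cannot make the receiver call an arbitrary address.
RpcFn lookupSymbol(const std::string& name) {
    for (const SymbolEntry& e : kSymbols)
        if (name == e.name) return e.fn;
    return nullptr;
}
```

The receiver would check the result for nullptr before calling it, for example `if (RpcFn f = lookupSymbol(receivedName)) f(arg);`.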
What is it a pointer to?
Is it a pointer to a piece of static program memory? If so, don't forget that it's an address, not an offset, so you'd first need to convert between the two accordingly.
Second, if it's not a piece of static memory (i.e., a statically allocated array created at build time as opposed to run time), it's not really possible at all.
Finally, how are you ensuring the two pieces of code are the same? Are both binaries bit-identical (e.g. diff -a binary1 binary2)? Even if they are, the virtual memory management on each machine may differ. The program's memory segment may not fit in a single page, or its alignment across multiple pages may be different on each system.
This is really a bad idea, no matter how you slice it. This is what message passing and APIs are for.
I don't know of any form of RPC that will let you send a pointer over the network (at least without doing something like casting to int first). If you do convert to int on the sending end, and convert that back to a pointer on the far end, you get pretty much the same as converting any other arbitrary int to a pointer: undefined behavior if you ever attempt to dereference it.
Normally, if you pass a pointer to an RPC function, it'll be marshalled. That is, the data it points to will be packaged up, sent across and put into memory, and a pointer to that local copy of the data is passed to the function on the other end. That's part of why IDL gets a bit ugly: when you pass a pointer, you need to tell it how to figure out how much data to send across the wire. Most IDLs know how to handle zero-terminated strings. For other types of arrays, you typically need to specify the size of the data somehow.
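Here is a minimal sketch of what that marshalling amounts to for an array of 32-bit integers. It assumes a hand-rolled wire format: a 4-byte little-endian element count followed by the elements. marshal and unmarshal are hypothetical helpers, not the stubs an IDL compiler would generate. The element count plays the role of the size information mentioned above:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Sender side: package the data a pointer refers to, never the pointer.
// Wire format (hypothetical): 4-byte little-endian count, then each element
// as 4 little-endian bytes.
std::vector<std::uint8_t> marshal(const std::int32_t* data, std::uint32_t count) {
    assert(data != nullptr || count == 0);
    std::vector<std::uint8_t> out;
    out.reserve(4 + 4 * static_cast<std::size_t>(count));
    auto put32 = [&out](std::uint32_t v) {
        for (int i = 0; i < 4; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
    };
    put32(count);
    for (std::uint32_t i = 0; i < count; ++i) put32(static_cast<std::uint32_t>(data[i]));
    return out;
}

// Receiver side: rebuild a local copy; the called function then gets a
// pointer into this copy (e.g. result.data()).
std::vector<std::int32_t> unmarshal(const std::vector<std::uint8_t>& in) {
    std::size_t pos = 0;
    auto get32 = [&in, &pos]() {
        if (in.size() - pos < 4) throw std::runtime_error("truncated message");
        std::uint32_t v = 0;
        for (int i = 0; i < 4; ++i) v |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
        pos += 4;
        return v;
    };
    const std::uint32_t count = get32();
    // Validate the untrusted count against the bytes received before allocating.
    if (count > (in.size() - pos) / 4) throw std::runtime_error("bad element count");
    std::vector<std::int32_t> result;
    result.reserve(count);
    for (std::uint32_t i = 0; i < count; ++i) result.push_back(static_cast<std::int32_t>(get32()));
    return result;
}
```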
This is highly system dependent. On systems with virtual addressing such that each process thinks it's running at the same address each time it executes, this could plausibly work for executable code. Darren Kopp's comment and link regarding ASLR is interesting - a quick read of the Wikipedia article suggests the Linux & Windows versions focus on data rather than executable code, except for "network facing daemons" on Linux, and on Windows it applies only when "specifically linked to be ASLR-enabled".
Still, "same binary code" is best assured by static linking - if different shared objects/libraries are loaded, or they're loaded in different order (perhaps due to dynamic loading - dlopen - driven by different ordering in config files or command line args etc.) you're probably stuffed.
Sending a pointer over the network is generally unsafe. The two main reasons are:
Reliability: the data/function pointer may not point to the same entity (data structure or function) on another machine. The program, its libraries or its dynamically allocated objects may be at different locations in memory. Relocatable code plus ASLR can break your design. At the very least, if you want to point to a statically allocated object or a function, you should send its offset relative to the image base if your platform is Windows, or do something similar on whatever OS you are using.
Security: if your network is open and there's a hacker (or they have broken into your network), they can impersonate your first machine and make the second machine either hang or crash, causing a denial of service, or execute arbitrary code and get access to sensitive information or tamper with it or hijack the machine and turn it into an evil bot sending spam or attacking other computers. Of course, there are measures and countermeasures here, but...
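The offset idea from the Reliability point can be sketched portably. The sketch assumes the pointed-to objects live in one static array whose layout is identical in both builds. kRecords, toWireOffset and fromWireOffset are hypothetical. The array stands in for the image, so real image-base arithmetic on Windows (e.g. relative to GetModuleHandle(NULL)) is not shown. The receiver also range-checks the untrusted offset, which addresses part of the Security point:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical statically allocated objects. Each machine may place this
// array at a different address, but its layout is the same in both builds.
struct Record {
    int id;
    const char* label;
};
static const Record kRecords[] = {{1, "alpha"}, {2, "beta"}, {3, "gamma"}};
constexpr std::size_t kRecordCount = sizeof(kRecords) / sizeof(kRecords[0]);

// Sender: turn a local address into an offset from the local base.
std::uint32_t toWireOffset(const Record* p) {
    assert(p >= kRecords && p < kRecords + kRecordCount);  // must point into the table
    return static_cast<std::uint32_t>(p - kRecords);
}

// Receiver: reject out-of-range input, then rebase onto the local base.
const Record* fromWireOffset(std::uint32_t offset) {
    if (offset >= kRecordCount) return nullptr;
    return kRecords + offset;
}
```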
If I were you, I'd design something different. And I'd ensure that the transmitted data is either unimportant or encrypted and the receiving part does the necessary validation of it prior to using it, so there are no buffer overflows or execution of arbitrary things.
If you're looking for formal guarantees, I cannot help you. You would have to look in the documentation of the compiler and OS that you're using. However, I doubt that you would find the necessary guarantees, except possibly for some specialized embedded operating systems.
I can however provide you with one scenario where I'm 99.99% sure that it will work without any problems:
Windows
32 bit process
Function is located in a module that doesn't have relocation information
The module in question is already loaded & initialized on the client side
The module in question is 100% identical on both sides
A compiler that doesn't do very crazy stuff (e.g. MSVC and GCC should both be fine)
If you want to call a function in a DLL, you might run into problems. As per the list above, the module (= DLL) must not have relocation information, which of course makes it impossible to relocate it (which is exactly what we want). Unfortunately, that also means that loading the DLL will fail if its "preferred load address" is already used by something else. So that would be kind of risky.
If the function resides in the EXE, however, you should be fine. A 32-bit EXE doesn't need relocation information, and most don't include it (MSVC default settings). BTW: ASLR is not an issue here, for two reasons. a) ASLR only moves modules that are tagged as wanting to be moved. b) ASLR could not move a 32-bit Windows module without relocation information, even if it wanted to.
Most of the above just makes sure that the function will have the same address on both sides. The only remaining question - at least that I can think of - is: is it safe to call a function via a pointer that we initialized by memcpy-ing over some bytes that we received from the network, assuming that the byte-pattern is the same that we would have gotten if we had taken the address of the desired function? That surely is something that the C++ standard doesn't guarantee, but I don't expect any real-world problems from current real-world compilers.
That being said, I would not recommend doing that, except in situations where security and robustness really aren't important.
I have an old application which calls GetOpenFileNameA and GetSaveFileNameA.
Both calls are erroneous. The application crashes!
I have used OllyDbg and API Monitor to read the size stored in the OPENFILENAME struct.
The size of the struct is 76 Bytes (testing with Windows 7 x64).
I get an access violation exception while GetOpenFileNameA or GetSaveFileNameA is called.
I assume that at runtime windows tries to read 88 Bytes instead of 76 Bytes.
Have a look at this:
http://dotnetbutchering.blogspot.de/2007/10/vc-60-getting-0xc0000005-access.html
and this
http://www.asmcommunity.net/board/index.php?topic=5768.15
I did some research, and while doing so I noticed the following behavior:
While running Microsoft Spy++ the application does not crash!!
I stepped through the debugger and I saw that the access violation exception still occurs but somehow the exception is swallowed.
The application works fine! I can load and save files.
I have the following ideas. What do you think about them?
1. Write something like a Loader.exe that does the same as Spy++: swallow the access violation exception when both APIs are called.
2. Use DLL injection and API hooking: I could hook GetOpenFileName and GetSaveFileName with a custom implementation in a custom DLL. My implementation would fix the struct and pass the corrected struct to the original API calls.
3. Use SetWindowsHook to hook a window message?
4. Patch the binary file. Is it possible to fix this struct size issue by patching the binary with a hex editor?
Which one would work?
Do you have a better idea how I can fix this?
I am not able to get the source code of this old application.
I have to fix it using the existing binaries.
My solution must work at least on Windows XP and Windows 7 (x86, x64)
The tool PEiD shows me the following info about the old application:
Linker info: 2.55
MS Visual C++ 4.0
The size of the struct is 76 Bytes (testing with Windows 7 x64). I get an access violation exception while GetOpenFileNameA or GetSaveFileNameA is called. I assume that at runtime windows tries to read 88 Bytes instead of 76 Bytes.
If you look at the OPENFILENAME struct, you will notice this:
#if (_WIN32_WINNT >= 0x0500)
void * pvReserved;
DWORD dwReserved;
DWORD FlagsEx;
#endif // (_WIN32_WINNT >= 0x0500)
which in a 32bit program (VC++ 4 did not support 64bit targets) translates to exactly 12 bytes difference. As long as lStructSize is set properly by the caller, this should not be an issue at all. It may be worthwhile to use procdump from Microsoft/Sysinternals to get a minidump of the exact state (or attach a debugger and investigate). The exception you encounter does not necessarily have to be due to the struct size. If it is, it is more likely that Microsoft dropped the ball when it comes to backward compatibility of this function. Obviously OPENFILENAME::lStructSize is there for versioning of the struct and to ensure what you encounter wouldn't happen. But then, we're talking about a program built with a compiler/linker from times before Windows 2000.
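To make the 12-byte arithmetic concrete, the sketch below models the 32-bit OPENFILENAMEA layout with fixed-width integers, since every pointer and handle is 4 bytes in a 32-bit process. It then checks both sizes at compile time. The struct names are hypothetical. This is not the <commdlg.h> declaration and is only meant as a size check:

```cpp
#include <cassert>
#include <cstdint>

using U32 = std::uint32_t;  // DWORD, LPARAM and every 32-bit pointer/handle
using U16 = std::uint16_t;  // WORD

// Fields up to lpTemplateName: the pre-Windows 2000 layout.
struct OpenFileNameA32_v400 {
    U32 lStructSize, hwndOwner, hInstance, lpstrFilter, lpstrCustomFilter;
    U32 nMaxCustFilter, nFilterIndex, lpstrFile, nMaxFile, lpstrFileTitle;
    U32 nMaxFileTitle, lpstrInitialDir, lpstrTitle, Flags;
    U16 nFileOffset, nFileExtension;
    U32 lpstrDefExt, lCustData, lpfnHook, lpTemplateName;
};

// Same layout plus the fields guarded by _WIN32_WINNT >= 0x0500.
struct OpenFileNameA32_v500 {
    OpenFileNameA32_v400 v400;
    U32 pvReserved, dwReserved, FlagsEx;
};

static_assert(sizeof(OpenFileNameA32_v400) == 76, "19 x 4 bytes");
static_assert(sizeof(OpenFileNameA32_v500) == 88, "76 + 3 x 4 bytes");
```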
write sth. like a Loader.exe which does the same like Spy++.
Swallowing the access violation exception when both APIs are called.
It's a fair point. If you inserted exception handling at the top level, you could do what you want. However, it may cause side effects depending on what exactly caused the exception (i.e. which exact memory was overwritten).
Use DLL Injection and API Hooking. I could hook GetOpenFileName and GetSaveFileName with a custom implementation in a custom DLL. My implementation would fix the struct and pass the corrected struct to the original API calls.
This is closely related to the first one. I think it is the easiest and safest option of all, because it lets you correct the behavior without too much intrusion. Please read further below. Also, check out NInjectLib.
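Here is a minimal, portable sketch of the "fix the struct" step of option 2. It assumes, as the question does, that Windows reads 88 bytes where the old binary supplies 76. OriginalFn and callWithWidenedStruct are hypothetical names. A real hook would receive an OPENFILENAMEA* and obtain the original function pointer from whatever hooking mechanism is used (IAT patching, NInjectLib, etc.), which is not shown:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the original GetOpenFileNameA/GetSaveFileNameA,
// taking the struct as raw memory.
using OriginalFn = int (*)(void* ofn);

constexpr std::uint32_t kOldSize = 76;  // size the old binary passes
constexpr std::uint32_t kNewSize = 88;  // size including the Windows 2000 fields

// Copies the caller's short struct into a zero-filled full-size buffer,
// calls the original API on that buffer, then copies the first kOldSize
// bytes back so that output fields (nFilterIndex, nFileOffset, ...) reach
// the caller without anything being written past its 76 bytes.
int callWithWidenedStruct(void* callerOfn, OriginalFn original) {
    assert(callerOfn != nullptr && original != nullptr);
    std::uint32_t callerSize = 0;
    std::memcpy(&callerSize, callerOfn, sizeof callerSize);  // lStructSize is the first field
    if (callerSize != kOldSize) return original(callerOfn);  // not the broken case: pass through

    alignas(8) unsigned char wide[kNewSize] = {};  // pvReserved, dwReserved, FlagsEx stay 0
    std::memcpy(wide, callerOfn, kOldSize);
    std::memcpy(wide, &kNewSize, sizeof kNewSize);           // announce the larger size
    const int result = original(wide);
    std::memcpy(callerOfn, wide, kOldSize);                  // copy results back
    std::memcpy(callerOfn, &callerSize, sizeof callerSize);  // restore the caller's lStructSize
    return result;
}
```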
Use SetWindowsHook to hook a window message ?!?!?!
I don't see how that helps other than facilitating the injection of a DLL (for 1. and 2.).
Patch the binary file. Is it possible to fix this struct size issue by patching using a HEX Editor?
This may be the trickiest, depending on whether the OPENFILENAME is inside the binary (initialized data) or on stack or whether it gets allocated on the heap (easy then).
One possible hybrid approach for 1. and 2. would be this:
Add a subkey to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options named after the program that you are executing (e.g. foo.exe)
Create a REG_SZ value named Debugger inside that newly-created subkey and set the value to a program that I will try to describe briefly now.
This effectively sets up a debugger for this old application of yours and it means that the debugger we're going to write will receive your application's command line as arguments. It is convenient, because it is transparent to the end-user and you can adjust it to suit your needs.
You'll need to write a debugger. This task isn't as daunting as it seems at first, because you can use the debugging helpers Win32 offers. The gist is in the debugger loop. In general, you create the target process yourself using CreateProcess, passing the appropriate flags so you can debug it. Then use WaitForDebugEvent and ContinueDebugEvent to control the execution.

For all practical purposes, you may not even need the debugger loop at all. You could create the main thread of the target application suspended (pass CREATE_SUSPENDED to CreateProcess), point the CONTEXT of the main thread to your own code at the very beginning, and then call ResumeThread(pi.hThread). This way you will be done before the main thread starts. However, this may cause issues due to the way kernel32.dll's CreateThread works, which involves registering the new thread with the Win32 subsystem (aka csrss.exe). So it may be advisable to patch the IAT of the target in memory instead, or do something similar. After all, you are only interested in two functions.
Check out the two articles here and here for a more detailed look at the topic.
I for one prefer writing my debuggers based on PyDbg from PaiMei, but I have admittedly not tried to use such a Python-based debugger in Image File Execution Options.
(1) would be a pure hack, and it can be hard to do. Which aspect of Spy++'s behavior does it? Or do you want to reinvent the complete Spy++? And even if you do that, how can you be sure the application will work correctly (for all input) after the 'swallowed exception'? The internal state of the program can be undefined and lead to other problems later.
(2) Assuming you don't have the sources, so you cannot fix it the normal way, this IMHO seems to be the best workaround for the issue.
(3) I can't see how this could help you.
(4) Possible, but probably a lot of work. If some of the data is on the stack, resizing one item (the OPENFILENAMEA struct) moves the offsets of the others, so you will have to 'fix' all references to those.
I am looking for a way to cure at least the symptoms of a leaky DLL I have to use. The library (OpenCascade) claims to provide a memory manager, but so far I have been unable to make it release any memory it allocated.
I would at least like to put the calls to this module in a 'sandbox'. That would keep my application from losing memory while the OCC module isn't even running any more.
My question is this. I realise it would be an UGLY HACK (TM), but is it possible to preallocate a stretch of memory to be used specifically by the libraries? Alternatively, could I build some kind of sandbox around them, so I can track which areas of memory they used and release them myself when I am finished?
Or would that be too ugly a hack, and should I try to resolve the issues some other way?
The only reliable way is to separate use of the library into a dedicated process. You will start that process, pass data and parameters to it, run the library code, retrieve results. Once you decide the memory consumption is no longer tolerable you restart the process.
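Here is a minimal sketch of the dedicated-process idea using fork and a pipe. It assumes a POSIX system; on Windows the same structure would use CreateProcess and an anonymous pipe. leakyComputation is a hypothetical stand-in for the OpenCascade calls. For brevity, this sketch forks once per call. A real worker would more likely be long-lived and restarted only when its memory use becomes intolerable, as described above:

```cpp
#include <cassert>
#include <cstdint>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for calls into the leaky library: it never frees
// what it allocates, which is harmless here because its process exits.
std::int64_t leakyComputation(std::int64_t input) {
    char* leaked = new char[1024 * 1024];  // simulated leak, never deleted
    leaked[0] = 1;
    return input * input;
}

// Runs leakyComputation(input) in a child process and returns the result
// through a pipe. The OS reclaims everything the child allocated when it
// exits, so the parent's memory use does not grow.
bool runIsolated(std::int64_t input, std::int64_t& output) {
    int fds[2];
    if (pipe(fds) != 0) return false;
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }
    if (pid == 0) {  // child: compute, send the result, exit immediately
        close(fds[0]);
        std::int64_t r = leakyComputation(input);
        ssize_t n = write(fds[1], &r, sizeof r);
        _exit(n == static_cast<ssize_t>(sizeof r) ? 0 : 1);
    }
    close(fds[1]);  // parent: read the result, then reap the child
    std::int64_t r = 0;
    ssize_t n = read(fds[0], &r, sizeof r);
    close(fds[0]);
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return false;
    if (n != static_cast<ssize_t>(sizeof r) || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return false;
    output = r;
    return true;
}
```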
Using a library that isn't broken would probably be much easier. If a replacement isn't available, though, you could try intercepting the allocation calls. If the library isn't too badly 'optimized' (specifically, function inlining), you could disassemble it and locate the malloc and free functions. On loading, you could then replace every 4-byte sequence (8-byte on a 64-bit system) that encodes their address with one that points to your own memory allocator. This is almost guaranteed to be a buggy, unreadable timesink, though, so don't do this if you can find a working replacement.
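The "preallocate a stretch of memory" idea from the question maps onto C++17's std::pmr::monotonic_buffer_resource. This only works for allocations you route through it yourself. It cannot capture allocations that a prebuilt DLL such as OpenCascade makes internally; that would still require interception as described above. SandboxArena is a hypothetical wrapper name:

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>
#include <vector>

// A fixed, preallocated stretch of memory. Allocations beyond it throw
// std::bad_alloc (via null_memory_resource) instead of silently growing.
class SandboxArena {
public:
    explicit SandboxArena(std::size_t bytes)
        : storage_(bytes),
          resource_(storage_.data(), storage_.size(), std::pmr::null_memory_resource()) {}

    std::pmr::memory_resource* resource() { return &resource_; }

    // Frees every allocation made through resource() in one step. Call it
    // only when nothing allocated from the arena is still in use.
    void releaseAll() { resource_.release(); }

private:
    std::vector<std::byte> storage_;  // declared first so it outlives resource_
    std::pmr::monotonic_buffer_resource resource_;
};
```

Usage: construct pmr containers with `arena.resource()`, for example `std::pmr::vector<int> v(arena.resource());`, and call `releaseAll()` once they are gone.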
Edit:
I saw sharptooth's answer, which has a much better chance of working. I'd still advise trying to find a replacement, though.
You should ask Roman Lygin's opinion; he used to work at OCC. He has at least one post that mentions memory management: http://opencascade.blogspot.com/2009/06/developing-parallel-applications-with_23.html
If you ask nicely, he might even write a post that explains mmgt's internals.
After asking this question (C++: Can I get out of the bounds of my app’s memory with a pointer?),
I decided to accept it isn't possible to modify other app's memory with pointers (with a modern OS).
But if this isn't possible, how do programs like ArtMoney and CheatEngine work?
Check these functions:
ReadProcessMemory
WriteProcessMemory
It is possible to read process memory on Windows. There is a function, called ReadProcessMemory in kernel32.dll: http://msdn.microsoft.com/en-us/library/ms680553(v=VS.85).aspx
This is used by most applications that change memory of other applications. It can also be used to communicate between two processes (though mostly not recommended).
CheatEngine is a debugger with a non-traditional interface.
To give a plain, simple explanation: dump or search the process memory for a specified value, then modify it. You can do this using plain WinAPI functions or, I suppose, native API routines.
That's obviously the reason why they fail, for example, if game state is stored with some encryption. That's also the reason you would need to change your value several times and then make your search again (to avoid search collisions, because definitely different memory blocks could hold the same value).
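The search-and-narrow loop described above can be sketched as two small functions. The sketch assumes the memory snapshot is already a local byte buffer. In a real tool it would come from ReadProcessMemory on each readable region, which is not shown. scanFor and narrow are hypothetical names. The sketch scans every byte offset, whereas real tools often scan only aligned offsets:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// First pass: every offset in the snapshot where the 4-byte value occurs.
std::vector<std::size_t> scanFor(const std::vector<std::uint8_t>& snapshot, std::int32_t value) {
    std::vector<std::size_t> hits;
    for (std::size_t off = 0; off + sizeof value <= snapshot.size(); ++off) {
        std::int32_t v;
        std::memcpy(&v, snapshot.data() + off, sizeof v);  // unaligned-safe read
        if (v == value) hits.push_back(off);
    }
    return hits;
}

// Later passes: after the value changes in the program, keep only the
// candidates that now hold the new value. Repeating this discards
// locations that matched only by coincidence.
std::vector<std::size_t> narrow(const std::vector<std::uint8_t>& snapshot,
                                const std::vector<std::size_t>& candidates,
                                std::int32_t newValue) {
    std::vector<std::size_t> kept;
    for (std::size_t off : candidates) {
        if (off + sizeof newValue > snapshot.size()) continue;
        std::int32_t v;
        std::memcpy(&v, snapshot.data() + off, sizeof v);
        if (v == newValue) kept.push_back(off);
    }
    return kept;
}
```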
I'm trying to learn to modify games in C++. I don't mean changing the game itself, just the memory it's using, to get ammo and whatnot. Can someone point me to some books?
The most convenient way to manipulate a remote process's memory is to create a thread within the context of that program. This is usually accomplished by forcibly injecting a DLL into the target process. Once you have code executing inside the target application, you are free to use standard memory routines (e.g. memcpy, malloc, memset).
I can tell you right now that the most convenient and easy to implement method is the CreateRemoteThread / LoadLibrary trick.
As other people have mentioned, simple hacks can be performed by scanning memory for known values. But if you want to perform anything more advanced you will need to look into debugging and dead-list analysis. (Tools: ollydbg and IDA pro, respectively).
You have scratched the surface of a very expansive hacking topic; there is a wealth of knowledge out there.
First a few internet resources:
gamedeception.net - A community dedicated to game RE (Reverse Engineering) and hacking.
http://www.edgeofnowhere.cc/viewtopic.php?p=2483118 - An excellent tutorial on various DLL injection methods.
Openrce.org - Community for reverse code engineering.
I can also recommend a book to you - http://www.exploitingonlinegames.com/
Windows API Routines you should research (msdn.com):
CreateRemoteThread
LoadLibraryA
VirtualAllocEx
VirtualProtectEx
WriteProcessMemory
ReadProcessMemory
CreateToolhelp32Snapshot
Process32First
Process32Next
Injecting Code:
I think the best method is to modify the exe to inject code into one of the loaded modules. Check this tutorial
Short related story:
Over 10 years ago, though, I remember successfully modifying my score in Solitaire on Windows with a simple C++ program. I did this by starting an int * pointer at some base address and iterating through memory addresses, with a try/catch to catch exceptions.
I would look for what my current score was in one of those pointer variables, and replace it with a new integer value. I just made sure that my current score was some obscure value that wouldn't be contained in many memory addresses.
Once I found a set of memory addresses that matched my score, I would change my score manually in solitaire and only look through the memory addresses that were found in the last iteration. Usually this would narrow down to a single memory address that contained the score. At this point I had the magical simple line of code *pCode = MY_DESIRED_SCORE;
This may not be possible anymore with newer memory security models, though. But the method worked pretty well with a 10-20 line C++ program, and it only took about a minute to modify my score.
There are plenty of programs available that let you modify the memory of running programs, and they are often used for cheating. But be careful using them on online games, because most cheats will be detected and you will be banned without warning.
If you like to create them yourself, just look at books that describe the windows API. You will find enough information there.
It can be done on Windows by using hooks to access the memory space of a process.