How to diagnose / profile a momentary performance hit in C++

Solved: For when simple profiling isn't effective enough, I have written a tool to show me where performance hits occur. Basic information about how the tool works is in the accepted answer below. The source can be found here: http://pastebin.com/ETiW8hE8 (be sure to turn debugging symbols on in the program you're testing)
I've built a game engine in C++ and I have noticed in one particular area of a level that there is a brief performance hit. The game will stop completely for about half a second, and then continue on merrily. I've tried to profile this, but it's difficult to isolate the condition, since I also have to load the map and perform the in-game task that causes the performance hit. I can make a map load automatically and skip showing menus, etc., and compare those profile results against a set of similar control data (all the same steps but without actually triggering the performance hit), but it doesn't show anything obvious.
I'm using gmon to profile.
This is a large application with many, many classes and functions. The performance hit only happens once, so there's no way to trigger the problem many times during one execution to saturate the profile and make the offending functions stand out.
What else can I do?

What I would do is try to grab a stack sample in that half second when it's frozen.
This would require an alarm clock timer set to go off some small time in the future, like 100ms.
Then, in some loop that normally takes less than 100ms to repeat (like the frame display loop), keep resetting the timer.
That way, it will act as a watchdog that barks if you don't keep petting it.
Then, stick a breakpoint in the timer interrupt handler.
When it gets there, you know you're in the bad slice of time.
Then just display the call stack, and it should show you what the problem is.
You might have to repeat the process a few times.
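Here's a minimal sketch of that watchdog on POSIX (the names and the 100ms threshold are just illustrative); the frame loop calls kick_watchdog() once per iteration:

#include <csignal>
#include <sys/time.h>

extern "C" void on_watchdog(int) {
    // Put a breakpoint here (e.g. "break on_watchdog" in gdb); when it
    // hits, the code that overran the frame is on the stack ("bt").
}

void kick_watchdog() {
    itimerval t = {};                     // zero interval makes it one-shot
    t.it_value.tv_usec = 100000;          // bark in 100 ms unless petted again
    setitimer(ITIMER_REAL, &t, nullptr);  // re-arming also cancels the old alarm
}

void install_watchdog() {
    std::signal(SIGALRM, on_watchdog);    // then call kick_watchdog() every frame
    kick_watchdog();
}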

You don't say whether your application is threaded, so I will assume that it is not.
As per Mike's suggestion, get insight by grabbing a stack trace while it is frozen and seeing where it is stuck. With a bit of luck you can do that using pstack:
while usleep 100000; do
pstack processid
done >/tmp/stack.log
That should give you some output to go on. My guess is that you are calling a blocking I/O operation, like reading some assets from disk.

Related

Is it possible to suspend all Windows Defender tasks when my program runs, and resume when it's done?

I have a performance-sensitive program that I would like to run as stably as possible, so I want to disable/suspend MsMpEng.exe, among a few others, on Windows 10 when my program starts. When the program finishes, I'd like to restore normal function.
I have tried directly suspending the process using resmon.exe (Resource Monitor), and it suspends... but 20-30 seconds later, the entire system just stops. I assume this is some form of self-protection... so at the very least, I'd have to suspend and resume in a timed loop.
Thoughts? Is it even worth the trouble?
EDIT: Gave it some thought and some test cases; just adjusting process priority isn't quite enough, but it's better than nothing. I'll just recommend people disable their virus protection if they encounter slowdowns, unless anyone else has any suggestions.
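For reference, the priority adjustment is just a couple of standard Win32 calls (HIGH_PRIORITY_CLASS rather than REALTIME_PRIORITY_CLASS, which can starve system tasks):

#include <windows.h>

void run_critical_section() {
    DWORD previous = GetPriorityClass(GetCurrentProcess());
    SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS);
    // ... performance-sensitive work ...
    SetPriorityClass(GetCurrentProcess(), previous);  // restore on the way out
}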

Basic Protection of a Game Client

I currently have a multiplayer game, and players are starting to use memory editing to cancel the attack animation, making the attack packets come in faster and thus making attacks a lot faster than normal.
Yes, a better design would be ideal, but that could take a while. I want a temporary fix that can be done quickly.
The ideas:
Check the time difference since the last attack packet and ignore everything that arrives too fast. (for the server)
Use EnumWindows to check window classes and stop the game if a known memory editor is detected; EnumWindows would be executed each time an attack is made. (for the client; a sketch follows the list)
Use ReadProcessMemory to read running processes and find signatures of known memory editors.
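A rough sketch of idea 2 (the window-class list is a placeholder; I would have to collect real class names for known editors):

#include <windows.h>
#include <cstring>

// Placeholder list: fill with class names observed for known editors.
static const char* kEditorClasses[] = { "KnownMemoryEditorClass" };

BOOL CALLBACK CheckWindow(HWND hwnd, LPARAM param) {
    char cls[256];
    if (GetClassNameA(hwnd, cls, sizeof(cls))) {
        for (const char* bad : kEditorClasses)
            if (std::strcmp(cls, bad) == 0) {
                *reinterpret_cast<bool*>(param) = true;
                return FALSE;  // stop enumerating, editor found
            }
    }
    return TRUE;  // keep enumerating
}

bool MemoryEditorRunning() {
    bool found = false;
    EnumWindows(CheckWindow, reinterpret_cast<LPARAM>(&found));
    return found;  // call this on each attack, as described above
}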
The question really is whether any of the following could work, and how it would be done:
Detour ReadProcessMemory or OpenProcess and exit when called? (Though I think this won't work, because these functions get called by the memory editor, not my game.)
ReadProcessMemory on myself (the game) and check the addresses they are changing; if the values are not within the normal range, exit.
Any suggestions?
I know that it is futile to do this, because cheaters who know what they're doing can still get around all of it. But my game has only about 600 active players, and I believe they are mostly script kiddies. I think these simple countermeasures should be enough for a small game like mine. But of course, the design will be corrected.
Detour ReadProcessMemory or OpenProcess and exit when called?
These are not being called in your process so hooking them locally wouldn't do anything. You would need to hook every running process, which is not recommended.
ReadProcessMemory on myself (the game) and check the addresses they are changing; if the values are not within the normal range, exit.
You don't need to ReadProcessMemory, you're inside your own process. Just check the value normally.
You should calculate these values on the server if you don't want the clients to be able to manipulate them, then just replicate this info to the clients and overwrite the local values.
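A minimal sketch of that server-side check, following idea 1 from the question (Player and the minimum interval are invented for illustration):

#include <chrono>

using Clock = std::chrono::steady_clock;

struct Player {
    Clock::time_point last_attack{};  // epoch-initialised: first attack always passes
};

// Minimum legal gap between attacks; derive it from the animation length.
constexpr std::chrono::milliseconds kMinAttackInterval(400);

bool accept_attack(Player& p) {
    const auto now = Clock::now();
    if (now - p.last_attack < kMinAttackInterval)
        return false;  // too fast: the client skipped the animation, drop the packet
    p.last_attack = now;
    return true;
}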
You can also add an anti-debug library to your client to prevent the majority of people from manipulating your process. Here is a decent one.

C++: How to set a timeout (not reading input, not threaded)?

Got a large C++ function on Linux that calls a whole lot of other functions, making up an algorithm. At various points, given certain bad inputs, the algorithm can get "stuck" and go on forever. Adding a timeout seems appropriate, as all potential "stuck" points cannot be predicted. But despite scouring the Internet for timeout examples, I've only found how to apply timeouts when either the thing you're timing is a separate thread or it's reading inputs. My code is a single thread and does not modify file descriptors, so I'm not having any luck. Do I basically have no choice but to thread it?
I am not sure about your exact situation; server applications or embedded applications often run for years in the background without stopping. One option is to let your program run in the background and log to a file (or the screen) periodically, and, if you really want to stop the program after a certain time, you can use the timeout command or a script to kill your program after that time, say, timeout 15s your-prog.
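If killing the process from outside isn't an option, a single-threaded, in-process alternative (not part of the suggestion above) is alarm() plus siglongjmp. A rough sketch, with the usual caveat that jumping out of a signal handler skips C++ destructors:

#include <csetjmp>
#include <csignal>
#include <cstdio>
#include <unistd.h>

static sigjmp_buf timeout_env;

extern "C" void on_alarm(int) {
    siglongjmp(timeout_env, 1);  // jump back out of the stuck algorithm
}

void run_algorithm() { for (;;) {} }  // stands in for the real algorithm

int main() {
    std::signal(SIGALRM, on_alarm);
    if (sigsetjmp(timeout_env, 1) == 0) {
        alarm(15);        // deliver SIGALRM after 15 seconds
        run_algorithm();  // may get "stuck"
        alarm(0);         // finished in time: cancel the pending alarm
    } else {
        std::fprintf(stderr, "algorithm timed out\n");
        return 1;
    }
    return 0;
}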

Why is an empty while loop using more CPU?

I have two programs that are supposed to do the same thing, with slight differences. Both have infinite game loops that run forever unless the user stops the game somehow. One program's game loop is implemented and renders something; the other's is empty and does nothing (it just listens for the user to stop it).
When I opened the task manager to see resource usage, I discovered that the program with the empty loop uses 14% CPU and the program that actually draws something to the screen uses about 1-2%.
My guess on the subject is as follows:
I compared the code of both programs and looked for differences, and there were not many. Then it occurred to me that the loop that renders to the screen might be bound by other factors (like sending pixels to the screen, or the refresh rate). So after the CPU does its thing, it puts that thread to sleep until the other stuff is completed. But since the other program does pretty much nothing, and doing nothing is really easy, the CPU never puts that thread to sleep and just keeps going. I lack the knowledge to confirm that this is the reason, so I am asking you: is this the reason this is happening? (Bonus question) And if so, why does the CPU stop at about 14% and not go all the way up to 100%?
Thank you.
Hard to say for certain without seeing the code, but drawing to the screen will inevitably involve some wait on I/O; how much depends on many factors, including sync and buffering options.
As for the 14% CPU usage: I'm guessing that your machine has 8 processing units (either cores, or cores × hyperthreading) and your code is single-threaded, i.e. it is maxing out one processing unit, and 1/8 = 12.5%, which shows up as roughly 14%.
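A toy illustration of the difference (the 16 ms sleep stands in for whatever blocking your render loop does):

#include <atomic>
#include <chrono>
#include <thread>

std::atomic<bool> running{true};

void busy_loop() {
    while (running) {
        // nothing blocks here, so this thread pins one core at 100%
    }
}

void vsync_like_loop() {
    while (running) {
        // a real render loop blocks on vsync / buffer swap; sleeping
        // stands in for that wait and drops CPU use to near zero
        std::this_thread::sleep_for(std::chrono::milliseconds(16));  // ~60 FPS
    }
}

int main() {
    std::thread t(busy_loop);  // watch this one in the task manager
    std::this_thread::sleep_for(std::chrono::seconds(10));
    running = false;
    t.join();
}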

Platform independent parallelization without changing the framework?

I hope the title did not mislead you.
My problem is the following: Currently I am trying to speed up a raytracer with the help of the graphics card. It works, despite the fact that it actually got slower. :)
This is caused by the fact that I trace one ray at a time against the whole geometry on the graphics card (my "tracing server") and then fetch the result, which is awfully slow, so I have to gather some rays, calculate them together, and fetch the results in one batch to speed this up.
The next problem is that I am not allowed to rewrite the surrounding framework, which should know nothing, or as little as possible, about this parallelization.
So here is my approach:
I thought about using several threads, where each one gets a ray and asks my "tracing server" to calculate the intersections. Each thread then blocks until enough rays have been gathered to calculate the intersections on the graphics card and fetch the results back efficiently. This means that each thread waits until the results have been fetched.
You see, I already have some plan, but I do not know the following:
Which threading framework should I use to be platform-independent?
Should I use a thread pool of fixed size, or create threads as needed?
Can any given thread library handle at least 1000 waiting threads (because that is the number I would need to gather for my fetch to be efficient)?
But I could also imagine doing this with one thread that dumps its load (a new ray) to the "tracing server" and fetches the next load until there is enough to fetch the results. The thread would then take the results one by one, do the further calculations until all results are processed, and then go back to step one until all rays are done. A sketch of this follows below.
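A rough sketch of this single-threaded variant (trace_batch stands in for my "tracing server" round-trip, and the batch size of 1000 is just my current guess):

#include <cstddef>
#include <vector>

struct Ray {};
struct Hit {};

// Stub: the real version is one GPU round-trip for the whole batch.
std::vector<Hit> trace_batch(const std::vector<Ray>& rays) {
    return std::vector<Hit>(rays.size());
}

void trace_all(const std::vector<Ray>& rays) {
    const std::size_t kBatchSize = 1000;  // rays gathered per round-trip
    std::vector<Ray> pending;
    pending.reserve(kBatchSize);
    for (const Ray& ray : rays) {
        pending.push_back(ray);
        if (pending.size() == kBatchSize) {
            std::vector<Hit> hits = trace_batch(pending);
            // ... process each hit before gathering the next batch ...
            pending.clear();
        }
    }
    if (!pending.empty()) {
        std::vector<Hit> hits = trace_batch(pending);  // flush the remainder
        // ... process ...
    }
}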
Also if you have some better idea how to parallelize this, tell me about it.
Regards,
Nobody
PS
If you need this information: The two platforms I want to use are Linux and Windows.
Use either Threading Building Blocks or boost::thread.
http://www.boost.org/doc/libs/1_46_0/doc/html/thread.html
http://threadingbuildingblocks.org/
As for a thread pool vs. on-demand threads: a thread pool is generally the better idea, as it avoids thread-creation overhead.
The number of waiting threads is going to depend on the underlying system more than anything else:
Maximum number of threads per process in Linux?
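If you do go the many-threads route, here is a rough sketch of the gather-and-dispatch part using standard threads (boost::thread has the same shape; trace_batch again stands in for your GPU round-trip, and all names are invented):

#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

struct Ray {};
struct Hit {};

// Stub: the real version is one GPU round-trip for the whole batch.
std::vector<Hit> trace_batch(const std::vector<Ray>& rays) {
    return std::vector<Hit>(rays.size());
}

class Batcher {
    struct Batch {
        std::vector<Ray> rays;
        std::vector<Hit> hits;
        bool done = false;
    };
    std::mutex m_;
    std::condition_variable cv_;
    std::shared_ptr<Batch> current_ = std::make_shared<Batch>();
    static constexpr std::size_t kBatchSize = 1000;  // rays per round-trip
public:
    // Called from each worker thread; blocks until its batch is traced.
    Hit trace(const Ray& ray) {
        std::unique_lock<std::mutex> lock(m_);
        std::shared_ptr<Batch> batch = current_;     // the batch this ray joins
        const std::size_t slot = batch->rays.size();
        batch->rays.push_back(ray);
        if (batch->rays.size() == kBatchSize) {      // this ray completes the batch
            current_ = std::make_shared<Batch>();    // later rays start a new batch
            batch->hits = trace_batch(batch->rays);  // lock held: simple, not optimal
            batch->done = true;
            cv_.notify_all();                        // wake the waiting submitters
        } else {
            cv_.wait(lock, [&] { return batch->done; });
        }
        return batch->hits[slot];
    }
};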