I currently have a rather small and simple C++ application using SDL2 on the most recent version of OS X. The only thing it's really doing is listening for a couple keyboard events and drawing a single, unfilled white square via a SDL_Renderer. It uses lambdas for things such as handling each frame's "tick" and rendering.
When I start running this program, XCode reports that it uses about 14.5MB of memory almost immediately. But then the memory usage slowly starts to increase, getting slower over time. Eventually, after a few minutes, it hits 18MB total, and pretty much stays there. Sometimes it will hit 18.1, but eventually it will drop back down to 18.
This behavior confuses me, because I'm not allocating anything in the my code except at the beginning. The only places where new memory is being used, is when I allocate an SDL_Rect on the stack in the render lambda, and a couple integers/floats in the main loop. This happens even when I don't cause any SDL_Events to be fired via keyboard, mouse, etc. If anyone can offer any suggestions as to why this might be happening, I would appreciate it.
Note that this doesn't seem to be a memory leak or anything dangerous, and is largely an academic question wherein I want to understand the behavior of what is going on. If you want me to put my code somewhere, I can.
Related
I was talking with a friend and he said that they just slow down the closing time of the program and are useless since operating systems will automatically clean everything up and will do it faster.
Since some programs could crash or be killed, OS have to ensure freeing what it could. Sometimes that is not enough - e.g. if you've disabled screensaver, would it be enabled back? If you set fullscreen videomode with different resolution, would resolution reset after your program close? If you use rumble on gamepad, would it stop, and if so, when? There are a lot of questions and no answers, as all that is OS dependent.
And sometimes you actually need to close window without finishing your program.
If you happen to corrupt memory heap or stomped data, explicit free is more likely to crash your program, so you may notice it earlier. Look up famous "Thank you for playing Wing Commander!" story.
You pretty much lose ability to use memory leak detector like valgrind as it would report quite a lot of leaks if you don't do finalisation.
More important question is why would you want to avoid explicit finalisation. To presumably save some minuscule time? By doing finalisation on your side you slightly reduce amount of work other OS parts will do on process finish, so - probably yes, but very unlikely to be noticeable. But it is likely to raise questions if someone else will look at your code (that someone is probably yourself several years later).
I am currently learning how to use Direct2D to create an app, and I was wondering if placing a lot of 'non-graphical' code after calling "BeginDraw()" would matter. I do not fully understand what "BeginDraw()" actually does, so my question is mostly regarding time of execution. Does it slow other processes? Is it 'eating' the CPU after being called until "EndDraw()" is called ?
You don't get an extra cost over the period spent within BeginDraw.
You get a cost for calling it, just like any other function, as it needs to prepare stuff and deal with states. In this respect, less BeginDraw blocks is better than more.
But you don't get a cost by virtue of the block being longer scope. In fact, it might even be beneficial to call it earlier as it gives the Direct2D lib time to do background processing, if needed. Might just involve fetching memory. I'd expect this to be marginal or no impact, but doing it earlier, if that's the only change, can help not hurt.
In practice, try to use as few blocks as possible. And use any command buffering API available as oppposed to immediate draw commands. (sorry for lack of details here, not so familiar with DirectDraw myself, more of an OpenGL/Vulkan dev)
You'll be pleased to know that BeginDraw isn't consuming CPU time after it is called. Some people profile their game or application and get the impression that calls like ID2DDeviceContext.BeginDraw (it's usually EndDraw actually) and IDXGISwapChain.Present are consuming a lot of CPU. The reality is that it only appears that way because they're two of the most commonly made calls in such an application. There's also a good chance some could be misinterpreting profiling reports.
Put simply, BeginDraw and EndDraw just defines when you're able to make drawing calls to your DeviceContext or RenderTarget. When to call it exactly depends upon what sort of application you have. If it has a game loop, call it at the start of your main Draw method. You can write a whole game that makes only a single call to BeginDraw and EndDraw each.
I haven't been able to create a Qt GUI app that didn't have over 1K 'definitely lost' bytes in valgrind. I have experimented with this, making minimal apps that just show one QWidget, that extend QMainWindow; that just create a QApplication object without showing it or without executing it or both, but they always leak.
Trying to figure this out I have read that it's because X11 or glibc has bugs, or because valgrind gives false positives. And in one forum thread it seemed to be implied that creating a QApplication-object in the main function and returning the object's exec()-function, as is done in tutorials, is a "simplified" way to make GUIs (and not necessarily good, perhaps?).
The valgrind output does indeed mention libX11 and libglibc, and also libfontconfig. The rest of the memory losses, 5 loss records, occurs at ??? in libQtCore.so during QLibrary::setFileNameAndVersion.
If there is a more appropriate way to create GUI apps that prevents even just some of this from happening, what is it?
And if any of the valgrind output is just noise, how do I create a suppression file that suppresses the right things?
EDIT: Thank you for comments and answers!
I'm not worrying about the few lost kB themselves, but it'll be easier to find my own memory leaks if I don't have to filter several screens of errors but can normally get an "OK" from valgrind. And if I'm going to suppress warnings, I'd better know what they are, right?
Interesting to see how accepted leaks can be!
It is not uncommon for large-scale multi-thread-capable libraries such as QT, wxWidgets, X11, etc. to set up singleton-type objects that initialize once when a process is started and then make no attempt to effort to clean up the allocation when the process shuts down.
I can assure you that anything "leaked" from a function such as QLibrary::setFileNameAndVersion() has been left so intentionally. The bits of memory left behind by X11/glibc/fontConfig are probably not bugs either.
It could be seen as bad coding practice or etiquette, but it can also greatly simplify certain types of tasks. Operating systems these days offer a very strong guarantee for cleaning up any memory or resources left open by a process when its killed (either gracefully or by force), and if the allocation in question is very likely to be needed for the duration of the application, including shutdown procedures -- and various core components of QT would qualify -- then it can be a boon to performance to have the library set up some memory allocations as soon as it is loaded/initialized, and allow those to persist indefinitely. Among other things, this allows the memory to be present for use by any other C++ destructors that might reference that memory.
Since those allocations are only set up once, and from one point in the code, there is no risk of a meaningful memory leak. Its just memory that belongs to the process and is thus cleaned up when the process is closed by the operating system.
Conclusion: if the memory leak isn't in your code, and it doesn't appear to get significantly larger over time (and by significant these days, think megabytes), and/or is clearly orginating from first-time initialization setup code that is only ever invoked once within your app, then don't worry about it. It is probably intentional.
One way to test this can be to run your code inside a loop, and vary the number of iterations. If the difference between allocs and frees is independent on the number of iterations, you are likely to be safe.
We have a fairly graphical intensive application that uses the FOX toolkit and OpenSceneGraph, and of course C++. I notice that after running the application for some time, it seems there is a memory leak. However when I minimize, a substantial amount of memory appears to be freed (as witnessed in the Windows Task Manager). When the application is restored, the memory usage climbs but plateaus to an amount less than what it was before the minimize.
Is this a huge indicator that we have a nasty memory leak? Or might this be something with how Windows handles graphical applications? I'm not really sure what is going on.
What you are seeing is simply memory caching. When you call free()/delete()/delete, most implementations won't actually return this memory to the OS. They will keep it to be returned in a much faster fashion the next time you request it. When your application is minimized, they will free this memory because you won't be requesting it anytime soon.
It's unlikely that you have an actual memory leak. Task Manager is not particularly accurate, and there's a lot of behaviour that can change the apparent amount of memory that you're using- even if you released it properly. You need to get an actual memory profiler to take a look if you're still concerned.
Also, yes, Windows does a lot of things when minimizing applications. For example, if you use Direct3D, there's a device loss. There's thread timings somethings. Windows is designed to give the user the best experience in a single application at a time and may well take extra cached/buffered resources from your application to do it.
No, there effect you are seeing means that your platform releases resources when it's not visible (good thing), and that seems to clear some cached data, which is not restored after restoring the window.
Doing this may help you find memory leaks. If the minimum amount of memory (while minimized) used by the app grows over time, that would suggest a leak.
You are looking at the working set size of your program. The sum of the virtual memory pages of your program that are actually in RAM. When you minimize your main window, Windows assumes the user won't be interested in the program for a while and aggressively trims the working set. Copying the pages in RAM to the paging file and chucking them out, making room for the other process that the user is likely to start or to switch to.
This number will also go down automatically when the user starts another program that needs a lot of RAM. Windows chucks out your pages to make room for this program. It picks pages that your program hasn't used for a while, making it likely that this doesn't affect the perf of your program much.
When you switch back to your program, Windows needs to swap pages back into RAM. But this is on-demand, it only pages-in pages that your program actually uses. Which will normally be less than what it used before, no need to swap the initialization code of your program back in for example.
Needless to say perhaps, the number has absolutely nothing to do with the memory usage of your program, it is merely a statistical number.
Private bytes would be a better indicator for a memory leak. Taskmgr doesn't show that, SysInternals' ProcMon tool does. It still isn't a great indicator because that number also includes any blocks in the heap that were freed by your program and were added to the list of free blocks, ready to be re-used. There is no good way to measure actual memory in use, read the small print for the HeapWalk() API function for the kind of trouble that causes.
The memory and heap manager in Windows are far too sophisticated to draw conclusions from the available numbers. Use a leak detection tool, like the VC debug allocator (crtdbg.h).
I have a program that loads a file (anywhere from 10MB to 5GB) a chunk at a time (ReadFile), and for each chunk performs a set of mathematical operations (basically calculates the hash).
After calculating the hash, it stores info about the chunk in an STL map (basically <chunkID, hash>) and then writes the chunk itself to another file (WriteFile).
That's all it does. This program will cause certain PCs to choke and die. The mouse begins to stutter, the task manager takes > 2 min to show, ctrl+alt+del is unresponsive, running programs are slow.... the works.
I've done literally everything I can think of to optimize the program, and have triple-checked all objects.
What I've done:
Tried different (less intensive) hashing algorithms.
Switched all allocations to nedmalloc instead of the default new operator
Switched from stl::map to unordered_set, found the performance to still be abysmal, so I switched again to Google's dense_hash_map.
Converted all objects to store pointers to objects instead of the objects themselves.
Caching all Read and Write operations. Instead of reading a 16k chunk of the file and performing the math on it, I read 4MB into a buffer and read 16k chunks from there instead. Same for all write operations - they are coalesced into 4MB blocks before being written to disk.
Run extensive profiling with Visual Studio 2010, AMD Code Analyst, and perfmon.
Set the thread priority to THREAD_MODE_BACKGROUND_BEGIN
Set the thread priority to THREAD_PRIORITY_IDLE
Added a Sleep(100) call after every loop.
Even after all this, the application still results in a system-wide hang on certain machines under certain circumstances.
Perfmon and Process Explorer show minimal CPU usage (with the sleep), no constant reads/writes from disk, few hard pagefaults (and only ~30k pagefaults in the lifetime of the application on a 5GB input file), little virtual memory (never more than 150MB), no leaked handles, no memory leaks.
The machines I've tested it on run Windows XP - Windows 7, x86 and x64 versions included. None have less than 2GB RAM, though the problem is always exacerbated under lower memory conditions.
I'm at a loss as to what to do next. I don't know what's causing it - I'm torn between CPU or Memory as the culprit. CPU because without the sleep and under different thread priorities the system performances changes noticeably. Memory because there's a huge difference in how often the issue occurs when using unordered_set vs Google's dense_hash_map.
What's really weird? Obviously, the NT kernel design is supposed to prevent this sort of behavior from ever occurring (a user-mode application driving the system to this sort of extreme poor performance!?)..... but when I compile the code and run it on OS X or Linux (it's fairly standard C++ throughout) it performs excellently even on poor machines with little RAM and weaker CPUs.
What am I supposed to do next? How do I know what the hell it is that Windows is doing behind the scenes that's killing system performance, when all the indicators are that the application itself isn't doing anything extreme?
Any advice would be most welcome.
I know you said you had monitored memory usage and that it seems minimal here, but the symptoms sound very much like the OS thrashing like crazy, which would definitely cause general loss of OS responsiveness like you're seeing.
When you run the application on a file say 1/4 to 1/2 the size of available physical memory, does it seem to work better?
What I suspect may be happening is that Windows is "helpfully" caching your disk reads into memory and not giving up that cache memory to your application for use, forcing it to go to swap. Thus, even though swap use is minimal (150MB), it's going in and out constantly as you calculate the hash. This then brings the system to its knees.
Some things to check:
Antivirus software. These often scan files as they're opened to check for viruses. Is your delay occuring before any data is read by the application?
General system performance. Does copying the file using Explorer also show this problem?
Your code. Break it down into the various stages. Write a program that just reads the file, then one that reads and writes the files, then one that just hashes random blocks of ram (i.e. remove the disk IO part) and see if any particular step is problematic. If you can get a profiler then use this as well to see if there any slow spots in your code.
EDIT
More ideas. Perhaps your program is holding on to the GDI lock too much. This would explain everything else being slow without high CPU usage. Only one app at a time can have the GDI lock. Is this a GUI app, or just a simple console app?
You also mentioned RtlEnterCriticalSection. This is a costly operation, and can hang the system quite easily, i.e. mismatched Enters and Leaves. Are you multi-threading at all? Is the slow down due to race conditions between threads?
XPerf is your guide here - watch the PDC Video about it, and then take a trace of the misbehaving app. It will tell you exactly what's happening throughout the system, it is extremely powerful.
I like the disk-caching/thrashing suggestions, but if that's not it, here are some scattershot suggestions:
What non-MSVC libraries, if any, are you linking to?
Can your program be modified (#ifdef'd) to run without a GUI? Does the problem occur?
You added ::Sleep(100) after each loop in each thread, right? How many threads are you talking about? A handful or hundreds? How long does each loop take, roughly? What happens if you make that ::Sleep(10000)?
Is your program perhaps doing something else that locks a limited resources (ProcExp can show you what handles are being acquired ... of course you might have difficulty with ProcExp not responding:-[)
Are you sure CriticalSections are userland-only? I recall that was so back when I worked on Windows (or so I believed), but Microsoft could have modified that. I don't see any guarantee in the MSDN article Critical Section Objects (http://msdn.microsoft.com/en-us/library/ms682530%28VS.85%29.aspx) ... and this leads me to wonder: Anti-convoy locks in Windows Server 2003 SP1 and Windows Vista
Hmmm... presumably we're all multi-processor now, so are you setting the spin count on the CS?
How about running a debugging version of one of these OSes and monitoring the kernel debugging output (using DbgView)... possibly using the kernel debugger from the Platform SDK ... if MS still calls it that?
I wonder whether VMMap (another SysInternal/MS utility) might help with the Disk caching hypothesis.
It turns out that this is a bug in the Visual Studio compiler. Using a different compiler resolves the issue entirely.
In my case, I installed and used the Intel C++ Compiler and even with all optimizations disabled I did not see the fully-system hang that I was experiencing w/ the Visual Studio 2005 - 2010 compilers on this library.
I'm not certain as to what is causing the compiler to generate such broken code, but it looks like we'll be buying a copy of the Intel compiler.
It sounds like you're poking around fixing things without knowing what the problem is. Take stackshots. They will tell you what your program is doing when the problem occurs. It might not be easy to get the stackshots if the problem occurs on other machines where you cannot use an IDE or a stack sampler. One possibility is to kill the app and get a stack dump when it's acting up. You need to reproduce the problem in an environment where you can get a stack dump.
Added: You say it performs well on OSX and Linux, and poorly on Windows. I assume the ratio of completion time is some fairly large number, like 10 or 100, if you've even had the patience to wait for it. I said this in the comment, but it is a key point. The program is waiting for something, and you need to find out what. It could be any of the things people mentioned, but it is not random.
Every program, all the time while it runs, has a call stack consisting of a hierarchy of call instructions at specific addresses. If at a point in time it is calculating, the last instruction on the stack is a non-call instruction. If it is in I/O the stack may reach into a few levels of library calls that you can't see into. That's OK. Every call instruction on the stack is waiting. It is waiting for the work it requested to finish. If you look at the call stack, and look at where the call instructions are in your code, you will know what your program is waiting for.
Your program, since it is taking so long to complete, is spending nearly all of its time waiting for something to finish, and as I said, that's what you need to find out. Get a stack dump while it's being slow, and it will give you the answer. The chance that it will miss it is 1/the-slowness-ratio.
Sorry to be so elemental about this, but lots of people (and profiler makers) don't get it. They think they have to measure.