How to avoid HDD thrashing - c++

I am developing a large program which uses a lot of memory. The program is quite experimental and I add and remove big chunks of code all the time. Sometimes I will add a routine that is rather too memory hungry and the hard drive will start thrashing and the program (and the whole system) will slow to a snail's pace. It can easily take 5 mins to shut it down!
What I would like is a mechanism for avoiding this scenario. Either a run time procedure or even something to be done before running the program, which can say something like "If you run this program there is a risk of HDD thrashing - aborting now to avoid slowing to a snail's pace".
Any ideas?
EDIT: Forgot to mention, my program uses multiple threads.

You could consider using SetProcessWorkingSetSize. This would be useful in debugging, because your app will crash with a fatal exception when it runs out of memory instead of dragging your machine into a thrashing situation.
http://msdn.microsoft.com/en-us/library/ms686234%28VS.85%29.aspx
Similar SO question: Set Windows process (or user) memory limit

Windows XP is terrible when there are multiple threads or processes accessing the disk at the same time. This is effectively what you experience when your application begins to swap, as the OS is writing out some pages while reading in others. Windows XP (and Server 2003 for that matter) is utter trash for this. This is a real shame, as it means that swapping is almost synonymous with thrashing on these systems.
Your options:
Microsoft fixed this problem in Vista and Server 2008. So stop using a 9-year-old OS. :)
Use unbuffered I/O to read/write data to a file, and implement your own paging inside your application. Implementing your own "swap" like this enables you to avoid thrashing.
See here for many more details of this problem: How to obtain good concurrent read performance from disk

I'm not familiar with Windows programming, but under Unix you can limit the amount of memory that a program can use with setrlimit(). Maybe there is something similar. The goal is to get the program to abort once it uses too much memory, rather than thrashing. The limit would be a bit less than the total physical memory on the machine. I would guess somewhere between 75% and 90%, but some experimentation would be necessary to find the optimal setting.
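A minimal sketch of the setrlimit() idea, assuming a POSIX system. The helper names cap_address_space and address_space_cap are made up for illustration. Note that RLIMIT_AS caps virtual address space rather than resident memory, so the right value needs the same experimentation mentioned above.

```cpp
#include <sys/resource.h>  // getrlimit, setrlimit (POSIX)
#include <cassert>
#include <cstddef>

// Lower the soft address-space limit so that oversized allocations fail
// (malloc returns NULL, new throws) instead of pushing the machine into swap.
// Hypothetical helper; returns false if the limit could not be changed.
bool cap_address_space(std::size_t max_bytes) {
    rlimit lim{};
    if (getrlimit(RLIMIT_AS, &lim) != 0) return false;
    rlim_t wanted = static_cast<rlim_t>(max_bytes);
    // The soft limit may never exceed the hard limit.
    if (lim.rlim_max != RLIM_INFINITY && wanted > lim.rlim_max) wanted = lim.rlim_max;
    lim.rlim_cur = wanted;
    return setrlimit(RLIMIT_AS, &lim) == 0;
}

// Current soft limit in bytes (RLIM_INFINITY when unlimited).
rlim_t address_space_cap() {
    rlimit lim{};
    getrlimit(RLIMIT_AS, &lim);
    return lim.rlim_cur;
}
```

Calling cap_address_space() once at the top of main(), before any threads are started, should be enough since the limit applies to the whole process.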

Chances are your program could use some memory management. While there are a few programs that do need to hold everything in memory at once, odds are good that with a little bit of foresight you might be able to rework your program to reuse or discard a lot of the memory you need.
Your program will run much faster too. If you are using that much memory, then basically all of your built-in first and second level caches are likely overflowing, meaning the CPU is mostly waiting on memory loads instead of processing your code's instructions.

I'd rather determine reasonable minimum requirements for the computer your program is supposed to run on, and during installation either warn the user if there's not enough memory available, or refuse to install.
Telling him each time he's starting the program is nonsensical.

Related

Release Memory Mapped Memory

I am memory mapping a large file (~200GB) into a single region/view and sequentially writing to it. Every now and then I perform a boost::interprocess::mapped_region::flush(last, current, false).
After a while the process uses up the entire system memory. Which, from what I understand, is normal as it will be releasing the memory as other processes request memory.
This works well on Windows 8. However, running on Windows 7 it doesn't seem to play well with the drivers for AJA video cards and it starts affecting performance (dropping IO packets).
Is there any way I can force the Windows 7 to flush parts of the memory to disk (after the data is written it is only interesting for a few seconds, and remember I am writing sequentially through the entire file), so as not to use up the entire available system memory?
Flushing has nothing to do with reclamation, IYAM. It just makes sure dirty pages are written out (I think you still need a disk sync to make sure it actually /hit the disk/).
So, you're looking for a way to unmap.
Maybe you can use a function like
EmptyWorkingSet to evict as many pages as possible
SetProcessWorkingSetSize to temporarily reduce the allowed process working set.
Of course, in a more portable fashion, you might just get away with unmapping and remapping. If the access is to spinning HDD and remains sequential across remaps, there might not be a performance penalty (there might be though, if the kernel prefetched data e.g. due to madvise() or the Windows equivalent thereof).
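To make the unmap-and-remap idea concrete, here is a rough POSIX sketch (not the boost::interprocess code from the question). It maps one window at a time, writes it, syncs it, and unmaps it before moving on. The function name write_windowed and the memset stand-in for the real write are assumptions for illustration.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Write `total` bytes to `path` through a sliding mapped window, so that
// only one window is mapped at any time. `window` must be a non-zero
// multiple of the page size because mmap offsets must be page aligned.
bool write_windowed(const char* path, std::size_t total, std::size_t window, char fill) {
    assert(window > 0 && window % static_cast<std::size_t>(sysconf(_SC_PAGESIZE)) == 0);
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    if (ftruncate(fd, static_cast<off_t>(total)) != 0) { close(fd); return false; }

    bool ok = true;
    for (std::size_t off = 0; ok && off < total; off += window) {
        std::size_t len = std::min(window, total - off);
        void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
                       static_cast<off_t>(off));
        if (p == MAP_FAILED) { ok = false; break; }
        std::memset(p, fill, len);         // stand-in for the real sequential write
        ok = msync(p, len, MS_SYNC) == 0;  // write the dirty pages out to the file
        munmap(p, len);                    // drop this window before mapping the next
    }
    close(fd);
    return ok;
}
```

Whether the pages are actually dropped from the system cache after munmap is up to the OS; the sketch only guarantees that the process no longer holds them mapped.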

Memory usage and minimizing

We have a fairly graphically intensive application that uses the FOX toolkit and OpenSceneGraph, and of course C++. I notice that after running the application for some time, it seems there is a memory leak. However when I minimize, a substantial amount of memory appears to be freed (as witnessed in the Windows Task Manager). When the application is restored, the memory usage climbs but plateaus to an amount less than what it was before the minimize.
Is this a huge indicator that we have a nasty memory leak? Or might this be something with how Windows handles graphical applications? I'm not really sure what is going on.
What you are seeing is simply memory caching. When you call free()/delete/delete[], most implementations won't actually return this memory to the OS. They will keep it to be returned in a much faster fashion the next time you request it. When your application is minimized, they will free this memory because you won't be requesting it anytime soon.
It's unlikely that you have an actual memory leak. Task Manager is not particularly accurate, and there's a lot of behaviour that can change the apparent amount of memory that you're using- even if you released it properly. You need to get an actual memory profiler to take a look if you're still concerned.
Also, yes, Windows does a lot of things when minimizing applications. For example, if you use Direct3D, there's a device loss. Thread timings change as well. Windows is designed to give the user the best experience in a single application at a time and may well take extra cached/buffered resources from your application to do it.
No, the effect you are seeing means that your platform releases resources when the application is not visible (good thing), and that seems to clear some cached data, which is not restored after restoring the window.
Doing this may help you find memory leaks. If the minimum amount of memory (while minimized) used by the app grows over time, that would suggest a leak.
You are looking at the working set size of your program. The sum of the virtual memory pages of your program that are actually in RAM. When you minimize your main window, Windows assumes the user won't be interested in the program for a while and aggressively trims the working set. Copying the pages in RAM to the paging file and chucking them out, making room for the other process that the user is likely to start or to switch to.
This number will also go down automatically when the user starts another program that needs a lot of RAM. Windows chucks out your pages to make room for this program. It picks pages that your program hasn't used for a while, making it likely that this doesn't affect the perf of your program much.
When you switch back to your program, Windows needs to swap pages back into RAM. But this is on-demand, it only pages-in pages that your program actually uses. Which will normally be less than what it used before, no need to swap the initialization code of your program back in for example.
Needless to say perhaps, the number has absolutely nothing to do with the memory usage of your program, it is merely a statistical number.
Private bytes would be a better indicator for a memory leak. Taskmgr doesn't show that, SysInternals' Process Explorer tool does. It still isn't a great indicator because that number also includes any blocks in the heap that were freed by your program and were added to the list of free blocks, ready to be re-used. There is no good way to measure actual memory in use, read the small print for the HeapWalk() API function for the kind of trouble that causes.
The memory and heap manager in Windows are far too sophisticated to draw conclusions from the available numbers. Use a leak detection tool, like the VC debug allocator (crtdbg.h).

Dealing with large amounts of data in c++

I have an application that sometimes will utilize a large amount of data. The user has the option to load in a number of files which are used in a graphical display. If the user selects more data than the OS can handle, the application crashes pretty hard. On my test system, that number is about the 2 gigs of physical RAM.
What is a good way to handle this situation? I get the "bad alloc" thrown from new and tried trapping that but I still run into a crash. I feel as if I'm treading in nasty waters loading this much data but it is a requirement of this application to handle this sort of large data load.
Edit: I'm testing under a 32 bit Windows system for now but the application will run on various flavors of Windows, Sun and Linux, mostly 64 bit but some 32.
The error handling is not strong: It simply wraps the main instantiation code with a try catch block, the catch looking for any exception per another peer's complaint of not being able to trap the bad_alloc every time.
I think you guys are right, I need a memory management system that doesn't load all of this data into the RAM, it just seems like it.
Edit2: Luther said it best. Thanks guy. For now, I just need a way to prevent a crash which with proper exception handling should be possible. But down the road I'll be implementing that accepted solution.
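For the "proper exception handling" part, one possible shape is to catch the failure right at the allocation that can fail, rather than around the whole of main. This is only a sketch: try_reserve is a hypothetical helper, and it assumes the data ends up in a std::vector<char>.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <stdexcept>
#include <vector>

// Try to make room for `bytes` in `buffer`. On failure the buffer is left
// as it was and false is returned, so the caller can refuse the load,
// tell the user, and keep running.
bool try_reserve(std::vector<char>& buffer, std::size_t bytes) {
    try {
        buffer.reserve(bytes);
        return true;
    } catch (const std::bad_alloc&) {
        return false;  // the allocator could not supply the memory
    } catch (const std::length_error&) {
        return false;  // the request exceeds vector::max_size()
    }
}
```

Catching close to the allocation matters because by the time a bad_alloc reaches a catch block around main, half-built objects may already have been torn down and there is little left to recover.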
There is the STXXL library which offers STL-like containers for large data sets.
http://stxxl.sourceforge.net/
Change "large" into "huge". It is designed and optimized for multicore processing of data sets that fit on terabyte-disks only. This might suffice for your problem, or the implementation could be a good starting point to tailor your own solution.
It is hard to say anything about your application crashing, because there are numerous hiccups involved when it comes to tight memory conditions: You could hit a hard address space limit (for example, by default 32-bit Windows only has 2GB of address space per user process; this can be changed, see http://www.fmepedia.com/index.php/Category:Windows_3GB_Switch_FAQ), or be eaten alive by the OOM killer (not a mythical beast, see http://lwn.net/Articles/104179/).
What I'd suggest in any case is to think about a way to keep the data on disk and treat the main memory as a kind of Level-4 cache for the data. For example if you have, say, blobs of data, then wrap these in a class which can transparently load the blobs from disk when they are needed and which registers with some kind of memory manager that can ask some of the blob-holders to free up their memory before the memory conditions become unbearable. A buffer cache thus.
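A very small sketch of such a buffer cache, assuming each blob lives in its own file and that a least-recently-used policy is acceptable. The class name BlobCache, its methods and the byte budget are all invented for illustration; a real version would need error handling and probably thread safety.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Keeps blobs in RAM up to a byte budget and reloads them from disk on demand.
class BlobCache {
public:
    explicit BlobCache(std::size_t budget_bytes) : budget_(budget_bytes) {}

    // Returns the blob's bytes, loading the file on a miss (an unreadable
    // file yields an empty blob). The reference is only valid until the
    // next call to get().
    const std::vector<char>& get(const std::string& path) {
        auto it = index_.find(path);
        if (it != index_.end()) {
            lru_.splice(lru_.begin(), lru_, it->second);  // mark as most recently used
            return it->second->data;
        }
        std::ifstream in(path, std::ios::binary);
        std::vector<char> data((std::istreambuf_iterator<char>(in)),
                               std::istreambuf_iterator<char>());
        used_ += data.size();
        lru_.push_front(Entry{path, std::move(data)});
        index_[path] = lru_.begin();
        evict();
        return lru_.front().data;
    }

    std::size_t resident_bytes() const { return used_; }
    bool is_resident(const std::string& path) const { return index_.count(path) != 0; }

private:
    struct Entry { std::string path; std::vector<char> data; };

    // Drop least recently used blobs until under budget; the newest blob is kept.
    void evict() {
        while (used_ > budget_ && lru_.size() > 1) {
            used_ -= lru_.back().data.size();
            index_.erase(lru_.back().path);
            lru_.pop_back();
        }
    }

    std::size_t budget_;
    std::size_t used_ = 0;
    std::list<Entry> lru_;
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

The display code would then ask the cache for a blob each time it needs one, instead of holding on to the data itself.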
The user has the option to load in a number of files which are used in a graphical display.
Usual trick is not to load the data into memory directly, but rather use the memory mapping mechanism to make the files look like memory.
You need to make sure that the memory mapping is done in read-only mode to allow the OS to evict it from RAM if it is needed for something else.
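As a rough illustration of a read-only mapping, here is a POSIX sketch (on Windows the counterparts would be CreateFileMapping and MapViewOfFile). MappedFile, map_read_only and unmap are hypothetical names, not part of any library.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

// A read-only view of a whole file. The pages are never dirtied, so the OS
// can discard them under memory pressure and re-read them from the file.
struct MappedFile {
    const char* data = nullptr;
    std::size_t size = 0;
};

// Returns an empty MappedFile if the file is missing, empty, or cannot be mapped.
MappedFile map_read_only(const char* path) {
    MappedFile m;
    int fd = open(path, O_RDONLY);
    if (fd < 0) return m;
    struct stat st{};
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
        void* p = mmap(nullptr, static_cast<std::size_t>(st.st_size),
                       PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            m.data = static_cast<const char*>(p);
            m.size = static_cast<std::size_t>(st.st_size);
        }
    }
    close(fd);  // the mapping stays valid after the descriptor is closed
    return m;
}

void unmap(MappedFile& m) {
    if (m.data) munmap(const_cast<char*>(m.data), m.size);
    m = MappedFile{};
}
```

Keep in mind that a mapping still consumes address space, so in a 32-bit process large files would have to be mapped in pieces.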
If the user selects more data than the OS can handle, the application crashes pretty hard.
Depending on the OS, either the application is missing some memory allocation error handling or you really are hitting the limit of available virtual memory.
Some OSs also have an administrative limit on how large the heap of application can grow.
On my test system, that number is about the 2 gigs of physical RAM.
It sounds like:
your application is 32-bits and
your OS uses the 2GB/2GB virtual memory split.
To avoid hitting the limit, you need to:
upgrade your app and OS to 64-bit or
tell the OS (IIRC patch for Windows; most Linuxes already have it) to use the 3GB/1GB virtual memory split. Some 32-bit OSs use a 2GB/2GB memory split: 2GB of virtual memory for the kernel and 2GB for the user application. A 3/1 split means 1GB of VM for the kernel, 3GB for the user application.
How about maintaining a header table instead of loading the entire data? Load the actual page when the user requests the data.
Also use some data compression algorithms (like 7zip, znet etc.) which reduce the file size. (In my project they reduced the size from 200MB to 2MB)
I mention this because it was only briefly mentioned above, but it seems a "file paging system" could be a solution. These systems read large data sets in "chunks" by breaking the files into pieces. Once written, they generally "just work" and you hopefully won't have to tinker with them anymore.
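The chunked reading described here can be sketched in a few lines of standard C++. The function name for_each_chunk and the callback shape are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Read `path` in fixed-size chunks and hand each chunk to `on_chunk`.
// Only one chunk is held in memory at a time. Returns the total bytes read.
template <typename Fn>
std::uint64_t for_each_chunk(const std::string& path, std::size_t chunk_size, Fn on_chunk) {
    std::ifstream in(path, std::ios::binary);
    std::vector<char> buf(chunk_size);
    std::uint64_t total = 0;
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        std::streamsize got = in.gcount();  // the last chunk may be short
        if (got <= 0) break;
        on_chunk(buf.data(), static_cast<std::size_t>(got));
        total += static_cast<std::uint64_t>(got);
    }
    return total;
}
```

The callback sees each chunk exactly once, so anything that has to outlive the chunk (an index entry, a summary value) must be copied out inside it.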
Reading Large Files
Variable Length Data in File--Paging
New Link below with very good answer.
Handling Files greater than 2 GB
Search term: "file paging lang:C++" add large or above 2GB for more. HTH
Not sure if you are hitting it or not, but if you are using Linux, malloc will typically not fail, and operator new will typically not throw bad_alloc. This is because Linux will overcommit, and instead kill your process when it decides the system doesn't have enough memory, possibly at a page fault.
See: Google search for "oom killer".
You can disable this behavior with:
echo 2 > /proc/sys/vm/overcommit_memory
Upgrade to a 64-bit CPU, 64-bit OS and 64-bit compiler, and make sure you have plenty of RAM.
A 32-bit app is restricted to 2GB of memory (regardless of how much physical RAM you have). This is because a 32-bit pointer can address 2^32 bytes == 4GB of virtual memory. 20 years ago this seemed like a huge amount of memory, so the original OS designers allocated 2GB to the running application and reserved 2GB for use by the OS. There are various tricks you can do to access more than 2GB, but they're complex. It's probably easier to upgrade to 64-bit.

How to optimize paging for large in memory database

I have an application where the entire database is implemented in memory using a stl-map for each table in the database.
Each item in the stl-map is a complex object with references to other items in the other stl-maps.
The application works with a large amount of data, so it uses more than 500 MByte RAM. Clients are able to contact the application and get a filtered version of the entire database. This is done by running through the entire database, and finding items relevant for the client.
When the application has been running for an hour or so, then Windows 2003 SP2 starts to page out parts of the RAM for the application (even though there is 16 GByte RAM on the machine).
After the application has been partly paged out then a client logon takes a long time (10 mins) because it now generates a page fault for each pointer lookup in the stl-map. If running the client logon a second time right after then it is fast (few secs) because all the memory is now back in RAM.
I can see it is possible to tell Windows to lock memory in RAM, but this is generally only recommended for device drivers, and only for "small" amounts of memory.
I guess a poor man's solution could be to loop through the entire memory database, and thus tell Windows we are still interested in keeping the datamodel in RAM.
I guess another poor man's solution could be to disable the pagefile completely on Windows.
I guess the expensive solution would be a SQL database, and then rewrite the entire application to use a database layer. Then hopefully the database system will have implemented means for fast access.
Are there other more elegant solutions?
This sounds like either a memory leak, or a serious fragmentation problem. It seems to me that the first step would be to figure out what's causing 500 MB of data to use up 16 GB of RAM and still want more.
Edit: Windows has a working set trimmer that actively attempts to page out idle data. The basic idea is that it goes through and marks pages as being available, but leaves the data in them (and the virtual memory manager knows what data is in them). If, however, you attempt to access that memory before it's allocated to other purposes, it'll be marked as being in use again, which will normally prevent it from being paged out.
If you really think this is the source of your problem, you can indirectly control the working set trimmer by calling SetProcessWorkingSetSize. At least in my experience, this is only rarely of much use, but you may be in one of those unusual situations where it's really helpful.
As @Jerry Coffin said, it really sounds like your actual problem is a memory leak. Fix that.
But for the record, none of your "poor man's solutions" would work. At all.
Windows pages out some of your data because there's not room for it in RAM.
Looping through the entire memory database would load in every byte of the data model, yes... which would cause other parts of it to be paged out. In the end, you'd generate a lot of page faults, and the only difference in the end would be which parts of the data structure are paged out.
Disabling the page file? Yes, if you think a hard crash is better than low performance. Windows doesn't page data out because it's fun. It does that to handle situations where it would otherwise run out of memory. If you disable the pagefile, the app will just crash when it would otherwise page out data.
If your dataset really is so big it doesn't fit in memory, then I don't see why an SQL database would be especially "expensive". Unlike your current solution, databases are optimized for this purpose. They're meant to handle datasets too large to fit in memory, and to do this efficiently.
It sounds like you have a memory leak. Fixing that would be the elegant, efficient and correct solution.
If you can't do that, then either
throw more RAM at the problem (the app ends up using 16GB? Throw 32 or 64GB at it then), or
switch to a format that's optimized for efficient disk access (A SQL database probably)
We have a similar problem and the solution we chose was to allocate everything in a shared memory block. AFAIK, Windows doesn't page this out. However, using stl-map here is not for the faint of heart either and was beyond what we required.
We are using Boost Shared Memory to implement this for us and it works well. Follow examples closely and you will be up and running quickly. Boost also has Boost.MultiIndex that will do a lot of what you want.
For a no-cost SQL solution have you looked at SQLite? They have an option to run as an in-memory database.
Good luck, sounds like an interesting application.
I have an application where the entire database is implemented in memory using a stl-map for each table in the database.
That's the start of the end: STL's std::map is extremely memory inefficient. Same applies to std::list. Every element would be allocated separately causing rather serious memory waste. I often use std::vector + sort() + find() instead of std::map in applications where it is possible (more searches than modifications) and I know in advance memory usage might become an issue.
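A sketch of the sorted-vector alternative, assuming integer keys and a load-then-query usage pattern. Row, finalize and find_row are hypothetical names, and std::lower_bound is used for the lookup so that searches are logarithmic rather than linear.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using Row = std::pair<int, std::string>;  // hypothetical (key, value) record

// Sort once after bulk loading. All rows then sit in one contiguous block,
// with no per-element node allocation as in std::map.
void finalize(std::vector<Row>& table) {
    std::sort(table.begin(), table.end(),
              [](const Row& a, const Row& b) { return a.first < b.first; });
}

// Binary search by key; returns nullptr when the key is absent.
const Row* find_row(const std::vector<Row>& table, int key) {
    auto it = std::lower_bound(table.begin(), table.end(), key,
                               [](const Row& r, int k) { return r.first < k; });
    return (it != table.end() && it->first == key) ? &*it : nullptr;
}
```

Inserting into the middle of a sorted vector is linear, which is why this only pays off when searches heavily outnumber modifications.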
When the application has been running for an hour or so, then Windows 2003 SP2 starts to page out parts of the RAM for the application (even though there is 16 GByte RAM on the machine).
Hard to tell without knowing how your application is written. Windows has the feature to unload from RAM whatever memory of idle applications can be unloaded. But that normally affects memory mapped files and the like.
Otherwise, I would strongly suggest reading up on the Windows memory management documentation. It is not very easy to understand, yet Windows has all sorts and types of memory available to applications. I never had luck with it, but probably in your application using a custom std::allocator would work.
I can believe it is the fault of flawed pagefile behaviour - I've run my laptops mostly with the pagefile turned off since NT 4.0. In my experience, at least up to XP Pro, Windows intrusively swaps pages out just to provide the dubious benefit of having a really-really-slow extension to the maximum working set space.
Ask what benefit swapping to hard disk is achieving with 16 gigabytes of real RAM available. If your working set is so big as to need more than 10 gigs of virtual memory, then once swapping is actually required processes will take anything from a bit longer to thousands of times longer to complete. On Windows the untameable file system cache seems to antagonise the relationship.
Now when I (very) occasionally run out of working set on my XP laptops, there is no traffic jam, the guilty app just crashes. A utility to suspend memory-glugging processes before that time and raise an alert would be nice, but there is no such thing, just a violation, a crash, and sometimes explorer.exe goes down too.
Pagefiles - who needs 'em
---- Edit
Given snakefoot explanation, the problem is swapping out memory that is not used for a longer period of time and due to this not having the data in memory when needed. This is the same as this:
Can I tell Windows not to swap out a particular processes’ memory?
and VirtualLock function should do its job:
http://msdn.microsoft.com/en-us/library/aa366895(VS.85).aspx
---- Previous answer
First of all you need to distinguish between memory leak and memory need problems.
If you have a memory leak then it would be bigger effort to convert entire application to SQL than to debug the application.
SQL cannot be faster than a well designed, domain specific in-memory database and if you have bugs, chances are you will have different ones in an SQL version as well.
If this is a memory need problem, then you will need to switch to SQL anyway and this sounds like a good moment.

Random Complete System Unresponsiveness Running Mathematical Functions

I have a program that loads a file (anywhere from 10MB to 5GB) a chunk at a time (ReadFile), and for each chunk performs a set of mathematical operations (basically calculates the hash).
After calculating the hash, it stores info about the chunk in an STL map (basically <chunkID, hash>) and then writes the chunk itself to another file (WriteFile).
That's all it does. This program will cause certain PCs to choke and die. The mouse begins to stutter, the task manager takes > 2 min to show, ctrl+alt+del is unresponsive, running programs are slow.... the works.
I've done literally everything I can think of to optimize the program, and have triple-checked all objects.
What I've done:
Tried different (less intensive) hashing algorithms.
Switched all allocations to nedmalloc instead of the default new operator
Switched from std::map to unordered_set, found the performance to still be abysmal, so I switched again to Google's dense_hash_map.
Converted all objects to store pointers to objects instead of the objects themselves.
Caching all Read and Write operations. Instead of reading a 16k chunk of the file and performing the math on it, I read 4MB into a buffer and read 16k chunks from there instead. Same for all write operations - they are coalesced into 4MB blocks before being written to disk.
Run extensive profiling with Visual Studio 2010, AMD Code Analyst, and perfmon.
Set the thread priority to THREAD_MODE_BACKGROUND_BEGIN
Set the thread priority to THREAD_PRIORITY_IDLE
Added a Sleep(100) call after every loop.
Even after all this, the application still results in a system-wide hang on certain machines under certain circumstances.
Perfmon and Process Explorer show minimal CPU usage (with the sleep), no constant reads/writes from disk, few hard pagefaults (and only ~30k pagefaults in the lifetime of the application on a 5GB input file), little virtual memory (never more than 150MB), no leaked handles, no memory leaks.
The machines I've tested it on run Windows XP - Windows 7, x86 and x64 versions included. None have less than 2GB RAM, though the problem is always exacerbated under lower memory conditions.
I'm at a loss as to what to do next. I don't know what's causing it - I'm torn between CPU or Memory as the culprit. CPU because without the sleep and under different thread priorities the system performance changes noticeably. Memory because there's a huge difference in how often the issue occurs when using unordered_set vs Google's dense_hash_map.
What's really weird? Obviously, the NT kernel design is supposed to prevent this sort of behavior from ever occurring (a user-mode application driving the system to this sort of extreme poor performance!?)..... but when I compile the code and run it on OS X or Linux (it's fairly standard C++ throughout) it performs excellently even on poor machines with little RAM and weaker CPUs.
What am I supposed to do next? How do I know what the hell it is that Windows is doing behind the scenes that's killing system performance, when all the indicators are that the application itself isn't doing anything extreme?
Any advice would be most welcome.
I know you said you had monitored memory usage and that it seems minimal here, but the symptoms sound very much like the OS thrashing like crazy, which would definitely cause general loss of OS responsiveness like you're seeing.
When you run the application on a file say 1/4 to 1/2 the size of available physical memory, does it seem to work better?
What I suspect may be happening is that Windows is "helpfully" caching your disk reads into memory and not giving up that cache memory to your application for use, forcing it to go to swap. Thus, even though swap use is minimal (150MB), it's going in and out constantly as you calculate the hash. This then brings the system to its knees.
Some things to check:
Antivirus software. These often scan files as they're opened to check for viruses. Is your delay occurring before any data is read by the application?
General system performance. Does copying the file using Explorer also show this problem?
Your code. Break it down into the various stages. Write a program that just reads the file, then one that reads and writes the files, then one that just hashes random blocks of ram (i.e. remove the disk IO part) and see if any particular step is problematic. If you can get a profiler then use this as well to see if there are any slow spots in your code.
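One way to set up the stage-by-stage test is a tiny timing harness plus a hash-only stage that never touches the disk. This is only a sketch: FNV-1a stands in for whatever hash the real program uses, and time_stage_ms and hash_in_memory are made-up names.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// 64-bit FNV-1a, a stand-in for the real hashing algorithm.
std::uint64_t fnv1a(const unsigned char* data, std::size_t len) {
    std::uint64_t h = 14695981039346656037ull;  // FNV offset basis
    for (std::size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 1099511628211ull;                  // FNV prime
    }
    return h;
}

// Run one stage in isolation and return how long it took, in milliseconds.
template <typename Fn>
double time_stage_ms(Fn stage) {
    auto start = std::chrono::steady_clock::now();
    stage();
    std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Hash-only stage: hashes `chunks` blocks of RAM with no disk I/O at all.
std::uint64_t hash_in_memory(std::size_t chunk_size, std::size_t chunks) {
    std::vector<unsigned char> buf(chunk_size, 0xAB);
    std::uint64_t acc = 0;
    for (std::size_t i = 0; i < chunks; ++i) acc ^= fnv1a(buf.data(), buf.size());
    return acc;
}
```

A read-only stage and a read-plus-write stage can be timed with the same time_stage_ms wrapper; if the hash-only run leaves the machine responsive while the others do not, the disk path is the place to look.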
EDIT
More ideas. Perhaps your program is holding on to the GDI lock too much. This would explain everything else being slow without high CPU usage. Only one app at a time can have the GDI lock. Is this a GUI app, or just a simple console app?
You also mentioned RtlEnterCriticalSection. This is a costly operation, and can hang the system quite easily, i.e. mismatched Enters and Leaves. Are you multi-threading at all? Is the slow down due to race conditions between threads?
XPerf is your guide here - watch the PDC Video about it, and then take a trace of the misbehaving app. It will tell you exactly what's happening throughout the system, it is extremely powerful.
I like the disk-caching/thrashing suggestions, but if that's not it, here are some scattershot suggestions:
What non-MSVC libraries, if any, are you linking to?
Can your program be modified (#ifdef'd) to run without a GUI? Does the problem occur?
You added ::Sleep(100) after each loop in each thread, right? How many threads are you talking about? A handful or hundreds? How long does each loop take, roughly? What happens if you make that ::Sleep(10000)?
Is your program perhaps doing something else that locks a limited resource? (ProcExp can show you what handles are being acquired ... of course you might have difficulty with ProcExp not responding :-[)
Are you sure CriticalSections are userland-only? I recall that was so back when I worked on Windows (or so I believed), but Microsoft could have modified that. I don't see any guarantee in the MSDN article Critical Section Objects (http://msdn.microsoft.com/en-us/library/ms682530%28VS.85%29.aspx) ... and this leads me to wonder: Anti-convoy locks in Windows Server 2003 SP1 and Windows Vista
Hmmm... presumably we're all multi-processor now, so are you setting the spin count on the CS?
How about running a debugging version of one of these OSes and monitoring the kernel debugging output (using DbgView)... possibly using the kernel debugger from the Platform SDK ... if MS still calls it that?
I wonder whether VMMap (another SysInternal/MS utility) might help with the Disk caching hypothesis.
It turns out that this is a bug in the Visual Studio compiler. Using a different compiler resolves the issue entirely.
In my case, I installed and used the Intel C++ Compiler and even with all optimizations disabled I did not see the full-system hang that I was experiencing w/ the Visual Studio 2005 - 2010 compilers on this library.
I'm not certain as to what is causing the compiler to generate such broken code, but it looks like we'll be buying a copy of the Intel compiler.
It sounds like you're poking around fixing things without knowing what the problem is. Take stackshots. They will tell you what your program is doing when the problem occurs. It might not be easy to get the stackshots if the problem occurs on other machines where you cannot use an IDE or a stack sampler. One possibility is to kill the app and get a stack dump when it's acting up. You need to reproduce the problem in an environment where you can get a stack dump.
Added: You say it performs well on OSX and Linux, and poorly on Windows. I assume the ratio of completion time is some fairly large number, like 10 or 100, if you've even had the patience to wait for it. I said this in the comment, but it is a key point. The program is waiting for something, and you need to find out what. It could be any of the things people mentioned, but it is not random.
Every program, all the time while it runs, has a call stack consisting of a hierarchy of call instructions at specific addresses. If at a point in time it is calculating, the last instruction on the stack is a non-call instruction. If it is in I/O the stack may reach into a few levels of library calls that you can't see into. That's OK. Every call instruction on the stack is waiting. It is waiting for the work it requested to finish. If you look at the call stack, and look at where the call instructions are in your code, you will know what your program is waiting for.
Your program, since it is taking so long to complete, is spending nearly all of its time waiting for something to finish, and as I said, that's what you need to find out. Get a stack dump while it's being slow, and it will give you the answer. The chance that it will miss it is 1/the-slowness-ratio.
Sorry to be so elemental about this, but lots of people (and profiler makers) don't get it. They think they have to measure.