How do I set the ideal QPixmapCache::cacheLimit? - c++

I have just started using QPixmapCache and I was wondering, since there is not much documentation, about how to adjust the size based on the system the application is running on.
Some users might have lots of free memory while others have very little. I have no idea what the best setting would be.
What would be the best way to detect the system (free) RAM and adjust the cache size to fit?
http://doc.trolltech.com/4.4/qpixmapcache.html#setCacheLimit

To detect free RAM in Windows, you can use the GlobalMemoryStatus function.
I'm not sure if this will help you size the pixmap cache; perhaps you will need to do some performance measurements and create a lookup table.

Note that QPixmap is window-system specific. This means that QPixmap likely corresponds to graphics card memory more than the general RAM. Just like any other cache, the size of it should "as much as it needs to be, but not more".
The best way to tune the cache is to profile your application running on a typical machine (of the target users). See if the cache starts to trash old pixmaps and get cache misses.

Related

Is there a way to code data directly to the hard drive (similar to how one can do with RAM)?

My question concerns C/C++. It is possible to manipulate the data on the RAM with pretty great flexibility. You can also give the GPU direct commands using OpenGL, allowing one to manipulate VRAM as well.
My curiosity is whether it is possible to do this to the hard drive (even though this would likely be a horrible idea with many, many possibilities of corrupting existing data). The logic of my question comes from an assumption that the hard drive is similar to RAM and VRAM (bytes of data), but just accesses data slower.
I'm not asking about how to perform file IO, but instead how to directly modify bytes of memory on the hard drive (maybe via some sort of "hard-drive pointer").
If my assumption is totally off, a detailed correction about how the hard drive's data storage is different from RAM or VRAM would be very helpful. Thank you!
Modern operating systems in combination with modern CPUs offer the ability to memory-map disk clusters to memory pages.
The memory pages are initially marked as invalid, and as soon as you try to access them an invalid page "trap" or "interrupt" occurs, which is handled by the operating system, which loads the corresponding cluster into that memory page.
If you write to that page there is either a hardware-supported "dirty" bit, or another interrupt mechanism: the memory page is initially marked as read-only, so the first time you try to write to it there is another interrupt, which simply marks the page as dirty and turns it read-write. Then, you know that the page needs to be flushed to disk at a convenient time.
Note that reading and writing is usually done via Direct Memory Access (DMA) so the CPU is free to do other things while the pages are being transferred.
So, yes, you can do it, either with the help of the operating system, or by writing all that very complex code yourself.
Not for you. Being able to write directly to the hard drive would give you infinite potential to mess up things beyond all recognition. (The technical term is FUBAR, and the F doesn't stand for Mess).
And if you write hard disk drivers, I sincerely hope you are not trying to ask for help here.

Limit buffer cache used for mmap

I have a data structure that I'd like to rework to page out on-demand. mmap seems like an easy way to run some initial experiments. However, I want to limit the amount of buffer cache that the mmap uses. The machine has enough memory to page the entire data structure into cache, but for test reasons (and some production reasons too) I don't want to allow it to do that.
Is there a way to limit the amount of buffer cache used by mmap?
Alternatively, an mmap alternative that can achieve something similar and still limit memory usage would work too.
From my understanding, it is not possible. Memory mapping is controlled by the operating system. The kernel will make the decisions how to use the available memory in the best way, but it looks at the system in total. I'm not aware that quotas for caches on a process level are supported (at least, I have not seen such APIs in Linux or BSD).
There is madvise to give the kernel hints, but it does not support to limit the cache used for one process. You can give it hints like MADV_DONTNEED, which will reduce the pressure on the cache of other applications, but I would expect that it will do more harm than good, as it will most likely make caching less efficient, which will lead to more IO load on the system in total.
I see only two alternatives. One is trying to solve the problem at the OS level, and the other is to solve it at the application level.
At the OS level, I see two options:
You could run a virtual machine, but most likely this is not what you want. I would also expect that it will not improve the overall system performance. Still, it would be at least a way to define upper limits on the memory consumption.
Docker is the another idea that comes to mind, also operating at the OS level, but to the best of my knowledge, it does not support defining cache quotas. I don't think it will work.
That leaves only one option, which is to look at the application level. Instead of using memory mapped files, you could use explicit file system operations. If you need to have full control over the buffer, I think it is the only practical option. It is more work than memory mapping, and it is also not guaranteed to perform better.
If you want to stay with memory mapping, you could also map only parts of the file in memory and unmap other parts when you exceed your memory quota. It also has the same problem as the explicit file IO operations (more implementation work and non-trivial tuning to find a good caching strategy).
Having said that, you could question the requirement to limit the cache memory usage. I would expect that the kernel does a pretty good job at allocating memory resources in a good way. At least, it will likely be better than the solutions that I have sketched. (Explicit file IO, plus an internal cache, might be fast, but it is not trivial to implement and tune. Here, is a comparison of the trade-offs: mmap() vs. reading blocks.)
During testing, you could run the application with ionice -c 3 and nice -n 20 to somewhat reduce the impact on the other productive applications.
There is also a tool called nocache. I never used it but when reading through its documentation, it seems somewhat related to your question.
It might be possible to accomplish this through the use of mmap() and Linux Control Groups (more generally, here or here). Once installed, you have the ability to create arbitrary limits on the amount of, among other things, physical memory used by a process. As an example, here we limit the physical memory to 128 megs and swap memory to 256 megs:
cgcreate -g memory:/limitMemory
echo $(( 128 * 1024 * 1024 )) > /sys/fs/cgroup/memory/limitMemory/memory.limit_in_bytes
echo $(( 256 * 1024 * 1024 )) > /sys/fs/cgroup/memory/limitMemory/memory.memsw.limit_in_bytes
I would go the route of only map parts of the file at a time so you can retain full control on exactly how much memory is used.
you may use ipc shared memory segment, you will be the master of your memory segments.

Dumping buffered data when memory limit is near

My application buffers data for likely requests in the background. Currently I limit the size of the buffer based on a command-line parameter, and begin dumping less-used data when we hit this limit. This is not ideal because it relies on the user to specify a performance-critical parameter. Is there a better way to handle this? Is there a way to automatically monitor system memory use and dump the oldest/least-recently-used data before the system starts to thrash?
A complicating factor here is that my application runs on Linux, OSX, and Windows. But I'll take a good way to do this on only one platform over nothing.
Your best bet would likely be to monitor your applications working set/resident set size, and try to react when it doesn't grow after your allocations. Some pointers on what to look for:
Windows: GetProcessMemoryInfo
Linux: /proc/self/statm
OS X: task_info()
Windows also has GlobalMemoryStatusEx which gives you a nice Available Physical Memory figure.
I like your current solution. Letting the user decide is good. It's not obvious everyone would want the buffer to be as big as possible, is it? If you do invest in implemting some sort of memory monitor for automatically adjusting the buffer/cache size, at least let the user choose between the user set limit and the automatic/dynamic one.
I know this isn't a direct answer, but I'd say step back a bit and maybe don't do this.
Even if you have the API to see current physical memory usage, that's not enough to choose an ideal cache size. That would depend on your typical and future workloads for both the program and the machine (and the overall system of all clients running this program + the server(s) they're querying), the platform's caching behavior, whether the system should be tuned for throughput or latency, and so on. In a tight memory situation, you're going to be competing for memory with other opportunistic caches, including the OS's disk cache. On the one hand, you want to be exerting some pressure on them, to force out other low-value data. On the other hand, if you get greedy while there's plenty of memory, you're going to be affecting the behavior of other adaptive caches.
And with speculative caching/prefetching, the LRU value function is odd: you will (hopefully) fetch the most-likely-to-be-called data first, and less-likely data later, so the LRU data in your prefetch cache may be less valuable than older data. This could lead to perverse behavior in the systemwide set of caches by artificially "heating up" less commonly used data.
It seems unlikely that your program would be able to make a cache size choice better than a simple fixed size, perhaps scaled based on the size of overall physical memory on the machine. And there's very little chance it could beat a sysadmin who knows the machine's typical workload and its performance goals.
Using an adaptive cache sizing strategy means that your program's resource usage is going to be both variable and unpredictable. (With respect to both memory and the I/O and server requests used to populate that prefetch cache.) For a lot of server situations, that's not good. (Especially in HPC or DB servers, which this sounds like it might be for, or a high-utilization/high-throughput environment.) Consistency, configurability, and availability are often more important than maximum resource utilization. And locality of reference tends to fall off quickly, so you're likely getting very diminishing returns with larger cache sizes. If this is going to be used server-side, at least leave the option for explicit control of cache sizes, and probably make that the default, if not only, option.
There is a way: it is called virtual memory (vm). All three operating systems listed will use virtual memory (vm), unless there is no hardware support (which may be true in embedded systems). So I will assume that vm support is present.
Here is a quote from the architecture notes of the Varnish project:
The really short answer is that computers do not have two kinds of storage any more.
I would suggest you read the full text here: http://www.varnish-cache.org/trac/wiki/ArchitectNotes
It is a good read, and I believe will answer your question.
You could attempt to allocate some large-ish block of memory then check for a memory allocation exception. If the exception occurs, dump data. The problem is, this will only work when all system memory (or process limit) is reached. This means your application is likely to start swapping.
try {
char *buf = new char[10 * 1024 * 1024]; // 10 megabytes
free(buf);
} catch (const std::bad_alloc &) {
// Memory allocation failed - clean up old buffers
}
The problems with this approach are:
Running out of system memory can be dangerous and cause random applications to be shut down
Better memory management might be a better solution. If there is data that can be freed, why has it not already been freed? Is there a periodic process you could run to clean up unneeded data?

How to optimize paging for large in memory database

I have an application where the entire database is implemented in memory using a stl-map for each table in the database.
Each item in the stl-map is a complex object with references to other items in the other stl-maps.
The application works with a large amount of data, so it uses more than 500 MByte RAM. Clients are able to contact the application and get a filtered version of the entire database. This is done by running through the entire database, and finding items relevant for the client.
When the application have been running for an hour or so, then Windows 2003 SP2 starts to page out parts of the RAM for the application (Eventhough there is 16 GByte RAM on the machine).
After the application have been partly paged out then a client logon takes a long time (10 mins) because it now generates a page fault for each pointer lookup in the stl-map. If running the client logon a second time right after then it is fast (few secs) because all the memory is now back in RAM.
I can see it is possible to tell Windows to lock memory in RAM, but this is generally only recommended for device drivers, and only for "small" amounts of memory.
I guess a poor mans solution could be to loop through the entire memory database, and thus tell Windows we are still interested in keeping the datamodel in RAM.
I guess another poor mans solution could be to disable the pagefile completely on Windows.
I guess the expensive solution would be a SQL database, and then rewrite the entire application to use a database layer. Then hopefully the database system will have implemented means to for fast access.
Are there other more elegant solutions ?
This sounds like either a memory leak, or a serious fragmentation problem. It seems to me that the first step would be to figure out what's causing 500 Mb of data to use up 16 Gb of RAM and still want more.
Edit: Windows has a working set trimmer that actively attempts to page out idle data. The basic idea is that it goes through and marks pages as being available, but leaves the data in them (and the virtual memory manager knows what data is in them). If, however, you attempt to access that memory before it's allocated to other purposes, it'll be marked as being in use again, which will normally prevent it from being paged out.
If you really think this is the source of your problem, you can indirectly control the working set trimmer by calling SetProcessWorkingSetSize. At least in my experience, this is only rarely of much use, but you may be in one of those unusual situations where it's really helpful.
As #Jerry Coffin said, it really sounds like your actual problem is a memory leak. Fix that.
But for the record, none of your "poor mans solutions" would work. At all.
Windows pages out some of your data because there's not room for it in RAM.
Looping through the entire memory database would load in every byte of the data model, yes... which would cause other parts of it to be paged out. In the end, you'd generate a lot of page faults, and the only difference in the end would be which parts of the data structure are paged out.
Disabling the page file? Yes, if you think a hard crash is better than low performance. Windows doesn't page data out because it's fun. It does that to handle situations where it would otherwise run out of memory. If you disable the pagefile, the app will just crash when it would otherwise page out data.
If your dataset really is so big it doesn't fit in memory, then I don't see why an SQL database would be especially "expensive". Unlike your current solution, databases are optimized for this purpose. They're meant to handle datasets too large to fit in memory, and to do this efficiently.
It sounds like you have a memory leak. Fixing that would be the elegant, efficient and correct solution.
If you can't do that, then either
throw more RAM at the problem (the app ends up using 16GB? Throw 32 or 64GB at it then), or
switch to a format that's optimized for efficient disk access (A SQL database probably)
We have a similar problem and the solution we choose was to allocate everything in a shared memory block. AFAIK, Windows doesn't page this out. However, using stl-map here is not for faint of heart either and was beyond what we required.
We are using Boost Shared Memory to implement this for us and it works well. Follow examples closely and you will be up and running quickly. Boost also has Boost.MultiIndex that will do a lot of what you want.
For a no cost sql solution have you looked at Sqlite? They have an option to run as an in memory database.
Good luck, sounds like an interesting application.
I have an application where the entire
database is implemented in memory
using a stl-map for each table in the
database.
That's the start of the end: STL's std::map is extremely memory inefficient. Same applies to std::list. Every element would be allocated separately causing rather serious memory waste. I often use std::vector + sort() + find() instead of std::map in applications where it is possible (more searches than modifications) and I know in advance memory usage might become an issue.
When the application have been running
for an hour or so, then Windows 2003
SP2 starts to page out parts of the
RAM for the application (Eventhough
there is 16 GByte RAM on the machine).
Hard to tell without knowing how your application is written. Windows has the feature to unload from RAM whatever memory of idle applications can be unloaded. But that normally affects memory mapped files and alike.
Otherwise, I would strongly suggest to read up the Windows memory management documentation . It is not very easy to understand, yet Windows has all sorts and types of memory available to applications. I never had luck with it, but probably in your application using custom std::allocator would work.
I can believe it is the fault of flawed pagefile behaviour -i've run my laptops mostly with pagefile turned off since nt4.0. In my experience, at least up to XP Pro, Windows intrusively swaps pages out just to provide the dubious benefit of having a really-really-slow extension to the maximum working set space.
Ask what benefit swapping to harddisk is achieving with 16 Gigabityes of real RAM available? If your working set it so big as to need more virtual memory than +10 Gigs, then once swapping is actualy required processes will take anything from a bit longer, to thousands of times longer to complete. On Windows the untameable file system cache seems to antagonise the relationships.
Now when I (very) occasionaly run out of working set on my XP laptops, there is no traffic jam, the guilty app just crashes. A utility to suspend memory glugging processes before that time and make an alert would be nice, but there is no such thing just a violation, a crash, and sometimes explorer.exe goes down too.
Pagefiles - who needs em'
---- Edit
Given snakefoot explanation, the problem is swapping out memory that is not used for a longer period of time and due to this not having the data in memory when needed. This is the same as this:
Can I tell Windows not to swap out a particular processes’ memory?
and VirtualLock function should do its job:
http://msdn.microsoft.com/en-us/library/aa366895(VS.85).aspx
---- Previous answer
First of all you need to distinguish between memory leak and memory need problems.
If you have a memory leak then it would be bigger effort to convert entire application to SQL than to debug the application.
SQL cannot be faster then a well designed, domain specific in-memory database and if you have bugs, chances are you will have different ones in an SQL version as well.
If this is a memory need problem, then you will need to switch to SQL anyway and this sounds like a good moment.

Staying away from virtual memory in Windows\C++

I'm writing a performance critical application where its essential to store as much data as possible in the physical memory before dumping to disc.
I can use ::GlobalMemoryStatusEx(...) and ::GetProcessMemoryInfo(...) to find out what percentage of physical memory is reserved\free and how much memory my current process handles.
Using this data I can make sure to dump when ~90% of the physical memory is in use or ~90 of the maximum of 2GB per application limit is hit.
However, I would like a method for simply recieving how many bytes are actually left before the system will start using the virtual memory, especially as the application will be compiled for both 32bit and 64bit, whereas the 2 GB limit doesnt exist.
How about this function:
int
bytesLeftUntilVMUsed() {
return 0;
}
it should give the correct result in nearly all cases I think ;)
Imagine running Windows 7 in 256Mb of RAM (MS suggest 1GB minimum). That's effectively what you're asking the user to do by wanting to reseve 90% of available RAM.
The real question is: Why do you need so much RAM? What is the 'performance critical' criteria exactly?
Usually, this kind of question implies there's something horribly wrong with your design.
Update:
Using top of the range RAM (DDR3) would give you a theoretical transfer speed of 12GB/s which equates to reading one 32 bit value every clock cycle with some bandwidth to spare. I'm fairly sure that it is not possible to do anything useful with the data coming into the CPU at that speed - instruction processing stalls would interrupt this flow. The extra, unsued bandwidth can be used to page data to/from a hard disk. Using RAID this transfer rate can be quite high (about 1/16th of the RAM bandwidth). So it would be feasible to transfer data to/from the disk and process it without having any degradation of performance - 16 cycles between reads is all it would take (OK, my maths might be a bit wrong here).
But if you throw Windows into the mix, it all goes to pot. Your memory can go away at any moment, your application can be paused arbitrarily and so on. Locking memory to RAM would have adverse affects on the whole system, thus defeating the purpose of locing the memory.
If you explain what you're trying to acheive and the performance critria, there are many people here that will help develop a suitable solution, because if you have to ask about system limits, you really are doing something wrong.
Even if you're able to stop your application from having memory paged out to disk, you'll still run into the problem that the VMM might be paging out other programs to disk and that might potentially affect your performance as well. Not to mention that another application might start up and consume memory that you're currently occupying and thus resulting in some of your applications memory being paged out. How are you planning to deal with that?
There is a way to use non-pageable memory via the non-paged pool but (a) this pool is comparatively small and (b) it's used by device drivers and might only be usable from inside the kernel. It's also not really recommended to use large chunks of it unless you want to make sure your system isn't that stable.
You might want to revisit the design of your application and try to work around the possibility of having memory paged to disk before you either try to write your own VMM or turn a Windows machine into essentially a DOS box with more memory.
The standard solution is to not worry about "virtual" and worry about "dynamic".
The "virtual" part of virtual memory has to be looked at as a hardware function that you can only defeat by writing your own OS.
The dynamic allocation of objects, however, is simply your application program's design.
Statically allocate simple arrays of the objects you'll need. Use those arrays of objects. Increase and decrease the size of those statically allocated arrays until you have performance problems.
Ouch. Non-paged pool (the amount of RAM which cannot be swapped or allocated to processes) is typically 256 MB. That's 12.5% of RAM on a 2GB machine. If another 90% of physical RAM would be allocated to a process, that leaves either -2,5% for all other applications, services, the kernel and drivers. Even if you'd allocate only 85% for your app, that would still leave only 2,5% = 51 MB.