JVM minimum heap size recommendation reasons? - jvm-arguments

BEA recommends keeping the min and max heap sizes the same. They didn't elaborate on the reason for the suggestion. Can someone provide details?
I also got another recommendation from an architect: don't set anything for the minimum, just set the maximum. Any comments on this? If I don't set it, what would the default be?
What is the best tool to monitor and tune JVM settings? I am using JDK 1.6 on BEA WebLogic 10g, on a 32-bit Linux JVM.
Is a max heap size of 2GB any good? The server has lots of RAM. Currently it is set at 1.5GB, and it is at 80% usage when there are 40 concurrent users.
Thanks,

When the JVM needs to increase the size of the heap, it will invoke a full garbage collection, which may reduce throughput or cause a pause, so I would think this is recommended by them for performance reasons.
The default value is documented as 2MB, so if you don't override it you are likely to get a lot of (probably very quick) full collections after startup as the heap is repeatedly resized.
Unless you are trying to keep the memory footprint as small as possible, I would follow BEA's advice.
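For example, for a standalone JVM you would pass both flags explicitly (a sketch using the 1.5GB figure from the question; yourapp.jar is a placeholder, and WebLogic's start scripts set these same flags through their own environment variables):
java -Xms1536m -Xmx1536m -jar yourapp.jar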
It is impossible to say from the information given whether 2GB is appropriate, or whether the objects using up space in the heap are still in scope; the old generation will just gradually fill up until it runs out of space, at which point a full collection is invoked. Where does that 80% figure come from?
Use the following JVM arguments to log GC details to a file called gc.log:
-verbose:gc -XX:+PrintGCDetails -Xloggc:gc.log
Then you can analyse this using something like http://www.tagtraum.com/gcviewer.html

Related

How to programmatically get the memory usage of the current program?

I would like to get some information about the memory usage of my C++ program. The way I do this is by accessing /proc/self/stat and printing the virtual and resident set size.
You can find an example here.
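For reference, a minimal sketch of what such code might look like (this reads /proc/self/statm, which is a little easier to parse than /proc/self/stat; both report sizes in pages, and this is not the linked example):

// Sketch: report virtual and resident set size on Linux.
#include <fstream>
#include <iostream>
#include <unistd.h>

int main() {
    long pageSize = sysconf(_SC_PAGESIZE);   // bytes per page
    long vmPages = 0, rssPages = 0;
    std::ifstream statm("/proc/self/statm");
    statm >> vmPages >> rssPages;            // first two fields: total size, resident
    std::cout << "virtual: " << vmPages * pageSize / 1024 << " KiB, "
              << "resident: " << rssPages * pageSize / 1024 << " KiB\n";
}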
Is this a good way to go? How accurate is the information I am accessing*?
Could someone recommend a better way to measure memory usage programmatically?
*Asking because I get unexpected, sudden jumps in memory usage. My expectation was that the information would be perfectly accurate.
OS: I am running inside a Docker container, which is based on RHEL.
Additional info: If I limit the memory usage of the container with docker run -m, the printed memory is greater than the limit I set.
How to programmatically get the memory usage of the current program?
There is no standard way to get memory usage of a program in C++.
The concept of "memory usage" itself is somewhat vague and can mean different things. Depending on what you mean, there may or may not be a system-specific way to get the information.
The way I do this is by accessing /proc/self/stat
Is this a good way to go?
I don't think so. As far as I know, the /proc filesystem is not portable. Use the getrusage function on POSIX systems.
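For example, a minimal sketch (POSIX; note that getrusage only exposes a few figures, and ru_maxrss is the peak resident set size, reported in kilobytes on Linux):

// Sketch: query the process's own resource usage via POSIX getrusage.
#include <sys/resource.h>
#include <iostream>

int main() {
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) == 0)
        std::cout << "peak RSS: " << usage.ru_maxrss << " kB\n";  // kilobytes on Linux
    return 0;
}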

Limit buffer cache used for mmap

I have a data structure that I'd like to rework to page out on-demand. mmap seems like an easy way to run some initial experiments. However, I want to limit the amount of buffer cache that the mmap uses. The machine has enough memory to page the entire data structure into cache, but for test reasons (and some production reasons too) I don't want to allow it to do that.
Is there a way to limit the amount of buffer cache used by mmap?
Alternatively, an mmap alternative that can achieve something similar and still limit memory usage would work too.
From my understanding, it is not possible. Memory mapping is controlled by the operating system. The kernel decides how to use the available memory in the best way, but it looks at the system as a whole. I'm not aware that quotas for caches at the process level are supported (at least, I have not seen such APIs in Linux or BSD).
There is madvise to give the kernel hints, but it does not support limiting the cache used by one process. You can give it hints like MADV_DONTNEED, which will reduce the pressure on the cache of other applications, but I would expect it to do more harm than good, as it will most likely make caching less efficient, which will lead to more I/O load on the system in total.
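For illustration, this is roughly what the madvise approach would look like (the file name and mapping size are made up for the example):

// Sketch: drop cached pages for a region we are done with.
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

int main() {
    int fd = open("data.bin", O_RDONLY);    // hypothetical data file
    size_t len = 1 << 20;                   // map 1 MiB for the example
    void *p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p != MAP_FAILED) {
        // ... process the mapped region ...
        madvise(p, len, MADV_DONTNEED);     // hint: these pages can be discarded
        munmap(p, len);
    }
    close(fd);
    return 0;
}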
I see only two alternatives. One is trying to solve the problem at the OS level, and the other is to solve it at the application level.
At the OS level, I see two options:
You could run a virtual machine, but most likely this is not what you want. I would also expect that it will not improve the overall system performance. Still, it would at least be a way to define upper limits on memory consumption.
Docker is another idea that comes to mind, also operating at the OS level, but to the best of my knowledge it does not support defining cache quotas. I don't think it will work.
That leaves only one option, which is to look at the application level. Instead of using memory-mapped files, you could use explicit file system operations. If you need full control over the buffer, I think that is the only practical option. It is more work than memory mapping, and it is also not guaranteed to perform better.
If you want to stay with memory mapping, you could also map only parts of the file and unmap other parts when you exceed your memory quota. This has the same problems as explicit file I/O (more implementation work and non-trivial tuning to find a good caching strategy).
Having said that, you could question the requirement to limit cache memory usage. I would expect the kernel to do a pretty good job of allocating memory resources; at least, it will likely be better than the solutions I have sketched. (Explicit file I/O plus an internal cache might be fast, but it is not trivial to implement and tune. Here is a comparison of the trade-offs: mmap() vs. reading blocks.)
During testing, you could run the application with ionice -c 3 and nice -n 20 to somewhat reduce the impact on the other productive applications.
There is also a tool called nocache. I have never used it, but reading through its documentation, it seems related to your question.
It might be possible to accomplish this through the use of mmap() and Linux control groups. Once the tools are installed, you can create arbitrary limits on the amount of, among other things, physical memory used by a process. As an example, here we limit the physical memory to 128 MB and swap to 256 MB:
cgcreate -g memory:/limitMemory
echo $(( 128 * 1024 * 1024 )) > /sys/fs/cgroup/memory/limitMemory/memory.limit_in_bytes
echo $(( 256 * 1024 * 1024 )) > /sys/fs/cgroup/memory/limitMemory/memory.memsw.limit_in_bytes
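Once the group exists, you would launch the process inside it, for example with cgexec from the same libcgroup tool set (assuming it is available on your distribution):
cgexec -g memory:/limitMemory ./your_program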
I would go the route of only mapping parts of the file at a time, so you retain full control over exactly how much memory is used.
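A rough sketch of that idea, assuming read-only access and an illustrative 64 MiB window (error handling and page-alignment checks omitted):

// Sketch: map a sliding window of the file instead of the whole file.
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

const size_t WINDOW_SIZE = 64 * 1024 * 1024;   // memory budget per window

void *map_window(int fd, off_t offset) {
    // offset must be a multiple of the page size
    return mmap(nullptr, WINDOW_SIZE, PROT_READ, MAP_PRIVATE, fd, offset);
}

int main() {
    int fd = open("data.bin", O_RDONLY);       // hypothetical data file
    void *w = map_window(fd, 0);
    // ... work within this window; when the working set moves on:
    munmap(w, WINDOW_SIZE);
    w = map_window(fd, WINDOW_SIZE);           // slide the window forward
    munmap(w, WINDOW_SIZE);
    close(fd);
    return 0;
}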
You could also use a System V IPC shared memory segment; then you are the master of your own memory segments.
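A minimal sketch of that suggestion (System V shared memory; the 64 MB size is arbitrary):

// Sketch: allocate a shared memory segment we manage ourselves.
#include <sys/ipc.h>
#include <sys/shm.h>

int main() {
    int id = shmget(IPC_PRIVATE, 64 * 1024 * 1024, IPC_CREAT | 0600);
    if (id != -1) {
        void *seg = shmat(id, nullptr, 0);   // attach the segment
        if (seg != (void *)-1) {
            // ... place your data structure inside seg ...
            shmdt(seg);                      // detach
        }
        shmctl(id, IPC_RMID, nullptr);       // mark segment for removal
    }
    return 0;
}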

How to optimize paging for large in memory database

I have an application where the entire database is implemented in memory, using an STL map for each table in the database.
Each item in a map is a complex object with references to items in the other maps.
The application works with a large amount of data, so it uses more than 500 MB of RAM. Clients can contact the application and get a filtered version of the entire database. This is done by running through the whole database and finding the items relevant for the client.
When the application has been running for an hour or so, Windows 2003 SP2 starts to page out parts of the application's RAM (even though there is 16 GB of RAM in the machine).
After the application has been partly paged out, a client logon takes a long time (10 minutes) because it now generates a page fault for each pointer lookup in the maps. If the client logs on a second time right afterwards, it is fast (a few seconds) because all the memory is back in RAM.
I can see it is possible to tell Windows to lock memory in RAM, but this is generally only recommended for device drivers, and only for "small" amounts of memory.
I guess a poor man's solution could be to loop through the entire in-memory database, thus telling Windows we are still interested in keeping the data model in RAM.
I guess another poor man's solution could be to disable the pagefile completely on Windows.
I guess the expensive solution would be a SQL database, and then rewriting the entire application to use a database layer. Then, hopefully, the database system will have implemented means for fast access.
Are there other more elegant solutions ?
This sounds like either a memory leak or a serious fragmentation problem. It seems to me that the first step would be to figure out what's causing 500 MB of data to use up 16 GB of RAM and still want more.
Edit: Windows has a working set trimmer that actively attempts to page out idle data. The basic idea is that it goes through and marks pages as being available, but leaves the data in them (and the virtual memory manager knows what data is in them). If, however, you attempt to access that memory before it's allocated to other purposes, it'll be marked as being in use again, which will normally prevent it from being paged out.
If you really think this is the source of your problem, you can indirectly control the working set trimmer by calling SetProcessWorkingSetSize. At least in my experience, this is only rarely of much use, but you may be in one of those unusual situations where it's really helpful.
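For illustration, calling it looks like this (the sizes are arbitrary; the call can fail if the process lacks the quota or privileges to raise its working set):

// Sketch: ask Windows to keep a larger minimum working set resident.
#include <windows.h>

int main() {
    SIZE_T minWs = 256 * 1024 * 1024;   // 256 MB minimum, illustrative
    SIZE_T maxWs = 512 * 1024 * 1024;   // 512 MB maximum, illustrative
    if (!SetProcessWorkingSetSize(GetCurrentProcess(), minWs, maxWs)) {
        // Handle failure: insufficient quota or privileges
    }
    return 0;
}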
As @Jerry Coffin said, it really sounds like your actual problem is a memory leak. Fix that.
But for the record, none of your "poor man's solutions" would work. At all.
Windows pages out some of your data because there's not room for it in RAM.
Looping through the entire in-memory database would load in every byte of the data model, yes... which would cause other parts of it to be paged out. You'd generate a lot of page faults, and the only difference in the end would be which parts of the data structure are paged out.
Disabling the page file? Yes, if you think a hard crash is better than low performance. Windows doesn't page data out because it's fun. It does that to handle situations where it would otherwise run out of memory. If you disable the pagefile, the app will just crash when it would otherwise page out data.
If your dataset really is so big it doesn't fit in memory, then I don't see why an SQL database would be especially "expensive". Unlike your current solution, databases are optimized for this purpose. They're meant to handle datasets too large to fit in memory, and to do this efficiently.
It sounds like you have a memory leak. Fixing that would be the elegant, efficient and correct solution.
If you can't do that, then either
throw more RAM at the problem (the app ends up using 16GB? Throw 32 or 64GB at it then), or
switch to a format that's optimized for efficient disk access (A SQL database probably)
We had a similar problem, and the solution we chose was to allocate everything in a shared memory block. AFAIK, Windows doesn't page this out. However, using an STL map there is not for the faint of heart either, and it was beyond what we required.
We are using Boost Shared Memory to implement this for us and it works well. Follow the examples closely and you will be up and running quickly. Boost also has Boost.MultiIndex, which will do a lot of what you want.
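To give an idea of what this looks like, a minimal Boost.Interprocess sketch (the segment name and size are made up):

// Sketch: construct objects directly inside a named shared memory block.
#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/shared_memory_object.hpp>

namespace bip = boost::interprocess;

int main() {
    // Create (or open) a ~600 MB named shared memory segment
    bip::managed_shared_memory segment(bip::open_or_create, "MyDatabase",
                                       600 * 1024 * 1024);
    // Objects are constructed inside the segment, not on the normal heap
    int *answer = segment.construct<int>("Answer")(42);
    (void)answer;
    bip::shared_memory_object::remove("MyDatabase");  // cleanup when done
    return 0;
}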
For a no-cost SQL solution, have you looked at SQLite? It has an option to run as an in-memory database.
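For example, an in-memory SQLite database is opened with the special ":memory:" filename:

// Sketch: SQLite running entirely in memory.
#include <sqlite3.h>

int main() {
    sqlite3 *db = nullptr;
    if (sqlite3_open(":memory:", &db) == SQLITE_OK) {
        sqlite3_exec(db, "CREATE TABLE t(id INTEGER PRIMARY KEY, v TEXT);",
                     nullptr, nullptr, nullptr);
        sqlite3_close(db);
    }
    return 0;
}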
Good luck, sounds like an interesting application.
I have an application where the entire database is implemented in memory, using an STL map for each table in the database.
That's the beginning of the end: STL's std::map is extremely memory inefficient, and the same applies to std::list. Every element is allocated separately, causing rather serious memory waste. I often use std::vector + sort() + find() instead of std::map in applications where it is possible (more searches than modifications) and where I know in advance that memory usage might become an issue.
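A sketch of that pattern, using std::lower_bound for the lookup on the sorted vector (the Row type and key are made up):

// Sketch: contiguous storage plus binary search instead of std::map.
#include <algorithm>
#include <vector>

struct Row { int key; /* ... payload ... */ };

bool byKey(const Row &a, const Row &b) { return a.key < b.key; }

int main() {
    std::vector<Row> table;
    // ... load the table ...
    std::sort(table.begin(), table.end(), byKey);   // once, after loading
    Row probe = { 42 };
    std::vector<Row>::iterator it =
        std::lower_bound(table.begin(), table.end(), probe, byKey);
    bool found = (it != table.end() && it->key == probe.key);
    (void)found;
    return 0;
}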
When the application has been running for an hour or so, Windows 2003 SP2 starts to page out parts of the application's RAM (even though there is 16 GB of RAM in the machine).
Hard to tell without knowing how your application is written. Windows has a feature to unload from RAM whatever memory of idle applications can be unloaded, but that normally affects memory-mapped files and the like.
Otherwise, I would strongly suggest reading up on the Windows memory management documentation. It is not very easy to understand, yet Windows has all sorts and types of memory available to applications. I never had luck with it, but perhaps a custom std::allocator would work in your application.
I can believe it is the fault of flawed pagefile behaviour; I've run my laptops mostly with the pagefile turned off since NT 4.0. In my experience, at least up to XP Pro, Windows intrusively swaps pages out just to provide the dubious benefit of having a really, really slow extension to the maximum working set space.
Ask what benefit swapping to hard disk achieves with 16 gigabytes of real RAM available. If your working set is so big as to need more virtual memory than 10-odd gigs, then once swapping is actually required, processes will take anything from a bit longer to thousands of times longer to complete. On Windows, the untameable file system cache seems to antagonise the relationship.
Now, when I (very) occasionally run out of working set on my XP laptops, there is no traffic jam; the guilty app just crashes. A utility to suspend memory-hungry processes before that point and raise an alert would be nice, but there is no such thing: just an access violation, a crash, and sometimes explorer.exe goes down too.
Pagefiles: who needs 'em?
---- Edit
Given snakefoot's explanation, the problem is that memory not used for a longer period of time gets swapped out, so the data is not in memory when it is needed. This is the same as this:
Can I tell Windows not to swap out a particular processes’ memory?
and VirtualLock function should do its job:
http://msdn.microsoft.com/en-us/library/aa366895(VS.85).aspx
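A sketch of using it (the size is illustrative; note that the default working set limits are small, so for large regions you typically have to raise them with SetProcessWorkingSetSize before VirtualLock will succeed):

// Sketch: pin a region of memory so it is not paged out.
#include <windows.h>

int main() {
    SIZE_T size = 64 * 1024 * 1024;  // 64 MB, illustrative
    void *p = VirtualAlloc(NULL, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (p != NULL && VirtualLock(p, size)) {
        // ... data placed here stays resident ...
        VirtualUnlock(p, size);
    }
    if (p != NULL) VirtualFree(p, 0, MEM_RELEASE);
    return 0;
}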
---- Previous answer
First of all, you need to distinguish between a memory leak and a memory need problem.
If you have a memory leak, then converting the entire application to SQL would be a bigger effort than debugging the application.
SQL cannot be faster than a well-designed, domain-specific in-memory database, and if you have bugs, chances are you will have different ones in an SQL version as well.
If this is a memory need problem, then you will need to switch to SQL anyway, and this sounds like a good moment to do it.

Information about PTE's (Page Table Entries) in Windows

In order to find buffer overflows more easily, I am changing our custom memory allocator so that it allocates a full 4KB page instead of only the wanted number of bytes. Then I change the page protection and size so that if the caller writes before or after its allocated piece of memory, the application immediately crashes.
Problem is that although I have enough memory, the application never starts up completely because it runs out of memory. This has two causes:
since every allocation needs 4 KB, we probably reach the 2 GB limit very soon. This problem could be solved by building a 64-bit executable (I didn't try it yet).
even when I only need a few hundred megabytes, the allocations fail at a certain moment.
The second problem is the biggest one, and I think it's related to the maximum number of PTEs (page table entries, which store information on how virtual memory is mapped to physical memory, and whether pages should be read-only or not) you can have in a process.
My questions (or a cry for tips):
Where can I find information about the maximum number of PTEs in a process?
Is this different (higher) for 64-bit systems/applications or not?
Can the number of PTEs be configured in the application or in Windows?
Thanks,
Patrick
PS. note for those who will try to argument that you shouldn't write your own memory manager:
My application is rather specific so I really want full control over memory management (can't give any more details)
Last week we had a memory overwrite which we couldn't find using the standard C++ allocator and the debugging functionality of the C/C++ runtime (it only said "block corrupt" minutes after the actual corruption).
We also tried standard Windows utilities (like GFLAGS, ...), but they slowed down the application by a factor of 100 and couldn't find the exact position of the overwrite either.
We also tried the "Full Page Heap" functionality of Application Verifier, but then the application doesn't start up either (probably also running out of PTEs).
There is what I thought was a great series of blog posts by Mark Russinovich on TechNet called "Pushing the Limits of Windows..."
http://blogs.technet.com/markrussinovich/archive/2008/07/21/3092070.aspx
It has a few articles on virtual memory, paged and nonpaged memory, physical memory and others.
He mentions little utilities he uses to take measurements of a system's resources.
Hopefully you will find your answers there.
A shotgun approach is to allocate those isolated 4KB entries at random. This means that you will need to rerun the same tests, with the same input repeatedly. Sometimes it will catch the error, if you're lucky.
A slightly smarter approach is to use another algorithm than just random - e.g. make it dependent on the call stack whether an allocation is isolated. Do you trust std::string users, for instance, and suspect raw malloc use?
Take a look at the implementation of OpenBSD malloc. Much of the same ideas (and more) implemented by very skilled folk.
In order to find buffer overflows more easily, I am changing our custom memory allocator so that it allocates a full 4KB page instead of only the wanted number of bytes.
This has already been done. Application Verifier with PageHeap.
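For reference, the core of the technique the question describes looks roughly like this (a simplified sketch: it catches overruns only, keeps no bookkeeping for freeing, and burns at least two pages of address space per allocation, which is exactly why PTEs and address space run out):

// Sketch: allocate whole pages and put a no-access guard page after them.
#include <windows.h>

void *guarded_alloc(size_t bytes) {
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    size_t page = si.dwPageSize;
    size_t pages = (bytes + page - 1) / page;
    // Commit the data pages plus one trailing guard page
    char *base = (char *)VirtualAlloc(NULL, (pages + 1) * page,
                                      MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (base == NULL) return NULL;
    DWORD old;
    VirtualProtect(base + pages * page, page, PAGE_NOACCESS, &old);
    // Right-align the block so the first byte past it is in the guard page
    return base + pages * page - bytes;
}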
Info on PTEs and the Memory architecture can be found in Windows Internals, 5th Ed. and the Intel Manuals.
Is this different (higher) for 64-bit systems/applications or not?
Of course. 64-bit Windows has a much larger address space, so clearly more PTEs are needed to map it.
Where can I find information about the maximum number of PTEs in a process?
This is not as important as the maximum amount of user address space available in a process. (The number of PTEs is this number divided by the page size.)
This is 2GB on 32-bit Windows and much bigger on x64 Windows. (The actual number varies, but it's "big enough".)
Problem is that although I have enough memory, the application never starts up completely because it runs out of memory.
Are you a) leaking memory? b) using horribly inefficient algorithms?

How do I set the ideal QPixmapCache::cacheLimit?

I have just started using QPixmapCache and, since there is not much documentation, I was wondering how to adjust its size based on the system the application is running on.
Some users might have lots of free memory while others have very little. I have no idea what the best setting would be.
What would be the best way to detect the system (free) RAM and adjust the cache size to fit?
http://doc.trolltech.com/4.4/qpixmapcache.html#setCacheLimit
To detect free RAM in Windows, you can use the GlobalMemoryStatus function.
I'm not sure if this will help you size the pixmap cache; perhaps you will need to do some performance measurements and create a lookup table.
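A sketch of combining the two (the 5% factor is an arbitrary starting point to tune by profiling; GlobalMemoryStatusEx is the newer variant of the function mentioned above):

// Sketch: size the pixmap cache from the available physical RAM.
#include <windows.h>
#include <QPixmapCache>

void tunePixmapCache() {
    MEMORYSTATUSEX ms;
    ms.dwLength = sizeof(ms);
    if (GlobalMemoryStatusEx(&ms)) {
        qint64 freeKb = ms.ullAvailPhys / 1024;
        // setCacheLimit takes kilobytes; use ~5% of free RAM as a guess
        QPixmapCache::setCacheLimit(int(freeKb / 20));
    }
}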
Note that QPixmap is window-system specific. This means that QPixmap likely corresponds to graphics card memory more than to general RAM. Just like any other cache, its size should be "as much as it needs to be, but not more".
The best way to tune the cache is to profile your application running on a typical machine (of the target users). See whether the cache starts thrashing: evicting old pixmaps and incurring cache misses.