Why doesn't my system free the buffers/cache? - c++

I have a memory-hungry application. I let it run overnight on a system with 32GB of RAM, and ran free -m -s 20 alongside it to watch how the memory status changed. My application was the only thing I manually started after restarting my Ubuntu machine (except the terminal, of course). Let's look at parts of the output:
when the application started:
total used free shared buffers cached
Mem: 32100 1428 30671 35 69 594
-/+ buffers/cache: 765 31335
Swap: 32693 0 32693
before the application ends:
total used free shared buffers cached
Mem: 32100 31860 240 84 2 17420
-/+ buffers/cache: 14437 17663
Swap: 32693 12 32681
right after the application ends:
total used free shared buffers cached
Mem: 32100 18723 13376 84 2 17434
-/+ buffers/cache: 1285 30814
Swap: 32693 12 32681
and the status remained the same for many hours, until I came back in the morning.
My question is:
Why is most of my memory still counted as the free part of buffers/cache? When will this memory become part of the overall free Mem: again?
I then opened a browser, an IDE and some other GUI applications to see how, and from where, memory is allocated to new applications:
total used free shared buffers cached
Mem: 32100 20378 11721 88 160 18075
-/+ buffers/cache: 2143 29956
Swap: 32693 12 32681
Apparently, free memory from both Mem: and buffers/cache: was allocated to the new applications. Can you please interpret this for me?

Cached data counts as part of the used memory. After a program ends, the data it loaded from disk stays in the page cache. The system does not actively free that data; instead it drops cached pages whenever the memory is needed again, and only then does it show up as free. (There is a more or less funny page that describes this behaviour.)
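If you want to see this from a program rather than from free, the numbers come from /proc/meminfo. Below is a minimal C++ sketch (not part of the original answer; it assumes a reasonably recent Linux kernel, and the MemAvailable field needs 3.14+) that prints the free, buffer and cache figures plus the kernel's own estimate of memory available to new applications, which already treats most of the cache as reclaimable. On the output above, that estimate should land in the same ballpark as the 30814 MB shown as free on the -/+ buffers/cache line.

// meminfo.cpp - sketch: print the /proc/meminfo fields that explain the
// difference between "free" and "available" memory on Linux.
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

int main() {
    std::ifstream meminfo("/proc/meminfo");
    std::string line;
    while (std::getline(meminfo, line)) {
        std::istringstream iss(line);
        std::string key;
        long value_kb = 0;
        iss >> key >> value_kb;
        // MemFree:       truly unused pages
        // Buffers/Cached: page cache, reclaimed on demand
        // MemAvailable:  kernel estimate of memory usable by new
        //                applications without swapping (kernel 3.14+)
        if (key == "MemFree:" || key == "Buffers:" ||
            key == "Cached:" || key == "MemAvailable:") {
            std::cout << key << " " << value_kb / 1024 << " MB\n";
        }
    }
    return 0;
}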

Related

memory mapped file access is very slow

I am writing to a 930GB file (preallocated) on a Linux machine with 976 GB memory.
The application is written in C++ and I am memory mapping the file using Boost Interprocess. Before starting the code I set the stack size:
ulimit -s unlimited
The writing was very fast a week ago, but today it is running slowly. I don't think the code has changed, but I may have accidentally changed something in my environment (it is an AWS instance).
The application ("write_data") doesn't seem to be using all the available memory. "top" shows:
Tasks: 559 total, 1 running, 558 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 98.5%id, 1.5%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1007321952k total, 149232000k used, 858089952k free, 286496k buffers
Swap: 0k total, 0k used, 0k free, 142275392k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4904 root 20 0 2708m 37m 27m S 1.0 0.0 1:47.00 dockerd
56931 my_user 20 0 930g 29g 29g D 1.0 3.1 12:38.95 write_data
57179 root 20 0 0 0 0 D 1.0 0.0 0:25.55 kworker/u257:1
57512 my_user 20 0 15752 2664 1944 R 1.0 0.0 0:00.06 top
I thought the resident size (RES) should include the memory mapped data, so shouldn't it be > 930 GB (size of the file)?
Can someone suggest ways to diagnose the problem?
Memory mappings generally aren't eagerly populated. If some other program had already forced the file into the page cache, you'd see good performance from the start; otherwise you'd see poor performance while the file is paged in.
Given that you have enough RAM to hold the whole file in memory, you may want to hint to the OS that it should prefetch the file, replacing the many small reads triggered by page faults with larger bulk reads. The posix_madvise API can be used to provide this hint: pass POSIX_MADV_WILLNEED as the advice to indicate that the whole file will be needed soon.
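One way to issue that hint from the C++ side is sketched below. This is not the asker's code; the file path is a placeholder and error handling is minimal. It maps the preallocated file with Boost.Interprocess and then calls posix_madvise with POSIX_MADV_WILLNEED over the whole region.

// madvise_hint.cpp - sketch: map a large file with Boost.Interprocess and
// ask the kernel to prefetch it. The file path is a placeholder.
#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <sys/mman.h>   // posix_madvise, POSIX_MADV_WILLNEED
#include <iostream>

namespace bip = boost::interprocess;

int main() {
    // Map the whole preallocated file for read/write access.
    bip::file_mapping  mapping("/path/to/big_file.dat", bip::read_write);
    bip::mapped_region region(mapping, bip::read_write);

    // Hint that the entire mapping will be needed soon, so the kernel can
    // issue large read-ahead instead of faulting pages in a few at a time.
    int rc = posix_madvise(region.get_address(), region.get_size(),
                           POSIX_MADV_WILLNEED);
    if (rc != 0) {
        std::cerr << "posix_madvise failed with error " << rc << "\n";
    }

    // ... write to region.get_address() as before ...
    return 0;
}

For purely sequential writes, POSIX_MADV_SEQUENTIAL is another advice value worth trying, since it encourages aggressive read-ahead and lets the kernel drop already-used pages earlier.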

Limiting Java 8 Memory Consumption

I'm running three Java 8 JVMs on a 64-bit Ubuntu VM, built from a minimal install with nothing extra running other than the three JVMs. The VM itself has 2GB of memory and each JVM was limited by -Xmx512M, which I assumed would be fine as there would be a couple of hundred MB to spare.
A few weeks ago, one crashed and the hs_err_pid dump showed:
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 196608 bytes for committing reserved memory.
# Possible reasons:
# The system is out of physical RAM or swap space
# In 32 bit mode, the process size limit was hit
# Possible solutions:
# Reduce memory load on the system
# Increase physical memory or swap space
# Check if swap backing store is full
# Use 64 bit Java on a 64 bit OS
# Decrease Java heap size (-Xmx/-Xms)
# Decrease number of Java threads
# Decrease Java thread stack sizes (-Xss)
# Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
I restarted the JVM with a reduced heap size of 384MB and so far everything is fine. However, when I now look at the VM using the ps command, sorted by descending RSS size, I see
RSS %MEM VSZ PID CMD
708768 35.4 2536124 29568 java -Xms64m -Xmx512m ...
542776 27.1 2340996 12934 java -Xms64m -Xmx384m ...
387336 19.3 2542336 6788 java -Xms64m -Xmx512m ...
12128 0.6 288120 1239 /usr/lib/snapd/snapd
4564 0.2 21476 27132 -bash
3524 0.1 5724 1235 /sbin/iscsid
3184 0.1 37928 1 /sbin/init
3032 0.1 27772 28829 ps ax -o rss,pmem,vsz,pid,cmd --sort -rss
3020 0.1 652988 1308 /usr/bin/lxcfs /var/lib/lxcfs/
2936 0.1 274596 1237 /usr/lib/accountsservice/accounts-daemon
..
..
and the free command shows
total used free shared buff/cache available
Mem: 1952 1657 80 20 213 41
Swap: 0 0 0
Taking the first process as an example, there is an RSS size of 708768 KB even though the heap limit would be 524288 KB (512*1024).
I am aware that extra memory is used over and above the JVM heap, but the question is: how can I control this to ensure I do not run out of memory again? I am trying to set the heap size for each JVM as large as I can without crashing them.
Or is there a good general guideline as to how to set JVM heap size in relation to overall memory availability ?
There does not appear to be a way of controlling how much extra memory the JVM will use over the heap. However, by monitoring the application over a period of time, a good estimate of this amount can be obtained. If the overall consumption of the Java process is higher than desired, the heap size can be reduced; further monitoring is then needed to see whether this impacts performance.
Continuing with the example above, and using the command ps ax -o rss,pmem,vsz,pid,cmd --sort -rss, usage as of today is
RSS %MEM VSZ PID CMD
704144 35.2 2536124 29568 java -Xms64m -Xmx512m ...
429504 21.4 2340996 12934 java -Xms64m -Xmx384m ...
367732 18.3 2542336 6788 java -Xms64m -Xmx512m ...
13872 0.6 288120 1239 /usr/lib/snapd/snapd
..
..
These Java processes are all running the same application but with different data sets. The first process (29568) has stayed stable, using about 190MB beyond the heap limit, while the second (12934) has dropped from 156MB to 35MB of overhead. The total memory usage of the third has stayed well under its heap limit, which suggests that limit could be reduced.
It would seem that allowing 200MB of extra non-heap memory per Java process here would be more than enough, as that gives 600MB of leeway in total. Subtracting this from 2GB leaves 1400MB, so the three -Xmx values combined should stay below that amount.
As can be gleaned from the article pointed out in a comment by Fairoz, there are many different ways in which the JVM can use non-heap memory. One component that is measurable is the thread stack size. The default for a JVM can be found on Linux using java -XX:+PrintFlagsFinal -version | grep ThreadStackSize. In the case above it is 1MB, and as there are about 25 threads, we can safely say that at least 25MB extra will always be required.

Not enough space to cache rdd in memory warning

I am running a Spark job and I got a "Not enough space to cache rdd_128_17000 in memory" warning. However, the attached file clearly shows that only 90.8 G out of 719.3 G is used. Why is that? Thanks!
15/10/16 02:19:41 WARN storage.MemoryStore: Not enough space to cache rdd_128_17000 in memory! (computed 21.4 GB so far)
15/10/16 02:19:41 INFO storage.MemoryStore: Memory use = 4.1 GB (blocks) + 21.2 GB (scratch space shared across 1 thread(s)) = 25.2 GB. Storage limit = 36.0 GB.
15/10/16 02:19:44 WARN storage.MemoryStore: Not enough space to cache rdd_129_17000 in memory! (computed 9.4 GB so far)
15/10/16 02:19:44 INFO storage.MemoryStore: Memory use = 4.1 GB (blocks) + 30.6 GB (scratch space shared across 1 thread(s)) = 34.6 GB. Storage limit = 36.0 GB.
15/10/16 02:25:37 INFO metrics.MetricsSaver: 1001 MetricsLockFreeSaver 339 comitted 11 matured S3WriteBytes values
15/10/16 02:29:00 INFO s3n.MultipartUploadOutputStream: uploadPart /mnt1/var/lib/hadoop/s3/959a772f-d03a-41fd-bc9d-6d5c5b9812a1-0000 134217728 bytes md5: qkQ8nlvC8COVftXkknPE3A== md5hex: aa443c9e5bc2f023957ed5e49273c4dc
15/10/16 02:38:15 INFO s3n.MultipartUploadOutputStream: uploadPart /mnt/var/lib/hadoop/s3/959a772f-d03a-41fd-bc9d-6d5c5b9812a1-0001 134217728 bytes md5: RgoGg/yJpqzjIvD5DqjCig== md5hex: 460a0683fc89a6ace322f0f90ea8c28a
15/10/16 02:42:20 INFO metrics.MetricsSaver: 2001 MetricsLockFreeSaver 339 comitted 10 matured S3WriteBytes values
This is likely to be caused by the configuration of spark.storage.memoryFraction being too low. Spark will only use this fraction of the allocated memory to cache RDDs.
Try one of the following:
increasing the storage fraction
rdd.persist(StorageLevel.MEMORY_ONLY_SER) to reduce memory usage by serializing the RDD data
rdd.persist(StorageLevel.MEMORY_AND_DISK) to partially persist onto disk if memory limits are reached.
This could be due to the following issue if you're loading lots of avro files:
https://mail-archives.apache.org/mod_mbox/spark-user/201510.mbox/%3CCANx3uAiJqO4qcTXePrUofKhO3N9UbQDJgNQXPYGZ14PWgfG5Aw#mail.gmail.com%3E
With a PR in progress at:
https://github.com/databricks/spark-avro/pull/95
I have a Spark-based batch application (a JAR with main() method, not written by me, I'm not a Spark expert) that I run in local mode without spark-submit, spark-shell, or spark-defaults.conf. When I tried to use IBM JRE (like one of my customers) instead of Oracle JRE (same machine and same data), I started getting those warnings.
Since the memory store is a fraction of the heap (see the page that Jacob suggested in his comment), I checked the heap size: the IBM JRE uses a different strategy to decide the default heap size, and it was too small. I simply added appropriate -Xms and -Xmx parameters and the problem disappeared: the batch now works fine with both the IBM and Oracle JREs.
I know my usage scenario is not typical, but I hope this can help someone.

What information does "Top" output while using MPI

I am trying to figure out how much memory my MPI program needs. It was suggested that I use top to check its memory usage. However, I am unclear on what the information means.
How do I read this output to estimate how much system memory the program is using?
top - 13:52:41 up 208 days, 19:50, 1 user, load average: 0.68, 0.15, 0.05
Tasks: 86 total, 6 running, 80 sleeping, 0 stopped, 0 zombie
Cpu(s): 98.5% us, 0.6% sy, 0.0% ni, 0.8% id, 0.0% wa, 0.0% hi, 0.1% si
Mem: 1024708k total, 225144k used, 799564k free, 104232k buffers
Swap: 0k total, 0k used, 0k free, 37276k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12052 amohan 16 0 9024 4756 5504 R 99.0 0.5 0:09.65 greet
12054 amohan 16 0 9024 4756 5504 R 99.0 0.5 0:09.64 greet
12055 amohan 16 0 9024 4752 5504 R 98.7 0.5 0:09.65 greet
12053 amohan 16 0 9024 4760 5504 R 98.7 0.5 0:09.63 greet
This question is related to an earlier post Fatal Error in MPI_Irecv: Aborting Job
The standard information displayed by top is, in order:
The Process ID
The owning username
The kernel-assigned process priority (higher is "lower" priority)
The "niceness" of the process (higher is "nicer" to other processes and gives the process a "lower" priority)
The virtual memory allocated for the process in KiB
The resident memory in use by the process (the portion of its memory that is actually held in physical RAM, i.e. not swapped out) in KiB
The shared memory accessible to the process (memory that can also be accessed by sibling processes and any other processes that have been granted access) in KiB
The run state (R is running, Z is zombie, S is sleeping, etc.)
The % of CPU in use (obviously)
The % of memory in use relative to total system memory
The cumulative time that the process has been in a Run state
More detailed information should be available in man top.
In particular, the memory that MPI likely uses is contained in the shared memory area. More detailed information can be pulled from the /proc/ directory, but I don't know the specifics off the top of my head.
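If you want each rank to report its own memory from inside the program, one option (not something top itself provides) is to read the VmRSS line from /proc/self/status, which is the same figure top shows in the RES column. A minimal C++/MPI sketch, assuming an MPI compiler wrapper such as mpic++ is available:

// rank_rss.cpp - sketch: each MPI rank reports its resident set size by
// reading /proc/self/status. Build with e.g. "mpic++ rank_rss.cpp".
#include <mpi.h>
#include <fstream>
#include <iostream>
#include <string>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line)) {
        // VmRSS is this process's resident memory in kB, i.e. top's RES.
        if (line.rfind("VmRSS:", 0) == 0) {
            std::cout << "rank " << rank << " " << line << "\n";
            break;
        }
    }

    MPI_Finalize();
    return 0;
}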

Opensuse 11.0 Memory error

I am using openSUSE 11.0. The system hung and I had to do a hard reboot. After investigating the logs I found the following error:
Mem-info:
kernel: Node 0 DMA per-cpu:
kernel: CPU 0: hi: 0, btch: 1 usd: 0
kernel: Node 0 DMA32 per-cpu:
kernel: CPU 0: hi: 186, btch: 31 usd: 174
kernel: Active:229577 inactive:546 dirty:0 writeback:0 unstable:0
kernel: free:1982 slab:5674 mapped:18 pagetables:10359 bounce:0
kernel: Node 0 DMA free:4000kB min:32kB low:40kB high:48kB active:2800kB inactive:2184kB present:8860kB pages_scanned:9859 all_unreclaimable? yes
kernel: lowmem_reserve[]: 0 994 994 994
kernel: Node 0 DMA32 free:3928kB min:4016kB low:5020kB high:6024kB active:915508kB inactive:0kB present:1018016kB pages_scanned:2233186 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 0*4kB 8*8kB 6*16kB 2*32kB 3*64kB 6*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 4000kB
Node 0 DMA32: 4*4kB 9*8kB 0*16kB 4*32kB 2*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3928kB
19418 total pagecache pages
Swap cache: add 36342342, delete 36342340, find 14356263/18138459
Free swap = 0kB
Total swap = 771080kB
Free swap: 0kB
262144 pages of RAM
kernel: 5430 reserved pages
It seems to be something related to a memory leak, but I am not sure.
If anybody has a solution to a similar issue, please let me know.
Thanks in advance
Rajiv
I suggest doing something like running "while true ; do sleep 10 ; ps auxw >> ~/processes ; done". Then, after your system comes back up, you can probably spot the memory hog that chewed through 700 MB of swap by reading through the file and finding the program whose memory keeps growing.
When you find the program that is eating all your memory, you can use rlimits (man bash, search for 'ulimit') to limit how much memory that program uses before you start it, and maybe keep your system a little more sane.
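For a program you build yourself, the same kind of cap can be set from inside the process with setrlimit before it starts allocating. A minimal C++ sketch; the 1 GB figure is an arbitrary example, not something derived from the logs above:

// limit_self.cpp - sketch: cap this process's address space at 1 GB so a
// runaway allocation fails with ENOMEM instead of pushing the box into swap.
#include <sys/resource.h>
#include <cstdio>
#include <cstdlib>

int main() {
    struct rlimit rl;
    rl.rlim_cur = 1024UL * 1024UL * 1024UL;  // soft limit: 1 GB
    rl.rlim_max = 1024UL * 1024UL * 1024UL;  // hard limit: 1 GB

    // RLIMIT_AS limits total virtual address space; malloc/new will start
    // failing once the limit is reached.
    if (setrlimit(RLIMIT_AS, &rl) != 0) {
        perror("setrlimit");
        return EXIT_FAILURE;
    }

    // ... run the memory-hungry work here ...
    return 0;
}

This is equivalent to running the program under "ulimit -v" in the shell, as the answer above suggests.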