I wrote a program that manipulates files. In the idle state, the memory occupied by the process is ~60 MB. Periodically, say every 2 minutes, the process allocates memory (~40 MB), performs something with files, then frees the allocated memory. The procedure takes around 10~20 seconds. As a result, the memory usage of my process looks like in the picture below:
My question here is: should I reserve some memory in advance and then use it when I need it? This would make the memory usage trend more stable. And wouldn't that stability be better for the system?
No, any 21st century OS won't blink an eye if you do this. Things might get interesting if you try to allocate more than 4GB per millisecond across 100 CPU cores, but you're not even close to that.
If you allocate the memory in advance (say an additional 50 MB, as you suggest), the graph would be a straight line, i.e. stable, but other programs competing for memory might not get enough and would suffer because of it.
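For what it's worth, a minimal sketch of what "reserving in advance" could look like on the program side, assuming the periodic work only needs a scratch buffer of a known maximum size (FileWorker and runCycle are made-up names, not anything from your program):

#include <vector>

// Hypothetical worker that reuses one scratch buffer allocated up front
// instead of allocating and freeing ~40 MB every cycle.
class FileWorker {
public:
    FileWorker() : scratch_(40u * 1024 * 1024) {}   // allocate once at startup

    void runCycle() {
        // ... do the file manipulation using scratch_.data() ...
        // The buffer stays allocated between cycles, so the memory graph
        // is flat, but the 40 MB is never given back to the OS either.
    }

private:
    std::vector<char> scratch_;
};

Whether that is worth it is exactly the trade-off described above: a flatter graph for your process, but less memory available for everyone else between cycles.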
Related
Is there a possibility to reserve memory for later allocation in a c++ program?
Background:
I'm working on Debian with the Preempt RT patch. My program uses roughly 100 MB of memory. All pages are prevented from swapping with mlockall(). There are mainly 2 running threads: one runs in real time and doesn't allocate memory; the other runs with slightly lower priority and allocates/frees memory.
In some rare situations a background process allocates all free memory and the system starts swapping. Now my 'fast' thread wants a little piece of RAM, and the kernel gives it that new little piece, BUT from swap. So my program is interrupted with a huge latency, let's say 3 seconds.
Question:
Is there a way to reserve memory, let's say 200 MB, so that if my program allocates it, the allocation is guaranteed to happen without swapping?
Even if you allocate all the memory you need at the beginning of your program, the case you are afraid of is that ANOTHER process will use the memory. Unless you are the only process on that machine, there will always be another running process. Therefore the solution you want is a "reserved" RAM region that no one but your process can touch, which means the kernel would never swap that region out to disk (and therefore would never need a physical disk access to bring it back).
Fortunately, that is impossible unless you change your kernel and recompile it. Think about the possibility of more than one process "reserving" memory for itself: if you have 4 GB of RAM then you are stuck :(
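The closest practical workaround is not a true reservation but pre-allocation: grab the memory at startup, touch every page so it is physically backed, and lock it with mlockall() so it can never be swapped out. A hedged sketch, with the 200 MB figure taken from the question and g_pool/reserve_pool as made-up names:

#include <sys/mman.h>
#include <cstddef>
#include <cstdlib>
#include <cstring>

static char *g_pool = nullptr;
static const std::size_t POOL_SIZE = 200u * 1024 * 1024;   // 200 MB, as in the question

bool reserve_pool() {
    g_pool = static_cast<char *>(std::malloc(POOL_SIZE));
    if (!g_pool) return false;
    std::memset(g_pool, 0, POOL_SIZE);                      // touch every page so it is resident
    // MCL_CURRENT locks what is mapped now; MCL_FUTURE also locks later mappings
    return mlockall(MCL_CURRENT | MCL_FUTURE) == 0;
}

The fast thread then has to take its "little pieces" out of g_pool via a custom allocator rather than calling malloc at runtime, because a fresh malloc can still fault in brand-new pages slowly when the system is under memory pressure.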
Let us say my system has 4 GB of RAM, and a certain program only consumes 100 MB of memory throughout its execution, and also runs only for a limited time, like 30 seconds. Don't you think it's a good idea (for performance) not to deallocate memory at all during the execution of that program? Those 100 MB would be freed anyway when the program terminates.
The most important part of deallocating memory in a short-running program is facilitating memory reuse. Your processor has a limited cache, probably only a few MB, and if you continuously allocate more memory you can never take advantage of it.
If however you deallocate some memory, and then reallocate and reuse it, there's a good chance that on reuse the memory will already be "hot" and resident in cache. This means you won't incur the penalty of a cache miss and your program will run faster.
Of course you are trading the deallocation/reallocation cost against cache misses, but continuously allocating more and more memory is guaranteed to incur last-level cache misses. Each of these misses costs ~100 cycles on a modern processor; that may be less than a single allocation/deallocation, but you incur vastly more misses than you save allocations, as the numbers below show.
Edit
So I made a little test program to test this.
#include <stdio.h>
#include <stdlib.h>

/* allocates 1 MB, touches every byte, then frees it */
void process1(void) {
    char *ptr = malloc(1 << 20);
    int i = 0;
    while (i < (1 << 20)) ptr[i++] = rand();
    free(ptr);
}

/* same work, but never frees the buffer */
void process2(void) {
    char *ptr = malloc(1 << 20);
    int i = 0;
    while (i < (1 << 20)) ptr[i++] = rand();
}

int main(void) {
    int i = 100;
    while (i--) process1();   /* 100 MB total, reusing freed memory */
    i = 100;
    while (i--) process2();   /* 100 MB total, always fresh memory */
    return 0;
}
Both processes chew through 100 MB of data, the first one deallocates it as it goes, the second does not. I profiled this with valgrind's cachegrind tool. Here are the results
fn=process1
0 943720200 2 2 419430900 100 0 209715800 1638400 32767
fn=process2
0 943719900 0 0 419430800 100 0 209715700 1638400 1605784
Okay, not that exciting; I'll translate. By avoiding the free you saved less than 0.1% of the instruction count, and you incurred about a million and a half LL cache misses. If we use the standard cycle estimation formula, CEst = Ir + 10*L1m + 100*LLm, that's about a 16% degradation in performance from avoiding the free in this function alone.
I re-ran it under callgrind to get the full story (I'm not going to post the detailed results because they are far more complicated than the cachegrind ones), but when we include the full cost of calling free the result is the same: about 16% more cycles are used when you don't call free.
Feel free to do this same test on your own program, but in this case the results are clear. The cost of managing dynamic memory is truly trivial compared to the cost of continuously refreshing the cache with new memory.
Also, I didn't mention this, but the direct comparison that should be made is the cost of the frees versus the cost of the cache misses. The frees cost 38,930 cycles in total and the cache misses 157,332,000: you saved 39 thousand cycles and paid for it with over 150 million.
Don't you think its a good idea not to deallocate memory at all during the execution of that program?
Cutting corners is not a "good idea". I can think of plenty of reasons to deallocate memory; I can't think of nearly as many to leave unused memory allocated. If you don't want to deal with memory management, write in a language that has automatic garbage collection, or, since you're writing in C++, use smart pointers.
only consumes 100 MB of memory
That's the most egregious misuse of the word "only" I have ever seen. 100 MB is a serious chunk of memory. That's ~1/40th of the RAM your system has, and if even 40 other programs running on the system took the same attitude, at best the system would grind to a halt because of swap delays, and at worst it would crash outright. There are usually more than 40 applications running on any given (modern) machine while it's turned on.
also runs for a limited time, like only 30 seconds
Again, 30 seconds is a long time for a computer. But, to clarify, do you mean 30 seconds ever, or 30 seconds every day? Because if it's a one-off and you're low on time, maybe that's an acceptable loss, but it certainly isn't good practice.
TL;DR?
If you have more important, immediate considerations (e.g. a product deadline is approaching) and you've got no other reasonable choice, do it. Otherwise, you've got no excuse. Hogging memory is never something you should feel good about; it can seriously slow down processes that really do need that memory. Yes, the O/S cleans up resources after the program ends. No, that does not give you carte blanche to write programs that abuse their resources.
No, it is not good programming style. Virtually all modern OSes will clean up the memory afterwards, but it will remain a leak for as long as the program runs.
If you want to avoid the cleanup overhead you can implement an allocator that permits that while still maintaining good practices. For example, allocate a memory pool at program start-up, replace the global operator new with a version that takes memory from the pool, and replace the global operator delete with a no-op; then free the whole pool just before program exit. You could also use more fine-grained pools to cut down the deallocation overhead while still allowing the program to run without continuously growing memory usage.
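A minimal sketch of that idea, assuming a single bump allocator is good enough for the whole program lifetime (the 64 MB pool size is arbitrary, and there is no thread safety):

#include <cstddef>
#include <cstdlib>
#include <new>

static const std::size_t POOL_BYTES = 64u * 1024 * 1024;   // arbitrary pool size
static char *g_pool = nullptr;
static std::size_t g_used = 0;

void *operator new(std::size_t n) {
    if (!g_pool) g_pool = static_cast<char *>(std::malloc(POOL_BYTES));  // lazy init
    n = (n + 15) & ~std::size_t(15);                        // crude 16-byte alignment
    if (!g_pool || g_used + n > POOL_BYTES) throw std::bad_alloc();
    void *p = g_pool + g_used;
    g_used += n;
    return p;
}

// no-ops: individual objects are never freed; the whole pool goes away at exit
void operator delete(void *) noexcept {}
void operator delete(void *, std::size_t) noexcept {}

A real version would need thread safety, proper alignment and probably one pool per subsystem, but it shows the shape of the approach.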
Another alternative is to simply use a faster implementation of new/delete. The built-in versions are traditionally slow, but they don't have to be.
I have a process on 64-bit Linux (Red Hat Enterprise) that loads one million records into memory; each record is 4 KB, so the total memory consumption is about 4 GB.
My computer has 2 GB of RAM and 3 GB of swap. So obviously part of the data will be put into swap. The problem is that I don't know why it takes so long to traverse all those records. I have a function that loops through each record and does some work. It works well with about 500,000 records: the function needs only a couple of minutes to finish. However, with double that number of records, i.e. 1,000,000, it takes hours to run the same function. I used the top command in Linux to check the CPU load and see that it's about 90% wa (I/O wait time). I guess this might be the cause of the problem but I really don't know why it happens.
Any helpful ideas would be greatly appreciated.
Swap space is disk, and disk bandwidth is two or three orders of magnitude lower than memory bandwidth.
There are two options:
The process works over the records sequentially. Then it was the stupidest thing on Earth to load them all into memory up front.
If you can fix the process, fix it to only load a bit at a time.
If you can't fix the process, you'll have to buy more memory.
The process works over the records in random order or multiple times (and can't do otherwise). Well, you'll have to buy more memory.
If you want to use your swap space efficiently, you should make sure that you traverse your data sequentially in contiguous memory blocks. I.e. blocks of several megabytes. That way, when a new chunk is loaded into ram from swap space, this chunk will contain the next few records as well.
Sounds like either cache or swap thrashing is happening; check vmstat to verify. You can remedy swap thrashing if you load only as much data as fits into memory, process it, load the next block, and so on. This way you don't have to impose a processing order (random or sequential doesn't matter much). Alternatively, we'd need more details on your algorithm / program architecture to comment.
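As a hedged illustration of the "load a block, process it, load the next one" approach (the record layout, file path and process_record function are made-up placeholders):

#include <cstddef>
#include <cstdio>
#include <vector>

struct Record { char data[4096]; };            // 4 KB per record, as in the question

inline void process_record(const Record &) { /* hypothetical per-record work */ }

void process_in_blocks(const char *path) {
    const std::size_t BLOCK = 64u * 1024;      // 64k records = 256 MB per block; tune to fit in RAM
    std::vector<Record> buf(BLOCK);
    std::FILE *f = std::fopen(path, "rb");
    if (!f) return;
    std::size_t n;
    while ((n = std::fread(buf.data(), sizeof(Record), BLOCK, f)) > 0) {
        for (std::size_t i = 0; i < n; ++i)
            process_record(buf[i]);             // only one block is resident at a time
    }
    std::fclose(f);
}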
The speed of your swap memory depends on the speed of the underlying hardware where the swap resides.
Usually in operating systems, Windows calls it pagefile.sys and Linux calls it the swap partition(s); the hardware backing the swap is one of the hard drives in the system, so it is orders of magnitude slower than RAM.
Before buying more RAM, you could try using part of your RAM as compressed swap. I have heard of compcache, but I have not used it myself. The idea is the following:
If the data you put in RAM can be compressed (let's say at a 3:1 ratio),
allocate 1 GB of your 2 GB of RAM to an in-memory swap,
and you then have the equivalent of 4 GB of low-latency RAM.
I would be curious to know whether it improves the number of records you can handle without thrashing.
My C++ program caches lots of objects, and at the beginning of each major API call I want to ensure that there is at least 500 MB available for the call. I may either be running out of RAM + swap space (consider a system with 1 GB RAM + a 1 GB swap file), or I may be running out of virtual address space in my process (I may already be using 3.7 GB of the total 4 GB address space). It's not easy for me to estimate how much data I have cached, but I can purge some of it if it is becoming an issue, and do so iteratively until I have 500 MB available in the system or in the address space (whichever is the bottleneck). So my requirements are to find, in C++ on 32-bit Linux:
A) Find how much RAM + SWAP space is free.
B) How much user space address space is available to my process.
C) How much Virtual Memory the process is already using. Consider it similar to 'Commit Size' or 'Working Set Size' of a process on Windows.
Any answers would be greatly appreciated.
Look at /proc/vmstat; there is a lot of information about the system-wide memory there.
/proc/<pid>/maps will give you a lot of information about your process's memory layout.
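For (A), an alternative to parsing /proc by hand is the sysinfo() call; a small sketch (the function name is just an example):

#include <sys/sysinfo.h>

// Free RAM + free swap, in bytes (0 on failure).
unsigned long long free_ram_plus_swap() {
    struct sysinfo si;
    if (sysinfo(&si) != 0) return 0;
    unsigned long long unit = si.mem_unit ? si.mem_unit : 1;
    return (static_cast<unsigned long long>(si.freeram) +
            static_cast<unsigned long long>(si.freeswap)) * unit;
}

For (B) and (C), /proc/self/maps and the VmSize line of /proc/self/status are the usual sources.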
Note that if you check the memory before running a long job, another process may eat all the available memory in the meantime and your program may crash anyway!
I don't know anything about your cached classes, but if these objects are quite small you have probably overridden the new/delete operators for them; with that in place it is quite easy to keep track of memory consumption (at least by counting objects).
Why not change your cache policy and flush old, unused objects?
Another, uglier way is to try to allocate several chunks of memory, see whether the program can allocate them, and release them right after. On 32-bit it may fail because the heap may be fragmented, but if it works you can be sure that you have enough memory at that moment.
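That "ugly" probe might look roughly like this (the chunk size is arbitrary; on 32-bit, success mostly tells you the address space was not too fragmented at that instant):

#include <cstddef>
#include <cstdlib>
#include <vector>

// Crude probe: try to grab `bytes` of address space in fixed-size chunks,
// then release everything again.
bool can_probably_allocate(std::size_t bytes, std::size_t chunk = 16u * 1024 * 1024) {
    std::vector<void *> held;
    std::size_t got = 0;
    bool ok = true;
    while (got < bytes) {
        void *p = std::malloc(chunk);
        if (!p) { ok = false; break; }
        held.push_back(p);
        got += chunk;
    }
    for (std::size_t i = 0; i < held.size(); ++i)
        std::free(held[i]);
    return ok;
}

Note that on Linux malloc may overcommit, so succeeding here proves you have the address space, not necessarily the physical memory.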
Take a look at the source for vmstat: here. Then search for the domem() function, which gathers all the information about memory (occupied and free).
I'm writing a performance-critical application where it's essential to store as much data as possible in physical memory before dumping to disk.
I can use ::GlobalMemoryStatusEx(...) and ::GetProcessMemoryInfo(...) to find out what percentage of physical memory is reserved/free and how much memory my current process uses.
Using this data I can make sure to dump when ~90% of the physical memory is in use, or when ~90% of the maximum 2 GB per-application limit is hit.
However, I would like a method for simply obtaining how many bytes are actually left before the system starts using virtual memory, especially as the application will be compiled for both 32-bit and 64-bit, where the 2 GB limit doesn't exist.
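For reference, such a check might look roughly like this sketch (the 90% thresholds are just the heuristic described above, and the 2 GB cap only applies to the 32-bit build):

#include <windows.h>
#include <psapi.h>                     // link with psapi.lib

bool shouldDumpToDisk() {
    MEMORYSTATUSEX ms = {};
    ms.dwLength = sizeof(ms);
    GlobalMemoryStatusEx(&ms);

    PROCESS_MEMORY_COUNTERS pmc = {};
    GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc));

    const SIZE_T cap32 = (SIZE_T)(2048u * 1024 * 1024);   // 2 GB per-process limit (32-bit)
    bool physicalNearlyFull = ms.dwMemoryLoad >= 90;       // % of physical RAM in use
    bool processNearCap     = pmc.PagefileUsage >= cap32 / 10 * 9;
    return physicalNearlyFull || processNearCap;
}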
How about this function:
int bytesLeftUntilVMUsed() {
    return 0;
}
it should give the correct result in nearly all cases I think ;)
Imagine running Windows 7 in 256 MB of RAM (MS suggests a 1 GB minimum). That's effectively what you're asking the user to do by wanting to reserve 90% of the available RAM.
The real question is: Why do you need so much RAM? What is the 'performance critical' criteria exactly?
Usually, this kind of question implies there's something horribly wrong with your design.
Update:
Using top-of-the-range RAM (DDR3) would give you a theoretical transfer speed of 12 GB/s, which equates to reading one 32-bit value every clock cycle with some bandwidth to spare. I'm fairly sure it is not possible to do anything useful with data coming into the CPU at that speed; instruction processing stalls would interrupt the flow. The extra, unused bandwidth can be used to page data to/from a hard disk. Using RAID this transfer rate can be quite high (about 1/16th of the RAM bandwidth). So it would be feasible to transfer data to/from the disk and process it without any degradation of performance: 16 cycles between reads is all it would take (OK, my maths might be a bit wrong here).
But if you throw Windows into the mix, it all goes to pot. Your memory can go away at any moment, your application can be paused arbitrarily, and so on. Locking memory to RAM would have adverse effects on the whole system, thus defeating the purpose of locking the memory.
If you explain what you're trying to achieve and the performance criteria, there are many people here who will help develop a suitable solution, because if you have to ask about system limits, you really are doing something wrong.
Even if you're able to stop your application from having memory paged out to disk, you'll still run into the problem that the VMM might be paging other programs out to disk, which might potentially affect your performance as well. Not to mention that another application might start up and demand memory, resulting in some of your application's memory being paged out. How are you planning to deal with that?
There is a way to use non-pageable memory via the non-paged pool, but (a) this pool is comparatively small and (b) it's used by device drivers and might only be usable from inside the kernel. It's also not really recommended to use large chunks of it unless you want to make sure your system isn't all that stable.
You might want to revisit the design of your application and try to work around the possibility of having memory paged to disk before you either try to write your own VMM or turn a Windows machine into essentially a DOS box with more memory.
The standard solution is to not worry about "virtual" and worry about "dynamic".
The "virtual" part of virtual memory has to be looked at as a hardware function that you can only defeat by writing your own OS.
The dynamic allocation of objects, however, is simply your application program's design.
Statically allocate simple arrays of the objects you'll need. Use those arrays of objects. Increase and decrease the size of those statically allocated arrays until you have performance problems.
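A hedged sketch of that style, with a made-up Widget type and an arbitrary fixed capacity:

#include <cstddef>

struct Widget { int id; double value; };       // stand-in for whatever you actually store

static const std::size_t MAX_WIDGETS = 4096;   // tune this constant, not the allocator
static Widget g_widgets[MAX_WIDGETS];          // lives for the whole program, never new'd
static std::size_t g_widgetCount = 0;

Widget *acquireWidget() {
    return (g_widgetCount < MAX_WIDGETS) ? &g_widgets[g_widgetCount++] : nullptr;
}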
Ouch. The non-paged pool (the amount of RAM which cannot be swapped or allocated to processes) is typically 256 MB. That's 12.5% of the RAM on a 2 GB machine. If another 90% of physical RAM were allocated to one process, that would leave -2.5% for all other applications, services, the kernel and the drivers. Even if you allocated only 85% for your app, that would still leave only 2.5% = 51 MB.