Linux: reserve memory for later allocation (C++)

Is there a possibility to reserve memory for later allocation in a c++ program?
Background:
I'm working on Debian with the Preempt RT patch. My program uses roughly 100 MB of memory. All pages are prevented from swapping with mlockall(). There are mainly two running threads: one runs in real time and doesn't allocate memory; the other runs at slightly lower priority and allocates/frees memory.
In some rare situations a background process allocates all free memory and the system starts swapping. Now my 'fast' thread wants a little piece of RAM, and the kernel gives it that new little piece, BUT from swap. So my program is interrupted with a huge latency, say 3 seconds.
Question:
Is there a way to reserve memory, say 200 MB, such that when my program later allocates it, the allocation is guaranteed to succeed without swapping?

Even if you allocate all the memory you need at the beginning of your program, the case you're afraid of is that ANOTHER process will use the memory. Unless yours is the only process on that machine, there will always be other processes running. Therefore the solution you want is a "reserved" RAM space that no one but your process can access, which means the kernel would never swap this space out to disk (and therefore never perform any physical access for it).
Unfortunately, this is impossible unless you change your kernel and recompile it. Think about the possibility that more than one process "reserves" memory for itself: if you have 4 GB of RAM, then you are stuck :(

Related

What part of the process virtual memory does Windows Task Manager display

My question is a bit naive. I'm looking for an overview that is as simple as possible and couldn't find any resource that made it clear to me. I am a developer and I want to understand what exactly the memory displayed by default in the "memory" column of Windows Task Manager is:
To make things a bit simpler, let's forget about the memory the process shares with other processes, and imagine the shared memory is negligible. Also, I'm focused on the big picture and mainly care about things at the GB level.
As far as I know, the memory reserved by the process, called "virtual memory", is partly stored in main memory (RAM) and partly on disk. The system decides what goes where, and basically keeps in RAM the parts of the virtual memory that are accessed sufficiently frequently by the process. A process can reserve more virtual memory than there is RAM available in the computer.
From a developer's point of view, the virtual memory may only be partially allocated by the program through its own memory manager (with malloc() or new X(), for example). I guess the system has no awareness of which part of the virtual memory is allocated, since this is handled by the process in a "private" way and depends on the language, runtime, compiler... Q: Is this correct?
My hypothesis is that the memory displayed by Task Manager is essentially the part of the virtual memory being kept in RAM by the system. Q: Is that correct? And is there a simple way to know the total virtual memory reserved by the process?
Memory on Windows is... extremely complicated, and asking "how much memory does my process use?" is effectively a nonsensical question. To answer your questions, let's get a little background first.
Memory on Windows is allocated via ptr = VirtualAlloc(..., MEM_RESERVE, ...) and committed later with VirtualAlloc(ptr+n, MEM_COMMIT, ...).
Any reserved memory just uses up address space and so isn't interesting; Windows will let you MEM_RESERVE terabytes of memory just fine. Committing the memory does use up resources, but not in the way you'd think. When you call commit, Windows does a few sums, basically working out (total physical RAM + total swap - current commit), and lets you allocate the memory if there's enough free. BUT the Windows memory manager doesn't actually give you physical RAM until you actually use it.
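The reserve/commit dance can be sketched like this (Windows-only, a rough sketch with error handling omitted, not production code):

```cpp
#include <windows.h>

int main() {
    // Reserving is cheap: it only consumes address space, no RAM or swap.
    const SIZE_T total = 1u << 30; // 1 GiB of address space
    char* base = static_cast<char*>(
        VirtualAlloc(nullptr, total, MEM_RESERVE, PAGE_NOACCESS));
    // Committing charges against (physical RAM + swap - current commit),
    // but physical pages are still handed out lazily, on first touch.
    VirtualAlloc(base, 64 * 1024, MEM_COMMIT, PAGE_READWRITE);
    base[0] = 42; // first touch: only NOW is a physical page assigned
    VirtualFree(base, 0, MEM_RELEASE);
    return 0;
}
```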
Later, however, if Windows is tight on physical RAM, it'll swap some of your RAM out to disk (it may compress it, throw away unused pages, throw away anything directly mapped from a file, and apply other optimisations). This means the total commit and total physical RAM usage for your program may be wildly different. Both numbers are useful, depending on what you're measuring.
There's one last large caveat: shared memory. When you load DLLs, the code and the read-only memory (and maybe even the read/write section, but that is copy-on-write) can be shared with other programs. This means that your app requires that memory, but you cannot count it against just your app; after all, it can be shared, so it doesn't take up as much physical memory as a naive count would suggest.
(If you are writing a game or similar you also need to count GPU memory but I'm no expert here)
All of the above goodness is normally wrapped up by the heap the application uses, and you see none of it: you just ask for and use memory, and it's about as optimal as it can be.
You can see all this by going to the Details tab and looking at the available columns; commit size and working set are really useful. If you just look at the main window in Task Manager, with its single value, I hope you now understand that a single value for "memory used" has to be some kind of compromise, because it's not a question that makes sense.
Now to answer your questions
Firstly, the OS knows exactly how much memory your app has reserved and how much it has committed. What it doesn't know is whether the heap implementation you (or, more likely, the CRT) are using has kept around some freed memory that it hasn't released back to the operating system. Heaps often do this as an optimisation: asking the OS for memory and freeing it back is a fairly expensive operation (and can only be done in large chunks known as pages), so most heaps keep some around.
Second question: don't use that value; go to Details and use the values there, as only you know what you actually want to ask.
EDIT:
For your comment: yes, but it depends on the size of the allocation. If you allocate a large block of memory (say >= 1 MB), the heap in the CRT generally defers the allocation directly to the operating system, so freeing individual blocks will actually release them. For small allocations, the heap in the CRT asks the operating system for pages of memory and then subdivides them to hand out as allocations. If you then free every other one of those, you'll be left with holes, and the heap cannot give those holes back to the OS, as the OS generally only works in whole pages. So anything you see in Task Manager will show that all the memory is still used. Remember this memory isn't lost or leaked; it's just effectively pooled and will be used again if allocations ask for that size. If you care about this memory, you can use the CRT heap-statistics family of functions to keep an eye on it, specifically _CrtMemDumpStatistics.

cannot allocate memory fast enough?

Assume you are tasked to address a performance bottleneck in an application. Via profiling we discover the bottleneck is related to memory allocation. We discover that the application can only perform N memory allocations per second, no matter how many threads we have allocating memory. Why would we be seeing this behavior, and how might we increase the rate at which the application can allocate memory? (Assume that we cannot change the size of the memory blocks that we are allocating. Assume that we cannot reduce the use of dynamically allocated memory.)
Okay, a few solutions exist; however, almost all of them seem to be excluded by some constraint or another.
1. Have more threads allocate memory
We discover that the application can only perform N memory allocations per second, no matter how many threads we have allocating memory.
From this, we can cross off any idea of adding more threads (since "no matter how many threads"...).
2. Allocate more memory at a time
Assume that we cannot change the size of the memory blocks that we are allocating.
Fairly obviously, we have to allocate the same block size.
3. Use (some) static memory
Assume that we cannot reduce the use of dynamically allocated memory.
This one I found most interesting. It reminded me of a story I heard about a FORTRAN programmer (before Fortran had dynamic memory allocation) who just used a HUGE statically declared array as a private heap.
Unfortunately, this constraint prevents us from using such a trick. However, it does give a hint about one aspect of a (the) solution.
My Solution
At the start of execution (either of the program, or on a per-thread basis) make several^ memory-allocation system calls. Then use the memory from these later in the program (along with the existing dynamic memory allocations).
^ Note: 'several' would probably be an exact number, determined from the profiling the question mentions at the beginning.
TL;DR
The trick is to modify the timing of the memory allocations.
Looks like a challenging problem, though without details you can only make some guesses (which is most likely the idea of this question).
The limitation here is the number of allocations, not the size of the allocation.
If we can assume that you are in control of where the allocations occur, you can allocate the memory for multiple instances at once. Please consider the code below as pseudo-code, as it's only for illustration purposes.
#include <cstdlib>
#include <new>

const static size_t NR_COMBINED_ALLOCATIONS = 16;
auto memoryBuffer = static_cast<char*>(std::malloc(sizeof(MyClass) * NR_COMBINED_ALLOCATIONS));
size_t nextIndex = 0;
// Some looping code
auto myNewClass = new (memoryBuffer + nextIndex++ * sizeof(MyClass)) MyClass; // placement new
// Some code
myNewClass->~MyClass(); // destroy in place; the memory stays in the buffer
std::free(memoryBuffer); // one free() for all NR_COMBINED_ALLOCATIONS slots
Your code will most likely become a lot more complex, but you will most likely tackle this bottleneck. In case you have to return the new object, you will need even more code just to do memory management.
Given this information, you can write your own allocator implementations for the STL, override the new and delete operators, and so on.
If that is not enough, try challenging the limitations. Why can you only do a fixed number of allocations: is it because of unique locking? If so, can that be improved? Why do you need that many allocations: would changing the algorithm being used fix the issue?
... the application can only perform N memory allocations per second,
no matter how many threads we have allocating memory. Why would we be
seeing this behavior and how might we increase the rate at which the
application can allocate memory.
IMHO, the most likely cause is that the allocations are coming from a common system pool.
Because they share a pool, each thread has to gain access through some critical-section blocking mechanism (perhaps a semaphore).
The more threads compete for dynamic memory (i.e. use new), the more critical-section blocking occurs.
The context switches between tasks are the time waste here.
How to increase the rate?
Option 1 - serialize the usage... and this means, of course, that you cannot simply try to use a semaphore at another level. On one system I worked on, high dynamic-memory utilization happened during system start-up. In that case, it was easiest to change the start-up so that thread n+1 (of this collection) only started after thread n had completed its initialization and fallen into its wait-for-input loop. With only one thread doing its start-up at a time (and very few other dynamic-memory users running yet), no critical-section blockage occurred. Four simultaneous start-ups would take 30 seconds; four serialized start-ups finished in 5 seconds.
Option 2 - provide a pool of RAM and a private new/delete for each particular thread. If only one thread accesses a pool at a time, no critical section or semaphore is needed. In an embedded system, the challenge is to allocate a reasonable amount of private pool per thread without too much waste. On a desktop with multiple gigabytes of RAM, this is probably less of a problem.
I believe you could use a separate thread responsible for memory allocation. This thread would own a queue containing allocation requests keyed by thread identifier. Threads would not allocate memory directly, but would instead send an allocation request to the queue and go into a wait state. The allocator thread would, in turn, try to process each requested allocation from the queue and wake the corresponding sleeping thread. When the thread responsible for memory handling cannot process an allocation due to the limitation, it should wait until memory can be allocated again.
One could build another layer on top, as @Tersosauros's solution suggests, to slightly optimize speed, but it should nonetheless be based on something like the idea above.

How to guarantee that when a process calls malloc(), it will allocate physical memory immediately?

I am looking for a way to pre-allocate memory to a process (PHYSICAL memory), so that it is absolutely guaranteed to be available to the C++ heap when I call new/malloc. I need this memory to be available to my process regardless of what other processes try to do with system memory. In other words, I want to reserve physical memory for the C++ heap, so that it is available immediately when I call malloc().
Here are the details:
I am developing a real-time system. The system is composed of several memory-hungry processes. Process A is the mission-critical process and it must survive and be immune to bad behavior of any other processes. It usually fits in 0.5 GB of memory, but it sometimes needs as much as 2.5 GB. The other processes attempt to use any amount of memory.
My concern is that the other processes may allocate lots of memory, exhausting the physical memory in the system. Then, when Process A needs more memory FAST, it isn't available, and the system has to swap pages, which takes a long time.
It is critical that Process A gets all the memory it needs without delay, whereas I'm fine with the other processes failing.
I'm running on Windows 7 64-bit.
Edit:
Would SetProcessWorkingSetSize work? Meaning: would calling it with a big enough amount of memory protect my process A from any other process in the system?
VirtualLock is what you're looking for. It will force the OS to keep the pages in memory, as long as they're within the working set size (which is set by the function MK linked to in his answer). However, there is no way to feed this memory to malloc/new; you'll have to implement your own memory allocator.
I think this question is odd because Windows 7 is not exactly the OS of choice for real-time applications. That said, there appears to be an interface that might help you:
AllocateUserPhysicalPages

How to write a C or C++ program to act as a memory and CPU cycle filler?

I want to test a program's memory management capabilities, for example (say, program name is director)
What happens if some other processes take up too much memory and there is too little memory for director to run? How does director behave?
What happens if too many of the CPU cycles are used by some other program while director is running?
What happens if memory used by the other programs is freed after some time? How does director reclaim the memory and start working at full capacity?
etc.
I'll be doing these experiments on a Unix machine. One way is to limit the amount of memory available to the process using ulimit, but there is no comparably good way to control CPU-cycle utilization.
I have another idea. What if I write a program in C or C++ that acts as a dynamic memory and CPU filler, i.e. does nothing useful but eats up memory and/or CPU cycles anyway?
I need some ideas on how such a program should be structured. I need dynamic (runtime) control over the memory and CPU used.
I think that creating a lot of threads would be a good way to clog up CPU cycles. Is that right?
Is there a better approach that I can use?
Any ideas/suggestions/comments are welcome.
http://weather.ou.edu/~apw/projects/stress/
Stress is a deliberately simple workload generator for POSIX systems. It imposes a configurable amount of CPU, memory, I/O, and disk stress on the system. It is written in C, and is free software licensed under the GPLv2.
The functionality you seek overlaps the feature set of "test tools". So also check out http://ltp.sourceforge.net/tooltable.php.
If you have a single core, this is enough to put stress on a CPU (volatile keeps the compiler from optimising the loop away):
volatile unsigned long long x = 0;
while ( true ) {
    x++;
}
If you have lots of cores, you need a thread per core.
You can make it variably hungry by adding a few sleeps.
As for memory, just allocate lots.
There are several problems with such a design:
In a virtual memory system, memory size is effectively unlimited (well, it's limited by your hard disk...). In practice, systems usually run out of address space well before they run out of backing store, and address space is a per-process resource.
Any reasonable (non realtime) operating system is going to limit how much CPU time and memory your process can use relative to other processes.
It's already been done.
More importantly, I don't see why you would ever want to do this.
For dynamic memory control, you could just allocate or free buffers of a certain size to use more or less memory. As for CPU utilization, you will have to use an OS function to check the current load, poll it periodically, and decide whether to do useful work.

Staying away from virtual memory in Windows/C++

I'm writing a performance-critical application where it's essential to store as much data as possible in physical memory before dumping to disk.
I can use ::GlobalMemoryStatusEx(...) and ::GetProcessMemoryInfo(...) to find out what percentage of physical memory is reserved/free and how much memory my current process handles.
Using this data I can make sure to dump when ~90% of the physical memory is in use, or when ~90% of the maximum 2 GB per-application limit is hit.
However, I would like a method for simply receiving how many bytes are actually left before the system starts using virtual memory, especially as the application will be compiled for both 32-bit and 64-bit, and the 2 GB limit doesn't exist on the latter.
How about this function:
int bytesLeftUntilVMUsed() {
    return 0;
}
It should give the correct result in nearly all cases, I think ;)
Imagine running Windows 7 in 256 MB of RAM (MS suggests 1 GB minimum). That's effectively what you're asking the user to do by wanting to reserve 90% of available RAM.
The real question is: Why do you need so much RAM? What is the 'performance critical' criteria exactly?
Usually, this kind of question implies there's something horribly wrong with your design.
Update:
Using top-of-the-range RAM (DDR3) would give you a theoretical transfer speed of 12 GB/s, which equates to reading one 32-bit value every clock cycle with some bandwidth to spare. I'm fairly sure it is not possible to do anything useful with data coming into the CPU at that speed; instruction-processing stalls would interrupt the flow. The extra, unused bandwidth can be used to page data to/from a hard disk. Using RAID, this transfer rate can be quite high (about 1/16th of the RAM bandwidth). So it would be feasible to transfer data to/from the disk and process it without any degradation of performance; 16 cycles between reads is all it would take (OK, my maths might be a bit off here).
But if you throw Windows into the mix, it all goes to pot. Your memory can go away at any moment, your application can be paused arbitrarily, and so on. Locking memory to RAM would have adverse effects on the whole system, thus defeating the purpose of locking the memory in the first place.
If you explain what you're trying to achieve and the performance criteria, there are many people here who will help develop a suitable solution, because if you have to ask about system limits, you really are doing something wrong.
Even if you're able to stop your application from having memory paged out to disk, you'll still run into the problem that the VMM might be paging other programs out to disk, which could affect your performance as well. Not to mention that another application might start up and consume memory you're currently occupying, resulting in some of your application's memory being paged out. How are you planning to deal with that?
There is a way to use non-pageable memory via the non-paged pool, but (a) this pool is comparatively small and (b) it's used by device drivers and might only be usable from inside the kernel. It's also not really recommended to use large chunks of it unless you want to make sure your system isn't that stable.
You might want to revisit the design of your application and try to work around the possibility of having memory paged to disk before you either try to write your own VMM or turn a Windows machine into essentially a DOS box with more memory.
The standard solution is to not worry about "virtual" and worry about "dynamic".
The "virtual" part of virtual memory has to be looked at as a hardware function that you can only defeat by writing your own OS.
The dynamic allocation of objects, however, is simply your application program's design.
Statically allocate simple arrays of the objects you'll need. Use those arrays of objects. Increase and decrease the size of those statically allocated arrays until you have performance problems.
Ouch. The non-paged pool (the amount of RAM which cannot be swapped or allocated to processes) is typically 256 MB. That's 12.5% of the RAM on a 2 GB machine. If another 90% of physical RAM were allocated to a process, that would leave -2.5% for all other applications, services, the kernel and drivers. Even if you allocated only 85% for your app, that would still leave only 2.5% = 51 MB.