Boost managed_mapped_file: setting maximum allowed memory usage - c++

Is there any way to set the maximum allowed memory used by managed_mapped_file? For example, I have 64GB of memory and I create a 20GB file. This is all loaded into memory. Is there a way to specify that only 1GB of memory should be used, for example? Even an approximate limit would suffice.
EDIT: I should add that I use boost::interprocess::vector... maybe there is a way to specialize the allocator?
typedef bi::allocator<Node, bi::managed_mapped_file::segment_manager> allocator_node_t;
typedef bi::vector<Node, allocator_node_t> vector_node_t;
bi::managed_mapped_file* nodeFile = new bi::managed_mapped_file(bi::open_or_create, "nodes_m.bin", bigSize);
allocator_node_t alloc_n(nodeFile->get_segment_manager());
vector_node_t* nodes = nodeFile->find_or_construct<vector_node_t>("nodes")(alloc_n);

There's no such way (portably).
Also the premise is wrong:
For example, I have 64GB of memory and I create a 20GB file. This is all loaded into memory
Wrong: it will load only the pages that are used. Yes, this may mean that you end up having the full 20GB in memory. The OS is free to do that as long as no other process requires the physical memory for other tasks.
It would be silly for the OS to arbitrarily unmap that data for no reason. You want the OS to take advantage of available memory. Otherwise the money spent on those silicon chips was wasted.
EDIT: I should add that I use boost::interprocess::vector... maybe there is a way to specialize the allocator?
Using boost::interprocess::vector without a custom allocator doesn't use shared memory in the first place. You need e.g. boost::interprocess::allocator<T, boost::interprocess::managed_mapped_file::segment_manager> for the vector to live in the mapped file at all.
And no, nothing in the allocator can override the OS virtual memory tuning parameters.
Nothing needs to be specialized (in the C++ sense of the word):
bi::managed_mapped_file* nodeFile = new bi::managed_mapped_file(bi::open_or_create, "nodes_m.bin", bigSize);
allocator_node_t alloc_n(nodeFile->get_segment_manager());
vector_node_t* nodes = nodeFile->find_or_construct<vector_node_t>("nodes")(alloc_n);
Executing this code the first time around (i.e. creating "nodes_m.bin") will not load bigSize bytes into memory. In fact it will not even allocate bigSize bytes on disk! On all systems that support it (I know of no mainstream OS that doesn't) the file is created sparse.
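To see the sparse allocation for yourself, you can compare the apparent size with the blocks actually allocated on any POSIX system. A minimal sketch; the file name is taken from your snippet:
#include <sys/stat.h>
#include <cstdio>

int main() {
    struct stat st{};
    if (::stat("nodes_m.bin", &st) == 0) {
        // st_size is the apparent (logical) size of the file;
        // st_blocks counts the 512-byte blocks actually allocated on disk.
        std::printf("apparent size: %lld bytes\n", (long long)st.st_size);
        std::printf("allocated:     %lld bytes\n", 512LL * st.st_blocks);
    }
}
Right after creation you should see the allocated figure at (nearly) zero, even though the apparent size is bigSize.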

Related

Can I resize Linux shared memory with shmctl?

I have a C++ application that allocates shared memory on a Linux system via shmget(2). The data that I store in the shared memory grows periodically, and I'd like to resize the shared memory in a way analogous to the way realloc() grows regular memory. Is there a way to do this? I found a doc on IBM's site that mentions a SHM_SIZE command, but the Linux and BSD manpages do not have it, even in the Linux-specific sections.
Simple answer: there is no easy way.
The reasons are pretty logical. Shared memory is attached to the virtual address space of every process individually, and each process has its own virtual address space. Each process is free to attach the segment at any (well, nearly any: alignment sets some restrictions) arbitrary address. How can the system guarantee that, say after extending the area by 4MiB, every 'user' of this segment will be able to fit the bigger block at the same starting address where the smaller segment previously was?
But you should not give up! You can be creative. You could, for example, have one header segment where you store information about the real payload segment, and make every process obey some rules, such as: reattach the payload segment whenever its id, as recorded in the header segment, no longer matches the one you know.
One piece of advice (I suspect you know this): never keep pointers to data within the shared region, only offsets.
I hope you'll get some use out of my gibberish.
It seems to me that you could write your own memory manager for this purpose. The concept is quite simple:
1. You have a shared memory block whose size is N bytes;
2. Allocate a new block of shared memory of size 2*N;
3. Copy the memory from the old block to the new one;
4. Free the old shared memory block;
5. Wrap steps 2-4 into some routine and use it.
I'm afraid there is not much more you can do about that. This is how std::vector is implemented, and void* realloc() will in most cases return a pointer to a new block of memory (not to the extended old block).
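As a rough sketch of steps 2-4 with System V calls (error handling omitted; the function name and the permissions are illustrative, not a drop-in solution):
#include <sys/ipc.h>
#include <sys/shm.h>
#include <cstring>
#include <cstddef>

// Grow a System V segment by creating a bigger one, copying the payload
// across and removing the old segment. Returns the new segment id.
int grow_segment(int old_id, std::size_t old_size, std::size_t new_size) {
    int new_id = ::shmget(IPC_PRIVATE, new_size, IPC_CREAT | 0600);

    void* old_mem = ::shmat(old_id, nullptr, 0);
    void* new_mem = ::shmat(new_id, nullptr, 0);

    std::memcpy(new_mem, old_mem, old_size);   // step 3: copy the payload

    ::shmdt(old_mem);
    ::shmdt(new_mem);
    ::shmctl(old_id, IPC_RMID, nullptr);       // step 4: free the old block
    return new_id;
}
Every other process then has to learn the new id and reattach, which is exactly what the header-segment idea in the answer above is for.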
It seems to me that the function mremap was implemented to perform what you want.
You just have to pass the old size and the new size of the shared memory segment as arguments. And if you add the flag MREMAP_MAYMOVE, the kernel is allowed to move the shared memory segment if needed (i.e. if there is not enough free address space just after the old segment).
Look at the man page: http://man7.org/linux/man-pages/man2/mremap.2.html

how to manage large arrays

I have a c++ program that uses several very large arrays of doubles, and I want to reduce the memory footprint of this particular part of the program. Currently, I'm allocating 100 of them and they can be 100 Mb each.
Now, I do have the advantage that parts of these arrays eventually become obsolete during later parts of the program's execution, and there is little need to ever have the whole of any one of them in memory at any one time.
My question is this:
Is there any way of telling the OS after I have created the array with new or malloc that a part of it is unnecessary any more ?
I'm coming to the conclusion that the only way to achieve this is to declare an array of pointers, each of which may point to a chunk of, say, 1Mb of the desired array, so that old chunks that are not needed any more can be reused for new bits of the array. This seems to me like writing a custom memory manager, which does seem like a bit of a sledgehammer and is going to cause a bit of a performance hit as well.
I can't move the data in the array because it is going to cause too many thread contention issues. The arrays may be accessed by any one of a large number of threads at any time, though only one thread ever writes to any given array.
It depends on the operating system. POSIX - including Linux - has the madvise system call to improve memory performance. From the man page:
The madvise() system call advises the kernel about how to handle paging input/output in the address range beginning at address addr and with size length bytes. It allows an application to tell the kernel how it expects to use some mapped or shared memory areas, so that the kernel can choose appropriate read-ahead and caching techniques. This call does not influence the semantics of the application (except in the case of MADV_DONTNEED), but may influence its performance. The kernel is free to ignore the advice.
See the man page of madvise for more information.
Edit: Apparently, the above description was not clear enough. So, here are some more details, and some of them are specific to Linux.
You can use mmap to allocate a block of memory (directly from the OS instead of going through the libc) that is not backed by any file. For large chunks of memory, malloc is doing exactly the same thing. You have to use munmap to release the memory - regardless of the usage of madvise:
void* data = ::mmap(nullptr, size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
// ...
::munmap(data, size);
If you want to get rid of some parts of this chunk, you can use madvise to tell the kernel to do so:
madvise(static_cast<unsigned char*>(data) + 7 * page_size,
        3 * page_size, MADV_DONTNEED);
The address range is still valid, but it is no longer backed - neither by physical RAM nor by storage. If you access the pages later, the kernel will allocate new pages on the fly and re-initialize them to zero. Be aware that the MADV_DONTNEED pages still count towards the virtual memory size of the process. It might be necessary to make some configuration changes to the virtual memory management, e.g. activating over-commit.
It would be easier to answer if we had more details.
1°) The answer to the question "Is there any way of telling the OS after I have created the array with new or malloc that a part of it is unnecessary any more?" is "not really". That's the point of C and C++, and of any language that lets you handle memory manually.
2°) If you're using C++ and not C, you should not be using malloc.
3°) Nor arrays, unless for a very specific reason. Use a std::vector.
4°) Preferably, if you often need to change the content of the array and want to reduce the memory footprint, use a linked list (std::list), though accessing individual elements of the list will be more expensive (iterating through it will be almost as fast).
A std::deque of pointers to std::array<double,LARGE_NUMBER> may do the job, but you had better wrap the deque in a dedicated container, so you can remap the indexes and, most importantly, define when entries are not used anymore.
The dedicated container can also contain a read/write lock, so it can be used in a thread-safe way.
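Something along these lines, for example (a sketch only; the chunk size, the locking strategy and the names are assumptions, not a tested implementation):
#include <array>
#include <deque>
#include <memory>
#include <shared_mutex>
#include <cstddef>

constexpr std::size_t CHUNK = 1u << 17;   // 131072 doubles = 1 MiB per chunk

class ChunkedArray {
    std::deque<std::unique_ptr<std::array<double, CHUNK>>> chunks_;
    mutable std::shared_mutex mutex_;
public:
    double get(std::size_t i) const {
        std::shared_lock lock(mutex_);             // many readers in parallel
        return (*chunks_[i / CHUNK])[i % CHUNK];
    }
    void set(std::size_t i, double v) {
        std::unique_lock lock(mutex_);             // single writer
        std::size_t c = i / CHUNK;
        if (c >= chunks_.size()) chunks_.resize(c + 1);
        if (!chunks_[c]) chunks_[c] = std::make_unique<std::array<double, CHUNK>>();
        (*chunks_[c])[i % CHUNK] = v;
    }
    // Release a chunk once the caller knows its entries are obsolete.
    void drop_chunk(std::size_t c) {
        std::unique_lock lock(mutex_);
        if (c < chunks_.size()) chunks_[c].reset();
    }
};
Readers take the shared lock, the single writer takes the exclusive lock, and drop_chunk hands the 1 MiB backing a stale range back to the allocator.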
You could try using lists instead of arrays. Of course a list is 'heavier' than an array, but on the other hand it is easy to rebuild a list so that you can throw away a part of it when it becomes obsolete. You could also use a wrapper that only contains indexes saying which part of the list is up to date and which part may be reused.
This will help you improve performance, but will require a little bit more (reusable) memory.
Allocating by chunk, delete[]-ing and new[]-ing along the way, seems like a good solution. That way you can do as little memory management as possible: do not reuse chunks yourself, simply deallocate old ones and allocate new chunks when needed.

Does any OS allow moving memory from one address to another without physically copying it?

memcpy/memmove duplicate (copy the data) from source to destination. Does anything exist to move pages from one virtual address to another without doing an actual byte-by-byte copy of the source data? It seems perfectly possible to me, but does any operating system actually allow this? It seems odd to me that dynamic arrays are such a widespread and popular concept but that growing them by physically copying is such a wasteful operation. It just doesn't scale when you start talking about array sizes in the gigabytes (e.g. imagine growing a 100GB array into a 200GB array. That's a problem that's entirely possible on servers in the < $10K range now.)
void* very_large_buffer = VirtualAlloc(NULL, 2GB, MEM_COMMIT);
// Populate very_large_buffer, run out of space.
// Allocate buffer twice as large, but don't actually allocate
// physical memory, just reserve the address space.
void* even_bigger_buffer = VirtualAlloc(NULL, 4GB, MEM_RESERVE);
// Remap the physical memory from very_large_buffer to even_bigger_buffer without copying
// (i.e. don't copy 2GB of data, just copy the mapping of virtual pages to physical pages)
// Does any OS provide support for an operation like this?
MoveMemory(even_bigger_buffer, very_large_buffer, 2GB)
// Now very_large_buffer no longer has any physical memory pages associated with it
VirtualFree(very_large_buffer)
To some extent, you can do that with mremap on Linux.
That call plays with the process's page table to do a zero-copy reallocation if it can. It is not possible in all cases (address space fragmentation, and simply the presence of other existing mappings are an issue).
The man page actually says this:
mremap() changes the mapping between virtual addresses and memory pages. This can be used to implement a very efficient realloc(3).
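A minimal Linux-only sketch of that pattern (the sizes are illustrative and error handling is omitted; mremap is a GNU extension declared in <sys/mman.h>, and g++ defines _GNU_SOURCE by default):
#include <sys/mman.h>
#include <cstddef>

int main() {
    const std::size_t old_size = std::size_t(1) << 30;   // 1 GiB
    const std::size_t new_size = std::size_t(2) << 30;   // 2 GiB

    void* p = ::mmap(nullptr, old_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    // ... fill the buffer, run out of space ...

    // Ask the kernel to grow the mapping in place, or to move the existing
    // pages to a new virtual address if there is no room right after them.
    void* q = ::mremap(p, old_size, new_size, MREMAP_MAYMOVE);

    // The physical pages are reused; no byte-by-byte copy takes place.
    ::munmap(q, new_size);
}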
Yes, it's a common use of memory-mapped files to 'move' or copy memory between processes by mapping different views of the file.
Every POSIX system is able to do this. If you use mmap with a file descriptor (obtained by open or shm_open) rather than anonymously, you can unmap it, then resize the file with ftruncate (shrink or grow it) and then map it again. You may, and often will, get a different virtual address for the same pages.
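For instance (a hedged sketch; the object name and the sizes are made up, error handling is omitted, and on older glibc you may need to link with -lrt for shm_open):
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstddef>

int main() {
    const std::size_t old_size = 4096, new_size = 8192;

    int fd = ::shm_open("/example_shm", O_CREAT | O_RDWR, 0600);
    ::ftruncate(fd, old_size);
    void* p = ::mmap(nullptr, old_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    // Grow: drop the old view, resize the object, map it again.
    ::munmap(p, old_size);
    ::ftruncate(fd, new_size);
    void* q = ::mmap(nullptr, new_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    // q may differ from p; the pages already written are preserved.

    ::munmap(q, new_size);
    ::close(fd);
    ::shm_unlink("/example_shm");
}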
I mean, you'd never be able to absolutely guarantee that nothing else is mapped in that next 100GB of address space, so you might not be able to make it contiguous.
On the other hand, you could use a ragged array (an array of arrays) where the arrays do not have to be next to each other (or even the same size). Many of the advantages of dynamic arrays may not scale to the 100GB realm.
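A small sketch of such a ragged array (the piece size and the access arithmetic are illustrative assumptions):
#include <vector>
#include <memory>
#include <cstddef>

int main() {
    const std::size_t piece_elems = std::size_t(1) << 27;   // ~1 GiB of doubles per piece

    // A ragged array: each piece is its own allocation, so pieces never need
    // to be contiguous with one another and can be added one at a time.
    std::vector<std::unique_ptr<double[]>> pieces;
    pieces.push_back(std::make_unique<double[]>(piece_elems));   // grow by one piece
    pieces.push_back(std::make_unique<double[]>(piece_elems));

    // Element i lives in pieces[i / piece_elems][i % piece_elems].
    std::size_t i = piece_elems + 42;
    pieces[i / piece_elems][i % piece_elems] = 3.14;
}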

Is there a way to make sure an array variable (unsigned int*) will be in memory?

I need to set some default value for all entries in a very large array.
It takes quite a long time (110-120 ms) and I suspect it happens because of misses in memory.
I use memset/std::fill to set the default value. Is there a way to make sure that the array will reside in memory before the memset/fill?
Assuming this is a large memory-mapped file, you can use the madvise() libc call with the MADV_WILLNEED argument to hint to the OS that you'll be wanting to access the region mentioned soon.
However YMMV, as the array needs to be large enough that the benefit of the read-ahead isn't outweighed by the cost of making the extra syscall.
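A small sketch of that hint for a file-backed mapping (the file name is illustrative and error handling is omitted):
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstring>
#include <cstddef>

int main() {
    int fd = ::open("data.bin", O_RDWR);            // illustrative file name
    struct stat st{};
    ::fstat(fd, &st);
    std::size_t len = static_cast<std::size_t>(st.st_size);

    void* p = ::mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    // Hint that the whole range will be accessed soon, so the kernel can
    // start paging it in before the fill touches it.
    ::madvise(p, len, MADV_WILLNEED);

    std::memset(p, 0, len);                          // the fill from the question

    ::munmap(p, len);
    ::close(fd);
}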
You can lock memory at per-page granularity using mlock, though only up to a fixed amount (I'm not sure what the limit is on OS X, but you can check it using getrlimit with RLIMIT_MEMLOCK).
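A rough sketch of checking that limit and pinning the array's pages (the names and sizes are illustrative):
#include <sys/mman.h>
#include <sys/resource.h>
#include <cstdio>
#include <vector>

int main() {
    // Check how much memory this process may lock (RLIMIT_MEMLOCK).
    rlimit rl{};
    ::getrlimit(RLIMIT_MEMLOCK, &rl);
    std::printf("lockable: %llu bytes\n", (unsigned long long)rl.rlim_cur);

    std::vector<unsigned int> data(1 << 20);

    // Pin the pages backing the vector so they cannot be paged out
    // before the memset/fill runs.
    if (::mlock(data.data(), data.size() * sizeof(unsigned int)) != 0)
        std::perror("mlock");

    // ... memset/std::fill over data ...

    ::munlock(data.data(), data.size() * sizeof(unsigned int));
}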
Most likely you have a multi-core processor, and functions like memset actually degrade in performance when not used on single-core CPUs. It's possible that mutex locking is causing the slowdown. Try allocating memory on the stack instead of dynamic memory. Since it's a very large array, I would experiment with making my own memory manager and storing segments of it in multiple threads (but that's just an idea I had after skimming an article). A standard way of doing it would be to use one memory allocator per thread. In any case I would look into something else than memset.
Maybe the following article would help

new[] doesn't decrease available memory until populated

This is in C++ on CentOS 64bit using G++ 4.1.2.
We're writing a test application to load up the memory usage on a system by n Gigabytes. The idea being that the overall system load gets monitored through SNMP etc. So this is just a way of exercising the monitoring.
What we've seen however is that simply doing:
char* p = new char[1000000000];
doesn't affect the memory used as shown in either top or free -m
The memory allocation only seems to become "real" once the memory is written to:
memset(p, 'a', 1000000000); //shows an increase in mem usage of 1GB
But we have to write to all of the memory, simply writing to the first element does not show an increase in the used memory:
p[0] = 'a'; //does not show an increase of 1GB.
Is this normal, has the memory actually been allocated fully? I'm not sure if it's the tools we are using (top and free -m) that are displaying incorrect values or whether there is something clever going on in the compiler or in the runtime and/or kernel.
This behavior is seen even in a debug build with optimizations turned off.
It was my understanding that new[] allocated the memory immediately. Does the C++ runtime delay this actual allocation until later, when it is accessed? In that case, can an out-of-memory exception be deferred until well after the actual allocation request, when the memory is finally accessed?
As it is, it's not a problem for us, but it would be nice to know why this is occurring the way it is!
Cheers!
Edit:
I don't want to know about how we should be using Vectors; this isn't about OO / C++ / the current way of doing things, etc. I just want to know why this is happening the way it is, rather than get suggestions for alternative ways of trying it.
When your library allocates memory from the OS, the OS will just reserve an address range in the process's virtual address space. There's no reason for the OS to actually provide this memory until you use it - as you demonstrated.
If you look at e.g. /proc/self/maps you'll see the address range. If you look at top's memory use you won't see it - you're not using it yet.
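You can watch this happen on Linux by printing VmSize (reserved address space) and VmRSS (resident pages) from /proc/self/status around the allocation - a small sketch, not part of your test application:
#include <fstream>
#include <iostream>
#include <string>
#include <cstring>

// Print the VmSize and VmRSS lines from /proc/self/status.
static void print_mem(const char* label) {
    std::ifstream status("/proc/self/status");
    std::string line;
    std::cout << label << '\n';
    while (std::getline(status, line))
        if (line.rfind("VmSize", 0) == 0 || line.rfind("VmRSS", 0) == 0)
            std::cout << "  " << line << '\n';
}

int main() {
    print_mem("before new[]");
    char* p = new char[1000000000];
    print_mem("after new[] (VmSize grows, VmRSS barely moves)");
    std::memset(p, 'a', 1000000000);
    print_mem("after touching every page (VmRSS grows too)");
    delete[] p;
}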
Please look up overcommit. Linux by default doesn't reserve memory until it is accessed. And if you end up needing more memory than is available, you don't get an error; instead a random process is killed. You can control this behavior with /proc/sys/vm/*.
IMO, overcommit should be a per process setting, not a global one. And the default should be no overcommit.
About the second half of your question:
The language standard doesn't allow any delays in throwing a bad_alloc. That must happen as an alternative to new[] returning a pointer. It cannot happen later!
Some OSs might try to overcommit memory allocations, and fail later. That is not conforming to the C++ language standard.
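A tiny illustration of that guarantee (with overcommit enabled the kernel may of course still kill the process much later, outside of C++'s control):
#include <new>
#include <cstdio>

int main() {
    try {
        // If the allocation fails, bad_alloc is thrown here, at the new[]
        // call itself - never later when the buffer is first written to.
        char* p = new char[1000000000];
        p[0] = 'a';
        delete[] p;
    } catch (const std::bad_alloc&) {
        std::puts("allocation failed up front");
    }
}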