I'm trying to put up a simple but portable way to use shared memory. Boost::interprocess seemed like a good place to start, but I ran into some problems/concerns.
Can I somehow query existence of shm segment, preferably using boost API? I could always try to create it using create_only and catch an exception, but that's bad design, I don't want stack unwinding in the "good" path.
Can I truncate the segment even while other processes are attached? (provided I'll handle the synchronization, ofcourse) I suppose all the other processes would have to re-map, would they also have to re-attach?
Boost doc says that on windows the portable shared_memory_object isn't actually shared memory per se, but rather a memory-mapped file. Did I understand that correctly? This means I'll have to use specialized code on windows, which I was trying to avoid. Makes me question Boost's fitness for my purpose, is there an alternative? Instead of fighting with boost, I might as well write platform-specific code myself - in your opinion, would that be worth the effort?
no that is not bad design. It is a standard way to do so with IPCs. You (usually) don't have access to the naming system and have a kind of list of existing objects like a filesystem.
If I remember well: if you truncate while mapped, there will be memory violations on new invalidated addresses (if memory protection is supported). You don't have to remap, you just have to take care about what you are doing. I'm not sure that you really need to truncate a SHM but that's your problem.
that is not a problem, it just means that the underlying object is a file because standard semantic for SHM include persistence. But don't care, that is boost internals tricks. The semantic is the one you want, so use it to get portability!
Related
I've developed a library in C++ that allows multi-threaded usage. I want to support an option for the caller to specify a cap on the memory allocated by a given thread. (We can ignore the case of one thread allocating memory and others using it.)
Possibly making this more complicated is that my library uses various open source components (boost, ICU, etc), some of which are statically linked and others dynamically.
One option I've been looking into is overriding the allocation functions (new/delete/etc) to do the bookkeeping per thread ID. Natural concerns come up around the bookkeeping: performance, etc.
But an even bigger question/concern is whether this approach will work with the open source components without code changes to them?
I can't seem to find pre-existing solutions for this, though it seems to me like it's not very unusual.
Any suggestions on this approach, or another approach?
EDIT: More background: The library can allocate a significantly large range of memory per calling thread depending on the input provided (ie. KBs to GBs).
So the goal of this request is to (more graciously & deterministically) support running in RAM-constrained environments. This is not for a hard-real-time environment with strict memory limits--it's to support a number of concurrent threads which each have a "safe" allocation cap to avoid engaging the page/swap file.
Basic example use case: a system with 32GB RAM, 20GB free, the application using my library may configure itself to use a max of 10 threads and configure the library to use a max of 1GB per thread.
Upon hitting the cap the current thread's call into the library will cease further work and return a suitable error. (The code is already fully RAII so unwinding cleanly is easy.)
BTW I found some interesting content on the web already, sadly none provide a lot of hope for a "simple & effective" solution. But this one is especially insightful.
Why doesn't boost interprocess support write only memory mapping?
Maybe I'm missing something but wouldn't a write only mapping be significantly faster than a read/write mapping as the OS doesn't have to read in the pages from the disk, just flush out pages from memory to the disk? Also it would have the benefit of being entirely non blocking (except for flushing and destruction).
Would I benefit by switching from boost to native OS memory mapping?
In fact if you allocate a new memory-mapped file of size, say, 20Gb, you'll get a sparse file allocation of that size.
When "mapping in" pages of that files, there need to be a read operation (as the OS might be able to tell that the page is not physically present yet on disk), and only when (if) those pages are dirtied need they be written out.
Of course, this is implementation dependent and I don't think POSIX (can) guarantee this, but it's not unreasonable behaviour IYAM, and would be the equivalent of write-only mapping.
Actually, a write-only memmap would not be faster, as the OS can only keep track of changes / provide those mappings in whole-page granularity.
At least, if you want to avoid the prohibitive cost of simulating all access to such pages in kernel-land (not implemented) instead of just mapping a page.
Somehow, I doubt going directly to the OS API instead of going through the Boost-API could provide any significant speed-ups:
The boost API is a thin wrapper over the OS-specific interface and will be completely inlined and thus compiled out by any decent compiler.
How can I reserve and allocate shared memory without the backing of a file? I'm trying to reserve a large (many tens of GiBs) chunk of shared memory and use it in multiple processes as a form of IPC. However, most of this chunk won't be touched at all (the access will be really sparse; maybe a few hundred megabytes throughout the life of the processes) and I don't care about the data when the applications end.
So preferably, the method to do this should have the following properties:
Doesn't commit the whole range. I will choose which parts to commit (actually use.) (But the pattern is quite unpredictable.)
Doesn't need a memory-mapped file or anything like that. I don't need to preserve the data.
Lets me access the memory area from multiple processes (I'll handle the locking explicitly.)
Works in both Linux and Windows (obviously a 64-bit OS is needed.)
Actually uses shared memory. I need the performance.
(NEW) The OS or the library doesn't try to initialize the reserved region (to zero or whatever.) This is obviously impractical and unnecessary.
I've been experimenting with boost::interprocess::shared_memory_object, but that causes a large file to be created on the filesystem (with the same size as my mapped memory region.) It does remove the file afterwards, but that hardly helps.
Any help/advice/pointers/reference is appreciated.
P.S. I do know how to do this on Windows using the native API. And POSIX seems to have the same functionality (only with a cleaner interface!) I'm looking for a cross-platform way here.
UPDATE: I did a bit of digging, and it turns out that the support that I thought existed in Windows was only a special case of memory-mapped files, using the system page file as a backing. (I'd never noticed it before because I had used at most a few megabytes of shared memory in the past projects.)
Also, I have a new requirement now (the number 6 above.)
On Windows, all memory has to be backed by the disk one way or another.
The closest I think you can accomplish on windows would be to memory map a sparse file. I think this will work in your case for two reasons:
On Windows, only the shared memory that you actually touch will become resident. This meets the first requirement.
Because the file is sparse, only the parts that have been modified will actually be stored on disk. If you looked at the properties of the file, it would say something like, "Size: 500 MB, Size on disk: 32 KB".
I realize that this technically doesn't meet your 2nd requirement, but, unfortunately, I don't think that is possible on Windows. At least with this approach, only the regions you actually use will take up space. (And only when Windows decides to commit the memory to disk, which it may or may not do at its discretion.)
As for turning this into a cross-platform solution, one option would be to modify boost::interprocess so that it creates sparse files on Windows. I believe boost::interprocess already meets your requirements on Linux, where POSIX shared memory is available.
I am looking for a way to cure at least the symptoms of a leaky DLL i have to use. While the library (OpenCascade) claims to provides a memory manager, i have as of yet being unable to make it release any memory it allocated.
I would at least wish to put the calls to this module in a 'sandbox', in order to keep my application from not losing memory while the OCC-Module isn't even running any more.
My question is: While I realise that it would be an UGLY HACK (TM) to do so, is it possible to preallocate a stretch of memory to be used specifically by the libraries, or to build some kind of sandbox around it so i can track what areas of memory they used in order to release them myself when i am finished?
Or would that be to ugly a hack and I should try to resolve the issues otherwise?
The only reliable way is to separate use of the library into a dedicated process. You will start that process, pass data and parameters to it, run the library code, retrieve results. Once you decide the memory consumption is no longer tolerable you restart the process.
Using a library that isn't broken would probably be much easier, but if a replacement ins't available you could try intercepting the allocation calls. If the library isn't too badly 'optimized' (specifically function inlining) you could disassemble it and locate the malloc and free functions; on loading, you could replace every 4 (or 8 on p64 system) byte sequence that encodes that address with one that points to your own memory allocator. This is almost guaranteed to be a buggy, unreadable timesink, though, so don't do this if you can find a working replacement.
Edit:
Saw #sharptooth's answer, which has a much better chance of working. I'd still advise trying to find a replacement though.
You should ask Roman Lygin's opinion - he used to work at occ. He has at least one post that mentions memory management http://opencascade.blogspot.com/2009/06/developing-parallel-applications-with_23.html.
If you ask nicely, he might even write a post that explains mmgt's internals.
I've long had a desire for an STLish container that I could place into a shared memory segment or a memory mapped file.
I've considered the use of a custom allocator and placement new to place a regular STL container into a shared memory segment. (like this ddj article). The problem is that STL containers will internally have pointers to the memory they own. Therefore, if the shared memory segment or memory mapped file loads at a different base address (perhaps on a subsequent run, or in a second process), then the internal pointers are suddenly invalid. As far as I can figure out, the custom allocator approach only works if you can always map the memory segment into your process at the same address. At least with memory mapped files, I have lots of experience of that NOT being the case if you just let the system map it where ever it feels like.
I've had some thoughts on how to do this, but I'd like to avoid it if someone else has already done the work (that's me, being lazy).
I'm currently leaving locking out of the discussion, as the best locking strategy is highly application dependent.
The best starting point for this is probably the boost Interprocess libraries. They have a good example of a map in shared memory here:
interprocess map
You will probably also want to read the section on offset smart pointers, which solves the internal pointer problem you were referring to.
Offset Pointer
You may also want to checkout the Intel Threading Building Blocks (TBB) Containers.
I always had good experiences (years ago) with ACE. Its a networking/communication framework, but has a section on shared memory.
I only know of proprietary versions. Bloomberg and EA have both published about their STL versions, but havent released ( to my knowledge ) the fruits of their labor.
Try using Qt's QSharedMemory Implementation.