Store and manipulate an array of objects in shared memory - C++

My original question is here.
I'd like to write an array of objects into shared memory. Let's assume we know the starting address of the shared memory. How should I store the array of objects into the shared memory and manipulate the array later (e.g., access one particular object in the array, and even the fields of that object)? Do I have to serialize the objects into the memory and implement the relevant access methods myself, or does C++ have memory management mechanisms to deal with the details?

This isn't a particularly well-thought-out answer, but I can't see where you're stuck, since you didn't provide any code to give us a hint.
There is a sample program here - Sample Shared Memory Program - with sufficient commenting to help you understand how to achieve what you're asking, though.
So, I'd say read that through carefully and give it a shot :)
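If it helps to see the moving parts, here is a minimal POSIX sketch (error handling omitted; the segment name "/my_items" and the Item type are just examples). For a trivially copyable element type you can map a named segment and construct the array in place with placement new, then index it like any other array:

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <new>

struct Item { int id; double value; };   // trivially copyable, safe to share

int main() {
    const std::size_t kCount = 16;
    const std::size_t kBytes = kCount * sizeof(Item);
    int fd = shm_open("/my_items", O_CREAT | O_RDWR, 0600);  // named segment
    ftruncate(fd, kBytes);                                   // set its size
    void* base = mmap(nullptr, kBytes, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    Item* items = static_cast<Item*>(base);
    for (std::size_t i = 0; i < kCount; ++i)
        new (items + i) Item{};          // construct each element in place
    items[3].value = 2.5;                // access one object's field directly
    munmap(base, kBytes);
    close(fd);
}

Any process that maps the same segment sees the same elements, so no separate serialization step is needed for plain data. Objects with virtual functions or internal pointers are a different story; the answers further down this page discuss those complications.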

Related

Should I free long-lived memory that would normally be freed at the very end of the program?

I am currently writing a library that parses some structured binary data into a set of objects. These objects are expected to outlive any user code, and would normally be freed at the end of, or after, the main function.
I am using shared (and weak) pointers to manage the memory of each object, but it is causing a lot of added complexity to the program, and raises structural issues that I will not get into in this particular question.
Considering that:
traversing the entirety of the binary data is expensive and I cannot afford to do it more than once,
each visited entry is used to build an object, which then gets registered (i.e. added to the set),
entries in the binary data may rely on other entries that appear later; such entries are parsed immediately but only registered when they are visited in turn,
duplicate entries may appear at any moment, but I need to merge those duplicates into one instance (and update any pointer referencing those duplicates to the new merged entry) before registration,
every single one of those objects is guaranteed to be of one of many POD types deriving from a common class, so nothing except memory needs to be cleaned up,
the resulting program will run on a modern OS (or, in this case, one that collects memory from dead processes),
I am very tempted to just use raw pointers, never free the memory taken by those objects and let the OS do its cleanup after the process exits.
What would be the best course of action?
If you're writing reusable code, you need to at least provide the option of cleaning up. What if some program uses your library for one operation, and then continues running? It's not safe to assume that the process exits immediately after your library's task is complete.
The other answers cover the general and standard approach: in an ideal world, yes, you'd clean up your memory, because it makes the code more generic and more reusable and helps with tooling. As others have said, std::unique_ptr for owning pointers and raw pointers for non-owning pointers should work well.
There are a couple of more specialized approaches that may or may not be useful:
Use a pool allocator (such as Boost.Pool, or roll your own - see the sketch after this list) to allocate a bunch of memory up front, then dole out pieces of it for your objects. You can then free every object at once by deleting the pool.
Intentionally not freeing memory is occasionally a valid technique. See, e.g., "Increasing Compiler Performance by Over 75%", by Walter Bright. Of course, a compiler is a specialized problem domain, and Walter Bright is probably one of the top compiler developers alive, so techniques that work for his problem domain shouldn't be blindly applied elsewhere.
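As a rough illustration of the roll-your-own variant, here is a minimal monotonic arena sketch (assuming power-of-two alignments and allocations no larger than one block): objects are placement-new'ed into memory handed out by the arena, and destroying the arena frees every block at once.

#include <cstddef>
#include <memory>
#include <vector>

class Arena {
public:
    // Hand out 'size' bytes; 'align' must be a power of two.
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        std::size_t offset = (used_ + align - 1) & ~(align - 1);
        if (!current_ || offset + size > kBlockSize) {
            blocks_.push_back(std::make_unique<char[]>(kBlockSize));
            current_ = blocks_.back().get();
            offset = 0;
        }
        used_ = offset + size;
        return current_ + offset;
    }
    // ~Arena() releases every block, and thus every object, at once.
private:
    static constexpr std::size_t kBlockSize = 1 << 20;  // 1 MiB per block
    std::vector<std::unique_ptr<char[]>> blocks_;
    char* current_ = nullptr;
    std::size_t used_ = 0;
};

// Usage: T* t = new (arena.allocate(sizeof(T), alignof(T))) T(args);
// Note this never runs destructors, which is fine for the POD types described.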
the resulting program will run on a modern OS (or, in this case, one that collects memory from dead processes)
I am very tempted to just use raw pointers, never free the memory taken by those objects and let the OS do its cleanup after the process exits.
If you take this approach, then anyone who uses your library and then runs valgrind to try to detect memory leaks in their program will see massive leaks coming from your library and complain to you about it, so if I were you I definitely would not do this.
If you are writing a library then you should provide a cleanup function that frees all memory that you allocated.
A practical example of why this is useful is if a Windows DLL uses your library. When the library is loaded, static data is initialized. When the library is unloaded, static data is cleared. If your library has some global pointers to memory that is never freed, then load-unload cycles of the DLL will leak memory.
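The shape of such an API might look like this (mylib, Node, storage and cleanup are illustrative names, not from the question):

#include <memory>
#include <vector>

namespace mylib {
    struct Node { int value = 0; };
    // Every allocation the library makes is owned here...
    std::vector<std::unique_ptr<Node>>& storage() {
        static std::vector<std::unique_ptr<Node>> s;
        return s;
    }
    // ...so one call releases everything, e.g. from a DLL's unload path.
    void cleanup() { storage().clear(); }
}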
If the objects are all of the same type, then rather than allocating each one independently, you could just put them all into a vector and have them refer to each other by index number instead of using pointers. The vector's built-in memory management takes care of allocating space as needed, and when you're done with the objects, you can just destroy the vector to deallocate them all at once. (Note that vector::clear() doesn't actually free the memory, though it does make it available to store a new set of objects in the vector.)
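A sketch of the index-based linking this describes (Entry and parent are made-up names):

#include <cstdint>
#include <string>
#include <vector>

struct Entry {
    std::string name;
    std::size_t parent = SIZE_MAX;  // index of another Entry; SIZE_MAX = none
};

// Indices stay valid when the vector reallocates, unlike &entries[i];
// destroying the vector deallocates every Entry at once.
std::vector<Entry> entries;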
If your objects aren't all the same type, you'll want to look into the more general concept of region-based memory management. As above, the idea is that you can allocate all your objects in a relatively small number of memory chunks (possibly just one), which can be freed later without having to track all the individual objects allocated within.
If your ownership and lifetimes are clear I suggest you use unique_ptr for the owning pointers and raw pointers for the non-owning pointers. It should be less complex than shared_ptr and weak_ptr whilst still managing memory automatically.
I don't think not managing memory at all is an option. But using smart pointers to express ownership is not just about good memory management; it also makes the code easier to reason about.
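A minimal sketch of that split, assuming a registry that owns every object and hands out non-owning observers:

#include <memory>
#include <utility>
#include <vector>

struct Object { virtual ~Object() = default; };

// The registry is the single owner; everything else sees raw pointers.
std::vector<std::unique_ptr<Object>> registry;

Object* register_object(std::unique_ptr<Object> obj) {
    registry.push_back(std::move(obj));
    return registry.back().get();   // non-owning: do not delete
}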
Try to think of future maintenance work. Suppose your code needs to be broken up or other stuff done after it. In this case you're opening yourself up to leaks or being a resource hog later down the line.
Cleaning up (or being able to do so) is good. It may seem obvious now that an application should work with a single structured binary dataset throughout its entire lifetime, but you'll start kicking yourself once you realize you need to write an application that needs to reset halfway through and start over with another dataset.
(a related thing that's easy to overlook is that an application may need to work with two completely independent datasets at the same time, so try not to design your library to exclude that use case!)
That said, I think you may be focusing too much on the extremes. Code that shouldn't participate in memory management can use raw pointers, and this is reasonable when there is no risk of these pointers outliving your structured dataset in memory.
However, that doesn't mean that code that does participate in memory management needs to use raw pointers too. You can use smart pointers to manage your data structures even if you are passing raw pointers out to the user.
That aside, keep in mind that, in my experience, pointers are usually the wrong semantics. Most use cases are most natural with reference or value semantics, which means you should be passing around raw references, or passing around lightweight wrapper classes that have reference or value semantics but are implemented as containing a pointer to the actual data. Or even a copy of the actual data, if appropriate.
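A sketch of such a wrapper (Data and DataRef are made-up names): it copies like a value but behaves like a reference to the underlying object.

#include <string>

struct Data { std::string name; };

class DataRef {
public:
    explicit DataRef(Data& d) : ptr_(&d) {}
    std::string& name() const { return ptr_->name; }  // reference semantics
private:
    Data* ptr_;   // non-owning; must not outlive the Data it refers to
};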

Memory usage of loaded shared objects

I'm working on a Linux-based program that loads many plug-ins in the form of shared objects. What I want to find out is how much memory each shared object, and all its data structures, take at a certain point in time. Is it possible to do that? I can modify both the main program and the plug-in shared objects if needed.
It is not possible dynamically, since it may happen that a shared object A.so dynamically creates at runtime some object data B which is then used and destroyed by shared object C.so.
So you cannot say that data like B "belongs to" a particular shared object; you may (and should) have conventions about that. See RAII, the rule of three, smart pointers, ....
The point is that the question "how much memory is used by a given library or shared object" makes no sense. Memory and address space are global to the process, so they are shared by the main program and all shared objects, libraries, plugins...!
You could however use proc(5) to get information about the entire process. From inside the program, read sequentially /proc/self/maps to get the map of its address space. From outside the program, read /proc/1234/maps for process of pid 1234.
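For example, a few lines suffice to dump the map from inside the process (Linux-specific):

#include <fstream>
#include <iostream>
#include <string>

int main() {
    std::ifstream maps("/proc/self/maps");   // one mapping per line
    for (std::string line; std::getline(maps, line); )
        std::cout << line << '\n';
}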
You might want to use valgrind. Read more about memory management, garbage collection, and reference counting. You could view your issues as related to resource management or garbage collection. You might want to use Boehm's conservative garbage collector (if using standard C++ containers, you'll want to use Boehm's gc_allocator; see this). The point is that the liveness of some given data is a global property of the program, not of any particular plugin or function. Think about circular references.
What I want to find out is how much memory each shared object, and all its data structures, take at a certain point in time. Is it possible to do that?
If the program is running and you have its pid, you can examine its memory mappings. For example:
% pmap 1234
[...]
00007f8702f6a000 148K r-x-- libtinfo.so.5.9
00007f8702f8f000 2044K ----- libtinfo.so.5.9
00007f870318e000 16K r---- libtinfo.so.5.9
[...]
This doesn't tell you much about the data structures et al though.

C++ Library free pointer

I am creating a simple C++ MIDI library and I ran into a small problem. Currently, I have implemented everything but the reading of files. Internally, I have a std::vector of raw, dynamically allocated pointers (created with new) to events, where the event type is an abstract class and therefore cannot be instantiated directly. Now that I've finally started on the reading part, there is a slight problem. I will have to allocate the objects myself and later free them too. This creates problems, though, since the user of the library may insert events in between or append them. This means my std::vector will contain dynamically allocated pointers that were created elsewhere, which makes freeing a difficult question.
To make this question more general, I was wondering what I should do with pointers provided by the user of the library. What should I do with them? I was thinking one of the points on the list:
Free all pointers and note that pointers given to the class do not have to be freed any more (which seems strange and counter-intuitive, since the new is matched with the delete in a completely different setting)
Maintain a list of pointers provided by the user and simply skip any pointer in that list (probably not really a solution, because the entire list would have to be checked every time)
Making the creation of events only available with the class, so the user cannot create pointers with the new keyword but only by letting the handling class allocate them.
Forcing the use of shared pointers and using them exclusively in my code, so that they will be automatically freed when they go out of scope.
Maintaining a list of my own pointers and freeing only those, letting the user-given pointers go out of scope; the users will have to clean them up themselves.
...? (something I have maybe not thought of?)
So please, tell me what is customary in a situation like this, where the user of the library provides pointers which are added to a list maintained by you, and then the list goes out of scope while you have your own pointers mixed with theirs.
Thanks in advance!
Pick a consistent policy. Don't choose any of those options that lead to you having to destroy some objects in some places and other objects in other places. Especially not those approaches in which the user is responsible for destroying some objects but not others. Ownership should be handled uniformly.
My first suggestion would be to avoid dynamically allocating objects completely. Can you not store a std::vector<Event> and pass Events by value to and from your library? Then the user can happily not care about the ownership of objects, but they can choose to dynamically allocate them if they wish.
If you really need to dynamically allocate objects, I suggest that you always wrap them in smart pointers so that ownership is managed automatically. If, for example, you are allocating some object on behalf of the user, the standard interface for this is something like:
std::unique_ptr<Object> createObject();
If, on the other hand, your library has some internal dynamically allocated object that it needs to share with the user, return a std::shared_ptr.
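Applied to the MIDI case, a sketch might look like this (Event, NoteOn and append are illustrative, not the asker's actual API): the vector owns every event uniformly, regardless of whether the library or the user allocated it.

#include <memory>
#include <utility>
#include <vector>

struct Event { virtual ~Event() = default; };
struct NoteOn : Event { int note = 60; };

std::vector<std::unique_ptr<Event>> track;   // single owner of all events

void append(std::unique_ptr<Event> e) { track.push_back(std::move(e)); }

// The user transfers ownership explicitly:
//   append(std::make_unique<NoteOn>());
// Destroying 'track' frees library-created and user-created events alike.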

Using std::set or std::map with shared memory

I am working on a project which has two different processes.
The first process is a cache based on a std::map or std::set which allocates all its data in a shared memory region.
The second process is a producer/consumer which will have access to the shared memory, so whenever it needs some data it will ask the cache process, through a Unix pipe, for the starting address of the shared memory containing the requested data.
So far, I have come up with two approaches: the first is changing the allocation function for std::set to always allocate in the shared memory; an easier approach might be storing, as the value of the map, a pointer to that shared region:
map<key, pointer to shared region>
Any idea? :D
Thanks!!
In theory, you can use a custom allocator for std::set or std::map to do this. Of course, you'll have to ensure that any contents that might dynamically allocate also use the same custom allocator.
The real problem is that the mapped addresses of the shared memory might not be the same. It's often possible to work around this by using mmap and specifying the address, but the address range must be free in both processes. I've done this under Solaris, which always allocates (or allocated) static and heap at the bottom of the address space, and stack at the top, leaving a big hole in the middle, but even there, I don't think there was any guarantee, and other systems have different policies. Still, if the processes aren't too big otherwise, you may be able to find a solution empirically. (I'd recommend making the address and the size a configuration parameter.)
Alternatively, in theory, the allocator defines a pointer type, which the container should use; you should be able to define a pointer type which works with just the offset into the shared memory. I've no experience with this, however, and I fear that it could be very tricky, since the reference type will still be a true reference (and thus a pointer under the hood), and you cannot change this.
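Boost.Interprocess packages exactly this combination: an allocator tied to a managed segment, whose pointer type is an offset pointer, so the container works even if each process maps the segment at a different address. A minimal sketch (the segment name "MyCache" and its size are arbitrary):

#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/allocators/allocator.hpp>
#include <boost/interprocess/containers/map.hpp>
#include <functional>
#include <utility>

namespace bip = boost::interprocess;

using ShmAllocator = bip::allocator<std::pair<const int, int>,
                                    bip::managed_shared_memory::segment_manager>;
using ShmMap = bip::map<int, int, std::less<int>, ShmAllocator>;

int main() {
    bip::managed_shared_memory segment(bip::open_or_create, "MyCache", 65536);
    ShmAllocator alloc(segment.get_segment_manager());
    // The map and all its nodes live inside the segment.
    ShmMap* cache =
        segment.find_or_construct<ShmMap>("CacheMap")(std::less<int>(), alloc);
    (*cache)[42] = 1;   // node allocation goes through the segment manager
}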

Instantiating objects in shared memory C++

We have a need for multiple programs to call functions in a common library. The library functions access and update a common global memory. Each program’s function calls need to see this common global memory. That is, one function call needs to see the updates of any prior function call, even if called from another program.
For compatibility reasons we have several design constraints on how the functions exposed by the shared library must operate:
Any data items (both standard data types and objects) that are declared globally must be visible to all callers regardless of the thread in which the code is running.
Any data items that are declared locally in a function are only visible inside that function.
Any standard data type or an instance of any class may appear either locally or globally or both.
One solution is to put the library’s common global memory in named shared memory. The first library call would create the named shared memory and initialize it. Subsequent program calls would get the address of the shared memory and use it as a pointer to the global data structure. Object instances declared globally would need to be dynamically allocated in shared memory, while object instances declared locally could be placed on the stack or in the local heap of the caller thread. Problems arise because initialized objects in the global memory can create and point to sub-objects which allocate (new) additional memory. These new allocations also need to be in the shared memory and seen by all library callers. Another complication is that these objects, which contain strings, files, etc., can also be used in the calling program. When declared in the calling program, the object’s memory is local to the calling program, not shared. So the object’s code needs to handle either case.
It appears to us that the solution will require that we override the global placement new, regular new and delete operators. We found a design for a memory management system that looks like it will work but we haven’t found any actual implementations. If anyone knows of an implementation of Nathan Myers’ memory management design (http://www.cantrip.org/wave12.html?seenIEPage=1) I would appreciate a link to it. Alternatively if anyone knows of another shared memory manager that accommodates dynamically allocating objects I would love to know about it as well. I've checked the Boost libraries and all the other sources I can find but nothing seems to do what we need. We prefer not to have to write one ourselves. Since performance and robustness are important it would be nice to use proven code. Thanks in advance for any ideas/help.
Thanks for the suggestions about the ATL and OSSP libraries. I am checking them out now, although I'm afraid ATL is too Win-centric if our target turns out to be Unix.
One other thing now seems clear to us. Since objects can be dynamically created during execution, the memory management scheme must be able to allocate additional pages of shared memory. This is now starting to look like a full-blown heap replacement memory manager.
Take a look at boost.interprocess.
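A minimal sketch of what that looks like (the segment and object names are placeholders): objects are constructed by name inside a managed segment, and any attached process can find them again.

#include <boost/interprocess/managed_shared_memory.hpp>

namespace bip = boost::interprocess;

struct GlobalState { int counter = 0; };

int main() {
    // First caller creates the segment and the object; later callers attach
    // and find the existing instance by name.
    bip::managed_shared_memory seg(bip::open_or_create, "LibGlobals", 65536);
    GlobalState* state = seg.find_or_construct<GlobalState>("state")();
    ++state->counter;
}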
OSSP mm - Shared Memory Allocation:
man 3 mm
As I'm sure you have found, this is a very complex problem, and very difficult to correctly implement. A few tips from my experiences. First of all, you'll definitely want to synchronize access to the shared memory allocations using semaphores. Secondly, any modifications to the shared objects by multiple processes need to be protected by semaphores as well. Finally, you need to think in terms of offsets from the start of the shared memory region, rather than absolute pointer values, when defining your objects and data structures (it's generally possible for the memory to be mapped at a different address in each attached process, although you can choose a fixed mapping address if you need to). Putting it all together in a robust manner is the hard part. It's easy for shared memory based data structures to become corrupted if a process should unexpectedly die, so some cleanup / recovery mechanism is usually required.
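A sketch of the offset convention described above (Node and resolve are illustrative names): structures store offsets from the segment base, and each process converts them with its own mapping address.

#include <cstddef>

struct Node {
    std::size_t next_offset;   // offset from the segment base, not a pointer
    int value;
};

// Each process passes its own base address for the mapping.
inline Node* resolve(void* base, std::size_t offset) {
    return reinterpret_cast<Node*>(static_cast<char*>(base) + offset);
}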
Also study mutexes and semaphores. When two or more entities need to share memory or data, there needs to be a "traffic signal" mechanism to limit write access to only one user.
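For example, Boost.Interprocess provides a mutex that can itself live inside the shared segment, so every process locks the same object (a sketch; SharedBlock is a made-up name):

#include <boost/interprocess/sync/interprocess_mutex.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>

namespace bip = boost::interprocess;

struct SharedBlock {                  // placed in the shared memory segment
    bip::interprocess_mutex mutex;
    int data = 0;
};

void update(SharedBlock& block) {
    bip::scoped_lock<bip::interprocess_mutex> lock(block.mutex);
    ++block.data;                     // only one process writes at a time
}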