fork(), shared memory and pointers in C/C++

I cannot share a pointer to an object among processes with shared memory.
I can successfully share a struct like the one below among different processes:
// Data to be shared among processes
struct Shared_data {
int setPointCounter;
struct Point old_point;
pthread_mutex_t my_mutex;
} *shd;
The struct is declared at global scope (it appears before main(), say).
I initialize the shared memory into the main:
shd = (struct Shared_data *) (mmap(NULL, sizeof *shd, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0));
and then at a certain point I do a fork(). It works fine.
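One caveat worth showing as code: a pthread_mutex_t placed in the MAP_SHARED struct only works across fork()ed processes if it is initialized with the PTHREAD_PROCESS_SHARED attribute. A minimal sketch, following the question's names (error handling trimmed, Point field omitted):

```cpp
#include <pthread.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

struct Shared_data {
    int setPointCounter;
    pthread_mutex_t my_mutex;
};

// Returns the counter value the parent sees after the child increments
// it through the shared mapping (expected: 1).
int shared_counter_demo(void) {
    struct Shared_data *shd = (struct Shared_data *)mmap(
        NULL, sizeof *shd, PROT_READ | PROT_WRITE,
        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shd == MAP_FAILED) return -1;

    // A mutex used across processes must be marked PROCESS_SHARED.
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&shd->my_mutex, &attr);
    shd->setPointCounter = 0;

    pid_t pid = fork();
    if (pid == 0) {                    // child: bump the shared counter
        pthread_mutex_lock(&shd->my_mutex);
        shd->setPointCounter++;
        pthread_mutex_unlock(&shd->my_mutex);
        _exit(0);
    }
    waitpid(pid, NULL, 0);             // parent: wait, then read the result
    int result = shd->setPointCounter;
    munmap(shd, sizeof *shd);
    return result;
}
```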
The problem arises when I have to share something like this:
// Data to be shared among processes
struct Shared_data {
int setPointCounter;
struct Point old_point;
MyObject *obj;
pthread_mutex_t my_mutex;
} *shd;
In main() I call a function from a third-party library which returns a pointer to an object of MyObject type, and I would like to share that object in some way. How can I do this?
By searching on the web I've found something related to relative pointers but I'm not sure it will work and, at the same time, I don't know how to do it the right way.
I'm working on Linux (Slackware 64 14.2) and the language is C/C++ (mostly C, actually).
Thank you in advance for your time.

Basically, a pointer is an address of some data in the virtual address space of one process. You cannot share pointers between processes, because each process interprets an address in its own address space (in general this also applies to the kernel-mode virtual space versus the user-mode space of the same process). Let's use an example: you make some allocations (among them an mmap(2) allocation) and then fork(2) another process. Up to that point, all memory addresses refer to the same places; if both processes had existed before the fork, they would have been doing the same work and getting the same results.
But once the two processes depart from each other, the allocations made in one process can be positioned differently than the same allocations in the other, as the different history of each process can produce different results from any allocator. Just assume some temporal ordering of the tasks and some time-dependent state change; after it, the states of the two processes will differ.
Now suppose you have a pointer in one process, pointing to a data structure that you want to share with the other. The other process has not even allocated the same thing, so passing, say, 0x00456211ff to the other process will point to a place where nothing has been allocated (leading to a SIGSEGV signal and process abort).
Once the processes diverge, their virtual spaces store different things, and a virtual pointer in one space cannot be interpreted in the other process's space.
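The flip side of the argument above can also be shown in code: a MAP_SHARED mapping created *before* fork() is inherited at the same virtual address, so even a raw pointer stored inside the region stays valid in both processes. This is a sketch of the mechanism, not a recommendation for third-party objects (which may hold private allocations of their own):

```cpp
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Parent stores a pointer *inside* the shared region; the child reads
// through it. Returns 0 if the child saw the expected string.
int inherited_mapping_demo(void) {
    char **region = (char **)mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED) return -1;

    char *payload = (char *)(region + 1); // space just past the pointer slot
    strcpy(payload, "hello");
    region[0] = payload;                  // a pointer kept in shared memory

    pid_t pid = fork();
    if (pid == 0) {
        // The stored pointer is meaningful here only because the mapping
        // predates fork() and therefore sits at the same address.
        _exit(strcmp(region[0], "hello") == 0 ? 0 : 1);
    }
    int status = 0;
    waitpid(pid, &status, 0);
    munmap(region, 4096);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```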

Related

Sharing heap memory in a dll between two separate applications

Sorry if this question has been answered before; however all of the questions that are similar seem to be related to global or static variables in a DLL and sharing of those.
Is it possible to have one instance of a dll shared between two separate applications?
I have two applications (appA, appB) and a DLL (theDLL).
I am seeing if it is possible for appA to call a function in theDLL which then creates a variable on the heap and stores that in theDLL. At a later time, I would like to have appB connect to theDLL and be able to access that variable created earlier. Again, sorry if this answer is the same as static and global variables in dlls.
Here is some pseudo code:
(theDLL)
class VariableHolder
{
public:
    void StoreVariable(int x)
    {
        mInt = new int(x);
    }
    int GetVariable()
    {
        return *mInt;
    }
private:
    int* mInt;
};
(appA)
int main()
{
...
(assuming we got access to a VariableHolder singleton created in theDLL)
theVariableHolder.StoreVariable(5);
...
}
(appB)
int main()
{
...
(assuming we got access to a VariableHolder singleton created in theDLL)
if (theVariableHolder.GetVariable() == 5)
{
cout << "Hurray, how did you know?";
}
...
}
This exactly is not possible, as the address spaces of the two processes are different (they are virtual, created per process by the kernel), so a valid pointer in one won't work within the other. However, you can use shared memory to transport raw scalar data (strings, integers) between processes, as described below.
Yes, this is possible using shared memory. It doesn't need to use a shared DLL though.
Depending on the operating system, the approaches are somewhat different:
On Windows, a shared file mapping is used (see Creating Named Shared Memory).
On Linux and Unix, there are direct functions to create shared memory areas, e.g. System V IPC. Just google for it.
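For reference, a minimal System V sketch along those lines; IPC_PRIVATE is used only to keep the example self-contained (a real program sharing between unrelated processes would use a key derived from ftok()):

```cpp
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

// Create a SysV segment, attach it before fork(), let the child write
// into it, and check the result in the parent. Returns 0 on success.
int sysv_shm_demo(void) {
    int shmid = shmget(IPC_PRIVATE, 1024, IPC_CREAT | 0600);
    if (shmid < 0) return -1;
    char *mem = (char *)shmat(shmid, NULL, 0); // kernel picks the address
    if (mem == (char *)-1) return -1;

    pid_t pid = fork();
    if (pid == 0) {                    // child: write into the segment
        strcpy(mem, "shared");
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    int ok = strcmp(mem, "shared") == 0;
    shmdt(mem);
    shmctl(shmid, IPC_RMID, NULL);     // mark the segment for removal
    return ok ? 0 : 1;
}
```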
Shared libraries on almost any modern operating system are implemented by shared read-only executable and data pages, mapped simultaneously into the address space of every process that uses the given library. On Windows though (in contrast to most Unix systems) this sharing can also be extended to read-write data segments in DLLs, so it is possible to have global variables in a DLL that are shared among all images that have the DLL loaded. To achieve this, a two-step process is involved. First you tell the compiler to put the shared variables in a new named data section:
#pragma data_seg (".myshared")
int iscalar = 0;
int iarray[10] = { 0 };
#pragma data_seg ()
It is important that all those variables are statically initialised, otherwise they will end up in the .bss section instead. Then you have to tell the linker that you'd like the .myshared section to have shared read-write attributes, using the /SECTION:.myshared,RWS option.
This mechanism is much simpler than creating and binding to named shared memory objects, but it only allows sharing statically allocated global variables - you cannot use it to share data on the heap, as the heap is private to the process. For anything more complex you should use shared memory mappings, e.g. as shown on the MSDN page linked in the answer from H2CO3.
This is not possible. The DLL can be shared between the two processes, but its data isn't: it is the code or program image (i.e. the logic or instructions) that is shared, not the data. Every DLL is mapped into the virtual address space of the process that loads it, so its data lives either in that process's data section or on the stack if it is local to a function. While one process is executing, the other process's data is not visible to it.
You need to do some reading on virtual memory and how the memory management unit (MMU) works; the OS, CPU and MMU work together to make this possible. The reliable way to do this is inter-process communication. You can use shared memory, where each process has its own virtual address for the data, but those addresses are eventually mapped to the same location in real memory, i.e. the same real address. The OS makes this possible.
This, as #H2CO3 pointed out, is not possible because of the different address spaces.
However, from your problem, it looks like you need either a surrogate process around that DLL or a Service and then different processes can connect to that surrogate process/exe and use the shared memory.
You must use shared memory (as was written above).
I recommend the Boost.Interprocess library. See its documentation about shared memory: Shared memory between processes.

Sharing heap memory with fork()

I am working on implementing a database server in C that will handle requests from multiple clients. I am using fork() to handle connections for individual clients.
The server stores data in the heap, which consists of a root pointer to hash tables of dynamically allocated records. The records are structs that have pointers to various data-types. I would like for the processes to be able to share this data so that, when a client makes a change to the heap, the changes will be visible for the other clients.
I have learned that fork() uses COW (Copy On Write), and my understanding is that it copies the heap (and stack) memory of the parent process when the child tries to modify the data in memory.
I have found out that I can use the shm library to share memory.
Would the code below be a valid way to share heap memory (in shared_string)? If a child were to use similar code (i.e. starting from //start), would other children be able to read/write to it while the child is running and after it's dead?
key_t key;
int shmid;
key = ftok("/tmp", 'R');
shmid = shmget(key, 1024, 0644 | IPC_CREAT);
//start
char * string;
string = malloc(sizeof(char) * 10);
strcpy(string, "a string");
char * shared_string;
shared_string = shmat(shmid, NULL, 0); /* NULL: let the kernel pick the attach address */
strcpy(shared_string, string);
Here are some of my thoughts/concerns regarding this:
I'm thinking about sharing the root pointer of the database. I'm not sure if that would work or if I have to mark all allocated memory as shared.
I'm not sure if the parent / other children are able to access memory allocated by a child.
I'm not sure if a child's allocated memory stays on the heap after it is killed, or if that memory is released.
First of all, fork is completely inappropriate for what you're trying to achieve. Even if you can make it work, it's a horrible hack. In general, fork only works for very simplistic programs anyway, and I would go so far as to say that fork should never be used except when followed quickly by exec, but that's beside the point here. You really should be using threads.
With that said, the only way to have memory that's shared between the parent and child after fork, and where the same pointers are valid in both, is to mmap (or shmat, but that's a lot fuglier) a file or anonymous map with MAP_SHARED prior to the fork. You cannot create new shared memory like this after fork because there's no guarantee that it will get mapped at the same address range in both.
Just don't use fork. It's not the right tool for the job.
I think you are basically looking to do what is done by Redis (and probably others).
They describe it in http://redis.io/topics/persistence (search for "copy-on-write").
threads defeat the purpose
classic shared memory (shm, mapped memory) also defeats the purpose
The primary benefit to using this method is avoidance of locking, which can be a pain to get right.
As far as I understand it the idea of using COW is to:
fork when you want to write, not in advance
the child (re)writes the data to disk, then immediately exits
the parent keeps on doing its work, and detects (SIGCHLD) when the child exited.
If, while doing its work, the parent ends up making changes to the hash, the kernel will copy the affected pages (copy-on-write).
A "dirty flag" is used to track if a new fork is needed to execute a new write.
Things to watch out for:
Make sure only one outstanding child
Transactional safety: write to a temp file first, then move it over so that you always have a complete copy, maybe keeping the previous around if the move is not atomic.
test if you will have issues with other resources that get duplicated (file descriptors, global destructors in c++)
You may want to take a gander at the redis code as well.
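The copy-on-write behaviour the steps above rely on can be seen in a few lines; `dataset` here is a hypothetical stand-in for the in-memory hash, and the child reports its snapshot through its exit status instead of writing a file:

```cpp
#include <sys/wait.h>
#include <unistd.h>

static int dataset = 1;    // stand-in for the in-memory hash

// Fork a snapshot child; the parent then mutates its own copy.
// Returns the value the child observed (expected: the value at fork time).
int cow_snapshot_demo(void) {
    pid_t pid = fork();
    if (pid == 0) {
        // Child: this is the frozen snapshot; "persist" it via exit code.
        _exit(dataset);
    }
    dataset = 99;          // parent writes; COW gives it a private page
    int status = 0;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```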
I'm thinking about sharing the root pointer of the database. I'm not sure if that would work or if I have to mark all allocated memory as shared.
Each process will have its own private memory range. Copy-on-write is a kernel-space optimization that is transparent to user space.
As others have said, SHM or mmap'd files are the only way to share memory between separate processes.
If you must use fork, shared memory seems to be the 'only' choice.
Actually, I think that in your scenario threads are more suitable.
If you don't want to be multi-threaded, here is another choice: use a one-process, one-thread model, like Redis.
With this model you don't need to worry about things like locking, and if you want to scale you just design a routing policy, e.g. routing on the hash value of the key.
As you have discovered, if you want to share memory between separate processes (from fork or otherwise), you need to use shared memory, either the SYSV shm library or mmap with MAP_SHARED. Unfortunately, these are coarse-grained tools, suitable only for dealing with a small number of large blocks, and not suitable for fine-grained memory management as you would do with malloc/free.
In order to have useful shared memory between processes, you need to build a heap on top of shm or mmap. You can do that with my small shm_malloc library, which allows you to use calls to shm_malloc and shm_free exactly as you would use malloc/free.
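shm_malloc is the answerer's own library; as a generic illustration of "building a heap on top of mmap", here is a toy bump allocator carved out of an anonymous MAP_SHARED region. It never frees, and a real allocator would also need free lists and cross-process locking:

```cpp
#include <stddef.h>
#include <sys/mman.h>

struct ShmArena {
    size_t used;   // bytes handed out so far (including this header)
    size_t size;   // total size of the region
};

static struct ShmArena *arena_create(size_t size) {
    struct ShmArena *a = (struct ShmArena *)mmap(
        NULL, size, PROT_READ | PROT_WRITE,
        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (a == MAP_FAILED) return NULL;
    a->used = sizeof *a;
    a->size = size;
    return a;
}

static void *arena_alloc(struct ShmArena *a, size_t n) {
    n = (n + 15) & ~(size_t)15;          // keep 16-byte alignment
    if (a->used + n > a->size) return NULL;
    void *p = (char *)a + a->used;
    a->used += n;
    return p;
}

// Smoke test: two allocations come back distinct and usable.
int arena_demo(void) {
    struct ShmArena *a = arena_create(4096);
    if (!a) return -1;
    int *x = (int *)arena_alloc(a, sizeof *x);
    int *y = (int *)arena_alloc(a, sizeof *y);
    if (!x || !y || x == y) return 1;
    *x = 7; *y = 8;
    return (*x == 7 && *y == 8) ? 0 : 1;
}
```

Because the region is MAP_SHARED, an arena created before fork() is usable by both parent and child at the same addresses.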

Multi-processing with singletons C++ on Linux x86_64

For the following question, I am looking for an answer that is based on "pure" C/C++ fundamentals, so I would appreciate a non-Boost answer. Thanks.
I have an application (for example, a telecommunications infrastructure server) which will, when started, spawn several processes on a Linux environment (one for logging, one for Timer management, one for protocol messaging, one for message processing etc.). It is on an x86_64 environment on Gentoo. The thing is, I need a singleton to be able to be accessible from all the processes.
This is different from multi-threading using say, POSIX threads, on Linux because the same address space is used by all POSIX threads, but that is not the case when multiple processes, generated by fork () function call, is used. When the same address space is used, the singleton is just the same address in all the threads, and the problem is trivially solved (using the well known protections, which are old hat for everybody on SO). I do enjoy the protections offered to me by multiple processes generated via fork().
Going back to my problem, I feel like the correct way to approach this would be to create the singleton in shared memory, and then pass a handle to the shared memory into the calling tasks.
I imagine the following (SomeSingleton.h):
#include <unistd.h>
#... <usual includes>
#include "SomeGiantObject.h"

int size = 8192; // Enough to contain the SomeSingleton object
int shm_fd = shm_open("/some_singleton_shm", O_CREAT | O_EXCL | O_RDWR, 0666);
int trunc_rc = ftruncate(shm_fd, size);
void *sharedMemoryLocationForSomeSingleton =
    mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, shm_fd, 0);

class SomeSingleton
{
public:
    static SomeSingleton* getInstance()
    {
        return reinterpret_cast<SomeSingleton*>(sharedMemoryLocationForSomeSingleton);
    }
private:
    SomeSingleton();
    /*
    A whole bunch of attributes that are shared across processes.
    These attributes should also live in shared memory.
    E.g., in the following,
    SomeGiantObject* obj;
    obj should also be in shared memory.
    */
};
The getInstance() method returns the shared memory location for the SomeSingleton object.
My questions are as follows:
Is this a legitimate way to handle the problem? How have folks on SO handled this problem before?
For the code above to work, I envision a global declaration (static by definition) that points to the shared memory as shown before the class declaration.
Last, but not least, I know that on Linux the overheads of creating threads vs. processes are "relatively similar," but I was wondering why there is not much by way of multi-processing discussion on SO (gob loads of multi-threading, though!). There isn't even a tag here! Has multi-processing (using fork()) fallen out of favor in the C++ coding community? Any insight on that is also appreciated. Also, may I request someone with a reputation > 1500 to create a "multi-processing" tag? Thanks.
If you create the shared memory region before forking, then it will be mapped at the same address in all peers.
You can use a custom allocator to place contained objects inside the shared region also. This should probably be done before forking as well, but be careful of repetition of destructor calls (destructors that e.g. flush buffers are fine, but anything that makes an object unusable should be skipped, just leak and let the OS reclaim the memory after all processes close the shared memory handle).

Can I pass an object to another process just passing its' pointer to a shared memory?

I have a very complicated class (it has an unordered_map and so on inside it) and I want to share an object of it between my two processes. Can I simply pass a pointer to it from one process to another? I think no, but I hope to hear "Yes!".
If "no", I'd be grateful to see any links how to cope in such cases.
I need to have only one instance of this object for all processes, because it's very large and all of the processes will work with it read-only.
You certainly can use IPC to accomplish this, and there are plenty of cases where multiple processes make more sense than a multithreaded process (at least one of the processes is built on legacy code to which you can't make extensive modifications, they would best be written in different languages, you need to minimize the chance of faults in one process affecting the stability of the others, etc.) In a POSIX-compatible environment, you would do
int descriptor = shm_open("/unique_name_here", O_RDWR | O_CREAT, 0777);
if (descriptor < 0) {
/* handle error */
} else {
ftruncate(descriptor, sizeof(Object));
void *ptr = mmap(NULL, sizeof(Object), PROT_READ | PROT_WRITE | PROT_EXEC, MAP_SHARED, descriptor, 0);
if (!ptr || ptr == MAP_FAILED)
/* handle error */ ;
Object *obj = new (ptr) Object(arguments);
}
in one process, and then
int descriptor = shm_open("/the_same_name_here", O_RDWR | O_CREAT, 0777);
if (descriptor < 0) {
/* handle error */
} else {
Object *obj = (Object *) mmap(NULL, sizeof(Object), PROT_READ | PROT_WRITE | PROT_EXEC, MAP_SHARED, descriptor, 0);
if (!obj || obj == MAP_FAILED)
/* handle error */ ;
}
in the other. There are many more options, and I didn't show the cleanup code when you're done, so you still ought to read the shm_open() and mmap() manpages, but this should get you started. A few things to remember:
/All/ of the memory the object uses needs to be shared. For example, if the Object contains pointers or references to other objects, or dynamically allocated members (including things like containers, std::string, etc.), you'll have to use placement new to create everything (or at least everything that needs to be shared with the other processes) inside the shared memory blob. You don't need a new shm_open() for each object, but you do have to track (in the creating process) their sizes and offsets, which can be error-prone in non-trivial cases and absolutely hair-pulling if you have fancy auto-allocating types such as STL containers.
If any process will be modifying the object after it's been shared, you'll need to provide a separate synchronization mechanism. This is no worse than what you'd do in a multithreaded program, but you do have to think about it.
If the 'client' processes do not need to modify the shared object, you should open their handles with O_RDONLY instead of O_RDWR and invoke mmap() without the PROT_WRITE permission flag. If the client processes might make local modifications that need not be shared with the other processes, invoke mmap() with MAP_PRIVATE instead of MAP_SHARED. This will greatly reduce the amount of synchronization required and the risk of getting it wrong.
If these processes will be running on a multiuser system and/or the shared data may be sensitive and/or this is a high-availability application, you're going to want more sophisticated access control than what is shown above. Shared memory is a common source of security holes.
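The MAP_PRIVATE point above can be demonstrated in miniature: writes the child makes to a private mapping never reach the parent (a minimal sketch using an anonymous mapping, without the shm_open() machinery):

```cpp
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Returns what the parent sees after the child writes to a MAP_PRIVATE
// mapping (expected: the original value, 1).
int map_private_demo(void) {
    int *p = (int *)mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    *p = 1;
    pid_t pid = fork();
    if (pid == 0) {
        *p = 2;            // child writes to its own copy-on-write page
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    int seen = *p;         // parent still sees 1: the write stayed private
    munmap(p, sizeof(int));
    return seen;
}
```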
No, processes do not (naturally) share memory. If Boost is an option, you can have a look at Boost.Interprocess for easy memory sharing.
No, the pointer is meaningless to the other process. The OS creates a separate address space for other processes; by default, they have no idea that other processes are running, or even that such a thing is possible.
The trick here is that the memory has to be mapped the same way in both your processes. If your mapped shared memory can be arranged that way, it'll work, but I bet it'll be very difficult.
There are a couple of other possibilities. The first is to use an array; array indices will work across both processes.
You can also use placement new to make sure you're allocating objects at a known location within the shared memory, and use those offsets.
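The offset idea can be sketched as follows; the char array stands in for a shared mapping, and Node is a hypothetical payload type:

```cpp
#include <cstddef>
#include <new>

struct Node { int value; };

// Construct a Node at a given offset inside the block with placement new
// and return the offset; any process mapping the same block can recover
// the object from base + offset, regardless of where the block is mapped.
std::size_t place_node(char *block, std::size_t offset, int value) {
    new (block + offset) Node{value};   // placement new: no allocation
    return offset;
}

int offset_demo(void) {
    alignas(Node) char block[128] = {}; // stands in for a shared mapping
    std::size_t off = place_node(block, 16, 42);
    // A "second process" would compute the same address from its own base:
    Node *n = reinterpret_cast<Node *>(block + off);
    return n->value == 42 ? 0 : 1;
}
```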
If you are on Linux, you can use shared memory to store common data between processes. For the general case, take a look at the Boost IPC library.
But a pointer from one process cannot be used in another (the raw address is only meaningful across processes for I/O or some special devices).
If you use Qt4 there's QSharedMemory or you could use sockets and a custom serialization protocol.

Is it possible to use function pointers across processes?

I'm aware that each process creates its own memory address space, however I was wondering,
If Process A was to have a function like :
int DoStuff() { return 1; }
and a pointer typedef like :
typedef int (*DoStuff_f)();
and a getter function like :
DoStuff_f getDoStuff() { return DoStuff; }
and a magical way to communicate with Process B via... say boost::interprocess
would it be possible to pass the function pointer to process B and call
Process A's DoStuff from Process B directly?
No. All a function pointer is is an address in your process's address space. It has no intrinsic marker that is unique to different processes. So, even if your function pointer just happened to still be valid once you've moved it over to B, it would call that function on behalf of process B.
For example, if you had
////PROCESS A////
int processA_myfun() { return 3; }
// get a pointer to processA_myfun and pass it to process B
////PROCESS B////
int processB_myfun() { return 4; } // This happens to be at the same virtual address as processA_myfun
// get address from process A
int x = call_myfun(); // call via the pointer
x == 4; // x is 4, because we called process B's version!
If process A and B are running the same code, you might end up with identical functions at identical addresses - but you'll still be working with B's data structures and global memory! So the short answer is, no, this is not how you want to do this!
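A small experiment along these lines: after fork() the very same function pointer is callable in the child (same program image), but it operates on the child's own copy of the data, which the parent never sees:

```cpp
#include <sys/wait.h>
#include <unistd.h>

static int g = 0;
static int bump(void) { return ++g; }

// Returns 0 if the child could call the inherited function pointer and
// saw g == 1, while the parent's g stayed 0.
int fptr_after_fork_demo(void) {
    int (*fp)(void) = bump;
    pid_t pid = fork();
    if (pid == 0)
        _exit(fp());       // child: calls bump via the pointer
    int status = 0;
    waitpid(pid, &status, 0);
    int child_saw = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    // The call mutated only the child's copy of the global.
    return (child_saw == 1 && g == 0) ? 0 : 1;
}
```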
Also, security measures such as address space layout randomization could prevent these sort of "tricks" from ever working.
You're confusing IPC and RPC. IPC is for communicating data, such as your objects or a blob of text. RPC is for causing code to be executed in a remote process.
In short, you cannot use a function pointer that was passed from another process.
A function's code is located in protected pages of memory; you cannot write to them. And each process has an isolated virtual address space, so the address of a function is not valid in another process. On Windows you could use the technique described in this article to inject your code into another process, but recent versions of Windows reject it.
Instead of passing a function pointer, you should consider creating a library which is used in both processes. In this case you can send a message to the other process when you need that function to be called.
If you tried to use process A's function pointer from process B, you wouldn't be calling process A - you'd call whatever is at the same address in process B. If they are the same program you might get lucky and it will be the same code, but it won't have access to any of the data contained in process A.
A function pointer won't work for this, because it only contains the starting address for the code; if the code in question doesn't exist in the other process, or (due to something like address space randomization) is at a different location, the function pointer will be useless; in the second process, it will point to something, or nothing, but almost certainly not where you want it to.
You could, if you were insane^Wdaring, copy the actual instruction sequence onto the shared memory and then have the second process jump directly to it - but even if you could get this to work, the function would still run in Process B, not Process A.
It sounds like what you want is actually some sort of message-passing or RPC system.
This is why people have invented things like COM, RPC and CORBA. Each of them gives this general kind of capability. As you'd guess, each does the job a bit differently from the others.
Boost IPC doesn't really support remote procedure calls. It will enable putting a variable in shared memory so its accessible to two processes, but if you want to use a getter/setter to access that variable, you'll have to do that yourself.
Those are all basically wrappers to produce a "palatable" version of something you can do without them though. In Windows, for example, you can put a variable in shared memory on your own. You can do the same in Linux. The Boost library is a fairly "thin" library around those, that lets you write the same code for Windows or Linux, but doesn't try to build a lot on top of that. CORBA (for one example) is a much thicker layer, providing a relatively complete distributed environment.
If both processes are in the same application, then this should work. If you are trying to send function pointers between applications then you are out of luck.
My original answer was correct if you assume a process and a thread are the same thing, which they're not. The other answers are correct - different processes cannot share function pointers (or any other kind of pointers, for that matter).