I am writing a server that uses fork() to create child processes, but I have some doubts about it. Here are a few:
Why do you need to close the main (listening) socket in the child process, and the newly accepted connection's socket in the parent process, after accepting a new connection? I thought sockets were just integer IDs used to access open sockets in some system-wide table that is only accessible through system calls. In that case fork would only copy the integer and would have no effect on the open socket.
I have also checked and found that if I fork a process inside a class method, all the members are copied. I found that it is copy-on-write, so does that mean my server class will be copied in every child that calls a non-const member function? How do I make some memory shared between all such processes (say, a list of tasks where each child puts something into it while the parent reads from it)? I guess fork is not the right function for that. What is the best way?
P.S. I am pretty sure the answer to the second question is clone(), but I just wanted to make sure that's the right function.
Sockets in Unix are file descriptors, and they are indeed integers as seen by the user, but they are really indexes into a table that the kernel maintains per process. In this table each file descriptor (FD) refers to an open file description (OFD), a system-wide object maintained in the kernel. When you fork(), the open file descriptors are duplicated, and the child's and the parent's FDs point to the same OFD. Having two FDs that refer to the same OFD is not usually a problem, but sockets in particular can have subtle problems, because the connection is closed only when you close all the FDs that refer to it.
You should really consider using threads (do not close the sockets if you use threads!). clone is a Linux system call and is not intended to be used directly. Your alternative is to use shared memory, but that is somewhat more complex.
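If you do go with threads, sharing state between workers needs only a global plus a mutex, since all threads live in one address space. A minimal sketch (the function names `worker` and `run_workers` are mine, not from any library):

```c
#include <pthread.h>

/* Threads share the whole address space, so a plain global protected by a
 * mutex is enough to share a counter -- no fork, no shared memory API. */
static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&lock);   /* serialize access to the shared value */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Spawn nthreads workers (at most 16 here) and return the final count. */
long run_workers(int nthreads) {
    pthread_t tids[16];
    counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return counter;
}
```

Without the mutex the increments would race and the total would come up short.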
The int is a handle, but the socket itself is still associated with the process. The child closes the listening socket mainly for safety reasons (it doesn't need it, and if the child ever spawns another process, that process would inherit the socket as well); the server process closes the new connection's socket because otherwise the connection would remain open until the server process exits (the socket exists as long as at least one process still has a handle to it).
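That last point can be demonstrated directly. The sketch below (my own; it stands in for an accepted TCP connection with a socketpair) shows that the peer sees end-of-file only once *every* descriptor referring to the connection has been closed, which is exactly why the parent must close its copy:

```c
#include <poll.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the "client" (the child) saw EOF within 200 ms, else 0.
 * The child always closes its copy of the server end; whether the parent
 * closes its own copy is controlled by the argument. */
int client_saw_eof(int parent_closes_its_copy) {
    int sv[2];
    socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    pid_t pid = fork();
    if (pid == 0) {                    /* child plays the client */
        close(sv[1]);                  /* drop its copy of the server end */
        struct pollfd p = { .fd = sv[0], .events = POLLIN };
        int r = poll(&p, 1, 200);      /* EOF shows up as readable */
        _exit(r > 0 ? 1 : 0);
    }
    close(sv[0]);                      /* parent drops the client end */
    if (parent_closes_its_copy)
        close(sv[1]);                  /* last reference gone -> client sees EOF */
    int status = 0;
    waitpid(pid, &status, 0);
    if (!parent_closes_its_copy)
        close(sv[1]);                  /* clean up afterwards */
    return WEXITSTATUS(status);
}
```

With the parent's copy left open, the child's poll() times out: from the kernel's point of view the connection is still alive.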
You either want multithreading or a proper shared memory approach. This is where the fun begins.
Shared memory between independent processes comes with interesting problems, but it also provides capabilities that are otherwise impossible. For example, you can restart the master server process while leaving the processes serving open connections running. This is difficult to get right, since two different versions of the service then have to talk to each other, but it allows seamless upgrades without disconnecting clients or interrupting service.
Sharing memory between threads is relatively easy, but threads share the same set of file descriptors, so you do not win much here.
Lastly, there is a third alternative: an event loop that watches multiple sockets and gives attention to each only when something is actually happening. Look at the documentation for the select and poll functions.
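A minimal sketch of the watching part with poll() (the helper `ready_fds` is my own name; it probes up to 16 descriptors and reports which are readable):

```c
#include <poll.h>
#include <unistd.h>

/* Fill out_ready with the fds (at most 16) that have data waiting, and
 * return how many there are. An event loop would call this repeatedly
 * (with a real timeout) and service only the ready descriptors. */
int ready_fds(const int *fds, int nfds, int *out_ready) {
    struct pollfd pfds[16];
    for (int i = 0; i < nfds; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;       /* we only care about readability here */
    }
    int n = poll(pfds, nfds, 0);       /* timeout 0: just probe, don't block */
    int k = 0;
    for (int i = 0; i < nfds; i++)
        if (pfds[i].revents & POLLIN)
            out_ready[k++] = pfds[i].fd;
    return n;
}
```

In a real server the fd list would contain the listening socket plus every accepted connection, and the timeout would be -1 (block until something happens).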
Forking duplicates file descriptors, so you have to close the duplicate.
Forking also effectively copies all memory (though in practice it's copy-on-write, so it's not very expensive). You make a new process which is entirely separate from the parent process, unless you explicitly set up some shared memory.
Maybe you intended to spawn a new thread rather than forking a new process?
I think you might want to look through this book as a reference on fork().
Yes, you do need to close the socket bound for listening in the child and the accepted socket in the parent. The integers, a.k.a. file handles, point to real kernel structures (see this), so unless you want the kernel to hand a new connection to a child, or the parent to be able to send data to the connected client, you should close them outright.
To share data between the processes, the best way is shared memory. The book I referred you to has quite a bit of information about that too. In general, if you cannot use a shared memory segment, you might want to look at threads.
P.S. I'm not sure which clone() method you are referring to. Object copying is done via copy constructors.
I am studying for my final in systems programming and I have a few questions that I cannot answer.
Say a parent process forks off a child process that creates a large object. Can the child pass this object back to the parent fairly easily using just signals?
A parent process forks off a child process, and the child continues running the same program. Is the best way for the parent to hand the child a data structure that was created before the fork to write it to a separate file and give that file to the child? (I suspect this is not a good way, because the child will still share some of the parent's data, including said data structure.)
Let us say you write a program to measure how quick a person's fingers are by trapping SIGINT and then asking them to press Ctrl-C as rapidly as possible. The SIGINT signal handler increments a global counter every time Ctrl-C is typed. After a predefined time it stops and prints the global counter divided by the time used.
What is a fundamental problem with this program?
Any help is appreciated.
Some quick thoughts on your questions:
No, signals are not good for transferring data. They involve a lot of overhead and are not queued reliably.
Many methods of IPC are available. The two most popular on UNIX are sockets and shared memory (see shm, for instance). Sockets are generally better when talking to untrusted applications. In your example of forking an application, pipes would be applicable as well.
As long as you can handle the interrupts much faster than they come in, you are OK. In the case of your Ctrl-C example, you could probably do the same thing using poll and fcntl (on UNIX) and likely get better precision.
I know the answer to "why is it this way" is that the system was designed so, but it seems like a lot of wasted effort that fork() spawns a copy of the process that called it. Perhaps it is useful sometimes, but surely most of the time someone starting a new process doesn't want a duplicate of the calling one? Why does fork create an identical process and not an empty one, or one defined by passing an argument?
From yolinux
The fork() system call will spawn a new child process which is an identical process to the parent except that it has a new system process ID.
In other words when is it useful to start with a copy of the parent process?
One big advantage of having the parent process duplicated in the child is that it allows the parent program to make customizations to the child process' environment before executing it. For example, the parent might want to read the child process' stdout, in which case it needs to set up the pipes in order to allow it to read that before execing the new program.
It's also not as bad as it sounds, efficiency wise. The whole thing is implemented on Linux using copy-on-write semantics for the process' memory (except in the special cases noted in the man page):
Under Linux (and in most unices since version 7, parent of all unices alive now), fork() is implemented using copy-on-write pages, so the only
penalty that it incurs is the time and memory required to duplicate the
parent's page tables (which can be also copy-on-write), and to create a unique task structure for the child.
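The stdout-reading scenario mentioned above can be sketched like this (the helper name `capture_echo` is mine; error checks are trimmed for brevity):

```c
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Read a child's stdout: the pipe and dup2 happen *between* fork and exec,
 * which is exactly the kind of customization the fork/exec split makes easy.
 * Runs `echo word` and captures its output into buf. */
ssize_t capture_echo(const char *word, char *buf, size_t len) {
    int fds[2];
    if (pipe(fds) < 0)
        return -1;
    pid_t pid = fork();
    if (pid == 0) {                     /* child */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);    /* stdout now goes into the pipe */
        close(fds[1]);
        execlp("echo", "echo", word, (char *)NULL);
        _exit(127);                     /* only reached if exec fails */
    }
    close(fds[1]);                      /* parent keeps only the read end */
    ssize_t n = read(fds[0], buf, len - 1);
    if (n >= 0)
        buf[n] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n;
}
```

A direct-creation API would need dedicated parameters for all of this; with fork it is just ordinary system calls run in the child.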
There are some very legitimate uses of the fork system call. Here are a few examples:
Memory saving. Because fork on any modern UNIX/Linux system shares memory between the child and parent (via copy-on-write semantics), a parent process can load some static data which is instantly shared with a child process. The zygote process on Android does this: it preloads the Java (Dalvik) runtime and many classes, then simply forks to create new application processes on demand (which inherit a copy of the parent's runtime and loaded classes).
Time saving. A process can perform some expensive initialization procedure (such as Apache loading configuration files and modules), then fork off workers to perform tasks which use the preloaded initialization data.
Arbitrary process customization. On systems that have direct process creation methods (e.g. Windows with CreateProcess, QNX with spawn, etc.), those direct process creation APIs tend to be very complex, since every possible customization of the process has to be specified in the function call itself. By contrast, with fork/exec, a process can just fork, perform customizations via standard system calls (close, signal, dup, etc.), and then exec when it's ready. fork/exec is consequently one of the simplest process creation APIs in existence, yet simultaneously one of the most powerful and flexible.
To be fair, fork also has its fair share of problems. For example, it doesn't play nicely with multithreaded programs: only the calling thread exists in the new process, and locks held by other threads are never released (hence the need for atfork handlers to reset lock state across a fork).
Contrary to all expectations, it's mainly fork that makes process creation so incredibly fast on Unices.
AFAIK, on Linux, the actual process memory is not copied upon fork; the child starts with the same virtual memory mapping as the parent, and pages are copied only where and when the child makes changes. The majority of pages are read-only code anyway, so they are never copied. This is called copy-on-write.
Use cases where copying the parent process is useful:
Shells
When you say cat foo >bar, the shell forks, and in the child process (still the shell) prepares the redirection, and then execs cat foo. The executed program runs under the same PID as the child shell and inherits all open file descriptors. You would not believe how easy it is to write a basic Unix shell.
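The shell's redirection trick can be sketched in a few lines (the helper `run_redirected` is a name I made up; error checks trimmed):

```c
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

/* What a shell does for `echo word >path`: fork, rewire stdout in the
 * child, then exec. The parent just waits and returns the exit status. */
int run_redirected(const char *word, const char *path) {
    pid_t pid = fork();
    if (pid == 0) {                 /* child: still the shell at this point */
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        dup2(fd, STDOUT_FILENO);    /* redirection happens in the child... */
        close(fd);
        execlp("echo", "echo", word, (char *)NULL);  /* ...then exec */
        _exit(127);                 /* only reached if exec fails */
    }
    int status;
    waitpid(pid, &status, 0);
    return status;
}
```

The executed program never knows its stdout was redirected; it inherited the file descriptors the child shell set up.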
Daemons (services)
Daemons run in the background. Many of them fork after some initial preparation, the parent exits, and the child detaches from the terminal and remains running in the background.
Network servers
Many networking daemons have to handle multiple connections at the same time; sshd is an example. The main daemon runs as root and listens for new connections on port 22. When a new connection comes in, it forks a child. The child keeps only the new socket representing that connection, authenticates the user, drops privileges, and so on.
Etc
Why fork()? It had nothing to do with C; C was itself only coming into existence at the time. It's because of the way the original UNIX memory page and process management worked: it was trivial to cause a process to be paged out and then paged back in at a different location, without unloading the first copy of the process.
In The Evolution of the Unix Time-sharing System (http://cm.bell-labs.com/cm/cs/who/dmr/hist.html), Dennis Ritchie says "In fact, the PDP-7's fork call required precisely 27 lines of assembly code." See the link for more.
Threads are evil. With threads, you essentially have a number of processes all with access to the same memory space, which can dance all over each others' values. There's no memory protection at all. See The Art of Unix Programming, Chapter 7 (http://www.faqs.org/docs/artu/ch07s03.html#id2923889) for a fuller explanation.
I have a question about Windows IPC. I implemented IPC with a mutex on Windows, but there is a problem when I make the connection from another thread: when that thread terminates, the connection is closed.
The connection thread (A) makes the connection to the server.
The main thread (B) uses the connection handle (a global variable) returned by A.
A terminates.
B cannot refer to the handle any more, because the connection is closed.
It is natural that a mutex is released when the process terminates. However, in the case of a thread, I need a way to keep holding the mutex so that the connection is maintained even after the thread terminates, as long as the process is alive.
A semaphore could be the alternative on Linux; however, on Windows it is impossible to use a semaphore for this because it cannot detect an abnormal disconnection.
Does someone have any idea?
There is no way to prevent the ownership of a mutex from being released when the thread that owns it exits.
There are a number of other ways you might be able to fix the problem, depending on the circumstances.
1) Can you change any of the code on the client? For example, if the client executable is using a DLL that you have provided to establish and maintain the connection, you could change the DLL so that it uses a more appropriate object (such as a named pipe) rather than a mutex, or you could get the DLL to start its own thread to own the mutex.
2) Is there more than one client? Presumably, since you are using a mutex, you are only expecting one client to connect at a time. If you can safely assume that only one client will be connected at a time, then when the server detects that the mutex has been abandoned, it could close its own handle to the mutex. When the client process exits, the mutex will automatically be deleted, so the server could periodically check to see whether it still exists or not.
3) How is the client communicating with the server? The server is presumably doing something useful for the client, so there must be another communications channel as well as the mutex. For example, if the client is opening a named pipe to the server, you could use that connection instead of the mutex to detect when the client process exits. Or, if the communications channel allows you to determine the process ID of the client, you could open a handle to the process and use that to detect when the client process exits.
4) If no other solution will work, and you are forced to rewrite the client as well as the server, consider using a more appropriate form of IPC, such as a named pipe.
Additional
5) It is common practice to use a process handle to wait for (or test for) process termination. Most often, these handles are the ones generated for the parent when a process is created, but there is no reason not to use a handle generated by OpenProcess. As far as precedent goes, I assure you there is at least as much precedent for using a handle generated by OpenProcess to monitor a client process as there is for using a mutex; it is entirely possible that you are the first person to ever try to use a Windows mutex to detect that a process has exited. :-)
6) Presumably the SQLDisconnect() function is calling ReleaseMutex in order to disconnect from the server. Since it is doing so from a thread that doesn't own the mutex, that won't do anything except return an error code, so there's no reasonable way for your server to detect that happening. Does the function also call CloseHandle on the mutex? If so, you could use the approach in (2) to detect when this happens. This would work both for calls to SQLDisconnect() and when the process exits. It shouldn't matter that there are multiple clients, since they are using different mutexes.
6a) I say "no reasonable way" because you could conceivably use hooking to change the behaviour of ReleaseMutex. This would not be a good option.
7) You should examine carefully what the SQLDisconnect() function does apart from calling ReleaseMutex and/or CloseHandle. It is entirely possible that you can detect the disconnection by some means other than the mutex.
I am developing an application based on a library.
I am facing a problem related to communication between the parent process and a process forked from it.
I need to access a function in the library. The pointer into the library lives in the parent process, and I am calling the library's functions from the forked process through that pointer. The function in the parent process does get called from the forked process, but the corresponding function in the library stays blocked, since it is supposed to be called only from the parent process, not from the forked process.
What should be the solution to this problem?
Update:
The library I mentioned is not exactly a loaded library. It has one class, and I instantiated that class from my parent process; the library then creates its own threads and keeps running in them.
So when I call the library from the parent process, the call goes through, but when I call library functions using the parent's pointer from the forked process, it does not go through. It does not segfault, but it gets blocked in the function call.
After the fork, both your processes have the library loaded and any existing pointers are valid in both of them, but there's no further connection between them in terms of function calls or access to data. If the parent calls a function, the parent will run the function. If the child calls a function, the child will run the function. There's no notion of one process calling a function in another process.
If you want your two processes to communicate, you need to write code to make them do so. Read about interprocess communication to learn about the various ways to do that. One simple option is to call pipe() prior to forking, and after the fork, have the parent close one of the file descriptors and the child close the other. Now you have a way for one process to send messages to the other. Do that twice and you have two-way communication. You can make the parent run a loop that waits for messages from the child via one pipe, acts upon them, and sends back the results via the other pipe.
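The two-pipe scheme just described can be sketched as follows (the helper `ask_child` and the upper-casing "task" are mine, purely for illustration):

```c
#include <ctype.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Two pipes give two-way communication: the child upper-cases whatever the
 * parent sends and writes the result back. Error checks trimmed. */
ssize_t ask_child(const char *msg, char *reply, size_t len) {
    int to_child[2], to_parent[2];
    pipe(to_child);
    pipe(to_parent);
    pid_t pid = fork();
    if (pid == 0) {                         /* child: read, transform, reply */
        close(to_child[1]);
        close(to_parent[0]);
        char buf[128];
        ssize_t n = read(to_child[0], buf, sizeof buf);
        for (ssize_t i = 0; i < n; i++)
            buf[i] = toupper((unsigned char)buf[i]);
        write(to_parent[1], buf, n);
        _exit(0);
    }
    close(to_child[0]);                     /* parent closes the unused ends */
    close(to_parent[1]);
    write(to_child[1], msg, strlen(msg));
    close(to_child[1]);                     /* EOF tells the child we're done */
    ssize_t n = read(to_parent[0], reply, len - 1);
    if (n >= 0)
        reply[n] = '\0';
    close(to_parent[0]);
    waitpid(pid, NULL, 0);
    return n;
}
```

A real server would keep the pipes open and loop over messages instead of doing one exchange per fork.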
You don't say what OS you are using, but generally pointers are not valid across processes. Most OSes give each process its own virtual memory space, so address 0x12345678 may be a pointer to something in one process, but not even an available valid address in another.
If the forked process wants to call a function, it will have to gain access to it itself (link or open the library itself, etc.)
If you're trying to share memory across two processes, but the same executable, then you should be using threads instead of forking a separate process. As others have mentioned, forking gives you a separate memory space, so you can't pass a pointer. With threads, you'd share the same memory space.
You have reached the Inter Process Communication (IPC) problem, where one program wants to make another one do something.
Since you fork()ed your child process, it now lives on its own, and to be able to execute a function in the parent process you'll have to figure out a way for them to communicate:
Child : Dad, please execute this function with these arguments, and give me the result in this pointer, please.
The problem is very widely known, you have many solutions, one of which is to design your own IPC language and implement a Remote Procedure Call (RPC) over it.
Now, people have solved the problem before, so you can take a look at some of these things:
IPC Methods
pipes
sockets (unix and network)
message queues
shared memory
RPC protocols
D-Bus
Corba
TPL (not an RPC protocol, but you can build one with it)
I have a count variable that should be incremented by a few processes I forked, and used/read by the mother process.
I tried to create a pointer in the main() function of the mother process and increment through that pointer in the forked children. That does not work! Every child seems to have its own copy, even though the address is the same in every process.
What is the best way to do that?
Each child gets its own copy of the parent process's memory (at least as soon as it tries to modify anything). If you need to share between processes, you need to look at shared memory or a similar IPC mechanism.
Two processes cannot normally share the same memory. It is true that a forked child initially shares the same underlying physical pages as its parent, but an attempt to write to one causes the operating system to give the writing process its own private copy of the page, so the other process never sees the change.
Look into another form of IPC to use.
My experience is that if you want to share information between at least two processes, you almost never want to share just some void* pointer into memory. You might want to have a look at
Boost Interprocess
which can give you an idea, how to share structured data (read "classes" and "structs") between processes.
No, use IPC or threads. Only file descriptors are shared across a fork (and they even share the file offset, since both copies refer to the same open file description).
You might want to check out shared memory.
Pointers are only meaningful within the same process; a process's address space is private to it. There are different kinds of IPC mechanisms available in every operating system: you can opt for Windows messaging, shared memory, sockets, pipes, etc. Choose one according to your requirements and the size of the data. Another mechanism is to write data into the target process using the available virtual memory APIs and notify that process with the corresponding pointer.
One simple option but limited form of IPC that would work well for a shared count is a 'shared data segment'. On Windows this is implemented using the #pragma data_seg directive.
See this article for an example.