I am studying for my final in systems programming and I have a few questions that I cannot answer.
Say a parent process forks off a child process that creates a large object. Can the child pass this object back to the parent fairly easily using just signals?
A parent process forks off a child process, and the child continues running the same program. Is the best way for the parent to give a data structure to a child that was created before the fork to write to a separate file and give that to the child? (I am thinking this is not a good way, because the child will still share some of the parent's data, including said data structure.)
Let us say you write a program to measure how quick a
person's fingers are by trapping SIGINT and then asking them to press
Ctrl-C as rapidly as possible. The SIGINT signal handler increments a
global counter every time Ctrl-C is typed. After a predefined time it
stops and prints the global counter divided by the time used.
What is a fundamental problem with this program?
Any help is appreciated.
Some quick thoughts on your questions:
No, signals are a poor way to transfer data. A standard signal carries no payload beyond its number, delivery has noticeable overhead, and standard signals are not queued: if the same signal is raised several times before it is handled, the pending instances are merged into one.
Many methods of IPC are available. The two that are most popular for UNIX are sockets and shared memory (see shm for instance). Sockets are generally better when talking to untrusted applications. In your example of forking an application, pipes would be applicable as well.
As long as you can handle the interrupts much faster than they are coming in, you are OK. Standard signals are not queued, though, so presses that arrive while a SIGINT is already pending are merged and the count can come out low. For the Ctrl-C example you could probably do the same thing using poll and fcntl (on UNIX), and you would likely get better precision.
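As a rough illustration of the pipe option for the first question, here is a minimal sketch in C. The helper name receive_from_child and the fill pattern are made up for this example; a real program would serialize its own object instead. The child builds a buffer and streams it through a pipe, and the parent reads until it has all the bytes.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that builds `len` bytes of data and
 * streams them back over a pipe. Returns a malloc'd buffer the caller
 * must free, or NULL on failure. */
unsigned char *receive_from_child(size_t len)
{
    int fds[2];
    if (pipe(fds) == -1)
        return NULL;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return NULL;
    }

    if (pid == 0) {                          /* child: producer */
        close(fds[0]);                       /* the child never reads */
        unsigned char *obj = malloc(len);
        if (obj == NULL)
            _exit(1);
        for (size_t i = 0; i < len; i++)
            obj[i] = (unsigned char)(i % 251);   /* stand-in for real data */
        size_t off = 0;
        while (off < len) {                  /* write() may be partial */
            ssize_t n = write(fds[1], obj + off, len - off);
            if (n <= 0)
                _exit(1);
            off += (size_t)n;
        }
        _exit(0);
    }

    /* parent: consumer */
    close(fds[1]);                           /* needed so read() can see EOF */
    unsigned char *buf = malloc(len);
    size_t off = 0;
    while (buf != NULL && off < len) {
        ssize_t n = read(fds[0], buf + off, len - off);
        if (n <= 0)
            break;
        off += (size_t)n;
    }
    close(fds[0]);
    waitpid(pid, NULL, 0);                   /* reap the child */
    if (buf == NULL || off != len) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

Calling receive_from_child(200000) from main should hand back the child's bytes. For very large objects, shared memory avoids the copy through the kernel, at the cost of more setup.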
I know the answer to "why is it this way" is because the language was invented so, but it seems like a lot of wasted effort that fork() spawns a copy of the process that called it. Perhaps it is useful sometimes, but surely the majority of the time someone wants to start a new process it's not to be a duplicate of the calling one? Why does fork create an identical process and not an empty one or one defined by passing an argument?
From yolinux
The fork() system call will spawn a new child process which is an
identical process to the parent except that has a new system process
ID
In other words when is it useful to start with a copy of the parent process?
One big advantage of having the parent process duplicated in the child is that it allows the parent program to make customizations to the child process' environment before executing it. For example, the parent might want to read the child process' stdout, in which case it needs to set up the pipes in order to allow it to read that before execing the new program.
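A minimal sketch in C of that pattern: the child, still running the parent's code, wires its stdout to a pipe and only then execs the new program. The helper name capture_echo is invented here, and it assumes an echo program is on the PATH.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `echo <word>` and copy its stdout into out[].
 * Returns the number of bytes captured, or -1 on error. */
ssize_t capture_echo(const char *word, char *out, size_t cap)
{
    int fds[2];
    if (cap == 0 || pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: customize the environment before exec. */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);     /* stdout now feeds the pipe */
        close(fds[1]);
        execlp("echo", "echo", word, (char *)NULL);
        _exit(127);                      /* reached only if exec failed */
    }

    /* Parent: read what the child prints. */
    close(fds[1]);
    size_t off = 0;
    ssize_t n;
    while (off < cap - 1 &&
           (n = read(fds[0], out + off, cap - 1 - off)) > 0)
        off += (size_t)n;
    out[off] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)off;
}
```

Nothing in the exec'd program has to know about the pipe; the setup is done with ordinary system calls between fork and exec.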
It's also not as bad as it sounds, efficiency wise. The whole thing is implemented on Linux using copy-on-write semantics for the process' memory (except in the special cases noted in the man page):
Under Linux (and in most unices since version 7, parent of all unices alive now), fork() is implemented using copy-on-write pages, so the only
penalty that it incurs is the time and memory required to duplicate the
parent's page tables (which can be also copy-on-write), and to create a unique task structure for the child.
There are some very legitimate uses of the fork system call. Here are a few examples:
Memory saving. Because fork on any modern UNIX/Linux system shares memory between the child and parent (via copy-on-write semantics), a parent process can load some static data which can be instantly shared to a child process. The zygote process on Android does this: it preloads the Java (Dalvik) runtime and many classes, then simply forks to create new application processes on demand (which inherit a copy of the parent's runtime and loaded classes).
Time saving. A process can perform some expensive initialization procedure (such as Apache loading configuration files and modules), then fork off workers to perform tasks which use the preloaded initialization data.
Arbitrary process customization. On systems that have direct process creation methods (e.g. Windows with CreateProcess, QNX with spawn, etc.), these direct process creation APIs tend to be very complex, since every possible customization of the process has to be specified in the function call itself. By contrast, with fork/exec, a process can just fork, perform customizations via standard system calls (close, signal, dup, etc.) and then exec when it's ready. fork/exec is consequently one of the simplest process creation APIs in existence, yet simultaneously one of the most powerful and flexible.
To be fair, fork also has its fair share of problems. For example, it doesn't play nice with multithreaded programs: only the calling thread is created in the new process, and locks held by the other threads are never released in the child (leading to the necessity of atfork handlers to reset lock states across a fork).
Contrary to all expectations, it's mainly fork that makes process creation so incredibly fast on Unices.
AFAIK, on Linux, the actual process memory is not copied upon fork, the child starts with the same virtual memory mapping as the parent, and pages are copied only where and when the child makes changes. The majority of pages are read-only code anyway, so they are never copied. This is called copy-on-write.
Use cases where copying the parent process is useful:
Shells
When you say cat foo >bar, the shell forks, and in the child process (still the shell) prepares the redirection, and then execs cat foo. The executed program runs under the same PID as the child shell and inherits all open file descriptors. You would not believe how easy it is to write a basic Unix shell.
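The redirection step is short enough to sketch in C. The helper name run_redirected is made up for this example; it covers only the `>file` case and leaves out parsing, pipelines, and job control.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with its stdout redirected to
 * `outfile`, like `cmd args >outfile` in a shell.
 * Returns the command's exit status, or -1 on error. */
int run_redirected(char *const argv[], const char *outfile)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: still the "shell", so it can prepare the redirection. */
        int fd = open(outfile, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd == -1)
            _exit(126);
        if (dup2(fd, STDOUT_FILENO) == -1)
            _exit(126);
        close(fd);
        execvp(argv[0], argv);           /* same PID, new program */
        _exit(127);                      /* reached only if exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The executed program inherits descriptor 1 already pointing at the file, which is why cat itself needs no knowledge of the redirection.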
Daemons (services)
Daemons run in the background. Many of them fork after some initial preparation, the parent exits, and the child detaches from the terminal and remains running in the background.
Network servers
Many networking daemons have to handle multiple connections at the same time. Example: sshd. The main daemon runs as root and listens for new connections on port 22. When a new connection comes in, it forks a child. The child just keeps the new socket representing that connection, authenticates the user, drops privileges and so on.
Etc
Why fork()? It had nothing to do with C. C was itself only coming into existence at the time. It's because of the way the original UNIX memory page and process management worked: it was trivial to cause a process to be paged out, and then paged back in at a different location, without unloading the first copy of the process.
In The Evolution of the Unix Time-sharing System (http://cm.bell-labs.com/cm/cs/who/dmr/hist.html), Dennis Ritchie says "In fact, the PDP-7's fork call required precisely 27 lines of assembly code." See the link for more.
Threads are evil. With threads, you essentially have a number of processes all with access to the same memory space, which can dance all over each others' values. There's no memory protection at all. See The Art of Unix Programming, Chapter 7 (http://www.faqs.org/docs/artu/ch07s03.html#id2923889) for a fuller explanation.
I'm taking my first steps in GTK+ (C++ and gtkmm more specifically) and I have a rather conceptual doubt about how to best structure my program. Right now I just want my GUI to show what is happening in my C++ program by printing several values, and since my main thread is halted while the GUI window is running, I've come across solutions that separate the processing/computing operations and the graphical interface into separate threads. Is this commonly accepted as the best way to do it, not at all, or not even relevant?
Unless you have a good reason, you are generally better off not creating new threads. Synchronization is hard to get right.
GUI programming is event driven (click on a button and something happens). So you will probably need to tie your background processing into the GUI event system.
In the event that your background processing takes a long time, you will need to break it into a number of fast chunks. At the end of each chunk, you can update a progress bar and schedule the next chunk.
This will mean you will need to probably use some state machine patterns.
Also make sure that any IO is non-blocking.
Here's an example of lengthy operation split in smaller chunks using the main loop without additional threads. Lazy Loading using the main loop.
Yes, absolutely! (in response to your title)
The GUI must be run in a separate thread. If you have ever come across those extremely annoying interfaces that lock up while an operation is in progress [1], you'd know why it's very important to have the GUI always running regardless of the operation happening.
It's a user experience thing.
[1] I don't mean the ones that disable some buttons during operation (that's normal), but the ones where everything seems frozen.
This is the reverse: the main thread should be the Gtk one, and the long processing/computing tasks should be done in threads.
The documentation gives a clear example:
https://pygobject.readthedocs.io/en/latest/guide/threading.html
If I have an application which employs fork() and might be developed as multithreaded, what are the rules of thumb/guidelines to consider to safely program this kind of application?
The basic rules of thumb, according to various internet articles like ( http://www.linuxprogrammingblog.com/threads-and-fork-think-twice-before-using-them , fork in multi-threaded program ) are:
(Main) Process[0] Monothread --> fork() --> (Child) Process[1] Multithreaded: OK!
If Process[1] crashes or messes around with memory it won't touch the address space of Process[0] (unless you use shared R/W memory... but this is another topic of its own). In Linux by default all fork()ed memory is Copy On Write. Given that Process[0] is monothreaded, when we invoke fork() all possible mutual exclusion primitives should generally be in an unlocked state.
(Main) Process[0] Multithreaded --> fork() --> (Child) Process[1] Mono/Multithread: BAD!
If you fork() a multithreaded process, your mutexes and many other thread synchronization primitives will likely be in an undefined state in Process[1]. You can work around this with pthread_atfork(), but if you use libraries you might as well roll the dice and hope to be lucky, because generally you don't (want to) know the implementation details of libraries.
The advantages of fork() into a multithreaded process are that you could manipulate/read/aggregate your data quicker (in the Child process), without having to care about stability of the process you fork() from (Main). This is useful if your main process has a dataset of a lot of memory and you don't want to duplicate/reload it to safely process the data in another process (Child). This way the original process is stable and independent from the data aggregation/manipulation process (fork()ed).
Of course this means that the original process will generally be slower than it might be if developed in multithreaded fashion. But again, this is the price you might want to be paying for more stability.
If instead your main process is multithreaded, refrain from using fork(). It's going to be a proper mess to implement it in a stable way.
Cheers
On Linux, threads are implemented in terms of processes. In other words, a thread is really just a task created with clone() that shares most of its memory with its creator, instead of getting completely copy-on-write memory as with fork(). What this means is that when you use fork() in a thread (main or other), the child gets a copy of the entire memory space shared by all of the threads, plus the thread-specific storage of the thread you call fork() from, but only that calling thread keeps running in the child.
Now all of this sounds good, but that doesn't mean that this is what will happen or work well. If you want to make a cloned process, try to do a fork before starting any other threads, and then use read-only virtual memory to keep the forked process up to date with current memory values.
So although it may work, I just suggest testing, and try to find another way first. And be prepared for a lot of:
Segmentation fault
I am making a server and I use fork() for creating child processes but I have doubts about that. Here are some:
Why do you need to close the main socket in a child process and the newly accepted connection socket in the parent process (after accepting a new connection)? I thought that sockets are only integers with some id, used to access opened sockets in some system-wide object which is only accessible through system function calls. In that case fork would only copy the integer but would have no effect on the opened socket.
I have checked and found out that if I fork a process inside a class method, all members are copied. Well, I have found out it is copy-on-write, so does it mean that my server class will be copied in each child that uses a non-constant function? How do I make some memory shared between all such processes (like a list of tasks where each child puts something into it while the parent is reading stuff from it)? I guess fork is not the right function for that. What is the best way?
P.S. I am pretty sure I know the answer to the second question, which is clone(), but just wanted to make sure that's the right function.
Sockets in Unix are file descriptors, and they are indeed integers as seen by the user, but they really are indexes into a table that the kernel maintains per process. In this table each file descriptor (FD) refers to an open file description (OFD), which is a system-wide object maintained in the kernel. When you fork(), the open file descriptors are duplicated, and both the child's and the parent's FDs point to the same OFD. Having two FDs that refer to the same OFD is not usually a problem, but with sockets in particular it can cause subtle problems, because the connection is closed only when you close all the FDs that refer to it.
You should really consider using threads (do not close the sockets if you use threads!). clone is a Linux system call and is not intended to be used directly. Your alternative is to use shared memory, but it is somewhat more complex.
The int is a handle, but the socket itself is still associated with the process. The child closes the listening socket mainly for safety reasons (it doesn't need it, and if the child ever spawns another process, that process would inherit the socket as well); the server process closes the new connection's socket because otherwise the connection would remain open until the server process exits (the socket exists as long as at least one process still has a handle to it).
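Here is a hedged sketch in C of the accept-then-fork pattern with both close calls in place. The helper names make_listener and serve_one, the loopback address, and the "hello" reply are all assumptions for this example; a real server would loop, handle SIGCHLD instead of blocking in waitpid, and do real work in the child.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: listen on 127.0.0.1 with a kernel-chosen port,
 * which is stored in *port. Returns the listening FD or -1. */
int make_listener(unsigned short *port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* let the kernel pick */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == -1 ||
        listen(fd, 16) == -1 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) == -1) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}

/* Hypothetical helper: accept one connection and hand it to a forked
 * child. Returns the child's exit status, or -1 on error. */
int serve_one(int listen_fd)
{
    int conn = accept(listen_fd, NULL, NULL);
    if (conn == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(conn);
        return -1;
    }

    if (pid == 0) {
        static const char msg[] = "hello\n";
        close(listen_fd);                    /* child does not accept */
        ssize_t n = write(conn, msg, sizeof msg - 1);
        close(conn);
        _exit(n == (ssize_t)(sizeof msg - 1) ? 0 : 1);
    }

    /* Parent: drop its copy, or the client never sees the connection
     * close even after the child is done. */
    close(conn);
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If the close(conn) in the parent is left out, a client reading until end-of-file would hang after the child exits, because the parent's descriptor still keeps the connection open.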
You either want multithreading or a proper shared memory approach. This is where the fun begins.
Shared memory between independent processes comes with interesting problems, but also provides otherwise impossible capabilities (for example, you can restart the master server process and leave the processes serving open connections running, which is difficult to get right as two different versions of the service then have to talk to each other, but allows seamless upgrades without disconnecting clients or interrupting service).
Sharing memory between threads is relatively easy, but threads share the same set of file descriptors, so you do not win much here.
Last, there is a third alternative: an event loop watching multiple sockets, giving attention to each only if something is actually happening. Look at the documentation for the select and poll functions.
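A minimal sketch of one poll step in C, assuming a small fixed upper bound on watched descriptors (the name first_readable and the MAX_WATCHED limit are made up for this example). A real event loop would call something like this repeatedly and dispatch on every ready descriptor, not only the first.

```c
#include <poll.h>

#define MAX_WATCHED 64   /* arbitrary limit for this sketch */

/* Hypothetical helper: wait up to timeout_ms for any of fds[0..nfds-1]
 * to become readable. Returns the index of the first ready descriptor,
 * or -1 on timeout or error. */
int first_readable(const int *fds, int nfds, int timeout_ms)
{
    struct pollfd pfds[MAX_WATCHED];

    if (nfds <= 0 || nfds > MAX_WATCHED)
        return -1;

    for (int i = 0; i < nfds; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;     /* interested in "data to read" */
        pfds[i].revents = 0;
    }

    int ready = poll(pfds, (nfds_t)nfds, timeout_ms);
    if (ready <= 0)
        return -1;                   /* 0 = timeout, -1 = error */

    for (int i = 0; i < nfds; i++) {
        /* POLLHUP means the peer closed; read() will report EOF. */
        if (pfds[i].revents & (POLLIN | POLLHUP))
            return i;
    }
    return -1;
}
```

With this approach one process and one thread can serve many sockets, so neither the duplicated descriptors of fork nor the shared state of threads comes into play.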
Forking duplicates file descriptors, so you have to close the duplicate.
Forking also effectively copies all memory (though in practice it's copy-on-write, so it's not very expensive). You make a new process which is entirely separate from the parent process, unless you explicitly set up some shared memory.
Maybe you intended to spawn a new thread rather than forking a new process?
I think you might want to look through this book as a reference on fork().
Yes, you do need to close the listening socket in the child and the accepted socket in the parent. The integers (file handles) point to real kernel structures, so unless you want the kernel to hand a new connection to a child, or the parent to be able to send data to the connected client, you should prevent this outright.
To share data between the processes, the best way is shared memory. The book I referred you to will have quite a bit of information regarding that too. In general, if you need to share data without setting up shared memory, then you might want to look at threads.
P.S. I'm not sure which clone() method you are referring to. Object copying is done via copy constructors.
I have a boost threadpool which I use to do certain tasks. I also have a Sensor class that has the pure virtual function doWork(int total) = 0;. Whenever it is requested, my main process gets the necessary Sensor pointer and tells the threadpool to run Sensor::doWork(int total).
threadpool->schedule(boost::bind(&Sensor::doWork,this,123456));
I am dynamically loading libraries of type Sensor, thus it is out of my control if someone else has faulty coding which results in SEGFAULTS and such. So is there a way for me to (in my main process) handle any errors thrown by Sensor::doWork(int total), clean up the thread, delete that sensor object and notify the console what and where the error has occurred?
Really the only way to handle a segmentation fault here is to run Sensor::doWork in a completely separate process.
In UNIX, this involves using fork (or some other similar means), running Sensor::doWork in the child process, and then somehow shuttling the results back to the parent process.
I assume similar means are available in Windows.
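A sketch of the UNIX variant in C, for illustration only. The plain functions good_task and crashing_task stand in for Sensor::doWork, and run_isolated is a made-up name; passing real results back would additionally need a pipe or similar channel.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*task_fn)(int);

/* Hypothetical helper: run task(arg) in a child process.
 * Returns the task's exit code (0..255) if it finished normally,
 * -signo if a signal such as SIGSEGV killed it, or -1000 if the
 * fork or wait itself failed. */
int run_isolated(task_fn task, int arg)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1000;

    if (pid == 0)
        _exit(task(arg) & 0xff);     /* a crash here cannot hurt the parent */

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1000;
    if (WIFSIGNALED(status))
        return -WTERMSIG(status);    /* e.g. -SIGSEGV */
    return WEXITSTATUS(status);
}

/* Stand-ins for a well-behaved and a faulty plugin. */
int good_task(int total)
{
    return total > 0 ? 0 : 1;
}

int crashing_task(int total)
{
    (void)total;
    raise(SIGSEGV);                  /* simulate the segmentation fault */
    return 0;
}
```

Here run_isolated(crashing_task, 123456) should return -SIGSEGV while the calling process carries on, which is where the cleanup and the console notification can happen.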
EDIT: I thought I'd flesh out a bit some of the things you can do.
Solution #1: You can work with processes in the same fashion as you would threads. For example, you could create a pool of processes that each sit in a loop of:
Wait for a task to be passed in over a pipe or queue or some similar object
Perform the task
Return the results over a pipe or queue or some similar object
And since you're executing the tasks in the other processes, you're protected against them crashing. The main difficulty with this solution is actually communicating between processes; maybe boost's interprocess library will help with that. I've mainly done this sort of thing in python, which has a standard multiprocessing module that handles this stuff for you.
Solution #2: You could divide your application into "safe" and "risky" portions that run in different processes. The "risky" portion executes the Sensor::doWork methods and anything else you might want to do in that process -- but only work that is acceptable to be spontaneously lost if it crashes. The "safe" portion deals with any precious information that you cannot afford to lose, and monitors the "risky" portion, performing some recovery operations when the child crashes. And, of course, whatever other work you decide you want to do in the safe part.
If you got a SIGSEGV, even if you caught it you have no guarantee about your program state so there's pretty much no way to recover.
If you're working with 3rd party libraries, and they're buggy, and the library maintainer won't fix it (and you don't have the source) then your only recourse is to run the third party library from within a totally separate binary that talks to the main binary by some means. See for example firefox and plugin-container.
You might want to register a callback function to catch SIGSEGV. In C this can be done using signal. Be aware, however, that there is not much you can do when the OS sends you a SIGSEGV (note that it isn't required to). You don't really know what state your program is in. If, for example, the heap got corrupted, new and delete operations may fail, so even a plain simple
std::cout << std::string("hello world") << std::endl;
statement might not work, since memory from the heap needs to be allocated.
Best, Christoph