how does this fork() work - c++

Can you tell me why the output of this program is this:
1
2
2
5
5
3
4
5
4
5
3
4
5
4
5
And a quick explanation of why it is like this? Thanks
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    printf("1\n");
    fork();
    printf("2\n");
    if (fork() == 0)
    {
        printf("3\n");
        fork();
        printf("4\n");
    }
    printf("5\n");
    return 0;
}

The output of your program, assuming no calls to fork fail, should be thought of like this:
1
2 2
3 3
4 4 4 4
5 5 5 5 5 5
Each column represents the output of one process. They all get serialized onto stdout in some random order, subject only to the following constraints: within a column, each character cannot appear before the character immediately above it; the topmost character in each column cannot appear before the character above and to the left of it.
Note that right now your program is relying on the C library noticing that stdout is a terminal and therefore setting it to line-buffered. If you run the program with stdout redirected to a file or pipe you are likely to get rather different output, e.g.
$ ./a.out | tr '\n' ' '
1 2 5 1 2 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
... because in that case all of the output is buffered until returning from main, and the buffers are copied into each child process. Adding
setvbuf(stdout, 0, _IONBF, 0);
before the first printf statement will prevent duplication of output. (In this case you could get away with _IOLBF instead, but _IONBF is safer for code like this.)

It would be a little easier to show graphically, but every time you call fork() another process continues through the same code. So:
Process 1 (original process): prints 1, then creates Process 2, prints 2, then creates Process 3 but doesn't return 0, and prints 5.
Process 2: prints 2, then creates Process 4 but doesn't return 0, and prints 5.
Process 3: prints 3, then creates Process 5, prints 4, prints 5
Process 4: prints 3, then creates Process 6, prints 4, prints 5
Process 5: prints 4, prints 5
Process 6: prints 4, prints 5
But they are all running at about the same time, which is why the numbers come out interleaved like that.
Hope that helps. First time answering!

Which process runs first after a fork() depends on the scheduler: on one flavor (say Fedora) the parent may get the chance to execute first, while on another (say Ubuntu) the child may get preference, and on that basis you will see different output orderings. This has nothing to do with printf itself, but we can predict how many times each line executes. When "1" prints there is only one process. After the first fork() there are two processes, and each of them reaches the second fork(). Only the two children of the second fork() (the ones that see a return value of 0) enter the if body, and each of those calls the third fork(), creating one more process apiece. So there are six processes in total. (Note the count is not simply 2^n for n fork() calls here, because the third fork() is only reached inside the if branch.) The lines printing 2, 3 and 4 are each subject to some condition, but 5 is common to all processes, so 5 prints six times.
Maybe my post is helpful for you.
Thanks
asif aftab

It's because you're not always testing the result of the fork() calls. fork() returns 0 in the child process and the child's PID in the parent. Since you don't test the result every time, every line of code following a fork() call is duplicated (and executed) in both processes.

The output is not deterministic due to the order of execution and the inheritance of the output buffers. One way to make it deterministic is:
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    printf("1\n");
    fflush(stdout);
    if (fork()) wait(0);
    printf("2\n");
    fflush(stdout);
    if (fork() == 0)
    {
        printf("3\n");
        fflush(stdout);
        if (fork()) wait(0);
        printf("4\n");
    } else wait(0);
    printf("5\n");
    return 0;
}

Using fork() we create a child process, and there is no guaranteed execution order for either the parent or the child, as discussed here.
If you want a particular execution order, it's better to check the result of fork() to distinguish the parent [pid is not 0] from the child [pid is 0] and make one of them sleep so that the scheduler puts the other one on execution.
You can find more information here.

Micheal, I would need to see the code inside of the fork() method to know for sure, but given that it is printing extra numbers, the only possible explanation I can think of is that your fork() method might have print statements of its own.
Robin

Related

Neither 'MPI_Barrier' nor 'BLACS_Barrier' stops a processor executing its commands

I'm working on ScaLAPACK and trying to get used to the BLACS routines, which are essential for using ScaLAPACK.
I've had an elementary course on MPI, so I have a rough idea of the MPI_COMM_WORLD stuff, but no deep understanding of how it works internally and so on.
Anyway, I'm trying following code to say hello using BLACS routine.
program hello_from_BLACS
  use MPI
  implicit none

  integer :: info, nproc, nprow, npcol, &
             myid, myrow, mycol, &
             ctxt, ctxt_sys, ctxt_all

  call BLACS_PINFO(myid, nproc)

  ! get the internal default context
  call BLACS_GET(0, 0, ctxt_sys)

  ! set up a process grid for the process set
  ctxt_all = ctxt_sys
  call BLACS_GRIDINIT(ctxt_all, 'c', nproc, 1)
  call BLACS_BARRIER(ctxt_all, 'A')

  ! set up a process grid of size 3*2
  ctxt = ctxt_sys
  call BLACS_GRIDINIT(ctxt, 'c', 3, 2)

  if (myid .eq. 0) then
    write(6,*) ' myid myrow mycol nprow npcol'
  endif
(**)  call BLACS_BARRIER(ctxt_sys, 'A')

  ! all processes not belonging to 'ctxt' jump to the end of the program
  if (ctxt .lt. 0) goto 1000

  ! get the process coordinates in the grid
  call BLACS_GRIDINFO(ctxt, nprow, npcol, myrow, mycol)
  write(6,*) 'hello from process', myid, myrow, mycol, nprow, npcol

1000 continue

  ! return all BLACS contexts
  call BLACS_EXIT(0)
  stop
end program
and the output with 'mpirun -np 10 ./exe' is like,
hello from process 0 0 0 3 2
hello from process 4 1 1 3 2
hello from process 1 1 0 3 2
myid myrow mycol nprow npcol
hello from process 5 2 1 3 2
hello from process 2 2 0 3 2
hello from process 3 0 1 3 2
Everything seems to work fine except the 'BLACS_BARRIER' line, which I marked with (**) on the left side of the code.
I put that line in to make the output look like the one below, whose title line is always printed at the top:
myid myrow mycol nprow npcol
hello from process 0 0 0 3 2
hello from process 4 1 1 3 2
hello from process 1 1 0 3 2
hello from process 5 2 1 3 2
hello from process 2 2 0 3 2
hello from process 3 0 1 3 2
So the question goes,
I've tried BLACS_BARRIER with 'ctxt_sys', 'ctxt_all', and 'ctxt', but none of them produces output in which the title line is printed first. I've also tried MPI_Barrier(MPI_COMM_WORLD, info), but it didn't work either. Am I using the barriers in the wrong way?
In addition, I got a SIGSEGV when I used BLACS_BARRIER with 'ctxt' and ran mpirun with more than 6 processes. Why does the SIGSEGV occur in this case?
Thank you for reading this question.
To answer your 2 questions (in future it is best to give them separate posts):
1) MPI_Barrier, BLACS_Barrier and any barrier in any parallel programming methodology I have come across only synchronises the actual set of processes that calls it. However, I/O is not handled by the calling process alone, but by at least one (and quite possibly more) processes within the OS which actually service the I/O request. These are NOT synchronised by your barrier, so the ordering of I/O is not ensured by a simple barrier. The only standards-conforming ways I can think of to ensure the ordering of I/O are:
Have 1 process do all the I/O or
Better is to use MPI I/O either directly, or indirectly, via e.g. NetCDF or HDF5
2) Your second call to BLACS_GRIDINIT
call BLACS_GRIDINIT(ctxt, 'c', 3, 2)
creates a context for a 3 by 2 process grid, so holding 6 processes. If you call it with more than 6 processes, only 6 will get back a valid context; for the others, ctxt should be treated as an uninitialised value. So for instance if you call it with 8 processes, 6 will return with a valid ctxt and 2 will return with ctxt having no valid value. If those 2 now try to use ctxt, anything is possible, and in your case you are getting a seg fault. You do seem to be aware that this is an issue, as later you have
! all processes not belonging to 'ctxt' jump to the end of the program
if (ctxt .lt. 0) goto 1000
but I see nothing in the description of BLACS_GRIDINIT that ensures ctxt will be less than zero for non-participating processes - at https://www.netlib.org/blacs/BLACS/QRef.html#BLACS_GRIDINIT it says
This routine creates a simple NPROW x NPCOL process grid. This process
grid will use the first NPROW x NPCOL processes, and assign them to
the grid in a row- or column-major natural ordering. If these
process-to-grid mappings are unacceptable, BLACS_GRIDINIT's more
complex sister routine BLACS_GRIDMAP must be called instead.
There is no mention of what ctxt will be if the process is not part of the resulting grid - this is the kind of problem I regularly find with the BLACS documentation. Also, please don't use goto, for your own sake. You WILL regret it later. Use If ... End If. I can't remember when I last used goto in Fortran; it may well be over 10 years ago.
Finally good luck in using BLACS! In my experience the documentation is often incomplete, and I would suggest only using those calls that are absolutely necessary to use ScaLAPACK and using MPI, which is much, much better defined, for the rest. It would be so much nicer if ScaLAPACK just worked with MPI nowadays.

Learning about multithreading. Tried to make a prime number finder

I'm studying for a uni project and one of the requirements is to include multithreading. I decided to make a prime number finder and - while it works - it's rather slow. My best guess is that this has to do with the number of threads I'm creating and destroying.
My approach was to take the range of candidate divisors for each number below N and distribute them evenly across M threads (where M = number of cores, in my case 8); however, these threads are created and destroyed every time N increases.
Pseudocode looks like this:
for each core
    # new thread
    for i in (range / numberOfCores) * currentCore
        if !possiblePrimeIsntActuallyPrime
            if possiblePrime % i == 0
                possiblePrimeIsntActuallyPrime = true
                return
        else
            return
Which does work, but creating 8 threads for every possible prime seems to be slowing the system down.
Any suggestions on how to optimise this further?
Use thread pooling.
Create 8 threads and store them in an array. Feed each one new data when it finishes and start it again. This will prevent the threads from having to be created and destroyed each time.
Also, when calculating your range of numbers to check, only check up to ceil(sqrt(N)), since any factor larger than that pairs with a corresponding factor smaller than sqrt(N) that has already been checked. E.g. ceil(sqrt(24)) is 5.
Once you check up to 5 you don't need to check anything else, because 6 goes into 24 four times and 4 has already been checked, 8 goes into it 3 times and 3 has already been checked, etc.

the meaning for exit status of the child process

Have some code like this:
pid_t pid = waitpid(mPid, &status, WNOHANG);
mExitStatus = WEXITSTATUS(status);
Get the debug print for the variable like:
mExitStatus = 15
status = 3840
For "mExitStatus = WEXITSTATUS(status)", I found the following statement which explains that it
evaluates to the least significant eight bits of the return code of the child which terminated
3840 = 0xF00; the F is 15, which is assigned to mExitStatus.
But my question is how I can use this 15 to judge whether the child process terminated correctly or not?
The 15 comes from the 3840. But is 3840 what the Linux process returns? Does it have any meaning?
As a general description, my main started 4 child processes running 4 tests. I would like to judge in my main whether those 4 tests passed or not, so I think I need to judge based on the exit status of my child processes.
Thanks
The standard is that an exit status of zero means "success" and anything else is some sort of failure. On *nix systems, a value from 129 to 150 or so can usually be interpreted as "the process was terminated due to a signal" and the signal number is the return value minus 128. A generic failure often returns 1, but sometimes 2 or 3 or some other small number.
In the end, what a program returns is completely up to the program, but those are the typical values.

Need help understanding fork C++

I am trying to create a program that uses fork to create 4 processes, which I am understanding to be 2 parents and 2 children.
The way I have set up my code is:
for (int i = 0; i < 2; ++i) {
    pid_t pid1 = fork();
    switch (pid1) {
    case -1:
        fatal("fork failed");
        break;
    case 0:
        child(i);
        break;
    default:
        parent(i);
        break;
    }
}
In child() and parent() respectively, I am calling getpid(). Inside child(), I exit(0) once I am done. Inside parent(), I wait for the child using wait(0). When I run the program, it outputs 2 different child pids and 2 identical parent pids. Is this happening because I am calling the same fork() twice?
Process 1 calls fork for 1st loop iteration, creating process 1.1.
Then process 1 calls fork again for 2nd loop iteration, creating process 1.2.
Then process 1.1 (which is essentially process 1 duplicated when fork was done) also enters 2nd loop iteration, creating process 1.1.1.
So processes 1.1 and 1.2 have same parent, process 1. And there are total 4 processes (1, 1.1, 1.2, 1.1.1).
Note that steps 2 and 3 may happen in different order, depending on how OS decides to schedule processes.
Because you have used the exit function in the child(i) function, the child exits, and hence only the parent process continues its execution in the for loop. So you only get 2 new processes forked, which are children of the same parent. The parent pid stays the same, but since two children are created, you get 2 distinct child pids! If you want four processes to be forked, you will have to remove the exit statement from the child(i) function and use if/else statements after each fork() call.

Can a process send data to itself? Using MPICH2

I have an upper triangular matrix and the result vector b.
My program needs to solve the linear system:
Ax = b
using the pipeline method.
One of the constraints is that the number of processes is smaller than the number of equations (let's say it can be from 2 to numberOfEquations-1).
I don't have the code right now; I'm thinking about the pseudo code.
My idea was that one of the processes will create the random upper triangular matrix (A) and the vector b.
Let's say this is the random matrix:
1 2 3 4 5 6
0 1 7 8 9 10
0 0 1 12 13 14
0 0 0 1 16 17
0 0 0 0 1 18
0 0 0 0 0 1
and the vector b is [10 5 8 9 10 5]
and I have a smaller number of processes than the number of equations (let's say 2 processes),
so what I thought is that some process will send each process a line from the matrix and the relevant number from vector b.
So the last line of the matrix and the last number in vector b will be sent to process[numProcs-1] (here I mean the last process, process 1),
which then computes its x and sends the result to process 0.
Now process 0 needs to compute the 5th line of the matrix, and here I'm stuck.
I have the x that was computed by process 1, but how can the process send to itself the next line of the matrix and the relevant number from vector b that needs to be computed?
Is it possible? I don't think it's right to send to "myself"
Yes, MPI allows a process to send data to itself but one has to be extra careful about possible deadlocks when blocking operations are used. In that case one usually pairs a non-blocking send with blocking receive or vice versa, or one uses calls like MPI_Sendrecv. Sending a message to self usually ends up with the message simply being memory-copied from the source buffer to the destination one with no networking or other heavy machinery involved.
And no, communicating with self is not necessarily a bad thing. The most obvious benefit is that it makes the code more symmetric as it removes/reduces the special logic needed to handle self-interaction. Sending to/receiving from self also happens in most collective communication calls. For example, MPI_Scatter also sends part of the data to the root process. To prevent some send-to-self cases that unnecessarily replicate data and decrease performance, MPI allows in-place mode (MPI_IN_PLACE) for most communication-related collectives.
Is it possible? I don't think it's right to send to "myself"
Sure, it is possible to communicate with oneself. There is even a communicator for it: MPI_COMM_SELF. Talking to yourself is not too uncommon.
Your setup sounds like you would rather use MPI collectives. Have a look at MPI_Scatter and MPI_Gather and see if they provide the functionality you are looking for.