MPI: Process 0 executing its code twice

I'm having a weird problem with an MPI program. Part of the code is supposed to be executed by the root (process zero) only, but process zero seems to execute it twice. For example,
root = 0;
if (rank == root) {
cout << "Hello from process " << rank << endl;
}
gives
Hello from process 0
Hello from process 0
This seems to only happen when I use 16 or more processes. I've been trying to debug this for quite a few days but couldn't.
Since I don't know why this is happening, I think I have to copy my entire code here. I made it nice and clear. The goal is to multiply two matrices (with simplifying assumptions). The problem happens in the final if block.
#include <iostream>
#include <cstdlib>
#include <cmath>
#include "mpi.h"
using namespace std;
int main(int argc, char *argv[]) {
if (argc != 2) {
cout << "Use one argument to specify the N of the matrices." << endl;
return -1;
}
int N = atoi(argv[1]);
int A[N][N], B[N][N], res[N][N];
int i, j, k, start, end, P, p, rank;
int root=0;
MPI::Status status;
MPI::Init(argc, argv);
rank = MPI::COMM_WORLD.Get_rank();
P = MPI::COMM_WORLD.Get_size();
p = sqrt(P);
/* Designate the start and end position for each process. */
start = rank * N/p;
end = (rank+1) * N/p;
if (rank == root) { // No problem here
/* Initialize matrices. */
for (i=0; i<N; i++)
for (j=0; j<N; j++) {
A[i][j] = N*i + j;
B[i][j] = N*i + j;
}
cout << endl << "Matrix A: " << endl;
for(i=0; i<N; ++i)
for(j=0; j<N; ++j) {
cout << " " << A[i][j];
if(j==N-1)
cout << endl;
}
cout << endl << "Matrix B: " << endl;
for(i=0; i<N; ++i)
for(j=0; j<N; ++j) {
cout << " " << B[i][j];
if(j==N-1)
cout << endl;
}
}
/* Broadcast B to all processes. */
MPI::COMM_WORLD.Bcast(B, N*N, MPI::INT, 0);
/* Scatter A to all processes. */
MPI::COMM_WORLD.Scatter(A, N*N/p, MPI::INT, A[start], N*N/p, MPI::INT, 0);
/* Compute your portion of the final result. */
for(i=start; i<end; i++)
for(j=0; j<N; j++) {
res[i][j] = 0;
for(k=0; k<N; k++)
res[i][j] += A[i][k]*B[k][j];
}
MPI::COMM_WORLD.Barrier();
/* Gather results from all processes. */
MPI::COMM_WORLD.Gather(res[start], N*N/p, MPI::INT, res, N*N/p, MPI::INT, 0);
if (rank == root) { // HERE is the problem!
// This chunk executes twice in process 0
cout << endl << "Result of A x B: " << endl;
for(i=0; i<N; ++i)
for(j=0; j<N; ++j) {
cout << " " << res[i][j];
if(j == N-1)
cout << endl;
}
}
MPI::Finalize();
return 0;
}
When I run the program with P = 16 and two 4x4 matrices:
>$ mpirun -np 16 ./myprog 4
Matrix A:
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
Matrix B:
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
Result of A x B:
6366632 0 0 0
-12032 32767 0 0
0 0 -1431597088 10922
1 10922 0 0
Result of A x B:
56 62 68 74
152 174 196 218
248 286 324 362
344 398 452 506
Why is it printing out that first result?
I would really appreciate if someone was willing to help me.

You have undefined behavior / you are corrupting your memory. Let's assume your example with N=4,P=16,p=4. Therefore start=rank.
What happens in the Scatter? You send N*N/p = 4 elements to each of the 16 processes, so MPI assumes that A on the root contains 64 elements, but it only contains 16. Furthermore, every rank stores its chunk at A[start]. I don't even know if that is exactly defined, but it should be equivalent to &A[start][0], which lies outside the memory allocated for A when rank >= 4. So you already read and write invalid memory. The wildly invalid memory accesses continue in the loop and in the Gather.
Unfortunately, MPI programs can be difficult to debug, especially with respect to memory corruption. There is very valuable info for OpenMPI. Read the entire page! mpirun -np 16 valgrind ... would have told you about the issue.
Some other notable issues:
The C++ bindings of MPI have been deprecated for years. You should either use the C bindings in C++ or a high-level binding such as Boost.MPI.
Variable-length arrays are not standard C++.
You don't need a Barrier before the Gather.
Make sure your code is not full of unchecked assumptions. Do assert that P is square, if you need it to be, that N is divisible by p, if you need it to be.
Never name two variables P and p.
Now I am struggling as to what I should recommend you in addition to using debugging tools. If you need a fast parallel matrix multiplication - use a library. If you want to write nice high-level code as an exercise - use boost::mpi and some high-level matrix abstraction. If you want to write low-level code as an exercise - use std::vector<>(N*N), build your own 2D-index and think carefully how to index it and how to access the correct chunks of memory.
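To make the last suggestion a bit more concrete, here is a minimal sketch of the flat std::vector layout with a hand-written 2D index. It deliberately contains no MPI calls: the loop over rank only simulates, inside one process, which rows each rank would own after a row-wise scatter. The names idx, multiplyRows and multiplyInChunks are made up for this illustration, and it assumes N is divisible by the number of ranks (checked with an assert rather than silently assumed).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flat row-major index into an N x N matrix stored in a std::vector.
inline std::size_t idx(int row, int col, int N) {
    return static_cast<std::size_t>(row) * N + col;
}

// Hypothetical helper: computes rows [start, end) of A x B, i.e. the
// share that one rank would own after a row-wise scatter.
void multiplyRows(const std::vector<int>& A, const std::vector<int>& B,
                  std::vector<int>& res, int N, int start, int end) {
    for (int i = start; i < end; ++i)
        for (int j = 0; j < N; ++j) {
            int sum = 0;
            for (int k = 0; k < N; ++k)
                sum += A[idx(i, k, N)] * B[idx(k, j, N)];
            res[idx(i, j, N)] = sum;
        }
}

// Simulates numRanks processes in a single process (no MPI involved),
// each handling N / numRanks consecutive rows.
std::vector<int> multiplyInChunks(int N, int numRanks) {
    assert(numRanks > 0 && N % numRanks == 0);  // checked, not assumed
    std::vector<int> A(N * N), B(N * N), res(N * N, 0);
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            A[idx(i, j, N)] = B[idx(i, j, N)] = N * i + j;  // same fill as the question
    const int rowsPerRank = N / numRanks;
    for (int rank = 0; rank < numRanks; ++rank)
        multiplyRows(A, B, res, N, rank * rowsPerRank, (rank + 1) * rowsPerRank);
    return res;
}
```

Calling multiplyInChunks(4, 4) should reproduce the second (correct) result matrix from the question, with every chunk staying inside the N*N elements that were actually allocated.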

Related

Can someone explain this output? (C++)

Hello, can someone please explain the output of the following C++ code, especially the numbers that come after the first few: 43211223334444
Here is the code:
void rek(int i) {
if (i > 0){
cout << i;
rek(i-1);
for (int k=0; k < i; k++)
cout << i; }
}
int main(){
rek(4);
return 0;
}
Here is the output:
the output is: 43211223334444

All the trouble comes from the fact that your output has no separator between the values that cout << i; prints.
You are actually getting the following at first:
4, 3, 2, 1
We got this out of these two statements:
// ...
cout<< i;
rek(i-1);
// ...
Each call to rek(i) prints its current i and then calls rek(i-1), which in turn, before proceeding through the rest of the function body, prints its own i-1 and then calls rek((i-1)-1), and so on, until i reaches 0 and the i > 0 check fails.
The remaining part is brought to you by the output of 1 one time, 2 two times, 3 three times, and 4 four times. All because of the for loop:
for (int k=0; k < i; k++) // print the value of i, i times
cout << i;
Thus, you essentially have the following:
4
3
2
1 // We hit base case on next rek() call, so no print out of there
1 // Here the for loop comes in
2 2
3 3 3
4 4 4 4
By the way, please note that the placement of brace(s) in your code is somewhat counterintuitive.
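As a small illustration of the separator point, here is a sketch of a variant that collects the output in a string and inserts a space each time the recursion returns. The name rekTrace and the string parameter are made up for this example; the control flow is otherwise the same as in rek.

```cpp
#include <string>

// Hypothetical variant of rek() that appends to a string instead of printing.
void rekTrace(int i, std::string& out) {
    if (i > 0) {
        out += std::to_string(i);      // emitted on the way down
        rekTrace(i - 1, out);
        out += ' ';                    // separator: the recursive call has returned
        for (int k = 0; k < i; k++)
            out += std::to_string(i);  // emitted on the way back up, i times
    }
}
```

Starting with an empty string and calling rekTrace(4, s) leaves s as "4321 1 22 333 4444", which makes the two phases easy to tell apart.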

We can write the dynamic stack development down manually. Below I have pseudo code with an indentation of four per stack level. The commands are on the left, the resulting output, in that order, on the right. I have replaced i with the respective number to make even more transparent what's going on.
rek(4)
    cout << 4;                          4
    rek(3)
        cout << 3;                      3
        rek(2)
            cout << 2;                  2
            rek(1)
                cout << 1;              1
                rek(0)
                    if(i>0) // fails, rek(0) returns
            rek(1) continues
                for (int k=0; k < 1; k++)
                    cout << 1;          1
            rek(1) returns
        rek(2) continues
            for (int k=0; k < 2; k++)
                cout << 2;              2
                cout << 2;              2
        rek(2) returns
    rek(3) continues
        for (int k=0; k < 3; k++)
            cout << 3;                  3
            cout << 3;                  3
            cout << 3;                  3
    rek(3) returns
rek(4) continues
    for (int k=0; k < 4; k++)
        cout << 4;                      4
        cout << 4;                      4
        cout << 4;                      4
        cout << 4;                      4
rek(4) returns
main() continues
    return 0;

Let's break your output string "43211223334444" into 2 parts:
Part 1: "4321"
This is the result of
cout << i;
rek(i-1);
You print i and then recursively call the same function with (i-1) as the argument; this continues as long as (i > 0).
This prints the numbers from 4 to 1, i.e. "4321"
Part 2 "1223334444"
This is the result of code section:
for (int k=0; k < i; k++)
cout << i;
This section gets called in reverse order from number 1 to 4.
This code section basically prints the number i, for the i times.
For i=1 it prints: 1
For i=2 it prints: 22
For i=3 it prints: 333
For i=4 it prints: 4444
That makes the string: "1223334444"
I hope this explains the whole output string: "43211223334444"

You call a recursive function and then run a for loop.
Your function first goes all the way down the recursion, and then the for loops run starting from the deepest point of the recursion; that's why you see this output.

You start by going "down the recursive tree": first you print the numbers from i=4 to i=1, which gives 4321.
After that you go "up the recursive tree" and print from i=1 to i=4 using the loop you wrote: for i=1 you print 1, for i=2 you print 22, and so on, which results in 1223334444 at the end of the line.

Why does the code throw illegal memory access error

I want to know why the error appears as soon as the code enters the function 'sort'.
I added some checkpoints using standard output, so I know where the error occurs.
I use repl.it to build this code.
...
/*return pivot function*/
int partition(...){
...
}
void sort(vector<int> array, int left, int right){
/*********"sort start" string does not appear in console***********/
cout << "sort start";
// one element in array
if(left == right);
// two elements in array
else if( left +1 == right){
if(array.at(0) > array.at(1)){
int temp;
swap(array.at(0),array.at(1),temp);
}
}
// 3 or more elements in array
else{
int p = partition(array,left,right);
sort(array,left,p-1);
sort(array,p+1,right);
}
}
int main() {
vector<int> array;
array.push_back(1);
array.push_back(2);
array.push_back(3);
array.push_back(4);
cout << "array is ";
for(int i = 0 ; i < array.size(); i++){
cout << array.at(i) << " ";
}
cout << endl;
sort(array,0,array.size()-1);/***************sort is here*************/
cout << "sorting..." << endl;
cout << "array is ";
for(int i = 0 ; i < array.size(); i++){
cout << array.at(i) << " ";
}
return 0;
}
When I run this code console output is
array is 4 3 2 2
terminate called after throwing an instance of 'std::out_of_range'
what(): vector::_M_range_check: __n (which is 18446744073709551615) >=
this->size() (which is 4)
exited, aborted
But what I expected is
array is 4 3 2 2
sorting...
sort start
array is 2 2 3 4

You are trying to access an element at index -1, which, when converted into 64-bit unsigned value, is 18446744073709551615. Live demo: https://wandbox.org/permlink/jAhJZS3ANjkDDOUr.
There are multiple problems with your code and, first of all, you don't show us the definition of your partition and swap. Moreover, your code does not match the provided output (1 2 3 4 vs 4 3 2 2 in the first line).
Anyway, one of the problems is that you don't check for cases where left is higher than right. That can easily happen. Consider that in the very first call of sort, partition returns 0 (pivot position). Then, you call:
sort(array, left, p - 1);
which turns into
sort(array, 0, -1);
That's where negative indexes can be generated.
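A minimal sketch of the guard described above, under some assumptions: the question's partition and swap are not shown, so partitionRange below is a stand-in (a plain Lomuto partition), and the function is renamed quickSort to avoid clashing with std::sort. The vector is also taken by reference here, because the question passes it by value, which means the caller's array would never change even if the recursion were fixed.

```cpp
#include <utility>
#include <vector>

// Hypothetical Lomuto partition; the question's own partition() is not shown.
// Returns the final position of the pivot (the last element of the range).
int partitionRange(std::vector<int>& a, int left, int right) {
    int pivot = a[right];
    int store = left;
    for (int i = left; i < right; ++i)
        if (a[i] < pivot) std::swap(a[i], a[store++]);
    std::swap(a[store], a[right]);
    return store;
}

void quickSort(std::vector<int>& a, int left, int right) {
    if (left >= right) return;  // also covers empty ranges such as (0, -1)
    int p = partitionRange(a, left, right);
    quickSort(a, left, p - 1);
    quickSort(a, p + 1, right);
}
```

With this guard, a call like quickSort(array, 0, -1) simply returns instead of indexing the vector with -1. The initial call would be quickSort(array, 0, static_cast<int>(array.size()) - 1).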

Program only works with inclusion of (side effects free) cout statements?

So I've been working on problem 15 from the Project Euler website, and my solution was working great up until I decided to remove the cout statements I was using for debugging while writing the code. My solution works by generating Pascal's Triangle in a 1D array and finding the element that corresponds to the number of paths in the NxN lattice specified by the user. Here is my program:
#include <iostream>
using namespace std;
//Returns sum of first n natural numbers
int sumOfNaturals(const int n)
{
int sum = 0;
for (int i = 0; i <= n; i++)
{
sum += i;
}
return sum;
}
void latticePascal(const int x, const int y, int &size)
{
int numRows = 0;
int sum = sumOfNaturals(x + y + 1);
numRows = x + y + 1;
//Create array of size (sum of first x + y + 1 natural numbers) to hold all elements in P's T
unsigned long long *pascalsTriangle = new unsigned long long[sum];
size = sum;
//Initialize all elements to 0
for (int i = 0; i < sum; i++)
{
pascalsTriangle[i] = 0;
}
//Initialize top of P's T to 1
pascalsTriangle[0] = 1;
cout << "row 1:\n" << "pascalsTriangle[0] = " << 1 << "\n\n"; // <--------------------------------------------------------------------------------
//Iterate once for each row of P's T that is going to be generated
for (int i = 1; i <= numRows; i++)
{
int counter = 0;
//Initialize end of current row of P's T to 1
pascalsTriangle[sumOfNaturals(i + 1) - 1] = 1;
cout << "row " << i + 1 << endl; // <--------------------------------------------------------------------------------------------------------
//Iterate once for each element of current row of P's T
for (int j = sumOfNaturals(i); j < sumOfNaturals(i + 1); j++)
{
//Current element of P's T is not one of the row's ending 1s
if (j != sumOfNaturals(i) && j != (sumOfNaturals(i + 1)) - 1)
{
pascalsTriangle[j] = pascalsTriangle[sumOfNaturals(i - 1) + counter] + pascalsTriangle[sumOfNaturals(i - 1) + counter + 1];
cout << "pascalsTriangle[" << j << "] = " << pascalsTriangle[j] << '\n'; // <--------------------------------------------------------
counter++;
}
//Current element of P's T is one of the row's ending 1s
else
{
pascalsTriangle[j] = 1;
cout << "pascalsTriangle[" << j << "] = " << pascalsTriangle[j] << '\n'; // <---------------------------------------------------------
}
}
cout << endl;
}
cout << "Number of SE paths in a " << x << "x" << y << " lattice: " << pascalsTriangle[sumOfNaturals(x + y) + (((sumOfNaturals(x + y + 1) - 1) - sumOfNaturals(x + y)) / 2)] << endl;
delete[] pascalsTriangle;
return;
}
int main()
{
int size = 0, dim1 = 0, dim2 = 0;
cout << "Enter dimension 1 for lattice grid: ";
cin >> dim1;
cout << "Enter dimension 2 for lattice grid: ";
cin >> dim2;
latticePascal(dim1, dim2, size);
return 0;
}
The cout statements that seem to be saving my program are marked with commented arrows. It seems to work as long as any of these lines are included. If all of these statements are removed, then the program will print: "Number of SE paths in a " and then hang for a couple of seconds before terminating without printing the answer. I want this program to be as clean as possible and to simply output the answer without having to print the entire contents of the triangle, so it is not working as intended in its current state.

There's a good chance that either the expression that calculates the array index or the one that calculates the array size for the allocation leads to undefined behaviour, for example an out-of-bounds access.
Because the visible effect of undefined behaviour is itself not defined, the program can work as you intended or it can do something else, which could explain why it works with one compiler but not with another.
You could use a vector with vector::resize() and vector::at() instead of an array with new and [] to get better information when the program aborts before writing or flushing all of its output due to an invalid memory access.
If the problem is caused by an invalid index, vector::at() will throw an exception. If you don't catch it, many debuggers will stop at that point and help you inspect where in the program the problem occurred, along with key facts such as the index you were trying to access and the contents of the variables.
They'll typically show you more "stack frames" than you expect; some are internal details of how the system handles uncaught exceptions. Look for the stack frame that belongs to your own code so you can inspect its context.
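Here is a rough sketch of the vector::at() idea applied to the triangle. The function latticePaths and its lastRow parameter are made up for this illustration, and the sum is computed with the closed form n(n+1)/2 instead of a loop. Reading the posted code, its outer loop runs i from 1 to numRows inclusive, which corresponds to lastRow = x + y + 1 below; the array only has room for rows 0 through x + y, so that is the case where at() throws.

```cpp
#include <stdexcept>
#include <vector>

// Sum of the first n natural numbers (closed form of the question's helper).
int sumOfNaturals(int n) { return n * (n + 1) / 2; }

// Hypothetical helper: builds Pascal's triangle in a bounds-checked vector.
// lastRow is the index of the last row to generate (row 0 is the apex).
// The vector has room for rows 0 .. x + y only.
unsigned long long latticePaths(int x, int y, int lastRow) {
    std::vector<unsigned long long> tri(sumOfNaturals(x + y + 1), 0);
    tri.at(0) = 1;
    for (int i = 1; i <= lastRow; ++i) {
        const int rowStart = sumOfNaturals(i);
        const int prevStart = sumOfNaturals(i - 1);
        for (int k = 0; k <= i; ++k) {
            if (k == 0 || k == i)
                tri.at(rowStart + k) = 1;  // the row's ending 1s
            else
                tri.at(rowStart + k) =
                    tri.at(prevStart + k - 1) + tri.at(prevStart + k);
        }
    }
    return tri.at(sumOfNaturals(x + y) + x);  // C(x + y, x)
}
```

With lastRow = x + y the call latticePaths(3, 4, 7) returns 35, matching the output shown in the other answer. With lastRow = x + y + 1 the first write to the extra row throws std::out_of_range instead of silently writing past the end of the buffer.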

Your program works well with g++ on Linux:
$ g++ -o main pascal.cpp
$ ./main
Enter dimension 1 for lattice grid: 3
Enter dimension 2 for lattice grid: 4
Number of SE paths in a 3x4 lattice: 35
There's got to be something else since your cout statements have no side effects.
Here's an idea on how to debug this: open 2 visual studio instances, one will have the version without the cout statements, and the other one will have the version with them. Simply do a step by step debug to find the first difference between them. My guess is that you will realize that the cout statements have nothing to do with the error.

Dynamic numerical series

I am trying to create a program that prints the first 200 elements of a specific numerical series, which is
1-1-3-6-8-8-10-20
But instead of showing just 200 elements, it shows 802. I assume this is because of the code inside the for loop. I have spent hours thinking about how to reduce that code to do the job and I cannot think of anything else. I am getting frustrated and need your help.
The exercise is in the code comments.
//Print the following numerical series 1-1-3-6-8-8-10-20 until 200
#include <stdafx.h>
#include <iostream>
#include <stdlib.h>
using namespace std;
int main()
{
int Num1=200, z = 0, x = 1, y = 1;
cout << "\n\n1,";
cout << " 1,";
for (int i = 1; i <= Num1; i++)
{
z = y + 2;
cout << " " << z << ","; //It will print 3
z = z * 2;
cout << " " << z << ",";//It will print 6
z = z + 2;
cout << " " << z << ",";//It will print 8
z = z;
cout << " " << z << ",";//It will print 8
y = z;
}
cout << "\n\n";
system("pause");
return 0;
}

You're looping 200 times, and each time you loop, you're printing out 4 different numbers. You're also printing twice at the start, so that's 2 + 4 * 200 = 802, which is where your 802 numbers of output come from.

You wrote: "I assume is because of the code inside the 'for' loop but I've hours thinking on how to reduce that code to the job and I cannot think anything else. I'm getting frustrated and need your help."
So you basically want to simplify your code, which can be done by noticing the repetitions.
There are only two types of change in the series: either +2 or x2 applied to the previous element (and every fourth step repeats the previous element unchanged).
In each iteration this can be achieved as follows:
If the remainder i%4 == 1 or i%4 == 3, increment by 2 (assuming 1 <= i <= MAX).
If the remainder i%4 == 0, multiply by 2.
If the remainder i%4 == 2, leave the value unchanged.
When you do it this way, you no longer need to print the first two ones separately, and the number of terms printed matches the loop count.
Also note that you are trying to get 200 terms of this series, which grows very fast and exceeds the maximum value of int. Therefore, long long needs to be used instead.
The updated code will look like this:
#include <iostream>
typedef long long int int64;
int main()
{
    int size = 200;
    int64 z = -1;
    for (int i = 1; i <= size; i++)
    {
        if ((i % 4 == 1) || (i % 4 == 3)) z += 2;
        else if (i % 4 == 0) z *= 2;
        std::cout << z << "\n";
    }
    return 0;
}
See the Output here: https://www.ideone.com/JiWB8W

c++ dice roll analyser (2D arrays)

I am doing some C++ practice and trying to write a program that counts the number of times each dice combination is rolled over 10000 attempts. I use a 2D array to store every possible dice combination, call rand()%6+1 for each die on each of the 10000 attempts, and increment the array element that the two random values select.
This is my attempt.
cout << "\nDice roll analyser" << endl;
const int first = 6;
const int second = 6;
int nRolls[first][second];
int count = 0;
while (count < 10000){
nRolls[rand()%6+1][rand()%6+1]+=1;
count++;
}
for (int i=0;i<first;i++){
for (int j=0;j<second;j++){
cout << nRolls[i][j] << " ";
}
}
This is the output that I get:
0 0 0 0 0 0 0 269 303 265 270 264 228 289 272 294 290 269 262 294 303 277 265 294 288 266 313 274 301 245 317 276 292 284 264 260
What I am trying to achieve is the amount of times each combination is rolled e.g. how many times 1, 6 is rolled etc.

The posted loop does increment count, but when you want to run a code segment n times (here n = 10000), a for loop is the general way to do it:
for (int i = 0; i < 10000; ++i)
{
//loop code.
}
Additionally, myVariable+=1 can always be simplified to either ++myVariable or myVariable++ (if you aren't using the value of myVariable in the same expression, it is better to use the first one). More info on pre/post increment can be found here: http://gd.tuwien.ac.at/languages/c/programming-bbrown/c_015.htm
So instead of nRolls[rand()%6+1][rand()%6+1]+=1;
you can instead do
++(nRolls[rand()%6+1][rand()%6+1]);
Additionally, arrays are zero-indexed, meaning that when you do rand()%6+1 you restrict the values to 1 through 6: you leave out position 0 of the array, which is the first one, and index 6 is past the end of a 6-element array. Consider instead just using
++(nRolls[rand()%6][rand()%6]);
then, to find out how often you roll a (i,j), where i and j are between 1 and 6,
cout << "(" << i << "," << j << "):" << nRolls[i-1][j-1] << endl;

You will have a problem here: because you add 1 to rand()%6, you will never increment the count of any element with index zero. The smallest index that can be incremented is 1.

OK, thanks for the help. This is the updated code that displays correctly!
cout << "\nDice roll analyzer" << endl;
srand(time(0));
const int first = 6;
const int second = 6;
int nRolls[first][second];
int count = 0;
while (count < 10000){
nRolls[rand()%6][rand()%6]++;
count++;
}
for (int i=0;i<first;i++){
for (int j=0;j<second;j++){
cout << "(" << i+1 << "," << j+1 << ")" << nRolls[i][j] << endl;
}
}
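One thing the updated code still leaves open: nRolls is declared without an initializer, and a local array like that starts with indeterminate values, so the counts are only right if the memory happens to be zero. Below is a minimal sketch that zero-initializes the table. It swaps rand() for the <random> facilities, which is a substitution and not what the thread used; countRolls, RollTable and the seed parameter are names made up for this example.

```cpp
#include <array>
#include <random>

using RollTable = std::array<std::array<int, 6>, 6>;

// Hypothetical helper: counts how often each (first, second) pair is rolled.
RollTable countRolls(int rolls, unsigned seed) {
    RollTable counts{};  // value-initialized: every counter starts at 0
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> die(0, 5);  // indices 0..5 stand for faces 1..6
    for (int n = 0; n < rolls; ++n) {
        const int a = die(gen);  // first die
        const int b = die(gen);  // second die
        ++counts[a][b];
    }
    return counts;
}
```

The count for a roll of (i, j) with i and j between 1 and 6 is then counts[i-1][j-1], and the 36 counters always add up to the number of rolls.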
