MPI_Scatter Segfaulting

I'm working on a parallel sort program to learn MPI, and I've been having problems with MPI_Scatter. Every time I attempt to run, I get the following:
reading input
Scattering input
_pmii_daemon(SIGCHLD): [NID 00012] PE 0 exit signal Segmentation fault
[NID 00012] 2011-03-28 10:12:56 Apid 23655: initiated application termination
A quick look at other questions didn't explain why I'm having trouble. The arrays are contiguous, so I shouldn't have problems with non-contiguous memory access, and I'm passing the correct pointers in the correct order. Does anyone have any ideas?
Source code is below. It's hard-coded for a specific input size and number of ranks because I don't want to deal with variable input and rank size just yet.
#include <mpi.h>
#include <iostream>
using std::endl;
using std::cout;
#include <fstream>
using std::ifstream;
using std::ofstream;
#include <algorithm>
using std::sort;
#define SIZEOF_INPUT 10000000
#define NUMTHREADS 100
#define SIZEOF_SUBARRAY SIZEOF_INPUT/NUMTHREADS
int main(int argc, char** argv){
MPI_Init(&argc, &argv);
int input[SIZEOF_INPUT];
int tempbuf[SIZEOF_SUBARRAY];
int myRank;
MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
/*
Read input from file
*/
if(myRank == 0){
cout << "reading input" << endl;
ifstream in(argv[1]);
for(int i = 0; i < SIZEOF_INPUT; ++i)
in >> input[i];
cout << "Scattering input" << endl;
}
// Scatter, Sort, and Gather again
MPI_Scatter(input,SIZEOF_INPUT,MPI_INT,tempbuf,SIZEOF_SUBARRAY,MPI_INT,0,MPI_COMM_WORLD);
cout << "Rank " << myRank << "Sorting" << endl;
sort(tempbuf,tempbuf+SIZEOF_SUBARRAY);
MPI_Gather(tempbuf,SIZEOF_SUBARRAY,MPI_INT,input,SIZEOF_INPUT,MPI_INT,0,MPI_COMM_WORLD);
if(myRank == 0){
cout << "Sorting final output" << endl;
// I'm doing a multi-queue merge here using tricky pointer games
//list of iterators representing things in the queue
int* iterators[NUMTHREADS];
//The ends of those iterators
int* ends[NUMTHREADS];
//Set up iterators and ends
for(int i = 0; i < NUMTHREADS; ++i){
iterators[i] = input + (i*SIZEOF_SUBARRAY);
ends[i] = iterators[i] + SIZEOF_SUBARRAY;
}
ofstream out(argv[2]);
int ULTRA_MAX = SIZEOF_INPUT + 1;
int* ULTRA_MAX_POINTER = &ULTRA_MAX;
while(true){
int** curr_min = &ULTRA_MAX_POINTER;
for(int i = 0 ; i < NUMTHREADS; ++i)
if(iterators[i] < ends[i] && *iterators[i] < **curr_min)
curr_min = &iterators[i];
if(curr_min == &ULTRA_MAX_POINTER) break;
out << **curr_min << endl;
++(*curr_min);
}
}
MPI_Finalize();
}
Any help would be much appreciated.
Regards,
Zach

Hah! Took me a while to see this one.
The trick is that in MPI_Scatter, sendcount is the number of elements sent to each process, not the total. The same holds for MPI_Gather: recvcount is the number of elements received from each process. In other words, it behaves like MPI_Scatterv, where the counts are per process, except that here every process is assumed to get the same count.
So this
MPI_Scatter(input,SIZEOF_SUBARRAY,MPI_INT,tempbuf,SIZEOF_SUBARRAY,MPI_INT,0,MPI_COMM_WORLD);
cout << "Rank " << myRank << "Sorting" << endl;
MPI_Gather(tempbuf,SIZEOF_SUBARRAY,MPI_INT,input,SIZEOF_SUBARRAY,MPI_INT,0,MPI_COMM_WORLD);
works for me.
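To make the per-process count concrete, here is a minimal sketch in plain C++ (no MPI calls) of the bookkeeping behind a scatter. The helper names root_elements_needed and simulate_scatter are hypothetical and only mimic how the root buffer is carved up; they are not part of any MPI library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an MPI call): number of elements the root's send
// buffer must hold when every rank is sent `sendcount` elements.
std::size_t root_elements_needed(std::size_t sendcount, std::size_t nranks) {
    return sendcount * nranks;
}

// Hypothetical stand-in for the data movement of a scatter: rank r gets the
// elements [r * sendcount, (r + 1) * sendcount) of the root buffer.
std::vector<std::vector<int>> simulate_scatter(const std::vector<int>& root,
                                               std::size_t sendcount,
                                               std::size_t nranks) {
    // The root buffer must be large enough for all ranks together.
    assert(root.size() >= root_elements_needed(sendcount, nranks));
    std::vector<std::vector<int>> per_rank(nranks);
    for (std::size_t r = 0; r < nranks; ++r) {
        const std::ptrdiff_t first = static_cast<std::ptrdiff_t>(r * sendcount);
        const std::ptrdiff_t last = static_cast<std::ptrdiff_t>((r + 1) * sendcount);
        per_rank[r].assign(root.begin() + first, root.begin() + last);
    }
    return per_rank;
}
```

Assuming the job runs on 100 ranks, a sendcount of SIZEOF_SUBARRAY (100,000) asks the root for 10,000,000 elements, which is exactly the size of input. A sendcount of SIZEOF_INPUT would ask for 1,000,000,000 elements, far past the end of the array, which is consistent with the segfault on rank 0.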
Also, be careful of allocating big arrays like that on the stack; I know this is just an example problem, but for me this was causing crashes right away. Doing it dynamically
int *input = new int[SIZEOF_INPUT];
int *tempbuf = new int[SIZEOF_SUBARRAY];
//....
delete [] input;
delete [] tempbuf;
solved that problem.
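As an alternative to new and delete[], the same heap allocation can be sketched with std::vector, which releases its storage automatically. The helper names bytes_for_ints and make_buffer are made up for this illustration, and the remark about stack limits below is an assumption about a typical Linux setup, not something measured on the machine in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Bytes occupied by an array of n ints on the current platform.
std::size_t bytes_for_ints(std::size_t n) {
    return n * sizeof(int);
}

// Heap-backed, zero-initialized buffer of n ints. The memory is freed when
// the returned vector goes out of scope, so no delete[] is needed.
std::vector<int> make_buffer(std::size_t n) {
    assert(n > 0);
    return std::vector<int>(n);
}
```

With 4-byte ints, bytes_for_ints(10000000) is 40,000,000 bytes, well above a default stack limit of a few MiB. A buffer created with make_buffer(SIZEOF_INPUT) can be passed to MPI_Scatter through its data() member, for example input.data().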

int* iterators[NUMTHREADS];
//The ends of those iterators
int* ends[NUMTHREADS];
//Set up iterators and ends
for(int i = 0; i < NUMTHREADS; ++i){
iterators[i] = input + (i*SIZEOF_SUBARRAY); // problem
ends[i] = iterators[i] + SIZEOF_SUBARRAY; // problem
}
Both iterators and ends are arrays of int pointers that initially point nowhere or to garbage. The for loop, however, tries to store values as if they pointed to valid locations, which results in a segmentation fault. The program should first allocate memory for the iterators to point to, and only then store values at the locations they point to.
for( int i=0 ; i < NUMTHREADS; ++i )
{
iterators[i] = new int;
ends[i] = new int;
}
// Now do the earlier operation which caused problem
Since the program manages resources (i.e., memory acquired with new), it should return them to the free store using delete when they are no longer needed. Using std::vector instead of managing resources yourself is much easier.
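If std::vector is used as suggested, the merge step from the question could be sketched as follows. This is not the original pointer-based loop: it swaps in a std::priority_queue, and it assumes the sorted chunks lie back to back in one vector, as they would after the gather. The function name merge_sorted_chunks is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Merge consecutive sorted runs of length `chunk` stored back to back in
// `data` into one sorted vector (the last run may be shorter).
std::vector<int> merge_sorted_chunks(const std::vector<int>& data,
                                     std::size_t chunk) {
    assert(chunk > 0);
    typedef std::pair<int, std::size_t> Item;  // (value, index into data)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item> > heap;

    // Seed the min-heap with the first element of every run.
    for (std::size_t start = 0; start < data.size(); start += chunk) {
        heap.push(Item(data[start], start));
    }

    std::vector<int> out;
    out.reserve(data.size());
    while (!heap.empty()) {
        const Item smallest = heap.top();
        heap.pop();
        out.push_back(smallest.first);
        // Advance within the same run only; stop at the run boundary.
        const std::size_t next = smallest.second + 1;
        if (next % chunk != 0 && next < data.size()) {
            heap.push(Item(data[next], next));
        }
    }
    return out;
}
```

Unlike the sentinel-based loop in the question, this version does not depend on every value being smaller than SIZEOF_INPUT + 1.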

Related

Issues with checking an array moving both forwards and backwards simultaneously and issue printing values stored in a pointer array

Preface: Currently reteaching myself C++ so please excuse some of my ignorance.
The challenge I was given was to write a program to search through a static array with a function and return the indices of the number you were searching for. This only required 1 function and minimal effort so I decided to make it more "complicated" to practice more of the things I have learned thus far. I succeeded for the most part, but I'm having issues with my if statements within my for loop. I want them to check 2 separate spots within the array passed to it, but it is checking the same indices for both of them. I also cannot seem to get the indices as an output. I can get the correct number of memory locations, but not the correct values. My code is somewhat cluttered and I understand there are more efficient ways to do this. I would love to be shown these ways as well, but I would also like to understand where my error is and how to fix it. Also, I know 5 won't always be present within the array since I'm using a pseudo random number generator.
Thank you in advance.
#include <iostream>
#include <cstdlib>
#include <ctime>
using namespace std;
// This is supposed to walk through the array both backwards and forwards checking for the value entered and
// incrementing the count so you know the size of the array you need to create in the next function.
int test(int A[], int size, int number) {
int count = 0;
for (int i = 0; i <= size; i++, size--)
{
if (A[i] == number)
count++;
// Does not walk backwards through the array. Why?
if (A[size] == number)
count++;
}
cout << "Count is: " << count << endl;
return (count);
}
// This is a linear search that creates a pointer array from the previous "count" variable in function test.
// It should store the indices of the value you are searching for in this newly created array.
int * search(int A[], int size, int number, int arr_size){
int *p = new int[arr_size];
int count =0;
for(int i = 0; i < size; i++){
if(A[i]==number) {
p[count] = i;
}
count++;
}
return p;
}
int main(){
// Initializing the array to zero just to be safe
int arr[99]={0},x;
srand(time(0));
// Populating the array with random numbers in between 1-100
for (int i = 0; i < 100; i++)
arr[i]= (rand()%100 + 1);
// Was using this to check if the variable was actually in the array.
// for(int x : arr)
// cout << x << " ";
// Selecting the number you wish to search for.
// cout << "Enter the number you wish to search for between 1 and 100: ";
// cin >> x;
// Just using 5 as a test case.
x = 5;
// This returns the number of instances it finds the number you're looking for
int count = test(arr, (sizeof(arr)/4), x);
// If your count returns 0 that means the number wasn't found so no need to continue.
if(count == 0){
cout << "Your number was not found " << endl;
return 0;
}
// This should return the address array created in the function "search"
int *index = search(arr, (sizeof(arr)/4), x, count);
// This should increment through the array which address you assigned to index.
for(int i=0; i < count; i++) {
// I can get the correct number of addresses based on count, just not the indices themselves.
cout << &index[i] << " " << endl;
}
return 0;
}
I deeply appreciate your help and patience as well as I want to thank you again for your help.

Assigning Symbols to Numbers in a 2D Array

I'm working on a program in C++ that is supposed to read in a file, store the content of the file in a 2D array, assign a character to each of the numbers in the array and store these in a char array, and print both arrays. It's then supposed to go through the initial array and make sure that no number differs from its neighboring numbers by more than 1, correct any such errors by replacing the number with the average of its neighbors, assign characters to this corrected array as before, and print both arrays.
The character assignments go as follows:
0=blank
1=.
2=,
3=_
4=!
5=+
6=*
7=#
8=$
9=&
I have the code written that opens the file and loads the array, but I have no idea where to go from there. To me the obvious, although probably not best, way to do the assignments is to go through the array with a for loop and use a series of if statements to check for the value of the number at each index and assign the appropriate symbol. I'm sure there's a better way to accomplish this.
Here is the code I have so far:
#include <iostream>
#include <fstream>
using namespace std;
int main() {
ifstream prog;
prog.open("../prog1.dat");
//If file can't be opened, exit
if (!prog) {
cerr << "File could not be opened" << endl;
exit(EXIT_FAILURE);
}
else {
while (!prog.eof()) {
int size = 100, i, j;
prog >> size;
int **numArray = new int* [size];
for(i = 0; i < size; i++) {
numArray[i] = new int[size];
for(j = 0; j < size; j++) {
prog >> numArray[i][j];
}
cout << endl;
}
for(i = 0; i < size; i++) {
for(j = 0; j < size; j++) {
cout <<numArray[i][j] << " ";
}
cout << endl;
}
prog.close();
return 0;
}
}
}
I'm extremely new to this programming language; this is actually the first program I've written in C++, and I'm literally learning as I go. Any suggestions would be greatly appreciated.

In this code you have not yet added a check on the difference between neighbours.
Moreover, there is no need for two separate nested for loops; the second pass is extra overhead. You could have printed numArray in the first nested for loop.
You say this is your first programming assignment, yet you are already using double pointers and nested loops, and you check whether the file was opened. Are you sure it's your first assignment?
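For the symbol assignment the question asks about, a lookup string indexed by the digit avoids a chain of if statements. The sketch below uses the mapping listed in the question; the names kSymbols, symbol_for and to_symbols are hypothetical, and it assumes every value in the array is between 0 and 9.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Symbols indexed by digit, taken from the table in the question:
// 0 is a blank, 1 is '.', 2 is ',', ... 9 is '&'.
const std::string kSymbols = " .,_!+*#$&";

// Map a single digit to its symbol.
char symbol_for(int value) {
    assert(value >= 0 && value <= 9);
    return kSymbols[static_cast<std::size_t>(value)];
}

// Convert a grid of digits into one string of symbols per row.
std::vector<std::string> to_symbols(const std::vector<std::vector<int> >& grid) {
    std::vector<std::string> rows;
    for (std::size_t i = 0; i < grid.size(); ++i) {
        std::string line;
        for (std::size_t j = 0; j < grid[i].size(); ++j) {
            line += symbol_for(grid[i][j]);
        }
        rows.push_back(line);
    }
    return rows;
}
```

The same function can be applied again to the corrected array after the neighbor check, so the mapping is written only once.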

Parallel implementation of Lisp-style mapping of a function to a list in C++ fails without cout after use of thread

This code works only when one of the lines under /* debug messages */ is uncommented, or when the list being mapped over has fewer than 30 elements.
func_map is a linear implementation of a Lisp-style mapping and can be assumed to work.
It is called as follows: func_map(FUNC_PTR foo, std::vector<int>* list, locs* start_and_end)
FUNC_PTR is a pointer to a function that returns void and takes an int pointer.
For example: &foo, where foo is defined as:
void foo (int* num){ (*num) = (*num) * (*num);}
locs is a struct with two members int_start and int_end; I use it to tell func_map which elements it should iterate over.
void par_map(FUNC_PTR func_transform, std::vector<int>* vector_array) //function for mapping a function to a list alla lisp
{
int array_size = (*vector_array).size(); //retain the number of elements in our vector
int num_threads = std::thread::hardware_concurrency(); //figure out number of cores
int array_sub = array_size/num_threads; //number that we use to figure out how many elements should be assigned per thread
std::vector<std::thread> threads; //the vector that we will initialize threads in
std::vector<locs> vector_locs; // the vector that we will store the start and end position for each thread
for(int i = 0; i < num_threads && i < array_size; i++)
{
locs cur_loc; //the locs struct that we will create using the power of LOGIC
if(array_sub == 0) //the LOGIC
{
cur_loc.int_start = i; //if the number of elements in the array is less than the number of cores just assign one core to each element
}
else
{
cur_loc.int_start = (i * array_sub); //otherwise figure out the starting point given the number of cores
}
if(i == (num_threads - 1))
{
cur_loc.int_end = array_size; //make sure all elements will be iterated over
}
else if(array_sub == 0)
{
cur_loc.int_end = (i + 1); //ditto
}
else
{
cur_loc.int_end = ((i+1) * array_sub); //otherwise use the number of threads to determine our ending point
}
vector_locs.push_back(cur_loc); //store the created locs struct so it doesnt get changed during reference
threads.push_back(std::thread(func_map,
func_transform,
vector_array,
(&vector_locs[i]))); //create a thread
/*debug messages*/ // <--- whenever any of these are uncommented the code works
//cout << "i = " << i << endl;
//cout << "int_start == " << cur_loc.int_start << endl;
//cout << "int_end == " << cur_loc.int_end << endl << endl;
//cout << "Thread " << i << " initialized" << endl;
}
for(int i = 0; i < num_threads && i < array_size; i++)
{
(threads[i]).join(); //make sure all the threads are done
}
}
I think that the issue might be in how vector_locs[i] is used and how threads are resolved. But the use of a vector to maintain the state of the locs instance referenced by thread should prevent that from being an issue; I'm really stumped.

You're giving the thread function a pointer, &vector_locs[i], that may become invalidated as you push_back more items into the vector.
Since you know beforehand how many items vector_locs will contain - min(num_threads, array_size) - you can reserve that space in advance to prevent reallocation.
As to why it doesn't crash if you uncomment the output, I would guess that the output is so slow that the thread you just started will finish before the output is done, so the next iteration can't affect it.
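The effect of reserving in advance can be sketched in isolation. The locs struct below mirrors the one described in the question; the function first_element_stays_put is a hypothetical helper that only checks that no reallocation moves the elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mirrors the struct described in the question.
struct locs {
    int int_start;
    int int_end;
};

// Returns true if the address of the first element is unchanged after
// `count` elements have been appended to a vector that reserved `count`
// slots up front.
bool first_element_stays_put(std::size_t count) {
    assert(count > 0);
    std::vector<locs> vector_locs;
    vector_locs.reserve(count);  // capacity is now at least count
    vector_locs.push_back(locs{0, 1});
    const locs* first = &vector_locs[0];
    for (std::size_t i = 1; i < count; ++i) {
        const int start = static_cast<int>(i);
        vector_locs.push_back(locs{start, start + 1});
    }
    // No push_back exceeded the reserved capacity, so nothing was moved.
    return first == &vector_locs[0];
}
```

In par_map this would amount to calling vector_locs.reserve(std::min(num_threads, array_size)) before the loop, so that &vector_locs[i] stays valid while later elements are pushed.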

I think you should make this loop inner to the main one:
...
for(int i = 0; i < num_threads && i < array_size; i++)
{
(threads[i]).join(); //make sure all the threads are done
}
}

Random slowdown when inserting elements at random into vectors

EDIT:
I've fixed the insertion. As Blastfurnace kindly mentioned, the insertion invalidated the iterators. I believe the loop is needed to compare performance (see my comment on Blastfurnace's answer). My code is updated. I have equivalent code for the list, just with vector replaced by list. However, with this code I find that the list performs better than the vector for both small and large data types, and even for linear search (if I remove the insertion). According to http://java.dzone.com/articles/c-benchmark-%E2%80%93-stdvector-vs and other sites that should not be the case. Any clues as to how that can be?
I am taking a course on programming of mathematical software (exam on Monday), and for that I would like to present a graph that compares the performance of random insertion of elements into a vector and into a list. However, when I'm testing the code I get random slowdowns. For instance, I might have 2 iterations where inserting 10 elements at random into a vector of size 500 takes 0.01 seconds and then 3 similar iterations that each take roughly 12 seconds. This is my code:
void AddRandomPlaceVector(vector<FillSize> &myContainer, int place) {
int i = 0;
vector<FillSize>::iterator iter = myContainer.begin();
while (iter != myContainer.end())
{
if (i == place)
{
FillSize myFill;
iter = myContainer.insert(iter, myFill);
}
else
++iter;
++i;
}
//cout << i << endl;
}
double testVector(int containerSize, int iterRand)
{
cout << endl;
cout << "Size: " << containerSize << endl << "Random inserts: " << iterRand << endl;
vector<FillSize> myContainer(containerSize);
boost::timer::auto_cpu_timer tid;
for (int i = 0; i != iterRand; i++)
{
double randNumber = (int)(myContainer.size()*((double)rand()/RAND_MAX));
AddRandomPlaceVector(myContainer, randNumber);
}
double wallTime = tid.elapsed().wall/1e9;
cout << "New size: " << myContainer.size();
return wallTime;
}
int main()
{
int testSize = 500;
int measurementIters = 20;
int numRand = 1000;
int repetionIters = 100;
ofstream tidOutput1_sum("VectorTid_8bit_sum.txt");
ofstream tidOutput2_sum("ListTid_8bit_sum.txt");
for (int i = 0; i != measurementIters; i++)
{
double time = 0;
for (int j = 0; j != repetionIters; j++) {
time += testVector((i+1)*testSize, numRand);
}
std::ostringstream strs;
strs << double(time/repetionIters);
tidOutput1_sum << ((i+1)*testSize) << "," << strs.str() << endl;
}
for (int i = 0; i != measurementIters; i++)
{
double time = 0;
for (int j = 0; j != repetionIters; j++) {
time += testList((i+1)*testSize, numRand);
}
std::ostringstream strs;
strs << double(time/repetionIters);
tidOutput2_sum << ((i+1)*testSize) << "," << strs.str() << endl;
}
return 0;
}
struct FillSize
{
double fill1;
};
The struct is just there so I can easily add more members and test with elements of different sizes. I know that this code is probably not perfect for performance testing, but they would rather have me make a simple example than simply refer to something I found.
I've tested this code on two computers now, both having the same issues. How can that be? And can you help me with a fix so I can graph it and present it Monday? Perhaps adding some seconds of wait time between each iteration will help?
Kind regards,
Bjarke

Your AddRandomPlaceVector function has a serious flaw. Using insert() will invalidate iterators so the for loop is invalid code.
If you know the desired insertion point there's no reason to iterate over the vector at all.
void AddRandomPlaceVector(vector<FillSize> &myContainer, int place)
{
FillSize myFill;
myContainer.insert(myContainer.begin() + place, myFill);
}

Killed process by SIGKILL

I have a process that gets killed immediately after the program starts. This is the code of the compiled executable: a small program that reads several graphs, represented by numbers, from standard input (usually a descriptive file) and finds the minimum spanning tree of every graph using Prim's algorithm (it does not show the results yet; it just finds the solution).
#include <stdlib.h>
#include <iostream>
using namespace std;
const int MAX_NODOS = 20000;
const int infinito = 10000;
int nnodos;
int nAristas;
int G[MAX_NODOS][MAX_NODOS];
int solucion[MAX_NODOS][MAX_NODOS];
int menorCoste[MAX_NODOS];
int masCercano[MAX_NODOS];
void leeGrafo(){
if (nnodos<0 || nnodos>MAX_NODOS) {
cerr << "Numero de nodos (" << nnodos << ") no valido\n";
exit(0);
}
for (int i=0; i<nnodos ; i++)
for (int j=0; j<nnodos ; j++)
G[i][j] = infinito;
int A,B,P;
for(int i=0;i<nAristas;i++){
cin >> A >> B >> P;
G[A][B] = P;
G[B][A] = P;
}
}
void prepararEstructuras(){
// Grafo de salida
for(int i=0;i<nnodos;i++)
for(int j=0;j<nnodos;j++)
solucion[i][j] = infinito;
// el mas cercaano
for(int i=1;i<nnodos;i++){
masCercano[i]=0;
// menor coste
menorCoste[i]=G[0][i];
}
}
void prim(){
prepararEstructuras();
int min,k;
for(int i=1;i<nnodos;i++){
min = menorCoste[1];
k = 1;
for(int j=2;i<nnodos;j++){
if(menorCoste[j] < min){
min = menorCoste[j];
k = j;
}
}
solucion[k][masCercano[k]] = G[k][masCercano[k]];
menorCoste[k] = infinito;
for(int j=1;j<nnodos;j++){
if(G[k][j] < menorCoste[j] && menorCoste[j]!=infinito){
menorCoste[j] = G[k][j];
masCercano[j] = k;
}
}
}
}
void output(){
for(int i=0;i<nnodos;i++){
for(int j=0;j<nnodos;j++)
cout << G[i][j] << ' ';
cout << endl;
}
}
int main (){
while(true){
cin >> nnodos;
cin >> nAristas;
if((nnodos==0)&&(nAristas==0)) break;
else{
leeGrafo();
output();
prim();
}
}
}
I have learned that I must use strace to find out what is going on, and this is what I get:
execve("./412", ["./412"], [/* 38 vars */] <unfinished ...>
+++ killed by SIGKILL +++
Killed
I am running Ubuntu and this is the first time I have gotten this type of error. The program is supposed to stop after reading two zeros in a row from the input, which I can guarantee are present in my graph description file. The problem also happens even if I run the program without redirecting input from my graph file.

Although I'm not 100% sure that this is the problem, take a look at the sizes of your global arrays:
const int MAX_NODOS = 20000;
int G[MAX_NODOS][MAX_NODOS];
int solucion[MAX_NODOS][MAX_NODOS];
Assuming int is 4 bytes, you'll need:
20000 * 20000 * 4 bytes * 2 = ~3.2 GB
For one, you might not even have that much memory. Secondly, if you're on 32-bit, it's likely that the OS will not allow a single process to have that much memory at all.
Assuming you're on 64-bit (and assuming you have enough memory), the solution would be to allocate it all at run-time.
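Allocating at run time could look like the sketch below, which sizes the matrix from the node count actually read instead of from MAX_NODOS. The class name SquareMatrix and its members are hypothetical; it stores the n x n matrix in one flat std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// n x n matrix of ints stored row by row in a single heap allocation.
class SquareMatrix {
public:
    // Allocate n * n cells, each initialized to `fill`.
    SquareMatrix(std::size_t n, int fill) : n_(n), cells_(n * n, fill) {}

    // Bounds-checked (in debug builds) access to cell (i, j).
    int& at(std::size_t i, std::size_t j) {
        assert(i < n_ && j < n_);
        return cells_[i * n_ + j];
    }

    std::size_t size() const { return n_; }

private:
    std::size_t n_;
    std::vector<int> cells_;
};
```

After reading nnodos, the program could create SquareMatrix G(nnodos, infinito) and use G.at(A, B) = P in place of G[A][B] = P, so a small graph needs only a small allocation.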

Your arrays G and solucion each contain 400,000,000 integers, which is about 1.6 GB (roughly 1.5 GiB) each on most machines. Unless you have enough (virtual) memory for that (3.2 GB and counting), and permission to use it (try ulimit -d; that's correct for bash on MacOS X 10.7.2), your process will fail to start and will be killed by SIGKILL (which cannot be trapped, not that the process is really running yet).
