Weird characters in MPI_File_write - c++

I copied the following example from Using MPI-2: Advanced Features of the Message-Passing Interface, but the output file contains only weird characters. I tried changing the data type from int to char, but the output is the same. I tried opening the output file with different programs, such as Notepadqq and gedit, and with different file formats. I also tried having process zero write a null character at the end of the file, but the result is still weird characters.
/* example of parallel MPI write into a single file */
#include <stdio.h>
#include "mpi.h"
#define BUFSIZE 100
int main( int argc, char **argv )
{
int i, MyRank, NumProcs, buf[BUFSIZE];
MPI_File TheFile;
MPI_Status status;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &MyRank);
MPI_Comm_size(MPI_COMM_WORLD,&NumProcs);
for (i=0; i<BUFSIZE; i++)
buf[i]=MyRank*BUFSIZE+i;
MPI_File_open(MPI_COMM_WORLD, "testfile",MPI_MODE_CREATE|MPI_MODE_WRONLY,MPI_INFO_NULL, &TheFile);
MPI_File_set_view(TheFile,MyRank*BUFSIZE*sizeof(int),MPI_INT,MPI_INT,"native",MPI_INFO_NULL);
MPI_File_write(TheFile,buf,BUFSIZE,MPI_INT,MPI_STATUS_IGNORE);
// This is my attempt
if(MyRank == 0){
char nullChar = '\0';
MPI_File_write(TheFile, & nullChar , 1 , MPI_CHAR ,MPI_STATUS_IGNORE );
}
MPI_File_close(&TheFile);
MPI_Finalize();
return 0;
}

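Because you are opening a binary file and expecting to see meaningful characters, which is not going to happen. MPI_File_write stores each int in its in-memory binary form (the "native" data representation chosen in MPI_File_set_view), not as decimal text, so a text editor displays the raw bytes as garbage. Look up the difference between binary and text files. You can always read the data back with MPI_File_read().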
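To check the contents without MPI, the file can also be decoded with ordinary C I/O. The following is a minimal sketch, assuming the file holds nothing but native-format ints and is read on the same kind of machine that wrote it. The function name dump_ints is made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Read native-format ints from a binary file and print one per line.
   Returns the number of ints read, or -1 if the file cannot be opened.
   dump_ints is a hypothetical helper, not part of MPI. */
long dump_ints(const char *path, FILE *out)
{
    assert(path != NULL && out != NULL);
    FILE *in = fopen(path, "rb");   /* binary mode: no newline translation */
    if (in == NULL)
        return -1;

    long count = 0;
    int value;
    /* fread returns the number of complete items read, so the loop ends at EOF */
    while (fread(&value, sizeof value, 1, in) == 1) {
        fprintf(out, "%d\n", value);
        count++;
    }
    fclose(in);
    return count;
}
```

Calling dump_ints("testfile", stdout) from a small main should then list the stored integers as readable decimal numbers.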

Related

MPI_Finalize() won't finalize if stdout and stderr are redirected via freopen

I have a problem with MPI and the redirection of stdout and stderr.
When the program is launched with multiple processes and both stdout and stderr are redirected (to two different files), every process gets stuck in MPI_Finalize(), waiting indefinitely. If only stdout or only stderr is redirected, there is no problem and the program stops normally.
I'm working on Windows 7 with Intel MPI and Visual Studio 2013.
Thanks for your help!
Below is a simple program that fails on my computer with 2 processes (mpiexec -np 2 mpitest.exe)
#include <cstdio>
#include <iostream>
#include <string>
#include <mpi.h>
int main(int argc, char *argv[])
{
int ierr = MPI_Init(&argc, &argv);
int rank, size;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
printf("[%d/%d] This is printed on screen\n", rank, size-1);
// redirect the outputs and errors if necessary
std::string log_file = "log_" + std::to_string(rank) + ".txt";
std::string error_file = "err_" + std::to_string(rank) + ".txt";
//If one of the two following lines is commented out, then everything works fine
freopen(log_file.c_str(), "w", stdout);
freopen(error_file.c_str(), "w", stderr);
printf("[%d/%d] This is printed on the logfile\n", rank, size - 1);
ierr = MPI_Finalize();
return 0;
}
EDIT: For those who are interested, we submitted this error to the Intel developer forum. They were able to reproduce the bug and are working on a fix. In the meantime, we redirect every stderr message to stdout (ugly but working).
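One way such a workaround could look is sketched below. This is only an illustration, not the code actually used: the helper name log_error_to and the "[error]" tag are assumptions. The idea is that stderr is never redirected, and error messages go through a helper that writes them to the (redirected) stdout with a tag that can be filtered later.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: write a tagged error message to the given stream
   (stdout in the workaround) instead of stderr.
   Returns the number of characters written, or a negative value on failure. */
int log_error_to(FILE *out, int rank, const char *fmt, ...)
{
    assert(out != NULL && fmt != NULL);
    int n = fprintf(out, "[error][rank %d] ", rank);
    if (n < 0)
        return n;

    va_list ap;
    va_start(ap, fmt);
    int m = vfprintf(out, fmt, ap);   /* printf-style formatting of the message */
    va_end(ap);
    if (m < 0)
        return m;

    fflush(out);                      /* errors should reach the log promptly */
    return n + m;
}
```

After freopen(log_file.c_str(), "w", stdout), a call such as log_error_to(stdout, rank, "bad value %d\n", 42) would put the message in the per-rank log file, and the errors can be extracted again by searching for the tag.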

Copy large data file using parallel I/O

I have a fairly big data set, about 141M lines, in .csv format. I want to use MPI with C++ to copy and manipulate a few columns, but I'm a newbie at both C++ and MPI.
So far my code looks like this:
#include <stdio.h>
#include "mpi.h"
int main(int argc, char **argv)
{
const int N = 4; // number of ints in the test buffer
int i, rank, nprocs, count;
MPI_File fp, fpwrite; // File handles
MPI_Status status;
MPI_Offset filesize, offset;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
int buf[N];
for (i = 0; i<N; i++)
buf[i] = i;
count = N/nprocs; // ints written by each rank
offset = (MPI_Offset)rank * count * sizeof(int);
MPI_File_open(MPI_COMM_WORLD, "new.csv", MPI_MODE_RDONLY, MPI_INFO_NULL, &fp);
MPI_File_open(MPI_COMM_WORLD, "Ntest.csv", MPI_MODE_CREATE|MPI_MODE_WRONLY, MPI_INFO_NULL, &fpwrite);
MPI_File_get_size(fp, &filesize); // query the size only after the file is open
MPI_File_read(fp, buf, N, MPI_INT, &status);
// printf("\nrank: %d, buf[%d]: %d\n", rank, rank*bufsize, buf[0]);
printf("My rank is: %d\n", rank);
MPI_File_write_at(fpwrite, offset, buf, count, MPI_INT, &status);
/* // repeat the process again
MPI_Barrier(MPI_COMM_WORLD);
printf("2/ My rank is: %d\n", rank); */
MPI_File_close(&fp);
MPI_File_close(&fpwrite);
MPI_Finalize();
}
I'm not sure where to start, and I've seen a few examples that use Lustre stripes. I would like to go in that direction if possible. Additional options include HDF5 and T3PIO.
It is far too early to worry about Lustre stripes, apart from one thing: the default stripe settings are ridiculously small for a "parallel file system". Use lfs setstripe to increase the stripe size of the directory where you will read and write these files.
Your first challenge will be how to decompose this CSV file. What does a typical row look like? If the rows are of variable length, you're going to have a bit of a headache. Here's why:
Consider a CSV file with 3 rows and 3 MPI processes:
0. the first row is aa,b,c (7 bytes, counting the newline),
1. the second row is aaaaaaa,bbbbbbb,ccccccc (24 bytes),
2. the third row is ,,c (4 bytes).
Rank 0 can read from the beginning of the file, but where will ranks 1 and 2 start? If you simply divide the total size (7+24+4=35) by 3, each rank gets about 12 bytes and the decomposition is:
rank 0 ends up reading aa,b,c\naaaaa
rank 1 reads aa,bbbbbbb,c
rank 2 reads cccccc\n,,c\n
The second row is scattered over all three ranks, and no rank holds it in full.
The two approaches to unstructured text input are as follows. One option is to index your file, either after the fact or as the file is being generated. This index would store the beginning offset of every row. Rank 0 reads the offset then broadcasts to everyone else.
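As a rough illustration of the index idea, here is a serial sketch that records where each row begins in a buffer that is already in memory. The function name build_row_index and the in-memory buffer are assumptions made for the example. For a 141M-line file the scan would have to run over the file in pieces, and the resulting offsets would then be broadcast (for instance with MPI_Bcast) as described above.

```c
#include <assert.h>
#include <stddef.h>

/* Record the byte offset at which each row starts in a buffer of CSV text.
   Writes at most max_rows offsets and returns the number of rows found,
   which may be larger than max_rows. build_row_index is a hypothetical name. */
size_t build_row_index(const char *data, size_t len,
                       size_t *offsets, size_t max_rows)
{
    assert(data != NULL && offsets != NULL);
    size_t rows = 0;
    size_t start = 0;            /* offset where the current row begins */
    for (size_t i = 0; i < len; i++) {
        if (data[i] == '\n') {
            if (rows < max_rows)
                offsets[rows] = start;
            rows++;
            start = i + 1;       /* next row begins right after the newline */
        }
    }
    if (start < len) {           /* final row without a trailing newline */
        if (rows < max_rows)
            offsets[rows] = start;
        rows++;
    }
    return rows;
}
```

With the index in hand, each rank can pick a contiguous range of rows and read exactly the bytes between two stored offsets.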
The second option is to do this initial decomposition by file size, then fix up the splits. In the above simple example, rank 0 would send everything after the newline to rank 1. Rank 1 would receive the new data and glue it to the beginning of its row and send everything after its own newline to rank 2. This is extremely fiddly and I would not suggest it for someone just starting MPI-IO.
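The boundary fix-up can be tried out without any message passing. The sketch below is a single-process stand-in, not the send/receive scheme described above: instead of shipping partial rows to the neighbouring rank, it moves each split point forward to the next row start. The name align_chunks and the in-memory buffer are assumptions for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Split [0, len) into nranks chunks of roughly equal size, then move each
   chunk start forward to the first byte after the next newline so that no
   row is cut in half. starts must have room for nranks + 1 entries;
   rank r owns the bytes [starts[r], starts[r + 1]). */
void align_chunks(const char *data, size_t len, int nranks, size_t *starts)
{
    assert(data != NULL && starts != NULL && nranks > 0);
    starts[0] = 0;
    for (int r = 1; r < nranks; r++) {
        size_t pos = (len / (size_t)nranks) * (size_t)r;  /* naive split point */
        if (pos < starts[r - 1])
            pos = starts[r - 1];      /* never move behind the previous chunk */
        /* if the split is not already at a row start, skip the partial row */
        if (pos > 0 && data[pos - 1] != '\n') {
            while (pos < len && data[pos] != '\n')
                pos++;
            if (pos < len)
                pos++;                /* step past the newline itself */
        }
        starts[r] = pos;
    }
    starts[nranks] = len;
}
```

A chunk can come out empty when one long row spans several naive split points, which is one reason rows of very uneven length lead to poor load balance with this approach.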
HDF5 is a good option here! Instead of trying to write your own parallel CSV parser, have your CSV creator generate an HDF5 dataset. HDF5, among other features, will keep that index I mentioned for you, so you can set up hyperslabs and do parallel reading and writing.

Passing non-NULL argv to MPI_Comm_spawn

Suppose that my program (let's call it prog_A) starts as a single MPI process.
And later I want program prog_A to spawn n MPI processes (let's call them prog_B) using MPI_Comm_spawn with the same arguments I passed to prog_A.
For example, if I run prog_A with the arguments 200 100 10
mpiexec -n 1 prog_A 200 100 10
I want prog_B to be provided with the same arguments 200 100 10.
How can I do this? I tried the following but it does not work.
char ** newargv = new char*[3];//create new argv for children
newargv[0] = new char[50];
newargv[1] = new char[50];
newargv[2] = new char[50];
strcpy(newargv[0],argv[1]);//copy argv to newargv
strcpy(newargv[1],argv[2]);
strcpy(newargv[2],argv[3]);
MPI_Comm theother;
MPI_Init(&argc, &argv);
MPI_Comm_spawn("prog_B",newargv,numchildprocs,
MPI_INFO_NULL, 0, MPI_COMM_SELF, &theother,
MPI_ERRCODES_IGNORE);
MPI_Finalize();
Your problem is that you didn't NULL terminate your argv list. Here's the important part of the MPI standard (emphasis added):
The argv argument argv is an array of strings containing arguments
that are passed to the program. The first element of argv is the first
argument passed to command, not, as is conventional in some contexts,
the command itself. The argument list is terminated by NULL in C and
C++ and an empty string in Fortran. In Fortran, leading and trailing
spaces are always stripped, so that a string consisting of all spaces
is considered an empty string. The constant MPI_ARGV_NULL may be used
in C, C++ and Fortran to indicate an empty argument list. In C and
C++, this constant is the same as NULL.
You just need to add a NULL to the end of your list. Here's the corrected code (translated to C since I didn't have the C++ bindings installed on my laptop):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "mpi.h"
int main(int argc, char ** argv) {
char ** newargv = malloc(sizeof(char *)*4);//create new argv for children (3 arguments + NULL)
int numchildprocs = 1;
MPI_Comm theother;
MPI_Init(&argc, &argv);
MPI_Comm_get_parent(&theother);
if (MPI_COMM_NULL != theother) {
fprintf(stderr, "SPAWNED!\n");
} else {
newargv[0] = (char *) calloc(50, sizeof(char)); // zero-filled, so the copies below stay terminated
newargv[1] = (char *) calloc(50, sizeof(char));
newargv[2] = (char *) calloc(50, sizeof(char));
newargv[3] = NULL; // the terminator required by MPI_Comm_spawn
strncpy(newargv[0],argv[1], 49);//copy argv to newargv
strncpy(newargv[1],argv[2], 49);
strncpy(newargv[2],argv[3], 49);
fprintf(stderr, "SPAWNING!\n");
MPI_Comm_spawn("./prog_B",newargv,numchildprocs,
MPI_INFO_NULL, 0, MPI_COMM_SELF, &theother,
MPI_ERRCODES_IGNORE);
}
MPI_Comm_free(&theother);
MPI_Finalize();
}
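If a copy of the arguments is wanted and their number is not fixed at three, the same NULL-terminated layout can be built for any argc. This is a sketch with made-up helper names (dup_args, free_args). It uses only the C library, and the returned array has the form that MPI_Comm_spawn expects for its argv parameter.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy argv[1..argc-1] into a freshly allocated, NULL-terminated array.
   Returns NULL if an allocation fails; release the result with free_args().
   Both function names are hypothetical. */
char **dup_args(int argc, char **argv)
{
    assert(argc >= 1 && argv != NULL);
    /* argc - 1 arguments plus one slot for the terminating NULL */
    char **copy = calloc((size_t)argc, sizeof *copy);
    if (copy == NULL)
        return NULL;
    for (int i = 1; i < argc; i++) {
        size_t n = strlen(argv[i]) + 1;        /* include the '\0' */
        copy[i - 1] = malloc(n);
        if (copy[i - 1] == NULL) {
            for (int j = 0; j < i - 1; j++)    /* undo the partial copy */
                free(copy[j]);
            free(copy);
            return NULL;
        }
        memcpy(copy[i - 1], argv[i], n);
    }
    copy[argc - 1] = NULL;                     /* the terminator the spawn call needs */
    return copy;
}

/* Free an array produced by dup_args. */
void free_args(char **args)
{
    if (args == NULL)
        return;
    for (char **p = args; *p != NULL; p++)
        free(*p);
    free(args);
}
```

The result of dup_args(argc, argv) can be passed as the second argument of MPI_Comm_spawn and released with free_args once the call has returned.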
You do not need to copy the argument vector at all. The C standard (C99 included) guarantees that argv[argc] is a null pointer, so argv is already NULL-terminated and can be passed on directly:
MPI_Comm theother;
// Passing &argc and &argv here is a thing of the past (MPI-1)
MPI_Init(NULL, NULL);
MPI_Comm_spawn("prog_B", argv+1, numchildprocs,
MPI_INFO_NULL, 0, MPI_COMM_SELF, &theother,
MPI_ERRCODES_IGNORE);
MPI_Finalize();
Note the use of argv+1 in order to skip over the first argument (the program name). The benefit of that code is that it works with any number of arguments passed to the original program.
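A quick way to convince yourself that argv+1 is already a valid NULL-terminated list is to walk it until the terminator. The helper below is a small sketch (count_args is a made-up name) and needs no MPI at all.

```c
#include <assert.h>
#include <stddef.h>

/* Count the entries of a NULL-terminated argument list, which is the form
   MPI_Comm_spawn expects for its argv parameter. count_args is a
   hypothetical helper. */
size_t count_args(char **list)
{
    assert(list != NULL);
    size_t n = 0;
    while (list[n] != NULL)   /* stops at the terminator, e.g. argv[argc] */
        n++;
    return n;
}
```

Inside main, count_args(argv + 1) equals argc - 1 for any command line, including one with no arguments, where argv + 1 points directly at the terminating null pointer.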

I am not able to compile with MPI compiler with C++

I was trying to compile a very simple MPI hello_world:
#include <stdio.h>
#include <mpi.h>
int main(int argc, char *argv[]) {
int numprocs, rank, namelen;
char processor_name[MPI_MAX_PROCESSOR_NAME];
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Get_processor_name(processor_name, &namelen);
printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);
MPI_Finalize();
}
And got the following problem:
Catastrophic error: could not set locale "" to allow processing of multibyte characters
I really don't know how to figure it out.
Try defining environment variables
LANG=en_US.utf8
LC_ALL=en_US.utf8
Assuming you're on Unix, also try man locale and locale -a at the command line, and search for "utf locale" and similar terms.
Redefining the environment variable LANG (setting LANG=en_US.utf8), as suggested, solved the problem for me.
I should add that I'm connecting to a foreign server, and that is where I get the problem when compiling code with the Intel compilers.
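The error message suggests that the compiler's request for the environment's locale (the empty name "") was rejected. Whether a given locale name is accepted on a machine can be checked with a few lines of C. This is a sketch, assuming a hosted C library. The helper name locale_is_usable is made up, and the check says nothing about how the Intel compiler itself sets its locale.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Return 1 if the C library accepts the given locale name, 0 otherwise.
   An empty string means "take the locale from LANG / LC_* in the environment".
   Note: on success this changes the locale of the calling program.
   locale_is_usable is a hypothetical helper. */
int locale_is_usable(const char *name)
{
    assert(name != NULL);
    return setlocale(LC_ALL, name) != NULL;
}
```

Calling locale_is_usable("") on the server tests the current environment settings, and locale_is_usable("en_US.utf8") tests the value suggested above. A result of 0 for the empty name corresponds to the situation the error message describes.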

reading this text in C/C++

Hi, I am trying to read this text using a file input stream of some sort:
E^@^@<a^R@^@@^FÌø<80>è^AÛ<80>è ^F \^DÔVn3Ï^@^@^@^@ ^B^VÐXâ^@^@^B^D^E´^D^B^H
IQRÝ^@^@^@^@^A^C^C^GE^@^@<^@^@@^@@^F.^K<80>è ^F<80>è^AÛ^DÔ \»4³ÕVn3Р^R^V J ^@^@^B^D^E´^D^B^H
^@g<9f><86>IQRÝ^A^C^C^GE^@^@4a^S@^@@^FÌÿ<80>è^AÛ<80>è ^F \^DÔVn3л4³Ö<80>^P^@.<8f>F^@^@^A^A^H
IQRÞ^@g<9f><86>E^@^A±,Q@^@@^F^@E<80>è ^F<80>è^AÛ^DÔ \»4³ÖVn3Ð<80>^X^@.^NU^@^@^A^A^H
^@g<9f><87>
Here's the code I tried to read it with, but I am getting a bunch of 0s.
#include <stdio.h> /* required for file operations */
int main(int argc, char *argv[]){
int n;
FILE *fr;
unsigned char c;
if (argc != 2) {
perror("Usage: summary <FILE>");
return 1;
}
fr = fopen (argv[1], "rt"); /* open the file for reading */
while (1 == 1){
read(fr, &c, sizeof(c));
printf("<0x%x>\n", c);
}
fclose(fr); /* close the file prior to exiting the routine */
}
What's wrong with my code? I think I am not reading the file correctly.
You're using fopen() to open your file, which returns a FILE *, and read() to read it, which takes an int file descriptor. You need to use either open() and read() together, or fopen() and fread(). You can't mix the two.
To clarify, fopen() and fread() make use of FILE pointers, which are a different way to access and a different abstraction than straight-up file descriptors. open() and read() make use of "raw" file descriptors, which are a notion understood by the operating system.
While not related to the program's failure here, your fclose() call must also match. In other words, fopen(), fread(), and fclose(), or open(), read(), and close().
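For comparison, here is what the descriptor-based trio could look like. This is a sketch, assuming a POSIX system (open, read and close come from fcntl.h and unistd.h). The function name dump_bytes_fd is made up, and the output format copies the one used in the question.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* POSIX counterpart of the fopen/fread/fclose trio: print every byte of the
   file as hex using a raw file descriptor. Returns the number of bytes read,
   or -1 if the file cannot be opened or a read fails.
   dump_bytes_fd is a hypothetical helper. */
long dump_bytes_fd(const char *path, FILE *out)
{
    assert(path != NULL && out != NULL);
    int fd = open(path, O_RDONLY);          /* open() yields an int descriptor */
    if (fd < 0)
        return -1;

    unsigned char buf[256];
    long total = 0;
    ssize_t n;
    /* read() returns 0 at end of file and -1 on error */
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < n; i++)
            fprintf(out, "<0x%x>\n", buf[i]);
        total += n;
    }
    close(fd);                              /* close() matches open() */
    return n < 0 ? -1 : total;
}
```

Reading into a buffer instead of one byte per call keeps the number of system calls low, which matters because read(), unlike fread(), does no buffering of its own.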
Yours didn't compile for me, but I made a few fixes and it's right as rain ;-)
#include <stdio.h> /* required for file operations */
int main(int argc, char *argv[]){
FILE *fr;
unsigned char c;
if (argc != 2) {
fprintf(stderr, "Usage: summary <FILE>\n"); /* perror would append an unrelated errno message */
return 1;
}
fr = fopen (argv[1], "rb"); /* open the file for reading, in binary mode */
if (fr == NULL) {
perror(argv[1]); /* report why the file could not be opened */
return 1;
}
// can't read forever, need to stop when reading is done:
// fread returns 1 only while it actually delivered a byte
// my ubuntu didn't have read in stdio.h, but it does have fread
while (fread(&c, sizeof(c), 1, fr) == 1){
printf("<0x%x>\n", c);
}
fclose(fr); /* close the file prior to exiting the routine */
return 0;
}
That doesn't look like text to me, so open the file in binary mode: pass "rb" to fopen (plain "r" is equivalent on POSIX systems), not "rt".
Also, ^@ is how a NUL byte ('\0') is displayed, so you probably will read a bunch of zeros in any case. But not ALL zeros.