Writing MPI result to a file - c++

I have some code which solves an all-pairs shortest path problem, and each processor has a piece of the result. I am trying to write this result, which is a matrix, to an output file, so each process that holds part of the solution writes its part to the output file in the correct position. I am trying to use fseek for this but am a little stuck because the integers differ in printed width: 2 and -199 take different amounts of space. How can I do it so that the processors do not overwrite each other? There might also be race conditions when writing.
Should I do this another way, or is there a way to accomplish this? I was thinking of sending all results to one process (rank 0) and having that create the array and write the file.

Don't use ASCII output; use binary, which is well defined in size.
So if you're using fstream and doubles:
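#include <fstream>
#include <vector>

std::fstream filewriter("file.bin", std::ios::out | std::ios::binary);
std::vector<double> mylist;
mylist.push_back(2.5);
mylist.push_back(7.6);
mylist.push_back(2.1);
mylist.push_back(3.2);
mylist.push_back(4.2);
// Write the raw bytes of all five doubles in one call
filewriter.write(reinterpret_cast<const char*>(mylist.data()), mylist.size() * sizeof(double));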
This will write exactly 40 bytes, which is the size of a double (8) times the size of your list (5 elements). Seeking to a given element then becomes very easy, whether with fseek on a FILE* or with seekp/seekg on an fstream.
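Applied to the matrix in the question, fixed-size elements make every file offset computable. The following is a minimal sketch rather than MPI-specific code. It assumes an n x n matrix of 32-bit integers stored row-major and that each rank owns a contiguous band of rows; the names create_matrix_file and write_rows are hypothetical. The file is sized once up front, then each writer seeks to its own band.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Create (or truncate) the file and fill it with n*n zeroed 32-bit cells.
// Assumption: exactly one process does this before anyone writes results.
bool create_matrix_file(const std::string& path, std::size_t n) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    const std::vector<std::int32_t> zero_row(n, 0);
    for (std::size_t r = 0; r < n && out; ++r) {
        out.write(reinterpret_cast<const char*>(zero_row.data()),
                  static_cast<std::streamsize>(n * sizeof(std::int32_t)));
    }
    return static_cast<bool>(out);
}

// Write a contiguous band of rows, starting at row first_row, into the
// existing file. Every cell is 4 bytes, so the offset is simple arithmetic.
bool write_rows(const std::string& path, std::size_t n, std::size_t first_row,
                const std::vector<std::int32_t>& rows) {
    assert(n > 0 && rows.size() % n == 0);
    std::fstream f(path, std::ios::in | std::ios::out | std::ios::binary);
    if (!f) return false;
    const std::streamoff offset =
        static_cast<std::streamoff>(first_row * n * sizeof(std::int32_t));
    f.seekp(offset);
    f.write(reinterpret_cast<const char*>(rows.data()),
            static_cast<std::streamsize>(rows.size() * sizeof(std::int32_t)));
    return static_cast<bool>(f);
}
```

Because the bands do not overlap, no writer touches another writer's bytes. Whether simultaneous writes from several processes to one file behave well still depends on the filesystem, so treat this as a starting point.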
In a scientific environment with huge outputs, binary data is strongly recommended. However:
1- You have to learn about the concept of endianness (big endian, little endian).
2- You have to document your work properly for reuse (purpose, size, number of elements, dimensionality). I face huge misunderstandings when I forget to document stuff (I'm a PhD physicist who programs simulations).
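One way to make both points concrete is to put a small self-describing header in front of the data. This is only a sketch under assumptions: the header layout, the probe value 0x01020304 and the names DataHeader, write_header and read_header are invented for illustration, not an established format. Fields are written one by one so struct padding never reaches the file.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical header: describes what follows so a reader need not guess.
struct DataHeader {
    std::uint32_t element_size;   // bytes per element, e.g. 8 for double
    std::uint32_t dimensions;     // e.g. 2 for a matrix
    std::uint64_t element_count;  // total number of elements after the header
};

// Written first; a reader with the opposite byte order sees 0x04030201.
constexpr std::uint32_t kEndianProbe = 0x01020304u;

template <typename T>
void put(std::ostream& out, T value) {
    out.write(reinterpret_cast<const char*>(&value), sizeof value);
}

template <typename T>
bool get(std::istream& in, T& value) {
    in.read(reinterpret_cast<char*>(&value), sizeof value);
    return in.gcount() == static_cast<std::streamsize>(sizeof value);
}

// Emits 4 + 4 + 4 + 8 = 20 bytes in the writer's native byte order.
void write_header(std::ostream& out, const DataHeader& h) {
    put(out, kEndianProbe);
    put(out, h.element_size);
    put(out, h.dimensions);
    put(out, h.element_count);
}

// Returns true only if the header is complete and the probe matches,
// i.e. the file was written with this machine's byte order.
bool read_header(std::istream& in, DataHeader& h) {
    std::uint32_t probe = 0;
    return get(in, probe) && probe == kEndianProbe &&
           get(in, h.element_size) && get(in, h.dimensions) &&
           get(in, h.element_count);
}
```

A reader that sees a mismatching probe knows it has to byte-swap (or refuse the file) instead of silently producing garbage.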
So ASCII for data analysis is not the right choice.
Luckily, there's a full library specialized in organizing stuff for you, called HDF5. It organizes endianness and portability for you; however, it's not easy to deal with it, and it has a steep learning curve. I think that's a harder story for later times.
What I would recommend, is that you learn how to deal with binary files and how to read them, understand their issues and problems. I think that you're professional enough to deal with binary files, since you use MPI.
Here's a quick tutorial to binary files:
http://courses.cs.vt.edu/cs2604/fall02/binio.html
Cheers.

You could have each process write the output in some format that can be merged and cleaned up after the last one is done. Like (x, y, z), (x, y, z)...where x is the index of the row, y is the column and z the value.
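A rough sketch of the merge step, assuming each process has written plain "row col value" records separated by whitespace (a simplification of the parenthesized form above) and that the matrix is n x n. The function name merge_triplets is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Apply "row col value" records from one process's output to a dense,
// row-major n x n matrix. Returns how many records were applied;
// records with out-of-range indices are skipped.
std::size_t merge_triplets(std::istream& in, std::size_t n,
                           std::vector<long>& matrix) {
    assert(matrix.size() == n * n);
    std::size_t row = 0, col = 0, applied = 0;
    long value = 0;
    while (in >> row >> col >> value) {
        if (row < n && col < n) {
            matrix[row * n + col] = value;
            ++applied;
        }
    }
    return applied;
}
```

Calling this once per partial file fills the full matrix regardless of the order in which the processes finished, because every record carries its own position.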

This is a good job for memory-mapped files. They are system-dependent, but they're implemented in both POSIX and Windows OS families, so if you use a modern OS, they'd work. There is a portable and C++-friendly implementation of them in boost (classes mapped_file_source, mapped_file_sink and mapped_file). Interprocess output is a classical example of their usage.
They are binary, so most of what Samer said in his answer applies too; the only difference is that you use pointer arithmetic instead of seeking.
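For illustration, here is a sketch using the raw POSIX calls rather than the boost classes, so it only applies to POSIX systems. It assumes an n x n row-major matrix of 32-bit integers; write_rows_mapped is a hypothetical name. The "seek" is just pointer arithmetic on the mapped region.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Map the whole matrix file and copy row_count rows into place.
// POSIX-only sketch; error handling is reduced to a bool.
bool write_rows_mapped(const char* path, std::size_t n, std::size_t first_row,
                       std::size_t row_count, const std::int32_t* rows) {
    assert(n > 0 && first_row + row_count <= n);
    const std::size_t total = n * n * sizeof(std::int32_t);
    const int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) return false;
    // Make sure the file is large enough; existing contents are kept.
    if (ftruncate(fd, static_cast<off_t>(total)) != 0) {
        close(fd);
        return false;
    }
    void* base = mmap(nullptr, total, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (base == MAP_FAILED) return false;
    std::int32_t* matrix = static_cast<std::int32_t*>(base);
    std::memcpy(matrix + first_row * n, rows,
                row_count * n * sizeof(std::int32_t));
    const bool ok = msync(base, total, MS_SYNC) == 0;  // flush to the file
    munmap(base, total);
    return ok;
}
```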

Related

Fastest way to read a vector<double> from file

I have 3 vectors, each with exactly 256^3 (about 16.8 million) elements, that I want to store in a file and read back as fast as possible. I only care about reading performance, and the in-memory representation of the data can be anything.
I have looked at some serialization techniques as well as writing/reading plain numbers to/from a file with ofstream, but I wonder if there is a more direct and faster approach.
(I am pretty new to C++ and its concepts.)
Assuming both systems, windows and android, are little endian, which is common in ARM and x86/x64 CPUs, you can do the following.
First: Pick a type with a specific size: double (64-bit), float (32-bit), or uint64/32/16 or int64/32/16. Do NOT use types like int or long, whose size varies between platforms.
Second: Use the following method to write binary data:
std::vector<uint64_t> myVec;
std::ofstream f("outputFile.bin", std::ios::binary);
f.write(reinterpret_cast<char*>(myVec.data()), myVec.size()*sizeof(uint64_t));
f.close();
Here you take the raw data and write its binary representation to a file.
Now, on the other machine, make sure the data type you use has the same size and the same endianness. If both are the same, you can do this:
std::vector<uint64_t> myVec(sizeOfTheData);
std::ifstream f("outputFile.bin", std::ios::binary);
f.read(reinterpret_cast<char*>(&myVec.front()), myVec.size()*sizeof(uint64_t));
f.close();
Notice that you have to know the size of the data before reading it.
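If the file holds nothing but the raw elements, one way to learn the size is to derive it from the file length. A minimal sketch under that assumption (the function name read_all_u64 is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Read a file that contains only raw uint64_t values. The element count is
// derived from the file size. Returns an empty vector on any failure.
std::vector<std::uint64_t> read_all_u64(const std::string& path) {
    // std::ios::ate opens the file positioned at its end.
    std::ifstream f(path, std::ios::binary | std::ios::ate);
    if (!f) return {};
    const std::streamoff bytes = f.tellg();
    if (bytes <= 0) return {};
    std::vector<std::uint64_t> values(static_cast<std::size_t>(bytes) /
                                      sizeof(std::uint64_t));
    f.seekg(0);
    f.read(reinterpret_cast<char*>(values.data()),
           static_cast<std::streamsize>(values.size() * sizeof(std::uint64_t)));
    if (!f) values.clear();
    return values;
}
```

The vector is allocated once at its final size and filled by a single read call, which keeps the per-element overhead out of the picture.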
Note: This code is off the top of my head. I haven't tested it, but it should work.
Now if the target system doesn't have the same endianness, you have to read the data in batches, flip the endianness, then put it in your vector. How to flip endianness was extensively discussed here.
To determine the endianness of your system, this was discussed here.
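Since the linked discussions are not reproduced here, this is one common way to do both, shown as a sketch in portable C++ without compiler intrinsics. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// True if the least significant byte of an integer is stored first.
bool is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first_byte = 0;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

// Reverse the byte order of one 64-bit value.
std::uint64_t byteswap64(std::uint64_t v) {
    std::uint64_t result = 0;
    for (int i = 0; i < 8; ++i) {
        result = (result << 8) | (v & 0xFF);  // move the lowest byte up
        v >>= 8;
    }
    return result;
}

// Flip a whole batch in place after reading it from a foreign-endian file.
void byteswap_all(std::vector<std::uint64_t>& values) {
    for (std::uint64_t& v : values) v = byteswap64(v);
}
```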
The performance penalty will be proportional to how different these systems are. If they both have the same endianness and you choose the same data type and size, you're good and you have optimum performance. Otherwise, you'll pay some penalty depending on how many conversions you have to do. This is the fastest that you can ever get.
Note from comments: If you're transferring doubles or floats, make sure both systems use the IEEE 754 standard. It is nearly universal, even more so than a shared endianness, but check just to be sure.
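A small compile-time check along those lines, as a sketch: std::numeric_limits<T>::is_iec559 reports whether the type follows IEC 559 (IEEE 754). The helper name floating_point_layout_matches is hypothetical.

```cpp
#include <cassert>
#include <limits>

// Refuse to compile on a platform whose floating-point types would not
// match the layout assumed by the raw binary file.
static_assert(std::numeric_limits<double>::is_iec559,
              "double is not IEEE 754 on this platform");
static_assert(sizeof(double) == 8, "double is not 64 bits");
static_assert(std::numeric_limits<float>::is_iec559,
              "float is not IEEE 754 on this platform");
static_assert(sizeof(float) == 4, "float is not 32 bits");

// The same check as a value, for code that prefers a runtime branch.
constexpr bool floating_point_layout_matches() {
    return std::numeric_limits<double>::is_iec559 && sizeof(double) == 8 &&
           std::numeric_limits<float>::is_iec559 && sizeof(float) == 4;
}
```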
Now if these solutions don't fit you, then you have to use a proper serialization library to standardize the format for you. There are libraries that can do that, such as protobuf.

Write a C++ struct to a file and read file using another programming language?

I have a challenging situation; we will have programs on Mac, PC, iOS and Android receiving files in a legacy format and parsing data from those files. We cannot change how those files are created.
The files are produced by a C++ program filling a struct with numbers and Strings and then writing it out. Here's a sanitized version.
struct MyObject {
String Kfkj(MAXKYS);
String Oern(MAXKYS);
String Vdflj(MAXKYS, 9);
int Muic;
int Tdfkj;
int VdfkAsdk;
int SsdjsdDsldsk;
int Ndsoief;
String TdflsajPdlj;
String TdckjdfPas;
String AdsfakjIdd;
int IdkfjdKasdkj;
int AsadkjaKadkja(MAXKYS);
int Kasldsdkj;
bool Usadl;
String PsadkjOasdj(9);
String PasdkjOsdkj;
};
Primitives and Strings, as you can see.
Then here is how they write it out to a file:
MyObject MyInstance;
const char* FileName = "C:\\MyFile.ab2";
ofstream fout(FileName, ios::binary);
fout.write((char*)&MyInstance, sizeof(MyInstance));
There is no option for us to translate it once and then distribute the file to other platforms; we must translate it on each and every different platform, and this is what we have to work with. I'd appreciate any information on how C++ serializes data, so we know how to parse the file.
EDIT: solution
The feedback I received from multiple answers here was VERY helpful. Using that, I did extensive analysis with hex editors and discovered:
the elements come in the file one after another
a "String," in this case, starts with an int describing how many characters follow the int for that String. If the String does not exist, it will still have that int with a value of 0.
integers, for the files and machines I saw, are two bytes, little-endian, and MOSTLY unsigned (there were a few that were signed, just to keep me on my toes)
the boolean was two bytes, with apparently -1 (FF FF) representing "true"
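The findings above can be turned into a small reader. This is only a sketch of the layout as described, with two assumptions that the analysis does not state outright: the String length prefix is one of the two-byte integers, and each character is one byte. ByteReader and its method names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Sequential reader over the raw file bytes, decoding little-endian fields
// explicitly so it behaves the same on any host.
class ByteReader {
public:
    explicit ByteReader(std::vector<unsigned char> bytes)
        : bytes_(std::move(bytes)) {}

    // Two-byte little-endian unsigned integer.
    std::uint16_t read_u16() {
        if (bytes_.size() - pos_ < 2) throw std::runtime_error("truncated file");
        const std::uint16_t v = static_cast<std::uint16_t>(
            bytes_[pos_] | (bytes_[pos_ + 1] << 8));
        pos_ += 2;
        return v;
    }

    // Same bytes, reinterpreted as signed (for the few signed fields).
    std::int16_t read_i16() { return static_cast<std::int16_t>(read_u16()); }

    // Two-byte boolean; FF FF was observed for true, so any non-zero counts.
    bool read_bool() { return read_u16() != 0; }

    // Length prefix followed by that many one-byte characters (assumption).
    std::string read_string() {
        const std::size_t length = read_u16();
        if (bytes_.size() - pos_ < length) throw std::runtime_error("truncated file");
        std::string s(reinterpret_cast<const char*>(bytes_.data()) + pos_, length);
        pos_ += length;
        return s;
    }

private:
    std::vector<unsigned char> bytes_;
    std::size_t pos_ = 0;
};
```

Reading the fields in declaration order with these calls mirrors "the elements come in the file one after another", and the same logic ports directly to Java, Swift or any other target language.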
So far we have not run into issues with different padding or endianness on different devices, but those are very real concerns. The skilled notes and warnings in these answers provide us with more ammunition to try to convince the client to change to a less fragile alternative, such as XML or JSON, for transferring data online across platforms.
As for those of you asking if the developer was fired... well, let's just say their code is very old, but after multiple conversations we're still having trouble convincing them that writing out the C++ struct and trying to read that on different platforms is not a good idea.
You're going to run into many problems.
C++ doesn't have a specific format for serializing data per se. It is highly dependent on the computer architecture/processor that you are running on.
The compiler is allowed to add padding to help alignment on systems. When we say alignment we basically are referring to an architecture/processor's affinity for having data lie on specific byte boundaries. For example, some processors vastly prefer floating point numbers to lie at 4 or 8 byte boundaries - if they don't the processor may work much slower or may not work at all.
So, you can't simply know what padding your system is adding magically.
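What you can do is wrap the struct in #pragma pack(push, 1) / #pragma pack(pop) to stop your compiler from padding its members. These pragmas are compiler extensions (supported by MSVC, GCC and Clang), not standard C++.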
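A short sketch of what packing changes, using two hypothetical structs with identical members. The #pragma pack(push, 1) / #pragma pack(pop) pair is a compiler extension accepted by MSVC, GCC and Clang.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Default layout: the compiler typically inserts 3 padding bytes after tag
// so that value starts on a 4-byte boundary.
struct Padded {
    char tag;
    std::int32_t value;
};

#pragma pack(push, 1)  // from here on, no padding between members
struct Packed {
    char tag;
    std::int32_t value;
};
#pragma pack(pop)      // restore the previous packing

static_assert(sizeof(Packed) == 5, "packed struct should have no padding");
static_assert(offsetof(Packed, value) == 1, "value should directly follow tag");
```

Packing fixes the padding only. The sizes of int and bool and the byte order still come from the platform, and taking a pointer or reference to a misaligned packed member can be unsafe on some architectures.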
PS: you also have to worry about endianness. What if one computer is running on big-endian and one is little endian? They will interpret bytes differently without a conversion.
Simply put, you either have to fix the application generating the files so it uses a proper serialization scheme OR you need to look at it running on a SPECIFIC computer, look at exactly how it writes the files, and write a translator for every target platform (which is just silly).
Interesting Suggestion
If you're really stuck, write an app that monitors the folder where you write files. Have the app pick up the files (since it's on the same PC it'll be able to read their format without issue). Have it write the files back in XML or some other true serialization format and distribute those instead.
Whoa - that's crazy. So String objects don't contain any pointers? Must not- because you claim this is working code.
Anyway, that code isn't doing any serialization. It's just writing the structure out to file exactly the way it is laid out in memory. The only issue you have is that on some platforms the padding and the sizeof of integral types like int may be different.
You'll have to find the size of the integral types, and use that information in the reader/writer for newer platforms to make sure they get laid out the same way as on the legacy platform.
You're running a real risk with that code though. As it is, a compiler change could suddenly cause the file layout to change.
The format of your data file is entirely down to the compiler that your C++ program is compiled with, and the definition of your String class. You can rely on the fields being in the order they're declared in, and in this case, I think you can rely on there not being any padding at the start, but that's about all. Some tips that might help you out in this case:-
You don't give the definition of the String class you're using. If it's a typedef for std::string, you're completely screwed, because the contents of the string aren't in the memory. I assume your C++ programmers are using some special local buffer, in which case I'll guess you will find the first bytes of the object are the string, and there is some amount of useless padding afterwards. I hope the struct contains an int at the start telling you how much data in it is useful.
You'll probably find the int fields are four bytes long.
You'll probably find the bool field is one byte long, followed by three bytes of useless padding. Only one bit, most likely the bottom bit, will be set.
That's about all the useful guesswork I can offer you. In your target language, make sure to read the whole file in as the closest thing to a byte array available in the language, and only after that, use the language features to convert it into the right kind of thing in your language. Don't try reading it in as integers, as that won't let you byte-swap if you're on a platform with different endianness to the C++ program. I suggest also looking through the file in a text editor to reverse-engineer it and help you find the offset of each field.
Last piece of advice: consider printing P45s (or pink slips, or whatever you have in your country) for whichever programmers or project managers thought this kind of 'serialization' was a good idea. This kind of sloppy work might have been acceptable in a life-or-death situation, but they have seriously screwed you over in a way you're going to find it very hard to recover from. Writing the code to read in these files will not be that hard, if it's only one struct like this, but keeping it reliable will be a world of pain, and they've effectively made it impossible for themselves to change compilers or compiler version safely.
The way it's done, the struct is written in raw form to a file. So basically what you need to know to parse this file is the binary layout of your struct.
Basically, the fields are just one after the other, so to read an int, you just read 4 bytes and cast that to an int, etc.
Strings are a particular case. It's not clear from your code whether this "String" type is an inline array of characters, or a pointer to such an array. In the first case, you need to know how many characters each string contains and simply read that number of characters sequentially. In the second case, you won't be able to get the string back, since it won't have been written to file. The pointer will be useless to you.
One last concern is whether the struct is packed or not. Since you gave no indication of that, assume it is not: by default each field is aligned to a boundary that depends on its type and the compiler (commonly 4 bytes for an int), so there may be padding, for instance after the boolean field, that you need to account for. If the struct is packed, then each field comes directly after the previous one.
So, to make a long story short, figure out your struct binary layout using its definition and, if all else fails, inspecting the memory at run-time with the debugger, or use a hex editor to study the output file. Then write that specification down somewhere and this will give you what you need to read from the file. It's impossible to tell exactly what that layout is simply by looking at the pseudo-definition you gave.
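If the original program can be rebuilt with the legacy compiler, the layout can also be printed directly instead of read off a debugger. A sketch with a hypothetical stand-in struct (the real String type is unknown, so Sample uses a fixed char array); the numbers it prints are whatever the compiling platform chooses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>

// Hypothetical stand-in for the legacy struct; standard-layout so that
// offsetof is well defined.
struct Sample {
    std::int16_t count;
    bool flag;
    std::int32_t id;
    char name[9];
};

// Print the size of the struct and the offset and size of every field.
void print_layout(std::ostream& out) {
    out << "sizeof(Sample) = " << sizeof(Sample) << '\n'
        << "count: offset " << offsetof(Sample, count)
        << ", size " << sizeof(Sample::count) << '\n'
        << "flag:  offset " << offsetof(Sample, flag)
        << ", size " << sizeof(Sample::flag) << '\n'
        << "id:    offset " << offsetof(Sample, id)
        << ", size " << sizeof(Sample::id) << '\n'
        << "name:  offset " << offsetof(Sample, name)
        << ", size " << sizeof(Sample::name) << '\n';
}
```

Gaps between one field's offset plus size and the next field's offset are the padding bytes a reader on another platform has to skip.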
Writing to an ofstream does not serialize data. This code writes the raw memory content of the struct as if it were a string of chars. Depending on your compiler, its version, its options and the system it is running on, the content will be completely different.
Even the number of bits in a char is allowed to differ between C++ implementations.
Data referenced by the object of the struct won't be written (forget the content of std::string).
If you cannot change the writer code, you must know the alignment policy, the size of the base types and the data representation. You will have to analyze the produced files by hand, for example with a hexadecimal editor like this one, http://www.physics.ohio-state.edu/~prewett/hexedit/, and probably look at your compiler documentation.
If you can change the writer code, use proper serialization like JSON, Protocol Buffers or simply XML.
No one has pointed out something that sticks out to me as particularly problematic (maybe because I've been bitten by it). That problem: the data member bool Usadl;. sizeof(bool) varies across platforms, across compilers, and even across releases of the same compiler. Common values for sizeof(bool) are 4 and 1. This will bite you. It's getting hard to find a big-endian machine nowadays, and very, very hard to find a computer where CHAR_BIT is not 8 or sizeof(int) is not 4. This is not the case for sizeof(bool).
In agreement with everyone else, Chad's team needs to document the structure of the records in the file, and then make sure the program that produces the file writes this structure explicitly, including element sizes, padding, and endianness. Don't depend on class layout to do this for you. That's just asking for trouble.
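Writing the structure explicitly could look roughly like the sketch below. The record fields and all function names are hypothetical; the point is that every size and the byte order are chosen by the code, not by the compiler's class layout.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<unsigned char>;

// 32-bit unsigned integer, always 4 bytes, least significant byte first.
void put_u32_le(Bytes& out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<unsigned char>((v >> shift) & 0xFF));
    }
}

// Boolean, always exactly one byte: 0 or 1, whatever sizeof(bool) is.
void put_bool(Bytes& out, bool b) { out.push_back(b ? 1 : 0); }

// String as a 32-bit length followed by the raw characters.
void put_string(Bytes& out, const std::string& s) {
    assert(s.size() <= 0xFFFFFFFFu);
    put_u32_le(out, static_cast<std::uint32_t>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
}

// Hypothetical record type standing in for the legacy struct.
struct Record {
    std::string name;
    std::uint32_t count;
    bool enabled;
};

// Field-by-field encoding: no padding, fixed sizes, fixed byte order.
Bytes encode(const Record& r) {
    Bytes out;
    put_string(out, r.name);
    put_u32_le(out, r.count);
    put_bool(out, r.enabled);
    return out;
}
```

The byte sequence produced by encode is then the documented file format, and a compiler upgrade can no longer change it behind your back.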
The best way would probably be to use JSON or if you want a more robust solution go with something like Avro. Avro has a C++ API and a Java API, so it covers most of the cases you're encountering.

More efficient way than scanf and printf for scanning and printing integers

I just want to know how we can read integers (from stdin) and write integers (to stdout) without using scanf/printf/cin/cout, because they are too slow.
Can anyone tell me how fread and fwrite can be used for this purpose? I know only a little about buffering etc.
I want a substitute for the following two snippets:
[1]
long i , a[1000000];
for(i=0;i<=999999;i++)
scanf("%ld",&a[i]);
and
[2]
for(i=0;i<=999999;i++)
printf("%ld\n",a[i]);
Any other efficient method is appreciated. Thanks in advance!!
I suggest you're looking in the wrong place for speed improvements. Let's look at just printf( ):
printf( ) is limited by the huge time it takes to physically (electronically?) put characters on the terminal. You could speed this up a lot by using sprintf( ) to first write the output chars into an array, and then using printf( ) to send the array to the tty. printf( ) buffers lines anyway, but using a large, multi-line output array can overcome the setup delay that happens for every line.
printf( ) formatting is a tiny part of its overhead. You can be sure that the folks who wrote this library function did their best to make it as fast as it could be. And, over the forty years that printf( ) has been around, many others have worked it over and rewritten it a few zillion times to speed it up. No matter how hard you work to do the formatting that printf( ) takes care of, it's unlikely you can improve very much on their efforts.
scanf( ) delays and overhead are analogous.
Yours is a noble effort, but I don't believe it can pay off.
boost::karma (part of boost::spirit) promises quite good generator performance. You might try it.
If it is crucial that the numbers are parsed and formatted fast, but less important that the output reaches the user quickly or that the file is read quickly, consider putting a buffer between the input/output streams and the processing.
Read the file into a buffer (this can be done in a separate thread) and then extract the numbers from that buffer. To generate output, write to a memory buffer first and then write the buffer to a file afterwards; again, this can be done in a separate thread.
Most IO routines are relatively slow, since accessing data on disk is slow (slower than memory or cache). Of course, this only makes sense if the goal is not to optimise the entire input/output phase; if it is, there is no point in going through a separate, self-implemented buffer.
By separating the parsing of the numbers and the IO part, you will increase the perceived speed of the parsing tremendously.
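The input half of that idea, sketched with fread as the question asked. It is single-threaded and assumes the whole input fits in memory; read_all and parse_longs are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

// Pull the entire stream into memory with large fread calls.
std::string read_all(std::FILE* in) {
    std::string data;
    char chunk[1 << 16];
    std::size_t got = 0;
    while ((got = std::fread(chunk, 1, sizeof chunk, in)) > 0) {
        data.append(chunk, got);
    }
    return data;
}

// Extract whitespace-separated integers from the in-memory buffer.
// Stops at the end of the text or at the first token that is not a number.
std::vector<long> parse_longs(const std::string& text) {
    std::vector<long> values;
    const char* cursor = text.c_str();
    char* end = nullptr;
    for (;;) {
        const long v = std::strtol(cursor, &end, 10);
        if (end == cursor) break;  // nothing was consumed
        values.push_back(v);
        cursor = end;
    }
    return values;
}
```

For the loop in the question, parse_longs(read_all(stdin)) would replace the million scanf calls with a handful of fread calls plus in-memory parsing.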
If you are looking for alternatives faster than scanf / printf, you might consider implementing your own method that does not depend on a format string. Specialised implementations are often faster than generalised ones. However, think twice before you start reinventing the wheel.
printf is an operation on a FILE * and is buffered, while write operates on a file descriptor and is not (puts, like printf, goes through the buffered stdout stream). Build the output in a buffer and then emit it with a single call. printf also has to parse the format string and takes variadic arguments; if you know the value is an integer, you can avoid all that by formatting the number yourself: do some math to extract each digit and add it to '0'.
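A sketch of that digit-by-digit formatting, building one large buffer that can then go out in a single fwrite. The names append_long and format_lines are hypothetical; whether this beats printf in practice is something to measure, not assume.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Append the decimal form of value to out without using a format string.
void append_long(std::string& out, long value) {
    char digits[24];  // enough for a 64-bit value
    int n = 0;
    // Work on the magnitude as unsigned so the most negative value is safe.
    unsigned long magnitude = value < 0
                                  ? 0UL - static_cast<unsigned long>(value)
                                  : static_cast<unsigned long>(value);
    do {
        digits[n++] = static_cast<char>('0' + magnitude % 10);  // lowest digit first
        magnitude /= 10;
    } while (magnitude != 0);
    if (value < 0) out.push_back('-');
    while (n > 0) out.push_back(digits[--n]);  // reverse into reading order
}

// Format every value on its own line into one contiguous buffer.
std::string format_lines(const std::vector<long>& values) {
    std::string out;
    out.reserve(values.size() * 12);
    for (long v : values) {
        append_long(out, v);
        out.push_back('\n');
    }
    return out;
}
```

The result would be written with one call such as std::fwrite(text.data(), 1, text.size(), stdout), where text is the string returned by format_lines.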

Marshall multiple protobuf to file

Background:
I'm using Google's protobuf, and I would like to read/write several gigabytes of protobuf marshalled data to a file using C++. As it's recommended to keep the size of each protobuf object under 1MB, I figured a binary stream (illustrated below) written to a file would work. Each offset contains the number of bytes to the next offset until the end of the file is reached. This way, each protobuf can stay under 1MB, and I can glob them together to my heart's content.
[int32 offset]
[protobuf blob 1]
[int32 offset]
[protobuf blob 2]
...
[eof]
I have an implementation that works on GitHub:
src/glob.hpp
src/glob.cpp
test/readglob.cpp
test/writeglob.cpp
But I feel I have written some poor code, and would appreciate some advice on how to improve it. Thus,
Questions:
I'm using reinterpret_cast<char*> to read/write the 32 bit integers to and from the binary fstream. Since I'm using protobuf, I'm making the assumption that all machines are little-endian. I also assert that an int is indeed 4 bytes. Is there a better way to read/write a 32 bit integer to a binary fstream given these two limiting assumptions?
In reading from fstream, I create a temporary fixed-length char buffer, so that I can then pass this fixed-length buffer to the protobuf library to decode using ParseFromArray, as ParseFromIstream will consume the entire stream. I'd really prefer just to tell the library to read at most the next N bytes from fstream, but there doesn't seem to be that functionality in protobuf. What would be the most idiomatic way to pass a function at most N bytes of an fstream? Or is my design sufficiently upside down that I should consider a different approach entirely?
Edit:
@codymanix: I'm casting to char since istream::read requires a char array if I'm not mistaken. I'm also not using the extraction operator >> since I read it was poor form to use with binary streams. Or is this last piece of advice bogus?
@Martin York: Removed new/delete in favor of std::vector<char>. glob.cpp is now updated. Thanks!
Don't use new []/delete[].
Instead, use a std::vector, as deallocation is guaranteed in the event of exceptions.
Don't assume that reading will return all the bytes you requested.
Check with gcount() to make sure that you got what you asked for.
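Putting these points together for the size-prefixed layout from the question, here is a sketch that is not taken from the linked repository. It treats each blob as opaque bytes, writes the 32-bit size byte by byte as little-endian (so no reinterpret_cast and no assumption about the host), uses std::vector<char> as the buffer and checks gcount() after every read. The names write_record and read_record are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Write one record: 4-byte little-endian size, then the payload bytes.
bool write_record(std::ostream& out, const std::string& payload) {
    const std::uint32_t size = static_cast<std::uint32_t>(payload.size());
    const char prefix[4] = {static_cast<char>(size & 0xFF),
                            static_cast<char>((size >> 8) & 0xFF),
                            static_cast<char>((size >> 16) & 0xFF),
                            static_cast<char>((size >> 24) & 0xFF)};
    out.write(prefix, 4);
    out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
    return static_cast<bool>(out);
}

// Read the next record into payload. Returns false at end of file or if
// the stream ends before the announced number of bytes has arrived.
bool read_record(std::istream& in, std::vector<char>& payload) {
    unsigned char prefix[4] = {0, 0, 0, 0};
    in.read(reinterpret_cast<char*>(prefix), 4);
    if (in.gcount() != 4) return false;
    const std::uint32_t size = static_cast<std::uint32_t>(prefix[0]) |
                               (static_cast<std::uint32_t>(prefix[1]) << 8) |
                               (static_cast<std::uint32_t>(prefix[2]) << 16) |
                               (static_cast<std::uint32_t>(prefix[3]) << 24);
    payload.resize(size);
    in.read(payload.data(), static_cast<std::streamsize>(size));
    return in.gcount() == static_cast<std::streamsize>(size);
}
```

Each payload read this way could then be handed to ParseFromArray, as the question already does, so the protobuf library never sees more than one record's bytes.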
Rather than have Glob implement both input and output depending on a switch in the constructor, implement two specialized classes, like ifstream/ofstream. This will simplify both the interface and the usage.

How to perform fast formatted input from a stream in C++?

The situation is: there is a file with 14 294 508 unsigned integers and 13 994 397 floating-point numbers (need to read doubles). Total file size is ~250 MB.
Using std::istream takes ~30sec. Reading the data from file to memory (just copying bytes, without formatted input) is much faster. Is there any way to improve reading speed without changing file format?
Do you need to use STL style i/o? You must check out this excellent piece of work from one of the experts. It's a specialized iostream by Dietmar Kuhl.
I hate to suggest this but take a look at the C formatted i/o routines. Also, are you reading in the whole file in one go?
You might also want to look at Matthew Wilson's FastFormat library:
http://www.fastformat.org/
I haven't used it, but he makes some pretty impressive claims and I've found a lot of his other work to be worth studying and using (and stealing on occasion).
You haven't specified the format. It's possible that you could memory map it, or could read in very large chunks and process in a batch algorithm.
Also, you haven't said whether you know for sure that the file and the process that will read it will be on the same platform. If a big-endian process writes it and a little-endian process reads it, or vice versa, it won't work.
Parsing the input yourself (with atoi and atof) usually makes reading at least twice as fast as the "universal" read methods.
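A sketch of hand parsing over an in-memory buffer. It swaps in strtoul and strtod for atoi and atof, because they report where parsing stopped and so let a cursor advance through the text. It assumes whitespace-separated numbers in the default "C" locale; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Parse the next unsigned integer at cursor and advance past it.
// Returns false if no number starts there (for example at end of text).
bool next_unsigned(const char*& cursor, unsigned long& out) {
    char* end = nullptr;
    out = std::strtoul(cursor, &end, 10);
    if (end == cursor) return false;
    cursor = end;
    return true;
}

// Parse the next floating-point number at cursor and advance past it.
bool next_double(const char*& cursor, double& out) {
    char* end = nullptr;
    out = std::strtod(cursor, &end);
    if (end == cursor) return false;
    cursor = end;
    return true;
}

// Example use: read every remaining number in the text as a double.
std::vector<double> parse_doubles(const std::string& text) {
    std::vector<double> values;
    const char* cursor = text.c_str();
    double v = 0.0;
    while (next_double(cursor, v)) values.push_back(v);
    return values;
}
```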
Something quick and dirty is to just dump the file into a standard C++ string, and then use a stringstream on it:
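#include <fstream>
#include <iterator>
#include <sstream>
#include <string>
// Load the file into the string file_string ("input.txt" is a placeholder name)
std::ifstream in("input.txt", std::ios::binary);
std::string file_string((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
std::stringstream s( file_string );
int x; float y;
s >> x >> y;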
This may not give you much of a performance improvement (you will get a larger speed-up by avoiding iostreams), but it's very easy to try, and it may be fast enough.