How can I use fread() on a binary file to read the data into a std::vector?

This is a follow-up to an earlier question of mine, which has been answered perfectly. To quickly recap: I had trouble creating a class holding a huge array (stack overflow error). In the answers, some users recommended I use std::vector instead.
The function to read in the data looks like this:
Test()
{
    memset(myarray, 0, sizeof(myarray));
    FILE* fstr = fopen("myfile.dat", "rb");
    size_t success = fread(myarray, sizeof(myarray), 1, fstr);
    fclose(fstr);
}
for myarray, which looked like this:
int myarray[45000000];
My question is: how can I read this, preferably, into a
std::vector<int> myvector;
I searched Google and found multiple answers, usually pointing to the following code:
std::ifstream input("myfile.dat", std::ios::in | std::ifstream::binary);
std::copy(std::istream_iterator<int>(input),
          std::istream_iterator<int>(),
          std::back_inserter(myvector));
After implementing this, calling myvector.size() returns 16 (for whatever reason), and accessing a vector element leads to an immediate crash for going out of the vector's bounds.
So what do I have to do to get this right? I once read somewhere that I could simply use the "old" method and then copy the array into the vector, but that seems to defeat the purpose of using the vector in the first place.

fread() reads your file as raw binary, while istream_iterator tries to extract formatted ints from text (like "42").
You want to resize your vector and use input.read(...) instead:
const size_t size = 45000000; // change this to the appropriate value
std::vector<char> myvector(size, 0);
std::ifstream input("myfile.dat", std::ios::in | std::ifstream::binary);
input.read(&myvector[0], myvector.size());
Note that you need to use a std::vector<char>, since read expects the first parameter to be a char *. You can use other types T if you cast correctly:
input.read(reinterpret_cast<char*>(&myvector[0]), myvector.size() * sizeof(T));
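For completeness, here is a minimal sketch of the int case, sizing the vector from the file length rather than a hard-coded element count (the function name readInts and the error handling are just illustrative choices, not part of the question):

#include <fstream>
#include <vector>

std::vector<int> readInts(const char* filename)
{
    // Open at the end so tellg() reports the file size in bytes.
    std::ifstream input(filename, std::ios::binary | std::ios::ate);
    if (!input)
        return std::vector<int>();
    const std::streamsize bytes = input.tellg();
    input.seekg(0, std::ios::beg);

    // One element per sizeof(int) bytes; a trailing partial int is ignored.
    std::vector<int> v(bytes / sizeof(int));
    input.read(reinterpret_cast<char*>(v.data()), v.size() * sizeof(int));
    return v;
}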

If you're using C++, you should try to avoid the C FILE API altogether, so you're on the right track. The problem you're having is that istream_iterator reads input as text, not binary: it's expecting ASCII digits. Try this instead:
std::vector<int> vec(45000000);
std::filebuf fb;
fb.open("myfile.dat", std::ios_base::in | std::ios_base::binary);
fb.sgetn((char*)&vec[0], vec.size() * sizeof(vec[0]));

Related

Dynamic allocation of a vector inside a struct while reading a large txt file

I am currently learning the C++ language and need to read a file containing more than 5000 double-type numbers. Since push_back will make a copy while allocating new data, I was trying to figure out a way to decrease the computational work. Note that the file may contain a random number of doubles, so allocating memory by specifying a large-enough vector up front is not the solution I am looking for.
My idea would be to quickly read the whole file and get an approximate size of the array. In "Save & read double vector from file C++?" I found an interesting idea, which can be seen in the code below.
Basically, the vector containing the file data is inserted into a structure type named PathStruct. Bear in mind that PathStruct contains more than this vector, but for the sake of simplicity I deleted all the rest. The function receives a reference to the PathStruct pointer and reads the file.
struct PathStruct
{
    std::vector<double> trivial_vector;
};

bool getFileContent(PathStruct *&path)
{
    std::ifstream filename("simplePath.txt", std::ios::in | std::ifstream::binary);
    if (!filename.good())
        return false;
    std::vector<char> buffer{};
    std::istreambuf_iterator<char> iter(filename);
    std::istreambuf_iterator<char> end{};
    std::copy(iter, end, std::back_inserter(buffer));
    path->trivial_vector.reserve(buffer.size() / sizeof(double));
    memcpy(&path->trivial_vector[0], &buffer[0], buffer.size());
    return true;
};

int main(int argc, char **argv)
{
    PathStruct *path = new PathStruct;
    const int result = getFileContent(path);
    return 0;
}
When I run the code, it aborts at runtime with the following error:
corrupted size vs. prev_size, Aborted (core dumped).
I believe my problem is incorrect use of pointers. Pointers are definitely not my strongest point, but I cannot find the problem. I hope someone can help out this poor soul.
The crash comes from the memcpy: the vector was only reserve()d, never resize()d, so its size is still zero and writing through &trivial_vector[0] is undefined behavior (use resize() instead). Beyond that: if your file contains only consecutive double values, you can check the file size and divide it by sizeof(double). To determine the file size you can use std::filesystem::file_size, but this function is only available from C++17. If you cannot use C++17, you can find other methods for determining file size here
auto fileName = "file.bin";
auto fileSize = std::filesystem::file_size(fileName);
std::ifstream inputFile(fileName, std::ios::binary);
std::vector<double> values;
values.reserve(fileSize / sizeof(double));
double val;
while (inputFile.read(reinterpret_cast<char*>(&val), sizeof(double)))
{
    values.push_back(val);
}
or using pointers:
auto numberOfValues = fileSize / sizeof(double);
std::vector<double> values(numberOfValues);
// Notice that I pass numberOfValues * sizeof(double) as the number of bytes to read, instead of fileSize,
// because fileSize may not be divisible by sizeof(double)
inputFile.read(reinterpret_cast<char*>(values.data()), numberOfValues * sizeof(double));
Alternative
If you can modify the file structure, you can store the number of double values at the beginning of the file and read this number before reading the doubles. This way you always know the number of values to read, without checking the file size.
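A minimal sketch of that layout, assuming a uint64_t count prefix (the helper names writeWithCount/readWithCount are made up for illustration):

#include <cstdint>
#include <fstream>
#include <vector>

// Writer: store the element count first, then the raw doubles.
void writeWithCount(const char* fileName, const std::vector<double>& values)
{
    std::ofstream out(fileName, std::ios::binary);
    const std::uint64_t count = values.size();
    out.write(reinterpret_cast<const char*>(&count), sizeof(count));
    out.write(reinterpret_cast<const char*>(values.data()), count * sizeof(double));
}

// Reader: read the count, size the vector accordingly, read the payload.
std::vector<double> readWithCount(const char* fileName)
{
    std::ifstream in(fileName, std::ios::binary);
    std::uint64_t count = 0;
    in.read(reinterpret_cast<char*>(&count), sizeof(count));
    std::vector<double> values(count);
    in.read(reinterpret_cast<char*>(values.data()), count * sizeof(double));
    return values;
}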
Alternative 2
You can also change the container from std::vector to std::deque. This container is similar to std::vector, but instead of keeping a single buffer for the data, it keeps many smaller arrays. If you are inserting data and the current array is full, an additional array is allocated and linked without copying the previous data.
This has a small price, however: data access requires two pointer dereferences instead of one.
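A sketch of the deque variant; it is the same read loop as above, only the container changes:

#include <deque>
#include <fstream>

std::ifstream inputFile("file.bin", std::ios::binary);
std::deque<double> values;  // grows in fixed-size chunks, no large reallocations
double val;
while (inputFile.read(reinterpret_cast<char*>(&val), sizeof(double)))
{
    values.push_back(val);  // never moves previously inserted elements
}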

File writing and appending in Binary mode not working

I am trying to append to a file in binary mode, but the logic below is not working.
For PDF files, the file gets corrupted, and for text files, some junk data is added on top of my file contents.
My variable m_strReceivedMessage is of type std::string.
std::ofstream out(file, std::ios::binary | std::ios_base::app);
int i = sizeof(m_strReceivedMessage);
if (out.is_open()) {
    // out.write(m_strReceivedMessage.c_str(), m_strReceivedMessage.size());
    // out << m_strReceivedMessage;
    out.write(reinterpret_cast<char *>(&m_strReceivedMessage), m_strReceivedMessage.size());
}
You're writing the memory of the std::string object itself, rather than the character buffer it contains. To get a pointer to the character buffer, see the data() member function. Hint: the fact that you need reinterpret_cast<char*> to turn a std::string* into a char* is a dead giveaway that you're doing something very wrong.
Also, I'm not familiar with the PDF spec, but I suspect it may contain nul bytes. Depending on how you got your std::string, you may have lost any content after the first nul. A std::vector<char> would be a more appropriate way to store binary data.
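A minimal corrected sketch, keeping the variable names from the question:

std::ofstream out(file, std::ios::binary | std::ios_base::app);
if (out.is_open()) {
    // data() points at the actual character buffer; size() counts every
    // byte, including any embedded nul bytes.
    out.write(m_strReceivedMessage.data(), m_strReceivedMessage.size());
}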

Writing a struct with a vector to a binary file in C++

I have a struct and I would like to write it to a binary file (C++ / Visual Studio 2008).
The struct is:
struct DataItem
{
    std::string tag;
    std::vector<int> data_block;
    DataItem(): data_block(1024 * 1024){}
};
I am filling the data_block vector with random values:
DataItem createSampleData()
{
    DataItem data;
    std::srand(std::time(NULL));
    std::generate(data.data_block.begin(), data.data_block.end(), std::rand);
    data.tag = "test";
    return data;
}
And I am trying to write the struct to a file:
void writeData(DataItem data, long fileName)
{
    ostringstream ss;
    ss << fileName;
    string s(ss.str());
    s += ".bin";
    char szPathedFileName[MAX_PATH] = {0};
    strcat(szPathedFileName, ROOT_DIR);
    strcat(szPathedFileName, s.c_str());
    ofstream f(szPathedFileName, ios::out | ios::binary | ios::app);
    // ******* first I tried to write this way, then one by one
    //f.write(reinterpret_cast<char *>(&data), sizeof(data));
    // *******************************************************
    f.write(reinterpret_cast<const char *>(&data.tag), sizeof(data.tag));
    f.write(reinterpret_cast<const char *>(&data.data_block), sizeof(data.data_block));
    f.close();
}
And the main is:
int main()
{
    DataItem data = createSampleData();
    for (int i = 0; i < 5; i++) {
        writeData(data, i);
    }
}
So I expect a file size of at least (1024 * 1024) * 4 bytes (for the vector) + 48 (for the tag), but it just writes the tag and creates a 1KB file on the hard drive.
I can see the contents while I'm debugging, but they never get written to the file...
What's wrong with this code? Why can't I write the struct with its vector to a file? Is there a better, faster, or more efficient way to write it?
Do I have to serialize the data?
Thanks...
Casting a std::string to char * will not produce the result you expect, and neither will using sizeof on it. The same goes for a std::vector.
For the vector you need to use either the std::vector::data method or, e.g., &data.data_block[0]. As for the size, use data.data_block.size() * sizeof(int).
Writing the string is another matter though, especially if it can be of variable length. You either have to write it as a fixed-length string, or write the length (in a fixed-size format) followed by the actual string, or write a terminator at the end of the string. To get a C-style pointer to the string use std::string::c_str.
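Putting both points together, here is a sketch that writes one DataItem as a length-prefixed tag followed by the raw ints (the helper name writeDataItem and the uint32_t prefix width are my own choices, not part of the question):

#include <cstdint>
#include <fstream>

// Layout: [tag length][tag bytes][element count][raw int data]
void writeDataItem(std::ofstream& f, const DataItem& data)
{
    const std::uint32_t tagLen = static_cast<std::uint32_t>(data.tag.size());
    f.write(reinterpret_cast<const char*>(&tagLen), sizeof(tagLen));
    f.write(data.tag.c_str(), tagLen);

    const std::uint32_t count = static_cast<std::uint32_t>(data.data_block.size());
    f.write(reinterpret_cast<const char*>(&count), sizeof(count));
    f.write(reinterpret_cast<const char*>(&data.data_block[0]), count * sizeof(int));
}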
Welcome to the merry world of C++ std::
Basically, vectors are meant to be used as opaque containers.
You can forget about reinterpret_cast right away.
Trying to shut the compiler up will allow you to create an executable, but it will produce silly results.
Basically, you can forget about most of the std::vector syntactic sugar that has to do with iterators, since your fstream will not access binary data through them (it would output a textual representation of your data).
But all is not lost.
You can access the vector underlying array using the newly (C++11) introduced .data() method, though that defeats the point of using an opaque type.
const int * raw_ptr = data.data_block.data();
that will gain you 100 points of cool factor instead of using the puny
const int * raw_ptr = &data.data_block[0];
You could also use the even more cryptic &data.data_block.front() for a cool factor bonus of 50 points.
You can then write your glob of ints in one go:
f.write(reinterpret_cast<const char *>(raw_ptr), sizeof(raw_ptr[0]) * data.data_block.size());
Now if you want to do something really too simple, try this:
for (std::size_t i = 0; i != data.data_block.size(); i++)
    f.write(reinterpret_cast<const char *>(&data.data_block[i]), sizeof(data.data_block[i]));
This will consume a few more microseconds, which will be lost in background noise since the disk I/O will take much more time to complete the write.
Totally not cool, though.

How to push binary data into std::string?

I'm trying to create a binary file in the following way:
string buf;
...
buf += filename.length();
buf += filename;
etc. So first I give the length in binary format, but how do I convert this into a 4-byte char array, or 2-byte, etc.? Basically, I want to achieve the same functionality as this would:
int len = filename.length();
fwrite(&len, sizeof(len), 1, fp);
Which works fine, but having it in one string might be easier to process.
Edit: I don't want to use streams or a vector; I'm trying to find out if it's possible with strings.
Streams are the way to go. Not strings.
Use a vector for holding the data, or write it straight to the file (via streams)
Simply use std::vector<unsigned char> and use an istream or ostream iterator to read/write data to/from the vector. For instance, to read from a file you can do:
vector<unsigned char> binary_buffer;
ifstream in_file("my_binary_file.bin", ios_base::binary | ios_base::in);
in_file >> noskipws;  // don't skip whitespace bytes during formatted extraction
istream_iterator<unsigned char> end_of_file;
istream_iterator<unsigned char> in_file_iter(in_file);
while (in_file_iter != end_of_file)
{
    binary_buffer.push_back(*in_file_iter++);
}
Output would be even simpler:
ofstream out_file("another_binary_file.bin", ios_base::binary | ios_base::out);
ostream_iterator<unsigned char> binary_output(out_file);
copy(binary_buffer.begin(), binary_buffer.end(), binary_output);
Yes, it is possible to do this, because this is C++, and everything is possible in C++.
First, here is how you do it, then I'll answer the question of why you might need to do it:
std::ifstream input( "my.png", std::ios::binary );
std::vector<unsigned char> buffer(std::istreambuf_iterator<char>(input), {});
int buffSize = buffer.size();
std::string myString(buffer.begin(), buffer.end());
In my case, I was using a framework where the HTTP Client only supported a string for post message body, but I needed to post raw binary of a file to the service I was using. I could either mess around with the internals of the library (bad) or introduce another unnecessary dependency for a simple file post (also not so good).
But since strings can hold this data, I was able to read the file, copy it to a string and then upload.
This is an unfortunate situation to be in either way, but in some cases it is helpful to be able to do this. Usually you would want to do something different. This isn't the clearest approach for someone else to read, and you pay a performance penalty for the copy, but it works, and it helps to understand why it works: std::string stores its data contiguously in binary-safe form, just like a vector does, and it can be initialized from the vector's iterators.
unsigned char is one byte long, so it is like a byte value. The string copies the bytes into itself, so you end up with the exact data.
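Coming back to the original question, appending the raw bytes of the length to the string looks like this (a sketch; the filename value is just an assumed example, and the result is in the machine's native byte order):

#include <cstdint>
#include <string>

std::string buf;
std::string filename = "example.txt";  // assumed example value

// Append the 4-byte length, then the file name itself, matching the
// layout that fwrite(&len, sizeof(len), 1, fp) would produce.
std::uint32_t len = static_cast<std::uint32_t>(filename.length());
buf.append(reinterpret_cast<const char*>(&len), sizeof(len));
buf += filename;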

Fastest way to write large STL vector to file using STL

I have a large vector (10^9 elements) of chars, and I was wondering what is the fastest way to write such vector to a file. So far I've been using next code:
vector<char> vs;
// ... Fill vector with data
ofstream outfile("nanocube.txt", ios::out | ios::binary);
ostream_iterator<char> oi(outfile, '\0');
copy(vs.begin(), vs.end(), oi);
For this code it takes approximately two minutes to write all data to file. The actual question is: "Can I make it faster using STL and how"?
With such a large amount of data to be written (~1GB), you should write to the output stream directly, rather than using an output iterator. Since the data in a vector is stored contiguously, this will work and should be much faster.
ofstream outfile("nanocube.txt", ios::out | ios::binary);
outfile.write(&vs[0], vs.size());
There is a slight conceptual error in the second argument to ostream_iterator's constructor: it should be a null pointer if you don't want a delimiter (although, luckily for you, '\0' is implicitly treated as one), or the second argument should be omitted entirely.
However, this means that after writing each character, the code needs to check the pointer designating the delimiter (which might be somewhat inefficient).
I think, if you want to go with iterators, perhaps you could try ostreambuf_iterator.
Other options might include using the write() method (if it can handle output this large, or perhaps output it in chunks), and perhaps OS-specific output functions.
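For reference, the ostreambuf_iterator variant is a one-line change to the original code; it writes through the stream buffer directly and does no delimiter checks:

#include <algorithm>
#include <fstream>
#include <iterator>
#include <vector>

std::vector<char> vs;  // assumed to be filled with data elsewhere
std::ofstream outfile("nanocube.txt", std::ios::out | std::ios::binary);
std::copy(vs.begin(), vs.end(), std::ostreambuf_iterator<char>(outfile));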
Since your data is contiguous in memory (as Charles said), you can use low-level I/O. On Unix or Linux, you can write to a file descriptor. On Windows XP, use file handles. (It's a little trickier on XP, but well documented in MSDN.)
XP is a little funny about buffering: if you write a 1GB block to a handle, it will be slower than if you break the write up into smaller transfer sizes (in a loop). I've found that 256KB writes are the most efficient. Once you've written the loop, you can play around with the transfer size and see what's fastest.
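A POSIX sketch of that loop (Unix/Linux only; the 256KB transfer size is the value suggested above, and the function name is made up):

#include <algorithm>
#include <fcntl.h>
#include <unistd.h>
#include <vector>

void writeChunked(const char* path, const std::vector<char>& vs)
{
    const int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return;

    const size_t chunk = 256 * 1024;
    size_t offset = 0;
    while (offset < vs.size())
    {
        const size_t n = std::min(chunk, vs.size() - offset);
        const ssize_t written = write(fd, &vs[offset], n);
        if (written <= 0)
            break;              // bail out on error
        offset += written;      // write() may transfer fewer bytes than asked
    }
    close(fd);
}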
OK, I wrote a method implementation with a for loop that writes 256KB blocks of data at each iteration (as Rob suggested), and the result is 16 seconds, so problem solved. This is my humble implementation, so feel free to comment:
void writeCubeToFile(const vector<char> &vs)
{
    const unsigned int blocksize = 262144;
    unsigned long blocks = distance(vs.begin(), vs.end()) / blocksize;
    ofstream outfile("nanocube.txt", ios::out | ios::binary);
    for (unsigned long i = 0; i <= blocks; i++)
    {
        unsigned long position = blocksize * i;
        if (blocksize > distance(vs.begin() + position, vs.end()))
            outfile.write(&*(vs.begin() + position), distance(vs.begin() + position, vs.end()));
        else
            outfile.write(&*(vs.begin() + position), blocksize);
    }
    outfile.write("\0", 1);
    outfile.close();
}
Thanks to all of you.
If you have another structure, this method is still valid.
For example:
typedef std::pair<int,int> STL_Edge;
vector<STL_Edge> v;
void write_file(const char *path)
{
    ofstream outfile(path, ios::out | ios::binary);
    outfile.write((const char *)&v.front(), v.size() * sizeof(STL_Edge));
}

void read_file(const char *path, int reserveSpaceForEntries)
{
    ifstream infile(path, ios::in | ios::binary);
    v.resize(reserveSpaceForEntries);
    infile.read((char *)&v.front(), v.size() * sizeof(STL_Edge));
}
Instead of writing via the file I/O methods, you could create a memory-mapped file and then copy the vector into the memory-mapped file using memcpy.
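A POSIX sketch of that idea (mmap is Unix-specific; on Windows the equivalents are CreateFileMapping/MapViewOfFile, and the function name here is made up):

#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <vector>

bool writeMapped(const char* path, const std::vector<char>& vs)
{
    const int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return false;

    // Grow the file to the final size, then map it into memory.
    if (vs.empty() || ftruncate(fd, vs.size()) != 0) { close(fd); return false; }
    void* mem = mmap(NULL, vs.size(), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) { close(fd); return false; }

    std::memcpy(mem, &vs[0], vs.size());  // single bulk copy into the mapping
    munmap(mem, vs.size());
    close(fd);
    return true;
}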
Use the write method on it; it is in RAM after all and you have contiguous memory. Fastest, while keeping flexibility for later? Lose the built-in buffering, hint sequential I/O, lose the hidden overhead of iterators/utilities, avoid streambuf when you can, but do get your hands dirty with boost::asio.