Read binary bytes beginning at position N within ifstream? - c++

I am writing an unknown number of structs to a binary file and then reinterpret_cast-ing the bytes back into the struct. I know how to write the bytes.
I am unsure how to iterate over the binary file. I would like to use std::ifstream. At some point I must increment a file pointer/index by sizeof(struct) bytes, but the only examples (of reading binary into structs) I could find online wrote N structs and then read N structs back; they did not loop over the file, incrementing a file index.
Pseudo code of what I would like to achieve is:
std::ifstream file("test.txt", std::ifstream::binary);
const size_t fileLength = file.size();
size_t pos = 0;
while(pos < fileLength)
{
    MyStruct* ms = &(reinterpret_cast<MyStruct&>(&file[pos]));
    // Do whatever with my struct
    pos += sizeof(MyStruct);
}
UPDATE:
My struct is POD

#include <fstream>

struct MyStruct{};

int main()
{
    std::ifstream file("test.txt", std::ifstream::binary);
    MyStruct ms;
    // The read() call evaluates to false if anything went wrong.
    while(file.read(reinterpret_cast<char*>(&ms), sizeof ms))
    {
        // Do whatever with my struct
    }
    if(file.eof())
        ; // Successfully iterated over the whole file
}
Please be sure not to do something like this:
char buffer[sizeof(MyStruct)];
file.read(buffer,sizeof(MyStruct));
//...
MyStruct* myStruct = reinterpret_cast<MyStruct*>(buffer);
It will likely work, but it breaks the strict aliasing rule and is undefined behaviour. If you truly need a buffer (e.g. for small files it might be faster to read the whole file into memory first and then iterate over that buffer), the correct way is:
#include <cstring> // for std::memcpy

char buffer[sizeof(MyStruct)];
file.read(buffer, sizeof(MyStruct));
//...
MyStruct myStruct;
std::memcpy(&myStruct, buffer, sizeof myStruct);

Related

Switching array to vector inside struct, how to handle file io byte alignment

We used to have a structure:
const int NUMBER_OF_ELEMENTS = 10;
struct myStruct1
{
    uint32_t var1;
    uint64_t var2;
    uint32_t elements[NUMBER_OF_ELEMENTS];
};
However, going forward we want the number of elements to be variable. I think I would best do this as:
struct myStruct2
{
    uint32_t var1;
    uint64_t var2;
    std::vector<uint32_t> elements;
    myStruct2(int len){ elements.resize(len); }
};
For reading/writing from a file, we used to simply do:
myStruct1 ms1;
std::ofstream outfile(FILENAME,std::ofstream::out | std::ofstream::binary);
outfile.write((const char*)&ms1,sizeof(myStruct1));
outfile.close();
myStruct1 msread1;
std::ifstream infile(FILENAME, std::ifstream::in | std::ifstream::binary);
infile.read((char *)&msread1, sizeof(myStruct1));
infile.close();
Obviously I can't do that anymore for the vector version. So, I would have to read element by element.
myStruct2 msread2(NUMBER_OF_ELEMENTS);
std::ifstream infile(FILENAME, std::ifstream::in | std::ifstream::binary);
infile.read((char *)&msread2.var1, sizeof(uint32_t));
infile.read((char *)&msread2.var2, sizeof(uint64_t));
for (int i=0; i<NUMBER_OF_ELEMENTS; i++)
{
    infile.read((char *)&msread2.elements[i], sizeof(uint32_t));
}
infile.close();
However, this runs into the problem of byte alignment... there is 4 bytes of padding after var1 in the struct (and in the file). Reading var1 then var2 doesn't skip this pad.
I could use #pragma pack(1) both for writing and for reading. However I would like the reader to be compatible with old files which were created with the padding.
I could manually have a seek of 4 bytes (or read a dummy 4 byte variable) after reading var1. But I feel there's probably better ways.
I could put var1 and var2 in their own struct within myStruct1 or myStruct2 and read them together. Maybe this is a bit cleaner for I/O, but then accessing them would take an extra step, e.g. ms1.headvars.var1 instead of just ms1.var1 (more changes throughout the codebase).
Any recommendations on a nicer solution?
You cannot simply write a vector as binary to disk; it internally contains the size, the capacity, and a pointer to the real data, which is dynamically allocated (plus whatever else the compiler writer chose to put there, in any order). At best you would get the pointer value written into your file.
Instead, you need to write the size and then the contents, in two steps: v.size(), then v.data().
When reading them back, read the size first, prepare the vector with v.resize(size), and then read the data into v.data() [yes, you may write into v.data()!]

Weird behavior writing/reading simple binary file

I'm writing and reading on a binary file. I'm getting small errors when outputting the reads.
The strings are there but with little snippets like: (I"�U) (�U) appended to the end of ~30% of them
I'm using g++ compiler on Ubuntu
Simplified code:
struct Db_connection
{
public:
    string name;
};
int Db_connection::write_config()
{
    ofstream config_f("config.dat", std::ios_base::binary | std::ios_base::out); // open file
    string str = name;
    int size = str.length();
    config_f.write(reinterpret_cast<char *>(&size), sizeof(int)); // write size of string as an int-sized chunk
    config_f.write(str.c_str(), size); // write string
    config_f.close();
    return 0;
}
Db_connection read_config()
{
    ifstream config_f("config.dat", std::ios_base::binary | std::ios_base::in);
    Db_connection return_obj;
    int size;
    string data;
    config_f.read(reinterpret_cast<char *>(&size), sizeof(int)); // read string size
    char buffer[size];
    config_f.read(buffer, size); // read string
    data.assign(buffer);
    return_obj.name = data;
    return return_obj;
}
Is there anything obvious I am messing up? Does this have to do with endianness? I tried to minimize the code to its absolute essentials.
The actual code is more complex. I have a class holding vectors of 2 structs. One struct has four string members and the other has a string and a bool. These functions are actually members of, and return (respectively), that class. The functions loop through the vectors, writing struct members sequentially.
Two oddities:
To debug, I added outputs of the size and data variables on each iteration in both the read and write functions. size comes out accurate and consistent on both sides. data is accurate on the write side but with the weird special characters on the read side. I'm looking at outputs like:
Read Size: 12
Data: random addy2�U //the 12 human readable chars are there but with 2 extra symbols
The final chunk of data (a bool) comes out fine every time, so I don't think there is a file-pointer issue. If it's relevant: every bool and int is fine. It's just a portion of the strings.
Hopefully I'm making a bonehead mistake and this minimized code can be critiqued. The actual example would be too long.
Big thanks to WhozCraig,
The following edit did, indeed, work:
Db_connection read_config()
{
    ifstream config_f("config.dat", std::ios_base::binary | std::ios_base::in);
    Db_connection return_obj;
    int size;
    string data;
    config_f.read(reinterpret_cast<char *>(&size), sizeof(int)); // read string size
    vector<char> buff(size);
    config_f.read(buff.data(), size);
    data = string(buff.begin(), buff.end());
    return_obj.name = data;
    return return_obj;
}
As paddy pointed out directly and WhozCraig alluded to, this code still needs a standardized, portable data type for recording the integer into the binary file, and the write function needs to be rethought as well.
Thank you very much to the both of you. I read like 5-8 top search results for "cpp binary i/o" before writing my code and still ended up with that mess. You guys saved me hours/days of my life.

Dynamic allocation of a vector inside a struct while reading a large txt file

I am currently learning the C++ language and need to read a file containing more than 5000 double-type numbers. Since push_back may have to copy the existing data when it reallocates, I was trying to figure out a way to decrease the computational work. Note that the file may contain a random number of doubles, so allocating memory by specifying a large enough vector is not the solution I am looking for.
My idea would be to quickly read the whole file and get an approximate size of the array. In Save & read double vector from file C++? I found an interesting idea, which can be seen in the code below.
Basically, the vector containing the file data is placed in a structure type named PathStruct. Bear in mind that PathStruct contains more than this vector, but for the sake of simplicity I deleted all the rest. The function receives a reference to the PathStruct pointer and reads the file.
struct PathStruct
{
    std::vector<double> trivial_vector;
};

bool getFileContent(PathStruct *&path)
{
    std::ifstream filename("simplePath.txt", std::ios::in | std::ifstream::binary);
    if (!filename.good())
        return false;
    std::vector<char> buffer{};
    std::istreambuf_iterator<char> iter(filename);
    std::istreambuf_iterator<char> end{};
    std::copy(iter, end, std::back_inserter(buffer));
    path->trivial_vector.reserve(buffer.size() / sizeof(double));
    memcpy(&path->trivial_vector[0], &buffer[0], buffer.size());
    return true;
}

int main(int argc, char **argv)
{
    PathStruct *path = new PathStruct;
    const int result = getFileContent(path);
    return 0;
}
When I run the code, it aborts at runtime with the following error:
corrupted size vs. prev_size, Aborted (core dumped).
I believe my problem is incorrect use of pointers. They are definitely not my strongest point, but I cannot find the problem. I hope someone can help out this poor soul.
The crash comes from pairing reserve with memcpy: reserve only allocates capacity and leaves the vector's size at zero, so writing through &path->trivial_vector[0] is out of bounds; resize is what would be needed there. If your file contains only consecutive double values, you can instead check the file size and divide it by the size of a double. To determine the file size you can use std::filesystem::file_size, but this function is only available from C++17. If you cannot use C++17, you can find other methods for determining the file size here
auto fileName = "file.bin";
auto fileSize = std::filesystem::file_size(fileName);
std::ifstream inputFile(fileName, std::ios::binary);
std::vector<double> values;
values.reserve(fileSize / sizeof(double));
double val;
while(inputFile.read(reinterpret_cast<char*>(&val), sizeof(double)))
{
    values.push_back(val);
}
or using pointers:
auto numberOfValues = fileSize / sizeof(double);
std::vector<double> values(numberOfValues);
// Notice that I pass numberOfValues * sizeof(double) as the number of bytes to read instead of fileSize,
// because fileSize may not be divisible by sizeof(double).
inputFile.read(reinterpret_cast<char*>(values.data()), numberOfValues * sizeof(double));
Alternative
If you can modify the file structure, you can add a number of double values at the beginning of the file and read this number before reading double values. This way you will always know the number of values to read, without checking file size.
Alternative 2
You can also change the container from std::vector to std::deque. This container is similar to std::vector, but instead of keeping a single buffer for the data it keeps many smaller arrays. If you insert data and the current array is full, an additional array is allocated and linked in without copying the previous data.
This has a small price, however: data access requires two pointer dereferences instead of one.

Writing struct of vector to a binary file in c++

I have a struct and I would like to write it to a binary file (c++ / visual studio 2008).
The struct is:
struct DataItem
{
    std::string tag;
    std::vector<int> data_block;
    DataItem(): data_block(1024 * 1024){}
};
I am filling the data_block vector with random values:
DataItem createSampleData ()
{
    DataItem data;
    std::srand(std::time(NULL));
    std::generate(data.data_block.begin(), data.data_block.end(), std::rand);
    data.tag = "test";
    return data;
}
And trying to write the struct to file:
void writeData (DataItem data, long fileName)
{
    ostringstream ss;
    ss << fileName;
    string s(ss.str());
    s += ".bin";
    char szPathedFileName[MAX_PATH] = {0};
    strcat(szPathedFileName, ROOT_DIR);
    strcat(szPathedFileName, s.c_str());
    ofstream f(szPathedFileName, ios::out | ios::binary | ios::app);
    // ******* first I tried to write this way, then one by one
    //f.write(reinterpret_cast<char *>(&data), sizeof(data));
    // *******************************************************
    f.write(reinterpret_cast<const char *>(&data.tag), sizeof(data.tag));
    f.write(reinterpret_cast<const char *>(&data.data_block), sizeof(data.data_block));
    f.close();
}
And the main is:
int main()
{
    DataItem data = createSampleData();
    for (int i=0; i<5; i++) {
        writeData(data, i);
    }
}
So I expect a file size of at least (1024 * 1024) * 4 (for the vector) + 48 (for the tag), but it only writes the tag and creates a 1KB file on the hard drive.
I can see the contents while I'm debugging, but it doesn't write them to the file...
What's wrong with this code? Why can't I write the struct with the vector to a file? Is there a better/faster or more efficient way to write it?
Do I have to serialize the data?
Thanks...
Casting a std::string to char * will not produce the result you expect. Neither will using sizeof on it. The same goes for a std::vector.
For the vector you need to use either the std::vector::data method, or e.g. &data.data_block[0]. As for the size, use data.data_block.size() * sizeof(int).
Writing the string is another matter though, especially if it can be of variable length. You either have to write it as a fixed-length string, or write the length (in a fixed-size format) followed by the actual string, or write a terminator at the end of the string. To get a C-style pointer to the string use std::string::c_str.
Welcome to the merry world of C++ std::
Basically, vectors are meant to be used as opaque containers.
You can forget about reinterpret_cast right away.
Trying to shut the compiler up will allow you to create an executable, but it will produce silly results.
Basically, you can forget about most of the std::vector syntactic sugar that has to do with iterators, since your fstream will not access binary data through them (it would output a textual representation of your data).
But all is not lost.
You can access the vector's underlying array using the newly (C++11) introduced .data() method, though that somewhat defeats the point of using an opaque type.
const int * raw_ptr = data.data_block.data();
That will gain you 100 points of cool factor, instead of using the puny
const int * raw_ptr = &data.data_block[0];
You could also use the even more cryptic &data.data_block.front() for a cool-factor bonus of 50 points.
You can then write your glob of ints in one go (note that write takes a const char*, so a cast is needed):
f.write (reinterpret_cast<const char *>(raw_ptr), sizeof (raw_ptr[0]) * data.data_block.size());
Now if you want to do something really simple, try this:
for (std::size_t i = 0 ; i != data.data_block.size() ; i++)
    f.write (reinterpret_cast<const char *>(&data.data_block[i]), sizeof (data.data_block[i]));
This will consume a few more microseconds, which will be lost in background noise since the disk I/O will take much more time to complete the write.
Totally not cool, though.

C++ read()-ing from a socket to an ofstream

Is there a C/C++ way to read data from a socket using read(), with the receiving buffer being a file (ofstream) or a similar self-extending object (e.g. a vector)?
EDIT: The question arose while I contemplated how to read from a stream socket that may receive the contents of a, say, 10000+ byte file. I just never liked putting 20000 or 50000 bytes (large enough for now) on the stack as a buffer where the file could be stored temporarily until I could stick it into a file. Why not just stream it directly into the file to start with?
Much like you can get at the char* inside a std::string, I thought of something like
read( int fd, outFile.front(), std::npos ); // npos = INT_MAX
or something like that.
end edit
Thanks.
This is simplistic, and off the top of my fingers, but I think something along these lines would work out:
template <unsigned BUF_SIZE>
struct Buffer {
    char buf_[BUF_SIZE];
    int len_;
    Buffer () : buf_(), len_(0) {}
    int read (int fd) {
        int r = ::read(fd, buf_ + len_, BUF_SIZE - len_); // :: so we call the system read, not this member
        if (r > 0) len_ += r;
        return r;
    }
    int capacity () const { return BUF_SIZE - len_; }
};
template <unsigned BUF_SIZE>
struct BufferStream {
    typedef std::unique_ptr< Buffer<BUF_SIZE> > BufferPtr;
    std::vector<BufferPtr> stream_;
    BufferStream () : stream_(1, BufferPtr(new Buffer<BUF_SIZE>)) {}
    int read (int fd) {
        if ((*stream_.rbegin())->capacity() == 0)
            stream_.push_back(BufferPtr(new Buffer<BUF_SIZE>));
        return (*stream_.rbegin())->read(fd);
    }
};
In a comment, you mentioned you wanted to avoid creating a big char buffer. When using the read system call, it is generally more efficient to perform a few large reads rather than many small ones. So most implementations will opt for large input buffers to gain that efficiency. You could implement something like:
std::vector<char> input;
char in;
int r;
while ((r = read(fd, &in, 1)) == 1) input.push_back(in);
But that would involve a system call and at least one byte copied for every byte of input. In contrast, the code I put forth avoids extra data copies.
I don't really expect the code I put out to be the solution you would adopt. I just wanted to provide you with an illustration of how to create a self-extending object that was fairly space and time efficient. Depending on your purposes, you may want to extend it, or write your own. Off the top of my head, some improvements may be:
use std::list instead, to avoid vector resizing
allow API a parameter to specify how many bytes to read
use readv to always allow at least BUF_SIZE bytes (or more than BUF_SIZE bytes) to be read at a time
Take a look at stream support in boost::asio.