Iterator for reading a binary file - C++

I have to read a binary file in blocks of 8 bytes and then send those blocks over a TCP socket.
Can I use a C++ iterator for this task? Like:
FileIterator file("name_file.bin");
for(iter = file.begin(); iter != file.end(); iter++) {
sendTcp(iter);
}
Class FileIterator has to return some struct which will be sent.
In the constructor of FileIterator I open the binary file and read it. Then I create a dynamic array and write the file's content into it. At each step the iterator has to read the next block from the array, write it into the struct, and return it.

Yes you can!
You can use fstream with istream_iterator, like so:
auto f = std::ifstream("lol.bin", std::ios::binary | std::ios::in);
f.exceptions(std::ios::badbit);
f.unsetf(std::ios::skipws); // istream_iterator<char> uses operator>>, which would otherwise drop whitespace bytes
for (auto start = std::istream_iterator<char>{ f }, end = std::istream_iterator<char>{}; start != end; ++start)
{
...
}
Edit:
I didn't notice that you asked for 8-byte blocks. You can solve it like this:
First define an operator>>, for example:
struct My8Bytes {
char bytes[8];
};
std::istream& operator>>(std::istream& s, My8Bytes& bytes) {
s.read(bytes.bytes, sizeof(bytes.bytes));
return s;
}
and then use the iterator the same way as before, only now with your specific type:
for (auto start = std::istream_iterator<My8Bytes>{ f }, end = std::istream_iterator<My8Bytes>{}; start != end; ++start)
{
...
}
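The operator>> approach above can be sketched as a small self-contained unit. This is only an illustration under stated assumptions: Block8, readBlocks and readBlocksFromString are names made up here, and an in-memory std::istringstream stands in for the file. Because the operator uses read(), no whitespace is skipped, and a short final block sets failbit, which ends the iteration without yielding that block.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Fixed-size block; Block8 is a hypothetical name for this sketch.
struct Block8 {
    char bytes[8];
};

// Unformatted extraction: read() copies raw bytes and never skips whitespace.
// A short final read sets failbit, which turns the iterator into the end iterator.
std::istream& operator>>(std::istream& s, Block8& b) {
    s.read(b.bytes, sizeof(b.bytes));
    return s;
}

// Collect every complete 8-byte block from the stream.
std::vector<Block8> readBlocks(std::istream& in) {
    std::vector<Block8> blocks;
    for (std::istream_iterator<Block8> it(in), end; it != end; ++it) {
        blocks.push_back(*it);
    }
    return blocks;
}

// Convenience wrapper so the sketch can be tried without a file on disk.
std::vector<Block8> readBlocksFromString(const std::string& bytes) {
    std::istringstream in(bytes);
    return readBlocks(in);
}
```

Calling readBlocksFromString with a 19-byte string yields two blocks; the trailing 3 bytes are dropped, so code that must not lose them needs a separate check after the loop.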

I see this as an X-Y problem. Yes, it can be done with an iterator, but iterators aren't the best fit solution for this job. Using an iterator for this is an interesting educational experience, but going old school solves this problem with almost zero fuss and much easier error resolution.
#include <iostream>
#include <fstream>
#include <cstdint> // uint32_t
// simple 8 byte struct
struct EightByteStruct
{
uint32_t a;
uint32_t b;
};
// quick hack send routine. Added capacity for some simple error checking.
bool sendTcp(EightByteStruct & test)
{
bool rval = false;
// send test. Set rval true if success
return rval;
}
//main event: read file into struct, write struct to socket
int main()
{
std::ifstream in("filename", std::ios::binary);
EightByteStruct test;
while (in.read((char*)&test, sizeof(test)))
{ // will not enter if sizeof(test) bytes not read from file
if (!sendTcp(test)) // sendTcp returns true on success
{
// handle send error
}
}
// test here for any file error conditions you wish to have special handling
}
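The closing comment about testing for file error conditions can be made concrete. The following is a minimal sketch, not the answerer's code: ReadSummary, drain and drainString are hypothetical names, and counting blocks stands in for the send step. After the loop ends, gcount() reports how many bytes the failed read() managed to fetch, which distinguishes a clean end of file from a truncated final block, and bad() signals a genuine I/O error.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

struct EightByteStruct {
    std::uint32_t a;
    std::uint32_t b;
};

// Hypothetical summary of what the read loop saw.
struct ReadSummary {
    std::size_t fullBlocks;    // complete 8-byte blocks read
    std::streamsize leftover;  // bytes in a trailing partial block (0 if none)
    bool ioError;              // true if badbit was set
};

ReadSummary drain(std::istream& in) {
    ReadSummary s{0, 0, false};
    EightByteStruct block;
    while (in.read(reinterpret_cast<char*>(&block), sizeof(block))) {
        ++s.fullBlocks;  // a real program would send the block here
    }
    // The last read() failed; gcount() tells how many bytes it still got.
    s.leftover = in.gcount();
    s.ioError = in.bad();
    return s;
}

// Convenience wrapper so the sketch can be tried on in-memory data.
ReadSummary drainString(const std::string& bytes) {
    std::istringstream in(bytes);
    return drain(in);
}
```

A nonzero leftover means the file size was not a multiple of 8; whether to pad, send a short block, or report an error is a protocol decision the original question does not specify.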

Related

C, C++ extract struct member from binary file

I'm using the following code to extract a struct member from a binary file.
I'm wondering why this prints out multiple times? when there is only one ID record, and only one struct in the file. I need to access just this member, what is the best way to do it?
I don't really understand what the while loop is doing? Is it testing for whether the file is open and returning 1 until that point?
Why use fread inside the while loop?
Does the fread need to be set to the specific size of the struct member?
Is the printf statement reading the binary and outputting an int?
FILE *p;
struct myStruct x;
p=fopen("myfile","rb");
while(1) {
size_t n = fread(&x, sizeof(x), 1, p);
if (n == 0) {
break;
}
printf("\n\nID:%d", x.ID); // Use matching specifier
fflush(stdout); // Ensure output occurs promptly
}
fclose(p);
return 0;
The struct looks like this:
struct myStruct
{
int cm;
int bytes;
int ID;
int version;
char chunk[1];
};
Not really an answer but to answer a comment.
Just do
FILE *p = fopen("myfile","rb");
struct myStruct x;
size_t n = fread(&x, sizeof(x), 1, p);
if (n != 1) {
// Some error message
} else {
printf("\n\nID:%d\n", x.ID);
}
...Do as you wish with the rest of the file
I'm wondering why this prints out multiple times? when there is only one ID record, and only one struct in the file.
It won't! So if you have multiple prints the likely explanation is that the file contains more than just one struct. Another explanation could be that the file (aka the struct) was not saved in the same way as you use for reading.
I need to access just this member, what is the best way to do it?
Your approach looks fine to me.
I don't really understand what the while loop is doing?
The while is there because the code should be able to read multiple structs from the file. Using while(1) means something like "loop forever". To get out of such a loop, you use break. In your code the break happens when it can't read more structs from the file, i.e. if (n == 0) { break; }
Is it testing for whether the file is open and returning 1 until that point?
No - see answer above.
Why use fread inside the while loop?
As above: to be able to read multiple structs from the file.
Does the fread need to be set to the specific size of the struct member?
Well, fread is not "set" to anything. It is told how many elements to read and the size of each element. Therefore you call it with sizeof(x).
Is the printf statement reading the binary and outputting an int?
No, the reading is done by fread. Yes, printf outputs the decimal value.
You can try out this code:
#include <stdio.h>
#include <unistd.h>
struct myStruct
{
int cm;
int bytes;
int ID;
int version;
char chunk[1];
};
void rr()
{
printf("Reading file\n");
FILE *p;
struct myStruct x;
p=fopen("somefile","rb");
while(1) {
size_t n = fread(&x, sizeof(x), 1, p);
if (n == 0) {
break;
}
printf("\n\nID:%d", x.ID); // Use matching specifier
fflush(stdout); // Ensure output occurs promptly
}
fclose(p);
}
void ww()
{
printf("Creating file containing a single struct\n");
FILE *p;
struct myStruct x;
x.cm = 1;
x.bytes = 2;
x.ID = 3;
x.version = 4;
x.chunk[0] = 'a';
p=fopen("somefile","wb");
fwrite(&x, sizeof(x), 1, p);
fclose(p);
}
int main(void) {
if( access( "somefile", F_OK ) == -1 )
{
// If "somefile" isn't there already, call ww to create it
ww();
}
rr();
return 0;
}
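The point that fread is told an element size and an element count, and returns the number of complete elements read, can be checked with a small sketch. It is written in C++ using the C stdio functions; Record, countRecords and roundTrip are names invented here, and std::tmpfile() is assumed to be usable on the platform.

```cpp
#include <cstddef>
#include <cstdio>

// Same fields as myStruct above; renamed Record for this sketch.
struct Record {
    int cm;
    int bytes;
    int ID;
    int version;
    char chunk[1];
};

// Count how many complete Record-sized elements the stream holds.
std::size_t countRecords(std::FILE* fp) {
    Record r;
    std::size_t count = 0;
    // With a count of 1, fread returns 1 for a complete element and 0 otherwise.
    while (std::fread(&r, sizeof(r), 1, fp) == 1) {
        ++count;
    }
    return count;
}

// Write n records to a temporary file, rewind, and count them back.
// Returns 0 if the temporary file cannot be created or written.
std::size_t roundTrip(std::size_t n) {
    std::FILE* fp = std::tmpfile();
    if (!fp) {
        return 0;
    }
    Record r = {1, 2, 3, 4, {'a'}};
    for (std::size_t i = 0; i < n; ++i) {
        if (std::fwrite(&r, sizeof(r), 1, fp) != 1) {
            std::fclose(fp);
            return 0;
        }
    }
    std::rewind(fp);
    std::size_t count = countRecords(fp);
    std::fclose(fp);
    return count;
}
```

If the loop in the question prints the ID more than once, countRecords on the same file would return more than 1, which supports the explanation that the file holds more bytes than a single struct.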
Answers in-line
I'm wondering why this prints out multiple times? when there is only one ID record, and only one struct in the file. I need to access just this member, what is the best way to do it?
The file size is 2906 bytes and fread reads only sizeof(x) bytes at a time (about 20 with padding on typical platforms), so the loop runs many times.
I don't really understand what the while loop is doing? Is it testing for whether the file is open and returning 1 until that point?
The total number of elements successfully read is returned by fread
Why use fread inside the while loop?
In this case the while loop is not necessary; just one fread is enough. fread is sometimes used in a while loop when input from some other source, such as a UART, is being processed and the program has to wait until the requested number of bytes has been read.
Does the fread need to be set to the specific size of the struct member?
No. Reading the entire struct is better
Is the printf statement reading the binary and outputting an int?
No

Byte output to binary file C++

I'm writing Huffman coding and everything was OK until I tried to save the result into the archive file. Our teacher suggested doing it with a function like this (it takes one bit at a time and, after collecting 8 of them, should output a byte):
long buff=0;
int counter=0;
std::ofstream out("output", std::iostream::binary);
void putbit(bool b)
{
buff<<=1;
if (b) buff++;
counter++;
if (counter>=8)
{
out.put(buff);
counter=0;
buff=0;
}
}
I tried an example with inputting sequence of bits like this:
0011001011001101111010010001000001010101101100
but the output file in binary mode includes just: 1111111
As the buff variable has the correct numbers (25 102 250 68 21 108), I suspected that I had copied the code into my notebook incorrectly and that something is wrong with this line:
out.put(buff);
I tried to replace it with this line:
out << buff;
but got: 1111111111111111
Another way was:
out.write((char *) &buff, 8);
which gives:
100000001000000010000000100000001000000010000000
It looks like the closest to the correct answer, but it still doesn't work correctly.
Maybe I don't understand something about file output.
Question:
Could you explain to me how to make it work and why the previous variants are wrong?
UPD:
The input comes from this function:
void code(std::vector<bool> cur, std::vector<bool> sh, std::vector<bool>* codes, Node* r)
{
if (r->l)
{
cur.push_back(0);
if (r->l->symb)
{
putbit(0);
codes[(int)r->l->symb] = cur;
for (int i=7; i>=0; i--)
{
if ((int)r->l->symb & (1 << i))
putbit(1);
else putbit(0);
}
}
else
{
putbit(0);
code(cur, sh, codes, r->l);
}
cur.pop_back();
}
if (r->r)
{
cur.push_back(1);
if (r->r->symb)
{
putbit(1);
codes[(int)r->r->symb] = cur;
for (int i=7; i>=0; i--)
{
if ((int)r->r->symb & (1 << i))
putbit(1);
else putbit(0);
}
}
else
{
putbit(1);
code(cur, sh, codes, r->r);
}
cur.pop_back();
}
}
The thing is, your putbit function is working (though it's terrible: you use globals, and your buffer should be a char).
For example, this is how I tested your function.
out.open( "outfile", std::ios::binary );
if ( out.is_open() ) {
putbit(1);
putbit(1);
putbit(0);
putbit(1);
putbit(0);
putbit(1);
putbit(0);
putbit(0);
out.close();
}
This should output 1101 0100, or d4 in hex.
I believe this is an XY problem. The problem you're trying to solve is not in the putbit function but rather in the way you use it and in your algorithm.
You said that you had the right values before putting your data to the output file. There are many questions similar to yours on Stack Overflow; just look for them.
The real problem is that the putbit function alone is not enough. You rely on the fact that it will write a byte after you call it 8 times. What if the number of bits you write is not a multiple of 8? The leftover bits are never written. Also, you never flush your file (at least in the code you posted), so there's no guarantee that all data will be written.
First you must understand how file handles (streams) work. Open your file locally, check if it's open and close it when you're done. Closing also guarantees that all data in the file buffer is written to the file.
outfile.open( "output", std::ios::binary );
if ( outfile.is_open() ) {
// ... use file ...
outfile.close();
}
else {
// Couldn't open file!
}
Other questions solve this by writing, or using, a BitStream. It would look somewhat like this,
class OutBitstream {
public:
OutBitstream();
~OutBitstream(); // close file
bool isOpen();
void open( const std::string &file );
void close(); // close file, also write pending bits
void writeBit( bool b ); // or putbit, use the names you prefer
void writeByte( char c );
void writePendingBits(); // write the bits left in the buffer; there may be
// fewer than 8, so some padding is needed
private:
std::ofstream _out;
char _bitBuffer; //or std::bitset<8>
int _numbits;
};
With this interface it should be easier to handle bit input. No globals as well. I hope that helps.
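One possible shape for such a class is sketched below. It is a reduced version of the interface above, not the answerer's implementation: BitWriter and packBits are hypothetical names, the class writes to any std::ostream instead of owning a file, and leftover bits are padded with zeros on the right (a decoder would need to know the real bit count, which this sketch does not record).

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Minimal bit writer; all names here are made up for illustration.
class BitWriter {
public:
    explicit BitWriter(std::ostream& out) : out_(out), buffer_(0), count_(0) {}

    // Write any pending bits when the writer goes out of scope.
    ~BitWriter() { flush(); }

    // Append one bit, most significant bit first.
    void writeBit(bool b) {
        buffer_ = static_cast<unsigned char>((buffer_ << 1) | (b ? 1 : 0));
        if (++count_ == 8) {
            emit();
        }
    }

    // Pad a partial byte with zero bits on the right and write it.
    void flush() {
        if (count_ > 0) {
            buffer_ = static_cast<unsigned char>(buffer_ << (8 - count_));
            emit();
        }
    }

private:
    void emit() {
        out_.put(static_cast<char>(buffer_));
        buffer_ = 0;
        count_ = 0;
    }

    std::ostream& out_;
    unsigned char buffer_;  // bits collected so far
    int count_;             // number of valid bits in buffer_
};

// Pack a string of '0'/'1' characters into bytes (helper for trying it out).
std::string packBits(const std::string& bits) {
    std::ostringstream out;
    {
        BitWriter writer(out);
        for (char c : bits) {
            writer.writeBit(c == '1');
        }
    }  // destructor flushes the partial byte here
    return out.str();
}
```

With this sketch, packBits("11010100") produces the single byte 0xd4 from the test above, and packBits("101") produces 0xa0, i.e. the three bits followed by five padding zeros.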

How to read multiple structs from a binary file

I have written two instances ck1,ck2 of a struct named Cookie and have saved them in a binary file named "mydat" by calling a function :
bool s_cookie(Cookie myck,std::string fname) {
std::ofstream ofs(fname,std::ios::binary | std::ios::app);
if(!ofs) return false;
ofs.write((char *) &myck, sizeof(Cookie));
ofs.close();
return true;
}
Of course myck can be ck1, ck2, etc., and fname is the "mydat" binary file. So the two structs have both been saved in the same file.
Now I want to read them back into ck3 and ck4 respectively. How do I do that? Cookie looks like this:
struct Cookie {
std::string name;
std::string value;
unsigned short duration;
bool expired;
};
Thanks
It works much like writing, except that you read, provided Cookie is a POD:
std::ifstream ifs(fname,std::ios::binary);
Cookie ck3, ck4;
ifs.read((char *) &ck3, sizeof(Cookie));
ifs.read((char *) &ck4, sizeof(Cookie));
Also, you should check the result of each opening and reading operation and handle them.
Update: After your update and seeing Cookie, you cannot simply write it into a file: the std::string members hold pointers to heap memory, so dumping the object's bytes does not save the text. You should serialize it or define a well-defined protocol to read/write the data.
A simple workaround is (read the comments):
// Assume name and value are not longer than 99
// and you don't care about wasted space in the file
struct CookiePOD {
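CookiePOD() : name(), value(), duration(0), expired(false) {} // lets you declare an empty object to read into
CookiePOD(const Cookie &p)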
{
// I ignored bound checking !
std::copy(p.name.begin(), p.name.end(), name);
name[p.name.size()] = 0;
std::copy(p.value.begin(), p.value.end(), value);
value[p.value.size()] = 0;
duration = p.duration;
expired = p.expired;
}
char name[100];
char value[100];
unsigned short duration;
bool expired;
};
And then try to read/write CookiePOD instead of Cookie.
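The "serialize it" alternative mentioned above can be sketched with length-prefixed strings, which avoids the fixed 100-byte buffers. This is an illustration under assumptions: writeString, readString, writeCookie and readCookie are hypothetical helpers, the length prefix is a 32-bit integer in the host's byte order (so the file is only portable between machines with the same endianness), and a corrupt length is not guarded against.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

struct Cookie {
    std::string name;
    std::string value;
    unsigned short duration;
    bool expired;
};

// Write a string as a 32-bit length followed by its bytes.
void writeString(std::ostream& os, const std::string& s) {
    std::uint32_t len = static_cast<std::uint32_t>(s.size());
    os.write(reinterpret_cast<const char*>(&len), sizeof(len));
    os.write(s.data(), static_cast<std::streamsize>(len));
}

// Read a string written by writeString; returns false on a short read.
// Note: a corrupt length could request a huge allocation; real code should cap it.
bool readString(std::istream& is, std::string& s) {
    std::uint32_t len = 0;
    if (!is.read(reinterpret_cast<char*>(&len), sizeof(len))) {
        return false;
    }
    s.resize(len);
    if (len == 0) {
        return true;
    }
    return static_cast<bool>(is.read(&s[0], static_cast<std::streamsize>(len)));
}

void writeCookie(std::ostream& os, const Cookie& c) {
    writeString(os, c.name);
    writeString(os, c.value);
    os.write(reinterpret_cast<const char*>(&c.duration), sizeof(c.duration));
    os.put(c.expired ? 1 : 0);  // store the flag as a single byte
}

bool readCookie(std::istream& is, Cookie& c) {
    if (!readString(is, c.name) || !readString(is, c.value)) {
        return false;
    }
    if (!is.read(reinterpret_cast<char*>(&c.duration), sizeof(c.duration))) {
        return false;
    }
    char flag = 0;
    if (!is.get(flag)) {
        return false;
    }
    c.expired = (flag != 0);
    return true;
}
```

Reading ck3 and ck4 back then amounts to calling readCookie twice on a stream opened with std::ios::binary and checking both return values.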

Load a formatted binary file and assign information to structure c++

I've finally figured out how to write some specifically formatted information to a binary file, but now my problem is reading it back and building it back the way it originally was.
Here is my function to write the data:
void save_disk(disk aDisk)
{
ofstream myfile("disk01", ios::out | ios::binary);
int32_t entries;
entries = (int32_t) aDisk.current_file.size();
char buffer[10];
sprintf(buffer, "%d",entries);
myfile.write(buffer, sizeof(int32_t));
std::for_each(aDisk.current_file.begin(), aDisk.current_file.end(), [&] (const file_node& aFile)
{
myfile.write(aFile.name, MAX_FILE_NAME);
myfile.write(aFile.data, BLOCK_SIZE - MAX_FILE_NAME);
});
}
and my structure that it originally was created with and what I want to load it back into is composed as follows.
struct file_node
{
char name[MAX_FILE_NAME];
char data[BLOCK_SIZE - MAX_FILE_NAME];
file_node(){};
};
struct disk
{
vector<file_node> current_file;
};
I don't really know how to read it back in so that it is arranged the same way, but here is my pathetic attempt anyway (I just tried to reverse what I did for saving):
void load_disk(disk aDisk)
{
ifstream myFile("disk01", ios::in | ios::binary);
char buffer[10];
myFile.read(buffer, sizeof(int32_t));
std::for_each(aDisk.current_file.begin(), aDisk.current_file.end(), [&] (file_node& aFile)
{
myFile.read(aFile.name, MAX_FILE_NAME);
myFile.read(aFile.data, BLOCK_SIZE - MAX_FILE_NAME);
});
}
^^ This is absolutely wrong. ^^
I understand the basic operations of the ifstream, but really all I know how to do with it is read in a file of text, anything more complicated than that I'm kind of lost.
Any suggestions on how I can read this in?
You're very close. You need to write and read the length as binary.
This part of your length-write is wrong:
char buffer[10];
sprintf(buffer, "%d",entries);
myfile.write(buffer, sizeof(int32_t));
It writes the first four bytes of the buffer, but the buffer holds the decimal text produced by sprintf(), not the integer itself. You need to write the binary value of entries (the integer) instead:
// writing your entry count. htonl()/ntohl() are declared in <arpa/inet.h> on POSIX systems.
uint32_t entries = (uint32_t)aDisk.current_file.size();
entries = htonl(entries);
myfile.write((char*)&entries, sizeof(entries));
Then on the read:
// reading the entry count
uint32_t entries = 0;
myFile.read((char*)&entries, sizeof(entries));
entries = ntohl(entries);
// Use this to resize your vector; for_each has places to stuff data now.
aDisk.current_file.resize(entries);
std::for_each(aDisk.current_file.begin(), aDisk.current_file.end(), [&] (file_node& aFile)
{
myFile.read(aFile.name, MAX_FILE_NAME);
myFile.read(aFile.data, BLOCK_SIZE - MAX_FILE_NAME);
});
Or something like that.
Note 1: this does NO error checking. The htonl()/ntohl() pair stores the entry count in network (big-endian) byte order, so the count reads back correctly even if a big-endian machine writes the file and a little-endian machine reads it; the name and data fields are plain char arrays and need no conversion. That's probably more than you need, but you should at least be aware of it.
Note 2: Pass your input disk parameter to load_disk() by reference:
void load_disk(disk& aDisk)
EDIT Cleaning file_node content on construction
struct file_node
{
char name[MAX_FILE_NAME];
char data[BLOCK_SIZE - MAX_FILE_NAME];
file_node()
{
memset(name, 0, sizeof(name));
memset(data, 0, sizeof(data));
}
};
If you are using a compliant C++11 compiler:
struct file_node
{
char name[MAX_FILE_NAME];
char data[BLOCK_SIZE - MAX_FILE_NAME];
file_node() : name(), data() {}
};
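The binary length prefix used in this answer can also be written without htonl()/ntohl() by shifting the bytes out explicitly. The sketch below assumes hypothetical helper names writeCount and readCount and is an alternative to the answer's calls, not a replacement the answerer proposed; it produces the same big-endian layout on any host.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Write a 32-bit count as four bytes, most significant byte first.
void writeCount(std::ostream& os, std::uint32_t n) {
    const char bytes[4] = {
        static_cast<char>((n >> 24) & 0xFF),
        static_cast<char>((n >> 16) & 0xFF),
        static_cast<char>((n >> 8) & 0xFF),
        static_cast<char>(n & 0xFF)
    };
    os.write(bytes, sizeof(bytes));
}

// Read a count written by writeCount; returns false on a short read.
bool readCount(std::istream& is, std::uint32_t& n) {
    unsigned char b[4];
    if (!is.read(reinterpret_cast<char*>(b), sizeof(b))) {
        return false;
    }
    n = (static_cast<std::uint32_t>(b[0]) << 24) |
        (static_cast<std::uint32_t>(b[1]) << 16) |
        (static_cast<std::uint32_t>(b[2]) << 8) |
        static_cast<std::uint32_t>(b[3]);
    return true;
}
```

In load_disk, the value returned through readCount would be passed to current_file.resize() before the for_each loop, and a false return would be treated as a corrupt or truncated file.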

Having trouble serializing binary data using ifstream and ofstream

I am trying to serialize a Plain Old Datastructure using ifstream and ofstream and I wasn't able to get it to work. I then tried to reduce my problem to an ultra basic serialization of just a char and int and even that didn't work. Clearly I'm missing something at a core fundamental level.
For a basic structure:
struct SerializeTestStruct
{
char mCharVal;
unsigned int mIntVal;
void Serialize(std::ofstream& ofs);
};
With serialize function:
void SerializeTestStruct::Serialize(std::ofstream& ofs)
{
bool isError = (false == ofs.good());
if (false == isError)
{
ofs.write((char*)&mCharVal, sizeof(mCharVal));
ofs.write((char*)&mIntVal, sizeof(mIntVal));
}
}
Why would this fail with the following short program?
//ultra basic serialization test.
SerializeTestStruct* testStruct = new SerializeTestStruct();
testStruct->mCharVal = 'y';
testStruct->mIntVal = 9;
//write
std::string testFileName = "test.bin";
std::ofstream fileOut(testFileName.data());
fileOut.open(testFileName.data(), std::ofstream::binary|std::ofstream::out);
fileOut.clear();
testStruct->Serialize(fileOut);
fileOut.flush();
fileOut.close();
delete testStruct;
//read
char * memblock;
std::ifstream fileIn (testFileName.data(), std::ifstream::in|std::ifstream::binary);
if (fileIn.is_open())
{
// get length of file:
fileIn.seekg (0, std::ifstream::end);
int length = fileIn.tellg();
fileIn.seekg (0, std::ifstream::beg);
// allocate memory:
memblock = new char [length];
fileIn.read(memblock, length);
fileIn.close();
// read data as a block:
SerializeTestStruct* testStruct2 = new(memblock) SerializeTestStruct();
delete[] testStruct2;
}
When I run through the code I notice that memblock has a "y" at the top so maybe it is working and it's just a problem with the placement new at the very end? After that placement new I end up with a SerializeTestStruct with values: 0, 0.
Here is how your stuff should read:
#include <fstream>
#include <iostream> // std::cout
#include <string>
#include <stdexcept>
struct SerializeTestStruct
{
char mCharVal;
unsigned int mIntVal;
void Serialize(::std::ostream &os);
static SerializeTestStruct Deserialize(::std::istream &is);
};
void SerializeTestStruct::Serialize(std::ostream &os)
{
if (os.good())
{
os.write((char*)&mCharVal, sizeof(mCharVal));
os.write((char*)&mIntVal, sizeof(mIntVal));
}
}
SerializeTestStruct SerializeTestStruct::Deserialize(std::istream &is)
{
SerializeTestStruct retval;
if (is.good())
{
is.read((char*)&retval.mCharVal, sizeof(retval.mCharVal));
is.read((char*)&retval.mIntVal, sizeof(retval.mIntVal));
}
if (is.fail()) {
throw ::std::runtime_error("failed to read full struct");
}
return retval;
}
int main(int argc, const char *argv[])
{
//ultra basic serialization test.
// setup
const ::std::string testFileName = "test.bin";
// write
{
SerializeTestStruct testStruct;
testStruct.mCharVal = 'y';
testStruct.mIntVal = 9;
// Open once, in binary mode. Calling open() on an already-open stream fails and sets failbit.
::std::ofstream fileOut(testFileName.c_str(),
std::ofstream::binary|std::ofstream::out);
testStruct.Serialize(fileOut);
}
// read
{
::std::ifstream fileIn (testFileName.c_str(),
std::ifstream::in|std::ifstream::binary);
if (fileIn.is_open())
{
SerializeTestStruct testStruct = \
SerializeTestStruct::Deserialize(fileIn);
::std::cout << "testStruct.mCharVal == '" << testStruct.mCharVal
<< "' && testStruct.mIntVal == " << testStruct.mIntVal
<< '\n';
}
}
return 0;
}
Style issues:
Don't use new to create things if you can help it. Stack allocated objects are usually what you want and significantly easier to manage than the arbitrary lifetime objects you allocate from the heap. If you do use new, consider using a smart pointer type of some kind to help manage the lifetime for you.
Serialization and deserialization code should be matched up so that they can be examined and altered together. This makes maintenance of such code much easier.
Rely on C++ to clean things up for you with destructors; that's what they're for. This means wrapping parts of your code in blocks when the variables they use have a relatively confined scope.
Don't needlessly use flags.
Mistakes...
Don't use the data member function of ::std::string.
Using placement new and that memory block thing is a really bad idea because it's ridiculously complex. And if you did use it, then you would not use array delete in the way you did. And lastly, it won't work anyway for a reason explained later.
Do not use ofstream in the type taken by your Serialize function as it is a derived class whose features you don't need. You should always use the most base class in a hierarchy that has the features you need unless you have a very specific reason not to. Serialize works fine with the features of the base ostream class, so use that type instead.
The on-disk layout of your structure and the in memory layout do not match, so your placement new technique is doomed to fail. As a rule, if you have a serialize function, you need a matching deserialize function.
Here is a further explanation of your memory layout issue. The structure, in memory, on an x86_64 based Linux box looks like this:
+------------+-----------+
|Byte number | contents |
+============+===========+
| 0 | 0x79 |
| | (aka 'y') |
+------------+-----------+
| 1 | padding |
+------------+-----------+
| 2 | padding |
+------------+-----------+
| 3 | padding |
+------------+-----------+
| 4 | 9 |
+------------+-----------+
| 5 | 0 |
+------------+-----------+
| 6 | 0 |
+------------+-----------+
| 7 | 0 |
+------------+-----------+
The contents of the padding section are undefined, but generally 0. It doesn't matter though because that space is never used and merely exists so that access to the following int lies on an efficient 4-byte boundary.
The size of your structure on disk is 5 bytes, and is completely missing the padding sections. So that means when you read it into memory it won't line up properly with the in memory structure at all and accessing it is likely to cause some kind of horrible problem.
The first rule: if you need a serialize function, you need a deserialize function. Second rule: unless you really know exactly what you are doing, do not dump raw memory into a file. This will work just fine in many cases, but there are important cases in which it won't work. And unless you are aware of what does and doesn't work, and when it does or doesn't work, you will end up with code that seems to work OK in certain test situations, but fails miserably when you try to use it in a real system.
My code still does dump memory into a file. And it should work as long as you read the result back on exactly the same architecture and platform with code compiled with the same version of the compiler as when you wrote it. As soon as one of those variables changes, all bets are off.
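The mismatch between the on-disk and in-memory sizes described above can be checked at compile time. The constant names below are made up for this sketch; the exact values depend on the platform, which is why the comments give none.

```cpp
#include <cstddef>

struct SerializeTestStruct {
    char mCharVal;
    unsigned int mIntVal;
};

// Bytes produced by the two member-wise write() calls.
constexpr std::size_t kOnDiskSize = sizeof(char) + sizeof(unsigned int);

// Bytes the struct occupies in memory, padding included.
constexpr std::size_t kInMemorySize = sizeof(SerializeTestStruct);

// Padding bytes inserted by the compiler (platform dependent).
constexpr std::size_t kPaddingBytes = kInMemorySize - kOnDiskSize;

// Where mIntVal really starts inside the struct.
constexpr std::size_t kIntOffset = offsetof(SerializeTestStruct, mIntVal);
```

On the x86_64 Linux layout shown in the table, these would come out as 5, 8, 3 and 4 respectively; a file holding 5 bytes therefore cannot be overlaid directly on the 8-byte struct.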
bool isError = (false == ofs.good());
if (false == isError)
{
ofs.write((char*)&mCharVal, sizeof(mCharVal));
ofs.write((char*)&mIntVal, sizeof(mIntVal));
}
change to
if ( ofs.good() )
{
ofs.write((char*)&mCharVal, sizeof(mCharVal));
ofs.write((char*)&mIntVal, sizeof(mIntVal));
}
I would do:
std::ostream & operator << ( std::ostream &os, const SerializeTestStruct &mystruct )
{
if ( os.good() )
{
os.write((const char*)&mystruct.mCharVal, sizeof(mystruct.mCharVal));
os.write((const char*)&mystruct.mIntVal, sizeof(mystruct.mIntVal));
}
return os;
}
The problem is here:
SerializeTestStruct* testStruct2 = new(memblock) SerializeTestStruct();
This will construct a value-initialized object of type SerializeTestStruct in the previously allocated memory. It will fill the memblock with zeros, since value-initialization is zero-initialization for POD types.
Here's a fast fix for your code:
SerializeTestStruct* testStruct2 = new SerializeTestStruct;
fileIn.read( (char*)&testStruct2->mCharVal, sizeof(testStruct2->mCharVal) );
fileIn.read( (char*)&testStruct2->mIntVal, sizeof(testStruct2->mIntVal) );
fileIn.close();
// do something with testStruct2
// ...
delete testStruct2;
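The zeroing effect described in this answer can be observed directly. In the sketch below, placementNewZeroes and fromBytes are hypothetical helpers: the first fills a buffer with non-zero bytes and shows that `new (buf) T()` wipes them, the second copies the packed 5-byte on-disk layout into the members one by one, which is what the fix above does with read().

```cpp
#include <cstring>
#include <new>

struct SerializeTestStruct {
    char mCharVal;
    unsigned int mIntVal;
};

// Returns true if placement new with () overwrote the buffer contents with zeros.
bool placementNewZeroes() {
    alignas(SerializeTestStruct) unsigned char buf[sizeof(SerializeTestStruct)];
    std::memset(buf, 0x79, sizeof(buf));  // pretend these bytes came from a file
    SerializeTestStruct* p = new (buf) SerializeTestStruct();  // value-initialization
    const bool zeroed = (p->mCharVal == 0 && p->mIntVal == 0);
    p->~SerializeTestStruct();  // trivial, shown for symmetry
    return zeroed;
}

// Rebuild the struct from the packed layout: 1 char followed directly by the int.
SerializeTestStruct fromBytes(const unsigned char* bytes) {
    SerializeTestStruct s;
    std::memcpy(&s.mCharVal, bytes, sizeof(s.mCharVal));
    std::memcpy(&s.mIntVal, bytes + sizeof(s.mCharVal), sizeof(s.mIntVal));
    return s;
}
```

This is why the question saw the values 0, 0 after the placement new: the data read from the file was overwritten before it was ever inspected.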
In my opinion, you need to allow serialization to a buffer rather than directly to a stream. Writing to a buffer allows nested or inherited classes to write to memory, and then the whole buffer can be written to the stream. Writing bits and pieces to the stream is not efficient.
Here is something I concocted, before I stopped writing binary data to streams:
struct Serialization_Interface
{
//! Returns size occupied on a stream.
/*! Note: size on the platform may be different.
* This method is used to allocate memory.
*/
virtual size_t size_on_stream(void) const = 0;
//! Stores the fields of the object to the given pointer.
/*! Pointer is incremented by the size on the stream.
*/
virtual void store_to_buffer(unsigned char *& p_buffer) const = 0;
//! Loads the object's fields from the buffer, advancing the pointer.
virtual void load_from_buffer(const unsigned char *& p_buffer) = 0;
};
struct Serialize_Test_Structure
: Serialization_Interface
{
char mCharVal;
int mIntVal;
size_t size_on_stream(void) const
{
return sizeof(mCharVal) + sizeof(mIntVal);
}
void store_to_buffer(unsigned char *& p_buffer) const
{
*p_buffer++ = mCharVal;
std::memcpy(p_buffer, &mIntVal, sizeof(mIntVal)); // needs <cstring>; avoids a misaligned int store
p_buffer += sizeof(mIntVal);
return;
}
void load_from_buffer(const unsigned char *& p_buffer)
{
mCharVal = *p_buffer++;
std::memcpy(&mIntVal, p_buffer, sizeof(mIntVal)); // needs <cstring>; avoids a misaligned int load
p_buffer += sizeof(mIntVal);
return;
}
};
int main(void)
{
Serialize_Test_Structure myStruct; // the type defined above
myStruct.mCharVal = 'G';
myStruct.mIntVal = 42;
// Allocate a buffer:
unsigned char * buffer = new unsigned char[myStruct.size_on_stream()];
// Create output file (binary mode, so no newline translation occurs).
std::ofstream outfile("data.bin", std::ios::binary);
// Does your design support this concept?
unsigned char * p_buffer = buffer;
myStruct.store_to_buffer(p_buffer);
outfile.write((char *) buffer, myStruct.size_on_stream());
outfile.close();
delete[] buffer; // release the buffer allocated above
return 0;
}
I stopped writing binary data to streams in favor of textual data because textual data doesn't have to worry about endianness or which IEEE floating point format is accepted by the receiving platform.
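A textual round trip of the same two fields might look like the sketch below. Sample, saveText and loadText are hypothetical names, and the format (the character's numeric code, a space, the integer, a newline) is just one possible choice; storing the character as a number keeps whitespace characters from being lost on input.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical two-field record, mirroring mCharVal and mIntVal.
struct Sample {
    char c;
    int n;
};

// Write the record as "<char code> <int>\n".
void saveText(std::ostream& os, const Sample& s) {
    os << static_cast<int>(s.c) << ' ' << s.n << '\n';
}

// Parse a record written by saveText; returns false if parsing fails.
bool loadText(std::istream& is, Sample& s) {
    int code = 0;
    int value = 0;
    if (!(is >> code >> value)) {
        return false;
    }
    s.c = static_cast<char>(code);
    s.n = value;
    return true;
}
```

The cost is a larger file and slower parsing than a raw dump, in exchange for a format that does not depend on padding or byte order.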
Am I the only one who finds this totally opaque:
bool isError = (false == ofs.good());
if (false == isError) {
// stuff
}
why not:
if ( ofs ) {
// stuff
}
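One detail worth knowing when making this swap: the boolean test on a stream is true when neither failbit nor badbit is set, while good() additionally requires eofbit to be clear. For an output stream the two rarely differ, but on input they can. The sketch below uses hypothetical names (StreamState, stateAfterReadingInt) to show the difference.

```cpp
#include <sstream>
#include <string>

// Snapshot of the state flags after one extraction.
struct StreamState {
    bool good;    // in.good(): no flag set at all
    bool usable;  // static_cast<bool>(in): neither failbit nor badbit set
    bool eof;     // in.eof()
};

// Parse one int from the text and report the stream state afterwards.
StreamState stateAfterReadingInt(const std::string& text) {
    std::istringstream in(text);
    int value = 0;
    in >> value;
    return { in.good(), static_cast<bool>(in), in.eof() };
}
```

For the input "42" the extraction succeeds but hits the end of the data, so the stream still tests true while good() is already false; code that loops on good() would discard that last value.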