C, C++ extract struct member from binary file

I'm using the following code to extract a struct member from a binary file.
I'm wondering why this prints multiple times when there is only one ID record, and only one struct, in the file. I need to access just this member; what is the best way to do it?
I don't really understand what the while loop is doing. Is it testing whether the file is open and returning 1 until that point?
Why use fread inside the while loop?
Does the fread need to be set to the specific size of the struct member?
Is the printf statement reading the binary and outputting an int?
FILE *p;
struct myStruct x;
p=fopen("myfile","rb");
if (p == NULL) { // stop if the file could not be opened
    perror("myfile");
    return 1;
}
while(1) {
    size_t n = fread(&x, sizeof(x), 1, p);
    if (n == 0) {
        break;
    }
    printf("\n\nID:%d", x.ID); // Use matching specifier
    fflush(stdout); // Ensure output occurs promptly
}
fclose(p);
return 0;
The struct looks like this:
struct myStruct
{
int cm;
int bytes;
int ID;
int version;
char chunk[1];
};

Not really an answer, but to answer a comment: just do
FILE *p = fopen("myfile","rb");
struct myStruct x;
size_t n = (p != NULL) ? fread(&x, sizeof(x), 1, p) : 0;
if (n != 1) {
    // Some error message: the file is missing or shorter than one struct
} else {
    printf("\n\nID:%d\n", x.ID);
}
// ...do as you wish with the rest of the file, then fclose(p) if p != NULL

I'm wondering why this prints multiple times when there is only one ID record, and only one struct, in the file.
It won't! So if you get multiple prints, the likely explanation is that the file contains more than one struct. Another explanation could be that the file was not written with the same struct layout (field order, int size, padding) that you use for reading, so the bytes don't split into records the way you expect.
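If you are unsure how many records the file really holds, you can compare its size with sizeof(struct myStruct). A minimal sketch in C++ with <cstdio>; countRecords is a hypothetical helper name, and it assumes the stream supports seeking to SEEK_END (true on common POSIX and Windows implementations, although the C standard does not require it for binary streams):

```cpp
#include <cassert>
#include <cstdio>

// Same layout as the struct in the question.
struct myStruct {
    int cm;
    int bytes;
    int ID;
    int version;
    char chunk[1];
};

// Returns how many whole myStruct records fit in the file, or -1 on error.
// *leftover receives the number of trailing bytes that don't form a full record.
long countRecords(std::FILE* f, long* leftover) {
    if (std::fseek(f, 0, SEEK_END) != 0) return -1; // assumption: SEEK_END works
    long size = std::ftell(f);
    if (size < 0) return -1;
    std::rewind(f); // put the position back at the start for later reads
    if (leftover) *leftover = size % static_cast<long>(sizeof(myStruct));
    return size / static_cast<long>(sizeof(myStruct));
}
```

If this reports more than one record, or a non-zero leftover, the file was not written as a single struct myStruct with your compiler's layout, and the fread loop will print once per record it finds.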
I need to access just this member, what is the best way to do it?
Your approach looks fine to me.
I don't really understand what the while loop is doing.
The while is there because the code should be able to read multiple structs from the file. Using while(1) means "loop forever"; to get out of such a loop, you use break. In your code the break happens when fread can't read another complete struct from the file, i.e. if (n == 0) { break; }
Is it testing for whether the file is open and returning 1 until that point?
No - see answer above.
Why use fread inside the while loop?
As above: to be able to read multiple structs from the file.
Does the fread need to be set to the specific size of the struct member?
Well, fread is not "set" to anything. It is told how many elements to read and the size of each element. Therefore you call it with sizeof(x).
Is the printf statement reading the binary and outputting an int?
No, the reading is done by fread. Yes, printf outputs the decimal value.
You can try out this code:
#include <stdio.h>
#include <unistd.h> // access() and F_OK (POSIX)
struct myStruct
{
    int cm;
    int bytes;
    int ID;
    int version;
    char chunk[1];
};
void rr()
{
    printf("Reading file\n");
    FILE *p;
    struct myStruct x;
    p=fopen("somefile","rb");
    if (p == NULL) {
        perror("somefile");
        return;
    }
    while(1) {
        size_t n = fread(&x, sizeof(x), 1, p);
        if (n == 0) {
            break; // no complete struct left in the file
        }
        printf("\n\nID:%d", x.ID); // Use matching specifier
        fflush(stdout); // Ensure output occurs promptly
    }
    fclose(p);
}
void ww()
{
    printf("Creating file containing a single struct\n");
    FILE *p;
    struct myStruct x;
    x.cm = 1;
    x.bytes = 2;
    x.ID = 3;
    x.version = 4;
    x.chunk[0] = 'a';
    p=fopen("somefile","wb");
    if (p == NULL) {
        perror("somefile");
        return;
    }
    fwrite(&x, sizeof(x), 1, p);
    fclose(p);
}
int main(void) {
    if( access( "somefile", F_OK ) == -1 )
    {
        // If "somefile" isn't there already, call ww to create it
        ww();
    }
    rr();
    return 0;
}

Answers in-line
I'm wondering why this prints multiple times when there is only one ID record, and only one struct, in the file. I need to access just this member; what is the best way to do it?
The file size is 2906 bytes and fread only reads sizeof(struct myStruct) bytes at a time (17 bytes of members, typically padded to 20), and this goes on in a loop.
I don't really understand what the while loop is doing. Is it testing whether the file is open and returning 1 until that point?
fread returns the total number of elements successfully read; the loop stops when that number is 0.
Why use fread inside the while loop?
In this case the while loop is not necessary; just one fread is enough. fread is sometimes used in a while loop when input comes from some other source, like a UART, and the program has to wait until the requested number of bytes has been read.
Does the fread need to be set to the specific size of the struct member?
No. Reading the entire struct is better
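If you really only want the one member, you can seek straight to it instead of reading the whole struct. A minimal sketch, assuming the file was written by a program with the same struct layout (same int size and padding); offsetof gives the member's position inside the struct, and readIdOnly is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef> // offsetof
#include <cstdio>

struct myStruct {
    int cm;
    int bytes;
    int ID;
    int version;
    char chunk[1];
};

// Reads only the ID member of record number `index` (0-based) without
// reading the rest of the struct. Returns true on success.
bool readIdOnly(std::FILE* f, long index, int* id) {
    long offset = index * static_cast<long>(sizeof(myStruct))
                + static_cast<long>(offsetof(myStruct, ID));
    if (std::fseek(f, offset, SEEK_SET) != 0) return false;
    return std::fread(id, sizeof *id, 1, f) == 1; // exactly one int
}
```

In practice reading the whole struct, as above, is simpler and not noticeably slower for records this small; seeking only pays off when records are large.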
Is the printf statement reading the binary and outputting an int?
No. fread does the reading; printf only formats the int already stored in x.ID.


Byte output to binary file C++

I'm writing Huffman coding and everything was OK until I tried to save the result into the archive file. Our teacher suggested doing it with this function (it takes one bit at a time and, after collecting 8 of them, outputs a byte):
long buff=0;
int counter=0;
std::ofstream out("output", std::iostream::binary);
void putbit(bool b)
{
buff<<=1;
if (b) buff++;
counter++;
if (counter>=8)
{
out.put(buff);
counter=0;
buff=0;
}
}
I tried an example with this input sequence of bits:
0011001011001101111010010001000001010101101100
but the output file in binary mode contains just: 1111111
As the buff variable holds the correct numbers (25 102 250 68 21 108), I suspected that I had copied the code into my notebook incorrectly and that something is wrong with this line:
out.put(buff);
I tried replacing it with this line:
out << buff;
but got: 1111111111111111
Another way was:
out.write((char *) &buff, 8);
which gives:
100000001000000010000000100000001000000010000000
It looks like the closest to the correct answer, but it still doesn't work correctly.
Maybe I don't understand something about file output.
Question:
Could you explain how to make it work and why the previous variants are wrong?
Update:
The input comes from this function:
void code(std::vector<bool> cur, std::vector<bool> sh, std::vector<bool>* codes, Node* r)
{
if (r->l)
{
cur.push_back(0);
if (r->l->symb)
{
putbit(0);
codes[(int)r->l->symb] = cur;
for (int i=7; i>=0; i--)
{
if ((int)r->l->symb & (1 << i))
putbit(1);
else putbit(0);
}
}
else
{
putbit(0);
code(cur, sh, codes, r->l);
}
cur.pop_back();
}
if (r->r)
{
cur.push_back(1);
if (r->r->symb)
{
putbit(1);
codes[(int)r->r->symb] = cur;
for (int i=7; i>=0; i--)
{
if ((int)r->r->symb & (1 << i))
putbit(1);
else putbit(0);
}
}
else
{
putbit(1);
code(cur, sh, codes, r->r);
}
cur.pop_back();
}
}
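The thing is, your putbit function is working (though it's not great: it uses globals, and your buffer should be an unsigned char).
For example, this is how I tested your function (this assumes out is declared as std::ofstream out; without a file name; if it was already opened with "output", open() fails on the already-open stream):
out.open( "outfile", std::ios::binary );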
if ( out.is_open() ) {
putbit(1);
putbit(1);
putbit(0);
putbit(1);
putbit(0);
putbit(1);
putbit(0);
putbit(0);
out.close();
}
This should output 1101 0100, or d4 in hex.
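To check what actually ended up in the file, it helps to print its bytes as bits instead of opening it in a text editor. A small sketch; toBitString and readFileBytes are hypothetical helper names:

```cpp
#include <bitset>
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Returns every byte of `bytes` as 8 binary digits, separated by spaces.
std::string toBitString(const std::string& bytes) {
    std::string result;
    for (unsigned char c : bytes) {
        if (!result.empty()) result += ' ';
        result += std::bitset<8>(c).to_string(); // most significant bit first
    }
    return result;
}

// Loads a whole file in binary mode (empty string if it can't be opened).
std::string readFileBytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

For the test above, printing toBitString(readFileBytes("outfile")) should show 11010100.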
I believe this is an XY problem. The problem you're trying to solve is not in the putbit function but in the way you use it and in your algorithm.
You said that you had the right values before writing your data to the output file. There are many similar questions to yours on Stack Overflow; just look for them.
The real problem is that the putbit function is not enough to solve your problem. You rely on the fact that it writes a byte after you call it 8 times. What if the total number of bits is not a multiple of 8? The last, partial byte stays in buff and is never written. Also, you never flush or close your file (at least in the code you posted), so there's no guarantee that all data will be written.
First you must understand how file handles (streams) work. Open your file locally, check if it's open and close it when you're done. Closing also guarantees that all data in the file buffer is written to the file.
outfile.open( "output", std::ios::binary );
if ( outfile.is_open() ) {
// ... use file ...
outfile.close();
}
else {
// Couldn't open file!
}
Other questions solve this by writing, or using, a bit stream class. It would look somewhat like this:
class OutBitstream {
public:
OutBitstream();
~OutBitstream(); // close file
bool isOpen();
void open( const std::string &file );
void close(); // close file, also write pending bits
void writeBit( bool b ); // or putbit, use the names you prefer
void writeByte( char c );
void writePendingBits(); // write the bits left in the buffer; there may
                         // be fewer than 8, so you may have to pad the last byte
private:
std::ofstream _out;
char _bitBuffer; //or std::bitset<8>
int _numbits;
};
With this interface it should be easier to handle bit input. No globals as well. I hope that helps.
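A minimal sketch of the bit stream idea, simplified under assumptions: BitWriter is a hypothetical name, it writes to a std::ostream you pass in (for example an std::ofstream opened with std::ios::binary) instead of opening the file itself, and it pads the last byte with zero bits on the right. The decoder then needs to know how many bits are valid, for example from a bit count stored in a header:

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

class BitWriter {
public:
    explicit BitWriter(std::ostream& out) : out_(out) {}
    ~BitWriter() { flush(); } // never lose the last partial byte

    // Appends one bit; every 8th bit completes a byte, which is written.
    void writeBit(bool b) {
        buffer_ = static_cast<unsigned char>((buffer_ << 1) | (b ? 1 : 0));
        if (++count_ == 8) emit();
    }

    // Pads the pending bits with zeros on the right and writes them as one byte.
    void flush() {
        if (count_ == 0) return;
        buffer_ = static_cast<unsigned char>(buffer_ << (8 - count_));
        emit();
    }

private:
    void emit() {
        out_.put(static_cast<char>(buffer_));
        buffer_ = 0;
        count_ = 0;
    }

    std::ostream& out_;
    unsigned char buffer_ = 0; // unsigned, so shifts behave predictably
    int count_ = 0;            // number of bits currently in buffer_
};
```

With the 46-bit sequence from the question this writes 5 full bytes plus one padded byte, 6 bytes in total.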

Trouble with fscanf() in c programming

I am trying to read some data from a file called "data" with a specific format. The data in this file is:
0 mpi_write() 100
1 mpi_write() 200
2 mpi_write() 300
4 mpi_write() 400
5 mpi_write() 1000
The code is as follows:
#include<stdlib.h>
#include<stdio.h>
typedef struct tracetype{
int pid;
char* operation;
int size;
}tracetyper;
void main(){
FILE* file1;
file1=fopen("./data","r");
if(file1==NULL){
printf("cannot open file");
exit(1);
}else{
tracetyper* t=(tracetyper*)malloc(sizeof(tracetyper));
while(feof(file1)!=EOF){
fscanf(file1,"%d %s %d\n",&t->pid,t->operation,&t->size);
printf("pid:%d,operation:%s,size:%d",t->pid,t->operation,t->size);
}
free(t);
}
fclose(file1);
}
When running with gdb, I found that fscanf doesn't write data to t->pid, t->operation and t->size. Is anything wrong with my code? Please help me!
Your program has undefined behavior: you are reading %s data into an uninitialized char* pointer. You need to either allocate operation with malloc or, if you know the maximum length is, say, 20 characters, put a fixed-size array for it into the struct itself:
typedef struct tracetype{
int pid;
char operation[21]; // +1 for null terminator
int size;
} tracetyper;
When you read %s data, you should always tell fscanf the limit on the length, like this:
fscanf(file1,"%d %20s %d\n",&t->pid,t->operation,&t->size);
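Finally, you should remove the \n at the end of the format string, and check the number of values fscanf returns instead of checking feof (feof returns 0 or some nonzero value, not EOF, so feof(file1)!=EOF is not a reliable stop condition), like this: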
for (;;) { // Infinite loop
...
if (fscanf(file1,"%d %20s %d",&t->pid,t->operation,&t->size) != 3) {
break;
}
...
}
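Putting those fixes together (a fixed-size array, the %20s width, and checking that fscanf matched all 3 fields), a minimal sketch; Trace and readTraces are hypothetical names standing in for tracetyper and your loop, and storing records in an array is an assumption about what you want to do with them:

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

struct Trace {
    int pid;
    char operation[21]; // up to 20 characters + null terminator
    int size;
};

// Reads "pid operation size" records until one fails to match or EOF.
// Returns the number of complete records stored in `out` (at most `max`).
int readTraces(std::FILE* file, Trace* out, int max) {
    int count = 0;
    while (count < max &&
           std::fscanf(file, "%d %20s %d", &out[count].pid,
                       out[count].operation, &out[count].size) == 3) {
        ++count;
    }
    return count;
}
```

No malloc is needed for operation here, because the array lives inside the struct.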
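You should loop with something like:
while (fscanf(file1, "%d %20s %d", &t->pid, t->operation, &t->size) == 3) { // %20s assumes operation holds at least 21 chars
    printf("pid:%d,operation:%s,size:%d\n", t->pid, t->operation, t->size);
}
Comparing with 3 instead of EOF also ends the loop when a line doesn't match; with != EOF, fscanf would keep returning 0 on that line and the loop would never end.
You also need to allocate memory for the char array in the structure (for example with malloc).
Also, insert a check for t, such as:
if (t == NULL)
    cleanup(); // placeholder for your own error handling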

Heap Corruption caused by Invalid Casting?

I have the code:
unsigned char *myArray = new unsigned char[40000];
char pixelInfo[3];
int c = 0;
while(!reader.eof()) //reader is a ifstream open to a BMP file
{
reader.read(pixelInfo, 3);
myArray[c] = (unsigned char)pixelInfo[0];
myArray[c + 1] = (unsigned char)pixelInfo[1];
myArray[c + 2] = (unsigned char)pixelInfo[2];
c += 3;
}
reader.close();
delete[] myArray; //I get HEAP CORRUPTION here
After some tests, I found it to be caused by the cast in the while loop, if I use a signed char myArray I don't get the error, but I must use unsigned char for the rest of my code.
Casting pixelInfo to unsigned char also gives the same error.
Is there any solution to this?
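This is what you should do:
reader.read((char*)myArray, myArraySize); /* myArraySize is the 40000 you allocated; note, that isn't (sizeof myArray), which is the size of a pointer */
if (!reader) { /* report error; reader.gcount() tells you how many bytes were actually read */ }
If there's processing going on inside the loop, then: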
int c = 0;
while (c + 2 < myArraySize) //reader is a ifstream open to a BMP file
{
reader.read(pixelInfo, 3);
myArray[c] = (unsigned char)pixelInfo[0];
myArray[c + 1] = (unsigned char)pixelInfo[1];
myArray[c + 2] = (unsigned char)pixelInfo[2];
c += 3;
}
Trying to read after you've hit the end is not a problem; you'll get junk in the rest of the array, but you can deal with that at the end.
Assuming your array is big enough to hold the whole file invites buffer overruns. Buffer overrun attacks involving image files with carefully crafted incorrect metadata are quite well known:
in Mozilla
in Sun Java
in Internet Explorer
in Windows Media Player
again in Mozilla
in MSN Messenger
in Windows XP
Do not rely on the entire file content fitting in the calculated buffer size.
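One way to avoid relying on a precomputed size is to let a container grow with the data. A minimal sketch; readAllBytes is a hypothetical name, and for a BMP you would still validate the header fields against bytes.size() before indexing into the pixel data:

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Reads every remaining byte of `in` into a vector that grows as needed,
// so the buffer size never has to be guessed from (possibly bogus) metadata.
std::vector<unsigned char> readAllBytes(std::istream& in) {
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}
```

Pass an std::ifstream opened with std::ios::binary; the vector frees its memory itself, so there is no delete[] to get wrong.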
reader.eof() will only tell you if the previous read hit the end of the file, which causes your final iteration to write past the end of the array. What you want instead is to check if the current read hits the end of file. Change your while loop to:
while(reader.read(pixelInfo, 3)) //reader is a ifstream open to a BMP file
{
// ...
}
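A minimal, self-contained sketch of that loop, with an added capacity check so a file larger than the buffer cannot overrun it. readPixels is a hypothetical name; it takes any std::istream so it can be tried with a string stream as well as an ifstream:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Copies whole 3-byte pixels from `reader` into `dst`, never writing more
// than `capacity` bytes. The loop condition is the read itself, so a failed
// (or short) read ends the loop before anything from it is copied.
std::size_t readPixels(std::istream& reader, unsigned char* dst, std::size_t capacity) {
    char pixelInfo[3];
    std::size_t c = 0;
    while (c + 3 <= capacity && reader.read(pixelInfo, 3)) {
        dst[c]     = static_cast<unsigned char>(pixelInfo[0]);
        dst[c + 1] = static_cast<unsigned char>(pixelInfo[1]);
        dst[c + 2] = static_cast<unsigned char>(pixelInfo[2]);
        c += 3;
    }
    return c; // number of bytes actually stored
}
```

Trailing bytes that don't form a whole pixel are dropped; if you need them, inspect reader.gcount() after the loop.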
Note that you are reading 3 bytes at a time. If the total number of bytes is not a multiple of 3, only part of the pixelInfo array will be filled with valid data on the last read, which may cause an error in your program. You could try the following untested code, which copies only the bytes actually read and never writes past the 40000-byte array:
while (c + 3 <= 40000) //reader is an ifstream open to a BMP file
{
    reader.read(pixelInfo, 3);
    std::streamsize got = reader.gcount(); // bytes actually read: 0 to 3
    for (std::streamsize i = 0; i < got; i++) {
        myArray[c + i] = (unsigned char)pixelInfo[i];
    }
    c += got;
    if (got < 3) break; // short read: end of file (or an error)
}
ifstream::read sets the eof bit after an incomplete read, and gcount() tells you how many bytes that read delivered, so this loop terminates after your last read. As mentioned before, the likely cause of your issue is that the original loop copies pixelInfo[x] even when 3 bytes were not read, and runs one extra iteration after the end of the file; if the file is larger than the buffer (or c + 2 reaches 40000), it writes past the end of myArray, and that is what corrupts the heap.

Load a formatted binary file and assign information to structure c++

I've finally figured out how to write some specifically formatted information to a binary file, but now my problem is reading it back and building it back the way it originally was.
Here is my function to write the data:
void save_disk(disk aDisk)
{
ofstream myfile("disk01", ios::out | ios::binary);
int32_t entries;
entries = (int32_t) aDisk.current_file.size();
char buffer[10];
sprintf(buffer, "%d",entries);
myfile.write(buffer, sizeof(int32_t));
std::for_each(aDisk.current_file.begin(), aDisk.current_file.end(), [&] (const file_node& aFile)
{
myfile.write(aFile.name, MAX_FILE_NAME);
myfile.write(aFile.data, BLOCK_SIZE - MAX_FILE_NAME);
});
}
Here is the structure the data was originally created from, and that I want to load it back into:
struct file_node
{
char name[MAX_FILE_NAME];
char data[BLOCK_SIZE - MAX_FILE_NAME];
file_node(){};
};
struct disk
{
vector<file_node> current_file;
};
I don't really know how to read it back in so that it is arranged the same way, but here is my pathetic attempt anyway (I just tried to reverse what I did for saving):
void load_disk(disk aDisk)
{
ifstream myFile("disk01", ios::in | ios::binary);
char buffer[10];
myFile.read(buffer, sizeof(int32_t));
std::for_each(aDisk.current_file.begin(), aDisk.current_file.end(), [&] (file_node& aFile)
{
myFile.read(aFile.name, MAX_FILE_NAME);
myFile.read(aFile.data, BLOCK_SIZE - MAX_FILE_NAME);
});
}
^^ This is absolutely wrong. ^^
I understand the basic operations of the ifstream, but really all I know how to do with it is read in a file of text, anything more complicated than that I'm kind of lost.
Any suggestions on how I can read this in?
You're very close. You need to write and read the length as binary.
This part of your length-write is wrong:
char buffer[10];
sprintf(buffer, "%d",entries);
myfile.write(buffer, sizeof(int32_t));
It writes only the first four bytes of the text that sprintf() produced (the digits of the count), not the integer itself. You need to write the binary value of entries (the integer):
// writing your entry count.
// uint32_t is in <cstdint>; htonl/ntohl are in <arpa/inet.h> (POSIX) or <winsock2.h> (Windows)
uint32_t entries = (uint32_t)aDisk.current_file.size();
entries = htonl(entries); // convert to network (big-endian) byte order
myfile.write((char*)&entries, sizeof(entries));
Then on the read:
// reading the entry count
uint32_t entries = 0;
myFile.read((char*)&entries, sizeof(entries));
entries = ntohl(entries);
// Use this to resize your vector, so for_each has elements to fill.
aDisk.current_file.resize(entries);
std::for_each(aDisk.current_file.begin(), aDisk.current_file.end(), [&] (file_node& aFile)
{
myFile.read(aFile.name, MAX_FILE_NAME);
myFile.read(aFile.data, BLOCK_SIZE - MAX_FILE_NAME);
});
Or something like that.
Note 1: this does NO error checking. Also, htonl()/ntohl() take care of byte order for the count (a big-endian machine writing the file, a little-endian machine reading it), but they come from <arpa/inet.h> on POSIX or <winsock2.h> on Windows, not from standard C++. That's probably OK for your needs, but you should at least be aware of it.
Note 2: Pass your input disk parameter to load_disk() by reference:
void load_disk(disk& aDisk)
EDIT: Zeroing file_node contents on construction (memset is declared in <cstring>)
struct file_node
{
char name[MAX_FILE_NAME];
char data[BLOCK_SIZE - MAX_FILE_NAME];
file_node()
{
memset(name, 0, sizeof(name));
memset(data, 0, sizeof(data));
}
};
If you are using a compliant C++11 compiler:
struct file_node
{
char name[MAX_FILE_NAME];
char data[BLOCK_SIZE - MAX_FILE_NAME];
file_node() : name(), data() {}
};
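Here is a minimal round-trip sketch of the corrected save/load pair, under assumptions: MAX_FILE_NAME and BLOCK_SIZE get hypothetical values, the functions take a stream instead of hard-coding "disk01" (you would pass an std::ofstream/std::ifstream opened with std::ios::binary), and the count is written byte by byte in little-endian order instead of with htonl/ntohl, which avoids the POSIX/Winsock header:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical sizes; use the real MAX_FILE_NAME / BLOCK_SIZE from your code.
const int MAX_FILE_NAME = 32;
const int BLOCK_SIZE = 128;

struct file_node {
    char name[MAX_FILE_NAME];
    char data[BLOCK_SIZE - MAX_FILE_NAME];
    file_node() : name(), data() {} // zero-filled
};

struct disk {
    std::vector<file_node> current_file;
};

// Writes the entry count as 4 little-endian bytes, independent of host byte order.
void save_disk(const disk& aDisk, std::ostream& out) {
    std::uint32_t entries = static_cast<std::uint32_t>(aDisk.current_file.size());
    for (int i = 0; i < 4; ++i)
        out.put(static_cast<char>((entries >> (8 * i)) & 0xFF));
    for (const file_node& f : aDisk.current_file) {
        out.write(f.name, MAX_FILE_NAME);
        out.write(f.data, BLOCK_SIZE - MAX_FILE_NAME);
    }
}

// Fills aDisk (passed by reference); returns false if the data ends early.
bool load_disk(disk& aDisk, std::istream& in) {
    unsigned char b[4];
    if (!in.read(reinterpret_cast<char*>(b), 4)) return false;
    std::uint32_t entries = 0;
    for (int i = 0; i < 4; ++i)
        entries |= static_cast<std::uint32_t>(b[i]) << (8 * i);
    aDisk.current_file.assign(entries, file_node()); // room for every node
    for (file_node& f : aDisk.current_file) {
        if (!in.read(f.name, MAX_FILE_NAME) ||
            !in.read(f.data, BLOCK_SIZE - MAX_FILE_NAME))
            return false;
    }
    return true;
}
```

A real loader should also sanity-check entries before allocating, since a corrupt file could claim billions of entries.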

Typecasting from byte[] to struct

I'm currently working on a small C++ project where I use a client-server model someone else built. Data gets sent over the network and in my opinion it's in the wrong order. However, that's not something I can change.
Example data stream (simplified):
0x20 0x00 (C++: short with value 32)
0x10 0x35 (C++: short with value 13584)
0x61 0x62 0x63 0x00 (char*: abc)
0x01 (bool: true)
0x00 (bool: false)
I can represent this specific stream as:
struct test {
short sh1;
short sh2;
char abc[4];
bool bool1;
bool bool2;
};
And I can typecast it with test *t = (test*)stream; however, the char* has a variable length. It is always null-terminated, though.
I understand that there's no way of actually casting the stream to a struct, but I was wondering whether there is a better way than struct test { test(char* data) { ... } }; (converting it via the constructor).
This is called Marshalling or serialization.
What you must do is read the stream one byte at a time (or put all in a buffer and read from that), and as soon as you have enough data for a member in the structure you fill it in.
When it comes to the string, you simply read until you hit the terminating zero, and then allocate memory and copy the string to that buffer and assign it to a pointer in the struct.
Reading strings this way is simplest and most efficient if you have the whole message in a buffer already, because then you don't need a temporary buffer for the string.
Remember though, that with this scheme you have to manually free the memory containing the string when you are done with the structure.
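A minimal sketch of that approach in C++, with one deliberate swap: it stores the string in a std::string instead of malloc'd memory, so nothing has to be freed manually. It also assumes the shorts are little-endian on the wire, which matches 0x20 0x00 meaning 32 in the example. ParsedTest and parseTest are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Same fields as the question's struct, but the string owns its memory.
struct ParsedTest {
    short sh1;
    short sh2;
    std::string abc;
    bool bool1;
    bool bool2;
};

// Parses: two 16-bit little-endian shorts, a null-terminated string,
// then two 1-byte bools. Returns false if the buffer is too short or the
// string is not terminated.
bool parseTest(const unsigned char* buf, std::size_t len, ParsedTest& out) {
    std::size_t pos = 0;
    auto readShort = [&](short& v) {
        if (len - pos < 2) return false;
        v = static_cast<short>(buf[pos] | (buf[pos + 1] << 8)); // low byte first
        pos += 2;
        return true;
    };
    if (!readShort(out.sh1) || !readShort(out.sh2)) return false;

    std::size_t start = pos;
    while (pos < len && buf[pos] != 0) ++pos; // find the terminating zero
    if (pos == len) return false;
    out.abc.assign(reinterpret_cast<const char*>(buf + start), pos - start);
    ++pos; // skip the terminator

    if (len - pos < 2) return false;
    out.bool1 = buf[pos] != 0;
    out.bool2 = buf[pos + 1] != 0;
    return true;
}
```

Because parsing lives in a named function rather than a converting constructor, nothing gets converted implicitly.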
Just add a member function that takes the character buffer (a char * parameter) and populates the test structure by parsing it.
This makes it more clear and readable as well.
If you provide an implicit conversion constructor, you create a menace that will do the conversion when you least expect it (if you do want a constructor, mark it explicit).
When reading variable-length data from a sequence of bytes,
you shouldn't try to fit everything into a single fixed-size structure or variable;
use a pointer to store the variable-length part.
The following suggestion is not tested:
#include <cstdlib> // malloc, free
#include <cstring> // memset, memcpy

// assumed: byte is an unsigned 8-bit type
typedef unsigned char byte;

// data is stored in memory in a different way,
// NOT as the sequence of bytes provided
struct data {
    short sh1;
    short sh2;
    int abclength;
    // a pointer to a heap block holding the variable-length string
    char* abc;
    bool bool1;
    bool bool2;
};

// reads a single byte
bool readByte(byte* MyByteBuffer)
{
    // your reading code goes here,
    // character by character, from stream,
    // file, pipe, whatever.
    // The result should be true if no error,
    // false if it cannot read anymore
    (void)MyByteBuffer;
    return false; // placeholder until real reading code is added
}

// used for reading several variables,
// with different sizes in bytes
int readBuffer(void* Buffer, int BufferSize)
{
    int RealCount = 0;
    byte* p = (byte*)Buffer;
    // check the size first, so we never read one byte too many
    while (RealCount < BufferSize && readByte(p))
    {
        RealCount++;
        p++;
    }
    return RealCount;
}

// reads a null-terminated string, including the terminator
int readString(char* Buffer, int BufferSize)
{
    int RealCount = 0;
    byte* p = (byte*)Buffer;
    while (RealCount < BufferSize && readByte(p))
    {
        RealCount++;
        if (*p == '\0') break; // stop at the terminating zero
        p++;
    }
    return RealCount;
}

void read()
{
    // real data here:
    data MyData;
    // long enough, used to temporarily hold the variable string
    char temp[64000];
    // fill buffer for string with null values
    memset(temp, '\0', sizeof(temp));
    int RealCount = 0;
    // try read "sh1" field
    RealCount = readBuffer(&MyData.sh1, sizeof(short));
    if (RealCount == sizeof(short))
    {
        // try read "sh2" field
        RealCount = readBuffer(&MyData.sh2, sizeof(short));
        if (RealCount == sizeof(short))
        {
            RealCount = readString(temp, sizeof(temp));
            if (RealCount > 0)
            {
                // store real byte count (including the terminator)
                MyData.abclength = RealCount;
                // allocate dynamic memory block for variable length data
                MyData.abc = (char*)malloc(RealCount);
                // copy data from the temporary buffer into the structure;
                // an array name already decays to a pointer, so temp needs no "&"
                memcpy(MyData.abc, temp, RealCount);
                // continue with rest of data
                RealCount = readBuffer(&MyData.bool1, sizeof(bool));
                if (RealCount > 0)
                {
                    // continue with rest of data
                    RealCount = readBuffer(&MyData.bool2, sizeof(bool));
                }
                // ... use MyData here, then release the string
                free(MyData.abc);
            }
        }
    }
} // void read()
Cheers.