C++ binary files not read correctly

I am reading a file that is written in big-endian byte order on a little-endian Intel processor in C++. The file is a generic file written in binary. I have tried reading it with both std::ifstream::open() and fopen(), but both get the same thing wrong. The file is a binary file of training images from the MNIST dataset. It contains 4 headers, each 32 bits in size and stored big-endian. My code works, except that it gives the wrong value for the 2nd header. It works for the rest of the headers. I even opened the file in a hex editor to check whether the value in the file might be wrong, but it is right. For some reason the program reads only the second header wrong.
Here is the code that deals with reading the headers only:
void DataHandler::readInputData(std::string path){
uint32_t headers[4];
char bytes[4];
std::ifstream file;
//I tried both open() and fopen() as seen below
file.open(path.c_str(), std::ios::binary | std::ios::in);
//FILE* f = fopen(path.c_str(), "rb");
if (file)
{
int i = 0;
while (i < 4)//4 headers
{
//if (fread(bytes, sizeof(bytes), 1, f))
//{
// headers[i] = format(bytes);
// ++i;
//}
file.read(bytes, sizeof(bytes));
headers[i++] = format(bytes);
}
printf("Done getting images file header.\n");
printf("magic: 0x%08x\n", headers[0]);
printf("nImages: 0x%08x\n", headers[1]);//THIS IS THE ONE THAT IS GETTING READ WRONG
printf("rows: 0x%08x\n", headers[2]);
printf("cols: 0x%08x\n", headers[3]);
exit(1);
//reading rest of the file code here
}
else
{
printf("Invalid Input File Path\n");
exit(1);
}
}
//converts big-endian to little-endian (required for Intel processors)
uint32_t DataHandler::format(const char * bytes) const
{
return (uint32_t)((bytes[0] << 24) |
(bytes[1] << 16) |
(bytes[2] << 8) |
(bytes[3]));
}
Output I am getting is:
Done getting images file header.
magic: 0x00000803
nImages: 0xffffea60
rows: 0x0000001c
cols: 0x0000001c
nImages should be 60,000, or 0x0000ea60 in hex, but it is read as 0xffffea60 for some reason.
Here is the file opened in a hex editor:
As we can see, the 2nd 32 bit number is 0000ea60 but it is reading it wrong...

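It seems that char is signed in your environment, so the byte 0xEA in the data is sign-extended to 0xFFFFFFEA when it is promoted to int.
ORing that value into the result sets all of the higher bits, which is why you get 0xffffea60. The other headers only contain bytes below 0x80, so they are not affected.
To prevent this, use unsigned char instead of char, both for the element type of bytes and for the parameter of format().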
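A minimal sketch of that fix, written as a free function rather than the DataHandler member from the question. The name readBigEndian32 is hypothetical. If the buffer becomes unsigned char bytes[4], the call to file.read() needs a cast such as reinterpret_cast<char*>(bytes), because read() takes a char*.

```cpp
#include <cstdint>

// Hypothetical stand-in for DataHandler::format(): assembles a 32-bit value
// from four bytes stored most significant byte first (big-endian).
// Taking unsigned char means each byte is in 0..255 and is never sign-extended.
std::uint32_t readBigEndian32(const unsigned char* bytes)
{
    return (static_cast<std::uint32_t>(bytes[0]) << 24) |
           (static_cast<std::uint32_t>(bytes[1]) << 16) |
           (static_cast<std::uint32_t>(bytes[2]) << 8)  |
            static_cast<std::uint32_t>(bytes[3]);
}
```

Called on the bytes 00 00 EA 60 from the second header, this returns 60000 (0x0000ea60).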

Related

zlib error -3 while decompressing archive: Incorrect data check

I am writing a C++ library that also decompresses zlib files. For all of the files, the last call to gzread() (or at least one of the last calls) gives error -3 (Z_DATA_ERROR) with message "incorrect data check". As I have not created the files myself I am not entirely sure what is wrong.
I found this answer and if I do
gzip -dc < myfile.gz > myfile.decomp
gzip: invalid compressed data--crc error
on the command line the contents of myfile.decomp seems to be correct. There is still the crc error printed in this case, however, which may or may not be the same problem. My code, pasted below, should be straightforward, but I am not sure how to get the same behavior in code as on the command line above.
How can I achieve the same behavior in code as on the command line?
std::vector<char> decompress(const std::string &path)
{
gzFile inFileZ = gzopen(path.c_str(), "rb");
if (inFileZ == NULL)
{
printf("Error: gzopen() failed for file %s.\n", path.c_str());
return {};
}
constexpr size_t bufSize = 8192;
char unzipBuffer[bufSize];
int unzippedBytes = bufSize;
std::vector<char> unzippedData;
unzippedData.reserve(1048576); // 1 MiB is enough in most cases.
while (unzippedBytes == bufSize)
{
unzippedBytes = gzread(inFileZ, unzipBuffer, bufSize);
if (unzippedBytes == -1)
{
// Here the error is -3 / "incorrect data check" for (one of) the last block(s)
// in the file. The bytes can be correctly decompressed, as demonstrated on the
// command line, but how can this be achieved in code?
int errnum;
const char *err = gzerror(inFileZ, &errnum);
printf("%s\n", err);
break;
}
if (unzippedBytes > 0)
{
unzippedData.insert(unzippedData.end(), unzipBuffer, unzipBuffer + unzippedBytes);
}
}
gzclose(inFileZ);
return unzippedData;
}

First off, the whole point of the CRC is to detect corrupted data. If the CRC is bad, you should go back to where the file came from and get an uncorrupted copy. If the CRC is bad, discard the input and report an error.
You are not clear on the "behavior" you are trying to reproduce, but if you are trying to recover as much data as possible from a corrupted gzip file, you will need to use zlib's inflate functions to decompress the file. int ret = inflateInit2(&strm, 31); will initialize the zlib stream to process a gzip file (31 is a 15-bit window plus 16 to select gzip decoding).

how to copy binary data of a file

Basically, I am trying to read the binary data of a file using fread() and print it on screen using printf(). The problem is that the output is not shown as binary 1s and 0s. Instead it prints symbols and other characters that I cannot identify.
This is how I am doing it:
#include <stdio.h>
#include <windows.h>
int main(){
size_t sizeForB, sizeForT;
char ForBinary[BUFSIZ], ForText[BUFSIZ];
char RFB [] = "C:\\users\\(Unknown)\\Desktop\\hi.mp4" ; // Step 1
FILE *ReadBFrom = fopen(RFB , "rb" );
if(ReadBFrom == NULL){
printf("Following File were Not found: %s", RFB);
return -1;
} else {
printf("Following File were found: %s\n", RFB); // Step 2
while(sizeForB = fread(ForBinary, 1, BUFSIZ, ReadBFrom)){ // Step 1
printf("%s", ForBinary);
}
fclose(ReadBFrom);
}
return 0;
}
I would really appreciate if someone could help me out to read the actual binary data of a file as binary (0,1).

while(sizeForB = fread(ForBinary, 1, BUFSIZ, ReadBFrom)){
printf("%s", ForBinary); }
This is wrong on many levels. First of all, you said it is a binary file, which means there might not be any text in it in the first place, yet you are using the %s format specifier, which prints null-terminated strings. That makes %s the wrong format specifier to use here. And even if there were text inside this file, you cannot be sure that fread would read a "complete" null-terminated string that you could pass to printf with %s.
What you may want to do is read each byte from the file, convert it to a binary representation (search for how to convert an integer to a binary string), and print the binary representation of each byte.
Basically pseudocode:
foreach (byte b in FileContents)
{
string s = convertToBinary(b);
println(s);
}

How to view files in binary in the terminal?
Either
"hexdump -C yourfile.bin" perhaps, unless you want to edit it of course. Most linux distros have hexdump by default (but obviously not all).
or
xxd -b file

To simply read a file and print it in binary (ones and zeros), read it one char at a time, then print a '0' or '1' for each bit. You can print the most or the least significant bit first; I suggest MSb first. CHAR_BIT comes from <limits.h>.
if (ReadBFrom) {
int ch;
while ((ch = fgetc(ReadBFrom)) != EOF) {
unsigned mask = 1u << (CHAR_BIT - 1); // CHAR_BIT is typically 8
while (mask) {
putchar(mask & ch ? '1' : '0');
mask >>= 1;
}
}
fclose(ReadBFrom);
}

C++ bitmap editing

I am trying to open a bitmap file, edit it, and then save the edited version as a new file. This is eventually to mess with using steganography. I am trying to save the bitmap information now but the saved file will not open. No errors in compilation or run time. It opens fine and the rest of the functions work.
void cBitmap::SaveBitmap(char * filename)
{
// attempt to open the file specified
ofstream fout;
// attempt to open the file using binary access
fout.open(filename, ios::binary);
unsigned int number_of_bytes(m_info.biWidth * m_info.biHeight * 4);
BYTE red(0), green(0), blue(0);
if (fout.is_open())
{
// same as before, only outputting now
fout.write((char *)(&m_header), sizeof(BITMAPFILEHEADER));
fout.write((char *)(&m_info), sizeof(BITMAPINFOHEADER));
// read off the color data in the bass ackwards MS way
for (unsigned int index(0); index < number_of_bytes; index += 4)
{
red = m_rgba_data[index];
green = m_rgba_data[index + 1];
blue = m_rgba_data[index + 2];
fout.write((const char *)(&blue), sizeof(blue));
fout.write((const char *)(&green), sizeof(green));
fout.write((const char *)(&red), sizeof(red));
}
}
else
{
// post file not found message
cout <<filename << " not found";
}
// close the file
fout.close();
}

You're missing the padding bytes after each RGB row. The rows have to be a multiple of 4 bytes each.
Also, are you supposed to be writing a 24 or 32-bit bmp file? If you're writing 24-bit, you're just missing padding. If you're writing 32-bit, then you're missing each extra byte (alpha). Not enough information to fix your code sample short of writing a complete bmp writer that would support all possible options.
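A sketch of the row-size arithmetic behind the padding rule, assuming an uncompressed BMP. The function names are illustrative and are not part of the cBitmap class from the question.

```cpp
#include <cstdint>

// Bytes occupied by one stored row: the pixel bits rounded up to a
// multiple of 32 bits (4 bytes).
std::uint32_t bmpRowStride(std::uint32_t width, std::uint32_t bitsPerPixel)
{
    return ((width * bitsPerPixel + 31) / 32) * 4;
}

// Number of zero bytes to append after the pixel data of each row.
std::uint32_t bmpRowPadding(std::uint32_t width, std::uint32_t bitsPerPixel)
{
    const std::uint32_t pixelBytes = (width * bitsPerPixel + 7) / 8;
    return bmpRowStride(width, bitsPerPixel) - pixelBytes;
}
```

For a 24-bit image that is 3 pixels wide, each row holds 9 bytes of pixel data followed by 3 padding bytes. A 32-bit image never needs padding, because every pixel is already 4 bytes.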

Why isn't the size of a .jpg file the same after reading it as binary in C++?

This is my pic.jpg.
Now I try to read the file. This is the code that opens the files:
ifstream inputFile("pic.jpg", ios::in | ios::binary);
ofstream outFile("pic2.jpg", ios::out | ios::binary);
ofstream outFileTXT("pic2.txt", ios::out | ios::binary);
Then I read 256 bytes at a time from inputFile and write them to outFile and outFileTXT.
The problem is the size:
pic.jpg = 11,126 bytes.
pic2.jpg = pic2.txt = 4,966 bytes.
This is my read buffer:
char buffer[257];
My code works well on *.txt files (no problem).
11,126 bytes need 43 reads of 256 bytes, plus one more for what remains.
The loop runs 43 times:
while (i++ < mod) {
// read from binary file 256 byte
in.read(buffer, 256);
// init packet and save it in list by string.
handler << buffer; // this line save buffer in list<string>
}
Then I print my list to the file.
The idea is to save each buffer of 256 bytes into the list, with the last one holding only 118 bytes, which means the list must have 44 entries: 43 (256 bytes) + 1 (118 bytes).
Then the list is printed to the file.

There is a problem with this:
char buffer[257];
// ..
while (i++ < mod) {
// read from binary file 256 byte
in.read(buffer, 256);
// init packet and save it in list by string.
handler << buffer;
}
Specifically this:
handler << buffer;
Because buffer decays to a char*, operator<< treats it as a null-terminated string and outputs characters only until it finds a zero byte. It will not output the whole buffer as you expect, and a JPEG contains many zero bytes.
You can use write() for that:
while (i++ < mod) {
// read from binary file 256 byte
in.read(buffer, 256);
// init packet and save it in list by string.
handler.write(buffer, in.gcount()); // output all that was read
}
NOTE: Function in.gcount() tells us how many characters were read in the previous in.read() function (it won't always be exactly 256, it could be less if we reach the end).

C++ how to create a PNG file with known data?

Just wondering, if I read a PNG file as a binary file, and I know how to write the hex numbers into another plain txt or whatever file, then how can I recreate the PNG file with those hex numbers?
This is the code I use to read from a PNG file and write to another plain txt file:
unsigned char x;
ifile.open("foo.png",ios::binary);
ifile>>noskipws>>hex;
while(ifile>>x){
ofile<<setw(2)<<setfill('0')<<(int)x;
//do some formatting stuff to the ofile, ofile declaration omitted
//some ifs to see if IEND is read in, which is definitely correct
//if IEND, break, so the last four hex numbers in ofile are 49 45 4E 44
}
//read another 4 bytes and write to ofile, which are AE 42 60 82, the check sum
The reason why I am doing this is because I have some PNG files which have some irrelevant messages after IEND chunk, and I want to get rid of them and only keep the chunks related to the actual picture and split them into different files. By "irrelevant messages" I mean they are not the actual part of the picture but I have some other use with them.

It's easy, you just need to read every 2 characters and convert them from hex back to binary.
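unsigned char x;
char buf[3] = {0}; // two hex digits plus the terminating null
ifile.open("foo.hex");
// ofile is assumed to be an ofstream opened with ios::binary
while(ifile>>buf[0]>>buf[1]){
    char *end;
    x = (unsigned char) strtol(buf, &end, 16);
    if (*end == 0) // no conversion error
        ofile.put((char)x); // output the byte
}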
