Understanding binary conversions - c++

I'm writing a resource file into which I want to insert a bunch of data from various common file types such as .JPG and .BMP, and I want it to be in binary.
I'm going to code something to retrieve this data later on, organized by index, and this is what I've got so far:
float randomValue = 23.14f;
ofstream fileWriter;
fileWriter.open("myFile.dat", ios::binary);
fileWriter.write((char*)&randomValue, sizeof(randomValue));
fileWriter.close();
//With this my .dat file, when opened in notepad has "B!¹A" in it
float retrieveValue = 0.0f;
ifstream fileReader;
fileReader.open("myFile.dat", ios::binary);
fileReader.read((char*)&retrieveValue, sizeof(retrieveValue));
fileReader.close();
cout << retrieveValue << endl; //This gives me exactly the 23.14 I wanted, perfect!
While this works nicely, I'd like to understand what exactly is happening there.
I'm converting the address of randomValue to char*, and writing the values in this address to the file?
I'm curious also because I need to do this for an array, and I can't do this:
int* myArray = new int[10];
//fill myArray values with random stuff
fileWriter.open("myFile.dat", ios::binary);
fileWriter.write((char*)&myArray, sizeof(myArray));
fileWriter.close();
From what I understand, this would just write the value of the first address to the file, not the whole array. So, for testing, I'm trying to simply convert a variable to a char* which I would write to a file, and then convert it back to the variable to see if I'm retrieving the values correctly, so I have this:
int* intArray = new int[10];
for(int i = 0; i < 10; i++)
{
cout << &intArray[i]; //the address of each number in my array
cout << intArray[i]; //its value
cout << reinterpret_cast<char*>(&intArray[i]); //the char* value of each one
}
But for some reason I don't know, my computer "beeps" when I run this code. During the array, I'm also saving these to a char* and trying to convert back to int, but I'm not getting the results expected, I'm getting some really long values.
Something like:
float randomValue = 23.14f;
char* charValue = reinterpret_cast<char*>(&randomValue);
//charValue contains "B!¹A" plus a bunch of other (uninitialized values?) characters, so I'm guessing the value is correct
//Now I'm here
I want to convert charValue back to randomValue, how can I do it?
edit: There's valuable information in the answers below, but they don't solve my (original) problem. I was testing these types of conversions because I'm writing code that will pick a bunch of resource files such as BMP, JPG and MP3, and save them in a single .DAT file organized by some criteria I still haven't fully figured out.
Later, I am going to read from this resource file and load its contents into a program (game) I'm coding.
I am still thinking about the criteria, but I was wondering if it's possible to do something like this:
//In my ResourceFile.DAT
[4 bytes = objectID][3 bytes = objectType (WAV, MP3, JPG, BMP, etc)][4 bytes = objectLength][objectLength bytes = actual objectData]
//repeating this until end of file
And then in the code that reads the resource file, I want to do something like this (untested):
ifstream fileReader;
fileReader.open("myFile.DAT", ios::binary);
//file check stuff
while(!fileReader.eof())
{
//Here I'll load
int objectID = 0;
fileReader.read((char*)&objectID, 4); //read 4 bytes to fill objectID
char objectType[3];
fileReader.read(objectType, 3); //read the type so I know which parser to use
int objectLength = 0;
fileReader.read((char*)&objectLength, 4); //get the length of the object data
char* objectData = new char[objectLength];
fileReader.read(objectData, objectLength); //fill objectData with the data
//Here I'll use a parser to fill classes depending on the type etc, and move on to the next obj
}
Currently my code is working with the original files (BMP, WAV, etc) and filling them into classes, and I want to know how I can save the data from these files into a binary data file.
For example, my class that manages BMP data has this:
class FileBMP
{
public:
int imageWidth;
int imageHeight;
int* imageData;
void Load(int iwidth, int iheight);
};
When I load it, I call:
void FileBMP::Load(int iwidth, int iheight)
{
int imageTotalSize = iwidth * iheight * 4;
imageData = new int[imageTotalSize]; //This will give me 4 times the amount of pixels in the image
int cPixel = 0;
while(cPixel < imageTotalSize)
{
imageData[cPixel] = 0; //R value
imageData[cPixel + 1] = 0; //G value
imageData[cPixel + 2] = 0; //B value
imageData[cPixel + 3] = 0; //A value
cPixel += 4;
}
}
So I have this single dimension array containing values in the format of [RGBA] per pixel, which I am using later on for drawing on screen.
I want to be able to save just this array in the binary data format that I am planning that I stated above, and then read it and fill this array.
I think it's asking too much for a code like this, so I'd like to understand what I need to know to save these values into a binary file and then read back to fill it.
Sorry for the long post!
edit2: I solved my problem by making the first edit... thanks for the valuable info, I also got to know what I wanted to!

By using the & operator, you're getting a pointer to the contents of the variable (think of it as just a memory address).
float a = 123.45f;
float* p = &a; // now p points to a, i.e. has the memory address to a's contents.
char* c = (char*)&a; // c points to the same memory location, but the code says to treat the contents as char instead of float.
When you gave the (char*)&randomValue for write(), you simply told "take this memory address having char data and write sizeof(randomValue) chars from there". You're not writing the address value itself, but the contents from that location of memory ("raw binary data").
cout << reinterpret_cast<char*>(&intArray[i]); //the char* value of each one
Here cout expects char* data terminated with a null char (zero). However, you're providing the raw bytes of the int value instead. Your program might crash here, as cout will output chars until it finds the terminator char, which it might not find anytime soon.
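To look at the raw bytes without handing them to cout as a string, one option is to format each byte as a number. The following is only a sketch; hex_bytes is a hypothetical helper name, not something from the question.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Formats `size` bytes starting at `data` as two-digit hex values separated
// by spaces, so no byte is ever treated as a character or a terminator.
// hex_bytes is a hypothetical helper introduced for illustration.
std::string hex_bytes(const void* data, std::size_t size)
{
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    std::string out;
    char buf[3];
    for (std::size_t i = 0; i < size; ++i) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(bytes[i]));
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```

A call such as cout << hex_bytes(&intArray[i], sizeof intArray[i]) then prints something readable. A byte with the value 07 is the bell character, which would explain a beep when it is sent to the terminal as a char.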
float randomValue = 23.14f;
char* charValue = reinterpret_cast<char*>(&randomValue);
float back = *(float*)charValue;
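The cast back works here because charValue really points at a float object. If the bytes come from somewhere else (a file buffer, for instance), copying them with memcpy avoids alignment and aliasing problems. A minimal sketch, with float_to_bytes and bytes_to_float as hypothetical helper names:

```cpp
#include <cstring>

// Copies the object representation of a float into a byte buffer.
void float_to_bytes(float value, char (&out)[sizeof(float)])
{
    std::memcpy(out, &value, sizeof value);
}

// Rebuilds a float from sizeof(float) bytes. The bytes must have been
// produced from a float on a machine with the same float representation.
float bytes_to_float(const char* bytes)
{
    float value;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}
```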
Edit: to save binary data, you simply need to provide the data and write() it. Do not use << operator overloads with ofstream/cout. For example:
int values[3] = { 5, 6, 7 };
struct AnyData
{
float a;
int b;
} data;
cout.write((char*)&values, sizeof(int) * 3); // the other two values follow the first one, you can write them all at once.
cout.write((char*)&data, sizeof(data)); // you can also save structs that do not have pointers.
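In case you're going to write structs, have a look at the #pragma pack compiler directive. Compilers may insert padding so that members are aligned to a certain size (for example that of an int), which means that a struct can occupy more bytes than the sum of its members; a char followed by an int, for instance, typically takes 8 bytes rather than 5. Packing to 1 byte removes that padding: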
#pragma pack (push, 1)
struct CouldBeLongerThanYouThink
{
char a;
char b;
};
#pragma pack (pop)
Also, do not write the pointer values themselves (if there are pointer members in a struct), because the memory addresses will not point to any meaningful data once read back from a file. Always write the data itself, not pointer values.

What's happening is that you're copying the internal representation of your data to a file, and then copying it back into memory. This works as long as the program doing the reading was compiled with the same version of the compiler, using the same options, as the program doing the writing. Otherwise, it might or it might not work, depending on any number of things beyond your control.
It's not clear to me what you're trying to do, but formats like .jpg and .bmp normally specify the format they want the different types to have, and you have to respect that format.
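One common way to stop depending on the compiler's internal representation is to define the byte order of the file yourself and assemble each value byte by byte. A sketch for 32-bit unsigned integers stored least significant byte first (put_u32_le and get_u32_le are hypothetical names, and little-endian is just the order chosen for this example):

```cpp
#include <cstdint>

// Stores a 32-bit value as 4 bytes, least significant byte first,
// regardless of the byte order of the machine running the code.
void put_u32_le(std::uint32_t value, unsigned char out[4])
{
    out[0] = static_cast<unsigned char>(value & 0xFF);
    out[1] = static_cast<unsigned char>((value >> 8) & 0xFF);
    out[2] = static_cast<unsigned char>((value >> 16) & 0xFF);
    out[3] = static_cast<unsigned char>((value >> 24) & 0xFF);
}

// Rebuilds the value from 4 bytes written by put_u32_le.
std::uint32_t get_u32_le(const unsigned char in[4])
{
    return static_cast<std::uint32_t>(in[0])
         | (static_cast<std::uint32_t>(in[1]) << 8)
         | (static_cast<std::uint32_t>(in[2]) << 16)
         | (static_cast<std::uint32_t>(in[3]) << 24);
}
```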

It is unclear what you really want to do, so I cannot recommend a way of solving your real problem. But I would not be surprised if running the program actually caused beeps or any other strange behavior in your program.
int* intArray = new int[10];
for(int i = 0; i < 10; i++)
{
cout << reinterpret_cast<char*>(&intArray[i]);
}
The memory returned by new above is uninitialized, but you are trying to print it as if it were a null-terminated string. That uninitialized memory could contain the bell character (which causes a beep when printed to the terminal) or any other values. It might also contain no null terminator, in which case the stream insertion operator will overrun the buffer until it either finds a null or your program crashes accessing invalid memory.
There are other incorrect assumptions in your code, like for example given int *p = new int[10]; the expression sizeof(p) will be the size of a pointer in your architecture, not 10 times the size of an integer.

Related

QByteArray::append() leads to unexpected QByteArray sizes

QByteArray array;
union {
char bytes[sizeof(float)];
float value;
} myFloat;
for (int i = 0; i < 10; i++) {
myFloat.value = 2.3 + i;
array.append(myFloat.bytes);
qDebug() << array.length(); //9, 18, 27, etc, instead of 4, 8, 12, etc?
}
Hey, I'm trying to construct a QByteArray to store and send via TCP at a later stage, via QTcpSocket::write(QByteArray);. However, the length increase of the array was not what I expected, and when I send it via TCP, my readHandler seems to start reading gibberish after the first float. This seems to be solved by using another append function, array.append(myFloat.bytes, sizeof(float));. Does anyone know:
What went wrong in the first place? Why does adding a 4-byte char array result in a QByteArray that is 9 bytes longer? Does it have to do with the \0's being added?
Will the array.append(myFloat.bytes, sizeof(float)); method work? Meaning, if I send the array, will I send 10*4 bytes of raw float values?
The append() overload you (unintentionally) picked treats the passed argument as a zero-terminated string, so it copies bytes until it reaches a zero byte, wherever that happens to be, rather than exactly sizeof(float) bytes.
To append binary data, I'd use QByteArray::fromRawData() to obtain a QByteArray and then append it:
float f = 3.14f;
QByteArray a = QByteArray::fromRawData(reinterpret_cast<const char*>(&f), sizeof(f));
QByteArray b;
b += a;
This makes your intention clear and it avoids the union trick, which is undefined behaviour anyway.
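The same pitfall can be shown without Qt. In the sketch below std::string stands in for QByteArray (an assumption made only to keep the example self-contained); like the two-argument QByteArray::append the asker found, the sized std::string::append overload copies exactly the requested number of bytes, zero bytes included. pack_floats is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Appends the raw bytes of each float to a byte string, always passing the
// length explicitly so that zero bytes inside a float are kept and nothing
// beyond the float is read.
std::string pack_floats(const float* values, std::size_t count)
{
    std::string bytes;
    for (std::size_t i = 0; i < count; ++i) {
        char raw[sizeof(float)];
        std::memcpy(raw, &values[i], sizeof raw);
        bytes.append(raw, sizeof raw);  // sized overload: copies all bytes
    }
    return bytes;
}
```

With ten floats the result is always 10 * sizeof(float) bytes long, whatever the values are.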

What is the best solution for writing numbers into a file and then reading them?

I have 640*480 numbers. I need to write them into a file. I will need to read them later. What is the best solution? Numbers are between 0 - 255.
For me the best solution is to write them binary(8 bits). I wrote the numbers into txt file and now it looks like 1011111010111110 ..... So there are no questions where the number starts and ends.
How am I supposed to read them from the file?
Using c++
It's not a good idea to write bit values like 1 and 0 to a text file: the file will be 8 times bigger, since every character in a text file takes at least 1 byte and 1 byte = 8 bits. A value from 0 to 255 fits in one byte, so store bytes. Your file will then have a size of 640*480 bytes instead of 640*480*8. If you want to get at individual bits, use the bitwise operators of your programming language. Reading bytes is much easier. Use a binary file for saving your data.
Presumably you have some sort of data structure representing your image, which somewhere inside holds the actual data:
class pixmap
{
public:
// stuff...
private:
std::unique_ptr<std::uint8_t[]> data;
};
So you can add a new constructor which takes a filename and reads bytes from that file:
pixmap(const std::string& filename)
{
constexpr int SIZE = 640 * 480;
// Open an input file stream and set it to throw exceptions:
std::ifstream file;
file.exceptions(std::ios_base::badbit | std::ios_base::failbit);
file.open(filename.c_str(), std::ios_base::in | std::ios_base::binary);
// Create a unique ptr to hold the data: this will be cleaned up
// automatically if file reading throws
std::unique_ptr<std::uint8_t[]> temp(new std::uint8_t[SIZE]);
// Read SIZE bytes from the file
file.read(reinterpret_cast<char*>(temp.get()), SIZE);
// If we get to here, the read worked, so we move the temp data we've just read
// into where we'd like it
data = std::move(temp); // or std::swap(data, temp) if you prefer
}
I realise I've assumed some implementation details here (you might not be using a std::unique_ptr to store the underlying image data, though you probably should be) but hopefully this is enough to get you started.
You can write a number between 0 and 255 to the file as a char value.
See the code below. In this example I am printing the integer 70 as a char,
so 'F' is printed on the console.
Similarly, you can read it as a char and then convert this char to an integer.
#include <stdio.h>
int main()
{
int i = 70;
char dig = (char)i;
printf("%c", dig);
return 0;
}
This way you can restrict the file size.

Reading in raw encoded nrrd data file into double

Does anyone know how to read in a file with raw encoding? So stumped.... I am trying to read in floats or doubles (I think). I have been stuck on this for a few weeks. Thank you!
File that I am trying to read from:
http://www.sci.utah.edu/~gk/DTI-data/gk2/gk2-rcc-mask.raw
Description of raw encoding:
http://teem.sourceforge.net/nrrd/format.html#encoding
- "raw" - The data appears on disk exactly the same as in memory, in terms of byte values and byte ordering. Produced by write() and fwrite(), suitable for read() or fread().
Info of file:
http://www.sci.utah.edu/~gk/DTI-data/gk2/gk2-rcc-mask.nhdr - I think the only things that matter here are the big endian (still trying to understand what that means from google) and raw encoding.
My current approach, uncertain if it's correct:
//Function ripped off from example of c++ ifstream::read reference page
void scantensor(string filename){
ifstream tdata(filename, ifstream::binary); // not sure if I should put ifstream::binary here
// other things I tried
// ifstream tdata(filename) ifstream tdata(filename, ios::in)
if(tdata){
tdata.seekg(0, tdata.end);
int length = tdata.tellg();
tdata.seekg(0, tdata.beg);
char* buffer = new char[length];
tdata.read(buffer, length);
tdata.close();
double* d;
d = (double*) buffer;
} else cerr << "failed" << endl;
}
/* P.S. I attempted to print the first 100 elements of the array.
Then I print 100 other elements at some arbitrary array indices (i.e. 9,900 - 10,000). I actually kept increasing the number of 0's until I ran out of bound at 100,000,000 (I don't think that's how it works lol but I was just playing around to see what happens)
Here's the part that makes me suspicious: so the ifstream different has different constructors like the ones I tried above.
the first 100 values are always the same.
if I use ifstream::binary, then I get some values for the 100 arbitrary printing
if I use the other two options, then I get -6.27744e+066 for all 100 of them
So for now I am going to assume that ifstream::binary is the correct one. The thing is, I am not sure if the file I provided is how binary files actually look like. I am also unsure if these are the actual numbers that I am supposed to read in or just casting gone wrong. I do realize that my casting from char* to double* can be unsafe, and I got that from one of the threads.
*/
I really appreciate it!
Edit 1: Right now the data being read in using the above method is apparently "incorrect" since in paraview the values are:
Dxx,Dxy,Dxz,Dyy,Dyz,Dzz
[0, 1], [-15.4006, 13.2248], [-5.32436, 5.39517], [-5.32915, 5.96026], [-17.87, 19.0954], [-6.02961, 5.24771], [-13.9861, 14.0524]
It's a 3 x 3 symmetric matrix, so 7 distinct values, 7 ranges of values.
The floats that I am currently parsing from the file right now are very large (i.e. -4.68855e-229, -1.32351e+120).
Perhaps somebody knows how to extract the floats from Paraview?
Since you want to work with doubles, I recommend reading the data from the file as a buffer of doubles:
const long machineMemory = 0x40000000; // 1 GB
FILE* file = fopen("c:\\data.bin", "rb");
if (file)
{
int size = machineMemory / sizeof(double);
if (size > 0)
{
double* data = new double[size];
int read(0);
while (read = fread(data, sizeof(double), size, file))
{
// Process data here (read = number of doubles)
}
delete [] data;
}
fclose(file);
}
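The loop above reads the values in the machine's own byte order. The header in the question mentions big endian, so on a little-endian machine each value also has to be reassembled from its bytes. The sketch below assumes 4-byte IEEE 754 floats purely for illustration; the actual element type has to be taken from the header, and load_be_float is a hypothetical helper name.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "this sketch assumes 32-bit IEEE 754 floats");

// Builds a float from 4 bytes stored most significant byte first
// (big endian), independent of the byte order of the host machine.
float load_be_float(const unsigned char* bytes)
{
    const std::uint32_t bits =
          (static_cast<std::uint32_t>(bytes[0]) << 24)
        | (static_cast<std::uint32_t>(bytes[1]) << 16)
        | (static_cast<std::uint32_t>(bytes[2]) << 8)
        |  static_cast<std::uint32_t>(bytes[3]);
    float value;
    std::memcpy(&value, &bits, sizeof value);  // reinterpret bits as a float
    return value;
}
```

Values that come out absurdly large or small, like the ones quoted in the question, are a typical symptom of reading with the wrong byte order or the wrong element type.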

Heap Corruption caused by Invalid Casting?

I have the code:
unsigned char *myArray = new unsigned char[40000];
char pixelInfo[3];
int c = 0;
while(!reader.eof()) //reader is a ifstream open to a BMP file
{
reader.read(pixelInfo, 3);
myArray[c] = (unsigned char)pixelInfo[0];
myArray[c + 1] = (unsigned char)pixelInfo[1];
myArray[c + 2] = (unsigned char)pixelInfo[2];
c += 3;
}
reader.close();
delete[] myArray; //I get HEAP CORRUPTION here
After some tests, I found it to be caused by the cast in the while loop, if I use a signed char myArray I don't get the error, but I must use unsigned char for the rest of my code.
Casting pixelInfo to unsigned char also gives the same error.
Is there any solution to this?
This is what you should do:
reader.read((char*)myArray, myArrayLength); /* note, that isn't (sizeof myArray) */
if (!reader) { /* report error */ }
If there's processing going on inside the loop, then
int c = 0;
while (c + 2 < myArraySize) //reader is a ifstream open to a BMP file
{
reader.read(pixelInfo, 3);
myArray[c] = (unsigned char)pixelInfo[0];
myArray[c + 1] = (unsigned char)pixelInfo[1];
myArray[c + 2] = (unsigned char)pixelInfo[2];
c += 3;
}
Trying to read after you've hit the end is not a problem -- you'll get junk in the rest of the array, but you can deal with that at the end.
Assuming your array is big enough to hold the whole file invites buffer corruption. Buffer overrun attacks involving image files with carefully crafted incorrect metadata are quite well-known.
in Mozilla
in Sun Java
in Internet Explorer
in Windows Media Player
again in Mozilla
in MSN Messenger
in Windows XP
Do not rely on the entire file content fitting in the calculated buffer size.
reader.eof() will only tell you if the previous read hit the end of the file, which causes your final iteration to write past the end of the array. What you want instead is to check if the current read hits the end of file. Change your while loop to:
while(reader.read(pixelInfo, 3)) //reader is a ifstream open to a BMP file
{
// ...
}
Note that you are reading 3 bytes at a time. If the total number of bytes is not a multiple of 3, then only part of the pixelInfo array will actually be filled with correct data on the last read, which may cause an error in your program. You could try the following piece of untested code.
while(!reader.eof()) //reader is a ifstream open to a BMP file
{
reader.read(pixelInfo, 3);
for (int i = 0; i < reader.gcount(); i++) {
myArray[c+i] = pixelInfo[i];
}
c += 3;
}
Your code does follow the documentation on cplusplus.com: the eof bit is set after an incomplete read, so the loop terminates after your last read. However, as I mentioned before, the likely cause of your issue is that you are copying junk data into the heap array, since pixelInfo[x] is not necessarily set if fewer than 3 bytes were read.

Reading data from binary file

I am trying to read data from binary file, and having issues. I have reduced it down to the most simple case here, and it still won't work. I am new to c++ so I may be doing something silly but, if anyone could advise I would be very grateful.
Code:
int main(int argc,char *argv[]) {
ifstream myfile;
vector<bool> encoded2;
cout << encoded2 << "\n"<< "\n" ;
myfile.open(argv[2], ios::in | ios::binary |ios::ate );
myfile.seekg(0,ios::beg);
myfile.read((char*)&encoded2, 1 );
myfile.close();
cout << encoded2 << "\n"<< "\n" ;
}
Output
00000000
000000000000000000000000000011110000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Compression_Program(58221) malloc: * error for object 0x10012d: Non-aligned pointer being freed
* set a breakpoint in malloc_error_break to debug
Thanks in advance.
Do not cast a vector<bool>* to a char*. It does not do anything predictable.
You are reading into encoded2 itself: myfile.read((char*)&encoded2, 1 );. This is wrong. You can read a byte and then put it in encoded2:
char x;
myfile.read( &x, 1 );
encoded2.push_back(x != 0);
Two mistakes here:
you assume the address of a vector is the address of the first element
you rely on vector<bool>
Casting a vector into a char * is not really a good thing, because a vector is an object and stores some state along with its elements.
Here you are probably overwriting the state of the vector, thus the destructor of the vector fails.
Maybe you would like to cast the elements of the vector (which are guaranteed to be stored contiguously in memory). But another trap is that vector<bool> may be implementation-optimized.
Therefore you should use a vector<char> instead, do an encoded2.resize(8) and use myfile.read(&encoded2[0], 8).
But probably you want to do something else and we need to know what the purpose is here.
You're overwriting a std::vector, which you shouldn't do. A std::vector is typically implemented as a pointer to a data array plus some bookkeeping such as its size (probably a size_t); if you overwrite these with practically random bits, data corruption will occur.
Since you're only reading a single byte, this will suffice:
char c;
myfile.read(&c, 1);
The C++ language does not provide an efficient I/O method for reading bits as bits. You have to read bits in groups. Also, you have to worry about endianness when reading in the bits.
I suggest the old fashioned method of allocating a buffer, reading into the buffer then operating on the buffer.
Allocating a buffer
const unsigned int BUFFER_SIZE = 1024 * 1024; // Let the compiler calculate it.
//...
unsigned char * const buffer = new unsigned char [BUFFER_SIZE]; // The pointer is constant.
Reading in the data
unsigned int bytes_read = 0;
ifstream data_file("myfile.bin", ios::binary); // Open file for input without translations.
data_file.read(reinterpret_cast<char*>(buffer), BUFFER_SIZE); // Read data into the buffer.
bytes_read = data_file.gcount(); // Get actual count of bytes read.
Reminders:
Delete the buffer when you are finished with it.
Close the file when you are finished with it.
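Once the bytes are in a buffer, the individual bits can be extracted with shifts and masks. A sketch that expands each byte into eight bools, most significant bit first (the bit order is an assumption that has to match whatever wrote the file, and unpack_bits is a hypothetical helper name):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Expands each byte of `bytes` into 8 bools, most significant bit first.
std::vector<bool> unpack_bits(const std::string& bytes)
{
    std::vector<bool> bits;
    bits.reserve(bytes.size() * 8);
    for (unsigned char byte : bytes)
        for (int shift = 7; shift >= 0; --shift)
            bits.push_back(((byte >> shift) & 1) != 0);
    return bits;
}
```

The bits are added with push_back, so the vector<bool> object itself is never overwritten with raw file data.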
myfile.read((char*) &encoded2[0], sizeof(int)* COUNT);
or you can use push_back();
int tmp;
for(int i = 0; i < COUNT; i++) {
myfile.read((char*) &tmp, 4);
encoded2.push_back(tmp);
}