Save short int in binary file instead of text file [duplicate] - c++

Let's say I have a vector with 9 integers.
In total, I should have 36 bytes.
Some of these integers fit in a short, so I want to store the ones that fit as a short in 2 bytes and the ones that don't in 4.
I noticed that a file containing 120 98 99 99 98 257 259 98 0 was 28 bytes, and I wonder what I did wrong.
ofstream out(file, ios::binary);
int len = idx.size(); //idx is the vector<int>
string end = " 0", space = " "; //end is just to finish the saving.
for(int i = 0; i < len; i++) {
    if(idx[i] <= SHRT_MAX){
        short half = idx[i];
        out<<half;
    }
    else out<<idx[i];
    if(i == len-1) out<<end; else out<<space;
}

First piece of advice, use the header cstdint if you want to work with types of a guaranteed size. Types such as uint16_t are standard and are there for a reason.
Next, this idea of sometimes writing two bytes and sometimes writing four. Keep in mind that when you write data to a file like this, it's just going to look like a big chunk of data. There will not be any way to magically know when to read two bytes and when to read four. You can store metadata about the file, but that would probably be less efficient than simply using the same size consistently. Write everything as two bytes or as four bytes. That's up to you, but whichever you choose, you should probably stick with it.
Now, moving on to why you have 28 bytes of data written.
You're writing the ASCII representations of your numbers. This ends up being "120 98 99 99 98 257 259 98 0", which is 28 characters (20 digits and 8 spaces) and therefore 28 bytes.
When writing your data, you probably want to do something like
out.write( (char*)&my_data, sizeof(my_data));
Keep in mind, though, that this isn't really a safe way to write binary data. I think you already understand the need to make sure you write the size you intend. Sadly, the complications of creating portable files don't end there. You also need to worry about the endianness of the machine your program is running on. Here is an article that I think you might enjoy reading to learn more about the subject:
Disch's Tutorial To Good Binary Files

Related

Does endianness affect writing an odd number of bytes?

Imagine you had a uint64_t bytes and you know that you only need 7 bytes because the integers you store will not exceed the limit of 7 bytes.
When writing a file you could do something like
std::ofstream fout(fileName);
fout.write((char *)&bytes, 7);
to only write 7 bytes.
The question I'm trying to figure out is whether the endianness of a system affects the bytes that are written to the file. I know that endianness affects the order in which the bytes are written, but does it also affect which bytes are written? (Only for the case where you write fewer bytes than the integer usually has.)
For example, on a little endian system the first 7 bytes are written to the file, starting with the LSB. On a big endian system what is written to the file?
Or to put it differently, on a little endian system the MSB(the 8th byte) is not written to the file. Can we expect the same behavior on a big endian system?
Endianness only affects the way multi-byte integers (16, 32, 64 bits) are written. If you are writing individual bytes (as in your case), they will be written in exactly the order in which you write them.
For example, this kind of writing will be affected by endianess:
std::ofstream fout(fileName);
int i = 67;
fout.write((char *)&i, sizeof(int));
uint64_t bytes = ...;
fout.write((char *)&bytes, 7);
This will write exactly 7 bytes starting from the address &bytes. LE and BE systems differ in how the eight bytes are laid out in memory, though (let's assume the variable is located at address 0xff00):
0xff00 0xff01 0xff02 0xff03 0xff04 0xff05 0xff06 0xff07
LE: [byte 0 (LSB!)][byte 1][byte 2][byte 3][byte 4][byte 5][byte 6][byte 7 (MSB)]
BE: [byte 7 (MSB!)][byte 6][byte 5][byte 4][byte 3][byte 2][byte 1][byte 0 (LSB)]
The starting address (0xff00) won't change when casting to char*, and you'll write out the byte at exactly this address plus the six following ones. In both cases (LE and BE), the byte at address 0xff07 won't be written. If you look at the memory table above, it should be obvious that on a BE system you lose the LSB while storing the MSB, which does not carry any information.
On a BE system, you could instead write fout.write((char *)&bytes + 1, 7);. Be aware, though, that this still leaves a portability issue:
fout.write((char *)&bytes + isBE(), 7);
// ^ giving true/false, i. e. 1 or 0
// (such function/test existing is an assumption!)
This way, data written by a BE system would be misinterpreted by an LE system when read back, and vice versa. The safe version is to decompose the value into single bytes, as geza did in his answer. To avoid multiple write calls, you might decompose the value into an array instead and write that out in one go.
If you are on Linux/BSD, there's a nice alternative, too (htole64 is typically declared in <endian.h> on Linux and in <sys/endian.h> on the BSDs):
bytes = htole64(bytes); // will likely result in a no-op on LE system...
fout.write((char *)&bytes, 7);
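The isBE() test used above is assumed to exist. One possible way to write such a check at run time is sketched below; the name is_big_endian is hypothetical, and the check only distinguishes plain big-endian from everything else.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reports whether the host stores the most
// significant byte of an integer at the lowest address.
bool is_big_endian() {
    const std::uint16_t probe = 0x0102;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);  // look at the byte at the lowest address
    return first == 0x01;
}
```

Since C++20 the same information is available at compile time through std::endian in the <bit> header.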
The question I'm trying to figure out is whether the endianness of a system affects the bytes that are written to the file.
Yes, it affects which bytes are written to the file.
For example, on a little endian system the first 7 bytes are written to the file, starting with the LSB. On a big endian system what is written to the file?
The first 7 bytes are written to the file. But this time, starting with the MSB. So, in the end, the lowest byte is not written in the file, because on big endian systems, the last byte is the lowest byte.
So, this is not what you want, because you lose information.
A simple solution is to convert uint64_t to little endian, and write the converted value. Or just write the value byte-by-byte in a way that a little endian system would write it:
uint64_t x = ...;
write_byte(uint8_t(x));
write_byte(uint8_t(x>>8));
write_byte(uint8_t(x>>16));
// you get the idea how to write the remaining bytes

Is it possible to read bit to bit from a binary file with c++?

I'm new here so I'll try to be very clear with my issue. I've tried to get a direct answer, but when I check on other questions, they are very particular and I get confused.
I have a binary file that I need to read for my project. I also have a specification sheet, and I'm reading the file according to those specs. So I've created a cpp file and I'm writing a simple program to read each element. I use ifstream and its read() function to read from the file.
The problem arises where the specification sheet says I need to read a bitstring of size 12. From the details, it's very clear that I should read only 12 bits for each of these elements. But I'm not really sure whether reading bit by bit is possible. The rest of the elements were read in bytes. Also, if I read 2 bytes each time and use bit "masks" to get only 12 bits, the elements read after this do not match correctly. So my guess is that I really need to read only 12 bits.
So my question is: is it possible to read 12 bits from a binary file, or to read bit by bit? I mean only 12 bits, without reading whole bytes and then masking them.
Thanks a lot.
No, this is not possible.
What you should do is read 2 bytes, mask 12 bits to get the result you want but also store the other 4 bits somewhere. Now when you need 12 bits again, read only 1 byte and combine it with the 4 stored bits.
Assuming little endian (both for the host and for the way the 12-bit values are packed in the file):
Read the file into an array of uint8_t that is padded to a multiple of 6 bytes.
Then write your access function:
uint16_t get12Bits(const uint8_t *ptr, int loc)
{
    uint64_t temp = 0; // only the lower 48 bits are used
    memcpy(&temp, ptr + (loc / 4) * 6, 6); // 6 bytes hold 4 elements
    return 0xfff & (temp >> ((loc & 0x03) * 12));
}

How to nicely print Buffer in OCaml?

I have a Buffer.
Question 1
How can I print out all the bytes inside, one by one?
Question 2
How can I control the format of the printing?
For example, if I have a buffer like 33 33 33 33 33 33 14 40 (every byte is in HEX format), how can I print it as \x33\x33\x33\x33\x33\x33\x14\x40?
To apply an imperative function f to every byte in a buffer b, you can use String.iter f (Buffer.contents b).
To print a value with a desired format, you can use Printf.printf.
To get the integer value of a byte in a string you can use Char.code.
As a side comment, many of your recent questions could be answered extremely quickly by reading through the OCaml standard library documentation. I think this would be a good thing for you to do. There's not a lot of deep intellectual content, it's just something you should know about as an OCaml programmer.

What's an AoB (Array of Bytes)

I have encountered this term a couple of times now, and I have googled for explanations, but couldn't find any.
I'm accessing the memory of a running software-game. I do have an address but I'm also given an AoB, for example
89 8B ? ? 00 00 8B 50 ? 89 93 ? ?.
What do I do with it?
I'd appreciate it if you could give me a guide or something.
Thanks
An array of bytes is best explained in C/C++ as an array of [unsigned] char.
The values you see are only hexadecimal representations of these bytes, i.e. of the unsigned chars.
An array of bytes is a contiguous series of values, usually in the range 0 to 255 (0x00 to 0xFF).
The contents must be interpreted by the programmer and can be anything from addresses to pixels for a bitmap.
A common use of an AoB, a.k.a. a buffer, is for I/O: reading and writing data. The fundamental I/O routines do not care about content, just quantity, source and destination. A program may read large amounts of data into an AoB, then later cast it to some kind of structure or assign fields with data from the buffer. See also "serialization." This is a performance technique for I/O: convert many small reads into one large block read.
Not all data has to be in structures or objects; those are just a convenience.
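To make the "assign fields with data from the buffer" step concrete, here is a small sketch that pulls a 32-bit field out of a byte buffer. The function name read_u32_at and the idea that the buffer holds a 32-bit field at some offset are assumptions for illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copies four bytes starting at `offset` out of a raw
// byte buffer and interprets them as a 32-bit unsigned integer in the
// host's byte order. The caller must ensure offset + 4 <= buffer.size().
std::uint32_t read_u32_at(const std::vector<unsigned char>& buffer,
                          std::size_t offset) {
    std::uint32_t value = 0;
    std::memcpy(&value, buffer.data() + offset, sizeof(value));
    return value;
}
```

Copying with memcpy rather than casting the pointer avoids alignment problems when the field does not start at a suitably aligned address.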

Binary file write problem in C++

This is my function which creates a binary file
void writefile()
{
    ofstream myfile ("data.abc", ios::out | ios::binary);
    streamoff offset = 1;
    if(myfile.is_open())
    {
        char c='A';
        myfile.write(&c, offset );
        c='B';
        myfile.write(&c, offset );
        c='C';
        myfile.write(&c,offset);
        myfile.write(StartAddr,streamoff (16) );
        myfile.close();
    }
    else
        cout << "Some error" << endl ;
}
The value of StartAddr is 1000, hence the expected output file is:
A B C 1000 NUL NUL NUL
However, strangely my output file appends this: data.abc
So the final outcome is: A B C 1000 NUL NUL NUL data.abc
Please help me out with this. How do I deal with it, and why does this strange behavior occur?
I recommend you quit with binary writing and work on writing the data in a textual format. You've already encountered some of the problems with writing data. There are still issues for you to come across about reading the data and portability. Expect more pain if you continue this route.
Use textual representations. For simplicity you can put one field per line and use std::getline to read it in. The textual representation allows you to view the data in any text editor, easily. Try using Notepad to view a binary file!
"Oh, but binary data is so much faster and takes up less space in the file." You've already spent more time and money than you would gain by using binary data. The speed of computers and huge memory capacities (disk and RAM) make binary representations a thing of the past (except in extreme cases).
As a learning tool, go ahead and use binary. For ease of development and quick schedules (IOW, finishing early), use textual representations.
Search Stack Overflow for "C++ micro optimization" for the justifications.
There are several issues with this code.
For starters, if you want to write individual characters to a stream, you don't need to use ostream::write. Instead, just use ostream::put, as shown here:
myfile.put('A');
Second, if you want to write out a string into a file stream, just use the stream insertion operator:
myfile << StartAddr;
This is perfectly safe, even in binary mode.
As for the particular problem you're reporting, I think the issue is that you're trying to write out a string of length four (StartAddr), but you've told the stream to write out sixteen bytes. This means that you're writing out the four bytes of the string contents, then the null terminator, and then eleven bytes of whatever happens to be in memory after the buffer. In your case, this is two more null bytes, then the meaningless text that you saw after that. To fix this, either change your code to write fewer bytes or, if StartAddr is a string, just write it using <<.
With the line myfile.write(StartAddr,streamoff (16) ); you are instructing the myfile object to write 16 bytes to the stream, starting at the address StartAddr. Imagine that StartAddr is an array of 16 bytes:
char StartAddr[16] = "1000\0\0\0data.abc";
myfile.write(StartAddr, sizeof(StartAddr));
This would generate the output that you see. Without seeing the declaration or definition of StartAddr I cannot say for certain, but it appears you are writing out the five-byte nul-terminated string "1000" followed by whatever happens to reside in the next 11 bytes after StartAddr. In this case, it appears that a couple of nul bytes followed by the constant nul-terminated string "data.abc" (which the compiler must put somewhere in memory) are what follow StartAddr.
Regardless, it is clear that you are reading past the end of a buffer.
If you are trying to write a 16 bit integer type to a stream you have a couple of options, both based on the fact that there are typically 8 bits in a byte. The 'cleanest' one would be something like:
char x = (StartAddr & 0xFF);      // low byte first
myfile.write(&x, 1);
x = ((StartAddr >> 8) & 0xFF);    // then the high byte
myfile.write(&x, 1);
This assumes StartAddr is a 16 bit integer type and does not take into account any translation that might occur (such as potential conversion of a value of 10 [a linefeed] into a carriage return / linefeed sequence).
Alternatively, you could write something like:
myfile.write(reinterpret_cast<char*>(&StartAddr), sizeof(StartAddr));