How to write byte(s) to a file in C++? - c++

I have created a bitset using std::bitset<8> bits which is equivalent to 00000000 i.e., 1 byte.
I have an output file defined as std::ofstream outfile("./compressed", std::ofstream::out | std::ofstream::binary), but when I write the bits using outfile << bits, the content of outfile becomes 00000000 and the size of the file is 8 bytes (each bit of bits ends up taking 1 byte in the file). Is there any way to truly write a byte to a file? For example, if I write 11010001 then this should be written as a single byte and the file size should be 1 byte, not 8 bytes. I am writing code for a Huffman encoder and I cannot find a way to write the encoded bytes to the compressed output file.

The issue is that operator<< is the formatted (text) output method, even if you've specified std::ofstream::binary. You can use put to write a single character or write to output multiple characters without any formatting. Note that you are responsible for converting the data to its char representation.
std::bitset<8> bits = foo();  // foo() stands for whatever produces your bits
std::ofstream outfile("compressed", std::ofstream::out | std::ofstream::binary);
// In reality, your conversion code is probably more complicated than this
char repr = static_cast<char>(bits.to_ulong());
// Optional scoped sentry: put() and write() also construct their own sentry internally
{
    std::ofstream::sentry sentry(outfile);
    if (sentry)
    {
        outfile.put(repr);                       // <- Option 1: a single char
        // outfile.write(&repr, sizeof repr);    // <- Option 2: a buffer of chars (use one option, not both)
    }
}
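For the Huffman encoder in the question, code lengths rarely line up with byte boundaries, so the bits usually have to be packed into bytes before calling write. A minimal sketch, assuming the encoded bits arrive as a string of '0'/'1' characters (most significant bit first) and that the last partial byte is padded with zero bits; a real encoder would also have to store how many padding bits to ignore. packBits and writeBytes are hypothetical helper names.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: packs '0'/'1' characters into bytes, most significant
// bit first. The last byte is padded with zero bits on the right.
std::vector<unsigned char> packBits(const std::string& bits)
{
    std::vector<unsigned char> bytes;
    unsigned char current = 0;
    int count = 0;                       // bits already collected in 'current'
    for (char b : bits) {
        current = static_cast<unsigned char>((current << 1) | (b == '1' ? 1 : 0));
        if (++count == 8) {              // a full byte: store it and start over
            bytes.push_back(current);
            current = 0;
            count = 0;
        }
    }
    if (count > 0) {                     // move the leftover bits to the top
        bytes.push_back(static_cast<unsigned char>(current << (8 - count)));
    }
    return bytes;
}

// Hypothetical helper: writes the raw bytes with the unformatted write().
bool writeBytes(const std::string& path, const std::vector<unsigned char>& bytes)
{
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(bytes.data()),
              static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(out);
}
```

For example, writeBytes("compressed", packBits("11010001")) produces a 1-byte file containing 0xD1.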

Related

Is there a way to convert an array of bytes to a number in C++?

I have read 4 bytes from a file on an SD card into an array on an Arduino Mega. Now I want to convert this array into one number so that I can work with it as an integer (the bytes hold the length of the next file section). Is there any built-in function for this, or must I write my own?
I read the File into the byte array with the file.read() function from SDFat:
byte array[4]; //creates the byte array
file.read(array,4); //reads 4 bytes from the file and stores it in the array
I hope you can understand my problem.
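It depends on the endianness of the stored bytes.
If the endianness matches that of your target system (the AVR-based ATmega chips are little endian: avr-gcc stores multi-byte integers least significant byte first), you can copy the bytes directly (memcpy is declared in <string.h>):
int32_t number;
memcpy(&number, array, sizeof number);
to get a 32 bit integer. Casting the pointer instead (*(int32_t*)array) is common, but it violates strict aliasing and can fault on platforms that require aligned access.
If the endianness does not match, you have to shift the bytes around yourself. Shifting works on any host because it does not depend on the host's byte order. For a little endian encoded number:
int32_t number = uint32_t(array[3]) << 24 | uint32_t(array[2]) << 16 | uint32_t(array[1]) << 8 | uint32_t(array[0]);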
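Both byte orders can be decoded with shifts, which avoids the alignment and aliasing problems of the pointer cast. A minimal sketch in standard C++. It assumes uint8_t input, since Arduino's byte is an unsigned 8-bit type. The function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: bytes stored least significant first (little endian).
std::uint32_t fromLittleEndian(const std::uint8_t b[4])
{
    return  static_cast<std::uint32_t>(b[0])
         | (static_cast<std::uint32_t>(b[1]) << 8)
         | (static_cast<std::uint32_t>(b[2]) << 16)
         | (static_cast<std::uint32_t>(b[3]) << 24);
}

// Hypothetical helper: bytes stored most significant first (big endian).
std::uint32_t fromBigEndian(const std::uint8_t b[4])
{
    return (static_cast<std::uint32_t>(b[0]) << 24)
         | (static_cast<std::uint32_t>(b[1]) << 16)
         | (static_cast<std::uint32_t>(b[2]) << 8)
         |  static_cast<std::uint32_t>(b[3]);
}
```

Casting the unsigned result to a signed type is only needed if the stored length can be negative. Each byte is widened to 32 bits before shifting, which matters on AVR, where int is only 16 bits.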

How to understand MNIST Binary converter in c++?

I recently needed to convert the MNIST data set to images and labels. It is binary, and its structure is described in the previous link. I did a little research and, as I'm a fan of C++, I read about binary I/O in C++. After that I found this link on Stack Overflow. That link works well, but it has no code comments and no explanation of the algorithm. That confused me and raised some questions that I need to ask a professional C++ programmer.
1- What is the algorithm to convert the data set in C++ with the help of ifstream?
I've learned how to read a file as binary with file.read and move to the next record. In C, we define a struct and read it from the file, but I can't see any struct in the C++ program, for example to read this:
[offset] [type]          [value]           [description]
0000     32 bit integer  0x00000803(2051)  magic number
0004     32 bit integer  60000             number of images
0008     32 bit integer  28                number of rows
0012     32 bit integer  28                number of columns
0016     unsigned byte   ??                pixel
How can we go to a specific offset, for example 0004, read a 32 bit integer there, and put it into an integer variable?
2- What is the function ReverseInt doing? (It is obviously not simply reversing an integer.)
int ReverseInt (int i)
{
    unsigned char ch1, ch2, ch3, ch4;
    ch1 = i & 255;
    ch2 = (i >> 8) & 255;
    ch3 = (i >> 16) & 255;
    ch4 = (i >> 24) & 255;
    return((int) ch1 << 24) + ((int)ch2 << 16) + ((int)ch3 << 8) + ch4;
}
I did a little debugging with cout: when it reversed, for example, 270991360 it returned 10000, and I cannot find any relation between the two. I understand that it shifts the number and ANDs it with 255, but why?
PS:
1- I already have the converted MNIST images, but I want to understand the algorithm.
2- I've already unzipped the gz files, so the files are pure binary.
1-What is the algorithm to convert the data-set in c++ with help of ifstream?
This function reads the file (t10k-images-idx3-ubyte.gz) as follows:
Read the magic number and adjust endianness
Read the number of images and adjust endianness
Read the number of rows and adjust endianness
Read the number of columns and adjust endianness
Read all (number of images x rows x columns) pixel bytes (but discard them).
The function uses a plain int and always swaps the byte order. MNIST stores these integers most significant byte first, so this is only correct on a little-endian machine with a 32-bit int. In other words, it targets a very specific architecture and is not portable.
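The steps above can be sketched portably by decoding each header field from its four bytes, instead of reading into an int and swapping. A minimal sketch that assumes the image file has already been decompressed. MnistImages, readBigEndianU32, readMnistImages and readMnistImagesFile are hypothetical names. Unlike the original, it keeps the pixels instead of discarding them.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <istream>
#include <stdexcept>
#include <string>
#include <vector>

// MNIST stores its 32-bit header fields most significant byte first.
std::uint32_t readBigEndianU32(std::istream& in)
{
    unsigned char b[4];
    if (!in.read(reinterpret_cast<char*>(b), 4))
        throw std::runtime_error("unexpected end of file");
    return (std::uint32_t(b[0]) << 24) | (std::uint32_t(b[1]) << 16)
         | (std::uint32_t(b[2]) << 8)  |  std::uint32_t(b[3]);
}

struct MnistImages {                     // hypothetical container type
    std::uint32_t count = 0, rows = 0, cols = 0;
    std::vector<std::uint8_t> pixels;    // count * rows * cols bytes, row-major
};

MnistImages readMnistImages(std::istream& in)
{
    MnistImages m;
    if (readBigEndianU32(in) != 2051)    // magic number 0x00000803
        throw std::runtime_error("not an MNIST image file");
    m.count = readBigEndianU32(in);
    m.rows  = readBigEndianU32(in);
    m.cols  = readBigEndianU32(in);
    m.pixels.resize(std::size_t(m.count) * m.rows * m.cols);
    if (!in.read(reinterpret_cast<char*>(m.pixels.data()),
                 static_cast<std::streamsize>(m.pixels.size())))
        throw std::runtime_error("truncated pixel data");
    return m;
}

MnistImages readMnistImagesFile(const std::string& path)
{
    std::ifstream file(path, std::ios::binary);
    if (!file)
        throw std::runtime_error("cannot open " + path);
    return readMnistImages(file);
}
```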
How can we go to the specific offset for example 0004 and read for example 32 bit integer and put it to an integer variable.
ifstream provides a function to seek to a given position:
file.seekg( posInBytes, std::ios_base::beg);
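At the given position, you could read the 32-bit integer (its bytes arrive in the file's big-endian order, so on a little-endian machine they still have to be swapped):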
int32_t val;
file.read ((char*)&val,sizeof(int32_t));
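A sketch that combines seekg with a decode that does not depend on the host's byte order. readU32BigEndianAt is a hypothetical name. <sstream> is included only so the function can be exercised on an in-memory stream instead of a real file.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>     // std::istringstream, handy for testing without a file
#include <stdexcept>
#include <string>

// Hypothetical helper: seek to an absolute byte offset and decode the
// big-endian (MSB first) 32-bit unsigned integer stored there.
std::uint32_t readU32BigEndianAt(std::istream& in, std::streamoff offset)
{
    in.clear();                                 // reset error flags from earlier reads
    in.seekg(offset, std::ios_base::beg);
    unsigned char b[4];
    if (!in.read(reinterpret_cast<char*>(b), sizeof b))
        throw std::runtime_error("cannot read 4 bytes at offset");
    return (std::uint32_t(b[0]) << 24) | (std::uint32_t(b[1]) << 16)
         | (std::uint32_t(b[2]) << 8)  |  std::uint32_t(b[3]);
}
```

With a real file, pass a std::ifstream opened with std::ios::binary. For example, offset 4 of an MNIST training image file yields 60000.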
2- What the function reverseInt is doing?
This function reverses the order of the bytes of an int value:
Considering a 32-bit integer like aaaaaaaabbbbbbbbccccccccdddddddd, it returns the integer ddddddddccccccccbbbbbbbbaaaaaaaa. For example, 270991360 is 0x10270000; reversing its bytes gives 0x00002710, which is 10000.
This is useful for normalizing endianness. However, it is probably not very portable, as int might not be 32 bits (but e.g. 16 or 64 bits).
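A portable sketch of the same byte reversal. It uses std::uint32_t so the width is fixed and no shift can overflow a signed int. byteSwap32 is a hypothetical name. With a C++23 compiler, std::byteswap from <bit> does the same job.

```cpp
#include <cstdint>

// Hypothetical helper: reverses the four bytes of a 32-bit unsigned value.
std::uint32_t byteSwap32(std::uint32_t v)
{
    return ((v & 0x000000FFu) << 24)    // lowest byte moves to the top
         | ((v & 0x0000FF00u) << 8)
         | ((v & 0x00FF0000u) >> 8)
         | ((v & 0xFF000000u) >> 24);   // top byte moves to the bottom
}
```

byteSwap32(270991360) returns 10000, which matches the debugging output in the question.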

c++ how to write integers to a binary file that stay 4 bytes long

I want to write a bunch of integers to a file and then be able to read them later. My problem is that when I write the integers to a file, smaller integers end up using less than 4 bytes. So 1 for example is represented as 01 rather than 00 00 00 01. This means I'll have trouble reading the file because I don't know where one integer begins and ends. How do I make it so that the integer I write to the file is always 4 bytes long? My code is below:
std::fstream file;
file.open("test.bin", std::ios::out | std::ios::binary);
for (int i : vectorOfInts) {  // originally MSVC's non-standard: for each(int i in vectorOfInts)
    file << i;
}
file.close();
You seem to be confused about text and binary files. The << operator is used for text files: it converts the value to text and writes that to the file. You need to use the write method to write an integer to a file in its native binary format. The following writes sizeof(int) bytes (4 on common platforms) to the file:
file.write( reinterpret_cast<const char *>(&i), sizeof(i));
You may also need to consider the endianness of data depending on what will be reading the data back.
You could also write the whole vector without a loop using:
file.write(reinterpret_cast<const char *>(vectorOfInts.data()), vectorOfInts.size() * sizeof(int));  // data() (C++11) is safe even for an empty vector, unlike &vectorOfInts[0]
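A fuller sketch that writes the vector and reads it back. It assumes both sides run on machines with the same byte order, per the endianness caveat above. It uses std::int32_t instead of int so that each value is exactly 4 bytes. writeInts and readInts are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Writes every value as exactly 4 bytes, in the host's native byte order.
bool writeInts(const std::string& path, const std::vector<std::int32_t>& values)
{
    std::ofstream file(path, std::ios::binary);
    file.write(reinterpret_cast<const char*>(values.data()),
               static_cast<std::streamsize>(values.size() * sizeof(std::int32_t)));
    return static_cast<bool>(file);
}

// Reads the values back; the element count follows from the file size.
std::vector<std::int32_t> readInts(const std::string& path)
{
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file)
        return {};
    const std::streamoff bytes = file.tellg();
    std::vector<std::int32_t> values(static_cast<std::size_t>(bytes) / sizeof(std::int32_t));
    file.seekg(0, std::ios::beg);
    file.read(reinterpret_cast<char*>(values.data()),
              static_cast<std::streamsize>(values.size() * sizeof(std::int32_t)));
    return values;
}
```

Because every record has the same width, the reader knows where one integer ends and the next begins.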

File Size to store an integer

I want to write an integer (for example 222222) to a text file in a way that reduces the size of the file. If I write the integer in the form of a string, it takes 6 bytes because of the six characters present. If I store the integer in the form of an integer, it again takes 6 bytes. Why isn't the file size equal to 4 bytes, since an int takes 4 bytes?
#include <iostream>
#include <stdlib.h>
#include <stdio.h>
using namespace std;
int main()
{
    //char* x = "222222.2222";
    //double x = 222222.2222;
    int x = 222222;
    FILE *fp = fopen("now.txt","w");
    fprintf(fp,"%d",x);
    return 0;
}
Here is the definition of fprintf:
writes the C string pointed by format to the stream.
So fprintf always produces text: the %d conversion turns the int into its decimal characters, which is why the output file has the six characters 222222 stored in it.
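If you want to store an integer rather than a string in the file, you could use fwrite:
int x = 222222;
FILE *fp = fopen("now.txt","wb");  // "b": binary mode, no newline translation
fwrite(&x, sizeof(int), 1, fp);
fclose(fp);
If you switch your editor to hex mode, the file contains 0E 64 03 00: 222222 is 0x0003640E, stored least significant byte first on a little-endian machine. That is 4 bytes (on a platform where int is 4 bytes).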
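A slightly fuller sketch of the fwrite approach, using the C++ <cstdio> spelling and adding a read-back with the "wb"/"rb" binary modes. On Windows, text mode would turn any 0x0A byte into 0x0D 0x0A. writeRawInt and readRawInt are hypothetical names. std::int32_t is used so the record is 4 bytes regardless of sizeof(int). The bytes are still in the host's native order.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical helper: writes one 32-bit integer as its 4 raw bytes.
bool writeRawInt(const char* path, std::int32_t value)
{
    std::FILE* fp = std::fopen(path, "wb");
    if (!fp) return false;
    bool ok = std::fwrite(&value, sizeof value, 1, fp) == 1;
    return std::fclose(fp) == 0 && ok;   // fclose flushes; report its failure too
}

// Hypothetical helper: reads the 4 bytes back into an integer.
bool readRawInt(const char* path, std::int32_t& value)
{
    std::FILE* fp = std::fopen(path, "rb");
    if (!fp) return false;
    bool ok = std::fread(&value, sizeof value, 1, fp) == 1;
    std::fclose(fp);
    return ok;
}
```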
There is a simple reason behind this.
When you write with fprintf (or operator<<), the value is stored as characters. So when you write the integer 222222 into a file, it's written character by character, not as an integer.
When you write the integer as an integer (its raw bytes, e.g. with fwrite), the file becomes a binary file.
When you write and read binary files, you have to take care of padding, byte order, etc.
The alternative is plain text: you read it back as strings and convert them to integers with the help of library functions.
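For comparison, here is the plain-text route described above, sketched with iostreams. operator<< stores the decimal characters and operator>> parses them back. That costs 6 bytes for 222222, but the reader needs no knowledge of byte order or int size. writeAsText and readAsText are hypothetical names.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: stores the value as its decimal characters.
bool writeAsText(const std::string& path, int value)
{
    std::ofstream out(path);
    out << value;
    return static_cast<bool>(out);
}

// Hypothetical helper: parses the characters back into an int.
bool readAsText(const std::string& path, int& value)
{
    std::ifstream in(path);
    return static_cast<bool>(in >> value);
}
```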

What is the correct way to output hex data to a file?

I've read about [ostream] << hex << 0x[hex value], but I have some questions about it
(1) I defined my file stream, output, to be a hex output file stream using output.open("BWhite.bmp",ios::binary);. Since I did that, does that make the hex parameter in the output<< operation redundant?
(2) If I have an integer value I want to store in the file, and I use this:
int i = 0;
output << i;
would i be stored in little endian or big endian? Will the endianness change based on which computer the program is executed or compiled on?
Does the size of this value depend on the computer it's run on? Would I need to use the hex parameter?
(3) Is there a way to output raw hex digits to a file? If I want the file to have the hex digit 43, what should I use?
output << 0x43 and output << hex << 0x43 both output ASCII 4, then ASCII 3.
The purpose of outputting these hex digits is to make the header for a .bmp file.
The formatted output operator << is for just that: formatted output. It's for strings.
As such, the std::hex stream manipulator tells streams to output numbers as strings formatted as hex.
If you want to output raw binary data, use the unformatted output functions only, e.g. basic_ostream::put and basic_ostream::write.
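A small sketch contrasting the two kinds of output. A std::ostringstream stands in for the .bmp file so the resulting bytes are easy to inspect; the same holds for std::ofstream. formattedHex and rawByte are hypothetical names.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Formatted output: std::hex only changes how the number is spelled, so the
// stream receives the two characters '4' and '3'.
std::string formattedHex(int value)
{
    std::ostringstream os;
    os << std::hex << value;
    return os.str();
}

// Unformatted output: put() stores the single byte unchanged (0x43 is 'C').
std::string rawByte(unsigned char value)
{
    std::ostringstream os;
    os.put(static_cast<char>(value));
    return os.str();
}
```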
You could output an int like this:
int n = 42;
output.write(reinterpret_cast<const char*>(&n), sizeof(int));  // write() takes a const char*, so the cast is required
The endianness of this output will depend on the architecture. If you wish to have more control, I suggest the following:
int32_t n = 42;
char data[4];
data[0] = static_cast<char>(n & 0xFF);
data[1] = static_cast<char>((n >> 8) & 0xFF);
data[2] = static_cast<char>((n >> 16) & 0xFF);
data[3] = static_cast<char>((n >> 24) & 0xFF);
output.write(data, 4);
This sample will output a 32 bit integer as little-endian regardless of the endianness of the platform. Be careful converting that back if char is signed, though.
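A matching pair of helpers, sketched under the assumption that values are handled as std::uint32_t (convert a signed value with static_cast<std::uint32_t> first). The reader routes each byte through unsigned char, which is the usual way around the signed-char pitfall mentioned above. writeLE32 and readLE32 are hypothetical names.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>     // std::stringstream, handy for testing without a file

// Hypothetical helper: writes v as 4 bytes, least significant first, on any host.
void writeLE32(std::ostream& out, std::uint32_t v)
{
    char data[4];
    for (int i = 0; i < 4; ++i)
        data[i] = static_cast<char>((v >> (8 * i)) & 0xFF);
    out.write(data, 4);
}

// Hypothetical helper: reads the value back. Going through unsigned char keeps
// a byte such as 0xFF from being sign-extended when char is signed.
std::uint32_t readLE32(std::istream& in)
{
    char data[4] = {};
    in.read(data, 4);
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(static_cast<unsigned char>(data[i])) << (8 * i);
    return v;
}
```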
You say
"Is there a way to output raw hex digits to a file? If I want the file to have the hex digit 43, what should I use? "
"Raw hex digits" depends on how you interpret a collection of bits. Consider the following:
Binary  : 01001010
Hex     : 4A
Octal   : 112
Decimal : 74
ASCII   : J
All of the above represent the same numeric quantity; we just interpret it differently.
So you simply need to store the data in binary format, that is, the exact bit pattern that represents the number.
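To make the table concrete, here is a sketch that prints one byte in each notation, using std::bitset and the std::hex/std::oct manipulators. A binary write stores only the bit pattern 01001010; the strings are just different spellings of it. Views and describe are hypothetical names.

```cpp
#include <bitset>
#include <sstream>
#include <string>

// Hypothetical result type: one byte rendered in several notations.
struct Views {
    std::string bin, hex, oct, dec;
    char ascii = 0;
};

Views describe(unsigned char byte)
{
    Views v;
    v.bin = std::bitset<8>(byte).to_string();   // "01001010" for 0x4A
    std::ostringstream h, o, d;
    h << std::hex << static_cast<int>(byte);
    o << std::oct << static_cast<int>(byte);
    d << static_cast<int>(byte);
    v.hex = h.str();
    v.oct = o.str();
    v.dec = d.str();
    v.ascii = static_cast<char>(byte);
    return v;
}
```

std::hex prints lowercase digits by default, so 0x4A comes out as "4a". Add std::uppercase for "4A".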
EDIT1
When you write a number to a file with <<, say 74 (as in the example above), it is stored as the two ASCII characters '7' and '4', whether the file was opened in text or binary mode. To avoid this, open the file in binary mode (ios::binary) and write the value with write(). Check http://courses.cs.vt.edu/~cs2604/fall00/binio.html#write
The purpose of outputting these hex digits is to make the header for a .bmp file.
You seem to have a large misconception of how files work.
The stream operators << generate text (human readable output). The .bmp file format is a binary format that is not human readable (well, it is, but it's not nice and I would not read it without tools).
What you really want to do is generate binary output and place it the file:
char x = 0x43;
output.write(&x, sizeof(x));
This will write one byte of data with the hex value 0x43 to the output stream. This is the binary representation you want.
would i be stored in little endian or big endian? Will the endianness change based on which computer the program is executed or compiled on?
Neither; you are again outputting text (not binary data).
int i = 0;
output.write(reinterpret_cast<char*>(&i), sizeof(i)); // Writes the binary representation of i
Here you do need to worry about the endianness (and size) of the integer value, and this will vary depending on the hardware that you run your application on. For the value 0 there is not much to worry about regarding endianness, but you should worry about the size of the integer.
I would stick some asserts into my code to validate that the architecture is OK for the code. Then let people worry about it if their architecture does not match the requirements:
#include <cassert>
#include <climits>  // CHAR_BIT
int test = 0x12345678;
assert((sizeof(test) * CHAR_BIT == 32) && "BMP uses 32 bit ints");
assert((((char*)&test)[0] == 0x78) && "BMP uses little endian");
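With C++11 or later, the size check can move to compile time with static_assert. The byte-order check can also avoid the pointer cast by copying the bytes with std::memcpy. Here is a sketch under those assumptions. isLittleEndian is a hypothetical name. C++20 also offers std::endian::native in <bit> for the same purpose.

```cpp
#include <climits>
#include <cstdint>
#include <cstring>

// The build fails here instead of asserting at run time.
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");
static_assert(sizeof(int) * CHAR_BIT == 32, "BMP code assumes 32-bit int");

// Hypothetical helper: inspects the first byte of a known value's object
// representation. memcpy avoids the aliasing questions of a pointer cast.
bool isLittleEndian()
{
    const std::uint32_t probe = 0x12345678u;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 0x78;
}
```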
There is a family of functions that will help you with endianness and size.
http://www.gnu.org/s/hello/manual/libc/Byte-Order.html
Function: uint32_t htonl (uint32_t hostlong)
This function converts the uint32_t integer hostlong from host byte order to network byte order.
#include <arpa/inet.h>  // htonl/ntohl on POSIX systems (on Windows: <winsock2.h>)
#include <cstdint>
// Writing to a file
uint32_t hostValue = 0x12345678;
uint32_t network = htonl(hostValue);
output.write(reinterpret_cast<const char*>(&network), sizeof(network));
// Reading from a file (input is a std::ifstream opened with std::ios::binary)
uint32_t networkIn;
input.read(reinterpret_cast<char*>(&networkIn), sizeof(networkIn));
uint32_t hostValueIn = ntohl(networkIn); // convert back to platform specific value.
// Unfortunately the BMP was written with Intel in mind
// and thus all integers are in little-endian.
// Network byte order (as used by htonl() and family) is big endian.
// So this may not be much use to you.
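Since BMP integers are little-endian, a portable alternative to htonl is to emit the bytes with shifts. Here is a minimal sketch of the 14-byte BMP file header: the signature 'BM', the 32-bit file size, two reserved 16-bit fields, and the 32-bit offset of the pixel data. The info header that must follow it is not shown. putLE16, putLE32 and writeBmpFileHeader are hypothetical names.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>     // std::ostringstream, useful for inspecting the bytes
#include <string>

// Hypothetical helpers: BMP stores every integer field least significant byte first.
void putLE16(std::ostream& out, std::uint16_t v)
{
    out.put(static_cast<char>(v & 0xFF));
    out.put(static_cast<char>((v >> 8) & 0xFF));
}

void putLE32(std::ostream& out, std::uint32_t v)
{
    putLE16(out, static_cast<std::uint16_t>(v & 0xFFFF));
    putLE16(out, static_cast<std::uint16_t>(v >> 16));
}

// Hypothetical helper: signature, total file size, two reserved fields,
// and the offset at which the pixel data starts (14 bytes in total).
void writeBmpFileHeader(std::ostream& out, std::uint32_t fileSize, std::uint32_t pixelOffset)
{
    out.put('B');
    out.put('M');
    putLE32(out, fileSize);
    putLE16(out, 0);          // reserved
    putLE16(out, 0);          // reserved
    putLE32(out, pixelOffset);
}
```

If a 40-byte BITMAPINFOHEADER and no palette follow, the pixel offset is 14 + 40 = 54.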
Last thing. When you open a file in binary mode with output.open("BWhite.bmp",ios::binary), it does nothing to the stream apart from changing how it treats the end-of-line sequence. When the file is in binary mode, the output is not modified: what you put into the stream is what is written to the file. If you leave the stream in text mode, '\n' characters are converted to the end-of-line sequence (the OS-specific set of characters that marks the end of a line). Since you are writing a binary file, you definitely do not want any interference with the characters you write, so binary is the correct mode. But it does not affect any other operation that you perform on the stream.
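A quick way to see the only effect of ios::binary is to write a newline and measure the file. Here is a sketch; writeLineAndMeasure is a hypothetical name. The text-mode size depends on the platform (5 bytes on POSIX systems, 6 on Windows). Only binary mode gives a fixed answer.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes "line\n" with the given open mode and returns
// the resulting file size in bytes.
long writeLineAndMeasure(const std::string& path, std::ios::openmode mode)
{
    {
        std::ofstream out(path, mode);   // std::ios::out is added automatically
        out << "line\n";
    }                                    // closing the stream flushes it
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    return static_cast<long>(in.tellg());
}
```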