Say I have a binary file that contains positive numbers written as big-endian 32-bit integers.
How do I read this file? This is what I have right now:
#include <stdio.h>

int main() {
    FILE *fp;
    char buffer[4];
    int num = 0;
    fp = fopen("file.bin", "rb");
    while (fread(&buffer, 1, 4, fp) != 0) {
        // I think buffer now holds the 32-bit integer I read;
        // how can I set num to that 32-bit big-endian value?
    }
    return 0;
}
Declare your buffer as:
unsigned char buffer[4];
and you can use this to decode each big-endian value, regardless of your platform's endianness:
int num = (int)buffer[0]<<24 | (int)buffer[1]<<16 | (int)buffer[2]<<8 | (int)buffer[3];
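Folding that into the original loop, a minimal sketch (error handling kept to a bare minimum; "positive" here means the values stay below 2^31, so the shifts stay in range):

#include <stdio.h>

int main(void) {
    unsigned char buffer[4];
    FILE *fp = fopen("file.bin", "rb");
    if (fp == NULL)
        return 1;
    /* each group of 4 bytes holds one big-endian 32-bit integer */
    while (fread(buffer, 1, 4, fp) == 4) {
        int num = (int)buffer[0] << 24 | (int)buffer[1] << 16
                | (int)buffer[2] << 8  | (int)buffer[3];
        printf("%d\n", num);
    }
    fclose(fp);
    return 0;
}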
BTW
Of course the conversion only matters on little-endian architectures such as x86. Otherwise your platform's endianness may match your file's endianness, so no conversion is needed, and you could read directly into your int without any conversion.
You need to find out your endianness first:
How can I find Endian-ness of my PC programmatically using C?
Then you need to act accordingly. If you're the same endianness as the file, you can read the value as is; if you're a different endianness, you need to reorder the bytes:
union Num
{
    char buffer[4];
    int num;
} num;

void swapChars(char* pChar1, char* pChar2)
{
    char temp = *pChar1;
    *pChar1 = *pChar2;
    *pChar2 = temp;
}

int swapOrder(Num num)
{
    swapChars(&num.buffer[0], &num.buffer[3]);
    swapChars(&num.buffer[1], &num.buffer[2]);
    return num.num;
}

while (fread(num.buffer, 1, 4, fp) != 0)
{
    int convertedNum;
    if (1 == amIBigEndian)
    {
        convertedNum = num.num;
    }
    else
    {
        convertedNum = swapOrder(num);
    }
    // Do whatever you want with convertedNum here...
}
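The amIBigEndian flag used above is assumed to come from the linked question; one common way to compute it at runtime looks like this (a sketch; the helper name is made up for illustration):

// Returns 1 on a big-endian machine, 0 on a little-endian one.
int amIBigEndianCheck()
{
    union { int i; char c[sizeof(int)]; } probe = { 1 };
    return probe.c[0] == 0;  // big-endian stores the low byte last
}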
It is operating system and processor architecture specific.
You might use routines like htonl(3) or ntohl, etc.,
but you really should have serialized in a well-defined format.
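For instance, a minimal sketch using ntohl to read one big-endian (network order) 32-bit value; readBigEndian32 is a made-up helper name, and arpa/inet.h assumes a POSIX system (on Windows the same functions live in winsock2.h):

#include <arpa/inet.h>  // ntohl
#include <cstdint>
#include <fstream>

// Read one big-endian 32-bit integer and convert it to host byte order.
bool readBigEndian32(std::ifstream& in, std::uint32_t& value)
{
    std::uint32_t raw;
    if (!in.read(reinterpret_cast<char*>(&raw), sizeof raw))
        return false;
    value = ntohl(raw);  // no-op on big-endian hosts, byte swap on little-endian
    return true;
}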
On current machines (where I/O is very slow w.r.t. CPU speed) I am in favor of using textual serialization formats like JSON, YAML, etc. But you could also use binary serialization formats (and libraries) like BSON, XDR, ASN.1, or the s11n library.
If possible, improve the producer code (the one writing your file.bin file) and the consumer code accordingly.
Binary data is inherently brittle because it is system and architecture specific. At the very least, document its format extremely well, and preferably provide some tools to convert it to and from textual formats.
There are several JSON libraries for C++, like jsoncpp and rapidjson, and for C, like jansson.
Related
I have implemented the Huffman coding algorithm in C++, and it's working fine. I want to create a text compression algorithm.
Behind every file or piece of data in the digital world there are 0s and 1s.
I want to persist the sequence of bits (0/1) generated by the Huffman encoding algorithm in a file.
My goal is to save on the number of bits used to store the file; I'm storing the metadata for decoding in a separate file. I want to write the data to a file bit by bit, and then read it back bit by bit in C++.
The problem I'm facing with binary mode is that it does not allow me to put data bit by bit.
I want to write "10101" bit by bit to the file, but binary mode writes the ASCII value (8 bits) of each character at a time.
code
#include "iostream"
#include "fstream"
using namespace std;
int main(){
ofstream f;
f.open("./one.bin", ios::out | ios::binary);
f<<"10101";
f.close();
return 0;
}
output
(The file ends up containing the five ASCII characters "10101", i.e. five bytes, not five bits.)
Any help or pointer is appreciated. Thank you.
"Binary mode" means only that you have requested that the actual bytes you write are not corrupted by end-of-line conversions. (This is only a problem on Windows. No other system has the need to deliberately corrupt your data.)
You are still writing a byte at a time in binary mode.
To write bits, you accumulate them in an integer. For convenience, in an unsigned integer. This is your bit buffer. You need to decide whether to accumulate them from the least to most or from the most to least significant positions. Once you have eight or more bits accumulated, you write out one byte to your file, and remove those eight bits from the buffer.
When you're done, if there are bits left in your buffer, you write out those last one to seven bits to one byte. You need to carefully consider how exactly you do that, and how to know how many bits there were, so that you can properly decode the bits on the other end.
The accumulation and extraction are done using the bit operations in your language. In C++ (and many other languages), those are & (and), | (or), >> (right shift), and << (left shift).
For example, to insert one bit, x, into your buffer, and later three bits in y, ending up with the earliest bits in the most significant positions:
unsigned buf = 0, bits = 0;
...
// some loop
{
    ...
    // write one bit (don't need the & if you know x is 0 or 1)
    buf = (buf << 1) | (x & 1);
    bits++;
    ...
    // write three bits
    buf = (buf << 3) | (y & 7);
    bits += 3;
    ...
    // write bytes from the buffer before it fills the integer length
    if (bits >= 8) {  // the if could be a while if expecting 16 or more
        // out is an ostream -- must be in binary mode if on Windows
        bits -= 8;
        out.put(buf >> bits);
    }
    ...
}
...
// write any leftover bits (it is assumed here that bits is in 0..7 --
// if not, first repeat the if or while from above to clear out bytes)
if (bits) {
    out.put(buf << (8 - bits));
    bits = 0;
}
...
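For the reading side, a mirror-image sketch under the same MSB-first convention (getBit is a made-up name; in must be an istream opened in binary mode):

#include <istream>

unsigned rbuf = 0, rbits = 0;

// fetch one bit, most significant first, returning -1 at end of input
int getBit(std::istream& in)
{
    if (rbits == 0) {
        int c = in.get();
        if (c == std::char_traits<char>::eof())
            return -1;
        rbuf = (unsigned)c;
        rbits = 8;
    }
    rbits--;
    return (rbuf >> rbits) & 1;
}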
I am trying to make a compression program that writes, for example, only 1 or 2 bits instead of a regular 8-bit char, depending on the char we are writing. I tried to write with:
// I don't know what the function should return
char getBytes(char c)
{
    return 0xff;
}

ofstream fout;
fout.open("file.bin", ios::binary | ios::out);
fout << getBytes(c);
but so far I have succeeded in writing only whole chars.
So how can I write, for example, '01'? Or only '1'? What function should I use to write single bits to a file? Thanks.
Streams are sequences of bytes. There is no standard interface to write individual bits. If you want to write individual bits you'll need to create your own abstraction of a bit stream built on top of streams. It would aggregate multiple bits into a byte which is then written to the underlying stream. If you want reasonably efficient writing you'll probably want to aggregate multiple bytes before writing them to the stream.
A naïve implementation could look something like this:
class bitstream {
    std::streambuf* sbuf;
    int count = 0;
    unsigned char byte = 0;
public:
    bitstream(std::ostream& out): sbuf(out.rdbuf()) {}
    ~bitstream() {
        if (this->count) {
            this->sbuf->sputc(this->byte);
        }
    }
    bitstream& operator<< (bool b) {
        this->byte = (this->byte << 1) | b;
        if (++this->count == 8) {
            this->sbuf->sputc(this->byte);
            this->count = 0;
        }
        return *this;
    }
};
Note that this implementation has rather basic handling for the last byte: if a byte was started, it will be written as is. Whether that is the intended behavior, or whether the written bits need to be shifted into the top bits of the last byte, depends on how things are being used. Also, there is no error handling for the case where the underlying stream can't be written. (And the code is untested - I don't have an easy way to compile it on my mobile phone.)
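If the leftover bits should instead occupy the top bits of the last byte, the destructor could left-align them before writing, along these lines (a sketch of that one alternative):

~bitstream() {
    if (this->count) {
        // shift the partial byte up so the written bits sit in the top positions
        this->sbuf->sputc(this->byte << (8 - this->count));
    }
}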
You’d use the class something like this:
int main() {
    std::ofstream file("bits.txt", std::ios_base::binary);
    bitstream out(file);
    out << false << true << false << false
        << false << false << false << true;
}
Unless I messed things up, the above code should write A into the file bits.txt (the ASCII code for A is 65).
Just for context: files are actually organized into blocks of bytes. However, the stream abstraction aggregates the individual bytes written into blocks. Although byte-oriented interfaces are provided by all popular operating systems, writing anything but blocks of data tends to be rather inefficient.
I'm working on Huffman coding and I have built the character frequency table with a
std::map<char,int> frequencyTable;
Then I built the Huffman tree, and then I built the codes table in this way:
std::map<char,std::vector<bool> > codes;
Now I would like to read the input file, character by character, and encode each character through the codes table, but I don't know how to write bits into a binary output file.
Any advice?
UPDATE:
Now I'm trying with these functions:
void Encoder::makeFile()
{
    char c, ch;
    unsigned char ch2;
    while (inFile.get(c))
    {
        ch = c;
        // send the Huffman string to the output file bit by bit
        for (unsigned int i = 0; i < codes[ch].size(); ++i)
        {
            if (codes[ch].at(i) == false) {
                ch2 = 0;
            } else {
                ch2 = 1;
            }
            encode(ch2, outFile);
        }
    }
    ch2 = 2;  // send EOF
    encode(ch2, outFile);
    inFile.close();
    outFile.close();
}
and this:
void Encoder::encode(unsigned char i, std::ofstream& outFile)
{
    int bit_pos = 0;   // 0 to 7 (left to right) on the byte block
    unsigned char c;   // byte block to write
    if (i < 2)  // if not EOF
    {
        if (i == 1)
            c |= (i << (7 - bit_pos));  // add a 1 to the byte
        else  // i == 0
            c = c & static_cast<unsigned char>(255 - (1 << (7 - bit_pos)));  // add a 0
        ++bit_pos;
        bit_pos %= 8;
        if (bit_pos == 0)
        {
            outFile.put(c);
            c = '\0';
        }
    }
    else
    {
        outFile.put(c);
    }
}
but, I don't know why, it doesn't work: the loop is never executed and the encode function is never called. Why?
You can't write a single bit directly to a file. The I/O unit of reading/writing is a byte (8 bits), so you need to pack your bools into chunks of 8 bits and then write the bytes. See Writing files in bit form to a file in C or How to write single bits to a file in C for examples.
The C++ standard streams support access down to the smallest unit the underlying CPU supports, and that's a byte.
There are implementations of bit stream classes in C++, like the
Stanford Bitstream Class.
Another approach could use the std::bitset class.
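For example, a minimal sketch of the std::bitset route, packing eight code bits into one byte (the file name and bit pattern are made up for illustration):

#include <bitset>
#include <fstream>

int main()
{
    std::bitset<8> packed("01000001");  // eight bits, most significant first
    std::ofstream out("encoded.bin", std::ios::binary);
    out.put(static_cast<char>(packed.to_ulong()));  // writes one byte: 'A'
}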
I'm building some code to read a RIFF wav file and I've bumped into something odd.
The first 4 bytes of the file header are the word RIFF in big-endian ASCII coding:
0x5249 0x4646
I read this first element using:
char *fileID = new char[4];
filestream.read(fileID,4);
When I write this to screen the results are as expected:
std::cout << fileID << std::endl;
>> RIFF
Now, the next 4 bytes give the size of the file, but crucially they're little-endian.
So, I write a little function to flip the bytes, based on a union:
int flip4bytes(char* input){
    union flip {int flip_int; char flip_char[4];} f;
    f.flip_char[0] = input[3];
    f.flip_char[1] = input[2];
    f.flip_char[2] = input[1];
    f.flip_char[3] = input[0];
    return f.flip_int;
}
This looks good to me, except when I call it, the value returned is totally wrong. Interestingly, the following code (where the bytes are not reversed!) works correctly:
int flip4bytes(char* input){
    union flip {int flip_int; char flip_char[4];} f;
    f.flip_char[0] = input[0];
    f.flip_char[1] = input[1];
    f.flip_char[2] = input[2];
    f.flip_char[3] = input[3];
    return f.flip_int;
}
This has thoroughly confused me. Is the union somehow reversing the bytes for me?! If not, how are the bytes being converted to int correctly without being reversed?
I think there's some facet of endianness here that I'm ignorant of...
You are simply on a little-endian machine, and the "RIFF" string is just a string, and thus neither little- nor big-endian, but just a sequence of chars. You don't need to reverse the bytes on a little-endian machine, but you do need to when operating on a big-endian one.
You need to figure out the endianness of your machine. #include <sys/param.h> will help you do that.
You could also use the fact that network byte order is big-endian (if my memory serves me correctly - you need to check). In that case, convert to big-endian and use the ntohl function. That should work on any machine that you compile the code on.
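Note that ntohl converts from big-endian, while the RIFF size field is little-endian, so the portable route is to assemble the bytes explicitly. A sketch (readRiffSize is a hypothetical helper, and the stream is assumed to be positioned just past the RIFF tag):

#include <cstdint>
#include <fstream>

// Assemble the little-endian 32-bit size field byte by byte,
// which works the same on hosts of either endianness.
std::uint32_t readRiffSize(std::ifstream& in)
{
    unsigned char b[4];
    in.read(reinterpret_cast<char*>(b), 4);
    return  static_cast<std::uint32_t>(b[0])
         | (static_cast<std::uint32_t>(b[1]) << 8)
         | (static_cast<std::uint32_t>(b[2]) << 16)
         | (static_cast<std::uint32_t>(b[3]) << 24);
}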
I've been looking around for how to convert big-endian values to little-endian, but I didn't find anything that solved my problem. There seem to be many ways to do this conversion. Anyway, the following code works OK on a big-endian system, but how should I write a conversion function so it will work on a little-endian system as well?
This is homework, but it's just an extra, since the systems at school run big-endian. I got curious and wanted to make it work on my home computer as well.
#include <iostream>
#include <fstream>
using namespace std;

int main()
{
    ifstream file;
    file.open("file.bin", ios::in | ios::binary);
    if (!file)
        cerr << "Not able to read" << endl;
    else
    {
        cout << "Opened" << endl;
        int i_var;
        double d_var;
        while (!file.eof())
        {
            file.read(reinterpret_cast<char*>(&i_var), sizeof(int));
            file.read(reinterpret_cast<char*>(&d_var), sizeof(double));
            cout << i_var << " " << d_var << endl;
        }
    }
    return 0;
}
Solved
So big-endian vs. little-endian is just a reversed byte order. This function I wrote seems to serve my purpose anyway; I added it here in case someone else needs it in the future. This one is for double only, though; for int, either use the function torak suggested or modify this code to swap 4 bytes only.
double swap(double d)
{
    double a;
    unsigned char *dst = (unsigned char *)&a;
    unsigned char *src = (unsigned char *)&d;
    dst[0] = src[7];
    dst[1] = src[6];
    dst[2] = src[5];
    dst[3] = src[4];
    dst[4] = src[3];
    dst[5] = src[2];
    dst[6] = src[1];
    dst[7] = src[0];
    return a;
}
You could use a template for your endian swap that is generalized over the data types:
#include <algorithm>
template <class T>
void endswap(T *objp)
{
    unsigned char *memp = reinterpret_cast<unsigned char*>(objp);
    std::reverse(memp, memp + sizeof(T));
}
Then your code would end up looking something like:
file.read(reinterpret_cast<char*>(&i_var), sizeof(int));
endswap(&i_var);
file.read(reinterpret_cast<char*>(&d_var), sizeof(double));
endswap(&d_var);
cout << i_var << " " << d_var << endl;
You might be interested in the ntohl family of functions. These are designed to transform data from network to host byte order. Network byte order is big endian, therefore on big endian systems they don't do anything, while the same code compiled on a little endian system will perform the appropriate byte swaps.
Linux provides endian.h, which has efficient endian-swapping routines up to 64 bits. It also automagically accounts for your system's endianness. The 32-bit functions are defined like this:
uint32_t htobe32(uint32_t host_32bits); // host to big-endian encoding
uint32_t htole32(uint32_t host_32bits); // host to lil-endian encoding
uint32_t be32toh(uint32_t big_endian_32bits); // big-endian to host encoding
uint32_t le32toh(uint32_t little_endian_32bits); // lil-endian to host encoding
with similarly named functions for 16- and 64-bit values.
So you just say
x = le32toh(x);
to convert a 32-bit integer in little-endian encoding to the host CPU encoding. This is useful for reading little-endian data.
x = htole32(x);
will convert from the host encoding to 32-bit little-endian. This is useful for writing little-endian data.
Note that on BSD systems, the equivalent header file is sys/endian.h.
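A small portability shim along those lines is common; a sketch assuming only Linux and BSD targets:

// endian_compat.h -- hypothetical wrapper header
#if defined(__linux__)
#include <endian.h>
#elif defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__)
#include <sys/endian.h>
#else
#error "no known endian conversion header for this platform"
#endif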
Assuming you're going to be doing more of this, it's handy to keep a little library file of helper functions. Two of those functions should be endian swaps for 4-byte values and 2-byte values; see the sketch below. For some solid examples (including code) check out this article.
Once you've got your swap functions, any time you read in a value in the wrong endianness, call the appropriate swap function. Sometimes a stumbling point for people here is that single-byte values do not need to be endian-swapped, so if you're reading in something like a character stream that represents a string of letters from a file, that should be good to go. It's only when you're reading in a value that is multiple bytes (like an integer value) that you have to swap them.
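A minimal sketch of such a pair of helpers (the names are made up; many codebases have equivalents, and platform intrinsics may be faster):

#include <cstdint>

// Reverse the bytes of a 16-bit value.
inline std::uint16_t swap16(std::uint16_t v)
{
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
}

// Reverse the bytes of a 32-bit value.
inline std::uint32_t swap32(std::uint32_t v)
{
    return  (v << 24)
         | ((v << 8)  & 0x00FF0000u)
         | ((v >> 8)  & 0x0000FF00u)
         |  (v >> 24);
}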
It is good to add that Microsoft supports this on Visual Studio too; check these inline functions:
htond
htonf
htonl
htonll
htons