I'm building some code to read a RIFF wav file and I've bumped into something odd.
The first 4 bytes of the file header are the word RIFF in big-endian ASCII coding:
0x5249 0x4646
I read this first element using:
char *fileID = new char[4];
filestream.read(fileID,4);
When I write this to screen the results are as expected:
std::cout << fileID << std::endl;
>> RIFF
Now, the next 4 bytes give the size of the file, but crucially they're little-endian.
So, I write a little function to flip the bytes, based on a union:
int flip4bytes(char* input){
    union flip {int flip_int; char flip_char[4];} flip;
    flip.flip_char[0] = input[3];
    flip.flip_char[1] = input[2];
    flip.flip_char[2] = input[1];
    flip.flip_char[3] = input[0];
    return flip.flip_int;
}
This looks good to me, except when I call it, the value returned is totally wrong. Interestingly, the following code (where the bytes are not reversed!) works correctly:
int flip4bytes(char* input){
    union flip {int flip_int; char flip_char[4];} flip;
    flip.flip_char[0] = input[0];
    flip.flip_char[1] = input[1];
    flip.flip_char[2] = input[2];
    flip.flip_char[3] = input[3];
    return flip.flip_int;
}
This has thoroughly confused me. Is the union somehow reversing the bytes for me?! If not, how are the bytes being converted to int correctly without being reversed?
I think there's some facet of endianness here that I'm ignorant of...
You are simply on a little-endian machine, and the "RIFF" string is just a string and thus neither little- nor big-endian, but just a sequence of chars. You don't need to reverse the bytes on a little-endian machine, but you would need to on a big-endian one.
You need to figure out the endianness of your machine. #include <sys/param.h> will help you do that.
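For illustration, a minimal sketch of such a check, assuming a BSD/glibc-style <sys/param.h> that provides the BYTE_ORDER, LITTLE_ENDIAN and BIG_ENDIAN macros (these are a Unix convention, not part of standard C++):

#include <sys/param.h>  // defines BYTE_ORDER on BSD/glibc systems (assumption)
#include <iostream>

int main()
{
#if BYTE_ORDER == LITTLE_ENDIAN
    std::cout << "This machine is little-endian\n";
#else
    std::cout << "This machine is big-endian\n";
#endif
}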
You could also use the fact that network byte order is big-endian (if my memory serves me correctly - you need to check), in which case you can use the ntohl function for 4-byte values like this one (ntohs is its 2-byte sibling). That should work on any machine that you compile the code on.
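Alternatively, since the RIFF chunk size is defined to be little-endian no matter which machine reads it, you can sidestep the host-endianness question entirely by assembling the value from individual bytes. A minimal sketch (the helper name read_le32 is mine):

#include <cstdint>
#include <fstream>

// Assemble a 32-bit little-endian value byte by byte; the shifts operate
// on values, not on memory layout, so this is correct on any host.
uint32_t read_le32(std::ifstream &in)
{
    unsigned char b[4];
    in.read(reinterpret_cast<char*>(b), 4);
    return  static_cast<uint32_t>(b[0])
          | (static_cast<uint32_t>(b[1]) << 8)
          | (static_cast<uint32_t>(b[2]) << 16)
          | (static_cast<uint32_t>(b[3]) << 24);
}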
Related
I was working on a Huffman project to compress text files. I was able to generate the required codes. I read the whole file and stored the codes accordingly in a vector<char> variable. I also padded the encoded vector.
vector<char> padding(vector<char> text)
{
    int num = text.size();
    unsigned int pad_value = 32-(num%32);
    for(int i=0;i<pad_value;i++){
        text.push_back('0');
    }
    string pad_info = bitset<32>(pad_value).to_string();
    for(int i=pad_info.length()-1;i>=0;i--){
        text.insert(text.begin(),pad_info[i]);
    }
    return text;
}
I padded to a multiple of 32 bits, as I was thinking of using an array of "unsigned int" to directly store the integers in a binary file so that they occupy 4 bytes for every 32 characters. I used this function for that:
vector<unsigned int> build_byte_array(vector<char> padded_text)
{
    vector<unsigned int> byte_arr;
    for(int i=0;i<padded_text.size();i+=32)
    {
        string byte="";
        for(int j=i;j<i+32;j++){
            byte += padded_text[j];
        }
        unsigned int b = stoul(byte,nullptr,2);
        //cout<<b<<":"<<byte<<endl;
        byte_arr.push_back(b);
    }
    return byte_arr;
}
Now the problem is that when I write this byte array to a binary file using
ofstream output("compressed.bin",ios::binary);
for(int i=0;i<byte_array.size();i++){
    unsigned int a = byte_array[i];
    output.write((char*)(&a),sizeof(a));
}
I get a binary file which is bigger than the original text file. How do I solve that, or what error am I making?
Edit: I tried to compress a file of about 2,493 KB (for testing purposes) and it generated a compressed.bin file of 3,431 KB. So, I don't think padding is the issue here.
I also tried with a 15 KB file, but the size always increases after using this algorithm.
I tried using:
for(int i=0;i<byte_array.size();i++){
    unsigned int a = byte_array[i];
    char b = (char)a;
    output.write((char*)(&a),sizeof(b));
}
but after using this I am unable to recover the original byte array when decompressing the file.
unsigned int a = byte_array[i];
output.write((char*)(&a),sizeof(a));
The size of the write is sizeof(a) which is usually 4 bytes.
An unsigned int is not a byte. A more suitable type for a byte would be std::byte, uint8_t, or unsigned char.
You are expanding your data with padding, so if you're not getting much compression or there's not much data to begin with, the output could easily be larger.
You don't need to pad nearly as much as you do. First off, you are adding 32 bits when the data already ends on a word boundary (when num is a multiple of 32). Pad zero bits in that case. Second, you are inserting 32 bits at the start to record how many bits you padded, where five bits would suffice to encode 0..31. Third, you could write bytes instead of ints, so the padding on the end could be 0..7 bits, and you could prepend three bits instead of five. The padding overall could be reduced from your current 33..64 bits to 3..10 bits.
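To illustrate the third point, a minimal sketch of packing the '0'/'1' code characters into real bytes (the helper name pack_bits is mine, and it takes the code characters as a string for simplicity):

#include <cstdint>
#include <string>
#include <vector>

// Pack a string of '0'/'1' characters into bytes, 8 code bits per byte.
std::vector<uint8_t> pack_bits(const std::string &bits)
{
    std::vector<uint8_t> out;
    uint8_t cur = 0;
    int nbits = 0;
    for (char c : bits) {
        cur = static_cast<uint8_t>((cur << 1) | (c == '1')); // shift in one bit
        if (++nbits == 8) {                                  // flush a full byte
            out.push_back(cur);
            cur = 0;
            nbits = 0;
        }
    }
    if (nbits > 0)  // pad the final partial byte with zero bits
        out.push_back(static_cast<uint8_t>(cur << (8 - nbits)));
    return out;
}

Every 8 input characters now become 1 output byte instead of 8, which is where the compression actually shows up on disk.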
The title of this question is probably wrong, but I have no idea what else to call this issue.
Some background: I am programming a game whose 3D surface is divided into chunks. I came up with a saving mechanism in which all chunk objects and their properties are saved in compressed form to an unordered map, which is then serialized to a file, so that parts of the world can be loaded and saved efficiently according to current needs.
Of course, when loading, the file is loaded, deserialized into an unordered map, and the strings are converted to chunk objects in real time.
That is the plan, but I have hit hard problems realizing it.
I tried all possible searches, but without result. During my experiments, I wrote a small test script like this:
#include <iostream>
#include <fstream>
#include <sstream>

int main()
{
    std::ifstream reader("output.dat", std::ios::binary);
    std::string data;
    reader>>data;
    reader.close();
    std::cout<<data.size()<<std::endl;
    std::stringstream ss;
    ss.str(data);
    unsigned char id_prefix=0, zone_prefix=1;
    while (ss.peek()!=EOF)
    {
        unsigned char type;
        ss>>type;
        if (type==id_prefix)
        {
            unsigned char tempx, tempy, tempz;
            unsigned short tempid;
            if (!(ss>>tempx)) std::cout<<"Reading of x failed."<<std::endl;
            if (!(ss>>tempy)) std::cout<<"Reading of y failed."<<std::endl;
            if (!(ss>>tempz)) std::cout<<"Reading of z failed."<<std::endl;
            if (!(ss>>tempid)) std::cout<<"Reading of id failed, position is "+std::to_string(ss.tellg())+", values are "+std::to_string(type)+" "+std::to_string(tempx)+" "+std::to_string(tempy)+" "+std::to_string(tempz)<<std::endl;
            std::cout<<(int)tempx<<" "<<(int)tempy<<" "<<(int)tempz<<" "<<(int)tempid<<std::endl;
        }
        else if (type==zone_prefix)
        {
            unsigned char tempx, tempy, tempz;
            unsigned int tempzone;
            ss>>tempx;
            ss>>tempy;
            ss>>tempz;
            ss>>tempzone;
            std::cout<<(int)tempx<<" "<<(int)tempy<<" "<<(int)tempz<<" "<<(int)tempzone<<std::endl;
        }
    }
}
Output.dat is a file with one experimental decompressed chunk, to reproduce the parsing process from the game.
You can download it from:
https://www.dropbox.com/s/mljsb0t6gvfedc5/output.dat?dl=1
if you want; it is about 160 KB in size. And here is the first problem.
It is probably just my ignorance, but I thought that when I open an ifstream with std::ios::binary and then extract its content to a string, it would load the whole file, but it loaded only the first 46 bytes.
That is the first problem. Next, in the game I used another system to load the data, which worked, but then the stringstream processing, as seen in the lower part of the code, failed too at around this position.
I guess there are problems with the data as well. As you can see, the format is a uchar type (indicating whether the following bytes refer to an id or a zone), coordinates (each a uchar), and then a ushort in the case of an id or a uint in the case of a zone.
But when I looked into the file with a binary editor I created myself, it showed the id as one byte only, not two as I expected from a short value. Saving was also done with a stringstream, in the form:
unsigned short tempid=3; //example value
ss<<tempid;
and in the resulting file this was represented as a 51 (in one byte), which is the ASCII code for '3', so I am a little confused, or a little more than a little.
Can you please help me with this? I am using Mingw g++ 4.9.3 on win7 64-bit.
Thanks much!
Edit from 1.1.2017
Now the whole file is read into the stringstream, but extraction of values still fails.
Given that >> extraction reads up to the next whitespace, how does extraction into, for example, an unsigned short behave?
I was playing with the code a bit, trying to change, for example, unsigned short tempid to unsigned char tempid.
And the output does not make sense to me.
In short version, bytes like:
0;1;0;0;51
were read as type 0, x 1, y 0, z 0 and id 3, which is correct, even though I don't understand why 51 is there instead of a 3.
Writing to the stream beforehand looked like:
unsigned short idtowrite=3;
ss<<idtowrite;
But when I changed unsigned short tempid to unsigned char tempid, it read type 0, x 1, y 0, z 0 and id 51, which is not correct, but which I would expect given how the file was written.
I could live with that if it read correctly through the full stream, but for some reason everything is correct up to 0;8;0;0;51, and from 0;9;0;0;51, which comes right after it, it fails, with x read as 0, y as 0, z as 51, and EOF set.
I wondered whether reading had skipped a byte, but I don't see a reason for it to do so.
Can you please recommend an effective and working way to store values in a stringstream?
Thanks in advance!
std::ios::binary only has the effect of suppressing end-of-line conversion (so that e.g. \r\n in file is not converted to just \n in memory). It is certainly correct to supply this when dealing with binary files.
However, >> is still a formatted input function, which skips leading whitespace, terminates at whitespace and so on. That is also why your loop derails at the record starting 0;9;0;0;51: the coordinate byte 9 is a tab character, which >> silently skips as whitespace.
If you want to actually read the file as binary data, you must use the read function on the stream object.
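As a minimal sketch of that approach, assuming the record layout described in the question (a type byte, three coordinate bytes, then a 2-byte id or a 4-byte zone) and assuming the records were written as raw bytes in the host's native byte order:

#include <cstdint>
#include <fstream>
#include <iostream>

int main()
{
    std::ifstream reader("output.dat", std::ios::binary);
    unsigned char type;
    while (reader.read(reinterpret_cast<char*>(&type), 1))
    {
        unsigned char x, y, z;
        reader.read(reinterpret_cast<char*>(&x), 1);
        reader.read(reinterpret_cast<char*>(&y), 1);
        reader.read(reinterpret_cast<char*>(&z), 1);
        if (type == 0)  // id record: 2-byte payload
        {
            uint16_t id;
            reader.read(reinterpret_cast<char*>(&id), sizeof id);
            std::cout << (int)x << " " << (int)y << " " << (int)z << " " << id << std::endl;
        }
        else            // zone record: 4-byte payload
        {
            uint32_t zone;
            reader.read(reinterpret_cast<char*>(&zone), sizeof zone);
            std::cout << (int)x << " " << (int)y << " " << (int)z << " " << zone << std::endl;
        }
    }
}

The same applies on the writing side: ss << tempid formats the number as text, which is exactly why the value 3 ended up in your file as the single byte 51, the ASCII code of '3'. Use write on the raw bytes there as well.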
I have to copy the following structure to a char[] buffer.
struct AMG_ANGLES {
    unsigned char bIsEnCrypted;
    unsigned char bIsError;
    unsigned short usErrorFlag;
    unsigned char byteNumDABs;
    unsigned short usBagId;
    unsigned short usKvMa;
    unsigned char byteDataType;
};
AMG_ANGLES struct_data;
struct_data.bIsEnCrypted = 1;
struct_data.bIsError = 1;
struct_data.usErrorFlag = 2;
struct_data.byteNumDABs = 1;
struct_data.usBagId =10;
struct_data.usKvMa=20;
struct_data.byteDataType = 1;
// here I am copying the structure to a char buffer
char sendbuf[sizeof(struct_data)] = "";
memcpy(sendbuf,(char*)&struct_data, sizeof(struct_data));
After the copy, the buffer appears to hold only the first two unsigned chars and the short (1, 1, 2), and its size is only 3 bytes. The remaining data was not copied.
Please help me find where I am going wrong.
I also tried the following way:
memcpy(sendbuf+0, &struct_data.bIsEnCrypted, sizeof(struct_data.bIsEnCrypted));
memcpy(sendbuf+1, &struct_data.bIsError, sizeof(struct_data.bIsError));
memcpy(sendbuf+2, &struct_data.usErrorFlag, sizeof(struct_data.usErrorFlag));
memcpy(sendbuf+4, &struct_data.byteNumDABs, sizeof(struct_data.byteNumDABs));
memcpy(sendbuf+6, &struct_data.usBagId, sizeof(struct_data.usBagId));
memcpy(sendbuf+8, &struct_data.usKvMa, sizeof(struct_data.usKvMa));
memcpy(sendbuf+10, &struct_data.byteDataType, sizeof(struct_data.byteDataType));
I am getting the same result.
Your code is fine; your approach to determine whether the contents of the buffer are correct is flawed.
You have not told us how you have determined that the contents of the buffer are wrong, but from your description I suspect that you did something like printf( "%s\n", sendbuf ). Well, that won't work, because your buffer does not really contain characters, it contains binary data.
Specifically, your short usErrorFlag is two bytes long, and since the value you store in it is 2, this means that it will be stored in sendbuf in two consecutive bytes, one byte will have the value of 0x02 and the next byte will have the value of 0x00. (Assuming, from hints in your description, that your hardware is "Little Endian".) So, when you try to view the contents of your sendbuf as a string, printf() will stop printing as soon as it encounters the 0x00 byte.
So, your code is correct. Proceed with sending your sendbuf to your UDP socket.
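If you want to inspect the buffer, dump it byte by byte as hex instead of printing it as a string. A minimal sketch (the helper name dump_hex is mine):

#include <cstddef>
#include <cstdio>

// Print each byte of the buffer as two hex digits.
void dump_hex(const char *buf, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        std::printf("%02x ", static_cast<unsigned char>(buf[i]));
    std::printf("\n");
}

On your (apparently little-endian) machine, dump_hex(sendbuf, sizeof sendbuf) should show the 01 01 02 00 sequence from your first three fields, followed by the remaining fields and any padding bytes your compiler inserted.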
If I read "sendbuf" I immediately assume that you are sending data from one computer to another. These computers will have different compilers, the compilers will for example order their bytes in a different order. memcpy isn't going to work on all compilers.
I suggest you find where the contents of sendbuf is documented, and assign the individual bytes accordingly. For example
sendbuf[0] = struct_data.bIsEnCrypted;
sendbuf[1] = struct_data.bIsError;
sendbuf[2] = struct_data.usErrorFlag >> 8;
sendbuf[3] = struct_data.usErrorFlag & 0xff;
This makes your code independent of byte ordering, independent of struct padding, independent of reordering of items once you are not using a POD, and so on. In your case I would bet money that there is at least padding between byteNumDABs and usBagId, and at the end.
(Bytes 2 and 3 might be exactly the other way round; that's why you need to find a spec for that data structure.)
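Extending that idea to the whole struct, a hypothetical sketch (the function name serialize is mine; I picked low-byte-first order here, so check it against your spec, which may mandate the opposite):

#include <cstddef>

// Write every field at a fixed offset, low byte first, so the result is
// independent of host endianness and struct padding. Always 10 bytes.
std::size_t serialize(const AMG_ANGLES &s, unsigned char *buf)
{
    std::size_t i = 0;
    buf[i++] = s.bIsEnCrypted;
    buf[i++] = s.bIsError;
    buf[i++] = s.usErrorFlag & 0xff;   // low byte
    buf[i++] = s.usErrorFlag >> 8;     // high byte
    buf[i++] = s.byteNumDABs;
    buf[i++] = s.usBagId & 0xff;
    buf[i++] = s.usBagId >> 8;
    buf[i++] = s.usKvMa & 0xff;
    buf[i++] = s.usKvMa >> 8;
    buf[i++] = s.byteDataType;
    return i;
}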
This question already has answers here: How do I convert between big-endian and little-endian values in C++? (35 answers). Closed 9 years ago.
I've been looking around for how to convert big-endian to little-endian. But I didn't find anything that could solve my problem. There seem to be many ways to do this conversion. Anyway, the following code works OK on a big-endian system. But how should I write a conversion function so it will work on a little-endian system as well?
This is homework, but it's only an extra, since the systems at school run big-endian. I just got curious and wanted to make it work on my home computer as well.
#include <iostream>
#include <fstream>
using namespace std;

int main()
{
    ifstream file;
    file.open("file.bin", ios::in | ios::binary);
    if(!file)
        cerr << "Not able to read" << endl;
    else
    {
        cout << "Opened" << endl;
        int i_var;
        double d_var;
        while(!file.eof())
        {
            file.read( reinterpret_cast<char*>(&i_var) , sizeof(int) );
            file.read( reinterpret_cast<char*>(&d_var) , sizeof(double) );
            cout << i_var << " " << d_var << endl;
        }
    }
    return 0;
}
Solved
So big-endian vs. little-endian is just a reversed byte order. This function I wrote seems to serve my purpose anyway. I added it here in case someone else needs it in the future. This is for double only, though; for integer, either use the function torak suggested or modify this code to swap 4 bytes only.
double swap(double d)
{
    double a;
    unsigned char *dst = (unsigned char *)&a;
    unsigned char *src = (unsigned char *)&d;
    dst[0] = src[7];
    dst[1] = src[6];
    dst[2] = src[5];
    dst[3] = src[4];
    dst[4] = src[3];
    dst[5] = src[2];
    dst[6] = src[1];
    dst[7] = src[0];
    return a;
}
You could use a template for your endian swap that will be generalized for the data types:
#include <algorithm>
template <class T>
void endswap(T *objp)
{
    unsigned char *memp = reinterpret_cast<unsigned char*>(objp);
    std::reverse(memp, memp + sizeof(T));
}
Then your code would end up looking something like:
file.read( reinterpret_cast<char*>(&i_var) , sizeof(int) );
endswap( &i_var );
file.read( reinterpret_cast<char*>(&d_var) , sizeof(double) );
endswap( &d_var );
cout << i_var << " " << d_var << endl;
You might be interested in the ntohl family of functions. These are designed to transform data from network to host byte order. Network byte order is big endian, therefore on big endian systems they don't do anything, while the same code compiled on a little endian system will perform the appropriate byte swaps.
Linux provides endian.h, which has efficient endian swapping routines up to 64-bit. It also automagically accounts for your system's endianness. The 32-bit functions are defined like this:
uint32_t htobe32(uint32_t host_32bits); // host to big-endian encoding
uint32_t htole32(uint32_t host_32bits); // host to lil-endian encoding
uint32_t be32toh(uint32_t big_endian_32bits); // big-endian to host encoding
uint32_t le32toh(uint32_t little_endian_32bits); // lil-endian to host encoding
with similarly-named functions for 16 and 64-bit.
So you just say
x = le32toh(x);
to convert a 32-bit integer in little-endian encoding to the host CPU encoding. This is useful for reading little-endian data.
x = htole32(x);
will convert from the host encoding to 32-bit little-endian. This is useful for writing little-endian data.
Note on BSD systems, the equivalent header file is sys/endian.h
Assuming you're going to be doing more of this, it's handy to keep a little library file of helper functions. Two of those functions should be endian swaps for 4-byte values and 2-byte values. For some solid examples (including code) check out this article.
Once you've got your swap functions, any time you read in a value in the wrong endianness, call the appropriate swap function. Sometimes a stumbling point for people here is that single-byte values do not need to be endian-swapped, so if you're reading in something like a character stream that represents a string of letters from a file, that should be good to go. It's only when you're reading in a value that is multiple bytes (like an integer value) that you have to swap them.
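A minimal sketch of such a pair of helpers, in case the linked article goes stale (the names swap2 and swap4 are mine, not from the article):

#include <cstdint>

// Swap the two bytes of a 16-bit value.
uint16_t swap2(uint16_t v)
{
    return static_cast<uint16_t>((v >> 8) | (v << 8));
}

// Reverse the four bytes of a 32-bit value.
uint32_t swap4(uint32_t v)
{
    return  (v >> 24)
          | ((v >> 8) & 0x0000ff00u)
          | ((v << 8) & 0x00ff0000u)
          | (v << 24);
}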
It is worth adding that MS supports this on VS too; check these inline functions:
htond
htonf
htonl
htonll
htons
I can't quite find a way to do the following in C/C++.
Input: hexadecimal values, for example: ffffffffff...
I've tried the following code in order to read the input:
uint16_t twoBytes;
scanf("%x",&twoBytes);
That works fine and all, but how do I split the 2 bytes into 1-byte uint8_t values (or maybe even read only the first byte)? I would like to read the first byte from the input and store it in a byte matrix at a position of my choosing.
uint8_t matrix[50][50]
Since I'm not very skilled at formatting / reading input in C/C++ (and have only used scanf so far), any other ideas on how to do this easily (and fast, if possible) are greatly appreciated.
Edit: I found an even better method using the fread function, as it lets one specify how many bytes to read from the stream (stdin in this case) and save to a variable/array.
size_t fread ( void * ptr, size_t size, size_t count, FILE * stream );
Parameters
ptr - Pointer to a block of memory with a minimum size of (size*count) bytes.
size - Size in bytes of each element to be read.
count - Number of elements, each one with a size of size bytes.
stream - Pointer to a FILE object that specifies an input stream.
cplusplus ref
%x reads an unsigned int, not a uint16_t (though they may be the same on your particular platform).
To read only one byte, try this:
unsigned int byteTmp;
scanf("%2x", &byteTmp);
uint8_t byte = byteTmp;
This reads an unsigned int, but stops after reading two characters (two hex characters equals eight bits, or one byte).
You should be able to split the variable like this:
uint8_t LowerByte = twoBytes & 0xFF;
uint8_t HigherByte = twoBytes >> 8;
A couple of thoughts:
1) read it as characters and convert it manually - painful
2) If you know that there are a multiple of 4 hexits, you can just read in twobytes and then convert to one-byte values with high = twobytes >> 8; low = twobytes & 0xFF;
3) %2x
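Putting those pieces together, a hypothetical sketch that reads the hex input two characters at a time and fills the matrix row by row:

#include <cstddef>
#include <cstdint>
#include <cstdio>

int main()
{
    uint8_t matrix[50][50] = {};  // the byte matrix from the question
    unsigned int byte;
    std::size_t r = 0, c = 0;
    // %2x stops after two hex characters, i.e. exactly one byte.
    while (r < 50 && std::scanf("%2x", &byte) == 1) {
        matrix[r][c] = static_cast<uint8_t>(byte);
        if (++c == 50) { c = 0; ++r; }  // move to the next row
    }
    return 0;
}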