Reading Binary Files Using Bitwise Shifts and Buffers in C++

I'm trying to read a binary file and simply convert the data to usable unsigned integers. The code below works for 2-byte reads at certain file locations and correctly prints the unsigned integer. When I use the 4-byte code, though, my value turns out to be much larger than it is supposed to be. I believe the issue lies within the read function: it seems as though I am getting the wrong character/decimal number (101, for example), which, when bit-shifted, becomes a number much larger than it should be (~6662342). When the program runs, Visual Studio throws an exception every now and then: "stack around the variable buf" (run-time error #2). Any ideas? It may be my fundamental knowledge of how the data is stored in the char array that is affecting my output.
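Working 2-byte code:
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <fstream>
uint64_t readBufferLittleEndian(unsigned char buf[], std::ifstream *file);
int main()
{
std::ifstream file("data.bin", std::ios::binary); // hypothetical file name; the original omits how file is opened
unsigned char buf[2];
file.seekg(3513);
uint64_t value = readBufferLittleEndian(buf, &file);
printf("%llu", (unsigned long long)value); // %i does not match a uint64_t argument
system("PAUSE");
return 0;
}
uint64_t readBufferLittleEndian(unsigned char buf[], std::ifstream *file)
{
file->read((char*)(&buf[0]), 2);
return (buf[1] << 8 | buf[0]);
}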
Broken 4-byte code:
#include <cstdint>
#include <cstdlib>
#include <fstream>
#include <iostream>
uint64_t readBufferLittleEndian(unsigned char buf[], std::ifstream *file, uint64_t buf1[]);
int main()
{
std::ifstream file("data.bin", std::ios::binary); // hypothetical file name; the original omits how file is opened
unsigned char buf[8 + 1]; //= { 0, 2 , 0 , 0 , 0 , 0 , 0, 0, 0 };
uint64_t buf1[9];
file.seekg(3213);
uint64_t value = readBufferLittleEndian(buf, &file, buf1);
std::cout << value;
system("PAUSE");
return 0;
}
uint64_t readBufferLittleEndian(unsigned char buf[], std::ifstream *file, uint64_t buf1[])
{
file->read((char*)(&buf[0]), 4);
for (int index = 0; index < 4; index++)
{
buf1[index] = buf[index];
}
buf1[0];
buf1[1];
buf1[2];
buf1[3];
//return (buf1[7] << 56 | buf1[6] << 48 | buf1[5] << 40 | buf1[4] << 32 | buf1[3] << 24 | buf1[2] << 16 | buf1[1] << 8 | buf1[0]);
return (buf1[3] << 24 | buf1[2] << 16 | buf1[1] << 8 | buf1[0]);
//return (buf1[1] << 8 | buf1[0]);
}
Please correct me if I got the endianness reversed.
The code is C++ except for the printf line.

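You have to cast before shifting. buf[n] is an unsigned char, which is promoted to int (typically 32 bits) before the shift, so you cannot shift it left by 56 bits: a shift count greater than or equal to the width of the type is undefined behavior.
That is, write ((uint64_t)buf[n] << NN).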
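A minimal sketch of the cast-before-shift idea generalized to any byte count up to 8. The helper name readLittleEndian is hypothetical (it is not from the question), and it assumes the bytes have already been read into a buffer, e.g. with file->read((char*)buf, 4):

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: assembles `count` little-endian bytes (count <= 8)
// into a uint64_t. Each byte is widened to uint64_t *before* the shift,
// so shifting by 32 or more bits is well defined.
uint64_t readLittleEndian(const unsigned char* buf, std::size_t count)
{
    uint64_t value = 0;
    for (std::size_t i = 0; i < count; ++i)
    {
        value |= static_cast<uint64_t>(buf[i]) << (8 * i);
    }
    return value;
}
```

The same function covers the 2-, 4- and 8-byte cases, so the separate readBufferLittleEndian variants and the extra uint64_t buffer are no longer needed.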

seekg(0) is byte 1, so seekg(3212) is byte 3213. I'm not entirely sure why I was getting a zero in byte 3214 before, considering I now get 220 (indicating big-endianness). Getting 220 would indicate that I was misinterpreting how seekg() works. Oh well, it is solved where it matters now anyway.
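To see that seekg offsets are zero-based, here is a small sketch that reads through an in-memory std::istringstream instead of a file, so it runs without a data file; byteAt is a hypothetical helper name:

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: returns the byte at zero-based `offset`, or -1 if
// there is no byte there. seekg(offset) positions the stream so that the
// next read returns bytes[offset]; seekg(0) is the first byte.
int byteAt(const std::string& bytes, std::streamoff offset)
{
    std::istringstream in(bytes, std::ios::binary);
    in.seekg(offset);
    char c;
    if (!in.get(c))
        return -1;  // seek or read past the end
    return static_cast<unsigned char>(c);
}
```

With a four-byte string, byteAt(s, 0) returns the first byte and byteAt(s, 3) the last, so seekg(3212) on a file lands on what a 1-based count would call byte 3213.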

Related

What is *(int*)&data[18] actually doing in this code?

I came across this syntax for reading a BMP file in C++
#include <fstream>
int main() {
std::ifstream in("filename.bmp", std::ifstream::binary);
in.seekg(0, in.end);
std::streamsize size = in.tellg();
in.seekg(0);
unsigned char * data = new unsigned char[size];
in.read((char *)data, size);
int width = *(int*)&data[18];
// omitted remainder for minimal example
delete[] data;
}
and I don't understand what the line
int width = *(int*)&data[18];
is actually doing. Why doesn't a simple cast from unsigned char to int, int width = (int)data[18];, work?
Note
As @user4581301 indicated in the comments, this depends on the implementation and will fail in many instances. And as @NathanOliver- Reinstate Monica and @ChrisMM pointed out, this is Undefined Behavior and the result is not guaranteed.
According to the bitmap header format, the width of the bitmap in pixels is stored as a signed 32-bit integer beginning at byte offset 18. The syntax
int width = *(int*)&data[18];
reads the bytes at offsets 18 through 21, inclusive (that is, bytes 19 through 22 counting from 1, assuming a 32-bit int), and interprets them as an integer in the machine's native byte order.
How?
&data[18] gets the address of the unsigned char at index 18
(int*) reinterprets that address as a pointer to int, so that dereferencing it reads sizeof(int) bytes instead of a single unsigned char
*(int*) dereferences the address to get the referred int value
So basically, it takes the address of data[18] and reads the bytes at that address as if they were an integer.
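The same native-order read can be done without the pointer cast. A sketch, assuming a 4-byte field and a hypothetical helper named readNativeInt32: std::memcpy has no alignment or aliasing requirements, so this avoids the undefined behavior mentioned in the note, but the result still depends on the host's byte order, exactly like *(int*)&data[18]:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies 4 bytes starting at data + offset into an
// int32_t. std::memcpy has no alignment or aliasing requirements, so this is
// well defined, but the bytes are still interpreted in the host's byte order.
int32_t readNativeInt32(const unsigned char* data, std::size_t offset)
{
    int32_t value;
    std::memcpy(&value, data + offset, sizeof value);
    return value;
}
```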
Why doesn't a simple cast to `int` work?
sizeof(data[18]) is 1, because unsigned char is one byte (0-255). A simple cast, (int)data[18], converts that single byte to an int: the result is as wide as an int (4 bytes on common platforms) but carries only one byte of information from data. Dereferencing an int* instead reads sizeof(int) bytes starting at that address, i.e. the 4 bytes at offsets 18 through 21, inclusive, which is exactly the field wanted here. The size of the pointer itself (sizeof(&data[18]), 4 bytes on a 32-bit system and 8 on a 64-bit one) plays no role; what matters is the size of the type it points to. This is illustrated by the following example:
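#include <iostream>
#include <bitset>
#include <string>
int main() {
unsigned char data[22] = {}; // stand-in for the file buffer; only offsets 18-21 are used
// Populate 18-21 with a recognizable pattern for demonstration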
std::bitset<8> _bits(std::string("10011010"));
unsigned long bits = _bits.to_ulong();
for (int ii = 18; ii < 22; ii ++) {
data[ii] = static_cast<unsigned char>(bits);
}
std::cout << "data[18] -> 1 byte "
<< std::bitset<32>(data[18]) << std::endl;
std::cout << "*(unsigned short*)&data[18] -> 2 bytes "
<< std::bitset<32>(*(unsigned short*)&data[18]) << std::endl;
std::cout << "*(int*)&data[18] -> 4 bytes "
<< std::bitset<32>(*(int*)&data[18]) << std::endl;
}
data[18] -> 1 byte 00000000000000000000000010011010
*(unsigned short*)&data[18] -> 2 bytes 00000000000000001001101010011010
*(int*)&data[18] -> 4 bytes 10011010100110101001101010011010
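Because BMP headers store multi-byte fields in little-endian order, a portable alternative is to assemble the width from individual bytes. A sketch, assuming data holds the file contents as in the question; readLittleEndianInt32 is a hypothetical name:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: reads a signed 32-bit little-endian field (such as
// the BMP width at offset 18). Each byte is widened to uint32_t before it is
// shifted, so the result does not depend on the host's byte order.
int32_t readLittleEndianInt32(const unsigned char* data, std::size_t offset)
{
    const uint32_t u = static_cast<uint32_t>(data[offset])
                     | static_cast<uint32_t>(data[offset + 1]) << 8
                     | static_cast<uint32_t>(data[offset + 2]) << 16
                     | static_cast<uint32_t>(data[offset + 3]) << 24;
    // Values above INT32_MAX wrap modulo 2^32 when converted to int32_t
    // (guaranteed since C++20, implementation-defined but modular in practice before).
    return static_cast<int32_t>(u);
}
```

With it, the question's line becomes int width = readLittleEndianInt32(data, 18);, which gives the same answer on big- and little-endian hosts and has no alignment requirement.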

Alternative to perl unpack in c++ [duplicate]

How do I write C++ code that does what pack with the N template does in Perl?
I want to convert an integer variable to a binary form such that unpack with N on it gives back the integer value.
My integer variable name is timestamp.
I found that it is related to htonl, but still htonl(timestamp) does not give the binary form.
I wrote a library, libpack, similar to Perl's pack function. It's a C library so it would be quite usable from C++ as well:
FILE *f;
fpack(f, "u32> u32>", value_a, value_b);
A u32> specifies an unsigned 32-bit integer in big-endian format, i.e. equivalent to Perl's N format for pack().
http://www.leonerd.org.uk/code/libpack/
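The N format takes 4 bytes and forms a 32-bit unsigned int, most significant byte first, as follows (each byte is cast to uint32_t so that a byte of 0x80 or more is never shifted into the sign bit of an int):
uint32_t n;
n = (uint32_t)buf[0] << 24
| (uint32_t)buf[1] << 16
| (uint32_t)buf[2] << 8
| (uint32_t)buf[3] << 0;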
For example,
uint32_t n;
unsigned char buf[4];
size_t bytes_read = fread(buf, 1, 4, stream);
if (bytes_read < 4) {
if (ferror(stream)) {
// Error
// ...
}
else if (feof(stream)) {
// Premature EOF
// ...
}
}
else {
n = (uint32_t)buf[0] << 24
| (uint32_t)buf[1] << 16
| (uint32_t)buf[2] << 8
| (uint32_t)buf[3] << 0;
}
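Without a library, the N layout can also be produced by hand. A sketch with hypothetical function names: packUint32BE fills a 4-byte buffer most significant byte first (the layout Perl's pack("N", $timestamp) produces), which you could then write with fwrite(b, 1, 4, f), and unpackUint32BE reverses it. Shifting the bytes out explicitly means the code does not depend on the host's byte order:

```cpp
#include <cstdint>

// Hypothetical helper: stores `value` in out[0..3], most significant byte
// first ("network order"), the byte layout of Perl's pack("N", value).
void packUint32BE(uint32_t value, unsigned char out[4])
{
    out[0] = static_cast<unsigned char>(value >> 24);
    out[1] = static_cast<unsigned char>(value >> 16);
    out[2] = static_cast<unsigned char>(value >> 8);
    out[3] = static_cast<unsigned char>(value);
}

// Hypothetical helper: the inverse, like unpack("N", ...). Each byte is
// widened to uint32_t before shifting.
uint32_t unpackUint32BE(const unsigned char in[4])
{
    return static_cast<uint32_t>(in[0]) << 24
         | static_cast<uint32_t>(in[1]) << 16
         | static_cast<uint32_t>(in[2]) << 8
         | static_cast<uint32_t>(in[3]);
}
```

htonl(timestamp) yields the same bytes, but only as the in-memory representation of the returned integer, so you would still have to write its raw bytes (fwrite(&x, 4, 1, f)) to get the binary form.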

c++ 64 bit network to host translation

I know there are answers for this question using gcc byteswap and other alternatives on the web, but I was wondering why my code below isn't working.
Firstly, I get gcc warnings (which I feel shouldn't be coming). The reason I don't want to use byteswap is that I need to determine whether my machine is big-endian or little-endian and use byteswap accordingly, i.e. if my machine is big-endian I could memcpy the bytes as is without any translation; otherwise I need to swap them and copy them.
static inline uint64_t ntohl_64(uint64_t val)
{
unsigned char *pp =(unsigned char *)&val;
uint64_t val2 = ( pp[0] << 56 | pp[1] << 48
| pp[2] << 40 | pp[3] << 32
| pp[4] << 24 | pp[5] << 16
| pp[6] << 8 | pp[7]);
return val2;
}
int main()
{
int64_t a=0xFFFF0000;
int64_t b=__const__byteswap64(a);
int64_t c=ntohl_64(a);
printf("\n %lld[%x] [%lld] [%lld]\n ", a, a, b, c);
}
Warnings:-
In function ‘uint64_t ntohl_64(uint64_t)’:
warning: left shift count >= width of type
warning: left shift count >= width of type
warning: left shift count >= width of type
warning: left shift count >= width of type
Output:-
4294901760[00000000ffff0000] 281470681743360[0000ffff00000000] 65535[000000000000ffff]
I am running this on a little-endian machine, so byteswap and ntohl_64 should give exactly the same values, but unfortunately I get completely unexpected results. It would be great if someone could point out what's wrong.
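The reason your code does not work is that you're shifting unsigned chars. Before the shift, each unsigned char is promoted to int, which is 32 bits wide here, so pp[0] << 56 shifts a 32-bit value by more than its width. That is undefined behavior, which is what the "left shift count >= width of type" warnings are about, so don't count on getting 0 (some implementations end up with weird results due to the way the machine-code shifts work; x86, for example, masks the shift count to 5 bits). You have to cast them to whatever you want the final size to be first, like: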
((uint64_t)pp[0]) << 56
Your optimal solution with gcc (glibc) would be to use htobe64, or its counterpart be64toh for the network-to-host direction, both declared in <endian.h>. These functions do everything for you.
P.S. It's a little bit off topic, but if you want to make the function portable across endianness you could do:
Edit based on Nova Denizen's comment:
static inline uint64_t htonl_64(uint64_t val)
{
union{
uint64_t retVal;
uint8_t bytes[8];
};
// network order: most significant byte first in memory
bytes[0] = (val & 0xff00000000000000) >> 56;
bytes[1] = (val & 0x00ff000000000000) >> 48;
bytes[2] = (val & 0x0000ff0000000000) >> 40;
bytes[3] = (val & 0x000000ff00000000) >> 32;
bytes[4] = (val & 0x00000000ff000000) >> 24;
bytes[5] = (val & 0x0000000000ff0000) >> 16;
bytes[6] = (val & 0x000000000000ff00) >> 8;
bytes[7] = (val & 0x00000000000000ff);
return retVal;
}
static inline uint64_t ntohl_64(uint64_t val)
{
union{
uint64_t inVal;
uint8_t bytes[8];
};
inVal = val;
// bytes[0] is the most significant byte in network order
return ((uint64_t)bytes[0]) << 56 |
((uint64_t)bytes[1]) << 48 |
((uint64_t)bytes[2]) << 40 |
((uint64_t)bytes[3]) << 32 |
((uint64_t)bytes[4]) << 24 |
((uint64_t)bytes[5]) << 16 |
((uint64_t)bytes[6]) << 8 |
bytes[7];
}
Assuming the compiler doesn't do something to the uint64_t on its way back through the return, and assuming the user treats the result as an 8-byte value (and not an integer), that code should work on any system. With any luck, your compiler will be able to optimize out the whole expression if you're on a big-endian system and use some built-in byte-swapping technique if you're on a little-endian machine (and it should still work on any other kind of machine).
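A sketch of a variant (the name ntoh64 is hypothetical) that keeps the byte-by-byte idea but copies with std::memcpy instead of a union, since reading a union member other than the one last written is undefined behavior in standard C++ (GCC documents it as supported, but other compilers need not):

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: treats the 8 bytes of `val`, in the order they sit
// in memory, as a big-endian (network order) number. std::memcpy replaces the
// union, so no inactive union member is ever read.
uint64_t ntoh64(uint64_t val)
{
    unsigned char bytes[8];
    std::memcpy(bytes, &val, sizeof bytes);
    uint64_t result = 0;
    for (int i = 0; i < 8; ++i)
    {
        result = (result << 8) | bytes[i];  // bytes[0] ends up most significant
    }
    return result;
}
```

On big- and little-endian hosts the same function also serves as hton64, because the conversion is its own inverse.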
uint64_t val2 = ( pp[0] << 56 | pp[1] << 48
| pp[2] << 40 | pp[3] << 32
| pp[4] << 24 | pp[5] << 16
| pp[6] << 8 | pp[7]);
pp[0] is an unsigned char and 56 is an int, so pp[0] is promoted to int and pp[0] << 56 performs the left shift on a 32-bit int; because 56 is not less than the width of int, the result is undefined. This isn't what you want, because you want all these shifts to have type unsigned long long.
The way to fix this is to cast, like ((unsigned long long)pp[0]) << 56.
Since pp[x] is only 8 bits wide and is promoted to a 32-bit int (not to a 64-bit type), the expression pp[0] << 56 cannot produce the value you want; it is undefined behavior. You need explicit masking on the original value and then shifting:
uint64_t val2 = (( val & 0xff ) << 56 ) |
(( val & 0xff00 ) << 40 ) |
...
In any case, just use compiler built-ins, they usually result in a single byte-swapping instruction.
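Carrying the masking approach through all eight bytes gives a complete byte swap. A sketch (byteSwap64 is a hypothetical name); note that it swaps unconditionally, so for network-to-host use you would apply it only on a little-endian host. GCC and Clang's __builtin_bswap64 computes the same result:

```cpp
#include <cstdint>

// Hypothetical helper: reverses the byte order of a 64-bit value using only
// masks and shifts. Every operand is already uint64_t, so no shift count
// reaches the width of the type.
uint64_t byteSwap64(uint64_t v)
{
    return ((v & 0x00000000000000FFull) << 56)
         | ((v & 0x000000000000FF00ull) << 40)
         | ((v & 0x0000000000FF0000ull) << 24)
         | ((v & 0x00000000FF000000ull) << 8)
         | ((v & 0x000000FF00000000ull) >> 8)
         | ((v & 0x0000FF0000000000ull) >> 24)
         | ((v & 0x00FF000000000000ull) >> 40)
         | ((v & 0xFF00000000000000ull) >> 56);
}
```

For the question's input, byteSwap64(0xFFFF0000) is 0x0000ffff00000000, the same value the byteswap call printed.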
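Casting and shifting works as PlasmaHH suggested, but I don't know why 32-bit shifts upconvert automatically and 64-bit ones don't. (The automatic conversion is integer promotion: an unsigned char operand is promoted to int, which is 32 bits here, but never to a 64-bit type.)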
typedef uint64_t __u64;
static inline uint64_t ntohl_64(uint64_t val)
{
unsigned char *pp =(unsigned char *)&val;
return ((__u64)pp[0] << 56 |
(__u64)pp[1] << 48 |
(__u64)pp[2] << 40 |
(__u64)pp[3] << 32 |
(__u64)pp[4] << 24 |
(__u64)pp[5] << 16 |
(__u64)pp[6] << 8 |
(__u64)pp[7]);
}

How to simply reconstruct numbers from a buffer in little endian format

Suppose I have:
typedef unsigned long long uint64;
unsigned char data[BUF_SIZE];
uint64 MyPacket::GetCRC()
{
return (uint64)(data[45] | data[46] << 8 |
data[47] << 16 | data[48] << 24 |
(uint64)data[49] << 32| (uint64)data[50] << 40 |
(uint64)data[51] << 48| (uint64)data[52] << 56);
}
Just wondering if there is a cleaner way. I tried a memcpy into a uint64 variable,
but that gives me the wrong value. I think I need the reverse. The data is in little-endian format.
The big advantage of using the shift-or sequence is that it will work regardless if your host machine is big- or little-endian.
Of course, you could always tweak the expression. Personally, I try to join "pairs", that is, two bytes at a time, then two shorts, and finally two longs, as this can help compilers generate better code.
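A sketch of the "pairs" idea applied to the question's CRC field; le16, le32 and le64 are hypothetical names, and whether this actually yields better code depends on the compiler:

```cpp
#include <cstdint>

// Hypothetical helpers for little-endian data: two bytes -> 16 bits,
// two 16-bit halves -> 32 bits, two 32-bit halves -> 64 bits.
inline uint16_t le16(const unsigned char* p)
{
    return static_cast<uint16_t>(p[0] | (p[1] << 8));
}

inline uint32_t le32(const unsigned char* p)
{
    return static_cast<uint32_t>(le16(p))
         | (static_cast<uint32_t>(le16(p + 2)) << 16);
}

inline uint64_t le64(const unsigned char* p)
{
    return static_cast<uint64_t>(le32(p))
         | (static_cast<uint64_t>(le32(p + 4)) << 32);
}
```

GetCRC() could then simply return le64(data + 45);.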
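Well, maybe a better idea is to reverse the byte order and cast? (Note that reversing the bytes gives the right value only on a big-endian host; on a little-endian host, copying the bytes as they are is already correct.)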
typedef unsigned long long uint64;
unsigned char data[BUF_SIZE];
uint64 MyPacket::GetCRC()
{
uint64 retval;
unsigned char *rdata = reinterpret_cast<unsigned char*>(&retval);
for(unsigned i = 0; i < 8; ++i) rdata[i] = data[52-i];
return retval;
}
Here's one of my own that is similar to the one provided by @x13n.
uint64 MyPacket::GetCRC()
{
int offset=45;
uint64 crc;
memcpy(&crc, data+offset, 8);
//std::reverse((char*)&crc, (char*)&crc + 8); // if this was a big endian machine
return crc;
}
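The commented-out std::reverse line can be made automatic with a run-time byte-order check. A sketch with hypothetical helper names; in C++20 the check could instead be done at compile time with std::endian::native from <bit>:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true when the host stores the least significant
// byte first (checked at run time by inspecting the first byte of 1).
inline bool hostIsLittleEndian()
{
    const uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Hypothetical helper: loads 8 little-endian bytes starting at `src`,
// reversing them first on a big-endian host (the commented-out
// std::reverse step from the answer above, made automatic).
uint64_t loadLittleEndian64(const unsigned char* src)
{
    unsigned char bytes[8];
    std::memcpy(bytes, src, sizeof bytes);
    if (!hostIsLittleEndian())
    {
        std::reverse(bytes, bytes + 8);
    }
    uint64_t value;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}
```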
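Nothing much wrong with what you have there, to be honest, and it's quick.
The one thing I'd change is to cast every byte and format it a little better for readability (without a cast, data[48] << 24 is evaluated as a signed int, so when the top bit of data[48] is set the result is negative, formally undefined before C++20, and gets sign-extended when widened to uint64):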
uint64 MyPacket::GetCRC()
{
return (uint64) data[45] |
(uint64) data[46] << 8 |
(uint64) data[47] << 16 |
(uint64) data[48] << 24 |
(uint64) data[49] << 32 |
(uint64) data[50] << 40 |
(uint64) data[51] << 48 |
(uint64) data[52] << 56;
}
I guess your other option would be to do it in a loop instead:
uint64 MyPacket::GetCRC()
{
const int crcoffset = 45;
uint64 crc = 0;
for (int i = 0; i < 8; i++)
{
crc |= (uint64)data[i + crcoffset] << (i * 8);
}
return crc;
}
That would probably result in very similar assembly (as the compiler would probably unroll such a small loop), but it is a bit harder to grok in my opinion, so you are better off with what you have.