C++ 64 bit network to host translation

I know there are answers for this question using gcc byteswap and other alternatives on the web, but I was wondering why my code below isn't working.
Firstly, I get gcc warnings (which I feel shouldn't appear). The reason I don't want to use byteswap is that I would need to determine whether my machine is big endian or little endian and use byteswap accordingly, i.e. if my machine is big endian I could memcpy the bytes as-is without any translation, otherwise I need to swap them and then copy.
static inline uint64_t ntohl_64(uint64_t val)
{
    unsigned char *pp = (unsigned char *)&val;
    uint64_t val2 = ( pp[0] << 56 | pp[1] << 48
                    | pp[2] << 40 | pp[3] << 32
                    | pp[4] << 24 | pp[5] << 16
                    | pp[6] << 8  | pp[7]);
    return val2;
}
int main()
{
    int64_t a = 0xFFFF0000;
    int64_t b = __const__byteswap64(a);
    int64_t c = ntohl_64(a);
    printf("\n %lld[%x] [%lld] [%lld]\n ", a, a, b, c);
}
Warnings:-
In function 'uint64_t ntohl_64(uint64_t)':
warning: left shift count >= width of type
warning: left shift count >= width of type
warning: left shift count >= width of type
warning: left shift count >= width of type
Output:-
4294901760[00000000ffff0000] 281470681743360[0000ffff00000000] 65535[000000000000ffff]
I am running this on a little endian machine, so byteswap and ntohl_64 should produce exactly the same values, but unfortunately I get completely unexpected results. It would be great if someone could point out what's wrong.

The reason your code does not work is that you're shifting unsigned chars. They are only promoted to int before the shift, so any shift count of 32 or more exceeds the width of the promoted type (hence the four warnings) and is undefined behavior; implementations can end up with weird results due to the way the machine's shift instructions work (x86, which masks the shift count, is an example). You have to cast each byte to whatever you want the final size to be first, like:
((uint64_t)pp[0]) << 56
Your optimal solution with gcc would be to use htobe64. This function does everything for you.
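For example (a minimal sketch, assuming a glibc/Linux system where <endian.h> provides htobe64 and be64toh):
#include <endian.h>   // glibc: htobe64 / be64toh
#include <cstdint>
#include <cstdio>

int main()
{
    uint64_t host = 0xFFFF0000ULL;
    uint64_t wire = htobe64(host);   // host order -> big-endian (network) order
    uint64_t back = be64toh(wire);   // and back again
    std::printf("%016llx %016llx %016llx\n",
                (unsigned long long)host,
                (unsigned long long)wire,
                (unsigned long long)back);
    return 0;
}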
P.S. It's a little bit off topic, but if you want to make the function portable across endianness you could do:
Edit based on Nova Denizen's comment:
static inline uint64_t htonl_64(uint64_t val)
{
    union {
        uint64_t retVal;
        uint8_t  bytes[8];
    };
    bytes[0] = (val & 0x00000000000000ff);
    bytes[1] = (val & 0x000000000000ff00) >> 8;
    bytes[2] = (val & 0x0000000000ff0000) >> 16;
    bytes[3] = (val & 0x00000000ff000000) >> 24;
    bytes[4] = (val & 0x000000ff00000000) >> 32;
    bytes[5] = (val & 0x0000ff0000000000) >> 40;
    bytes[6] = (val & 0x00ff000000000000) >> 48;
    bytes[7] = (val & 0xff00000000000000) >> 56;
    return retVal;
}
static inline uint64_t ntohl_64(uint64_t val)
{
    union {
        uint64_t inVal;
        uint8_t  bytes[8];
    };
    inVal = val;
    return bytes[0] |
           ((uint64_t)bytes[1]) << 8  |
           ((uint64_t)bytes[2]) << 16 |
           ((uint64_t)bytes[3]) << 24 |
           ((uint64_t)bytes[4]) << 32 |
           ((uint64_t)bytes[5]) << 40 |
           ((uint64_t)bytes[6]) << 48 |
           ((uint64_t)bytes[7]) << 56;
}
Assuming the compiler doesn't do anything to the uint64_t on its way back through the return, and assuming the user treats the result as an 8-byte value (and not an integer), that code should work on any system. With any luck, your compiler will be able to optimize out the whole expression if you're on a big-endian system, and use some builtin byte-swapping technique if you're on a little-endian machine (and it's guaranteed to still work on any other kind of machine).
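A quick round-trip check of the two functions above (just a sketch; the test value is arbitrary):
#include <cassert>
#include <cstdint>
#include <cstdio>

int main()
{
    uint64_t original = 0x0123456789ABCDEFULL;
    uint64_t wire = htonl_64(original);   // serialize with the function above
    uint64_t back = ntohl_64(wire);       // deserialize with its counterpart
    assert(back == original);             // round-trips on any host endianness
    std::printf("%016llx\n", (unsigned long long)back);
    return 0;
}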

uint64_t val2 = ( pp[0] << 56 | pp[1] << 48
                | pp[2] << 40 | pp[3] << 32
                | pp[4] << 24 | pp[5] << 16
                | pp[6] << 8  | pp[7]);
pp[0] is an unsigned char and 56 is an int, so pp[0] is promoted to int and the left shift is performed as an int shift; a shift count of 56 is at least the width of int, which makes the result undefined. This isn't what you want, because you want all these shifts to be performed in a 64-bit type such as unsigned long long.
The way to fix this is to cast, like ((unsigned long long)pp[0]) << 56.

Since pp[x] is only 8 bits wide (and is promoted no further than int), the expression pp[0] << 56 cannot produce the value you want. You need explicit masking on the original 64-bit value and then shifting:
uint64_t val2 = ((val & 0xff)   << 56) |
                ((val & 0xff00) << 40) |
                ...
In any case, just use the compiler built-ins; they usually compile down to a single byte-swapping instruction.
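For example, with GCC or Clang (a sketch; __builtin_bswap64 and the __BYTE_ORDER__ macros are compiler extensions, not standard C++):
#include <cstdint>

// Convert between host and big-endian (network) byte order for 64-bit values.
// On a big-endian host this is a no-op; on a little-endian host GCC/Clang
// typically emit a single bswap instruction for __builtin_bswap64.
static inline uint64_t host_to_be64(uint64_t val)
{
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return val;
#else
    return __builtin_bswap64(val);
#endif
}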

Casting and shifting works, as PlasmaHH suggests, but I don't know why the operands are up-converted automatically for 32-bit shifts and not for 64-bit ones.
typedef uint64_t __u64;
static inline uint64_t ntohl_64(uint64_t val)
{
    unsigned char *pp = (unsigned char *)&val;
    return ((__u64)pp[0] << 56 |
            (__u64)pp[1] << 48 |
            (__u64)pp[2] << 40 |
            (__u64)pp[3] << 32 |
            (__u64)pp[4] << 24 |
            (__u64)pp[5] << 16 |
            (__u64)pp[6] << 8  |
            (__u64)pp[7]);
}

Related

C++ Bitshift 4 int8_t into a normal integer (32 bit)

I had already asked a question about how to get 4 int8_t values into a 32-bit int, and I was told that I have to cast the int8_t to a uint8_t first to pack it into a 32-bit integer with bit shifting.
int8_t offsetX = -10;
int8_t offsetY = 120;
int8_t offsetZ = -60;
using U = std::uint8_t;
int toShader = (U(offsetX) << 24) | (U(offsetY) << 16) | (U(offsetZ) << 8) | (0 << 0);
std::cout << (int)(toShader >> 24) << " "<< (int)(toShader >> 16) << " " << (int)(toShader >> 8) << std::endl;
My Output is
-10 -2440 -624444
It's not what I expected, of course. Does anyone have a solution?
In the shader I want to unpack the packed values later, and that is only possible with a 32-bit integer because GLSL does not have any other data types.
int offsetX = data[gl_InstanceID * 3 + 2] >> 24;
int offsetY = data[gl_InstanceID * 3 + 2] >> 16 ;
int offsetZ = data[gl_InstanceID * 3 + 2] >> 8 ;
What is written in the square brackets does not matter; this is about the correct shifting of the bits, or the casting after the bracket.
If any of the offsets is negative, then the shift results in undefined behaviour.
Solution: Convert the offsets to an unsigned type first.
However, this brings another potential problem: if you convert to unsigned int, negative numbers will become very large values with set bits in the most significant bytes, and ORing those bits in will corrupt the fields holding offsetX and offsetY regardless of their values. One solution is to convert to a small unsigned type (std::uint8_t); another is to mask out the unused bytes. The former is probably simpler:
using U = std::uint8_t;
int third = U(offsetX) << 24u
| U(offsetY) << 16u
| U(offsetZ) << 8u
| 0u << 0u;
I think you're forgetting to mask the bits that you care about before shifting them.
Perhaps this is what you're looking for:
int32 offsetX = (data[gl_InstanceID * 3 + 2] & 0xFF000000) >> 24;
int32 offsetY = (data[gl_InstanceID * 3 + 2] & 0x00FF0000) >> 16;
int32 offsetZ = (data[gl_InstanceID * 3 + 2] & 0x0000FF00) >> 8;
if (offsetX & 0x80) offsetX |= 0xFFFFFF00;
if (offsetY & 0x80) offsetY |= 0xFFFFFF00;
if (offsetZ & 0x80) offsetZ |= 0xFFFFFF00;
Without the bit mask, the X part will end up in offsetY, and the X and Y part in offsetZ.
On the CPU side you can use a union to avoid the bit shifts, bit masking, and branches...
int8_t x,y,z,w; // your 8bit ints
int32_t i; // your 32bit int
union my_union // just a helper union for the casting
{
    int8_t  i8[4];
    int32_t i32;
} a;
// 4x8bit -> 32bit
a.i8[0]=x;
a.i8[1]=y;
a.i8[2]=z;
a.i8[3]=w;
i=a.i32;
// 32bit -> 4x8bit
a.i32=i;
x=a.i8[0];
y=a.i8[1];
z=a.i8[2];
w=a.i8[3];
If you do not like unions the same can be done with pointers...
Beware: on the GLSL side this is not possible (neither unions nor pointers), and you have to use bit shifts and masks as in the other answer...
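A sketch of the pointer alternative mentioned above, using memcpy rather than raw pointer casts to stay clear of strict-aliasing issues (the names are just for illustration):
#include <cstdint>
#include <cstring>

// Pack four signed bytes into one 32-bit value and back, by copying the raw
// bytes. As with the union, which byte ends up where depends on the host's
// endianness, so this is only for CPU-side round-tripping.
int32_t pack4(int8_t x, int8_t y, int8_t z, int8_t w)
{
    int8_t  bytes[4] = { x, y, z, w };
    int32_t packed;
    std::memcpy(&packed, bytes, sizeof packed);
    return packed;
}

void unpack4(int32_t packed, int8_t &x, int8_t &y, int8_t &z, int8_t &w)
{
    int8_t bytes[4];
    std::memcpy(bytes, &packed, sizeof bytes);
    x = bytes[0]; y = bytes[1]; z = bytes[2]; w = bytes[3];
}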

Reading Binary Files Using Bitwise Shifters and Buffers in C++

I'm trying to read a binary file and simply convert the data to usable unsigned integers. The code below works for 2-byte reads, for certain file locations, and correctly prints the unsigned integer. When I use the 4-byte code, though, my value turns out to be a number much larger than it is supposed to be. I believe the issue lies within the read function: it seems as though I am getting the wrong character/decimal number (101 for example), which when bit shifted becomes a number much larger than it should be (~6662342). (When the program runs it occasionally throws an exception, "stack around the variable buf runtime error #2", in Visual Studio.) Any ideas? It may be that my fundamental knowledge of how the data is stored in the char array is affecting my data output.
working 2-byte code
unsigned char buf[2];
file.seekg(3513);
uint64_t value = readBufferLittleEndian(buf, &file);
printf("%i", value);
system("PAUSE");
return 0;
}
uint64_t readBufferLittleEndian(unsigned char buf[], std::ifstream *file)
{
    file->read((char*)(&buf[0]), 2);
    return (buf[1] << 8 | buf[0]);
}
broken 4-byte code
unsigned char buf[8 + 1]; //= { 0, 2 , 0 , 0 , 0 , 0 , 0, 0, 0 };
uint64_t buf1[9];
file.seekg(3213);
uint64_t value = readBufferLittleEndian(buf, &file, buf1);
std::cout << value;
system("PAUSE");
return 0;
}
uint64_t readBufferLittleEndian(unsigned char buf[], std::ifstream *file, uint64_t buf1[])
{
    file->read((char*)(&buf[0]), 4);
    for (int index = 0; index < 4; index++)
    {
        buf1[index] = buf[index];
    }
    buf1[0];
    buf1[1];
    buf1[2];
    buf1[3];
    //return (buf1[7] << 56 | buf1[6] << 48 | buf1[5] << 40 | buf1[4] << 32 | buf1[3] << 24 | buf1[2] << 16 | buf1[1] << 8 | buf1[0]);
    return (buf1[3] << 24 | buf1[2] << 16 | buf1[1] << 8 | buf1[0]);
    //return (buf1[1] << 8 | buf1[0]);
}
Please correct me if I got the endianness reversed.
The code is C++ except for the printf line.
You have to cast before shifting. You cannot shift a char left 56 bits.
i.e. do ((uint64_t)buf[n] << NN)
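A sketch of the 4-byte reader with that cast applied (the signature is simplified here for illustration; the original passed the buffers in as parameters):
#include <cstdint>
#include <fstream>

// Read 4 bytes at the current stream position and assemble them as a
// little-endian unsigned value. Each byte is widened to uint64_t before
// shifting, so no shift count ever reaches the width of the operand.
uint64_t readBufferLittleEndian(std::ifstream &file)
{
    unsigned char buf[4];
    file.read(reinterpret_cast<char*>(buf), sizeof buf);
    return (uint64_t)buf[3] << 24 |
           (uint64_t)buf[2] << 16 |
           (uint64_t)buf[1] << 8  |
           (uint64_t)buf[0];
}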
seekg(0) = byte 1, seekg(3212) = byte 3213. Not entirely sure why I was getting a zero in byte 3214 before, considering I now get 220 (indicating big-endianness). Getting 220 would have indicated that I was misinterpreting the functionality of seekg(). Oh well, it is solved where it matters now anyway.

Conversion of big-endian long in C++?

I need a C++ function that returns the value of four consecutive bytes interpreted as a big-endian long. A pointer to the first byte should be updated to point after the last. I have tried the following code:
inline int32_t bigendianlong(unsigned char * &p)
{
    return (((int32_t)*p++ << 8 | *p++) << 8 | *p++) << 8 | *p++;
}
For instance, if p points to 00 00 00 A0 I would expect the result to be 160, but it is 0. How come?
The issue is explained clearly by this warning (emitted by the compiler):
./endian.cpp:23:25: warning: multiple unsequenced modifications to 'p' [-Wunsequenced]
return (((int32_t)*p++ << 8 | *p++) << 8 | *p++) << 8 | *p++;
Breaking down the logic in the function in order to explicitly specify sequence points...
inline int32_t bigendianlong(unsigned char * &p)
{
    int32_t result = *p++;
    result = (result << 8) + *p++;
    result = (result << 8) + *p++;
    result = (result << 8) + *p++;
    return result;
}
... will solve it
This function is named ntohl() (convert Network TO Host byte order Long) on both Unix and Windows, or g_ntohl() in glib. Add 4 to your pointer afterward. If you want to roll your own, a union type whose members are a uint32_t and a uint8_t[4] will be useful.
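A minimal sketch of that approach (assuming a POSIX system where <arpa/inet.h> declares ntohl; the memcpy is just one way to read the four bytes):
#include <arpa/inet.h>   // ntohl
#include <cstdint>
#include <cstring>

inline int32_t bigendianlong(unsigned char *&p)
{
    uint32_t raw;
    std::memcpy(&raw, p, sizeof raw);   // grab the four bytes as they sit in memory
    p += sizeof raw;                    // advance the caller's pointer past them
    return (int32_t)ntohl(raw);         // big-endian (network) order -> host order
}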

Bitwise unpacking using signed data

I've been trying for a while to pack and unpack some chars into an integer. Although there are some topics related to this question, my problem involves a signed shift. I don't get the 'trick' for unpacking a signed value, i.e.:
char c1 = -119;
char c2 = 26;
// pack
int packed = (unsigned char)c1 | (c2 << 8);
// unpack
c1 = packed >> 0;
c2 = packed >> 8;
// printf(c1, c2) -> Unpacked data: -119 | 26
That works as expected, but when I try to pack more data, i.e.:
char c0 = -42;
char c1 = -119;
char c2 = 26;
// pack
int packed = (unsigned char)c0 | (unsigned char)(c1 << 8) | (c2 << 16);
// unpack
c0 = packed >> 0;
c1 = packed >> 8;
c2 = packed >> 16;
// printf -> Unpacked data: -42 | 0 | 26
The c1 value is lost. I guess it's related to the sign bit being shifted into the high-order position.
How could I get back the c1 value?
Thanks in advance.
You are casting c1 to unsigned char after shifting it out of the range of that type, so the result of the cast is zero. You should do the cast before shifting:
int packed = (unsigned char)c0 | ((unsigned char)c1 << 8) | (c2 << 16);
(unsigned char)(c1 << 8)
This will
shift the wrong (sign-extended) value
trim the result to 8 bits (yielding 0)
You don't want any of that so you should use ((unsigned char)c1 << 8).
Some ints are 16 bits. For this code to be portable, use int32_t. The correct way to accomplish this (if slightly paranoid) is:
int32_t packed = ((uint8_t)c0) | (((uint8_t)c1)<<8) | (((uint8_t)c2) << 16);
I also tend to list these in reverse order, so it is clearer which characters become the most and least significant bytes.
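For completeness, a sketch of the matching unpack, converting each byte back through uint8_t and then to int8_t to recover the sign (the variable names are just for illustration):
#include <cstdint>
#include <cstdio>

int main()
{
    int8_t c0 = -42, c1 = -119, c2 = 26;

    // Pack: widen each byte through uint8_t so no sign bits leak into the word.
    int32_t packed = (uint8_t)c0 | ((uint8_t)c1 << 8) | ((uint8_t)c2 << 16);

    // Unpack: isolate each byte, then convert to int8_t to restore the sign.
    int8_t u0 = (int8_t)(uint8_t)(packed >> 0);
    int8_t u1 = (int8_t)(uint8_t)(packed >> 8);
    int8_t u2 = (int8_t)(uint8_t)(packed >> 16);

    std::printf("%d %d %d\n", u0, u1, u2);   // prints -42 -119 26
    return 0;
}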

How to simply reconstruct numbers from a buffer in little endian format

Suppose I have:
typedef unsigned long long uint64;
unsigned char data[BUF_SIZE];
uint64 MyPacket::GetCRC()
{
    return (uint64)(data[45] | data[46] << 8 |
                    data[47] << 16 | data[48] << 24 |
                    (uint64)data[49] << 32 | (uint64)data[50] << 40 |
                    (uint64)data[51] << 48 | (uint64)data[52] << 56);
}
Just wondering if there is a cleaner way. I tried a memcpy to a uint64 variable, but that gives me the wrong value. I think I need the reverse. The data is in little-endian format.
The big advantage of using the shift-or sequence is that it will work regardless of whether your host machine is big- or little-endian.
Of course, you can always tweak the expression. Personally, I try to join "pairs": two bytes at a time, then two shorts, and finally two longs, as this helps compilers generate better code.
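A sketch of that pairwise assembly for the little-endian case (the function name is made up for illustration):
#include <cstdint>

// Assemble a little-endian uint64 by joining bytes into 16-bit halves,
// halves into 32-bit words, and words into the final 64-bit value.
uint64_t load_le64(const unsigned char *p)
{
    uint32_t lo16  = p[0] | (uint32_t)p[1] << 8;
    uint32_t hi16  = p[2] | (uint32_t)p[3] << 8;
    uint32_t lo32  = lo16 | hi16 << 16;

    uint32_t lo16b = p[4] | (uint32_t)p[5] << 8;
    uint32_t hi16b = p[6] | (uint32_t)p[7] << 8;
    uint32_t hi32  = lo16b | hi16b << 16;

    return lo32 | (uint64_t)hi32 << 32;
}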
Well, maybe a better idea is to swap the order + cast?
typedef unsigned long long uint64;
unsigned char data[BUF_SIZE];
uint64 MyPacket::GetCRC()
{
    uint64 retval;
    unsigned char *rdata = reinterpret_cast<unsigned char*>(&retval);
    for(unsigned i = 0; i < 8; ++i) rdata[i] = data[52-i];
    return retval;
}
Here's one of my own that is similar to that provided by @x13n.
uint64 MyPacket::GetCRC()
{
    int offset = 45;
    uint64 crc;
    memcpy(&crc, data + offset, 8);
    //std::reverse((char*)&crc, (char*)&crc + 8); // if this were a big endian machine
    return crc;
}
Nothing much wrong with what you have there to be honest. It will work and it's quick.
The thing I'd change would be to format it a little better for readability:
uint64 MyPacket::GetCRC()
{
    return (uint64) data[45] |
           (uint64) data[46] << 8  |
           (uint64) data[47] << 16 |
           (uint64) data[48] << 24 |
           (uint64) data[49] << 32 |
           (uint64) data[50] << 40 |
           (uint64) data[51] << 48 |
           (uint64) data[52] << 56;
}
I guess your other option would be to do it in a loop instead:
uint64 MyPacket::GetCRC()
{
    const int crcoffset = 45;
    uint64 crc = 0;
    for (int i = 0; i < 8; i++)
    {
        crc |= (uint64)data[i + crcoffset] << (i * 8);
    }
    return crc;
}
That would probably result in very similar assembly (the compiler will probably unroll such a small loop), but it is a bit harder to grok in my opinion, so you are better off with what you have.