Conversion of big-endian long in C++?


I need a C++ function that returns the value of four consecutive bytes interpreted as a big-endian long. A pointer to the first byte should be updated to point after the last. I have tried the following code:
inline int32_t bigendianlong(unsigned char * &p)
{
return (((int32_t)*p++ << 8 | *p++) << 8 | *p++) << 8 | *p++;
}
For instance, if p points to 00 00 00 A0 I would expect the result to be 160, but it is 0. How come?

The issue is explained clearly by this warning (emitted by the compiler):
./endian.cpp:23:25: warning: multiple unsequenced modifications to 'p' [-Wunsequenced]
return (((int32_t)*p++ << 8 | *p++) << 8 | *p++) << 8 | *p++;
Breaking the logic in the function into separate statements, so that each modification of p is explicitly sequenced...
inline int32_t bigendianlong(unsigned char * &p)
{
int32_t result = *p++;
result = (result << 8) + *p++;
result = (result << 8) + *p++;
result = (result << 8) + *p++;
return result;
}
... will solve it
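A related detail: if the first byte is 0x80 or above, the last (result << 8) in the version above shifts a 1 into the sign bit of int32_t, which C and older C++ standards leave undefined or implementation-defined. A minimal sketch that does the arithmetic in uint32_t instead and converts the bit pattern once at the end (the name read_be32 is hypothetical):

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reads four bytes as a big-endian 32-bit value and
// advances p past them. All shifting happens in uint32_t, so no signed
// shift rules are involved.
inline std::int32_t read_be32(unsigned char*& p)
{
    std::uint32_t u = static_cast<std::uint32_t>(p[0]) << 24
                    | static_cast<std::uint32_t>(p[1]) << 16
                    | static_cast<std::uint32_t>(p[2]) << 8
                    | static_cast<std::uint32_t>(p[3]);
    p += 4;  // one update of the pointer, after all reads
    // Copy the bit pattern into the signed type without an out-of-range conversion.
    std::int32_t result;
    std::memcpy(&result, &u, sizeof result);
    return result;
}
```

Reading p[0] to p[3] by index and advancing the pointer once also sidesteps the unsequenced-modification problem entirely.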

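This function is named ntohl() (convert Network TO Host byte order Long) on both Unix and Windows, or g_ntohl() in glib. Note that ntohl() takes a uint32_t, not a byte pointer, so copy the four bytes into a uint32_t first (memcpy works for this) and add 4 to your pointer afterward. If you want to roll your own, a union type whose members are a uint32_t and a uint8_t[4] will be useful, although in C++ (unlike C) reading the union member you did not last write is formally undefined behavior.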
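A minimal sketch of the ntohl() route, assuming a POSIX system (on Windows the declaration comes from <winsock2.h> instead); the function name bigendianlong_ntohl is hypothetical:

```cpp
#include <arpa/inet.h>  // ntohl() on POSIX systems
#include <cstdint>
#include <cstring>

// Hypothetical helper: copy four bytes into a uint32_t (memcpy avoids
// alignment and aliasing problems), convert from network (big-endian)
// to host order with ntohl(), then advance the pointer.
inline std::int32_t bigendianlong_ntohl(unsigned char*& p)
{
    std::uint32_t raw;
    std::memcpy(&raw, p, sizeof raw);
    p += 4;
    std::uint32_t host = ntohl(raw);
    std::int32_t result;
    std::memcpy(&result, &host, sizeof result);
    return result;
}
```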

Related

Why is this data being flipped

Below is an example of processing very similar to what I am working with. I understand the concept of endianness and have read through the suggested posts but it doesn't seem to explain what is happening here.
I have an array of unsigned characters that I am packing with data. I was under the impression that memcpy was endianness-agnostic. I would think that the left-most bit would stay the left-most bit. However, when I attempt to print the characters, each word is copied backwards.
Why does this happen?
#include <iostream>
#include <cstring>
#include <array>
const unsigned int MAX_VALUE = 64ul;
typedef unsigned char DDS_Octet[MAX_VALUE];
int main()
{
// create an array and populate it with printable
// characters
DDS_Octet octet;
for(int i = 0; i < MAX_VALUE; ++i)
octet[i] = (i + 33);
// print characters before the memcpy operation
for(int i = 0; i < MAX_VALUE; ++i)
{
if(i && !(i % 4)) std::cout << "\n";
std::cout << octet[i] << "\t";
}
std::cout << "\n\n------------------------------\n";
// This is an equivalent copy operation
// to what is actually being used
std::array<unsigned int, 16> arr;
memcpy(
arr.data(),
octet,
sizeof(octet));
// print the character contents of each
// word left to right (MSB to LSB on little endian)
for(auto i : arr)
std::cout
<< (char)(i >> 24) << "\t"
<< (char)((i >> 16) & 0xFF) << "\t"
<< (char)((i >> 8) & 0xFF) << "\t"
<< (char)(i & 0xFF) << "\n";
}
** output **
! " # $
% & ' (
) * + ,
- . / 0
1 2 3 4
5 6 7 8
9 : ; <
= > ? @
A B C D
E F G H
I J K L
M N O P
Q R S T
U V W X
Y Z [ \
] ^ _ `
------------------------------
$ # " !
( ' & %
, + * )
0 / . -
4 3 2 1
8 7 6 5
< ; : 9
@ ? > =
D C B A
H G F E
L K J I
P O N M
T S R Q
X W V U
\ [ Z Y
` _ ^ ]
----Update-----
I took a look at the memcpy source code (below), which was far simpler than expected. It actually explains everything. It would seem correct to say that the endianness of the integer is the cause of this, but incorrect to say that memcpy does not play a role. What I was overlooking was that the data is copied byte by byte. Given that, it makes sense that the little-endian integer would reverse it.
void *
memcpy (void *dest, const void *src, size_t len)
{
char *d = dest;
const char *s = src;
while (len--)
*d++ = *s++;
return dest;
}

When you memcpy 4 chars into a 4-byte unsigned int they get stored in the same order they were in the original array. That is, the first char in the input array will be stored in the lowest address byte of the unsigned int, the second in the second lowest address byte, and so on.
x86 is little-endian. The lowest address byte of an unsigned int is the least significant byte.
The shift operators are endianness-independent, though. They work on the logical value of an integer, not on its physical bytes. That means that, for an unsigned int i on a little-endian platform, i & 0xFF gives the lowest-address byte and (i >> 24) & 0xFF gives the highest-address byte, while on a big-endian platform i & 0xFF gives the highest-address byte and (i >> 24) & 0xFF gives the lowest-address byte.
Taken together, these three facts explain why your data is reversed. '!' is the first char in your array, so when you memcpy that array into an array of unsigned int, '!' becomes the lowest-address byte of the first unsigned int in the destination array. The lowest-address byte is the least significant one on your little-endian platform, and so that is the byte you retrieve with i & 0xFF.
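If the goal is for the first character to end up in the most significant byte (so that i >> 24 yields '!'), one option is to assemble each word with shifts instead of memcpy. A small sketch contrasting the two; both function names are hypothetical:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: builds a 32-bit value whose most significant byte is
// the first byte in memory, on any host. Shifts act on values, not on the
// memory layout, so the result does not depend on endianness.
std::uint32_t load_be32(const unsigned char* b)
{
    return static_cast<std::uint32_t>(b[0]) << 24 |
           static_cast<std::uint32_t>(b[1]) << 16 |
           static_cast<std::uint32_t>(b[2]) << 8  |
           static_cast<std::uint32_t>(b[3]);
}

// memcpy, by contrast, only copies bytes in order: after the copy, the
// integer's bytes in memory are exactly the source bytes.
bool memcpy_preserves_bytes(const unsigned char* b)
{
    std::uint32_t word;
    std::memcpy(&word, b, sizeof word);
    return std::memcmp(&word, b, sizeof word) == 0;
}
```

With the question's data, load_be32 on the first four characters gives 0x21222324 ('!' '"' '#' '$') on both little- and big-endian hosts.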

Maybe this will make it easier to understand. Let's say we have this data defined:
uint32_t val = 0x01020304;
auto *pi = reinterpret_cast<unsigned char *>( &val );
The following code will produce the same output on big-endian and little-endian platforms:
std::cout << ( (val >> 24) & 0xFF ) << '\t'
<< ( (val >> 16) & 0xFF ) << '\t'
<< ( (val >> 8) & 0xFF ) << '\t'
<< ( (val >> 0) & 0xFF ) << '\n';
but this code will produce different output on the two:
std::cout << static_cast<unsigned int>( pi[0] ) << '\t'
<< static_cast<unsigned int>( pi[1] ) << '\t'
<< static_cast<unsigned int>( pi[2] ) << '\t'
<< static_cast<unsigned int>( pi[3] ) << '\n';
It has nothing to do with memcpy(); it is about how ints are stored in memory and how bit-shift operations work.

On a little-endian machine, the value 0x12345678 is stored as 4 bytes: 0x78 0x56 0x34 0x12. But 0x12345678 >> 24 is still 0x12, because shifting works on the value and has nothing to do with the 4 separate bytes.
If you have the 4 bytes 0x78 0x56 0x34 0x12 and interpret them as a 4-byte little-endian integer, you get 0x12345678. If you right-shift by 24 bits, you get the 4th byte: 0x12. If you right-shift by 16 bits and mask with 0xff, you get the 3rd byte: 0x34. And so on, because ((0x12345678 >> 16) & 0xff) == 0x34.
The memcpy has nothing to do with it.
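To see which of the two layouts a given machine uses, one can inspect the lowest-address byte of a known value, while the shift-based extraction stays the same everywhere. A minimal sketch; the names host_is_little_endian and byte_at are hypothetical:

```cpp
#include <cstdint>
#include <cstring>

// Returns true if the lowest-address byte of 0x12345678 is 0x78,
// i.e. the host stores integers little-endian.
bool host_is_little_endian()
{
    const std::uint32_t value = 0x12345678;
    unsigned char first;
    std::memcpy(&first, &value, 1);
    return first == 0x78;
}

// Shift-and-mask extraction works on the value, so it gives the same
// answer on every host.
constexpr unsigned byte_at(std::uint32_t v, int shift)
{
    return (v >> shift) & 0xFF;
}

static_assert(byte_at(0x12345678, 24) == 0x12, "shift by 24 gives the top byte");
static_assert(byte_at(0x12345678, 16) == 0x34, "shift by 16 gives 0x34");
```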

why can't you shift a uint16_t [duplicate]

This question already has an answer here:
right shift count >= width of type or left shift count >= width of type
I am trying to fill a 64-bit unsigned variable by combining 16-bit and 8-bit values:
uint8_t byte0 = 0x00;
uint8_t byte1 = 0xAA;
uint8_t byte2 = 0x00;
uint8_t byte3 = 0xAA;
uint16_t hword0 = 0xAA00;
uint16_t hword1 = 0xAAAA;
uint64_t result = ( hword0 << 32 ) + ( byte3 << 24 ) +
( byte2 << 16 ) + ( byte1 << 8 ) + ( byte0 << 0 );
This gives me a warning.
left shift count >= width of type [-Wshift-count-overflow]
uint64_t result = ( hword0 << 32 )

hword0 is 16 bits wide (and is promoted to int before the shift) and you request a 32-bit shift. Shifting by a count greater than or equal to the width of the promoted left operand is undefined.
The solution is to convert your components to the destination type: uint64_t result = ( ((uint64_t)hword0) << 32 ) + etc.

Contrary to your question title, you can shift a uint16_t. But you cannot shift it (losslessly) by more than its width.
The result has the type of your promoted left operand (int, when int is wider than 16 bits), so in your original code hword0 << 32 is a 32-bit int shifted by 32, which is undefined behavior rather than a dependable 0; that is what the warning is about.
The solution is simple: before shifting, cast your values to the appropriate type suitable for shifting:
uint64_t result = ( (uint64_t)hword0 << 32 ) +
( (uint32_t)byte3 << 24 ) + ( (uint32_t)byte2 << 16 ) + ( (uint32_t)byte1 << 8 ) + ( (uint32_t)byte0 << 0 );

You can shift a uint16_t. What you can't do is shift an integer value by a number greater than or equal to the width of its (promoted) type. Doing so invokes undefined behavior. This is documented in section 6.5.7p3 of the C standard regarding bitwise shift operators:
The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
You would think that this means that any shift greater than or equal to 16 on a uint16_t is not valid. However, as mentioned above the operands of the << operator are subject to integer promotion. This means that any value with a rank lower than int is promoted to int before being used in an expression. So if int is 32 bits on your system, then you can left shift up to 31 bits.
This is why ( byte3 << 24 ) + ( byte2 << 16 ) + ( byte1 << 8 ) + ( byte0 << 0 ) doesn't generate a warning even though each byteN is a uint8_t, while ( hword0 << 32 ) does. There is still an issue here, however, because of the promotion to int. Because the promoted value is signed, you run the risk of shifting a 1 into the sign bit. Doing so invokes undefined behavior as well.
To fix this, any value that is shifted left by 32 or more must first be cast to uint64_t so that it can be operated on properly, as must any value that may end up shifting a 1 into the sign bit:
uint64_t result = ( (uint64_t)hword0 << 32 ) +
( (uint64_t)byte3 << 24 ) + ( (uint64_t)byte2 << 16 ) +
( (uint64_t)byte1 << 8 ) + ( byte0 << 0 );
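A compile-time check of the widened version with the question's values may help; this sketch wraps the expression in a hypothetical constexpr function pack_hword_bytes:

```cpp
#include <cstdint>

// Sketch: widen every operand to uint64_t before shifting, so each shift
// count is below 64 and no signed int is involved.
constexpr std::uint64_t pack_hword_bytes(std::uint16_t hword0, std::uint8_t byte3,
                                         std::uint8_t byte2, std::uint8_t byte1,
                                         std::uint8_t byte0)
{
    return static_cast<std::uint64_t>(hword0) << 32 |
           static_cast<std::uint64_t>(byte3) << 24 |
           static_cast<std::uint64_t>(byte2) << 16 |
           static_cast<std::uint64_t>(byte1) << 8  |
           static_cast<std::uint64_t>(byte0);
}

// Values from the question: hword0 = 0xAA00, byte3 = 0xAA, byte2 = 0x00,
// byte1 = 0xAA, byte0 = 0x00.
static_assert(pack_hword_bytes(0xAA00, 0xAA, 0x00, 0xAA, 0x00) == 0x0000AA00AA00AA00ULL,
              "hword0 lands in bits 32..47, byte3 in bits 24..31");
```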

According to the warning, 32 is greater than or equal to the width of the (promoted) operand on the target system. The C++ standard says:
[expr.shift]
The operands shall be of integral or unscoped enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand. The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand.
Corresponding rule from the C standard:
Bitwise shift operators
The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
According to the quoted rule, the behaviour of your program is undefined whether it is written in C or C++.
You can solve the problem by explicitly converting the left-hand operand of the shift to a sufficiently large unsigned type.
P.S. On systems where uint16_t is smaller than int (which is quite typical), a uint16_t operand will be promoted to int when used as an arithmetic operand. As such, byte2 << 16 is not unconditionally† undefined on such systems. You shouldn't rely on this detail, but it explains why you see no warning from the compiler regarding that shift.
† byte2 << 16 can still be undefined if the result is outside the range of representable values of the (signed) int type. It would be well defined if the promoted type was unsigned.
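The promotion rule can be checked directly at compile time. A minimal sketch, assuming int is wider than 16 bits (true on mainstream desktop and server targets; with a 16-bit int, uint16_t would promote to unsigned int instead):

```cpp
#include <cstdint>
#include <type_traits>

// The type of a shift expression is the type of the *promoted* left operand.
static_assert(std::is_same<decltype(std::uint8_t{1} << 1), int>::value,
              "uint8_t is promoted to int before shifting");
static_assert(std::is_same<decltype(std::uint16_t{1} << 1), int>::value,
              "uint16_t is promoted to int when int is wider than 16 bits");
static_assert(std::is_same<decltype(std::uint64_t{1} << 1), std::uint64_t>::value,
              "uint64_t is not promoted, so shifts up to 63 are valid");

// Because the promoted type is int, a byte shifted by 16 keeps its bits
// as long as the result stays below the sign bit.
constexpr int shifted_byte = std::uint8_t{0x2A} << 16;
static_assert(shifted_byte == 0x2A0000, "promoted shift keeps the bits");
```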

byte2 << 16
is left-shifting an 8-bit value by 16 bits. That won't work. Per 6.5.7 Bitwise shift operators, paragraph 4 of the C standard:
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
Since you're using a left shift on unsigned values, you get zero.
EDIT
Per paragraph 3 of the same section, it's actually undefined behavior:
If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
You want something like
( ( uint64_t ) byte2 ) << 16
The cast to a 64-bit value will ensure the result doesn't lose bits.

To do what you want, the key idea is to use intermediate uint64_t variables (the final size) in which to shuffle the bits.
The following compiles with no warnings.
You can rely on implicit conversion (no cast):
{
uint64_t b4567 = hword0; // auto promotion
uint64_t b3 = byte3;
uint64_t b2 = byte2;
uint64_t b1 = byte1;
uint64_t b0 = byte0;
uint64_t result = (
(b4567 << 32) |
(b3 << 24) |
(b2 << 16) |
(b1 << 8) |
(b0 << 0) );
}
You can also use static_cast (once per operand):
{
uint64_t result = (
(static_cast<uint64_t>(hword0) << 32) |
(static_cast<uint64_t>(byte3) << 24) |
(static_cast<uint64_t>(byte2) << 16) |
(static_cast<uint64_t>(byte1) << 8) |
(static_cast<uint64_t>(byte0) << 0 )
);
cout << "\n " << hex << result << endl;
}
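You can also combine both approaches in a function that a) performs the static cast and b) takes a uint64_t formal parameter, so the compiler converts each argument implicitly (which makes the static_cast itself redundant, but harmless).
The function looks like this: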
// vvvvvvvv ---- formal parameter
uint64_t sc (uint64_t ui64) {
return static_cast<uint64_t>(ui64);
}
// using static cast function
{
uint64_t result = (
(sc(hword0) << 32) |
(sc(byte3) << 24) |
(sc(byte2) << 16) |
(sc(byte1) << 8) |
(sc(byte0) << 0)
);
cout << "\n " << hex << result << endl;
}

From a C perspective:
Much discussion here omits that a uint8_t applied to a shift (left or right) is first promoted to an int, and then the shift rules are applied.
The same occurs with uint16_t when int is wider than 16 bits (17 bits or more), e.g. 32-bit.
When int is 32-bit
hword0 << 32 is UB due to the shift amount too great: outside 0 to 31.
byte3 << 24 is UB when it shifts a 1 into the sign bit, which happens here because byte3 & 0x80 is nonzero (byte3 is 0xAA).
Other shifts are OK.
Had int been 64-bit, OP's original code is fine - no UB, including hword0 << 32.
Had int been 16-bit, all of code's shifts (aside from << 0) are UB or potential UB.
To do this without casting (something I try to avoid), consider:
// uint64_t result = (hword0 << 32) + (byte3 << 24) + (byte2 << 16) + (byte1 << 8) + byte0
// Let an optimizing compiler do its job
uint64_t result = hword0;
result <<= 8;
result += byte3;
result <<= 8;
result += byte2;
result <<= 8;
result += byte1;
result <<= 8;
result += byte0;
Or
uint64_t result = (1ull*hword0 << 32) + (1ul*byte3 << 24) + (1ul*byte2 << 16) +
(1u*byte1 << 8) + byte0;

C++ equivalent of 'pack' in Perl

How do I write C++ code that does what the pack -N option does in Perl?
I want to convert an integer variable to some binary form such that the unpack -N option on it gives back the integer variable.
My integer variable name is timestamp.
I found that it is related to htonl, but htonl(timestamp) still does not give me the binary form.

I wrote a library, libpack, similar to Perl's pack function. It's a C library so it would be quite usable from C++ as well:
FILE *f;
fpack(f, "u32> u32>", value_a, value_b);
u32> specifies an unsigned 32-bit integer in big-endian format, i.e. the equivalent of Perl's N format for pack().
http://www.leonerd.org.uk/code/libpack/

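Unpacking with N takes 4 bytes and forms a 32-bit int as follows (each byte is cast to uint32_t first, so that a byte of 0x80 or more is not shifted into the sign bit of a promoted int):
uint32_t n;
n = (uint32_t)buf[0] << 24
| (uint32_t)buf[1] << 16
| (uint32_t)buf[2] << 8
| (uint32_t)buf[3] << 0;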
For example,
uint32_t n;
unsigned char buf[4];
size_t bytes_read = fread(buf, 1, 4, stream);
if (bytes_read < 4) {
if (ferror(stream)) {
// Error
// ...
}
else if (feof(stream)) {
// Premature EOF
// ...
}
}
else {
n = (uint32_t)buf[0] << 24
| (uint32_t)buf[1] << 16
| (uint32_t)buf[2] << 8
| (uint32_t)buf[3] << 0;
}
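For the direction the question actually asks about (packing): htonl(timestamp) returns a uint32_t whose bytes in memory are in network order, so to get the binary form you still have to write those 4 bytes out rather than print the number. A sketch that produces the same 4 bytes directly with shifts; pack_be32 and write_be32 are hypothetical names:

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical helper: the inverse of the unpacking code above. Writes
// value into 4 bytes, most significant byte first, which is the layout
// Perl's pack("N", ...) produces.
void pack_be32(std::uint32_t value, unsigned char out[4])
{
    out[0] = static_cast<unsigned char>(value >> 24);
    out[1] = static_cast<unsigned char>(value >> 16);
    out[2] = static_cast<unsigned char>(value >> 8);
    out[3] = static_cast<unsigned char>(value);
}

// Writes the 4 packed bytes to a stream; returns true on success.
bool write_be32(std::FILE* stream, std::uint32_t value)
{
    unsigned char buf[4];
    pack_be32(value, buf);
    return std::fwrite(buf, 1, sizeof buf, stream) == sizeof buf;
}
```

Reading the result back with the unpacking code above (or with unpack "N" in Perl) should give the original timestamp.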

c++ 64 bit network to host translation

I know there are answers for this question using gcc's byteswap and other alternatives on the web, but I was wondering why my code below isn't working.
Firstly, I get gcc warnings (which I feel shouldn't appear). The reason I don't want to use byteswap is that I would need to determine whether my machine is big-endian or little-endian and use byteswap accordingly, i.e. if my machine is big-endian I could memcpy the bytes as-is without any translation, otherwise I need to swap them and copy them.
static inline uint64_t ntohl_64(uint64_t val)
{
unsigned char *pp =(unsigned char *)&val;
uint64_t val2 = ( pp[0] << 56 | pp[1] << 48
| pp[2] << 40 | pp[3] << 32
| pp[4] << 24 | pp[5] << 16
| pp[6] << 8 | pp[7]);
return val2;
}
int main()
{
int64_t a=0xFFFF0000;
int64_t b=__const__byteswap64(a);
int64_t c=ntohl_64(a);
printf("\n %lld[%x] [%lld] [%lld]\n ", a, a, b, c);
}
Warnings:-
In function ‘uint64_t ntohl_64(uint64_t)’:
warning: left shift count >= width of type
warning: left shift count >= width of type
warning: left shift count >= width of type
warning: left shift count >= width of type
Output:-
4294901760[00000000ffff0000] 281470681743360[0000ffff00000000] 65535[000000000000ffff]
I am running this on a little-endian machine, so byteswap and ntohl_64 should give exactly the same values, but unfortunately I get completely unexpected results. It would be great if someone could point out what's wrong.

The reason your code does not work is that you're shifting unsigned chars. They are promoted to int before the shift, so the shifts by 32, 40, 48 and 56 are by at least the width of a 32-bit int (hence exactly four warnings) and are undefined; the bits can never land in the upper half of a 64-bit result. In practice such shifts may seem to return 0, though some implementations produce odd results because of the way machine shift instructions work (x86 is an example). You have to cast them to the final size first, like:
((uint64_t)pp[0]) << 56
The best solution with gcc would be to use htobe64 (or be64toh for the network-to-host direction), declared in <endian.h> on glibc. It does everything for you.
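Regarding the concern about detecting endianness: with GCC and Clang the host byte order is available at compile time through the predefined __BYTE_ORDER__ macro, so the choice between swapping and not swapping costs nothing at run time. A sketch under that assumption (compiler-specific, not standard C++); ntoh64 and read_net64 are hypothetical names:

```cpp
#include <cstdint>
#include <cstring>

// Sketch for GCC/Clang: __BYTE_ORDER__ tells us the host byte order at
// compile time, so no runtime endianness check is needed.
inline std::uint64_t ntoh64(std::uint64_t net)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return __builtin_bswap64(net);  // little-endian host: reverse the 8 bytes
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return net;                     // big-endian host: already in host order
#else
#error "unknown byte order; fall back to a shift-based version"
#endif
}

// Reads 8 bytes stored in network order from a buffer.
inline std::uint64_t read_net64(const unsigned char* p)
{
    std::uint64_t raw;
    std::memcpy(&raw, p, sizeof raw);
    return ntoh64(raw);
}
```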
P.S. It's a little bit off topic, but if you want to make the function portable across endianness you could do:
Edit based on Nova Denizen's comment:
static inline uint64_t htonl_64(uint64_t val)
{
union{
uint64_t retVal;
uint8_t bytes[8];
};
bytes[0] = (val & 0x00000000000000ff);
bytes[1] = (val & 0x000000000000ff00) >> 8;
bytes[2] = (val & 0x0000000000ff0000) >> 16;
bytes[3] = (val & 0x00000000ff000000) >> 24;
bytes[4] = (val & 0x000000ff00000000) >> 32;
bytes[5] = (val & 0x0000ff0000000000) >> 40;
bytes[6] = (val & 0x00ff000000000000) >> 48;
bytes[7] = (val & 0xff00000000000000) >> 56;
return retVal;
}
static inline uint64_t ntohl_64(uint64_t val)
{
union{
uint64_t inVal;
uint8_t bytes[8];
};
inVal = val;
return bytes[0] |
((uint64_t)bytes[1]) << 8 |
((uint64_t)bytes[2]) << 16 |
((uint64_t)bytes[3]) << 24 |
((uint64_t)bytes[4]) << 32 |
((uint64_t)bytes[5]) << 40 |
((uint64_t)bytes[6]) << 48 |
((uint64_t)bytes[7]) << 56;
}
Assuming the compiler doesn't do something to the uint64_t on its way back through the return, and assuming the user treats the result as an 8-byte value (and not an integer), that code should work on any system. With any luck, your compiler will be able to optimize out the whole expression if you're on a big-endian system and use some built-in byte-swapping technique if you're on a little-endian machine (and it's guaranteed to still work on any other kind of machine).

uint64_t val2 = ( pp[0] << 56 | pp[1] << 48
| pp[2] << 40 | pp[3] << 32
| pp[4] << 24 | pp[5] << 16
| pp[6] << 8 | pp[7]);
pp[0] is an unsigned char and 56 is an int; pp[0] is promoted to int before the shift, so pp[0] << 56 is performed in int (typically 32 bits), with an int result. This isn't what you want, because you want all these shifts to have type unsigned long long.
The way to fix this is to cast, like ((unsigned long long)pp[0]) << 56.

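Since pp[x] is 8 bits wide and only promoted to int, the expression pp[0] << 56 cannot produce the 64-bit value you want (the shift count exceeds the width of int, which is undefined). You need explicit masking on the original 64-bit value and then shifting:
uint64_t val2 = (( val & 0xff ) << 56 ) |
(( val & 0xff00 ) << 40 ) |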
...
In any case, just use compiler built-ins, they usually result in a single byte-swapping instruction.
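A complete version of the mask-and-shift approach, as a sketch (the name bswap64_portable is hypothetical). Each byte is isolated with a mask on the original 64-bit value and moved to its mirrored position:

```cpp
#include <cstdint>

// Portable 64-bit byte reversal. Compilers commonly recognise this pattern
// and emit a single byte-swap instruction, though that is not guaranteed.
constexpr std::uint64_t bswap64_portable(std::uint64_t v)
{
    return ((v & 0x00000000000000FFULL) << 56) |
           ((v & 0x000000000000FF00ULL) << 40) |
           ((v & 0x0000000000FF0000ULL) << 24) |
           ((v & 0x00000000FF000000ULL) << 8)  |
           ((v & 0x000000FF00000000ULL) >> 8)  |
           ((v & 0x0000FF0000000000ULL) >> 24) |
           ((v & 0x00FF000000000000ULL) >> 40) |
           ((v & 0xFF00000000000000ULL) >> 56);
}

static_assert(bswap64_portable(0x0102030405060708ULL) == 0x0807060504030201ULL, "");
// The question's input 0xFFFF0000 becomes 0x0000FFFF00000000, matching
// the byteswap output shown in the question.
static_assert(bswap64_portable(0xFFFF0000ULL) == 0x0000FFFF00000000ULL, "");
```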

Casting and shifting works as PlasmaHH suggested, but I don't know why 32-bit shifts upconvert automatically and 64-bit ones do not.
typedef uint64_t __u64;
static inline uint64_t ntohl_64(uint64_t val)
{
unsigned char *pp =(unsigned char *)&val;
return ((__u64)pp[0] << 56 |
(__u64)pp[1] << 48 |
(__u64)pp[2] << 40 |
(__u64)pp[3] << 32 |
(__u64)pp[4] << 24 |
(__u64)pp[5] << 16 |
(__u64)pp[6] << 8 |
(__u64)pp[7]);
}

Bitwise unpacking using signed data

I've been trying for a while to pack and unpack some chars into an integer. Although there are some topics related to this question, my problem is related to the signed shift. I don't get the 'trick' to unpack a signed value, i.e.:
char c1 = -119;
char c2 = 26;
// pack
int packed = (unsigned char)c1 | (c2 << 8);
// unpack
c1 = packed >> 0;
c2 = packed >> 8;
// printf(c1, c2) -> Unpacked data: -119 | 26
That works as expected, but when I try to pack more data, i.e.:
char c0 = -42;
char c1 = -119;
char c2 = 26;
// pack
int packed = (unsigned char)c0 | (unsigned char)(c1 << 8) | (c2 << 16);
// unpack
c0 = packed >> 0;
c1 = packed >> 8;
c2 = packed >> 16;
// printf -> Unpacked data: -42 | 0 | 26
The c1 value is lost. I guess it's related to the sign bit being shifted into the high-order position.
How can I get the c1 value back?
Thanks in advance.

You are casting c1 to unsigned char after shifting it out of the range of that type, so the result of the cast is zero. You should do the cast before shifting:
int packed = (unsigned char)c0 | ((unsigned char)c1 << 8) | (c2 << 16);

(unsigned char)(c1 << 8)
This will shift the wrong (sign-extended) value, and then trim the result to 8 bits (yielding 0).
You don't want any of that so you should use ((unsigned char)c1 << 8).

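On some platforms int is only 16 bits wide, so for this code to be portable, use int32_t and widen each byte to 32 bits before shifting (otherwise a uint8_t shifted by 16 is still a 16-bit int there). The correct way to accomplish this (if slightly paranoid) is:
int32_t packed = (int32_t)((uint32_t)(uint8_t)c0 | ((uint32_t)(uint8_t)c1 << 8) | ((uint32_t)(uint8_t)c2 << 16));
I also tend to list these in reverse order, so it is clearer which characters become the most and least significant bytes.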
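A full pack/unpack round trip for the question's three values, as a sketch. It uses int8_t rather than plain char because plain char is unsigned on some platforms (e.g. ARM Linux), where c1 = -119 would already be stored as 137. The names pack3 and unpack_byte are hypothetical:

```cpp
#include <cstdint>

// Sketch: pack three signed 8-bit values into the low 24 bits of a uint32_t.
// Each value is first reinterpreted as uint8_t (0..255) and widened to
// uint32_t, so no sign extension leaks into the higher bytes.
inline std::uint32_t pack3(std::int8_t c0, std::int8_t c1, std::int8_t c2)
{
    return static_cast<std::uint32_t>(static_cast<std::uint8_t>(c0)) |
           static_cast<std::uint32_t>(static_cast<std::uint8_t>(c1)) << 8 |
           static_cast<std::uint32_t>(static_cast<std::uint8_t>(c2)) << 16;
}

// Unpacks byte n (0, 1 or 2) back to a signed value. Converting 128..255 to
// int8_t wraps modulo 256; that is guaranteed since C++20 and is
// implementation-defined before (GCC and Clang wrap).
inline std::int8_t unpack_byte(std::uint32_t packed, int n)
{
    return static_cast<std::int8_t>(static_cast<std::uint8_t>(packed >> (8 * n)));
}
```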
