AND with full bits? - c++

I have been reading over some code lately and came across some lines such as:
somevar &= 0xFFFFFFFF;
What is the point of ANDing with a mask that has all bits turned on; doesn't it just equal somevar in the end?

"somevar" could be a 64-bit variable, this code would therefore extract the bottom 32 bits.
edit: if it's a 32-bit variable, I can think of other reasons but they are much more obscure:
the constant 0xFFFFFFFF came from automatically generated code
someone is trying to trick the compiler into preventing something from being optimized
someone intentionally wants a no-op line to be able to set a breakpoint there during debugging.

Indeed, this wouldn't make sense if somevar is of type int (32-bit integer). If it is of type long (64-bit integer), however, then this would mask the upper (most significant) half of the value.
Note that a long value is not guaranteed to be 64 bits, but it typically is on a 64-bit (LP64) system.
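As a minimal sketch of that masking effect (the variable name and test value here are illustrative, using a fixed-width std::uint64_t rather than long to keep the example portable):

#include <cstdint>
#include <cstdio>

int main() {
    std::uint64_t somevar = 0x1122334455667788ULL; // illustrative 64-bit value

    somevar &= 0xFFFFFFFF; // keeps only the low 32 bits

    // prints 0000000055667788: the upper half has been cleared
    std::printf("%016llx\n", static_cast<unsigned long long>(somevar));
}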

I suppose it depends on the length of somevar. This would, of course, not be a no-op if somevar were a 64-bit int. Or if somevar is of some type with an overloaded operator&=.

Yes, most likely it is there to truncate to the bottom 32 bits in a 64-bit environment.

If the code fragment was C++, then the &= operator might have been overloaded so as to have an effect not particularly related to bitwise AND. Granted, that would be a nasty, evil, dirty thing to do...

Sometimes the size of long is 32 bits, sometimes it's 64 bits, and sometimes it's something else. Maybe the code is trying to compensate for that: either just use the value (if it's 32 bits) or mask out the rest and only use the lower 32 bits of it. Of course, this wouldn't really make sense, because if that were desired, it would have been easier to just use an int.

Related

Difference and nuances of std::uint8_t(-1) vs std::uint8_t(0xffu)

I have seen the use of std::uint8_t(-1), for example here:
https://en.cppreference.com/w/cpp/language/fold
The cppreference page illustrates an endianness swap, and I'm wondering what the difference is compared to std::uint8_t(0xffu).
On x86 there doesn't seem any difference:
https://godbolt.org/z/Kb7v8K1nT
I could be reading too much into it, and it may just be a convention of whoever wrote the code, with no deeper meaning. However, I suspect it is about portability of the code to some esoteric architectures where CHAR_BIT != 8.
But then I was wondering: in the case of a byte-order swap, which needs to work on 8-bit units, I would expect std::uint8_t(0xffu), forcing 8-bit calculations even when CHAR_BIT != 8, to produce more portable code, since the result would not be expected to change between platforms. For example, when I'm producing TCP/IP packets and need a specific endianness (and possibly need to swap some values), the values need to be the same no matter what underlying architecture is used.
Maybe for a nibble or char swap (where we expect the size of the type to change and the mechanism to adjust) std::uint8_t(-1) would be better?
In essence, with std::uint8_t(-1) we are saying "set all bits high, no matter how many there are" (even more if CHAR_BIT > 8), while with std::uint8_t(0xffu) we want exactly 8 bits set (and would get fewer if CHAR_BIT < 8)?
Or is there something I'm completely missing?
It's a shortcut to get the maximum value of an unsigned type without having to care how wide it is. All unsigned types behave modulo 2^n, so unsigned_type(-1) is the same as std::numeric_limits<unsigned_type>::max().
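A small compile-time check of that equivalence (a sketch; the types chosen here are just examples):

#include <cstdint>
#include <limits>

static_assert(std::uint8_t(-1) == std::numeric_limits<std::uint8_t>::max(),
              "all bits set, whatever the width");
static_assert(std::uint32_t(-1) == std::numeric_limits<std::uint32_t>::max(),
              "the same shortcut works for wider unsigned types");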

Is static_cast on bounded types implementation-dependent?

I'm looking at static_cast with bounded types.
Is the behavior implementation-specific? In other words (given 16-bit shorts and 32-bit longs) is
long x = 70000;
short y = static_cast<short>(x);
guaranteed to produce y = 4464 (the low-order 16 bits of x)? Or only on a little-endian machine?
I have always assumed it would but I am getting odd results on a big-endian machine and trying to figure them out.
Here's the actual problem. I have two time_t's (presumably 64 bits) that I "know" will always be within some reasonable number of seconds of each other. I want to display that difference with printf. The code is multi-platform, so rather than worry about what the underlying type of time_t is, I am doing a printf("%d") passing static_cast<int>(time2-time1). I'm seeing a zero, despite the fact that the printf is in a block conditioned on (time2 != time1). (The printf is in a library; no reasonable possibility of using cout instead.)
Is static_cast possibly returning the high 32 bits of time_t?
Is there a better way to do this?
Thanks,
I think perhaps the problem was unrelated to the static_cast. #ifdef platform confusion. I'd still be interested if someone definitively knows the answer.
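For what it's worth, here is a sketch of two ways to print the difference without narrowing at all (show_elapsed is a made-up name; this assumes time_t is an arithmetic type, as it is on common platforms):

#include <cstdio>
#include <ctime>

void show_elapsed(std::time_t time1, std::time_t time2) {
    // difftime is defined for any arithmetic time_t and returns seconds as a double
    std::printf("%.0f seconds\n", std::difftime(time2, time1));

    // or widen instead of narrowing: long long is at least 64 bits
    std::printf("%lld seconds\n",
                static_cast<long long>(time2) - static_cast<long long>(time1));
}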

Can ~3 safely be widened automatically?

While answering another question, I ended up trying to justify casting the operand of the ~ operator, but I was unable to come up with a scenario where not casting it would yield wrong results.
I am asking this clarification question in order to be able to clean up that other question, removing the red herrings and keeping only the most relevant information intact.
The problem in question is that we want to clear the two lowermost bits of a variable:
offset = offset & ~3;
This looks dangerous, because ~3 will be an int no matter what offset is, so we might end up masking the bits that do not fit into int's width. For example if int is 32 bits wide and offset is of a 64 bit wide type, one could imagine that this operation would lose the 32 most significant bits of offset.
However, in practice this danger does not seem to manifest itself. Instead, the result of ~3 is sign-extended to fill the width of offset, even when offset is unsigned.
Is this behavior mandated by the standard? I am asking because it seems that this behavior could rely on specific implementation and/or hardware details, but I want to be able to recommend code that is correct according to the language standard.
I can make the operation produce an undesired result if I try to remove the 32nd least significant bit. This is because the result of ~(1 << 31) will be positive in a 32-bit signed integer in two's complement representation (and indeed in a one's complement representation), so sign-extending the result will leave all the higher bits unset.
offset = offset & ~(1 << 31); // BZZT! Fragile!
In this case, if int is 32 bits wide and offset is of a wider type, this operation will clear all the high bits.
However, the proposed solution in the other question does not seem to resolve this problem!
offset = offset & ~static_cast<decltype(offset)>(1 << 31); // BZZT! Fragile!
It seems that 1 << 31 will be sign-extended before the cast, so regardless of whether decltype(offset) is signed or unsigned, the result of this cast will have all the higher bits set, such that the operation again will clear all those bits.
In order to fix this, I need to make the number unsigned before widening, either by making the integer literal unsigned (1u << 31 seems to work) or casting it to unsigned int:
offset = offset &
~static_cast<decltype(offset)>(
static_cast<unsigned int>(
1 << 31
)
);
// Now it finally looks like C++!
This change makes the original danger relevant. When the bitmask is unsigned, the inverted bitmask will be widened by setting all the higher bits to zero, so it is important to have the correct width before inverting.
This leads me to conclude that there are two ways to recommend clearing some bits:
1: offset = offset & ~3;
Advantages: Short, easily readable code.
Disadvantages: None that I know of. But is the behavior guaranteed by the standard?
2: offset = offset & ~static_cast<decltype(offset)>(3u);
Advantages: I understand how all elements of this code works, and I am fairly confident that its behavior is guaranteed by the standard.
Disadvantages: It doesn't exactly roll off the tongue.
Can you guys help me clarify if the behavior of option 1 is guaranteed or if I have to resort to recommending option 2?
It is not valid in sign-magnitude representation. In that representation with 32-bit ints, ~3 is -0x7FFFFFFC. When this is widened to 64-bit (signed) the value is retained, -0x7FFFFFFC. So we would not say that sign-extension happens in that system; and you will incorrectly mask off all the bits 32 and higher.
In two's complement, I think offset &= ~3 always works. ~3 is -4, so whether or not the 64-bit type is signed, you still get a mask with only the bottom 2 bits unset.
However, personally I'd try to avoid writing it, as then when checking over my code for bugs later I'd have to go through all this discussion again! (and what hope does a more casual coder have of understanding the intricacies here). I only do bitwise operations on unsigned types, to avoid all of this.
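Here is a short sketch of both patterns under the assumption that offset is a std::uint64_t (the concrete values are only there to make the effect visible):

#include <cassert>
#include <cstdint>

int main() {
    std::uint64_t offset = 0xFFFFFFFFFFFFFFFFULL; // every bit set, for illustration

    // Option 2: widen an unsigned literal first, then invert at full width.
    offset = offset & ~static_cast<std::uint64_t>(3u);
    assert(offset == 0xFFFFFFFFFFFFFFFCULL); // only the two lowest bits are cleared

    // The high-bit case: keep the operand unsigned so no sign extension can occur.
    offset = offset & ~(std::uint64_t{1} << 31);
    assert(offset == 0xFFFFFFFF7FFFFFFCULL); // only bit 31 was cleared
}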

Bit order in C/C++

I have to implement a protocol which defines data in 8-bit words that start with the least significant bit (LSB) first. I want to represent this data with unsigned char, but I don't know what the bit order of the LSB and the most significant bit (MSB) is in C/C++, which could possibly require swapping the bits.
Can anybody explain to me how to find out whether an unsigned char is encoded MSB-LSB or LSB-MSB?
Example:
unsigned char b = 1;
MSB-LSB: 0000 0001
LSB-MSB: 1000 0000
Endian-ness is platform dependent. Anyway, you don't have to worry about actual bit order unless you are serializing the bytes, which you may be trying to do. In which case, you still don't need to worry about how individual bytes are stored while they're on the machine, since you will have to dig the bits out individually anyway. Fortunately, if you bitwise AND with 1, you get the LSB, regardless of storage order; bit-AND with 2 and you get the next most significant bit, and so on. The compiler will sort out what constants to generate in the machine code, so that level of detail is abstracted away.
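A sketch of that idea when serializing (bits_lsb_first is a name made up for illustration):

#include <vector>

// Emits the bits of one byte starting with the least significant bit,
// regardless of how the machine happens to store the byte internally.
std::vector<int> bits_lsb_first(unsigned char b) {
    std::vector<int> bits;
    for (int i = 0; i < 8; ++i)
        bits.push_back((b >> i) & 1); // bit 0 first, then bit 1, and so on
    return bits;
}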
There is no such thing in C/C++. The least significant bit is -- well -- the least significant bit. Since the bits don't have addresses, there is no other ordering.

Getting 32 bit words out of 64-bit values in C/C++ and not worrying about endianness

It's my understanding that in C/C++ bitwise operators are supposed to be endian independent and behave the way you expect. I want to make sure that I'm truly getting the most significant and least significant words out of a 64-bit value and not worry about endianness of the machine. Here's an example:
uint64_t temp;
uint32_t msw, lsw;
msw = (temp & 0xFFFFFFFF00000000) >> 32;
lsw = temp & 0x00000000FFFFFFFF;
Will this work?
6.5.7 Bitwise shift operators
4. The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
So, yes: guaranteed by the standard.
It will work, but the strange propensity of some authors for doing bit-masking before bit-shifting has always puzzled me.
In my opinion, a much more elegant approach would be the one that does the shift first
msw = (temp >> 32) & 0xFFFFFFFF;
lsw = temp & 0xFFFFFFFF;
at least because it uses the same "magic" bit-mask constant every time.
Now, if your target type is unsigned and already has the desired bit-width, masking becomes completely unnecessary
msw = temp >> 32;
lsw = temp;
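A quick self-contained check of the shift-first form (main, the test value, and the explicit casts are mine; the casts only silence narrowing warnings):

#include <cinttypes>
#include <cstdint>
#include <cstdio>

int main() {
    std::uint64_t temp = 0x0123456789ABCDEFULL; // illustrative test value

    std::uint32_t msw = static_cast<std::uint32_t>(temp >> 32); // 0x01234567
    std::uint32_t lsw = static_cast<std::uint32_t>(temp);       // 0x89ABCDEF

    std::printf("msw=%08" PRIx32 " lsw=%08" PRIx32 "\n", msw, lsw);
}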
Yes, that should work. When you're retrieving the msw, your mask isn't really accomplishing much though -- the bits you mask to zero will be discarded when you do the shift anyway. Personally, I'd probably use something like this:
uint32_t lsw = -1, msw = -1;
lsw &= temp;
msw &= temp >> 32;
Of course, to produce a meaningful result, temp has to be initialized, which it wasn't in your code.
Yes.
It should work.
Just a thought I would like to share: perhaps you could get around the endianness of a value by using the functions or macros found in <arpa/inet.h> to convert between network and host order. They are admittedly more commonly used in conjunction with sockets, but they could be used in this instance to guarantee that a value such as 0xABCD from another processor is still 0xABCD on an Intel x86, instead of resorting to hand-coded custom functions to deal with the endian architecture.
Edit: Here's an article about endianness on CodeProject in which the author developed macros to deal with 64-bit values.
Hope this helps,
Best regards,
Tom.
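A rough sketch of that suggestion, splitting the 64-bit value into two 32-bit halves and converting each with htonl (to_network_bytes is a made-up helper; <arpa/inet.h> is POSIX, not standard C++):

#include <arpa/inet.h> // htonl: POSIX, not part of standard C++
#include <cstdint>
#include <cstring>

// Writes value into out[0..7] in big-endian (network) byte order.
void to_network_bytes(std::uint64_t value, unsigned char out[8]) {
    std::uint32_t hi = htonl(static_cast<std::uint32_t>(value >> 32));
    std::uint32_t lo = htonl(static_cast<std::uint32_t>(value));
    std::memcpy(out, &hi, 4);     // most significant word first
    std::memcpy(out + 4, &lo, 4);
}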
Endianness is about memory layout. Shifting is about bits (and bit layout). Word significance is about bit layout, not memory layout. So endianness has nothing to do with word significance.
I think what you are saying is quite true, but where does this get you?
If you have some literal values hanging around, then you know which end is which. But if you find yourself with values that have come from outside the program, then you can't be sure, unless they have been encoded in some way.
In addition to the other responses, I shall add that you should not worry about endianness in C. Endianness trouble comes only from looking at some bytes under a different type than the one used to write those bytes in the first place. When you do that, you are very close to having aliasing issues, which means that your code may break when using another compiler or another optimization flag.
As long as you do not try to do such trans-type accesses, your code should be endian-neutral, and run flawlessly on both little-endian and big-endian architectures. Or, in other words, if you have endianness issues, then other kinds of bigger trouble are also lurking nearby.