query on bitwise ANDing after bitwise ORing - bit-manipulation

I came across a code snippet where AND-ing a value with all 1's is made after OR-ing the value with a number. For example:
value |= 0x100;
value &= 0xFFFF;
Why is the AND-ing required? I believe the value remains the same after the OR-ing whether or not we do the AND-ing, but I am trying to understand the intention of AND-ing with all 1's afterwards.

The ANDing is probably copy-pasted from some other code where it made a difference. The intention is to keep the result to 16 bits; it's like an assembly programmer's typecast to 16 bits. Very useful if the result of the operation becomes negative.
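A minimal sketch of that point (assuming value lives in a type wider than 16 bits, such as a plain int); the mask is what brings a negative intermediate result back down to a clean 16-bit quantity:

#include <cstdio>

int main() {
    int value = -1;    // e.g. a previous calculation went negative: all bits set
    value |= 0x100;    // set bit 8; the value is still negative
    value &= 0xFFFF;   // the "typecast" to 16 bits: all higher bits are cleared
    std::printf("0x%X\n", static_cast<unsigned>(value));   // prints 0xFFFF
    return 0;
}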

Related

crc32 hash default/invalid value?

I am building a simple string ID system using crc32 to generate 32 bit integer handles from my strings. I'd like to default the hash inside my StringID wrapper class to an invalid index by default, is there a value that crc32 will never generate? Will I have to use a separate flag?
Clarification: I am not interested in language specific answers. I'd simply like to know if there is an integer outside of the crc32 range that can be used to represent an unhashed value.
Thanks!
Is there a value that crc32 will never generate?
No, it will generate any/all values in the range of a 32-bit integer.
Will I have to use a separate flag?
Not necessarily.
If you decide that (e.g.) 0x00000000 means "CRC not set" and non-zero is the CRC value; then after calculating the CRC (but before storing it or checking the stored value) you can do if(CRCvalue == 0) CRCvalue = 0xFFFFFFFF;.
This weakens the CRC by an extremely tiny amount. Specifically, for 2 random pieces of data, for pure CRC32 there's 1 chance in 4294967296 of the CRCs matching, and with "zero means unset" there's 1 chance in 4294967295.000000000232830643654 of the CRCs matching.
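A sketch of how that remapping could sit inside the StringID wrapper from the question (compute_crc32 is a made-up stand-in for whatever CRC-32 routine is in use, not a specific library API):

#include <cstdint>
#include <string>

std::uint32_t compute_crc32(const std::string& s);   // hypothetical hash routine

class StringID {
public:
    static constexpr std::uint32_t kUnset = 0;        // sentinel: "no hash stored"

    StringID() = default;                             // defaults to the invalid value
    explicit StringID(const std::string& s) {
        std::uint32_t crc = compute_crc32(s);
        hash_ = (crc == kUnset) ? 0xFFFFFFFFu : crc;  // remap the one colliding value
    }

    bool valid() const { return hash_ != kUnset; }

private:
    std::uint32_t hash_ = kUnset;
};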
There is an easy demonstration of the fact that you can generate any crc32 value. Since the CRC is the division mod P (where P is the generator polynomial) in a Galois field (which happens to be a field, just as the real or complex numbers are), you can subtract from your polynomial its remainder modulo P (subtraction is a XOR operation, so adding and subtracting are indeed the same thing), giving a remainder of 0. Then you can add to this multiple of the modulus any of the possible crc32 values (as they are already remainders of divisions, their crc32 is just themselves) to get any of the 2^32 possible values.
It is common practice to add as many zero bits as necessary to complete a full 32-bit word (this appears as a multiplication by a constant value x^32), and then subtract (XOR) the remainder from that, making the result a multiple of the modulus (remember that addition and subtraction are the same thing, a XOR operation) and so making crc32(pol) = 0x0000.
Edit (easier to see):
Indeed, each of the possible 2^32 values for crc32, when divided by the generator polynomial, gives itself as the remainder (each has a lower degree than the generator polynomial, just as the numbers 0 .. N-1 are their own remainders when doing arithmetic modulo N on integers), so they are all possible results of the crc32() operator.
The crc operation, as implemented in many places, is not that simple... some implementations initialize the remainder register to 0xffffffff and look for 0xffffffff at termination (indeed, crc32 does this). If you do the maths, you'll see the reason for that: initializing the register to 0xffffffff is equivalent to having a previous remainder of 0xffffffff in a longer string, and looking for 0xffffffff at the end is like appending 0xffffffff to the original string. This has the effect of concatenating the bit string 0xffffffff before and after your string, making the remainder sensitive to the string of bits being altered by appending zeros at either side of the crc32-calculated string. Anyway, this modification doesn't alter the original algorithm of calculating a polynomial remainder, so any of the 2^32 values are possible in this case as well.
No. A CRC-32 can be any 32-bit value. You'll need to indicate an invalid index somewhere else.
My spoof code allows you to choose bit locations in a message to modify and the desired CRC, and will solve for which of those locations to flip to get exactly that CRC.

How are Overflow situations dealt with? [duplicate]

This question already has answers here:
Why is unsigned integer overflow defined behavior but signed integer overflow isn't?
(6 answers)
Closed 7 years ago.
I just simply wanted to know: who is responsible for dealing with mathematical overflow cases in a computer?
For example, in the following C++ code:
short x = 32768;
std::cout << x;
Compiling and running this code on my machine gave me a result of -32767
A "short" variable's size is 2 bytes .. and we know 2 bytes can hold a maximum decimal value of 32767 (if signed) .. so when I assigned 32768 to x .. after exceeding its max value 32767 .. It started counting from -32767 all over again to 32767 and so on ..
What exactly happened so the value -32767 was given in this case ?
ie. what are the binary calculations done in the background the resulted in this value ?
So, who decided that this happens? I mean, who is responsible for deciding that when a mathematical overflow happens in my program, the value of the variable simply starts again from its min value, or an exception is thrown, or the program simply freezes, etc.?
Is it the language standard, the compiler, my OS, my CPU, or who is it ?
And how does it deal with that overflow situation? (A simple explanation or a link explaining it in detail would be appreciated :) )
Also, who decides what the size of a 'short int', for example, would be on my machine? Is that also the language standard, the compiler, the OS, the CPU, etc.?
Thanks in advance! :)
Edit:
Ok so I understood from here : Why is unsigned integer overflow defined behavior but signed integer overflow isn't?
that it's the processor that defines what happens in an overflow situation (for example, on my machine it started from -32767 all over again), depending on the processor's representation for signed values, i.e. whether it is sign-magnitude, one's complement or two's complement...
Is that right?
And in my case (when the result appeared to start again from the min value -32767), how do you suppose my CPU is representing the signed values, and how did the value -32767 come up? (Again, the binary calculations that lead to this, please :) )
It doesn't start at its min value per se. It just truncates its value, so for a 4-bit number, you can count up to 1111 (binary, = 15 decimal). If you increment by one, you get 10000, but there is no room for that, so the first digit is dropped and 0000 remains. If you calculated 1111 + 0010, you'd get 0001 (decimal 1).
You can add them up as you would on paper:
1111
0010
---- +
10001
But instead of adding up the entire number, the processor will just add up until it reaches (in this case) 4 bits. After that, there is no more room to add up any more, but if there is still 1 to 'carry', it sets the overflow register, so you can check whether the last addition it did overflowed.
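The same 4-bit example can be emulated in C++ with a mask (just a sketch; real hardware keeps the extra bit in a carry flag rather than making you mask it off yourself):

#include <cstdio>

int main() {
    unsigned a = 0b1111;             // 15, the largest 4-bit value
    unsigned b = 0b0010;             //  2
    unsigned sum = a + b;            // 0b10001 = 17 in a wider register
    unsigned truncated = sum & 0xF;  // keep only 4 bits: 0b0001 = 1
    bool carry = (sum >> 4) != 0;    // the bit that "fell off" = the carry
    std::printf("truncated=%u carry=%d\n", truncated, carry);  // truncated=1 carry=1
    return 0;
}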
Processors have basic instructions to add up numbers, and they have those for smaller and larger values. A 64 bit processor can add up 64 bit numbers (actually, usually they don't add up two numbers, but actually add a second number to the first number, modifying the first, but that's not really important for the story).
But apart from 64 bits, they often can also add up 32, 16 and 8 bit numbers. That's partly because it can be efficient to add up only 8 bits if you don't need more, but also sometimes to be backwards compatible with older programs for a previous version of a processor which could add up to 32 bits but not 64 bits.
Such a program uses an instruction to add up 32 bits numbers, and the same instruction must also exist on the 64 bit processor, with the same behavior if there is an overflow, otherwise the program wouldn't be able to run properly on the newer processor.
Apart from adding up using the core instructions of the processor, you could also add up in software. You could make an inc function that treats a big chunk of bits as a single value. To increment it, you let the processor increment the first 64 bits. The result is stored in the first part of your chunk. If the overflow flag is set in the processor, you take the next 64 bits and increment those too. This way, you can go beyond the processor's limits and handle larger numbers in software.
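A sketch of that idea (the names and the word size are just for illustration): a counter made of several 64-bit words, least significant word first, where a word wrapping to zero after the increment plays the role of the processor's carry flag:

#include <cstdint>
#include <cstddef>

void inc(std::uint64_t words[], std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        if (++words[i] != 0)   // no wrap-around, so no carry into the next word
            return;
    }
    // if every word wrapped, the whole counter wrapped back to zero
}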
And same goes for the way an overflow is handled. The processor just sets the flag. Your application can decide whether to act on it or not. If you want to have a counter that just increments to 65535 and then wraps to 0, you (your program) don't need to do anything with the flag.

Can ~3 safely be widened automatically?

While answering another question, I ended up trying to justify casting the operand to the ~ operator, but I was unable to come up with a scenario where not casting it would yield wrong results.
I am asking this clarification question in order to be able to clean up that other question, removing the red herrings and keeping only the most relevant information intact.
The problem in question is that we want to clear the two lowermost bits of a variable:
offset = offset & ~3;
This looks dangerous, because ~3 will be an int no matter what offset is, so we might end up masking the bits that do not fit into int's width. For example if int is 32 bits wide and offset is of a 64 bit wide type, one could imagine that this operation would lose the 32 most significant bits of offset.
However, in practice this danger does not seem to manifest itself. Instead, the result of ~3 is sign-extended to fill the width of offset, even when offset is unsigned.
Is this behavior mandated by the standard? I am asking because it seems that this behavior could rely on specific implementation and/or hardware details, but I want to be able to recommend code that is correct according to the language standard.
I can make the operation produce an undesired result if I try to remove the 32nd least significant bit. This is because the result of ~(1 << 31) will be positive in a 32-bit signed integer in two's complement representation (and indeed in a one's complement representation), so sign-extending the result will leave all the higher bits unset.
offset = offset & ~(1 << 31); // BZZT! Fragile!
In this case, if int is 32 bits wide and offset is of a wider type, this operation will clear all the high bits.
However, the proposed solution in the other question does not seem to resolve this problem!
offset = offset & ~static_cast<decltype(offset)>(1 << 31); // BZZT! Fragile!
It seems that 1 << 31 will be sign-extended before the cast, so regardless of whether decltype(offset) is signed or unsigned, the result of this cast will have all the higher bits set, such that the operation again will clear all those bits.
In order to fix this, I need to make the number unsigned before widening, either by making the integer literal unsigned (1u << 31 seems to work) or casting it to unsigned int:
offset = offset &
         ~static_cast<decltype(offset)>(
             static_cast<unsigned int>(
                 1 << 31
             )
         );
// Now it finally looks like C++!
This change makes the original danger relevant. When the bitmask is unsigned, the inverted bitmask will be widened by setting all the higher bits to zero, so it is important to have the correct width before inverting.
This leads me to conclude that there are two ways to recommend clearing some bits:
1: offset = offset & ~3;
Advantages: Short, easily readable code.
Disadvantages: None that I know of. But is the behavior guaranteed by the standard?
2: offset = offset & ~static_cast<decltype(offset)>(3u);
Advantages: I understand how all elements of this code works, and I am fairly confident that its behavior is guaranteed by the standard.
Disadvantages: It doesn't exactly roll off the tongue.
Can you guys help me clarify if the behavior of option 1 is guaranteed or if I have to resort to recommending option 2?
It is not valid in sign-magnitude representation. In that representation with 32-bit ints, ~3 is -0x7FFFFFFC. When this is widened to 64-bit (signed) the value is retained, -0x7FFFFFFC. So we would not say that sign-extension happens in that system; and you will incorrectly mask off all the bits 32 and higher.
In two's complement, I think offset &= ~3 always works. ~3 is -4, so whether or not the 64-bit type is signed, you still get a mask with only the bottom 2 bits unset.
However, personally I'd try to avoid writing it, as then when checking over my code for bugs later I'd have to go through all this discussion again! (and what hope does a more casual coder have of understanding the intricacies here). I only do bitwise operations on unsigned types, to avoid all of this.
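One way to package option 2 so it is harder to get wrong, following that advice to stick to unsigned types (a sketch; clear_bits is just an illustrative name, not an established idiom):

#include <cstdint>
#include <type_traits>

template <typename T>
void clear_bits(T& value, T mask) {
    static_assert(std::is_unsigned<T>::value, "do bit masking on unsigned types");
    value &= static_cast<T>(~mask);   // the cast undoes any promotion of ~mask to int
}

int main() {
    std::uint64_t offset = 0xFFFFFFFFFFFFFFFFull;
    clear_bits(offset, std::uint64_t{3});   // clears only the two lowest bits
    // offset is now 0xFFFFFFFFFFFFFFFC
    return 0;
}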

AND with full bits?

I have been reading over some code lately and came across some lines such as:
somevar &= 0xFFFFFFFF;
What is the point of anding something that has all bits turned on; doesn't it just equal somevar in the end?
"somevar" could be a 64-bit variable, this code would therefore extract the bottom 32 bits.
edit: if it's a 32-bit variable, I can think of other reasons, but they are much more obscure:
the constant 0xFFFFFFFF was automatically generated
someone is trying to trick the compiler into preventing something from being optimized
someone intentionally wants a no-op line to be able to set a breakpoint there during debugging.
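For the 64-bit case, a minimal illustration (types and values are just for demonstration):

#include <cstdint>
#include <cstdio>

int main() {
    std::uint64_t somevar = 0x123456789ABCDEF0ull;
    somevar &= 0xFFFFFFFF;   // keep only the bottom 32 bits
    std::printf("0x%llX\n", static_cast<unsigned long long>(somevar));   // prints 0x9ABCDEF0
    return 0;
}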
Indeed, this wouldn't make sense if somevar is of type int (32-bit integer). If it is of type long (64-bit integer), however, then this would mask the upper (most significant) half of the value.
Note that a long is not guaranteed to be 64 bits; it is typically 32 bits on 32-bit platforms and 64 bits on most 64-bit platforms (64-bit Windows being a notable exception).
I suppose it depends on the length of somevar. This would, of course, not be a no-op if somevar were a 64-bit int. Or if somevar is some type with an overloaded operator&=.
Yes, definitely to truncate to 32 bits in a 64-bit environment.
If the code fragment was C++, then the &= operator might have been overloaded so as to have an effect not particularly related to bitwise AND. Granted, that would be a nasty, evil, dirty thing to do...
sometimes the size of long is 32 bits, sometimes it's 64 bits, sometimes it's something else. Maybe the code is trying to compensate for that--and either just use the value (if it's 32 bits) or mask out the rest of the value and only use the lower 32 bits of it. Of course, this wouldn't really make sense, because if that were desired, it would have been easier to just use an int.

How can I set all bits to '1' in a binary number of an unknown size?

I'm trying to write a function in assembly (but lets assume language agnostic for the question).
How can I use bitwise operators to set all bits of a passed in number to 1?
I know that I can use the bitwise "or" with a mask with the bits I wish to set, but I don't know how to construct a mask based on a binary number of size N.
~(x & 0)
x & 0 will always result in 0, and ~ will flip all the bits to 1s.
Set it to 0, then flip all the bits to 1 with a bitwise-NOT.
You're going to find that in assembly language you have to know the size of a "passed in number". And in assembly language it really matters which machine the assembly language is for.
Given that information, you might be asking either
How do I set an integer register to all 1 bits?
or
How do I fill a region in memory with all 1 bits?
To fill a register with all 1 bits, on most machines the efficient way takes two instructions:
Clear the register, using either a special-purpose clear instruction, or load immediate 0, or xor the register with itself.
Take the bitwise complement of the register.
Filling memory with 1 bits then requires 1 or more store instructions...
You'll find a lot more bit-twiddling tips and tricks in Hank Warren's wonderful book Hacker's Delight.
Set it to -1. This is usually represented by all bits being 1.
Set x to 1
While x < number
    x = x * 2
Answer = number OR (x - 1)
The code assumes your input is called "number". It should work fine for positive values. Note that for negative values (which are two's complement), the operation makes no sense, as the high bit will always be one.
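A runnable version of that loop, as a sketch in C++ for unsigned inputs (set_all_bits is just an illustrative name):

#include <cstdio>

unsigned set_all_bits(unsigned number) {
    unsigned x = 1;
    while (x != 0 && x < number)   // the x != 0 check guards against x overflowing to 0
        x *= 2;
    return number | (x - 1);       // if x wrapped to 0, x - 1 is all ones
}

int main() {
    std::printf("%u\n", set_all_bits(5));    // 7  (binary 111)
    std::printf("%u\n", set_all_bits(12));   // 15 (binary 1111)
    return 0;
}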
Use T(~T(0)).
Where T is the typename (if we are talking about C++.)
This undoes the unwanted promotion to int that happens when the type is smaller than int.
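For example (a small sketch with a 16-bit unsigned type):

#include <cstdint>
#include <cstdio>

int main() {
    using T = std::uint16_t;
    T all_ones = T(~T(0));   // ~T(0) is promoted to int (-1); the outer T() narrows it back
    std::printf("0x%X\n", static_cast<unsigned>(all_ones));   // prints 0xFFFF
    return 0;
}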