Attempting to understand different CRC implementations

Taken from IEEE 802.3:
Mathematically, the CRC value corresponding to a given MAC frame is defined by the following procedure:
a) The first 32 bits of the frame are complemented.
b) The n bits of the protected fields are then considered to be the coefficients of a polynomial M(x) of degree n - 1. (The first bit of the Destination Address field corresponds to the x^(n-1) term and the last bit of the MAC Client Data field (or Pad field if present) corresponds to the x^0 term.)
c) M(x) is multiplied by x^32 and divided by G(x), producing a remainder R(x) of degree <= 31.
d) The coefficients of R(x) are considered to be a 32-bit sequence.
e) The bit sequence is complemented and the result is the CRC.
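(For concreteness, here is the same division worked on a toy example of my own, not from the standard, using the 3-bit generator G(x) = x^3 + x + 1 (bits 1011) and the message M = 1101, and skipping the complementing steps a) and e). Multiplying M(x) by x^3 appends three zero bits, and the division is just a repeated XOR of G wherever the leading remainder bit is 1:

1101000    M(x) * x^3
1011       XOR G, aligned under the leading 1
0110000
 1011
0011100
  1011
0001010
   1011
0000001    remainder R(x) = 001

So the multiplication by x^32 in step c) really does mean "append 32 zero bits"; the implementations below merely avoid doing that literally.)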
https://www.kernel.org/doc/Documentation/crc32.txt
A big-endian CRC written this way would be coded like:
for (i = 0; i < input_bits; i++) {
    multiple = remainder & 0x80000000 ? CRCPOLY : 0;
    remainder = (remainder << 1 | next_input_bit()) ^ multiple;
}
Where is part c), where M(x) is multiplied by x^32? I don't see 32 zero bits appended to any number.
Also, the following piece of code makes no sense to me. The code and the math don't really match up.
Evaluating the differences in CRC-32 implementations
and
unsigned short
crc16_update(unsigned short crc, unsigned char nextByte)
{
    crc ^= nextByte;                    // feed the byte into the low end of the register
    for (int i = 0; i < 8; ++i) {
        if (crc & 1)
            crc = (crc >> 1) ^ 0xA001;  // 0xA001 = CRC-16/ANSI polynomial 0x8005, bit-reversed
        else
            crc = (crc >> 1);
    }
    return crc;
}
What are these implementations doing? None of them really resemble the original procedure.
Even after reading the very end of this it still makes no sense:
http://www.relisoft.com/science/crcmath.html

This tutorial (also here, here, and here for those who will complain about link rot), in particular "10. A Slightly Mangled Table-Driven Implementation", explains well the optimization to avoid feeding an extra 32 zero bits at the end.
The bottom line is that you feed the bits into the end of the register instead of the start, which has the same effect as feeding a register-length's worth of zeros at the end.
The tutorial also shows nicely how the implementation you quoted implements the long division over GF(2).
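To make the equivalence concrete, here is a minimal sketch of my own (not taken from any of the linked pages) with two bit-at-a-time CRC-32 routines: crc32_literal follows the standard's wording and really appends 32 zero bits, while crc32_mangled feeds each input bit into the top of the register instead. Both omit the complementing steps a) and e) so that only the division is compared:

#include <cstdint>
#include <cstdio>

const uint32_t CRCPOLY = 0x04C11DB7u;  // CRC-32 generator, the x^32 term is implicit

// One step of long division over GF(2): shift in one bit, and subtract
// (XOR) the generator if a 1 just fell out of the register.
static uint32_t shift_in(uint32_t rem, uint32_t bit) {
    uint32_t multiple = (rem & 0x80000000u) ? CRCPOLY : 0;
    return ((rem << 1) | bit) ^ multiple;
}

// Literal procedure: feed the message bits, then 32 zero bits (the x^32).
uint32_t crc32_literal(const uint8_t *data, size_t len) {
    uint32_t rem = 0;
    for (size_t i = 0; i < len; i++)
        for (int b = 7; b >= 0; b--)
            rem = shift_in(rem, (data[i] >> b) & 1);
    for (int i = 0; i < 32; i++)       // the multiplication by x^32
        rem = shift_in(rem, 0);
    return rem;
}

// Mangled form: XOR each input bit into the top of the register up front,
// so the 32 trailing zero bits are never needed.
uint32_t crc32_mangled(const uint8_t *data, size_t len) {
    uint32_t rem = 0;
    for (size_t i = 0; i < len; i++)
        for (int b = 7; b >= 0; b--) {
            rem ^= ((uint32_t)(data[i] >> b) & 1) << 31;
            rem = (rem & 0x80000000u) ? (rem << 1) ^ CRCPOLY : (rem << 1);
        }
    return rem;
}

int main() {
    const uint8_t msg[] = "123456789";
    printf("%08x\n", (unsigned)crc32_literal(msg, 9));
    printf("%08x\n", (unsigned)crc32_mangled(msg, 9));  // same value as the line above
}

The crc16_update function in the question is the same trick in bit-reversed (LSB-first) form: the byte goes into the low end of the register, and the polynomial is applied whenever a 1 falls off the bottom.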

Getting exponent value using bit shifts (C, C++) [duplicate]

Note - This is NOT a duplicate of this question - Count the consecutive zero bits (trailing) on the right in parallel: an explanation?. The linked question has a different context; it only asks about the purpose of signed() being used. DO NOT mark this question as a duplicate.
I've been looking for a way to obtain the number of trailing zeros in a number. I found a Stanford University bit-twiddling write-up here that gives the following explanation.
unsigned int v; // 32-bit word input to count zero bits on right
unsigned int c = 32; // c will be the number of zero bits on the right
v &= -signed(v);
if (v) c--;
if (v & 0x0000FFFF) c -= 16;
if (v & 0x00FF00FF) c -= 8;
if (v & 0x0F0F0F0F) c -= 4;
if (v & 0x33333333) c -= 2;
if (v & 0x55555555) c -= 1;
Why does this end up working? I understand how hex numbers are represented as binary and how bitwise operators work, but I am unable to figure out the intuition behind this. What is the working mechanism?
The code is broken (it has undefined behavior: when only the top bit of v is set, converting to signed and negating overflows). Here is a fixed version which is also slightly easier to understand (and probably faster):
uint32_t v;  // 32-bit word input to count zero bits on right
unsigned c;  // c will be the number of zero bits on the right
if (v) {
    v &= -v;  // keep rightmost set bit (the one that determines the answer), clear all others
    c = 0;
    if (v & 0xAAAAAAAAu) c |= 1;  // binary ..101010
    if (v & 0xCCCCCCCCu) c |= 2;  // binary ..11001100
    if (v & 0xF0F0F0F0u) c |= 4;
    if (v & 0xFF00FF00u) c |= 8;
    if (v & 0xFFFF0000u) c |= 16;
}
else c = 32;
Once we know only one bit is set, we determine one bit of the result at a time, by simultaneously testing all bits where the result is odd, then all bits where the result has the 2's-place set, etc.
The original code worked in reverse, starting with all bits of the result set (after the if (v) c--;) and then determining which needed to be zero and clearing them.
Since we are learning one bit of the output at a time, I think it's more clear to build the output using bit operations not arithmetic.
This code (from the net) is mostly C, although v &= -signed(v); isn't correct C. The intent is for it to behave as v &= ~v + 1;
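Packaged as a self-contained function (my own wrapping, not part of the original snippet), the fixed version can be exercised like this:

#include <cstdint>
#include <cstdio>

// Count trailing zeros by first isolating the rightmost set bit, then
// reading off its position one result bit at a time.
unsigned count_trailing_zeros(uint32_t v) {
    if (!v) return 32;
    v &= -v;  // isolate the rightmost set bit (well-defined for unsigned)
    unsigned c = 0;
    if (v & 0xAAAAAAAAu) c |= 1;
    if (v & 0xCCCCCCCCu) c |= 2;
    if (v & 0xF0F0F0F0u) c |= 4;
    if (v & 0xFF00FF00u) c |= 8;
    if (v & 0xFFFF0000u) c |= 16;
    return c;
}

int main() {
    printf("%u\n", count_trailing_zeros(40));  // 40 = 101000b -> 3
    printf("%u\n", count_trailing_zeros(1));   // -> 0
    printf("%u\n", count_trailing_zeros(0));   // -> 32
}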
First, if v is zero, then it remains zero after the & operation, and all of the if statements are skipped, so you get 32.
Otherwise, the & operation (when corrected) clears all bits to the left of the rightmost 1, so at that point v contains a single 1 bit. Then c is decremented to 31, i.e. all 1 bits within the possible result range.
The if statements then determine its numeric position one bit at a time (one bit of the position number, not of v), clearing the bits that should be 0.
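A worked trace of the original code with v = 72 (binary 1001000), an example of my own:

v &= -v isolates the rightmost 1: v = 0001000b = 8
c = 32; v is nonzero, so c-- gives c = 31 (binary 11111)
v & 0x0000FFFF = 8, nonzero: c -= 16 gives 15 (binary 01111)
v & 0x00FF00FF = 8, nonzero: c -= 8 gives 7 (binary 00111)
v & 0x0F0F0F0F = 8, nonzero: c -= 4 gives 3 (binary 00011)
v & 0x33333333 = 0: c stays 3
v & 0x55555555 = 0: c stays 3
Result: 3 trailing zeros, matching 8 = 2^3.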
The code first transforms v in such a way that it is entirely zero except for the rightmost one, which remains. Then, it determines the position of that one.
First let's see how we suppress all ones but the rightmost one.
Assume that k is the position of the rightmost one in v, so v = (v_{n-1}, v_{n-2}, ..., v_{k+1}, 1, 0, ..., 0).
-v is the number that, added to v, gives 0 (actually it gives 2^n, but bit 2^n is ignored if we only keep the n least significant bits).
What must the bits of -v be so that v + (-v) = 0?
Obviously bits k-1..0 of -v must be 0, so that added to the trailing zeros of v they give zero.
Bit k must be 1. Added to the one in v_k, it gives a zero and a carry of one at order k+1.
Bit k+1 of -v is added to v_{k+1} and to the carry generated at step k, and it must be the logical complement of v_{k+1}. Whatever the value of v_{k+1}, we get 1+0+1 if v_{k+1} = 0 (or 1+1+0 if v_{k+1} = 1), so the result is 0 at order k+1 with a carry generated at order k+2.
The same holds for bits n-1..k+2: they must all be the logical complement of the corresponding bit of v.
Hence, we get the well-known result that to compute -v, one must
leave unchanged all trailing zeros of v,
leave unchanged the rightmost one of v,
complement all the other bits.
If we compute v & -v, we have

v          v_{n-1}   v_{n-2}  ...   v_{k+1}  1  0 ... 0
-v      & ~v_{n-1}  ~v_{n-2}  ...  ~v_{k+1}  1  0 ... 0
v&-v          0         0     ...      0     1  0 ... 0

So v & -v only keeps the rightmost one of v.
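For a concrete instance (an example of mine): take v = 12 = 1100b. Complementing every bit to the left of the rightmost one gives -12 = ...110100b in two's complement, and 1100b & ...110100b = 0100b = 4, the isolated rightmost one of v.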
To find the location of this one, look at the code:

if (v) c--;                  // no 1 anywhere? -> 32 trailing zeros.
                             // Otherwise the position is in range c..0 = 31..0
if (v & 0x0000FFFF) c -= 16; // If the one is in the low half of v, the range
                             // of possible positions for it is 15..0.
                             // Otherwise the range must be 31..16.
                             // Remaining range is c..c-15
if (v & 0x00FF00FF) c -= 8;  // If the one is in the low byte of its half
                             // (positions 7..0 or 23..16), it is in the lower
                             // part of the range, so we subtract 8 from the
                             // boundaries of the range.
                             // Otherwise, it is in the upper part.
                             // Possible range of positions is now c..c-7
if (v & 0x0F0F0F0F) c -= 4;  // do the same for the remaining bits.
if (v & 0x33333333) c -= 2;
if (v & 0x55555555) c -= 1;

Converting Big Endian Formatted Bits to Intended Decimal Value While Ignoring First Bit

I am reading a binary file and trying to convert from IBM 4-byte floating point to double in C++. How exactly would one use the first byte of IBM data to find the ccccccc in the given picture?
IBM to value conversion chart
The code below gives an exponent way larger than what the data should have. I am confused about how the line
exponent = ((IBM4ByteValue[0] & 127) - 64);
executes; I do not understand the use of the & operator in this statement. What the previous author of this code implied is that IBM4ByteValue[0] is the ccccccc, so does this mean that the ampersand sets a maximum value that the left side of the operator can equal? Even if this is correct, I'm not sure how this line accounts for the big-endian bit ordering in the first byte (I believe it is big-endian after viewing the picture). Not to mention 1000001 and 0000001 should have the same exponent (-63), but they will not under my current interpretation of the previously mentioned line.
So in short, could someone show me how to find the ccccccc (shown in the picture link above) using the first byte, IBM4ByteValue[0]? Maybe by accessing each individual bit? However, I do not know how to do this with my array.
**this code is using the std namespace
**I believe ret should be mantissa * pow(16, 24+exponent), however if I'm wrong about the exponent I'm probably wrong about this too (I got the IBM conversion from a previously asked stackoverflow question). **I would have just commented on the old post, but this question was a bit too large, pun intended, for a comment. It is also different in that I am asking how exactly one accesses the bits in an array storing whole bytes.
Code I put together using an IBM conversion from a previous question's answer:
for (long pos = 0; pos < fileLength; pos += BUF_LEN) {
    file.seekg(bytePosition);
    file.read((char *)(&IBM4ByteValue[0]), BUF_LEN);
    bytePosition += 4;
    printf("\n%8ld: ", pos);
    // IBM Conversion
    double ret = 0;
    uint32_t mantissa = 0;
    uint16_t exponent = 0;
    mantissa = (IBM4ByteValue[3] << 16) | (IBM4ByteValue[2] << 8) | IBM4ByteValue[1];
    exponent = ((IBM4ByteValue[0] & 127) - 64);
    ret = mantissa * exp2(-24 + 4 * exponent);
    if (IBM4ByteValue[0] & 128) ret *= -1.;
    printf(":%24f", ret);
    printf("\n");
    system("PAUSE");
}
The & operator masks the bits of that array value with the binary value of 127 (0111 1111): a result bit is 1 only where the bit in the array value is 1 AND the corresponding bit of 127 is 1 (1 & 0, 0 & 0, and 0 & 1 all give 0). Since 127 has every bit set except the top one, this keeps the low 7 bits (the ccccccc) and clears the sign bit. You then take that resulting value and subtract 64 from it to get your exponent.
In floating point we always have a bias (in this case, 64) for the exponent. This means that if your exponent is 5, 69 will be stored. So what this code is trying to do is find the original value of the exponent.
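Putting the mask, the bias and the fraction together, here is a minimal sketch of my own (the function name and byte order are assumptions: the four bytes are taken most significant first, as in the picture):

#include <cstdint>
#include <cmath>
#include <cstdio>

// IBM single format: seeeeeee ffffffff ffffffff ffffffff
// value = (-1)^s * (fraction / 2^24) * 16^(e - 64)
double ibm32_to_double(const uint8_t b[4]) {
    int sign = (b[0] & 0x80) ? -1 : 1;          // top bit of the first byte
    int exponent = (b[0] & 0x7F) - 64;          // low 7 bits, minus the bias of 64;
                                                // note the signed int: storing this in
                                                // a uint16_t (as in the question) wraps
                                                // to a huge value whenever e < 64
    uint32_t fraction = ((uint32_t)b[1] << 16)  // 24-bit fraction, most significant
                      | ((uint32_t)b[2] << 8)   // byte first
                      | (uint32_t)b[3];
    return sign * (double)fraction * std::ldexp(1.0, 4 * exponent - 24);
}

int main() {
    const uint8_t one[4] = {0x41, 0x10, 0x00, 0x00};  // 16^1 * (2^20 / 2^24) = 1.0
    printf("%f\n", ibm32_to_double(one));
}

This also explains the "exponent way larger" symptom: with a true exponent below 64, the unsigned exponent variable in the question wraps around.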

Can XorShift return zero?

I've been reading about the XorShift PRNG, especially the paper here.
A guy here states that
The number lies in the range [1, 2**64). Note that it will NEVER be 0.
Looking at the code that makes sense:
uint64_t x;

uint64_t next(void) {
    x ^= x >> 12; // a
    x ^= x << 25; // b
    x ^= x >> 27; // c
    return x * UINT64_C(2685821657736338717);
}
If x were zero, then every subsequent number would be zero too. But wouldn't that make it less useful? The usual use pattern would be something like min + rand() % (max - min), or converting the 64 bits to 32 bits if you only need an int. But if 0 is never returned, that might be a serious problem. Also, the bits are not 0 or 1 with the same probability: since 0 is missing, zeros are slightly less likely. I can't even find any mention of that on Wikipedia, so am I missing something?
So what is a good/appropriate way to generate random, equally distributed numbers from XorShift64* in a given range?
Short answer: no, it cannot return zero.
According to Numerical Recipes, "it produces a full period of 2^64 - 1 [...] the missing value is zero".
The essence is that the shift values have been chosen carefully to make the sequence as long as possible (the full period, minus zero), so one can be sure that every nonzero number is produced. Zero is the fixed point of this generator, so the state space splits into two cycles: zero alone, and one cycle containing every other number.
So IMO, for a sufficiently small range max - min it is enough to use (next() - 1) % (max - min) + min, or even to omit the subtraction altogether, since zero will be produced by the modulo anyway.
If one wants better-quality equal distribution, one should use the usual method, treating next() as a base generator with a range of [1, 2^64).
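A minimal sketch of that suggestion (my own packaging, assuming the next() above and max > min); note that for large spans the plain modulo introduces a small bias, which is why the answer qualifies it with "sufficiently small range":

#include <cstdint>

extern uint64_t next(void);  // the xorshift64* generator from the question

// Map the output range [1, 2^64) onto [min, max).
uint64_t next_in_range(uint64_t min, uint64_t max) {
    return min + (next() - 1) % (max - min);
}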
I am nearly sure that there is an x for which the xorshift operation returns 0.
Proof:
First, we have these equations:
a = x ^ (x >> 12);
b = a ^ (a << 25);
c = b ^ (b >> 27);
Substituting them:
b = (x ^ x >> 12) ^ ((x ^ x >> 12) << 25);
c = b ^ (b >> 27) = ((x ^ x >> 12) ^ ((x ^ x >> 12) << 25)) ^ (((x ^ x >> 12) ^ ((x ^ x >> 12) << 25)) >> 27);
As you can see, although c is a complex expression, it is perfectly linear over GF(2): every output bit is an XOR of input bits, with no carries.
That means you can express the bits of c as pure boolean expressions of the bits of x.
Thus, you can simply construct an equation system for the bits c0, c1, c2, ... like so
(note: the coefficients are only examples, I didn't calculate them, but this is how it would look):
c0 = x1 ^ !x32 ^ x47 ...
c1 = x23 ^ x45 ^ !x61 ...
...
c63 = !x13 ^ ...
From that point you have 64 equations and 64 unknowns, and you can simply solve the system with Gaussian elimination; you will always have a single unique solution,
except in some rare cases, i.e. when the determinant of the coefficient matrix is zero, which is very unlikely for such a big matrix.
Even if it happens, it would only mean that you have an information loss in every iteration, i.e. you can't reach all of the 2^64 possible values of x, only some of them.
Now consider the much more probable case, that the determinant is non-zero. Then for the 2^64 possible values of x you have all 2^64 possible values of c, and these are all different.
Thus, you can get zero.
Extension: actually, you get zero for zero... sorry, the proof is more useful for showing that this is not as simple as it seems at first sight. The important part is that you can express the bits of c as boolean functions of the bits of x.
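The linearity can also be exercised directly in code. Instead of the Gaussian elimination described above, the sketch below (my own) inverts each xorshift stage separately, using the standard trick that y = x ^ (x >> s) can be undone by the fixed-point iteration x = y ^ (x >> s); running it confirms the scramble is a bijection whose only preimage of 0 is 0:

#include <cstdint>
#include <cstdio>

static uint64_t step(uint64_t x) {   // the scramble from the question,
    x ^= x >> 12;                    // without the final multiply
    x ^= x << 25;
    x ^= x >> 27;
    return x;
}

// Undo y = x ^ (x >> s): each pass fixes s more of the top bits.
static uint64_t undo_xorshift_right(uint64_t y, int s) {
    uint64_t x = y;
    for (int fixed = 0; fixed < 64; fixed += s)
        x = y ^ (x >> s);
    return x;
}

// Undo y = x ^ (x << s), symmetrically, from the bottom bits up.
static uint64_t undo_xorshift_left(uint64_t y, int s) {
    uint64_t x = y;
    for (int fixed = 0; fixed < 64; fixed += s)
        x = y ^ (x << s);
    return x;
}

static uint64_t unstep(uint64_t y) { // invert the stages in reverse order
    y = undo_xorshift_right(y, 27);
    y = undo_xorshift_left(y, 25);
    y = undo_xorshift_right(y, 12);
    return y;
}

int main() {
    const uint64_t tests[] = {0, 1, 345234523452345ULL, ~0ULL};
    for (uint64_t x : tests)
        printf("%d\n", unstep(step(x)) == x);             // prints 1 four times
    printf("%llu\n", (unsigned long long)unstep(0));      // 0: zero's only preimage
}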
There is another problem with this random number generator. Even if you somehow modified the function so that zero is no longer a fixed point (for example, by adding 1 in every iteration),
you still couldn't guarantee that it won't fall into a short loop for some possible value of x. What if there is a loop of length 5 at the value 345234523452345? Can you prove there isn't, for all possible initial values? I can't.
Actually, with a really pseudorandom iteration function, your system will likely loop after about 2^32 iterations. That has a nearly trivial combinatoric (birthday-paradox) reason, but "unfortunately this margin is too small to contain it" ;-)
So:
If a loop length of 2^32 is okay for your PRNG, use a proven iteration function collected from somewhere on the net.
If it isn't, upgrade the state to at least 128 bits; that will give a loop length of roughly 2^64, which is not so bad.
If you still want a 64-bit output, use 128-bit numbers internally but return (x >> 64) ^ (x & (2^64 - 1)) (i.e. XOR the upper and lower halves of the internal state x).

How can you get the first j most significant bits of an integer in C++?

I know that to get the first j least significant bits of an integer you can do the following:
int res = (myInteger & ((1 << j) - 1));
Can you do something similar for the most significant bits?
Simply right-shift. (Warning: this fails when you want 0 bits, but yours fails when you want all the bits.)
unsigned dropbits = CHAR_BIT * sizeof(int) - j;  // CHAR_BIT is in <climits>
// if you want the high bits moved to the low bit positions, use this:
unsigned res = (unsigned)myInteger >> dropbits;
// if you want the high bits left in their original position, use this:
unsigned res = (unsigned)myInteger >> dropbits << dropbits;
Important! The cast must be to the unsigned version of your type.
It's also good to note that your code for the lowest j bits fails when you ask it for all (32?) bits. As such, it can be easier to double-shift:
unsigned dropbits = CHAR_BIT * sizeof(int) - j;
unsigned res = (unsigned)myInteger << dropbits >> dropbits;
See it working here: http://coliru.stacked-crooked.com/a/64eb843b3b255278 and here: http://coliru.stacked-crooked.com/a/29bc40188d852dd3
To get the j highest bits of an integer (or rather an unsigned integer, because bitwise operations in signed integers are a recipe for pain):
unsigned res = myUnsignedInteger & ~(~0u >> j);
~0u consists of only set bits. Shifting that j bits to the right gives us j zero-bits on the left side followed by one-bits, and inverting that gives us j one-bits on the left followed by zeroes, which is the mask we need to isolate the j highest bits of another integer.
Note: This is under the assumption that you want the isolated bits to remain in the same place, which is to say
(0xdeadbeef & ~(~0u >> 12)) == 0xdea00000

Shift left/right adding zeroes/ones and dropping first bits

I've got to program a function that receives
a binary number like 10001, and
a decimal number that indicates how many shifts I should perform.
The problem is that if I use the C++ << operator, zeros are pushed in from behind but the leading digits aren't dropped... For example,
shiftLeftAddingZeroes(10001, 1)
returns 100010 instead of 00010, which is what I want.
I hope I've made myself clear =P
I assume you are storing that information in an int. Take into consideration that this number actually has more leading zeros than what you see; your int is at least 16 bits, so the value is really 00000000 00010001. Maybe try AND-ing it with a number that has as many 1s as the number of digits you want to keep after shifting? (Assuming you want to stick to bitwise operations.)
What you want is to bit-shift and then limit the number of output bits which can be active (hold a value of 1). One way to do this is to create a mask for the number of bits you want, then AND the bit-shifted value with that mask. Below is a code sample for doing that; just replace int_type with the type of value you're using, or make it a template type.
int_type shiftLeftLimitingBitSize(int_type value, int numshift, int_type numbits = some_default) {
    int_type mask = 0;
    for (unsigned int bit = 0; bit < numbits; bit++) {
        mask |= int_type(1) << bit;  // set the low numbits bits of the mask
    }
    return (value << numshift) & mask;
}
Your output for 10001,1 would now be shiftLeftLimitingBitSize(0b10001, 1, 5) == 0b00010.
Realize that unless your numbits is exactly the length of your integer type, you will always have excess 0 bits on the 'front' of your number.