Understanding two different ways of implementing CRC generation with LFSR - crc

There are two ways of implementing CRC generation with linear feedback shift registers (LFSR), as shown in this figure . The coefficients of generator polynomial in this picture are 100111, and the red "+" circles are exclusive-or operators. The initialization register values are 00000 for both.
For example, if the input data bit stream is 10010011, both A and B will give CRC checksum of 1010. The difference is A finishes with 8 shifts, while B with 8+5=13 shifts because of the 5 zeros appended to the input data. I can understand B very easily since it closely mimics the modulo-2 division. However, I can not understand mathematically how A can give the same result with 5 less shifts. I heard people were talking A took advantage of the pre-appending zeros, but I didn't get it. Can anyone explain it to me? Thanks!

Here is my quick understanding.
Let M(x) be the input message of order m (i.e. has m+1 bits) and G(x) be the CRC polynomial of order n. CRC result for such a message is given by
C(x) = (M(x) * xn) % G(x)
This is what the circuit B is implementing. The additional 5 cycles it takes is because of the xn operation.
Instead of following this approach, circuit A tries to do something smarter. Its trying to solve the question
If C(x) is the CRC of M(x), what would be the CRC for message {M(x), D}
where D is the new bit. So its trying to solve one bit at a time instead of entire message as in case of circuit b. Hence circuit A will take just 8 cycles for a 8 bit message.
Now since you already understand why circuit B looks the way it does, lets look at circuit A. The math, specifically for your case, for the effect of adding bit D to message M(x) on CRC is as below
Let current CRC C(x) be {c4, c3, c2, c1, c0} and new bit that is shifted in be D
NewCRC = {M(x), D}*x5) % G(x) = (({M(x), 0} * x5) % G(x)) XOR ((D * x5) % G(x))
which is ({c3, c2, c1, c0, 0} XOR {0, 0, c4, c4, c4}) XOR ({0, 0, D, D, D})
which is {c3, c2, c1^c4^D, c0^c4^D, c4^D}
i.e. the LFSR circuit A.

You can say that architecture (A) is implementing the modulo division by aligning MSB of the polyn with MSB of Message, so it is implementing something like the following (in my example I have another crc polyn actually):
But in Architecture (B), you can say we try to predict the MSB of the Message, so we align MSB of CRC polyn with MSB-1 of the message, something like the following:
I can recommend details about this operation in this tutorial

Related

Why is the result of a bitwise shift unrecoverable if there is a mathematical equivalent of the same operation?

Take for example the number 91. That number in binary is 1011011. If you shift that number to the right by 5 bits, you would get 2 (10 in binary). According to a google search, bit shifting to the left or right by a certain amount of bits is the same as multiplying or dividing the number by 2 to the power of the number of bits to be shifted, respectively. so to get from 91 to 2 by bit shifting, the equation would look like this: 91 / 2^5, which is also 91 / 32. Now, of course if you did that in your calculator, there would be some decimal values, which aren't included when bit shifting. The resulting 2 is actually 2.84357. I'm sure you know that if you do a certain operation on a number and then you do the inverse, the result would be what you had in the first place. So does decimal precision have something to do with this?
There is a mathematical equivalent of shifting to the right... and the mathematical operation is UNRECOVERABLE.
You seem to think that shifting to the right is:
bit shifting to the left or right by a certain amount of bits is the same as multiplying or dividing the number by 2
This is what you will hear people casually say, but it is only half right. As it it is not the same but only similar.
The correct statement is:
shifting a base-2 number one digit to the right is THE SAME as dividing by two in the integer domain
If you have an integer calculator, if you did 91/32 you will get 2. You will not get ANY decimal point because we are operating in the integer domain.
For real numbers, the equivalent operation is:
FLOOR(91/32)
Which is also unrecoverable because it also results in 2.
The lesson here is be careful when listening to what people CASUALLY say. Casual speech is often imprecise and assumes the listener is familiar with the subject. You need to dig deeper what the statement is actually trying to say.
As for why it is unrecoverable? Division of integers give two results: the quotient (which is the main result) and the remainder. When we divide 91 by 32 we are doing this:
2
_____
32 ) 91
64
__
27
So we get the result of 2 and a remainder of 27. The reason you can't get 91 by multiplying 2*32 is because we threw away the remainder.
You can get the result back if you saved the remainder. However, calculating the remainder is not a matter of simple shifts. Here's an example of how to make it reversable in C:
int test () {
int a = 91;
int b = 32;
int result;
int remainder;
result = a / b; // result will be 2
remainder = a % b; // remainder will be 27
return (result * b) + remainder; // returns 91
}
You can only recover the result of an operation if it has a 1-1 mapping between the inputs and outputs, i.e. it has an inverse function. But not all mathematical functions have an inverse function
For example if f(x) = x >> n with >> is the shift operator then it'll be equivalent to
f(x) = ⌊x/2n⌋
with ⌊ ⌋ being the floor function. Since there are many inputs that lead to the same output, the relationship isn't 1-1 and there can't be an inverse function for it. This function works the same for both signed and unsigned right shift:
91 >> 5 == floor(91.0/32.0) == 2
-91 >> 5 == floor(-91.0/32.0) == -3
Similarly for an unsigned left shift function g(x) = x << n then the equivalent is
g(x) = (x * 2n) mod 2N
with N being the size in bits of x, because integer math in hardware, C and many other languages always reduce modulo 2N due to the limit of register size and the use of two's complement. And it's clear that the modulo function also isn't invertible/recoverable. The signed left shift is almost the same with some small modifications

how does CF(Carry flag) get set according to the computation t = a-b where a and b are unsigned integers

I'm new to x64-64, just a question on how does CF get set? I was reading a textbook which says:
CF: Carry flag is used when most recent operation generated a carry out of the most significant bit. Used to detect overflow for unsigned operations.
I have two questions:
Q1-suppose we used one of the add instructions to perform the equivalent of the C assignment t = a+b, where variables a, b, and t are integers (only 3 bits for simplicity), so for 011(a) + 101(b) = 1000 = 000, since we have a carry out bit 1 in the fourth digit, so CF flag will be set to 1, is my understanding correct?
Q2-if my understanding in Q1 is true, and suppose we used one of the sub instructions to perform the equivalent of the C assignment t = a-b, where a, b, and t are unsigned integers, since a, b are unsigned, we can't actually do a+(-b), and I don't get how we can make 011(a) - 101(b) carry out of the most significant bit?
The carry flag is often called "borrow" when performing a subtraction. After a subtraction, it set if a 1 had to be borrowed from the next bit (or would have been borrowed if you used the grade-school subtraction method). The borrow flag is like a -1 in that bit position:
011 -1 211
- 101 -> - 101
----- -----
B 110
You can get the same result by adding a zero to the arguments, and then the carry or borrow will be the high bit of the result:
0011 - 0101 = 0011 + (-0101) = 0011 + 1011 = 1110

Lowest exponent of a CRC polynomial

I have never seen a CRC polynomial without the lowest term x⁰ = 1.
Are there any exceptions I haven't seen yet?
Why do all CRC polynomials have the lowest term x⁰?
A CRC polynomial of the form xn + ... + x0 is used for a n bit CRC (it is used with a borrowless divide of the data bits by the CRC polynomial that produces an n bit remainder, the CRC). If the CRC polynomial is of the form xn + ... + x1 (no x0 term), then it is effectively a n-1 bit CRC.
However, there are cases where common code may use different tables for fast computations of 32 bit or 16 bit CRC's, where the only difference in the main part of the code is the constants. The code is written as if the CRC is of the form x32 + ... + x0, but to allow most of the same code to generate a 16 bit CRC, the polynomial is of the form x32 + ... + x16. There's a final step correction done to shift the final CRC right by 16 bits to place the 16 bit CRC in the proper bits. An example of this is in this 500+ line fast crc32/16 assembly example using pclmulqdq insruction (carryless multiply), which in this case is setup to produce a 16 bit CRC.

Calculating polynomial division result as well as remainder (CRC)

I'm trying to write a table-based CRC routine for receiving Mode S uplink interrogator messages. On the downlink side, the CRC is just the 24-bit CRC based on polynomial P=0x1FFF409. So far, so good -- I wrote a table-based implementation that follows the usual byte-at-a-time convention, and it's working fine.
On the uplink side, though, things get weird. The protocol specification says that calculating the target uplink address is by finding:
U' = x^24 * U / G(x)
...where U is the received message and G(x) is the encoding polynomial 0x1FFF409, resulting in:
U' = x^24 * m(x) + A(x) + r(x) / G(x)
...where m(x) is the original message, A(x) is the address, and r(x) is the remainder. I want the low-order quotient A(x); e.g., the result of the GF(2) polynomial division operation instead of the remainder. The remainder is effectively discarded. The target address is encoded with the transmitted checksum such that the receiving aircraft can validate the checksum by comparing it with its address.
This is great and all, and I have a bitwise implementation which follows from the above. Please ignore the weird shifting of the polynomial and checksum, this has been cribbed from this Pascal implementation (on page 15) which assumes 32-bit registers and makes optimizations based on that assumption. In reality the message and checksum come as a single, 56-bit transmission.
#This is the reference bit-shifting implementation. It is slow.
def uplink_bitshift_crc():
p = 0xfffa0480 #polynomial (0x1FFF409 shifted left 7 bits)
a = 0x00000000 #rx'ed uplink data (32 bits)
adr = 0xcc5ee900 #rx'ed checksum (24 bits, shifted left 8 bits)
ad = 0 #will hold division result low-order bits
for j in range(56):
#if MSBit is 1, xor w/poly
if a & 0x80000000:
a = a ^ p
#shift off the top bit of A (we're done with it),
#and shift in the top bit of adr
a = ((a << 1) & 0xFFFFFFFF) + ((adr >> 31) & 1)
#shift off the top bit of adr
adr = (adr << 1) & 0xFFFFFFFF
if j > 30:
#shift ad left 1 bit and shift in the msbit of a
#this extracts the LS 24bits of the division operation
#and ignores the remainder at the end
ad = ad + ((a >> 31) & 1)
ad = ((ad << 1) & 0xFFFFFFFF)
#correct the ad
ad = ad >> 2
return ad
The above is of course slower than molasses in software and I'd really like to be able to construct a lookup table that would allow similar byte-at-a-time calculation of the received address, or massage the remainder (which is quickly calculated) into a quotient.
TL;DR:
Given a message, the encoding polynomial, and the remainder (calculated by the normal CRC method), is there a faster way to obtain the quotient of the polynomial division operation than by using shift registers to do polynomial division "longhand"?
You might take a look at the PyCRC library, I guess this may answer your questions.
Too late for the OP, but I'm posting this for others that might see this question. You can generate two tables to operate a byte at a time. The first 256 by 8 bit table is indexed by the current leading 8 bits of the dividend (message), and the 8 bit values are the quotients. The second 256 by 32 bit table is indexed by the 8 bit quotient and the 32 bit values are the 32 bit product of the 8 bit quotient times the 25 bit polynomial (since this is a carryless multiply, the product is 32 bits, (x^7 * x^24 = x^31)), which you xor to the upper 32 bits of the dividend, which will zero out the upper 8 bits of the dividend. Then loop back for the next 8 bits of the dividend.
A modern X86 cpu has the carryless multiply instruction, PCLMULQDQ that operates on 128 bit xmm registers, performing a 64 bit by 64 bit multiply to produce a 128 bit product (since it's a carryless multiply bit 127 is always 0, so it's really a 127 bit product). A multiply of the 56 bit message by the 41 bit constant 2^64/G(x) will produce a 96 bit product, of which the upper 32 bits will be the quotient (lower 64 bits are not used).

knuth multiplicative hash

Is this a correct implementation of the Knuth multiplicative hash.
int hash(int v)
{
v *= 2654435761;
return v >> 32;
}
Does overflow in the multiplication affects the algorithm?
How to improve the performance of this method?
Knuth multiplicative hash is used to compute an hash value in {0, 1, 2, ..., 2^p - 1} from an integer k.
Suppose that p is in between 0 and 32, the algorithm goes like this:
Compute alpha as the closest integer to 2^32 (-1 + sqrt(5)) / 2. We get alpha = 2 654 435 769.
Compute k * alpha and reduce the result modulo 2^32:
k * alpha = n0 * 2^32 + n1 with 0 <= n1 < 2^32
Keep the highest p bits of n1:
n1 = m1 * 2^(32-p) + m2 with 0 <= m2 < 2^(32 - p)
So, a correct implementation of Knuth multiplicative algorithm in C++ is:
std::uint32_t knuth(int x, int p) {
assert(p >= 0 && p <= 32);
const std::uint32_t knuth = 2654435769;
const std::uint32_t y = x;
return (y * knuth) >> (32 - p);
}
Forgetting to shift the result by (32 - p) is a major mistake. As you would lost all the good properties of the hash. It would transform an even sequence into an even sequence which would be very bad as all the odd slots would stay unoccupied. That's like taking a good wine and mixing it with Coke. By the way, the web is full of people misquoting Knuth and using a multiplication by 2 654 435 761 without taking the higher bits. I just opened the Knuth and he never said such a thing. It looks like some guy who decided he was "smart" decided to take a prime number close to 2 654 435 769.
Bare in mind that most hash tables implementations don't allow this kind of signature in their interface, as they only allow
uint32_t hash(int x);
and reduce hash(x) modulo 2^p to compute the hash value for x. Those hash tables cannot accept the Knuth multiplicative hash. This might be a reason why so many people completely ruined the algorithm by forgetting to take the higher p bits.
So you can't use the Knuth multiplicative hash with std::unordered_map or std::unordered_set. But I think that those hash tables use a prime number as a size, so the Knuth multiplicative hash is not useful in this case. Using hash(x) = x would be a good fit for those tables.
Source: "Introduction to Algorithms, third edition", Cormen et al., 13.3.2 p:263
Source: "The Art of Computer Programming, Volume 3, Sorting and Searching", D.E. Knuth, 6.4 p:516
Ok, I looked it up in TAOCP volume 3 (2nd edition), section 6.4, page 516.
This implementation is not correct, though as I mentioned in the comments it may give the correct result anyway.
A correct way (I think - feel free to read the relevant chapter of TAOCP and verify this) is something like this: (important: yes, you must shift the result right to reduce it, not use bitwise AND. However, that is not the responsibility of this function - range reduction is not properly part of hashing itself)
uint32_t hash(uint32_t v)
{
return v * UINT32_C(2654435761);
// do not comment about the lack of right shift. I'm not ignoring it. read on.
}
Note the uint32_t's (as opposed to int's) - they make sure the multiplication overflows modulo 2^32, as it is supposed to do if you choose 32 as the word size. There is also no right shift by k here, because there is no reason to give responsibility for range-reduction to the basic hashing function and it is actually more useful to get the full result. The constant 2654435761 is from the question, the actual suggested constant is 2654435769, but that's a small difference that as far as I know does not affect the quality of the hash.
Other valid implementations shift the result right by some amount (not the full word size though, that doesn't make sense and C++ doesn't like it), depending on how many bits of hash you need. Or they may use an other constant (subject to certain conditions) or an other word size. Reducing the hash modulo something is not a valid implementation, but a common mistake, likely it is a de-facto standard way to do range-reduction on a hash. The bottom bits of a multiplicative hash are the worst-quality bits (they depend on less of the input), you only want to use them if you really need more bits, while reducing the hash modulo a power of two would return only the worst bits. Indeed that is equivalent to throwing away most of the input bits too. Reducing modulo a non-power-of-two is not so bad since it does mix in the higher bits, but it's not how the multiplicative hash was defined.
So to be clear, yes there is a right shift, but that is range reduction not hashing and can only be the responsibility of the hash table, since it depends on its internal size.
The type should be unsigned, otherwise the overflow is unspecified (thus possibly wrong, not just on non-2's-complement architectures but also on overly clever compilers) and the optional right shift would be a signed shift (wrong).
On the page I mention at the top, there is this formula:
Here we have A = 2654435761 (or 2654435769), w = 232 and M = 232. Calculating AK/w gives a fixed-point result with the format Q32.32, the mod 1 step takes only the 32 fraction bits. But that's just the same thing as doing a modular multiplication and then saying that the result is the fraction bits. Of course when multiplied by M, all the fraction bits become integer bits because of how M was chosen, and so it simplifies to just a plain old modular multiplication. When M is a lower power of two, that just right-shifts the result, as mentioned.
Might be late, but heres a Java Implementation of Knuth's Method :
For a hashtable of Size N :
public long hash(int key) {
long l = 2654435769L;
return (key * l >> 32) % N ;
}
If the input argument is a pointer then I use this
#include <inttypes.h>
uint32_t knuth_mul_hash(void* k) {
ptrdiff_t v = (ptrdiff_t)k * UINT32_C(2654435761);
v >>= ((sizeof(ptrdiff_t) - sizeof(uint32_t)) * 8); // Right-shift v by the size difference between a pointer and a 32-bit integer (0 for x86, 32 for x64)
return (uint32_t)(v & UINT32_MAX);
}
I usually use this as the default fallback hashing function in hashmap implementations, dictionaries, sets, etc...