Calculating polynomial division result as well as remainder (CRC) - crc

I'm trying to write a table-based CRC routine for receiving Mode S uplink interrogator messages. On the downlink side, the CRC is just the 24-bit CRC based on polynomial P=0x1FFF409. So far, so good -- I wrote a table-based implementation that follows the usual byte-at-a-time convention, and it's working fine.
On the uplink side, though, things get weird. The protocol specification says that calculating the target uplink address is by finding:
U' = x^24 * U / G(x)
...where U is the received message and G(x) is the encoding polynomial 0x1FFF409, resulting in:
U' = x^24 * m(x) + A(x) + r(x) / G(x)
...where m(x) is the original message, A(x) is the address, and r(x) is the remainder. I want the low-order quotient A(x); e.g., the result of the GF(2) polynomial division operation instead of the remainder. The remainder is effectively discarded. The target address is encoded with the transmitted checksum such that the receiving aircraft can validate the checksum by comparing it with its address.
This is great and all, and I have a bitwise implementation which follows from the above. Please ignore the weird shifting of the polynomial and checksum, this has been cribbed from this Pascal implementation (on page 15) which assumes 32-bit registers and makes optimizations based on that assumption. In reality the message and checksum come as a single, 56-bit transmission.
#This is the reference bit-shifting implementation. It is slow.
def uplink_bitshift_crc():
p = 0xfffa0480 #polynomial (0x1FFF409 shifted left 7 bits)
a = 0x00000000 #rx'ed uplink data (32 bits)
adr = 0xcc5ee900 #rx'ed checksum (24 bits, shifted left 8 bits)
ad = 0 #will hold division result low-order bits
for j in range(56):
#if MSBit is 1, xor w/poly
if a & 0x80000000:
a = a ^ p
#shift off the top bit of A (we're done with it),
#and shift in the top bit of adr
a = ((a << 1) & 0xFFFFFFFF) + ((adr >> 31) & 1)
#shift off the top bit of adr
adr = (adr << 1) & 0xFFFFFFFF
if j > 30:
#shift ad left 1 bit and shift in the msbit of a
#this extracts the LS 24bits of the division operation
#and ignores the remainder at the end
ad = ad + ((a >> 31) & 1)
ad = ((ad << 1) & 0xFFFFFFFF)
#correct the ad
ad = ad >> 2
return ad
The above is of course slower than molasses in software and I'd really like to be able to construct a lookup table that would allow similar byte-at-a-time calculation of the received address, or massage the remainder (which is quickly calculated) into a quotient.
TL;DR:
Given a message, the encoding polynomial, and the remainder (calculated by the normal CRC method), is there a faster way to obtain the quotient of the polynomial division operation than by using shift registers to do polynomial division "longhand"?

You might take a look at the PyCRC library, I guess this may answer your questions.

Too late for the OP, but I'm posting this for others that might see this question. You can generate two tables to operate a byte at a time. The first 256 by 8 bit table is indexed by the current leading 8 bits of the dividend (message), and the 8 bit values are the quotients. The second 256 by 32 bit table is indexed by the 8 bit quotient and the 32 bit values are the 32 bit product of the 8 bit quotient times the 25 bit polynomial (since this is a carryless multiply, the product is 32 bits, (x^7 * x^24 = x^31)), which you xor to the upper 32 bits of the dividend, which will zero out the upper 8 bits of the dividend. Then loop back for the next 8 bits of the dividend.
A modern X86 cpu has the carryless multiply instruction, PCLMULQDQ that operates on 128 bit xmm registers, performing a 64 bit by 64 bit multiply to produce a 128 bit product (since it's a carryless multiply bit 127 is always 0, so it's really a 127 bit product). A multiply of the 56 bit message by the 41 bit constant 2^64/G(x) will produce a 96 bit product, of which the upper 32 bits will be the quotient (lower 64 bits are not used).

Related

Why is the result of a bitwise shift unrecoverable if there is a mathematical equivalent of the same operation?

Take for example the number 91. That number in binary is 1011011. If you shift that number to the right by 5 bits, you would get 2 (10 in binary). According to a google search, bit shifting to the left or right by a certain amount of bits is the same as multiplying or dividing the number by 2 to the power of the number of bits to be shifted, respectively. so to get from 91 to 2 by bit shifting, the equation would look like this: 91 / 2^5, which is also 91 / 32. Now, of course if you did that in your calculator, there would be some decimal values, which aren't included when bit shifting. The resulting 2 is actually 2.84357. I'm sure you know that if you do a certain operation on a number and then you do the inverse, the result would be what you had in the first place. So does decimal precision have something to do with this?
There is a mathematical equivalent of shifting to the right... and the mathematical operation is UNRECOVERABLE.
You seem to think that shifting to the right is:
bit shifting to the left or right by a certain amount of bits is the same as multiplying or dividing the number by 2
This is what you will hear people casually say, but it is only half right. As it it is not the same but only similar.
The correct statement is:
shifting a base-2 number one digit to the right is THE SAME as dividing by two in the integer domain
If you have an integer calculator, if you did 91/32 you will get 2. You will not get ANY decimal point because we are operating in the integer domain.
For real numbers, the equivalent operation is:
FLOOR(91/32)
Which is also unrecoverable because it also results in 2.
The lesson here is be careful when listening to what people CASUALLY say. Casual speech is often imprecise and assumes the listener is familiar with the subject. You need to dig deeper what the statement is actually trying to say.
As for why it is unrecoverable? Division of integers give two results: the quotient (which is the main result) and the remainder. When we divide 91 by 32 we are doing this:
2
_____
32 ) 91
64
__
27
So we get the result of 2 and a remainder of 27. The reason you can't get 91 by multiplying 2*32 is because we threw away the remainder.
You can get the result back if you saved the remainder. However, calculating the remainder is not a matter of simple shifts. Here's an example of how to make it reversable in C:
int test () {
int a = 91;
int b = 32;
int result;
int remainder;
result = a / b; // result will be 2
remainder = a % b; // remainder will be 27
return (result * b) + remainder; // returns 91
}
You can only recover the result of an operation if it has a 1-1 mapping between the inputs and outputs, i.e. it has an inverse function. But not all mathematical functions have an inverse function
For example if f(x) = x >> n with >> is the shift operator then it'll be equivalent to
f(x) = ⌊x/2n⌋
with ⌊ ⌋ being the floor function. Since there are many inputs that lead to the same output, the relationship isn't 1-1 and there can't be an inverse function for it. This function works the same for both signed and unsigned right shift:
91 >> 5 == floor(91.0/32.0) == 2
-91 >> 5 == floor(-91.0/32.0) == -3
Similarly for an unsigned left shift function g(x) = x << n then the equivalent is
g(x) = (x * 2n) mod 2N
with N being the size in bits of x, because integer math in hardware, C and many other languages always reduce modulo 2N due to the limit of register size and the use of two's complement. And it's clear that the modulo function also isn't invertible/recoverable. The signed left shift is almost the same with some small modifications

Lowest exponent of a CRC polynomial

I have never seen a CRC polynomial without the lowest term x⁰ = 1.
Are there any exceptions I haven't seen yet?
Why do all CRC polynomials have the lowest term x⁰?
A CRC polynomial of the form xn + ... + x0 is used for a n bit CRC (it is used with a borrowless divide of the data bits by the CRC polynomial that produces an n bit remainder, the CRC). If the CRC polynomial is of the form xn + ... + x1 (no x0 term), then it is effectively a n-1 bit CRC.
However, there are cases where common code may use different tables for fast computations of 32 bit or 16 bit CRC's, where the only difference in the main part of the code is the constants. The code is written as if the CRC is of the form x32 + ... + x0, but to allow most of the same code to generate a 16 bit CRC, the polynomial is of the form x32 + ... + x16. There's a final step correction done to shift the final CRC right by 16 bits to place the 16 bit CRC in the proper bits. An example of this is in this 500+ line fast crc32/16 assembly example using pclmulqdq insruction (carryless multiply), which in this case is setup to produce a 16 bit CRC.

bitmasking and binary arithmetic

I'd like to know the science behind the following. a 32 bit value is shifted left 32 times in a 64 bit type, then a division is performed. somehow the precision is contained within the last 32 bits and in order to retrieve the value as a floating point number, I can multiply by 1 over the max value of an unsigned 32 bit int.
phase = ((uint64) 44100 << 32) / 48000;
(phase & 0xffffffff) * (1.0f / 4294967296.0f);// == 0.918749988
the same as
(float)44100/48000;// == 0.918749988
(...)
If you lose precision when dividing two integer numbers, you should remember the remainder.
The reminder in C++ can be taken by doing 44100%48000 in your case.
Actually these are constants and it's completely clear that 44100/48000 == 0, so remainder is all you have.
Well, the reminder will even be -- guess what -- 44100!
The float type (imposed by the explicit cast) has only 6 significant digits. So 4294967296.0f will be simply 429496e4 (in mathematics: 429496*10^4). That's why this type isn't valuable for anything but playing around.
The best way to get a value of fixed integer type in which all bits are set, and not miss the correct number of 'f' in 0xfffff, is to use the ~ operator and 0 value. In your case, ~uint32_t(0).
Well, I should have said this in the beginning: 44100.0/48000 should give you the result you want. :P
this is the answer I was looking for
bit shifting left will provide that number of bits in which to store the precision vale from a division.
dividing the integer value represented by these bits by 2 to the power of the bit shift amount will return the precision value
e.g
0000 0001 * 2^8 = 1 0000 0000 = 256(base 10)
1 0000 0000 / 2 = 1000 0000 = 128(base 10)
128 / 2^8 = 0.5

How is this size alignment working

I am not able to understand the below code with respect to the comment provided. What does this code does, and what would be the equivalent code for 8-aligned?
/* segment size must be 4-aligned */
attr->options.ssize &= ~3;
Here, ssize is of unsigned int type.
Since 4 in binary is 100, any value aligned to 4-byte boundaries (i.e. a multiple of 4) will have the last two bits set to zero.
3 in binary is 11, and ~3 is the bitwise negation of those bits, i.e., ...1111100. Performing a bitwise AND with that value will keep every bit the same, except the last two which will be cleared (bit & 1 == bit, and bit & 0 == 0). This gives us a the next lower or equal value that is a multiple of 4.
To do the same operation for 8 (1000 in binary), we need to clear out the lowest three bits. We can do that with the bitwise negation of the binary 111, i.e., ~7.
All powers of two (1, 2, 4, 8, 16, 32...) can be aligned by simple a and operation.
This gives the size rounded down:
size &= ~(alignment - 1);
or if you want to round up:
size = (size + alignment-1) & ~(alignment-1);
The "alignment-1", as long as it's a value that is a power of two, will give you "all ones" up to the bit just under the power of two. ~ inverts all the bits, so you get ones for zeros and zeros for ones.
You can check that something is a power of two by:
bool power_of_two = !(alignment & (alignment-1))
This works because, for example 4:
4 = 00000100
4-1 = 00000011
& --------
0 = 00000000
or for 16:
16 = 00010000
16-1 = 00001111
& --------
0 = 00000000
If we use 5 instead:
5 = 00000101
4-1 = 00000100
& --------
4 = 00000100
So not a power of two!
Perhaps more understandable comment would be
/* make segment size 4-aligned
by zeroing two least significant bits,
effectively rounding down */
Then at least for me, immediate question pops to my mind: should it really be rounded down, when it is size? Wouldn't rounding up be more appropriate:
attr->options.ssize = (attr->options.ssize + 3) & ~3;
As already said in other answers, to make it 8-aligned, 3 bits need to be zeroed, so use 7 instead of 3. So, we might make it into a function:
unsigned size_align(unsigned size, unsigned bit_count_to_zero)
{
unsigned bits = (1 << bit_count_to_zero) - 1;
return (size + bits) & ~bits;
}
~3 is the bit pattern ...111100. When you do a bitwise AND with that pattern, it clears the bottom two bits, i.e. rounds down to the nearest multiple of 4.
~7 does the same thing for 8-aligned.
The code ensures the bottom two bits of ssize are cleared, guaranteeing that ssize is a multiple of 4. Equivalent code for 8-aligned would be
attr->options.ssize &= ~7;
number = number & ~3
The number is rounded off to the nearest multiple of 4 that is lesser than number
Ex:
if number is 0,1,2 or 3, the `number` is rounded off to 0
similarly if number is 4,5,6,or 7,numberis rounded off to 4
But if this is related to memory alignment, the memory must be aligned upwards and not downwards.

Why perform multiplication in this way?

I've run into this function:
static inline INT32 MPY48SR(INT16 o16, INT32 o32)
{
UINT32 Temp0;
INT32 Temp1;
// A1. get the lower 16 bits of the 32-bit param
// A2. multiply them with the 16-bit param
// A3. add 16384 (TODO: why?)
// A4. bitshift to the right by 15 (TODO: why 15?)
Temp0 = (((UINT16)o32 * o16) + 0x4000) >> 15;
// B1. Get the higher 16 bits of the 32-bit param
// B2. Multiply them with the 16-bit param
Temp1 = (INT16)(o32 >> 16) * o16;
// 1. Shift B to the left (TODO: why do this?)
// 2. Combine with A and return
return (Temp1 << 1) + Temp0;
}
The inline comments are mine. It seems that all it's doing is multiplying the two arguments. Is this right, or is there more to it? Why would this be done in such a way?
Those parameters don't represent integers. They represent real numbers in fixed-point format with 15 bits to the right of the radix point. For instance, 1.0 is represented by 1 << 15 = 0x8000, 0.5 is 0x4000, -0.5 is 0xC000 (or 0xFFFFC000 in 32 bits).
Adding fixed-point numbers is simple, because you can just add their integer representation. But if you want to multiply, you first have to multiply them as integers, but then you have twice as many bits to the right of the radix point, so you have to discard the excess by shifting. For instance, if you want to multiply 0.5 by itself in 32-bit format, you multiply 0x00004000 (1 << 14) by itself to get 0x10000000 (1 << 28), then shift right by 15 bits to get 0x00002000 (1 << 13). To get better accuracy, when you discard the lowest 15-bits, you want to round to the nearest number, not round down. You can do this by adding 0x4000 = 1 << 14. Then if the discarded 15 bits is less than 0x4000, it gets rounded down, and if it's 0x4000 or more, it gets rounded up.
(0x3FFF + 0x4000) >> 15 = 0x7FFF >> 15 = 0
(0x4000 + 0x4000) >> 15 = 0x8000 >> 15 = 1
To sum up, you can do the multiplication like this:
return (o32 * o16 + 0x4000) >> 15;
But there's a problem. In C++, the result of a multiplication has the same type as its operands. So o16 is promoted to the same size as o32, then they are multiplied to get a 32-bit result. But this throws away the top bits, because the product needs 16 + 32 = 48 bits for accurate representation. One way to do this is to cast the operands to 64 bits and then multiply, but that might be slower, and it's not supported on all machines. So instead it breaks o32 into two 16-bit pieces, then does two multiplications in 32-bits, and combines the results.
This implements multiplication of fixed-point numbers. The numbers are viewed as being in the Q15 format (having 15 bits in the fractional part).
Mathematically, this function calculates (o16 * o32) / 2^15, rounded to nearest integer (hence the 2^14 factor, which represents 1/2, added to a number in order to round it). It uses unsigned and signed 16-bit multiplications with 32-bit result, which are presumably supported by the instruction set.
Note that there exists a corner case, where each of the numbers has a minimal value (-2^15 and -2^31); in this case, the result (2^31) is not representable in the output, and gets wrapped over (becomes -2^31 instead). For all other combinations of o16 and o32, the result is correct.