I was looking at this page and I saw that the terms of this polynomial:
0xad0424f3 = x^32 +x^30 +x^28 +x^27 +x^25 +x^19 +x^14 +x^11 +x^8 +x^7 +x^6 +x^5 +x^2 +x +1
which seems not correct since converting the Hex:
0xad0424f3 is 10101101000001000010010011110011
It would become:
x^31+ x^29+ x^27+ x^26+ x^24+ x^18+ x^13+ x^10+ x^7+ x^6+ x^5+ x^4+ x^1+ x^0
Can you help me understand which one is correct?
what about 64 bit ECMA polynomial,
0xC96C5795D7870F42
I want to know the number of terms in each polynomial 0xad0424f3 and 0xC96C5795D7870F42.
That page is on Koopman's web site, where he has his own notation for CRC polynomials. Since all CRC polynomials have a 1 term, he drops that term, divides the polynomial by x, and represents that in binary. That's what you're looking at.
The benefit is that with a 64-bit word, you can then represent all 64-bit and shorter CRC polynomials, with the length of the CRC denoted by the most significant 1 in the word.
The downside is that only Koopman uses that notation, as far as I know, resulting in some confusion by others. Like yourself.
As for your 64-bit CRC, that polynomial that you note is from the Wikipedia page is actually the reversed version, and is not in Koopman's notation. The expansion into a polynomial is shown right there, underneath the hex representation. It has 34 terms.
I learned an error detection technique called crc. crc calculations are done in modulo-2 arithmetic without carries in addition or borrows in subtraction. I wonder the reason why crc takes modulo-2 arithmetic rather than regular binary arithmetic. Is it easier to be implemented in digital circuit?
Maybe better late than never for an answer. A CRC treats the data as a string of 1 bit coefficients of a polynomial, since the coefficients are numbers modulo 2. From a math perspective, for an n bit CRC, the data polynomial is multiplied by x^n, effectively adding n 0 bit coefficients to the data, then dividing that data + zeroes by a n+1 bit CRC polynomial, resulting in a n bit remainder, which is the CRC. If "encoding" the data with a CRC, the remainder is "subtracted" from the data + zeroes, but for single bit coefficients, adding and subtracting are both xor, so the CRC is just appended to the data, replacing the zeroes that were appended to the data to generate the CRC.
The reason for no carries or borrows across coefficients is because it's polynomial math.
Although not done for CRC, Reed Solomon codes are somewhat similar, but the polynomial coefficients may be numbers modulo some prime number other than 2, such as 929, and for prime numbers other than 2, it's important to keep track of when addition or subtraction modulo a prime number other than 2 is used.
https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction
My question stems from the observation made that we can use a Linear Feedback Shift Register to perform a CRC check. Algebraically this normally is of the form;
S(x) = M(x) * x^k % G(x) ( gives the remainder, for a G(x) of order k)
The implementation of this is shown in this question, (and registers are all initialised to zero) and the mathematical bitwise calculation of the XOR division is shown in this question here.
I understand both of these - however, I also know that another common way of using an LFSR is to have no input, but instead preload the registers with non-zero values, and run (with zero as an input) to form a sequence of pseudo random numbers. This is shown in the image below
My question is, just as the CRC can be represented as a modulo-2 division both bitwise and algebraically, can we do the same for an LFSR sequence generator, given the generator polynomial and initial values? And if so, an example would be great!
Thanks very much, feel free to correct me if I've misrepresented or misunderstood a concept!
I am encoding large integers into an array of size_t. I already have the other operations working (add, subtract, multiply); as well as division by a single digit. But I would like match the time complexity of my multiplication algorithms if possible (currently Toom-Cook).
I gather there are linear time algorithms for taking various notions of multiplicative inverse of my dividend. This means I could theoretically achieve division in the same time complexity as my multiplication, because the linear-time operation is "insignificant" by comparison anyway.
My question is, how do I actually do that? What type of multiplicative inverse is best in practice? Modulo 64^digitcount? When I multiply the multiplicative inverse by my divisor, can I shirk computing the part of the data that would be thrown away due to integer truncation? Can anyone provide C or C++ pseudocode or give a precise explanation of how this should be done?
Or is there a dedicated division algorithm that is even better than the inverse-based approach?
Edit: I dug up where I was getting "inverse" approach mentioned above. On page 312 of "Art of Computer Programming, Volume 2: Seminumerical Algorithms", Knuth provides "Algorithm R" which is a high-precision reciprocal. He says its time complexity is less than that of multiplication. It is, however, nontrivial to convert it to C and test it out, and unclear how much overhead memory, etc, will be consumed until I code this up, which would take a while. I'll post it if no one beats me to it.
The GMP library is usually a good reference for good algorithms. Their documented algorithms for division mainly depend on choosing a very large base, so that you're dividing a 4 digit number by a 2 digit number, and then proceed via long division.
Long division will require computing 2 digit by 1 digit quotients; this can either be done recursively, or by precomputing an inverse and estimating the quotient as you would with Barrett reduction.
When dividing a 2n-bit number by an n-bit number, the recursive version costs O(M(n) log(n)), where M(n) is the cost of multiplying n-bit numbers.
The version using Barrett reduction will cost O(M(n)) if you use Newton's algorithm to compute the inverse, but according to GMP's documentation, the hidden constant is a lot larger, so this method is only preferable for very large divisions.
In more detail, the core algorithm behind most division algorithms is an "estimated quotient with reduction" calculation, computing (q,r) so that
x = qy + r
but without the restriction that 0 <= r < y. The typical loop is
Estimate the quotient q of x/y
Compute the corresponding reduction r = x - qy
Optionally adjust the quotient so that the reduction r is in some desired interval
If r is too big, then repeat with r in place of x.
The quotient of x/y will be the sum of all the qs produced, and the final value of r will be the true remainder.
Schoolbook long division, for example, is of this form. e.g. step 3 covers those cases where the digit you guessed was too big or too small, and you adjust it to get the right value.
The divide and conquer approach estimates the quotient of x/y by computing x'/y' where x' and y' are the leading digits of x and y. There is a lot of room for optimization by adjusting their sizes, but IIRC you get best results if x' is twice as many digits of y'.
The multiply-by-inverse approach is, IMO, the simplest if you stick to integer arithmetic. The basic method is
Estimate the inverse of y with m = floor(2^k / y)
Estimate x/y with q = 2^(i+j-k) floor(floor(x / 2^i) m / 2^j)
In fact, practical implementations can tolerate additional error in m if it means you can use a faster reciprocal implementation.
The error is a pain to analyze, but if I recall the way to do it, you want to choose i and j so that x ~ 2^(i+j) due to how errors accumulate, and you want to choose x / 2^i ~ m^2 to minimize the overall work.
The ensuing reduction will have r ~ max(x/m, y), so that gives a rule of thumb for choosing k: you want the size of m to be about the number of bits of quotient you compute per iteration — or equivalently the number of bits you want to remove from x per iteration.
I do not know the multiplicative inverse algorithm but it sounds like modification of Montgomery Reduction or Barrett's Reduction.
I do bigint divisions a bit differently.
See bignum division. Especially take a look at the approximation divider and the 2 links there. One is my fixed point divider and the others are fast multiplication algos (like karatsuba,Schönhage-Strassen on NTT) with measurements, and a link to my very fast NTT implementation for 32bit Base.
I'm not sure if the inverse multiplicant is the way.
It is mostly used for modulo operation where the divider is constant. I'm afraid that for arbitrary divisions the time and operations needed to acquire bigint inverse can be bigger then the standard divisions itself, but as I am not familiar with it I could be wrong.
The most common divider in use I saw in implemetations are Newton–Raphson division which is very similar to approximation divider in the link above.
Approximation/iterative dividers usually use multiplication which define their speed.
For small enough numbers is usually long binary division and 32/64bit digit base division fast enough if not fastest: usually they have small overhead, and let n be the max value processed (not the number of digits!)
Binary division example:
Is O(log32(n).log2(n)) = O(log^2(n)).
It loops through all significant bits. In each iteration you need to compare, sub, add, bitshift. Each of those operations can be done in log32(n), and log2(n) is the number of bits.
Here example of binary division from one of my bigint templates (C++):
template <DWORD N> void uint<N>::div(uint &c,uint &d,uint a,uint b)
{
int i,j,sh;
sh=0; c=DWORD(0); d=1;
sh=a.bits()-b.bits();
if (sh<0) sh=0; else { b<<=sh; d<<=sh; }
for (;;)
{
j=geq(a,b);
if (j)
{
c+=d;
sub(a,a,b);
if (j==2) break;
}
if (!sh) break;
b>>=1; d>>=1; sh--;
}
d=a;
}
N is the number of 32 bit DWORDs used to store a bigint number.
c = a / b
d = a % b
qeq(a,b) is a comparison: a >= b greater or equal (done in log32(n)=N)
It returns 0 for a < b, 1 for a > b, 2 for a == b
sub(c,a,b) is c = a - b
The speed boost is gained from that this does not use multiplication (if you do not count the bit shift)
If you use digit with a big base like 2^32 (ALU blocks), then you can rewrite the whole in polynomial like style using 32bit build in ALU operations.
This is usually even faster then binary long division, the idea is to process each DWORD as a single digit, or recursively divide the used arithmetic by half until hit the CPU capabilities.
See division by half-bitwidth arithmetics
On top of all that while computing with bignums
If you have optimized basic operations, then the complexity can lower even further as sub-results get smaller with iterations (changing the complexity of basic operations) A nice example of that are NTT based multiplications.
The overhead can mess thing up.
Due to this the runtime sometimes does not copy the big O complexity, so you should always measure the tresholds and use faster approach for used bit-count to get the max performance and optimize what you can.
First off, if there's a better site to ask this question then please do migrate this or close it and let me know where to go.
Secondly, we're discussing CRC in one of my classes, and neither us nor the professor understand why CRC polynomials are one bit longer than the name (or resulting checksum) suggest. I've done some searching, but nothing seems to discuss why it's one bit longer.
A CRC is the remainder after dividing the message by the polynomial. By definition, the remainder has to be less than the length of the polynomial. Hence the CRC for a "33-bit" polynomial is 32 bits.
Note that the largest exponent of a "33-bit" polynomial is 32 (the lowest term has exponent zero), so the degree of the polynomial, as well as the length of the CRC is 32.