CRC checksum calculation algorithm - crc

Can anyone with good knowledge of CRC calculation verify that this code
https://github.com/psvanstrom/esphome-p1reader/blob/main/p1reader.h#L120
is actually calculating crc according to this description?
CRC is a CRC16 value calculated over the preceding characters in the data message (from “/” to “!”) using the polynomial x^16 + x^15 + x^2 + 1. CRC16 uses no XOR in, no XOR out and is computed with least significant bit first. The value is represented as 4 hexadecimal characters (MSB first).

There's nothing in the linked code about where it starts and ends, and how the result is eventually represented, but yes, that code implements that specification.
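A minimal Python sketch of that specification, assuming that the checked range includes both the “/” and the “!” characters and that the hex digits are uppercase (the quoted text does not say either). The reflected form of x^16 + x^15 + x^2 + 1 (0x8005) is 0xA001. With a zero initial value and no final XOR, these parameters match what CRC catalogues list as CRC-16/ARC, whose check value for the ASCII string "123456789" is 0xBB3D. The names crc16_p1 and p1_crc_field are hypothetical.

```python
def crc16_p1(data: bytes) -> int:
    """CRC-16, polynomial x^16 + x^15 + x^2 + 1, processed least significant bit first.

    0xA001 is the bit-reversed form of 0x8005. Initial value 0 ("no XOR in"),
    no final XOR ("no XOR out").
    """
    crc = 0x0000
    for byte in data:
        crc ^= byte                      # bring the next byte in at the low end
        for _ in range(8):
            if crc & 1:                  # bit shifted out decides the feedback
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def p1_crc_field(telegram: bytes) -> str:
    """Hypothetical helper: CRC over '/' through '!' (both assumed included),
    rendered as 4 uppercase hex digits, most significant digit first."""
    start = telegram.index(b"/")
    end = telegram.index(b"!", start)
    return f"{crc16_p1(telegram[start:end + 1]):04X}"


print(hex(crc16_p1(b"123456789")))  # 0xbb3d
```

Comparing the output of p1_crc_field against the four characters that follow “!” in a received telegram is one way to cross-check the linked code.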

Related

Post-inversion of CRC32 result and trailing zeroes

For some very specific values, such as FF FF FF FF 80 20 83 B8 ED, the CRC32 (using polynomial 0x04C11DB7 and pre- and post-inversion) is 0xFFFFFFFF (crccalc.com).
Adding any number of trailing zeroes does not change the result (since that just multiplies the message polynomial).
My confusion is that, according to Wikipedia, post-inversion was supposed to prevent exactly that:
A similar solution can be applied at the end of the message, inverting the CRC register before it is appended to the message. Again, any non-zero change will do; inverting all the bits (XORing with an all-ones pattern) is simply the most common.
That doesn't seem to be the case. Also, this answer by Mark Adler suggests that the post-inversion is just so the CRC of an empty message is 0x00000000.
Is the Wikipedia article incorrect or did I misunderstand something?
For any n-bit CRC and any current state of the CRC, there will exist a sequence of n bits in the message that will bring the internal CRC register to all zero bits. And many sequences of more than n bits that will do the same. From there on, any application of zero bits will leave the register all zeros.
That n-bit sequence is easily found. It is the internal CRC register bits themselves at that point. For example, the standard CRC-32 that you reference, when applied to the nine-byte message "123456789" is 0xcbf43926. Since the final exclusive-or is 0xffffffff, the internal CRC register at the end is the complement of that, 0x340bc6d9. This is a reflected CRC, so you need to feed that value starting from its least significant bit. Then you find that the CRC-32 of "123456789\xd9\xc6\x0b\x34" is 0xffffffff. Now I can follow that message with any number of zeros, and still get 0xffffffff. E.g. "123456789\xd9\xc6\x0b\x34\x00\x00\x00".
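The construction above can be reproduced with Python's zlib.crc32, which computes this standard CRC-32: undo the final exclusive-or to recover the internal register, append the register bytes least significant byte first, and any trailing zeros after that leave the result stuck at 0xffffffff.

```python
import zlib

msg = b"123456789"
crc = zlib.crc32(msg)                     # standard CRC-32: 0xcbf43926
register = crc ^ 0xFFFFFFFF               # undo the final XOR: 0x340bc6d9
# Reflected CRC: feed the register back starting from its least significant byte.
forcing = register.to_bytes(4, "little")  # b"\xd9\xc6\x0b\x34"

print(hex(zlib.crc32(msg + forcing)))     # 0xffffffff
for zeros in range(1, 4):
    # Register is now all zeros, so appended zero bytes cannot move it.
    print(hex(zlib.crc32(msg + forcing + bytes(zeros))))  # 0xffffffff each time
```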
However, that is the only such sequence of four bytes that will do that. In general, the probability of bringing the internal CRC register to all zeros given any appended sequence of n or more random bits will be 2^-n. So unless you're being deliberate about it, this insensitivity to a subsequent sequence of zero bits will happen very infrequently.
Initializing the internal CRC register to a non-zero value, as many CRC definitions do, avoids this behavior at the very start of the process. It may not be unusual for the start of a message to be a sequence of zeros, so you would like for the CRC to be sensitive to the length of that sequence.
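A small sketch of that point, using a bitwise reflected CRC-32 whose initial value and final XOR are parameters (crc32_reflected is a hypothetical name). With init = 0 and no final XOR, every run of zero bytes gives 0 regardless of its length; with the standard init = 0xffffffff the result changes with the number of zeros.

```python
def crc32_reflected(data: bytes, init: int, xorout: int) -> int:
    """Bitwise reflected CRC-32 (polynomial 0x04C11DB7, reflected as 0xEDB88320)."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ xorout


for n in range(1, 5):
    zeros = bytes(n)
    pure = crc32_reflected(zeros, init=0, xorout=0)                       # always 0
    standard = crc32_reflected(zeros, init=0xFFFFFFFF, xorout=0xFFFFFFFF)
    print(n, hex(pure), hex(standard))
```

With init = 0xffffffff and xorout = 0xffffffff this is the same CRC-32 as in the example above.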
The final exclusive-or doesn't change the behavior. All it does is change the final CRC value that you would be stuck at once you arrive at the state of the internal CRC register being zero.
As you noted, the final exclusive-or is often set equal to the initial CRC register value, so that you get what some might consider to be a "nice" behavior that the CRC of an empty message is zero.
If the CRC of the message before post-inversion is 0, as in the question's example, adding trailing zeroes won't change the CRC. Trailing zeroes are only detected when the CRC before post-inversion is not 0.

crc32 hash default/invalid value?

I am building a simple string ID system using crc32 to generate 32-bit integer handles from my strings. I'd like the hash inside my StringID wrapper class to default to an invalid index. Is there a value that crc32 will never generate? Will I have to use a separate flag?
Clarification: I am not interested in language specific answers. I'd simply like to know if there is an integer outside of the crc32 range that can be used to represent an unhashed value.
Thanks!
Is there a value that crc32 will never generate?
No, it will generate any/all values in the range of a 32-bit integer.
Will I have to use a separate flag?
Not necessarily.
If you decide that (e.g.) 0x00000000 means "CRC not set" and non-zero is the CRC value; then after calculating the CRC (but before storing it or checking the stored value) you can do if(CRCvalue == 0) CRCvalue = 0xFFFFFFFF;.
This weakens the CRC by an extremely tiny amount. Specifically, for 2 random pieces of data, pure CRC32 gives 1 chance in 4294967296 (2^32) of the CRCs matching. With "zero means unset" and zero remapped to 0xFFFFFFFF (which makes 0xFFFFFFFF twice as likely), there's 1 chance in 2^64/(2^32 + 2), about 4294967294.0000000009, of the CRCs matching.
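A minimal sketch of that convention in Python, using zlib.crc32 as the CRC-32. UNSET, string_id and StringID are hypothetical names for illustration. The empty string already has a CRC-32 of 0, so the remapping is not just a theoretical corner case.

```python
import zlib
from typing import Optional

UNSET = 0  # hypothetical convention: 0 means "no hash computed yet"


def string_id(s: str) -> int:
    """CRC-32 of the UTF-8 bytes, with 0 remapped to 0xFFFFFFFF so it never equals UNSET."""
    h = zlib.crc32(s.encode("utf-8"))
    return 0xFFFFFFFF if h == UNSET else h


class StringID:
    """Hypothetical wrapper: holds UNSET until constructed from a string."""

    def __init__(self, s: Optional[str] = None) -> None:
        self.value = UNSET if s is None else string_id(s)

    def is_set(self) -> bool:
        return self.value != UNSET


print(hex(string_id("")))    # 0xffffffff: CRC-32 of empty input is 0, so it is remapped
print(StringID().is_set())   # False
```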
There is an easy demonstration that you can generate any crc32 value. The CRC is the remainder of a division mod P (where P is the generator polynomial), using polynomial arithmetic over the Galois field GF(2) (which is a field, just as the real or complex numbers are). You can subtract from your polynomial its remainder (subtraction is an XOR operation, so adding and subtracting are indeed the same thing), giving a multiple of P with a remainder of 0. Then you can add to this multiple of the modulus any of the possible 32-bit values (these are already remainders of divisions, so their remainder is just themselves) to get any of the 2^32 possible results.
It is common practice to append 32 zero bits to make room for a full 32-bit word (this amounts to multiplying by the constant x^32), and then subtract (XOR) the remainder from that, making the result a multiple of the modulus (remember that addition and subtraction are the same XOR operation) and so making crc32(pol) = 0x00000000.
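Both steps can be checked with plain Python integers standing in for GF(2) polynomials (bit k is the coefficient of x^k). This sketch computes a bare polynomial remainder with the full 33-bit CRC-32 generator 0x104C11DB7; it deliberately skips the bit reflection, initial value and final XOR of the standard CRC-32, so its numbers are not zlib's. gf2_mod is a hypothetical helper name.

```python
POLY = 0x104C11DB7  # x^32 + x^26 + x^23 + ... + x + 1, including the x^32 term


def gf2_mod(value: int, poly: int = POLY) -> int:
    """Remainder of a GF(2) polynomial (packed into an int) modulo poly."""
    deg = poly.bit_length() - 1
    while value.bit_length() > deg:
        # Subtract (XOR) poly aligned under the current leading term.
        value ^= poly << (value.bit_length() - 1 - deg)
    return value


msg = int.from_bytes(b"any message at all", "big")

# Any remainder is reachable: cancel the current remainder, then add the target.
r = gf2_mod(msg)
target = 0xDEADBEEF
print(gf2_mod(msg ^ r) == 0)                # True: a multiple of P
print(gf2_mod(msg ^ r ^ target) == target)  # True

# Augmented form: append 32 zero bits (multiply by x^32), then XOR in the remainder.
shifted = msg << 32
crc = gf2_mod(shifted)
print(gf2_mod(shifted ^ crc) == 0)          # True
```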
Edit (easier to see):
Indeed, each of the 2^32 possible values for crc32, when divided by the generator polynomial, gives itself as the remainder (each has degree less than 32, the degree of the generator, just as the integers 0 .. N-1 are their own remainders when doing arithmetic modulo N), so they all are possible results of the crc32() operator.
The crc operation, as implemented in many places, is not that simple: some implementations initialize the remainder register to 0xffffffff and complement the register (XOR with 0xffffffff) at termination (indeed, crc32 does this). If you do the maths, you'll see the reason for that: initializing the register to 0xffffffff is equivalent to having a previous remainder of 0xffffffff from a longer string, which makes the remainder sensitive to a string of zeros prepended to the crc32-calculated string, while the final complement only maps each result to a different 32-bit value. Anyway, these modifications don't alter the underlying algorithm of calculating a polynomial remainder, so any of the 2^32 values is possible in this case too.
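One concrete way to see what the non-zero initial value does, sketched for the direct (non-augmented) reflected CRC-32 register update: starting the register at 0xffffffff yields exactly the same register as starting at zero with the first 32 message bits complemented. Because complementing the first word is an invertible change to the input, the set of reachable results is unchanged. crc32_direct and complement_first_word are hypothetical names, and the final XOR is left out on purpose.

```python
def crc32_direct(data: bytes, init: int) -> int:
    """Reflected CRC-32 register update (polynomial 0xEDB88320), no final XOR."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc


def complement_first_word(data: bytes) -> bytes:
    """Flip the first 32 bits of the message (assumes at least 4 bytes)."""
    return bytes(b ^ 0xFF for b in data[:4]) + data[4:]


msg = b"123456789"
a = crc32_direct(msg, init=0xFFFFFFFF)
b = crc32_direct(complement_first_word(msg), init=0)
print(a == b)                  # True
print(hex(a ^ 0xFFFFFFFF))     # 0xcbf43926 once the usual final XOR is applied
```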
No. A CRC-32 can be any 32-bit value. You'll need to indicate an invalid index somewhere else.
My spoof code allows you to choose bit locations in a message to modify and the desired CRC, and will solve for which of those locations to flip to get exactly that CRC.

Why are zeros appended to the actual message while calculating a CRC?

I was studying CRC algorithms and was referring to "A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS" and "CRC Implementation Code in C".
Both pages mention that if, say, we need an 8-bit CRC, 8 zeros are appended to the message before starting the calculation. Why is this done?
So that the CRC can replace those zeros, effectively appending the CRC to the message, resulting in a message+CRC with a total CRC of zero.
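A minimal bit-by-bit sketch of the augmented approach, using the 8-bit generator x^8 + x^2 + x + 1 (0x07) as an example polynomial; neither linked page is claimed to use this exact code. remainder8 performs plain long division of the message bits, crc8 appends the 8 zero bits first, and putting the CRC in place of those zeros leaves a remainder of zero. With no reflection, a zero initial value and no final XOR, this is the CRC catalogued as CRC-8/SMBUS (check value 0xF4). Both function names are hypothetical.

```python
POLY8 = 0x07  # low 8 bits of x^8 + x^2 + x + 1; the x^8 term is implicit


def remainder8(data: bytes, poly: int = POLY8) -> int:
    """Long division of the message bits (MSB first) by the generator."""
    reg = 0
    for byte in data:
        for i in range(7, -1, -1):
            top = reg & 0x80                               # bit about to be shifted out
            reg = ((reg << 1) | ((byte >> i) & 1)) & 0xFF  # shift in the next message bit
            if top:
                reg ^= poly                                # subtract (XOR) the generator
    return reg


def crc8(data: bytes) -> int:
    # Append 8 zero bits (multiply by x^8) to make room for the remainder.
    return remainder8(data + b"\x00")


msg = b"123456789"
c = crc8(msg)
print(hex(c))                         # 0xf4
print(remainder8(msg + bytes([c])))   # 0: the CRC has replaced the appended zeros
```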

Prove the linearity for CRC

I know that CRC is a linear function which means CRC(x xor y) = CRC(x) xor CRC(y), but I don't know how to prove this property for CRC.
Does anyone have any idea?
Thanks a lot!
That is not generally true. It is only true for CRCs that have the property that a CRC of a string of zeros is always zero. (That property is easily derived from your equation.) Most CRCs have pre- and post-processing, for which one of the purposes of the pre-processing is to assure that that is not the case. You wouldn't want a check algorithm that cannot distinguish how many zeros there are in a string of zeros. Similarly, for such a check algorithm you could prepend any number of zeros to a message with no change in the check value.
A "pure" CRC without pre or post processing does have the linearity property you define. This can be seen by looking at what CRC register implementation does with a single bit and how that changes if you invert the bit. The one bit rolled off of one end of the register, which is determined by the bit you fed into the other end, determines if the register is exclusive-ored with the polynomial word. If that bit is inverted, that reverses that decision. So the exclusive-or of those two CRCs is the polynomial word. If you feed a single one bit out to that end of the register initialized as zero (this is where the no pre-processing is important), you get the polynomial word. So the CRC of the exclusive-or of the messages is equal to the exclusive-or of the CRCs. This is then extended to all bits by applying this finding one bit at a time.
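A quick numerical check of both claims, sketched in Python: a bitwise reflected CRC-32 with no pre- or post-processing (crc32_pure, a hypothetical name) is linear for equal-length messages, while the standard zlib.crc32 is only affine; its correction term is the CRC of an all-zero message of the same length.

```python
import zlib


def crc32_pure(data: bytes) -> int:
    """Reflected CRC-32 with no pre- or post-processing (init 0, no final XOR)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc


def xor_bytes(a: bytes, b: bytes) -> bytes:
    assert len(a) == len(b), "linearity is stated for equal-length messages"
    return bytes(p ^ q for p, q in zip(a, b))


x = b"hello, world"
y = b"HELLO? WORLD"
z = xor_bytes(x, y)
zeros = bytes(len(x))

print(crc32_pure(z) == crc32_pure(x) ^ crc32_pure(y))                     # True
# Standard CRC-32: False whenever zlib.crc32(zeros) != 0 ...
print(zlib.crc32(z) == zlib.crc32(x) ^ zlib.crc32(y))
# ... but XORing in the CRC of the all-zero message restores equality.
print(zlib.crc32(z) == zlib.crc32(x) ^ zlib.crc32(y) ^ zlib.crc32(zeros))  # True
```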

CRC Procedure - Checking Efficiently

Suppose we receive an m-bit message whose last n bits are the CRC bits. As far as I know, in order to check whether it was received correctly, we should XOR all m bits with the polynomial of the specific CRC algorithm. If the result is all zeros, we can say there are no errors.
Here are my questions:
1) What about calculating the n CRC bits from the first (m-n) bits and then comparing them to the last n bits of the received message? This way we can say there are no errors if the received and calculated n bits are equal. Is this approach correct?
2) If it is true, which is more efficient?
Your description of how to check a CRC doesn't really parse. But anyway, yes, the way that a CRC check is normally done is to calculate the CRC of the pre-CRC bits, and then to compare that to the CRC sent. It is very marginally more efficient that way. More importantly, it is more easily verifiable to be correct, since that is the way the CRC is generated and appended on the other end.
That method works for any kind of check value, including ones that do not have the mathematical property of yielding zero when you run the CRC through the algorithm after the data that precedes it. CRCs with pre- and post-conditioning, which is most of them, won't have that property either. You would need to undo the post-conditioning, and then compare the result with the pre-condition value.
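A minimal sketch of both checks using zlib.crc32, assuming a hypothetical framing (append_crc) in which the CRC-32 is appended least significant byte first. Recomputing and comparing is the approach recommended above. Running the CRC over the whole frame gives the same value for every error-free frame, but because CRC-32 has pre- and post-conditioning that value is not zero, so the receiver has to know it in advance; here it is simply taken from a known-good frame for illustration.

```python
import zlib


def append_crc(payload: bytes) -> bytes:
    """Sender side (assumed framing): CRC-32 appended least significant byte first."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def check_by_recompute(frame: bytes) -> bool:
    """Recompute the CRC of the payload and compare it with the transmitted one."""
    payload, sent = frame[:-4], int.from_bytes(frame[-4:], "little")
    return zlib.crc32(payload) == sent


def check_by_residue(frame: bytes, residue: int) -> bool:
    """Run the CRC over payload + CRC and compare with a fixed residue value."""
    return zlib.crc32(frame) == residue


good = append_crc(b"some payload")
RESIDUE = zlib.crc32(good)  # same for every error-free frame in this framing

print(check_by_recompute(good))                                 # True
print(zlib.crc32(append_crc(b"another payload")) == RESIDUE)    # True
```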