I know that CRC is a linear function which means CRC(x xor y) = CRC(x) xor CRC(y), but I don't know how to prove this property for CRC.
Does anyone have any idea?
Thanks a lot!
That is not generally true. It is only true for CRCs with the property that the CRC of a string of zeros is always zero. (That property is easily derived from your equation.) Most CRCs have pre- and post-processing, and one purpose of the pre-processing is to assure that that is not the case. You wouldn't want a check algorithm that is unable to distinguish how many zeros there are in a string of zeros. Similarly, for such a check algorithm you could prepend any number of zeros to a message with no change in the check value.
A "pure" CRC without pre- or post-processing does have the linearity property you define. This can be seen by looking at what a CRC register implementation does with a single bit, and how that changes if you invert the bit. The bit rolled off of one end of the register, which is determined by the bit you fed into the other end, determines whether the register is exclusive-ored with the polynomial word. If that bit is inverted, that reverses the decision. So the exclusive-or of those two CRCs is the polynomial word. If you feed a single one bit into that end of a register initialized to zero (this is where the absence of pre-processing is important), you also get the polynomial word. So the CRC of the exclusive-or of the messages is equal to the exclusive-or of the CRCs. This is then extended to all bits by applying the finding one bit at a time.
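As a quick numerical check (a sketch, not any particular library's implementation), here is a pure reflected CRC-32 with a zero initial register and no final XOR, which exhibits exactly this linearity:

```python
def pure_crc32(data: bytes) -> int:
    """Pure reflected CRC-32 (poly 0xEDB88320): register starts at 0,
    no final XOR, so CRC(x ^ y) == CRC(x) ^ CRC(y)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc

x = bytes([0x12, 0x34, 0x56, 0x78])
y = bytes([0xAB, 0xCD, 0xEF, 0x01])
xored = bytes(a ^ b for a, b in zip(x, y))
assert pure_crc32(xored) == pure_crc32(x) ^ pure_crc32(y)
```

Adding the usual pre- and post-inversion breaks this identity, as described above.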
For some very specific values, such as
FF FF FF FF 80 20 83 B8 ED
the CRC32 (using polynomial 0x04C11DB7 and pre and post-inversion) is 0xFFFFFFFF (crccalc.com).
Adding any number of trailing zeroes does not change the result (since that just multiplies the message polynomial).
My confusion is that, according to Wikipedia, post-inversion was supposed to prevent exactly that:
A similar solution can be applied at the end of the message, inverting the CRC register before it is appended to the message. Again, any non-zero change will do; inverting all the bits (XORing with an all-ones pattern) is simply the most common.
That doesn't seem to be the case. Also, this answer by Mark Adler suggests that the post-inversion is just so the CRC of an empty message is 0x00000000.
Is the Wikipedia article incorrect or did I misunderstand something?
For any n-bit CRC and any current state of the CRC, there will exist a sequence of n bits in the message that will bring the internal CRC register to all zero bits. And many sequences of more than n bits that will do the same. From there on, any application of zero bits will leave the register all zeros.
That n-bit sequence is easily found. It is the internal CRC register bits themselves at that point. For example, the standard CRC-32 that you reference, when applied to the nine-byte message "123456789" is 0xcbf43926. Since the final exclusive-or is 0xffffffff, the internal CRC register at the end is the complement of that, 0x340bc6d9. This is a reflected CRC, so you need to feed that value starting from its least significant bit. Then you find that the CRC-32 of "123456789\xd9\xc6\x0b\x34" is 0xffffffff. Now I can follow that message with any number of zeros, and still get 0xffffffff. E.g. "123456789\xd9\xc6\x0b\x34\x00\x00\x00".
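This can be reproduced with Python's `zlib.crc32`, which implements this standard CRC-32:

```python
import zlib

msg = b"123456789"
assert zlib.crc32(msg) == 0xCBF43926

# Append the internal register (~0xCBF43926 = 0x340BC6D9),
# least significant byte first, since this CRC is reflected:
zeroing = msg + b"\xd9\xc6\x0b\x34"
assert zlib.crc32(zeroing) == 0xFFFFFFFF

# The internal register is now all zeros, so trailing zeros change nothing:
assert zlib.crc32(zeroing + b"\x00" * 3) == 0xFFFFFFFF
```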
However, that is the only such four-byte sequence that will do that. In general, the probability of bringing the internal CRC register to all zeros with an appended sequence of n or more random bits is 2^-n. So unless you're being deliberate about it, this insensitivity to a subsequent sequence of zero bits will happen very infrequently.
Initializing the internal CRC register to a non-zero value, as many CRC definitions do, avoids this behavior at the very start of the process. It may not be unusual for the start of a message to be a sequence of zeros, so you would like for the CRC to be sensitive to the length of that sequence.
The final exclusive-or doesn't change the behavior. All it does is change the final CRC value that you would be stuck at once you arrive at the state of the internal CRC register being zero.
As you noted, the final exclusive-or is often set equal to the initial CRC register value, so that you get what some might consider to be a "nice" behavior that the CRC of an empty message is zero.
If the CRC of the message before post-inversion is 0, as in the question's example, adding trailing zeroes won't change the CRC. The post-inversion detection of trailing zeroes only works when the CRC before post-inversion is not 0.
The algorithm to calculate a CRC involves dividing (mod 2) the data by a polynomial, and that, by nature, starts at the biggest bit using the basic long-division algorithm and works down (unless you're taking shortcuts and using tables).
Now, the stream I'm dealing with has the requirement that the data is added little endian and the CRC remainder goes at the end, whereas if the CRC was applied and appended, the CRC remainder bits would appear at the leftmost point in the least significant bit given the bitstream is little endian.
So here's the question. We have a little endian stream with the CRC remainder at the "unexpected" end (correct me if I'm wrong please), should the CRC remainder be added big endian at the end of the bytestream, and then the CRC run on the whole bytestream (this is what I expect from the requirements) or something else?
How in industry is this normally done?
Major update for clarity thanks to Mark Adler's highly helpful answer.
I've read a few posts, but seen nothing where there seems to be a little endian bytestream with a CRC in the MSB (rightmost).
The image above should describe what I'm doing. All the bytes are big endian bit order, but the issue is that the requirements state that the bytes should be little endian byte ordered, and then the CRC tacked on the end.
For the bytestream as a sequence of bits to be validated by the CRC remainder being placed at the end, the CRC remainder bytes should be added big endian, therefore allowing the message as a whole to be validated with the polynomial. However, this involves adding bytes to the stream in a mix of endiannesses, which sounds highly ugly and wrong.
I will assume that by "biggest" bit, you mean most significant. First off, you can take either the most-significant bit or the least-significant bit of the first byte as the highest power of x for polynomial division. Both are in common use. There is no "by nature" here. And that has nothing to do with whether tables are used or not. Taking the least-significant bit as the highest power of x, the one you would call "not by nature" is in very common use, due to slightly faster and simpler software implementations as compared to using the most-significant bit.
Second, bit streams are neither "little endian", nor "big endian". Those terms are used for how integers are broken up into a sequence of bytes. That has nothing to do with the interpretation of a stream of bits as a polynomial. The terms you seem to be looking for are "reflected" and "not reflected" bit streams in and CRCs out. "reflected" means that the highest power of x is the least significant bit, and "not reflected" means it is the most significant bit.
If you look at Greg Cook's catalogue of CRCs, you will see as part of each definition refin=false refout=false or refin=true refout=true, meaning that the data coming in is reflected or not, and the CRC coming out is reflected or not, referring to where the highest power of x is found. For the CRC, the entire n-bits is reflected or not. In actual implementations, no bits are flipped for the input data or the output CRC. Instead, the constant CRC polynomial is reflected to match the data and CRC reflections. That is done once as the code is written, never during execution. (There is one outlier CRC in Greg's catalogue, CRC-12/UMTS, that has refin=false refout=true. For that one, the implementation would in fact have to reflect the CRC result every time.)
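As an illustration, reflecting the polynomial constant is a one-time operation done as the code is written; the helper name `reflect32` below is hypothetical:

```python
def reflect32(x: int) -> int:
    """Reverse the bit order of a 32-bit value."""
    r = 0
    for _ in range(32):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r

# The standard CRC-32 polynomial, reflected once, gives the constant
# seen in reflected (least-significant-bit-first) implementations:
assert reflect32(0x04C11DB7) == 0xEDB88320
```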
Given all that, I am left attempting to interpret your question. What do you mean by "the data is added little endian"? Does that mean the CRC is being calculated using the least-significant bit as the highest power of x (the opposite of your "by nature")? What does "the CRC remainder bits would appear at the leftmost point in the least significant bit given the bitstream is little endian" mean? That one is really confusing, since there is no leftmost point of a bit, and I can't tell at all what you're trying to say about the arrangement of the remainder bits.
The only thing I think I understand and can try to answer here is: "How in industry is this normally done?"
Well, as you can tell from the list of over a hundred CRCs, there is little normalcy established. What I can say is that CRCs have a special property that leads to a "natural" (now I can use that word) ordering of the CRC bits and bytes at the end of the stream of bits and bytes that the CRC was calculated on. That property is that if you append it properly, the CRC of the entire message, including the CRC at the end, will always be the same constant, if there are no errors in the message. Now little and big endian are useful terms, but only for the CRC itself, not the bit or byte stream. The proper order is little endian for reflected CRCs and big endian for non-reflected CRCs. (This assumes that the input and output have the same reflection, so this won't work for that one outlier CRC.)
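A sketch with the standard reflected CRC-32 (`zlib.crc32`), appending the CRC little-endian as described; the resulting constant is CRC-32's well-known residue value:

```python
import struct
import zlib

# Appending the CRC little-endian (the proper order for this reflected CRC)
# makes the CRC of the whole message a fixed constant, for any data:
for msg in (b"123456789", b"hello", b""):
    frame = msg + struct.pack("<I", zlib.crc32(msg))
    assert zlib.crc32(frame) == 0x2144DF1C
```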
Of course, I have seen many cases where a reflected CRC is used, but is appended to the stream big-endian, and vice versa, in which case this calculation of the CRC on the entire message doesn't work. That's ok, since the alternative way to check the CRC is to simply repeat what was done before transmission, which is to calculate the CRC only on the data portion of the message, then properly assemble the CRC from the bytes that follow it, and compare the two values. That is what would be done for any other hash that doesn't have that elegant mathematical property of CRCs.
Can anyone with good knowledge of CRC calculation verify that this code
https://github.com/psvanstrom/esphome-p1reader/blob/main/p1reader.h#L120
is actually calculating the CRC according to this description?
CRC is a CRC16 value calculated over the preceding characters in the data message (from "/" to "!") using the polynomial x^16 + x^15 + x^2 + 1. CRC16 uses no XOR in, no XOR out and is computed with least significant bit first. The value is represented as 4 hexadecimal characters (MSB first).
There's nothing in the linked code about where it starts and ends, and how the result is eventually represented, but yes, that code implements that specification.
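For reference, a CRC-16 with that polynomial (x^16 + x^15 + x^2 + 1, i.e. 0x8005, reflected 0xA001), no XOR in or out, and least-significant-bit-first processing matches what CRC catalogues call CRC-16/ARC; a minimal bit-at-a-time sketch:

```python
def crc16_arc(data: bytes) -> int:
    """CRC-16 with poly x^16+x^15+x^2+1 (0x8005, reflected 0xA001),
    no XOR in, no XOR out, least significant bit first."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0xA001 if crc & 1 else 0)
    return crc

# Standard check value for this parameter set:
assert crc16_arc(b"123456789") == 0xBB3D
```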
Sorry if I should be able to answer this simple question myself!
I am working on an embedded system with a 32-bit CRC done in hardware for speed. A utility exists that I cannot modify, which takes three word inputs and returns a CRC.
If a standard 32-bit CRC is implemented, would generating a CRC from one 32-bit word of actual data and two 32-bit words consisting only of zeros produce a less reliable CRC than if I just set some random values for the last two words?
Depending on the CRC/polynomial, my limited understanding of CRC would suggest that the more data you put in, the less accurate it is. But doesn't zeroed data reduce accuracy when performing the shifts?
Using zeros will be no different than some other value you might pick. The input word will be just as well spread among the CRC bits either way.
I agree with Mark Adler that zeros are mathematically no worse than other numbers. However, if the utility you can't change does something bad like set the initial CRC to zero, then choose non-zero pad words. An initial CRC=0 + Data=0 + Pads=0 produces a final CRC=0. This is technically valid, but routinely getting CRC=0 is undesirable for data integrity checking. You could compensate for a problem like this with non-zero pad characters, e.g. pad = -1.
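A sketch of the pitfall, using a hypothetical pure CRC-32 whose initial register is zero (the bad case described above, not the standard CRC-32, which initializes to 0xFFFFFFFF):

```python
def pure_crc32(data: bytes) -> int:
    # Hypothetical utility with initial CRC register = 0 (the bad case).
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc

# All-zero data plus all-zero pad words yields CRC = 0, every time:
assert pure_crc32(b"\x00" * 12) == 0
# A non-zero pad word (e.g. -1 as 0xFFFFFFFF) avoids that degenerate result:
assert pure_crc32(b"\x00" * 4 + b"\xff\xff\xff\xff") != 0
```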
Suppose we have an m-bit message where the last n bits are the CRC bits. As far as I know, in order to check whether it was received correctly, we should XOR all m bits with the polynomial of the specific CRC algorithm. If the result is all zeros, we can say there are no errors.
Here are my questions:
1) What about calculating the n CRC bits from the first (m-n) bits and then comparing them to the last n bits of the received message? This way we can say there are no errors if the received and calculated n bits are equal. Is this approach correct?
2) If it is true, which is more efficient?
Your description of how to check a CRC doesn't really parse. But anyway, yes, the way that a CRC check is normally done is to calculate the CRC of the pre-CRC bits, and then to compare that to the CRC sent. It is very marginally more efficient that way. More importantly, it is more easily verifiable to be correct, since that is the way the CRC is generated and appended on the other end.
That method also extends to any style of check value, whereas other check values do not have the mathematical property of producing zero if you run the algorithm over the data followed by the check value. CRCs with pre- and post-conditioning, which is most of them, won't have that property either; you would need to un-post-condition the result, and then compare it with the pre-condition value.
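A minimal sketch of that usual check, with Python's `zlib.crc32` standing in for the CRC, and the CRC assumed to be appended little-endian at the end of the frame:

```python
import struct
import zlib

msg = b"example payload"
frame = msg + struct.pack("<I", zlib.crc32(msg))  # sender appends CRC

# Receiver: recompute the CRC over the data bytes only, reassemble
# the transmitted CRC from the trailing bytes, and compare the two.
data, crc_bytes = frame[:-4], frame[-4:]
(received_crc,) = struct.unpack("<I", crc_bytes)
assert zlib.crc32(data) == received_crc
```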