Reducing a CRC32 value to 16 or 8 bit - crc

In a message framing scheme I want to secure packets with a CRC for error detection. The packets are sent over a TCP connection.
For small packets with a length of less than 16 bytes I'd choose CRC8, for packets smaller than 4096 bytes a CRC16 algorithm, and for larger packets the CRC32 algorithm.
The most attractive CRC implementation currently is CRC32C because of hardware support (at least on some Intel CPUs). But there are no dedicated instructions for 8-bit and 16-bit CRCs.
My question is now: is it possible to reduce the 32-bit value of the CRC32C algorithm to a 16- or 8-bit value without hurting the error detection performance in comparison to native CRC16 or CRC8 algorithms?
An example:
char buffer[256];
...
uint32_t crc32 = compute_crc32c_of_block( buffer, 256 );
uint16_t fake_crc16 = ( crc32 >> 16 ) ^ crc32;
uint8_t fake_crc8 = ( fake_crc16 >> 8 ) ^ fake_crc16;
Is fake_crc8 as good as a real CRC8 implementation?
Thanks in advance.
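For reference, the compute_crc32c_of_block placeholder above could be implemented with the SSE4.2 CRC32C instruction via compiler intrinsics. This is only a minimal byte-at-a-time sketch, assuming an SSE4.2-capable CPU and a compiler that accepts the intrinsics (e.g. built with -msse4.2):

#include <stdint.h>
#include <stddef.h>
#include <nmmintrin.h>   /* SSE4.2 intrinsics */

/* Byte-at-a-time CRC-32C using the hardware crc32 instruction.
   Initial value and final XOR of 0xFFFFFFFF, as in the usual CRC-32C definition. */
uint32_t compute_crc32c_of_block( const void *buf, size_t len )
{
    const uint8_t *p = buf;
    uint32_t crc = 0xFFFFFFFFu;
    for ( size_t i = 0; i < len; i++ )
        crc = _mm_crc32_u8( crc, p[i] );
    return crc ^ 0xFFFFFFFFu;
}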

The low 8 bits of a 32-bit CRC will not have as good error-detection properties as a dedicated 8-bit CRC, e.g. the assurance of detecting burst errors. However, it may be good enough for your application, depending on the characteristics of the noise source. If you have a good number of bit errors with correlation in their locations, then you should use a real CRC. If you have either rare bit flips or lots of gross errors, then a portion of a CRC would likely work just as well.
There is no substitute for testing to see how they perform.

If you can specify the polynomial for the hardware CRC, then padding it with zeros for the bits that you don't want will result in a CRC that also has zeros at those bit positions. Then you simply discard them.
Using the default data 0x31 0x32 0x33 0x34 0x35 0x36 0x37 0x38 0x39 from the calculator (http://www.sunshine2k.de/coding/javascript/crc/crc_js.html), with initial value 0x0 and final XOR 0x0, here is the example:
CRC8 with 0x7 poly is 0xF4
CRC16 with 0x700 poly is 0xF400
CRC32 with 0x07000000 poly is 0xF4000000
Now, the problem is that the hardware may not support this kind of polynomial. For example, the 16-bit SPI hardware CRC calculator in STM32 processors only supports odd polynomials. 0x7 is odd, but 0x700 is even and produces garbage.
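To double-check the numbers above in software, here is a minimal bit-by-bit sketch, assuming the same calculator settings (MSB-first, initial value 0x0, no final XOR, input "123456789"):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Generic bit-by-bit, MSB-first CRC with init 0 and no final XOR.
   'width' is the CRC size in bits (8, 16 or 32). */
static uint32_t crc_msb_first( const uint8_t *data, size_t len,
                               unsigned width, uint32_t poly )
{
    uint32_t topbit = 1ul << ( width - 1 );
    uint32_t mask   = ( width == 32 ) ? 0xFFFFFFFFul : ( 1ul << width ) - 1;
    uint32_t crc    = 0;

    for ( size_t i = 0; i < len; i++ ) {
        crc ^= (uint32_t)data[i] << ( width - 8 );
        for ( int bit = 0; bit < 8; bit++ )
            crc = ( crc & topbit ) ? ( crc << 1 ) ^ poly : crc << 1;
        crc &= mask;
    }
    return crc;
}

int main( void )
{
    const uint8_t msg[] = "123456789";
    size_t len = strlen( (const char *)msg );

    printf( "CRC8  poly 0x07:       0x%02X\n", (unsigned)crc_msb_first( msg, len,  8, 0x07 ) );       /* 0xF4 */
    printf( "CRC16 poly 0x0700:     0x%04X\n", (unsigned)crc_msb_first( msg, len, 16, 0x0700 ) );     /* 0xF400 */
    printf( "CRC32 poly 0x07000000: 0x%08X\n", (unsigned)crc_msb_first( msg, len, 32, 0x07000000 ) ); /* 0xF4000000 */
    return 0;
}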

Related

Selection of CRC-16-CCITT

Cyclic redundancy checks are used often, and work well with proper configuration. The ITU's ("CCITT") CRC gets used a lot.
Ref - CRC16-CCITT Reference - Joe Geluso
Why are the ITU's CRC values used so frequently? They seem to be a common 'default', so to speak; I'm just curious as to why.
Polynomial 0x11021 is used for floppy disks. Part of the reason for choosing that polynomial is that there are only three 1 bits in 0x1021, which simplifies hardware based CRC calculations. This is also true for 0x10007 (FOP-16) and 0x14003 (CRC16, CRC16-IBM), so I'm not sure why 0x11021 was chosen versus the other two somewhat common ones with only three 1 bits in the lower 16 bits.
0x11021 is also used for XMODEM (a serial file transfer program for old computers), which is typically implemented in software, where the number of 1 bits in the polynomial doesn't matter, but may have been chosen since it was used for floppy disks.
0x11021 is the product of two prime polynomials: 0xf01f and 0x3. The 0x3 (x+1) factor will detect any odd number of bit errors, and the 2-bit error detection is good for up to 32751 data bits + 16 CRC bits = 32767 bits, good enough for floppy disk sector sizes of 128, 256, 512, and 1024 bytes (it could also be used for 2048 bytes, but I don't recall a floppy disk with a 2048-byte sector size). I'm not aware of any advantage in the choice of a polynomial for single-burst error detection. Some polynomials would be better for single-burst error correction, but single-burst correction isn't common.
The two other polynomials I mentioned are similar: 0x10007 = 0xfffd * 0x3, 0x14003 = 0xc001 * 0x3.
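The factorizations above are carry-less (GF(2)) polynomial products and are easy to verify with a few lines of C; this small sketch simply multiplies the factors back together:

#include <stdint.h>
#include <stdio.h>

/* Carry-less (GF(2)) polynomial multiplication: shift-and-XOR, no carries. */
static uint32_t clmul( uint32_t a, uint32_t b )
{
    uint32_t r = 0;
    while ( b ) {
        if ( b & 1 )
            r ^= a;
        a <<= 1;
        b >>= 1;
    }
    return r;
}

int main( void )
{
    printf( "0xf01f * 0x3 = 0x%X\n", (unsigned)clmul( 0xf01f, 0x3 ) ); /* 0x11021 */
    printf( "0xfffd * 0x3 = 0x%X\n", (unsigned)clmul( 0xfffd, 0x3 ) ); /* 0x10007 */
    printf( "0xc001 * 0x3 = 0x%X\n", (unsigned)clmul( 0xc001, 0x3 ) ); /* 0x14003 */
    return 0;
}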

gzip compression on non byte aligned data

Is bit packing detrimental to the performance of gzip?
Assume that I have 7-bit values and pack them in the following way:
Byte1 Byte2 Byte3 Byte4
[aaaaaaab][bbbbbbcc][cccccddd][dddd...
As far as I understand, LZ compression works on a byte basis.
Any repeating pattern in the 7-bit values will be obscured.
Is it advisable to stuff an extra bit per value for byte alignment, to help LZ?
Byte1 Byte2 Byte3 Byte4
[aaaaaaa0][bbbbbbb0][ccccccc0][ddddddd0][...
Are there any results on that in literature?
Likely, yes. If your a, b, c, and d values have repeating patterns or a statistical bias in their frequencies, then it should be better to stuff the zero bits.
The way to know is to simply test it.
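One way such a test could be set up with zlib (the packing helper and the sample data below are made up for illustration; substitute data representative of your application):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zlib.h>   /* link with -lz */

#define NVALUES 4096

/* Pack 7-bit values tightly, MSB-first, with no byte alignment. */
static size_t pack_tight( const uint8_t *vals, size_t n, uint8_t *out )
{
    size_t bitpos = 0;
    memset( out, 0, ( n * 7 + 7 ) / 8 );
    for ( size_t i = 0; i < n; i++ )
        for ( int b = 6; b >= 0; b--, bitpos++ )
            if ( vals[i] & ( 1 << b ) )
                out[bitpos / 8] |= 0x80 >> ( bitpos % 8 );
    return ( bitpos + 7 ) / 8;
}

/* Deflate a buffer and report the compressed size (error handling omitted). */
static unsigned long deflated_size( const uint8_t *buf, size_t len )
{
    uLongf dlen = compressBound( len );
    Bytef *dst = malloc( dlen );
    compress( dst, &dlen, (const Bytef *)buf, len );
    free( dst );
    return dlen;
}

int main( void )
{
    uint8_t vals[NVALUES], aligned[NVALUES], tight[NVALUES];

    /* Placeholder 7-bit data; replace with something representative. */
    for ( size_t i = 0; i < NVALUES; i++ )
        vals[i] = (uint8_t)( i % 16 );

    /* One value per byte; the top bit is the zero stuffing bit. */
    memcpy( aligned, vals, NVALUES );
    size_t tlen = pack_tight( vals, NVALUES, tight );

    printf( "aligned: %u -> %lu bytes deflated\n", (unsigned)NVALUES, deflated_size( aligned, NVALUES ) );
    printf( "tight:   %u -> %lu bytes deflated\n", (unsigned)tlen, deflated_size( tight, tlen ) );
    return 0;
}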

How to calculate CRC of a WinRAR file?

I know the CRC calculation algorithm from Wikipedia. I read about the structure of a RAR file here. For example, it says:
The file has the magic number of:
0x 52 61 72 21 1A 07 00
which breaks down as follows to describe an Archive Header:
0x6152 - HEAD_CRC
0x72 - HEAD_TYPE
0x1A21 - HEAD_FLAGS
0x0007 - HEAD_SIZE
If I understand correctly, the HEAD_CRC (0x6152) is the CRC value of the Marker Block (MARK_HEAD). Somewhere I read that the CRC of a WinRAR file is calculated with the standard polynomial 0xEDB88320, but when the size of the CRC field is less than 4 bytes, only the less significant bytes are used. In this case (if I understand correctly) the CRC value is 0x6152, so it has 2 bytes. Now I don't know which bytes I have to take as less significant. From the standard polynomial (0xEDB88320)? Then 0x8320 would probably be the less significant bytes of that polynomial. Next, how do I calculate the CRC of the Marker Block (i.e. of the bytes 0x 52 61 72 21 1A 07 00), once we have the right polynomial?
There was likely a 16-bit check used by an older format that is not derived from a 32-bit CRC. The standard 32-bit CRC used by zip and rar, applied to the last five bytes of the header, has no portion equal to the first two bytes. The Polish page appears to be incorrect in claiming that the two-byte check is the low two bytes of a 32-bit CRC.
It does appear from the documentation that that header is laid out the same way as the other blocks in the older format; it seems the author, for fun, arranged for his format to give the check value "Ra" so that it could spell out "Rar!" followed by a text-terminating control-Z.
I found another 16-bit check in the unrar source code, but that check does not result in those values either.
Oh, and no, you can't take part of a CRC polynomial and expect it to be a good CRC polynomial for a smaller check. What the page in Polish is saying is that you would compute the full 32-bit CRC and then take the low two bytes of the result. However, that doesn't work for the magic number header.
Per the WinRAR TechNote.txt file included with the install:
The marker block is actually considered as a fixed byte sequence: 0x52 0x61 0x72 0x21 0x1a 0x07 0x00
And as you already indicated, at the very end you can read:
The CRC is calculated using the standard polynomial 0xEDB88320. In case the size of the CRC is less than 4 bytes, only the low order bytes are used.
In Python, the calculation and grabbing of the 2 low order bytes goes like this:
zlib.crc32(correct_byte_range) & 0xffff
rerar has some code that does this, just like the rarfile library that it uses. ReScene .NET source code has an algorithm in C# for calculating the CRC32 hash. See also How do I calculate CRC32 mathematically?
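For completeness, here is a minimal C sketch of the same computation (standard reflected CRC-32 with polynomial 0xEDB88320, then keeping only the low-order two bytes). Note that, as explained in the answer above, this does not reproduce 0x6152 for the marker block itself:

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Standard CRC-32 (reflected, polynomial 0xEDB88320), bit by bit.
   Equivalent to zlib.crc32 over a single buffer. */
static uint32_t crc32_std( const uint8_t *data, size_t len )
{
    uint32_t crc = 0xFFFFFFFFu;
    for ( size_t i = 0; i < len; i++ ) {
        crc ^= data[i];
        for ( int b = 0; b < 8; b++ )
            crc = ( crc & 1 ) ? ( crc >> 1 ) ^ 0xEDB88320u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}

int main( void )
{
    /* HEAD_TYPE, HEAD_FLAGS and HEAD_SIZE bytes of the marker block. */
    const uint8_t hdr[] = { 0x72, 0x21, 0x1A, 0x07, 0x00 };
    uint32_t crc = crc32_std( hdr, sizeof hdr );

    /* "Only the low order bytes are used" for a 2-byte HEAD_CRC field. */
    printf( "CRC32 = 0x%08X, low 16 bits = 0x%04X\n", (unsigned)crc, (unsigned)( crc & 0xFFFF ) );
    return 0;
}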

Difference between byte flip and byte swap

I am trying to find the difference because of the byte flip functionality I see in the Calculator app on Mac in Programmer's view.
So I wrote a program to byte swap a value, which is what we do to go from little endian to big endian or the other way round, and I call that a byte swap. But when I see byte flip I do not understand what exactly it is and how it is different from a byte swap. I did confirm that the results are different.
For example, for an int with value 12976128
Byte Flip gives me 198;
Byte swap gives me 50688.
I want to implement an algorithm for byte flip, since 198 is the value I want to get while reading something. Everything I find on Google about byte flip just points me to byte swap, which isn't what I'm looking for.
Byte flip and byte swap are synonyms.
The results you see are just two different ways of swapping the bytes, depending on whether you look at the number as a 32-bit number (consisting of 4 bytes), or as the smallest size of a number that can hold 12976128, which is 24 bits or 3 bytes.
The 4-byte swap is more usual in computer culture, because 32-bit processors are currently predominant (even 64-bit architectures still do most of their mathematics in 32-bit numbers, partly because of backward-compatible software infrastructure, partly because it is enough for many practical purposes). But the Mac Calculator seems to use the minimum-width swap, in this case a 3-byte swap.
12976128, when converted to hexadecimal, gives you 0xC60000. That's 3 bytes total; each hexadecimal digit is 4 bits, or half a byte wide. The bytes to be swapped are 0xC6, zero, and another zero.
After 3-byte swap: 0x0000C6 = 198
After 4-byte swap: 0x0000C600 = 50688
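A small C sketch that reproduces both numbers (the minimum-width swap below is my reading of what the Calculator's byte flip does, as described above):

#include <stdint.h>
#include <stdio.h>

/* Classic 4-byte (32-bit) byte swap. */
static uint32_t swap32( uint32_t v )
{
    return ( v >> 24 ) | ( ( v >> 8 ) & 0x0000FF00u )
         | ( ( v << 8 ) & 0x00FF0000u ) | ( v << 24 );
}

/* Swap only the minimum number of bytes needed to hold the value. */
static uint32_t swap_min_width( uint32_t v )
{
    int nbytes = 1;
    for ( uint32_t t = v >> 8; t; t >>= 8 )
        nbytes++;

    uint32_t r = 0;
    for ( int i = 0; i < nbytes; i++ )
        r = ( r << 8 ) | ( ( v >> ( 8 * i ) ) & 0xFFu );
    return r;
}

int main( void )
{
    uint32_t x = 12976128;  /* 0x00C60000 */
    printf( "4-byte swap:    %u\n", (unsigned)swap32( x ) );         /* 50688 */
    printf( "min-width swap: %u\n", (unsigned)swap_min_width( x ) ); /* 198 */
    return 0;
}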

compiler and endians

I have an ISA which is "kind of" little endian.
The basic memory unit is an integer, not a byte. For example
00000000: BEFC03FF 00008000
represents that the "low" integer is BEFC03FF and the "high" integer is 00008000.
I need to read the value represented by some bits, for example bits 31 to 47.
What I am doing in VS10 (C++) is generating uint64_t var = 0x00008000BEFC03FF
and after that using the relevant mask and checking the value of var & mask.
Is it legal to do it that way? I am making some assumptions about the bit arrangement inside uint64_t - is that legal?
Can I assume that for every compiler and every OS (without dependency on the hardware) the arrangement of bits in the uint64_t will be this way?
You are right to be concerned; it does matter.
However, in this particular case the ISA is little endian, i.e. if it has AD[31:0], the least significant bit of an integer is packed into bit 0. Assuming your processor is also little endian, there is nothing to worry about: when the data is written to memory, it will have the right byte order
0000 FF
0001 03
0002 ..
Suppose instead that your external bus protocol is big endian and your processor is little endian. Then a 16-bit integer in your processor, say 0x1234, would be 0001_0010_0011_0100 in native format, but 0010_1100_0100_1000 on the bus (assuming the bus is 16 bits wide).
In this case, when multi-byte data crosses the endianness boundary, the hardware will only swap bits inside each byte, because it must preserve the memory contiguity between bytes. After the hardware swap, it becomes:
0000 0001_0010
0001 0011_0100
Then it is up to the software to swap the byte order.
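Assuming a little-endian host, a minimal sketch of the masking approach from the question might look like this (extract_bits is a made-up helper; bits 31..47 are taken as inclusive):

#include <stdint.h>
#include <stdio.h>

/* Extract bits [lo, hi] (inclusive) from a 64-bit value. */
static uint64_t extract_bits( uint64_t v, unsigned lo, unsigned hi )
{
    unsigned width = hi - lo + 1;
    uint64_t mask = ( width == 64 ) ? ~0ull : ( 1ull << width ) - 1;
    return ( v >> lo ) & mask;
}

int main( void )
{
    /* Compose the 64-bit value from the "low" and "high" 32-bit words. */
    uint32_t low  = 0xBEFC03FF;
    uint32_t high = 0x00008000;
    uint64_t var  = ( (uint64_t)high << 32 ) | low;

    printf( "bits 31..47 = 0x%llX\n", (unsigned long long)extract_bits( var, 31, 47 ) ); /* 0x10001 */
    return 0;
}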