Incorrect CRC32 output unless input bytes are equal

I need to reproduce the output of a hardware CRC calculator for testing purposes. It is set up for Ethernet CRC-32, in TRACE32 format:
/CRC 32. 0x04C11DB7 1 1 0FFFFFFFF 0FFFFFFFF (checksum width, polynomial, input reflection, output reflection, CRC init, final XOR)
If I feed it values where the 4 bytes are equal (e.g. 0x12121212, with each byte being 0x12), the output will match what I calculate using CRC32 or Python.
However if I feed it any value where the 4 bytes are not equal, the results are off. For example 0x12341234 will return 0x093c454b (should be 0xa1768922).
Or 0x12345678 will return 0xAF6D87D2 (should be 0x4a090e98).
In the HW I can only select the init value and the polynomial, beyond feeding it 4 bytes to calculate. Rolling calculations (multiple words' worth) behave the same way: the output is correct as long as each word fed to it has all bytes the same; anything else and the output is off.
I am not too knowledgeable about CRC32 and have run out of ideas. Any pointers would be appreciated; what am I doing wrong here? I have double-checked the polynomial, but if that were wrong, I could only get the right results extremely rarely, right?
Thank you!

You get your "should be" results with the correct byte order. You get your "will return" results by reversing the bytes. The bytes 12 34 56 78 give, for the CRC you described, 0x4a090e98. The bytes 78 56 34 12 give the CRC 0xaf6d87d2.
Your question is very vague, with no code or description of how the routine is being used. I can only guess that you are giving your CRC routine a 32-bit value instead of bytes, and it is being processed in little-endian order when you are expecting big-endian order.
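To see this, here is a minimal bit-by-bit model of the CRC you described (reflected, polynomial 0x04C11DB7, init and final XOR 0xFFFFFFFF); the function name and byte arrays are just for illustration:

#include <stdint.h>
#include <stdio.h>
#include <stddef.h>

/* Bit-by-bit reflected CRC-32: polynomial 0x04C11DB7 (0xEDB88320 reflected),
   init 0xFFFFFFFF, final XOR 0xFFFFFFFF -- the parameters in the question. */
static uint32_t crc32(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *data++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0);
    }
    return crc ^ 0xFFFFFFFFu;
}

int main(void)
{
    const uint8_t msb_first[] = { 0x12, 0x34, 0x56, 0x78 }; /* big-endian split */
    const uint8_t lsb_first[] = { 0x78, 0x56, 0x34, 0x12 }; /* little-endian split */
    printf("%08X\n", crc32(msb_first, 4)); /* 4A090E98 -- the "should be" value */
    printf("%08X\n", crc32(lsb_first, 4)); /* AF6D87D2 -- the "will return" value */
    return 0;
}

When all four bytes of a word are equal, both splits produce the same byte sequence, which is exactly why those inputs were the only ones that matched.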

Related

Calculating bytes from sextets in C/C++ for steganography

This should be a simple question. I'm well advanced in developing steganographic code in C, which required manipulating the least significant bit in each R, G, and B channel of a 24-bit (3-byte) pixel of an image. A pair of pixels has 6 bits (which I call a sextet for want of a better word) that can be used, and I have developed code that converts a buffer of bytes to a buffer of sextets, where each byte in the latter buffer only uses the 6 lower-order bits, with the upper 2 bits being discarded when changing pixels. This all works correctly, and I can encode text in any language in an image.
In doing this the application calculates the number of sextets that can be embedded in an image. However, it is useful to know how many bytes can be processed, as the input is originally in bytes and the output is recovered in bytes. As 4 sextets correspond to 3 bytes, I'm using the statement:
maxNumBytes = (3 * maxNumSexts - 2 * (maxNumSexts % 4)) / 4;
which converts and rounds down to a multiple of 3, where maxNumSexts and maxNumBytes are respectively the maximum number of sextets and bytes that can be hidden in an RGB image; these two variables have type int32_t. This formula works but is rather cumbersome, and I was wondering if someone could find something simpler that works correctly.
Incidentally, although the code is in C, this applies exactly in C++, hence that has been included as a tag, and some C++ code may be added later.
Many thanks for any suggestions.
I want all values between 24 and 27 to evaluate to 18, and likewise values between 28 and 31 to evaluate to 21, etc.
Since you want only multiples of 3, the last operation should be the multiplication by 3. And the input value "steps" in increments of 4. So you can use this formula in integer arithmetic:
maxNumBytes = 3 * (maxNumSexts / 4);
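A quick check of that formula (a minimal sketch) shows the requested stepping, 24..27 -> 18 and 28..31 -> 21:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    for (int32_t maxNumSexts = 24; maxNumSexts <= 31; maxNumSexts++) {
        /* integer division truncates, so every group of 4 sextet counts
           maps to the same multiple of 3 bytes */
        int32_t maxNumBytes = 3 * (maxNumSexts / 4);
        printf("%2d sextets -> %2d bytes\n", (int)maxNumSexts, (int)maxNumBytes);
    }
    return 0;
}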
Note 1: However, the actual number of bytes encoded by 27 sextets is 20, because 27 sextets contain 162 bits, which hold 20 full bytes.
Note 2: Yes, a half byte is called a "nibble", from the verb. The form "nybble" is known, but rarely used.

JPEG Huffman Table

I have a question regarding the JPEG Huffman table and using it to construct the symbol/binary string from a tree. Suppose that in a Huffman table, for a 3-bit code length, the number of codes is greater than 6; how do we add all those codes to the tree? If I am correct, only 6 codes can be added at the 3-bit level/depth of the tree. So how do we add the remaining codes if they won't fit at that level? Do we just ignore them?
Example
code length | Total Codes | Codes
3-Bit | 10 | 25 43 34 53 92 A2 B2 63 73 C2
In the above example, if we go in order constructing the symbols/binary strings for the codes, then up 'til A2 we can add codes to the tree at the 3-bit level, but what about B2, 63, 73, C2? It's not possible to add them at the 3-bit level of the tree, so what do we do with them?
Well, clearly, the absolute highest number of "things" that can be represented in 3 bits is 8 (000, 001, 010, 011, 100, 101, 110, 111).
In Huffman encoding, bits represent "left" or "right" in a trie data structure. To be able to "continue", you have to reserve SOME codes to mean "this continues at another level", which is why not all 8 values can be encoded in 3 bits. If you have more values to encode, you need to use more bits for some values - this is the whole point of Huffman coding: SOME combinations are short, others are longer, and sometimes even longer than the original, but because it's based on what is most common, that's fine, because the long ones will be rare...
How to construct and decode a Huffman tree takes about four to five pages in your typical algorithms book, and if you haven't got one of those, you probably want to find one - either a real paper one or an e-book. There are LOTS of them - I'm not going to recommend one, since the ones I have are all 15+ years old.
I should add that I think your question is missing something. Clearly, 3 bits cannot possibly represent 10 values. And you can't build a [meaningful] Huffman tree on 10 values that are all different - unless the idea is to split the values into pairs of {2,5}, {4,3}, {3,4}, {5,3}, {9,2}, {A,2}, {B,2}, {6,3}, {7,3}, {C,2} - which gives a fair number of repeated values; the frequencies of those are:
2 : 5
3 : 5
4 : 2
5 : 2
6 : 1
7 : 1
9 : 1
A : 1
B : 1
C : 1
But that's still too many to represent anything meaningful...
Or is it the other way around, that we are supposed to use the bit values of those to decode? In which case we'd need the tree built from the original data to decode it...
In JPEG, a Huffman code can be up to 16 bits. The DHT marker contains an array of 16 elements giving the number of codes for each length.
The JPEG standard explains how to use the code counts to do the Huffman translation. It is one of the few things explained in detail.
This book explains how it is done from a programmer's perspective.
JPEG Book
The number of codes that exists at any code length depends upon the counts for other lengths.
I am wondering if you are really looking at the count of codes for length 4 rather than 3.
It looks like you're not following the correct procedure when creating your Huffman codes from the JPEG table. The count provided will fit in the number of bits unless the table has been corrupted. Reading out the codes from a DHT marker is really simple. The more complicated part is how you define your lookup table from that data. A logical (but not practical) way is to create a reverse lookup table indexed by the maximum code length (16 bits = 65536 entries in the table). Then, to decode your JPEG data, just pick up 16 bits of compressed data from the input stream and use them as an index into the table, where you'll find the symbol and the actual length of the code. I came up with a way to use a single, much smaller lookup table. I'm not going to share my specific code table method. What I will share is the basic format of the loop to create the codes from a DHT marker:
int iCurrentCode;     // the current Huffman code
int iLength;          // the code length in bits that you're working on
int i;
int iCount;           // the number of codes defined for this length
int iSymbol;          // JPEG symbol defined for each Huffman code
unsigned char *pData; // pointer to the data in the DHT marker

iCurrentCode = 0;     // start with a Huffman code of 0
for (iLength = 1; iLength <= 16; iLength++)
{
    iCount = *pData++;           // get number of symbols for this bit length
    for (i = 0; i < iCount; i++) // read each of the codes for this bit length
    {
        iSymbol = *pData++; // get the JPEG symbol value (e.g. RRRR/SSSS value)
        // It's up to you to create a lookup table from the code and its value
        iCurrentCode++; // the Huffman bit pattern just increments for each code value
    } // for each code defined at this bit length
    iCurrentCode <<= 1; // shift the code left 1 bit to advance to the next bit length
} // for each bit length
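For what it's worth, the "not practical" 65536-entry reverse lookup table described above could be filled from inside that loop like this (a sketch, not the answerer's smaller-table scheme; HuffEntry and add_code are illustrative names, with add_code called where the comment says to create the table entry):

typedef struct {
    unsigned char symbol;  // JPEG symbol (the RRRR/SSSS value)
    unsigned char length;  // actual code length in bits
} HuffEntry;

HuffEntry lookup[65536];   // indexed by the next 16 bits of compressed data

// Every 16-bit value whose top 'length' bits equal 'code' decodes to the
// same symbol, so fill the whole range of indexes sharing that prefix.
void add_code(int code, int length, unsigned char symbol)
{
    int shift = 16 - length;
    int first = code << shift;   // smallest 16-bit index with this prefix
    int count = 1 << shift;      // number of indexes sharing the prefix
    for (int i = 0; i < count; i++) {
        lookup[first + i].symbol = symbol;
        lookup[first + i].length = (unsigned char)length;
    }
}

Decoding then takes 16 bits from the stream, indexes the table, emits lookup[idx].symbol, and consumes only lookup[idx].length bits.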

Outputting Huffman codes to file

I have a program that reads a file and saves the frequency of each character. It then constructs a Huffman tree based on each character's frequency and outputs to a file the Huffman codes for the tree.
So an input like "Hello World" would output this sequence to a file:
01010101 0010 010 010 01010 0101010 000 01010 00101 010 0001
This makes sense because the most frequent characters have the shortest codes. The issue is, this increases the file size ten-fold. I realized the reason is that each 1 and 0 is being represented in memory as its own character, so they each get expanded out to a byte of data.
I was thinking what I could do is convert each code (e.g. "010") to a character and save that to the file - but that would still pad the code out to a byte (or mess it up if the code is longer than a byte).
How do I go about this? I can give code snippets if needed. I'm basically saving each code into a string, so that's why the file is coming out so big (it's outputting each "bit" as a byte). If I were to convert the code to a long, for example, then a code like 00010 would be represented as 2 and a code like 010 would also be represented as 2.
You basically have to do it a byte (or a word) at a time. Maintain a byte which you fill with bits, and a record of how many bits have been filled in so far. When you get to 8, write the byte and start over with an empty one.
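For example, a minimal sketch of such a bit-writer (the names here are illustrative, not from your code):

#include <stdio.h>

/* Accumulate bits into a byte; write the byte out once 8 bits are queued. */
typedef struct {
    FILE *fp;
    unsigned char buffer; /* bits collected so far, MSB first */
    int count;            /* number of bits currently in buffer */
} BitWriter;

void put_bit(BitWriter *w, int bit)
{
    w->buffer = (unsigned char)((w->buffer << 1) | (bit & 1));
    if (++w->count == 8) {        /* byte is full: flush it */
        fputc(w->buffer, w->fp);
        w->buffer = 0;
        w->count = 0;
    }
}

/* Call once at the end: pads the final partial byte with zero bits. */
void flush_bits(BitWriter *w)
{
    while (w->count != 0)
        put_bit(w, 0);
}

/* Writing a code string such as "010" then becomes: */
void put_code(BitWriter *w, const char *code)
{
    for (; *code; code++)
        put_bit(w, *code == '1');
}

Note that you also need to record somewhere (e.g. in a header) how many padding bits were added, or the decoder may read extra symbols out of the final byte.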

Hamming SEC/DED extra parity bit

I'm having some trouble with the SEC/DED error correction code. It seems I've found some cases in which the decoder thinks a double bit flip occurred when only one really occurred. I suppose I did something wrong, but I was not able to understand what.
Let me show you an example.
Suppose I want to encode the 4 bits 1011 using a (7,4) code plus an extra bit needed to perform double-error detection. The coded word should be 00110011, where the most significant bit is the extra parity bit, the following two are p0 and p1, and so on.
Now, let's suppose that during transmission the least significant bit is flipped; the received word will thus be 00110010. The receiver will extract from this code the four received data bits 1010 and construct a new code, which will be 01011010. Finally the receiver will perform a bitwise XOR of the two codes, obtaining 0111. The last three bits say that bit 7 has been flipped (which is right), but the first bit is 0 and, as far as I know, the decoder should treat this situation as if more than one bit flip had occurred.
What did I do wrong?
I think I've solved the problem.
In the example above I calculated the syndrome and then computed a new overall parity bit from the resulting codeword. Instead, I should check the overall parity of the received word, set the error_happened boolean to that value, and then calculate the syndrome.
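To make the corrected order concrete, here is a minimal sketch using my reading of the layout above (MSB = overall parity bit, followed by the (7,4) codeword with parity bits at positions 1, 2 and 4):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t received = 0x32;   /* 00110010: 00110011 with its LSB flipped */

    /* Unpack: bit 7 (MSB) is the overall parity bit; codeword positions
       1..7 follow, so position i sits at bit (7 - i). */
    int overall = (received >> 7) & 1;
    int b[8];
    for (int i = 1; i <= 7; i++)
        b[i] = (received >> (7 - i)) & 1;

    /* Step 1: overall parity of the *received* word (not a recomputed one). */
    int error_happened = overall;
    for (int i = 1; i <= 7; i++)
        error_happened ^= b[i];

    /* Step 2: the (7,4) syndrome from the received bits. */
    int s1 = b[1] ^ b[3] ^ b[5] ^ b[7];
    int s2 = b[2] ^ b[3] ^ b[6] ^ b[7];
    int s4 = b[4] ^ b[5] ^ b[6] ^ b[7];
    int syndrome = (s4 << 2) | (s2 << 1) | s1;

    if (syndrome && error_happened)
        printf("single error at position %d: correct it\n", syndrome); /* 7 here */
    else if (syndrome && !error_happened)
        printf("double error: detected but not correctable\n");
    else if (error_happened)
        printf("single error in the overall parity bit itself\n");
    else
        printf("no error\n");
    return 0;
}

With the received word checked directly, the flipped LSB yields syndrome 7 and odd overall parity, so it is correctly classified as a single, correctable error.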

Reading in Intel Hex file for sorting C++

I need to read in an Intel Hex file which looks something like this:
:0300000002F8D42F
:07000300020096000000005E
:07000B000200B50000000037
:030013000200D414
:03001B000200F3ED
(Yes, some lines are missing and sometimes 1 line only contains 1 byte)
The : is the start code
The first 2 hex characters are the byte count
The next 4 are the address in memory
The next 2 are the record type
The rest is the data (except the last 2 characters)
The last 2 characters are the checksum
More info here (wikipedia)
I need to end up with something like this (no periods, only there for readability):
:10.addr.RT.10bytesofdata.CK
If there is no data from the file for an address I am filling it with 'FF'
So what is the best way to read in and store a file like this if I am going to need to divide up and sort the information by address, byte for byte?
I was hoping to read byte by byte (?) storing the appropriate values into a 2D integer array ordered by the address.
[BC][ADDR][RT][b1][b2][b3][b4][b5][b6][b...16][ck]
[BC][ADDR][RT][b1][b2][b3][b4][b5][b6][b...16][ck]
...
I would like to stay away from using strings so I can more easily calculate checksums.
Also I am using Visual Studio.
Thanks for the help I can post more info if this was not clear enough.
Update: So right now I think I'm reading it in with something like this:
fscanf_s(in_file,"%2X", &BC);
fscanf_s(in_file,"%4X", &ADDR);
fscanf_s(in_file,"%2X", &RT);
Then I'll print it out to a file like this:
fprintf_s(out_file,"%2X", BC);
fprintf_s(out_file,"%04X", ADDR); //this pads with zeros if needed and forces 4 "digits"
fprintf_s(out_file,"%2X", RT);
Now I'm working on a routine for the data. Let me know if anyone has any good ideas. Thanks
I would suggest a Dictionary<RT, byte[]>, and just use a single flat array. Then stride through that array calculating checksums and building the output lines, if all bytes in the line were 0xFF then you can skip appending that line to your output.
Maybe Dictionary<RT, List<byte>> if you can't predict the size of each memory space in advance, but since 4 nibbles of address only allow 64k, I'd just allocate each array to that size immediately.
I'm not sure about a 2D array -- I'd just start with a big 1D array representing (in essence) the target address space. Pre-fill it with FF. Walk through the records from the hex file, and:
fill the values into your array, and
keep track of the highest and lowest addresses encountered.
When you're done, start from the lowest address encountered, and encode data 10 (10h?) bytes at a time until you reach the highest address you wrote to.
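A minimal sketch of that flat-array approach (assuming a 64 KiB address space, only type-00 data records, portable fscanf rather than the Microsoft-specific fscanf_s, and a hypothetical input file name; error handling and the EOF record are glossed over):

#include <stdio.h>
#include <string.h>

int main(void)
{
    static unsigned char mem[0x10000];
    memset(mem, 0xFF, sizeof mem);           /* pre-fill with FF */

    unsigned bc, addr, rt, byte;
    unsigned lowest = 0xFFFF, highest = 0;
    FILE *in = fopen("in.hex", "r");
    if (!in) return 1;

    /* " :" skips whitespace/newlines, then matches the start code. */
    while (fscanf(in, " :%2X%4X%2X", &bc, &addr, &rt) == 3) {
        for (unsigned i = 0; i < bc; i++) {
            if (fscanf(in, "%2X", &byte) != 1) break;
            if (rt == 0 && addr + i < 0x10000) {   /* data record */
                mem[addr + i] = (unsigned char)byte;
                if (addr + i < lowest)  lowest  = addr + i;
                if (addr + i > highest) highest = addr + i;
            }
        }
        fscanf(in, "%2X", &byte);            /* consume the record checksum */
    }
    fclose(in);

    /* Re-emit 16 (0x10) data bytes per line with a recomputed checksum:
       the checksum is the two's complement of the sum of the line's bytes. */
    for (unsigned a = lowest & ~0xFu; a <= highest; a += 16) {
        unsigned char sum = 0x10 + (a >> 8) + (a & 0xFF); /* BC + ADDR + RT(00) */
        printf(":10%04X00", a);
        for (unsigned i = 0; i < 16; i++) {
            printf("%02X", mem[a + i]);
            sum = (unsigned char)(sum + mem[a + i]);
        }
        printf("%02X\n", (unsigned char)-sum);
    }
    return 0;
}

Working on integer byte values like this also keeps the checksum arithmetic trivial, which is the reason given above for avoiding strings.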