CRC on portion of data

Is there a way to do the following?
For example, I have the value 0x1100000000 (5 bytes total: 1 actual data byte and 4 zero padding bytes).
I can calculate the needed CRC (CRC-32 in my case) over this data without any problems.
Then, knowing the total CRC of the 5 bytes and the number of zero padding bytes, is it possible to compute the CRC of the data byte alone (0x11 in my case)?
-Thanks in advance
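One possible approach, sketched under an assumption the question does not confirm: that "crc32" means the common reflected CRC-32 (polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF, as used by zlib and PNG). For that variant, each zero bit fed into the CRC register is a reversible step, so the trailing zero bytes can be undone one bit at a time. The names crc32Bitwise and crc32WithoutTrailingZeros are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reflected CRC-32 polynomial (assumed variant: zlib / PNG / Ethernet).
const uint32_t kPoly = 0xEDB88320u;

// Plain bitwise CRC-32, included only so the reversal can be checked.
uint32_t crc32Bitwise(const uint8_t* data, std::size_t length)
{
    uint32_t crc = 0xFFFFFFFFu;                 // initial register value
    for (std::size_t i = 0; i < length; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ ((crc & 1u) ? kPoly : 0u);
    }
    return crc ^ 0xFFFFFFFFu;                   // final XOR
}

// Given the CRC of (data followed by zeroBytes zero bytes),
// returns the CRC of data alone.
uint32_t crc32WithoutTrailingZeros(uint32_t crc, std::size_t zeroBytes)
{
    crc ^= 0xFFFFFFFFu;                         // undo the final XOR
    for (std::size_t i = 0; i < zeroBytes * 8; ++i) {
        // Forward step for a zero bit: crc = (crc >> 1) ^ (lowBit ? kPoly : 0).
        // kPoly has bit 31 set and (crc >> 1) never does, so bit 31 of the
        // result tells us whether the XOR with kPoly happened.
        if (crc & 0x80000000u)
            crc = ((crc ^ kPoly) << 1) | 1u;
        else
            crc <<= 1;
    }
    return crc ^ 0xFFFFFFFFu;                   // redo the final XOR
}
```

With the 5-byte example from the question, crc32WithoutTrailingZeros(crcOfAllFiveBytes, 4) should equal the CRC computed over the single byte 0x11. The loop costs 8 steps per padding byte; a different CRC variant (non-reflected, other polynomial or XOR values) would need the steps adjusted accordingly.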

Related

What does "Addresses are for individual bytes (8 bits)" mean?

Can someone explain this for me?
Addresses are for individual bytes (8 bits)
I have pasted the entire paragraph below:
The MIPS has a 32 bit architecture, with 32 bit instructions, a 32
bit data word, and 32 bit addresses.
It has 32 addressable internal registers requiring a 5 bit register address. Register 0 always has the constant value 0.
Addresses are for individual bytes (8 bits) but instructions must have
addresses which are a multiple of 4. This is usually stated as “instructions must be word aligned in memory.”
Link to pdf:
http://web.cs.mun.ca/~paul/cs3725/material/review.pdf
In the code below, I don't understand IMem[i] = bitset<8>(line)

explain this “Addresses are for individual bytes (8 bits)” for me?
It means that the size of a byte is 8 bits, and that each address refers to one byte. Two adjacent addresses are therefore 8 bits apart. A 32 bit word consists of 4 bytes.
Furthermore it means that - even though address operands of instructions must be aligned to a 4 byte boundary as explained in the following sentence - each byte has a unique address.
By unique address, do you mean unique 5 bit values?
No. The memory addresses are 32 bit values.
where are addresses usually saved?
Wherever any values are saved. In the given description, two possible places have been described: in memory, or in a register.
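As a rough illustration of byte addressing, here is a minimal sketch that models memory as one array element per byte address and fetches a 32-bit word from a word-aligned address. The function name fetchWord, the use of std::vector as memory, and the big-endian byte order are assumptions made for this example; they are not taken from the quoted PDF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical memory model: memory[a] is the byte stored at address a.
// Fetches the 32-bit word starting at a word-aligned byte address.
// Byte order is assumed big-endian for this sketch.
uint32_t fetchWord(const std::vector<uint8_t>& memory, uint32_t address)
{
    assert(address % 4 == 0);   // "word aligned": address is a multiple of 4
    assert(static_cast<std::size_t>(address) + 4 <= memory.size());

    // One word spans four consecutive byte addresses.
    return (static_cast<uint32_t>(memory[address])     << 24) |
           (static_cast<uint32_t>(memory[address + 1]) << 16) |
           (static_cast<uint32_t>(memory[address + 2]) <<  8) |
            static_cast<uint32_t>(memory[address + 3]);
}
```

Every byte has its own address (0, 1, 2, 3, ...), but consecutive words start at addresses 0, 4, 8, and so on, which is why instruction addresses must be multiples of 4.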

Unpacking a stream of values of bit size not divisible by 8

I've spent too many hours on this, and at this point I think I need some help from the experts.
I have a const uint8_t* buffer, an integer data type (say, uint16_t), and I know that the buffer contains packed samples m bits each where m is not divisible by 8 (say, m=12 bits). Knowing that buffer holds N samples, I need to return an std::vector<uint16_t> containing the values of these N samples expanded to uint16_t.
So, every three bytes (24 bits) of the buffer contain two 12-bit samples I need to process. I want to implement a generalized function
template <typename OutputType, int BitsPerSample>
std::vector<OutputType> unpack(const uint8_t* data, const size_t numSamplesToUnpack);
Assume the data is big endian and OutputType is some integer type that can hold the sample value without truncating it.
I understand bit manipulation. I understand how this can be implemented, in principle. But I don't understand how to implement it elegantly and concisely. Got any ideas?
Also, is there a special name or term for this problem?

Maybe you can try reading single bits at a time, and keep a running counter of how many bits you have processed. When you consume 8 bits, you can increment your buffer pointer.
This doesn't mean you have finished unpacking that sample, so you'll need to also keep a "bits_left" counter in case you need to shift the buffer pointer before you are done unpacking a sample.

Use a 32-bit word as a buffer. If it holds fewer than 12 bits, read another byte into it. Otherwise, output a 12-bit word and remove those bits from the buffer.
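A minimal sketch of the 32-bit buffer idea, using the signature from the question. It assumes samples are packed most significant bit first with no gaps between them, and that the caller guarantees the buffer holds enough bytes; the limit of 24 bits per sample is a restriction of this sketch (so the 32-bit accumulator never overflows), not something stated in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Unpacks big-endian, MSB-first samples of BitsPerSample bits each.
// Assumes `data` holds at least ceil(numSamplesToUnpack * BitsPerSample / 8) bytes.
template <typename OutputType, int BitsPerSample>
std::vector<OutputType> unpack(const uint8_t* data, const size_t numSamplesToUnpack)
{
    static_assert(BitsPerSample >= 1 && BitsPerSample <= 24,
                  "the 32-bit accumulator holds at most 23 + 8 pending bits");
    static_assert(static_cast<int>(sizeof(OutputType) * 8) >= BitsPerSample,
                  "OutputType is too narrow for one sample");
    assert(data != nullptr || numSamplesToUnpack == 0);

    std::vector<OutputType> samples;
    samples.reserve(numSamplesToUnpack);

    uint32_t accumulator = 0;     // pending bits; the newest byte sits in the low end
    int bitsInAccumulator = 0;    // how many low bits of accumulator are still unread
    const uint32_t mask = (uint32_t{1} << BitsPerSample) - 1;

    while (samples.size() < numSamplesToUnpack) {
        // Refill one byte at a time until a whole sample is available.
        while (bitsInAccumulator < BitsPerSample) {
            accumulator = (accumulator << 8) | *data++;
            bitsInAccumulator += 8;
        }
        // The oldest unread bits are the topmost of the pending ones.
        bitsInAccumulator -= BitsPerSample;
        samples.push_back(
            static_cast<OutputType>((accumulator >> bitsInAccumulator) & mask));
    }
    return samples;
}
```

For example, calling unpack<uint16_t, 12> on the three bytes 0xAB 0xCD 0xEF with numSamplesToUnpack = 2 yields 0xABC and 0xDEF.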

Can two different data blocks have the same CRC

Is it possible (even if the probability is very low) to have two different data blocks (each 4K, for example) whose CRCs turn out to match?

Yes, there will be collisions. The number of possible 4K blocks (4096 bytes) is 2^(4096 * 8) = 2^32768. The number of possible 32-bit CRC values is 2^32. So on average, 2^(32768 - 32) = 2^32736 different 4K blocks map to the same CRC value. However, the chance that two 4K blocks of random data have the same 32-bit CRC is 2^-32, i.e. about one in 4.3 billion.

Saving a Huffman Tree compactly in C++

Let's say that I've encoded my Huffman tree along with the compressed file. So I have as an example file output:
001A1C01E01B1D
I'm having an issue saving this string to file bit-by-bit. I know that C++ can only output to file one byte at a time, so I'm having an issue storing this string in bytes. Is it possible to convert the first three bits to a char without the program padding to a byte? If it pads to a byte for the traversal codes then my tree (And the codes) will be completely messed up. If I were to chop this up one byte at a time, then what happens if the tree isn't exactly a multiple of 8? What happens if the compressed file's bit-length isn't exactly a multiple of 8?
Hopefully I've been clear enough.

The standard solution to this problem is padding. There are many possible padding schemes. Padding schemes pad up to a whole number of bytes (i.e., a multiple of 8 bits). Additionally, they encode either the length of the message in bits, or the number of padding bits (from which the message length in bits can be determined by subtraction). The latter solution obviously results in slightly more efficient paddings.
Most simply, you can append the number of "unused" bits in the last byte as an additional byte value.
One level up, start by assuming that the number of padding bits fits in 3 bits. Define the last 3 bits of an encoded file to encode the number of padding bits. Now if the message uses up no more than 5 bits of the last byte, the padding can fit nicely in the same byte. If it is necessary to add a byte to contain the padding, the maximum gap is 5+2=7 (5 from the unused high bits of the extra byte, and 2 is the maximum possible space free in the last byte, otherwise the 3-bit padding value would've fit there). Since 0-7 is representable in 3 bits, this works (it doesn't work for 2 bits, since the maximum gap is larger and the range of representable values is smaller).
By the way, one of the main advantages of placing the padding information at the end of the file (rather than as a header at the beginning of the file) is that the compression functions can then operate on a stream without having to know its length in advance. Decompression can be stream-based as well, with careful handling of EOF signals.
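The simplest scheme mentioned above (one extra trailing byte holding the number of unused bits) could be sketched roughly as follows. The helper names packBits and unpackBits are invented for this illustration, and the input is a plain string of '0' and '1' characters to keep the example short; a real Huffman encoder would feed bits directly and would also need to handle the literal characters in the tree.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Packs a string of '0'/'1' characters MSB-first into bytes, then appends
// one byte holding the number of unused (padding) bits in the last data byte.
std::vector<uint8_t> packBits(const std::string& bits)
{
    std::vector<uint8_t> out;
    uint8_t current = 0;   // byte being assembled
    int used = 0;          // bits already placed in `current`
    for (char c : bits) {
        assert(c == '0' || c == '1');
        current = static_cast<uint8_t>((current << 1) | (c == '1' ? 1 : 0));
        if (++used == 8) {
            out.push_back(current);
            current = 0;
            used = 0;
        }
    }
    uint8_t unused = 0;
    if (used > 0) {
        unused = static_cast<uint8_t>(8 - used);
        // Left-align the leftover bits; the low `unused` bits are zero padding.
        out.push_back(static_cast<uint8_t>(current << unused));
    }
    out.push_back(unused);  // trailer: how many padding bits to ignore
    return out;
}

// Inverse of packBits: recovers the exact bit string, without the padding.
std::string unpackBits(const std::vector<uint8_t>& bytes)
{
    assert(!bytes.empty());
    const std::size_t dataBytes = bytes.size() - 1;
    const int unused = bytes.back();
    assert(unused < 8 && (dataBytes > 0 || unused == 0));

    std::string bits;
    for (std::size_t i = 0; i < dataBytes; ++i)
        for (int b = 7; b >= 0; --b)
            bits.push_back(((bytes[i] >> b) & 1) ? '1' : '0');
    bits.resize(bits.size() - static_cast<std::size_t>(unused));
    return bits;
}
```

For instance, the 5-bit message "00101" becomes the two bytes 0x28 0x03: the data bits left-aligned in one byte, followed by the count of 3 padding bits. Because the trailer is written last, the encoder never needs to know the total length in advance.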

Simply treat a sequence of n bytes as a sequence of 8n bits. Use the >> or <<, |, and & operators to assemble bytes from the sequence of variable-length bit codes.
The end of the stream is important to handle properly. You need an end of stream code so that the decoder knows to stop and not try to decode the final padding bits that complete the last byte.

Read any number of bits from ifstream

I'm currently working with SWF files.
In SWF headers there is a RECT, which is built from 5 fields. The first one is a 5-bit field (nBits), used to specify the length of the other fields.
What should a method look like that takes one argument (how many bits to read) and reads that many bits from an ifstream?
SWF File format specification
Thanks, S.

C++ file streams are byte-oriented. You can't read specific numbers of bits from them (unless the number is a multiple of 8 of course).
To get just 5 bits, you'll have to read an entire byte and then mask off the 5 bits of interest. If that byte also holds another field you'll have to keep it around for use later. If you make this generic enough, you could write your own "bit stream" class that buffers unused portions of bytes internally.
To obtain the low-order (least significant) 5 bits of a byte:
unsigned char bits = byte & 0x1F; // note 0x1F = binary 00011111
To obtain the high-order (most significant) 5 bits:
unsigned char bits = byte >> 3; // shift off the unused 3 low bits
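A "bit stream" class along those lines might look like the sketch below. The class name BitReader and its interface are invented for this example, and it assumes bit fields are stored most significant bit first (check the SWF specification for the exact layout of the fields you need). It works on any std::istream, so it can wrap an ifstream opened in binary mode.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying the reader without a file
#include <string>

// Hypothetical helper: reads an arbitrary number of bits from a byte stream,
// keeping the unread part of the current byte for the next call.
class BitReader {
public:
    explicit BitReader(std::istream& in) : in_(in) {}

    // Reads `count` bits (0..32), most significant bit first, into `value`.
    // Returns false if the stream runs out before `count` bits were read.
    bool read(int count, uint32_t& value)
    {
        assert(count >= 0 && count <= 32);
        value = 0;
        for (int i = 0; i < count; ++i) {
            if (bitsLeft_ == 0) {
                const int next = in_.get();   // streams only deliver whole bytes
                if (next == std::istream::traits_type::eof())
                    return false;
                current_ = static_cast<uint8_t>(next);
                bitsLeft_ = 8;
            }
            --bitsLeft_;
            value = (value << 1) | ((current_ >> bitsLeft_) & 1u);
        }
        return true;
    }

    // Discards the unread bits of the current byte, so the next read
    // starts on a byte boundary.
    void alignToByte() { bitsLeft_ = 0; }

private:
    std::istream& in_;
    uint8_t current_ = 0;  // byte most recently read from the stream
    int bitsLeft_ = 0;     // unread bits remaining in current_
};
```

Reading a RECT would then be a matter of calling read(5, nBits) followed by four calls to read(nBits, field), and alignToByte() before continuing with byte-aligned data.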
