huffman encoding - c++

I am trying to implement the huffman algorithm for compression, which requires writing bits of variable length to a file. Is there any way in C++ to write variable length data with 1-bit granularity to a file?

No, the smallest amount of data you can write to a file is one byte.
You can use a bitset to make manipulating bits easier, then use an ofstream to write to file. If you don't want to use bitset, you can use the bitwise operators to manipulate your data before saving it.

The smallest number of bits you can access and save is 8, i.e. 1 byte. You can manipulate individual bits within a byte using the bitwise operators ^, & and |.
You can set n'th bit to 1 using:
my_byte = my_byte | (1 << n);
where n is 0 to 7.
You can set n'th bit to 0 using:
my_byte = my_byte & ~(1 << n);
You can toggle n'th bit using:
my_byte = my_byte ^ (1 << n);
More details here.
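The three operations above can be wrapped in small helpers. This is only a sketch: the function names (set_bit, clear_bit, toggle_bit, test_bit) are made up for illustration, and bit 0 is assumed to be the least significant bit.

```cpp
#include <cassert>
#include <cstdint>

// Set bit n (0 = least significant, 7 = most significant) to 1.
inline std::uint8_t set_bit(std::uint8_t b, unsigned n) {
    assert(n < 8);
    return static_cast<std::uint8_t>(b | (1u << n));
}

// Clear bit n: build the single-bit mask first, then invert the whole mask.
inline std::uint8_t clear_bit(std::uint8_t b, unsigned n) {
    assert(n < 8);
    return static_cast<std::uint8_t>(b & ~(1u << n));
}

// Flip bit n.
inline std::uint8_t toggle_bit(std::uint8_t b, unsigned n) {
    assert(n < 8);
    return static_cast<std::uint8_t>(b ^ (1u << n));
}

// Read bit n.
inline bool test_bit(std::uint8_t b, unsigned n) {
    assert(n < 8);
    return ((b >> n) & 1u) != 0;
}
```

The casts are there because the operands are promoted to int before the bitwise operation, so the result has to be narrowed back to a byte.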

klew's answer is probably the one you want, but just to add something to what Bill said, the Boost libraries have a dynamic_bitset that I found helpful in a similar situation.

All the info you need on bit twiddling is here:
How do you set, clear, and toggle a single bit?
But the smallest object that you can put in a file is a byte.
I would use dynamic_bitset and every time the size got bigger than 8 extract the bottom 8 bits into a char and write this to a file, then shift the remaining bits down 8 places (repeat).

No. You will have to pack bytes. Accordingly, you will need a header in your file that specifies how many elements are in your file, because you are likely to have trailing bits that are unused.
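For what it's worth, here is a rough sketch of the byte packing described in the answers above. BitWriter and pack_demo are hypothetical names, bits are assumed to be stored most significant bit first, and the total bit count is tracked so it could be written to a header. It takes any std::ostream; in a real program that would probably be a std::ofstream opened with std::ios::binary.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: collects bits MSB-first and emits whole bytes.
class BitWriter {
public:
    explicit BitWriter(std::ostream& out) : out_(out) {}

    // Append the low `count` bits of `code`, most significant bit first.
    void write(std::uint32_t code, unsigned count) {
        assert(count <= 32);
        for (unsigned i = count; i-- > 0;) {
            current_ = static_cast<std::uint8_t>((current_ << 1) | ((code >> i) & 1u));
            if (++filled_ == 8) emit();
        }
        total_bits_ += count;
    }

    // Pad the last partial byte with zero bits and write it out.
    void flush() {
        if (filled_ > 0) {
            current_ = static_cast<std::uint8_t>(current_ << (8 - filled_));
            emit();
        }
    }

    // Number of payload bits written so far (excluding padding).
    std::uint64_t total_bits() const { return total_bits_; }

private:
    void emit() {
        out_.put(static_cast<char>(current_));
        current_ = 0;
        filled_ = 0;
    }

    std::ostream& out_;
    std::uint8_t current_ = 0;
    unsigned filled_ = 0;
    std::uint64_t total_bits_ = 0;
};

// Example: pack the variable-length codes 1, 01 and 001 into bytes.
inline std::string pack_demo() {
    std::ostringstream out;
    BitWriter w(out);
    w.write(0x1, 1);  // code "1"
    w.write(0x1, 2);  // code "01"
    w.write(0x1, 3);  // code "001"
    w.flush();        // 101001 plus two padding zeros
    return out.str();
}
```

The six payload bits end up in a single byte, and total_bits() is what a reader would need to tell the two padding bits apart from real data.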

Related

Unpacking a stream of values of bit size not divisible by 8

I've spent too many hours on this, and at this point I think I need some help from the experts.
I have a const uint8_t* buffer, an integer data type (say, uint16_t), and I know that the buffer contains packed samples m bits each where m is not divisible by 8 (say, m=12 bits). Knowing that buffer holds N samples, I need to return an std::vector<uint16_t> containing the values of these N samples expanded to uint16_t.
So, each three bytes (24 bits) of the buffer contain two 12-bits samples I need to process. I want to implement a generalized function
template <typename OutputType, int BitsPerSample>
std::vector<OutputType> unpack(const uint8_t* data, const size_t numSamplesToUnpack);
Assume the data is big endian and OutputType is some integer type that can hold the sample value without truncating it.
I understand bit manipulation. I understand how this can be implemented, in principle. But I don't understand how to implement it elegantly and concisely. Got any ideas?
Also, is there a special name or term for this problem?
Maybe you can try reading single bits at a time, and keep a running counter of how many bits you have processed. When you consume 8 bits, you can increment your buffer pointer.
This doesn't mean you have finished unpacking that sample, so you'll need to also keep a "bits_left" counter in case you need to shift the buffer pointer before you are done unpacking a sample.
Use a 32-bit word as a buffer. If it has less than 12 bits, read another byte. Otherwise, output a 12-bit word.
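A minimal sketch of the accumulator idea, using the signature from the question. It assumes big-endian packing with the first sample in the top bits of the first byte, and it assumes the caller guarantees the buffer holds enough bytes for the requested samples. I used a 64-bit accumulator instead of the 32-bit word suggested above so that sample widths up to 32 bits fit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

template <typename OutputType, int BitsPerSample>
std::vector<OutputType> unpack(const std::uint8_t* data, const std::size_t numSamplesToUnpack) {
    static_assert(BitsPerSample > 0 && BitsPerSample <= 32, "sample width must be 1..32 bits");
    static_assert(sizeof(OutputType) * 8 >= BitsPerSample, "OutputType is too narrow");
    assert(data != nullptr || numSamplesToUnpack == 0);

    std::vector<OutputType> result;
    result.reserve(numSamplesToUnpack);

    std::uint64_t acc = 0;  // bit accumulator; the newest byte sits in the low bits
    int bitsInAcc = 0;      // number of not-yet-consumed bits in acc
    const std::uint64_t mask = (std::uint64_t{1} << BitsPerSample) - 1;

    while (result.size() < numSamplesToUnpack) {
        // Refill one byte at a time until a whole sample is available.
        while (bitsInAcc < BitsPerSample) {
            acc = (acc << 8) | *data++;
            bitsInAcc += 8;
        }
        // The oldest unconsumed bits are the next sample.
        bitsInAcc -= BitsPerSample;
        result.push_back(static_cast<OutputType>((acc >> bitsInAcc) & mask));
    }
    return result;
}
```

With the bytes AB CD EF and 12-bit samples this yields 0xABC and 0xDEF. Bytes are only read when a sample actually needs them, so the function never reads past the last byte that contains sample data.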

How do I read and write a stream 3 bits at a time?

I'm trying to make an ultra-compressed variant of brainfuck, which is an esoteric programming language with 8 instructions. Since 3 bits is the minimum amount of storage to store 8 values, I went with that. The part I'm stuck on is how to read a number of bits that isn't a power of 2.
I tried using std::bitset, but that just serializes to a string that is 1 byte per bit, which is the opposite of what I want. How would I go about this?
Read 3 bytes at a time, and split those into 8 packs of 3 bits each using the >> and & operators. Put these in an ordinary uint8_t array to simplify later access and jumps.
You don't read bits from a stream, you read bytes from a stream.
So, you must do so, then shuffle the component bits around as you wish using bitwise arithmetic.
By the way, the fact that computers work in bytes also means that many of your programs (any that don't have a multiple of 8 instructions) are necessarily going to have wasted space.
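Here is a small sketch of the "3 bytes hold 8 instructions" idea, assuming the first instruction goes into the top 3 bits of the first byte. pack8 and unpack8 are made-up names; a full reader would call unpack8 for every 3-byte group it reads from the stream.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Pack eight 3-bit values (each 0..7) into three bytes, first value in the top bits.
inline std::array<std::uint8_t, 3> pack8(const std::array<std::uint8_t, 8>& ops) {
    std::uint32_t word = 0;
    for (std::uint8_t op : ops) {
        assert(op < 8);
        word = (word << 3) | op;
    }
    return {{static_cast<std::uint8_t>(word >> 16),
             static_cast<std::uint8_t>(word >> 8),
             static_cast<std::uint8_t>(word)}};
}

// Split three bytes back into eight 3-bit values using >> and &.
inline std::array<std::uint8_t, 8> unpack8(const std::array<std::uint8_t, 3>& bytes) {
    const std::uint32_t word = (std::uint32_t{bytes[0]} << 16) |
                               (std::uint32_t{bytes[1]} << 8) |
                               std::uint32_t{bytes[2]};
    std::array<std::uint8_t, 8> ops{};
    for (unsigned i = 0; i < 8; ++i) {
        ops[i] = static_cast<std::uint8_t>((word >> (21 - 3 * i)) & 0x7u);
    }
    return ops;
}
```

A program whose length is not a multiple of 8 instructions would still need padding in its last group, plus some way to record the real instruction count.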

Saving binary data into file in c++

My algorithm produces a stream of 9-bit and 17-bit values, and I need to find a way to store this data in a file. I can't just store each 9-bit value as an int and each 17-bit value as an int32.
For example, if my algorithm produces 10 x 9 bits and 5 x 17 bits, the output file size needs to be 22 bytes.
Another big problem is that the output file can be very large, and its size is not known in advance.
The only idea I have now is to use bool *vector;
If you have to save dynamic bits, then you should probably save two values: The first being either the number of bits (if bits are consecutive from 0 to x), or a bitmask to say which bits are valid; The second being the 32-bit integer representing your bits.
Taking your example literally: if you want to store 175 bits and it consists of an unknown number of entities of two different lengths, then the file absolutely cannot be only 22 bytes. You need to know what is ahead of you in the file; you need the lengths. If you have only two possible sizes, the length tag can be a single bit: 0 means 9 bits, 1 means 17 bits.
|0|9bit|0|9bit|1|17bit|0|9bit|1|17bit|1|17bit|...
So for your example, you would need 10*(1+9)+5*(1+17) = 190 bits ~ 24 bytes. The outstanding 2 bits need to be padded with 0's so that you align at byte boundary. The fact that you will go on reading the file as if there was another entity (because you said you don't know how long the file is) shouldn't be a problem because last such padding will be always less than 9 bits. Upon reaching end of file, you can throw away the last incomplete reading.
This approach indeed requires implementing bit-level manipulation on top of the byte-level stream, which means careful masking and logical operations. BASE64 is exactly that, only simpler than your case: it consists only of fixed 6-bit entities, stored in a text file.
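To make the tag-bit layout concrete, here is a rough sketch that encodes into an in-memory byte vector and decodes it again. Everything here (Entity, put_bits, get_bits, encode, decode) is a hypothetical name, and bits are assumed to be stored most significant bit first. Writing the vector to a file is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Entity {
    bool wide;            // false: 9-bit value, true: 17-bit value
    std::uint32_t value;
};

// Append the low `count` bits of `v`, MSB first; `bitPos` counts bits written so far.
inline void put_bits(std::vector<std::uint8_t>& bytes, std::size_t& bitPos,
                     std::uint32_t v, unsigned count) {
    for (unsigned i = count; i-- > 0;) {
        if (bitPos % 8 == 0) bytes.push_back(0);  // new byte, pre-filled with padding zeros
        if ((v >> i) & 1u) bytes.back() |= static_cast<std::uint8_t>(0x80u >> (bitPos % 8));
        ++bitPos;
    }
}

// Read `count` bits starting at `bitPos`, MSB first, and advance `bitPos`.
inline std::uint32_t get_bits(const std::vector<std::uint8_t>& bytes, std::size_t& bitPos,
                              unsigned count) {
    std::uint32_t v = 0;
    for (unsigned i = 0; i < count; ++i, ++bitPos) {
        const unsigned bit = (bytes[bitPos / 8] >> (7 - bitPos % 8)) & 1u;
        v = (v << 1) | bit;
    }
    return v;
}

inline std::vector<std::uint8_t> encode(const std::vector<Entity>& items) {
    std::vector<std::uint8_t> bytes;
    std::size_t bitPos = 0;
    for (const Entity& e : items) {
        assert(e.value < (e.wide ? (1u << 17) : (1u << 9)));
        put_bits(bytes, bitPos, e.wide ? 1u : 0u, 1);         // tag bit
        put_bits(bytes, bitPos, e.value, e.wide ? 17u : 9u);  // payload
    }
    return bytes;
}

inline std::vector<Entity> decode(const std::vector<std::uint8_t>& bytes) {
    std::vector<Entity> items;
    const std::size_t totalBits = bytes.size() * 8;
    std::size_t bitPos = 0;
    while (totalBits - bitPos >= 10) {  // the smallest entity is 1 + 9 bits
        const bool wide = get_bits(bytes, bitPos, 1) != 0;
        const unsigned width = wide ? 17u : 9u;
        if (totalBits - bitPos < width) break;  // incomplete trailing read: discard it
        items.push_back({wide, get_bits(bytes, bitPos, width)});
    }
    return items;
}
```

For 10 short and 5 long entities this produces 190 bits, i.e. 24 bytes, and the decoder stops at the 2 padding bits because fewer than 10 bits remain.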

bit shifting - replacing a section of a bitset with a new number

I have a list of numbers encoded as a boost dynamic bitset. I dynamically choose the size of this bitset depending on the maximum value any number in this list can take. So let's say I have numbers from just 0 to 7, I only need three bits and my string 0,2,7 will be encoded as
000010111.
I now need to change say the 2nd number in this list (2) to another number, say 4.
I thought the most efficient way to do this would be to represent 4 as a dynamic bitset of the same length as the list but with all other values set to 1, so 111111011. I would then bit-shift this by the required amount, with 1s used to fill in the vacated bits, to get 111011111, and then just bitwise AND this with the original bitset to get my desired result.
However, I cannot find a way to do these two things, as it seems with both initialisation of a bitset from an integer, and when bit shifting, the default and fill in values are always set to 0, not 1. How can I get around this problem, or achieve my goal in a different and efficient way.
Thanks
If that is really the implementation, the most general and efficient method I can think of would be to first mask off all the bits for the part you are replacing (masks written as C++14 binary literals):
value &= 0b111000111;
Then "or" in the new bits at that position (here 4, i.e. 100, in the middle slot):
value |= 0b000100000;
Hopefully someone here has a better trick for me to learn, but that's what I do.
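A sketch of the mask-then-OR approach with the mask computed instead of hardcoded. It assumes the whole packed list fits in a std::uint64_t rather than a boost dynamic_bitset, and replace_field is a name I made up.

```cpp
#include <cassert>
#include <cstdint>

// Replace the `width`-bit field that starts `shift` bits above the least significant bit.
inline std::uint64_t replace_field(std::uint64_t packed, unsigned shift, unsigned width,
                                   std::uint64_t newValue) {
    assert(width > 0 && width < 64 && shift + width <= 64);
    const std::uint64_t fieldMask = ((std::uint64_t{1} << width) - 1) << shift;
    assert(newValue <= (fieldMask >> shift));  // new value must fit in the field
    // Clear the old field, then OR in the new value at the same position.
    return (packed & ~fieldMask) | (newValue << shift);
}
```

For the example in the question, replace_field(0x17, 3, 3, 4) turns 000010111 into 000100111.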
XOR the old value and the new value:
int valuetoset = oldvalue ^ newvalue; // 4 XOR 2 in your example
Just shift the value you need to set:
int bitstoset = valuetoset << position; // (4 XOR 2) << 3 in your example
Then XOR again bitstoset with your bitset and that's it !
int result = bitstoset ^ bitset;
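The same three steps as one function, again assuming the packed list fits in a std::uint64_t (xor_replace is a hypothetical name). Note that this variant only works if oldValue really is what is currently stored at that position.

```cpp
#include <cassert>
#include <cstdint>

// Replace a field by XORing in the difference between the old and the new value.
// Precondition: the field at `shift` currently holds exactly `oldValue`.
inline std::uint64_t xor_replace(std::uint64_t packed, unsigned shift,
                                 std::uint64_t oldValue, std::uint64_t newValue) {
    assert(shift < 64);
    const std::uint64_t bitsToFlip = (oldValue ^ newValue) << shift;
    return packed ^ bitsToFlip;
}
```

With the numbers from the question, xor_replace(0x17, 3, 2, 4) flips 000110000 in 000010111 and gives 000100111.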
Would you be able to use a vector of dynamic bitsets? Depending on your needs that might be sufficient and allow for easy updates.
Alternatively, fill your new bitset similarly to how you proposed, but exactly inverted. Then, right before you do the AND at the end, flip all the bits.
I think your understanding of a bitset is fundamentally wrong:
A set is NOT ordered, and the idea of a bitset is that a single bit is enough to show whether an element is inside or outside the set.
So your original set 0,2,7 would have 8 bits, because 0..7 are 8 elements, and NOT 3 * 3 bits (3 bits required to represent each of 0..7), and the bitmap would look like 10000101.
What you describe is just a "packed" coding of the values. In your coding scheme 0,2,7 and 2,0,7 would be coded completely differently, but in a bitset they are the same.
In a (real) bitset (if that is what you want) you can then very easily "replace" elements by removing the old one and adding the new one. This works as T.E.D. describes.
To get the right mask you can simply use shift operations. If you start counting from 0, you get the mask for value x by doing: 1<<x;
So you remove element x from the set with
value &= ~(1<<x);
and add another element x (which might be the same) with
value |= 1<<x;
From your comment, you are not using the bitset as a set, so the masks must be built differently (and you already had an almost right idea of how to build them).
The command with a bitmask for removing the element at position p:
value &= ~(111 << p);
This 111 (in binary) is for the above example, where you need 3 bits per position. If you don't want to hardcode it, you could just take the next power of 2 and subtract 1, and then you have your all-1s string.
And to add, you would just take your suggested bit list that contains only the new element and OR it into your bit list:
value |= new_element_bitlist;

How can I set all bits to '1' in a binary number of an unknown size?

I'm trying to write a function in assembly (but let's assume language agnostic for the question).
How can I use bitwise operators to set all bits of a passed in number to 1?
I know that I can use the bitwise "or" with a mask with the bits I wish to set, but I don't know how to construct a mask based on a binary number of size N.
~(x & 0)
x & 0 will always result in 0, and ~ will flip all the bits to 1s.
Set it to 0, then flip all the bits to 1 with a bitwise-NOT.
You're going to find that in assembly language you have to know the size of a "passed in number". And in assembly language it really matters which machine the assembly language is for.
Given that information, you might be asking either
How do I set an integer register to all 1 bits?
or
How do I fill a region in memory with all 1 bits?
To fill a register with all 1 bits, on most machines the efficient way takes two instructions:
Clear the register, using either a special-purpose clear instruction, or load immediate 0, or xor the register with itself.
Take the bitwise complement of the register.
Filling memory with 1 bits then requires 1 or more store instructions...
You'll find a lot more bit-twiddling tips and tricks in Hank Warren's wonderful book Hacker's Delight.
Set it to -1. This is usually represented by all bits being 1.
Set x to 1
While x < number
x = x * 2
Answer = number or (x - 1).
The code assumes your input is called "number". It should work fine for positive values. Note that for negative two's complement values the operation makes no sense, as the high bit will always be one.
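The loop above translates fairly directly to C++. This is just a sketch for unsigned 32-bit inputs; fill_ones is a made-up name, and x is kept in a 64-bit variable (my own addition) so that the doubling cannot overflow for large inputs.

```cpp
#include <cassert>
#include <cstdint>

// Find the smallest power of two x with x >= number, then OR in x - 1.
inline std::uint32_t fill_ones(std::uint32_t number) {
    std::uint64_t x = 1;  // 64-bit so that doubling cannot overflow for 32-bit inputs
    while (x < number) x *= 2;
    const std::uint32_t result = static_cast<std::uint32_t>(number | (x - 1));
    // Postcondition: the result has the form 2^k - 1, i.e. a solid run of 1 bits.
    assert(((std::uint64_t{result} + 1) & result) == 0);
    return result;
}
```

Note the behaviour at exact powers of two: 8 becomes 15, since x stops at 8 and 8 OR 7 is 15.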
Use T(~T(0)).
Where T is the typename (if we are talking about C++.)
This prevents the unwanted promotion to int if the type is smaller than int.
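To illustrate the promotion issue, here is a small sketch. all_ones and check_promotion are hypothetical names; the asserts show what happens with and without the cast back to T.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// All bits of T set to 1; the outer cast undoes the promotion to int.
template <typename T>
constexpr T all_ones() {
    static_assert(std::is_integral<T>::value, "T must be an integer type");
    return T(~T(0));
}

inline void check_promotion() {
    const std::uint8_t zero = 0;
    // Without the cast, ~ operates on an int, so the result is an int with every bit set.
    assert(~zero == -1);
    // With the cast back to the 8-bit type, exactly 8 bits are set.
    assert(all_ones<std::uint8_t>() == 0xFF);
}
```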