Read any number of bits from ifstream - c++

I'm currently working with SWF files.
The SWF header contains a RECT, which is made up of five fields. The first one is a 5-bit field (nBits) used to specify the bit length of the other fields.
What should a method look like that takes one argument (the number of bits to read) and reads that many bits from an ifstream?
SWF File format specification
Thanks, S.

C++ file streams are byte-oriented. You can't read specific numbers of bits from them (unless the number is a multiple of 8 of course).
To get just 5 bits, you'll have to read an entire byte and then mask off the 5 bits of interest. If that byte also holds another field, you'll have to keep it around for use later. If you make this generic enough, you could write your own "bit stream" class that buffers unused portions of bytes internally.
To obtain the low-order (least significant) 5 bits of a byte:
unsigned char bits = byte & 0x1F; // note 0x1F = binary 00011111
To obtain the high-order (most significant) 5 bits:
unsigned char bits = byte >> 3; // shift off the unused 3 low bits
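Putting this together, here is a minimal sketch of such a "bit stream" wrapper around std::ifstream. It's an illustration rather than a drop-in SWF reader: it assumes bits are consumed most-significant-first within each byte (the order SWF's bit-packed fields use), and the names BitReader and readBits are just placeholders for this example.
#include <cstdint>
#include <fstream>
#include <stdexcept>

// Minimal bit reader over an std::ifstream. Bits are handed out
// most-significant-first within each byte.
class BitReader {
public:
    explicit BitReader(std::ifstream& in) : in_(in), buffer_(0), bitsLeft_(0) {}

    // Read `count` bits (at most 32) and return them as an unsigned value.
    std::uint32_t readBits(unsigned count) {
        std::uint32_t result = 0;
        while (count > 0) {
            if (bitsLeft_ == 0) {                 // refill the one-byte buffer
                int c = in_.get();
                if (!in_)
                    throw std::runtime_error("unexpected end of stream");
                buffer_ = static_cast<unsigned char>(c);
                bitsLeft_ = 8;
            }
            unsigned take = count < bitsLeft_ ? count : bitsLeft_;
            unsigned shift = bitsLeft_ - take;    // wanted bits sit above the already-consumed ones
            std::uint32_t bits = (buffer_ >> shift) & ((1u << take) - 1u);
            result = (result << take) | bits;
            bitsLeft_ -= take;
            count -= take;
        }
        return result;
    }

private:
    std::ifstream& in_;
    unsigned char buffer_;  // byte currently being consumed
    unsigned bitsLeft_;     // unread bits remaining in buffer_
};
For the RECT case you would call readBits(5) once to get nBits, then readBits(nBits) four times for the remaining fields (sign-extending them afterwards if you need SWF's signed interpretation).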

Related

Bit ordering / endianness in FLAC decoding

I'm currently trying to write a FLAC to WAV transcoder as an exercise in C++, and I am struggling a bit with the wording of the FLAC format regarding bit ordering.
Here is the (little) section talking about ordering:
All numbers used in a FLAC bitstream are integers; there are no floating-point representations. All numbers are big-endian coded. All numbers are unsigned unless otherwise specified.
Does this apply to bit-ordering, as well as byte-ordering?
More specifically, if I read, say, a 7 bits value, do I get the most-significant bit 1st?
Bit ordering should never be an issue unless you're using a struct with bitfields (which is a good reason to avoid them).
Also, you can only read data one byte at a time. If you want to read 7 bits out of a byte, you need to apply a bitmask to the byte value.
For example, if a byte contains one value in the high order bit and another in the low order 7 bits, you would extract them as follows:
field1 = (byte & 0x80) >> 7;
field2 = byte & 0x7f;
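In other words, the stream is packed MSB-first, so a multi-bit field is assembled by shifting in one bit at a time from the top of each byte. Here's a small sketch of that (not the FLAC reference decoder's API; extractBits and its parameters are made up for this example):
#include <cstddef>
#include <cstdint>

// Extract `count` bits starting at bit offset `bitPos` in a byte buffer,
// most-significant bit first, as big-endian bit packing implies.
std::uint32_t extractBits(const unsigned char* data, std::size_t bitPos, unsigned count) {
    std::uint32_t result = 0;
    for (unsigned i = 0; i < count; ++i, ++bitPos) {
        std::size_t byteIndex = bitPos / 8;
        unsigned bitInByte = 7 - static_cast<unsigned>(bitPos % 8);   // MSB first
        unsigned bit = (data[byteIndex] >> bitInByte) & 1u;
        result = (result << 1) | bit;
    }
    return result;
}
So a 7-bit value stored at the start of a byte is read as the top seven bits of that byte, most significant bit first.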

Java convert decimal to 12 bits with bit wise operator &

I'm using Java for this.
I have the code 97, which represents the 'a' character in ASCII. I convert 97 to binary, which gives me 1100001 (7 bits). I want to convert this to 12 bits. I could add leading 0's to the existing 7 bits until it reaches 12 bits, but this seems inefficient. I've been thinking of using the bitwise & operator to zero all but the lowest bits of 97 to reach 12 bits. Is this possible, and how can I do it?
byte buffer = (byte) (code & 0xff);
The above line of code will give me 01100001, no?
which gives me 1100001 (7 bits)
Your value buffer is 8 bits. Because that's what a byte is: 8 bits.
If code has type int (detail added in comment below) it is already a 32-bit number with, in this case, 25 leading zero bits. You need do nothing with it. It's got all the bits you're asking for.
There is no Java integral type with 12 bits, nor is one directly achievable, since 12 is not a multiple of the byte size. It's unclear why you want exactly 12 bits. What harm do you think an extra 20 zero bits will do?
The important fact is that in Java, integral types (char, byte, int, etc.) have a fixed number of bits, defined by the language specification.
With reference to your original expression code & 0xff: code has 32 bits. In general those bits could have any value.
In your particular case, you told us that code was 97, and therefore we know the top 25 bits of code were zero; this follows from the binary representation of 97.
Again in general, & 0xff would set all but the low 8 bits to zero. In your case, that had no actual effect because they were already zero. No bits are "added" - they are always there.

Saving a Huffman Tree compactly in C++

Let's say that I've encoded my Huffman tree along with the compressed file. So I have, as an example file output:
001A1C01E01B1D
I'm having an issue saving this string to file bit-by-bit. I know that C++ can only output to file one byte at a time, so I'm having an issue storing this string in bytes. Is it possible to convert the first three bits to a char without the program padding to a byte? If it pads to a byte for the traversal codes then my tree (And the codes) will be completely messed up. If I were to chop this up one byte at a time, then what happens if the tree isn't exactly a multiple of 8? What happens if the compressed file's bit-length isn't exactly a multiple of 8?
Hopefully I've been clear enough.
The standard solution to this problem is padding. There are many possible padding schemes. Padding schemes pad up to a whole number of bytes (i.e., a multiple of 8 bits). Additionally, they encode either the length of the message in bits, or the number of padding bits (from which the message length in bits can be determined by subtraction). The latter solution obviously results in slightly more efficient paddings.
Most simply, you can append the number of "unused" bits in the last byte as an additional byte value.
One level up, start by assuming that the number of padding bits fits in 3 bits. Define the last 3 bits of an encoded file to encode the number of padding bits. Now if the message uses up no more than 5 bits of the last byte, the padding can fit nicely in the same byte. If it is necessary to add a byte to contain the padding, the maximum gap is 5+2=7 (5 from the unused high bits of the extra byte, and 2 is the maximum possible space free in the last byte, otherwise the 3-bit padding value would've fit there). Since 0-7 is representable in 3 bits, this works (it doesn't work for 2 bits, since the maximum gap is larger and the range of representable values is smaller).
By the way, one of the main advantages of placing the padding information at the end of the file (rather than as a header at the beginning of the file) is that the compression functions can then operate on a stream without having to know its length in advance. Decompression can be stream-based as well, with careful handling of EOF signals.
Simply treat a sequence of n bytes as a sequence of 8n bits. Use the >> or <<, |, and & operators to assemble bytes from the sequence of variable-length bit codes.
The end of the stream is important to handle properly. You need an end of stream code so that the decoder knows to stop and not try to decode the final padding bits that complete the last byte.
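As a sketch of both ideas (assembling bytes from variable-length codes with shifts and masks, and recording where the data ends), here is a minimal bit writer. It uses the simplest padding scheme from above (a trailing byte holding the unused-bit count) rather than an explicit end-of-stream code, and the names BitWriter, writeBits and flush are made up for this example.
#include <cstdint>
#include <fstream>

// Accumulates bits MSB-first into a one-byte buffer and writes whole bytes out.
class BitWriter {
public:
    explicit BitWriter(std::ofstream& out) : out_(out), buffer_(0), bitsUsed_(0) {}

    // Append the low `count` bits of `code`, most significant bit first.
    void writeBits(std::uint32_t code, unsigned count) {
        for (unsigned i = count; i-- > 0; ) {
            buffer_ = static_cast<unsigned char>((buffer_ << 1) | ((code >> i) & 1u));
            if (++bitsUsed_ == 8) {               // a full byte is ready
                out_.put(static_cast<char>(buffer_));
                buffer_ = 0;
                bitsUsed_ = 0;
            }
        }
    }

    // Pad the last byte with zero bits, then append one byte holding the
    // number of padding bits so the decoder knows where the data ends.
    void flush() {
        unsigned padding = (bitsUsed_ == 0) ? 0 : 8 - bitsUsed_;
        if (padding > 0)
            out_.put(static_cast<char>(buffer_ << padding));
        out_.put(static_cast<char>(padding));
        buffer_ = 0;
        bitsUsed_ = 0;
    }

private:
    std::ofstream& out_;
    unsigned char buffer_;
    unsigned bitsUsed_;
};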

How to assign only 16 bits to any integer in a binary file instead of the normal 32 in C++?

I have a program to create a compressed file using LZW algorithm and employing hash tables. My compressed file currently contains integers corresponding to the index of hashtable.
The maximum integer in this compressed file is around 46000, which can easily be represented by 16 bits.
Now when i convert this "compressedfile.txt" to a binary file "binary.bin"(to further reduce the file size) using the following code, I get 32 bit integers in my "binary.bin" file. E.g. if there is a number 84 in my compressed file, it converts to 5400 0000 in my binary file.
std::ifstream in("compressedfile.txt");
std::ofstream out("binary.bin", std::ios::out | std::ios::binary);
int d;
while (in >> d)
    out.write((char*)&d, 4);
My question is can't I discard the ending '0000' in '5400 0000' which uses up an extra 2 bytes in my file. This is the case with every integer since my max integer is 46000 which can be represented using only 2 bytes. Is there any code that can set the base of my binary file that way? I hope my question is clear.
It's writing exactly what you tell it to: 4 bytes at the address of d (an int, which is 32 bits on many platforms). Use a 16-bit type and write 2 bytes instead:
uint16_t d; // unsigned to ensure it's large enough to hold your max value of 46000
while (in >> d) out.write(reinterpret_cast<char*>(&d), sizeof d);
Edit: As pointed out in the comments, for this code and the data it generates to be portable across processor architectures you should pick an endianness convention for the output. I'd suggest using htons() to convert your uint16_t to network byte order; it is widely available, though not (yet) part of the C++ standard.
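For example, the htons() approach might look like this (a sketch assuming a POSIX-style <arpa/inet.h>; on Windows the function lives in <winsock2.h>, and writeU16 is just a helper name for this example):
#include <arpa/inet.h>   // htons()
#include <cstdint>
#include <fstream>

// Write one 16-bit value in network byte order (big-endian) so the file
// reads back the same way on any architecture.
void writeU16(std::ofstream& out, std::uint16_t value) {
    std::uint16_t be = htons(value);
    out.write(reinterpret_cast<const char*>(&be), sizeof be);
}
The reader would then apply ntohs() after reading the two bytes back.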

How would one handle bits from a file?

ifstream inStream;
inStream.open(filename.c_str(), fstream::binary);
if (inStream.fail()) {
    cout << " Error in opening file: " << filename;
    exit(1);
}
Let's say we just want to deal with individual bits from the file. I know we can read the file char by char, but can we read it bit by bit just as easily?
Files are typically read in units larger than a bit (usually a byte or more). Even a file holding a single bit would still take at least a whole byte (in fact it would occupy multiple bytes on disk depending on the file system, but its length can only be expressed in bytes).
However, you could write a wrapper around the stream that provides the next bit on each request: internally it reads a character, hands out bits as they are asked for, and reads the next character from the file whenever a request can no longer be satisfied from the previous character. I assume that you know how to turn a single byte (or char) into a sequence of bits.
Since this is homework, you are probably expected to write this yourself instead of using an existing library.
You'll have to read from the file byte by byte and then extract bits as needed from the read byte. There is no way to do IO at bit level.
I guess your binary file is the Huffman-encoded, compressed file. You'll have to read this file byte by byte, then extract bits from these bytes using bitwise operators, like:
unsigned char byte;
// read a byte from the file into `byte` first
unsigned char mask = 0x80;          // mask for extracting the most significant bit
bool bit = (byte & mask) != 0;      // gives you the most significant bit
byte <<= 1;                         // shift the byte left by 1 so that you can read the next MSB
You can use the bits you read to descend the Huffman tree until you reach a leaf node, at which point you've decoded a symbol.
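To make that concrete, here's a sketch of a decoding loop built on exactly that mask-and-shift idea. The Node structure is hypothetical (your tree representation will differ), and it assumes any padding or end-of-stream marker is handled separately:
#include <fstream>

// Hypothetical tree node, just for illustration.
struct Node {
    Node* left;    // followed on a 0 bit
    Node* right;   // followed on a 1 bit
    bool isLeaf;
    char symbol;
};

void decode(std::ifstream& in, std::ofstream& out, Node* root) {
    Node* node = root;
    char c;
    while (in.get(c)) {
        unsigned char byte = static_cast<unsigned char>(c);
        for (int i = 7; i >= 0; --i) {          // extract bits MSB-first
            unsigned bit = (byte >> i) & 1u;
            node = bit ? node->right : node->left;
            if (node->isLeaf) {                 // reached a symbol
                out.put(node->symbol);
                node = root;                    // restart at the root for the next code
            }
        }
    }
}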
Depending on what you're doing with the bits, it may be easier to read a 32-bit word at a time rather than a byte. In either case you're going to be doing mask and shift operations, the specifics of which are left as the proverbial exercise for the reader. :-) Don't be discouraged if it takes several tries; I have to do this sort of thing moderately often and I still get it wrong the first time more often than not.