Dividing information into bits with the bitset library - C++

I am trying to get information from a file whose content is unknown. I open the file with an fstream object and store each piece of data in an unsigned char. However, a char is only 8 bits, and I need the data in 16-bit segments. I am using the bitset library to do this:
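// Loop on the extraction itself: a second `file >> test2` inside the body
// would silently skip every other character.
while (file >> test2)
{
    bitset<16> foo(test2);
    cout << foo << endl;
}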
First of all, this only gives me 8 bits of information when using characters, and if I use another data type the program does not output anything. Is there another library that breaks a data type into bits?

Related

C++: How can I change the size of a void* according to a file I want to process?

I am currently trying to make a program that can read a .blend file. Well, trying is the important part, since I am already stuck on reading the file block info.
I'll quickly explain my problem; please refer to this page for context.
In the .blend header there is a char that determines whether the pointer size, which is used later in the file info block (just fileBlock on the linked webpage) among other things, is 4 or 8 bytes long. From what I have read, in C++ a void pointer only changes size according to the target platform it was compiled for (8 bytes for 64-bit and 4 bytes for 32-bit). However, .blend files can use either size, regardless of the platform, I presume.
Since Blender itself reads its own files using C, there must be a way to change the pointer to match the required pointer size, according to the info in the header. My best guess would be to dynamically allocate a void pointer array of either one or two pointers, which makes actually using the data even more complicated.
Please help me find the intended way of handling the different pointer sizes!
Go back to the top of the wiki page and you will find the File Header structure. The header of a blend file starts with "BLENDER", which is followed by the pointer size for the file:
Size of a pointer: all pointers in the file are stored in this format.
'_' (underscore) means 4 bytes (32 bits)
'-' (minus) means 8 bytes (64 bits)
So by reading the eighth byte of the file you know the size of the pointers in the file.
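// file_bytes[7] is a single char, so compare it against character
// literals ('_'), not string literals ("_").
if (file_bytes[7] == '_')
    ptr_size = 4;
else if (file_bytes[7] == '-')
    ptr_size = 8;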
The copy of Blender that creates the file determines the sizes used for the file, so a 32-bit build will save 32-bit pointers in the file, while a 64-bit build will save 64-bit pointers.
You should also read the next byte: it tells you whether the file was saved as big or little endian, so you can see whether you need to do any byte swapping. Blender use on big-endian machines may be declining, but you may still come across big-endian files.
Another important thing that doesn't seem to be mentioned is that blend files can be, and often are, compressed. Reading a compressed blend file means using gzread() to read the file. A compressed file has its first two bytes set to 0x1f 0x8b.
You will find the code that Blender uses to read blend files in source/blender/blenloader.
Yup, that's painful. The solution is not to treat them as C++ pointers at all. Instead, create your own class, BlendPointer, to abstract this away. These would be read from a BlendFile, and that BlendFile would store whether its BlendPointers are 4 or 8 bytes on disk.

Why use a char array to store contents of a file opened in binary mode?

So from what I understand, when you open a file in binary mode using C++, the contents would be 0s and 1s, right? If so, why would the official documentation about input/output with files use a char* array to store the contents? If we're only storing 0s and 1s, why not use a short/int?
The sizes and interpretations of short and int are architecture dependent, while char is not: a char is by definition exactly one byte, so a char array maps one-to-one onto the bytes of the file. Multi-byte types such as short and int are affected by endianness, which determines the order in which their bytes are interpreted.

Is it possible to read bit to bit from a binary file with c++?

I'm new here, so I'll try to be very clear about my issue. I've tried to find a direct answer, but the other questions I checked are very specific and I get confused.
I have a binary file that I need to read for my project. I also have a specification sheet, and I'm reading the file according to those specs. So I've created a cpp file and am writing a simple program to read each element, using ifstream and its read() function.
The problem is that the specification sheet says I need to read a bitstring of size 12. From the details, it's very clear that I should read only 12 bits for each of these elements, but I'm not sure whether reading bit by bit is possible. The rest of the elements were read in bytes. Also, if I read 2 bytes each time and use bit "masks" to get only 12 bits, the elements read after this do not match correctly. So my guess is that I really need to read only 12 bits.
So my question: is it possible to read 12 bits from a binary file, or to read bit by bit? And I mean only 12, without reading bytes and then masking them.
Thanks a lot.
No, this is not possible.
What you should do is read 2 bytes and mask 12 bits to get the result you want, but also store the other 4 bits somewhere. The next time you need 12 bits, read only 1 byte and combine it with the 4 stored bits.
Assuming little endian: read the file into an array of uint8_t that is padded to a multiple of 6 bytes (4 elements of 12 bits = 48 bits = 6 bytes), then write an access function:
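#include <cstdint>
#include <cstring>

// Returns the 12-bit element at index `loc`. Every 4 elements (48 bits) occupy
// 6 bytes, so element `loc` lives in the group starting at byte (loc / 4) * 6.
// Assumes a little-endian host and a buffer padded to a multiple of 6 bytes.
std::uint16_t get12Bits(const std::uint8_t *ptr, int loc)
{
    std::uint64_t temp = 0; // only the lower 48 bits are filled
    std::memcpy(&temp, ptr + (loc / 4) * 6, 6); // 6 bytes, 4 elements
    return static_cast<std::uint16_t>(0xfff & (temp >> ((loc % 4) * 12)));
}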

Size of binary representation of a number

We generally say that the number 5 can be represented as a 3-bit binary number. But if we convert 5 to its binary representation, i.e. 101, and print it into a text file, it actually takes 3 bytes, because it is stored as a character array. How can I create a file (not necessarily a text file) whose size is 3 bits?
You can logically represent 5 as three bits, but neither the filesystem nor the memory management system (for RAM) will let you address space in units smaller than one byte.
If you had eight of these numbers, you could pack them into 24 bits = 3 bytes and store those "efficiently" in memory or a file. "Efficiently" is in quotes because, while you save some space, it becomes difficult to work with the packed data, as you need to bit-shift things around a lot. CPU instructions, memory loads, array indexing, etc. all work in units of at least one byte.
The most practical way would be to just use a whole byte for your three bits and live with the overhead.
I don't think you'll get a file system that will tell you the file is 3 bits. It will be at least a byte, plus storage for the file's extra information.
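But you could simply open a file for writing in binary mode and write the value as a single raw byte:
FILE *ptr;
unsigned char value = 5;    /* 101 in binary, stored in one byte */
ptr = fopen("file", "wb");
fwrite(&value, 1, 1, ptr);  /* fwrite needs a pointer to the data */
fclose(ptr);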
You can use the following code and build on it. It stores three numbers (5, 3 and 2) in a single byte using bit-fields, so on common compilers the file occupies only one byte (the size and layout of a bit-field struct are implementation-defined). In general, we cannot store data in partial bytes in files.
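#include <stdio.h>

/* Three bit-fields packed into one unsigned char: 3 + 3 + 2 = 8 bits.
   Bit-field layout is implementation-defined, so the file is only
   portable between builds that use the same compiler/ABI. */
struct bits
{
    unsigned char first:3, second:3, third:2;
};

int main(void)
{
    struct bits b;
    FILE *f;
    b.first = 5;
    b.second = 3;
    b.third = 2;
    printf("initial data: %u %u %u\n", (unsigned)b.first, (unsigned)b.second, (unsigned)b.third);
    /* storing in file (binary mode) */
    f = fopen("bitsfile", "wb");
    if (f == NULL) return 1;
    fwrite(&b, sizeof(b), 1, f);
    fclose(f);
    /* reading back from file */
    f = fopen("bitsfile", "rb");
    if (f == NULL) return 1;
    fread(&b, sizeof(b), 1, f);
    fclose(f);
    printf("data read from file: %u %u %u\n", (unsigned)b.first, (unsigned)b.second, (unsigned)b.third);
    return 0;
}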
You generally can't, since binary files have a minimal "quantum", which is a byte (8 bits).
There is something interesting about storing symbols with non-uniform bit lengths using Huffman encoding. Just to explain before you read the complete article: each symbol of your alphabet sits on a leaf of a binary tree. There is a single path of 1s (i.e. left) and 0s (i.e. right) from the root to your symbol. If the tree is unbalanced (and it will be), different symbols can be represented uniquely with different bit lengths. Of course, this takes some effort, because you always have to read the file at byte level and then unpack and handle the bits in your algorithm implementation.

Reading and Huffman-compressing 4-byte binary strings (standard C++, Linux environment)

I am working on some homework for Huffman coding. I already have the Huffman algorithm completed, but I need to alter it slightly to work with binary files. I have spent some time reading related problems, and perhaps due to my lack of understanding of data types and binary files, I am still struggling a bit, so hopefully I am not repeating a prior question (I won't be posting code related to the Huffman part of the program).
Here is the key phrase: "You can assume that each symbol, which will be mapped to a codeword, is a 4-byte binary string." What I think I know is that a char represents one byte and an unsigned int represents four bytes, so I am guessing I should read the input four bytes at a time into an unsigned int buffer and then collect my data for the Huffman part of the program.
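#include <fstream>

int main() {
    unsigned int buffer;
    std::fstream input;
    input.open("test.txt", std::ios::in | std::ios::binary);
    // Test the result of read() itself, so a failed or short read at the
    // end of the file is not processed as if it were a full symbol.
    while (input.read(reinterpret_cast<char *>(&buffer), 4)) {
        //if buffer does not exist as unique symbol in collection of data add it
        //if buffer exists update statistics of symbol
    }
    input.close();
}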
Does this look like a good way to handle the data? How should I handle the very end of the file if there are only 1, 2, or 3 bytes left? So then I am just storing buffer as an unsigned int in a struct. Just out of curiosity, how would I recast buffer to a string of characters?
Edit: What's the best way to store the header of a Huffman-compressed file?
Does this look like a good way to handle the data?
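Instead of casting a pointer, I would suggest using a union of int and char[4] and passing a pointer to the char array, as you should. (Note that in C++, unlike C, reading a union member other than the one last written is technically undefined behavior, although GCC documents it as supported.) I don't know the rest of your logic, so I can't say whether the actual handling (which is not in the code you posted) is done well, but it seems rather trivial to me.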
How should I handle the very end of the file if there are only 1,2, or 3 bytes left?
Assuming each symbol is 4 bytes long, I would expect that not to be valid input.
So then I am just storing buffer as unsigned int in a struct. Just out of curiosity how would I recast buffer to a string of characters?
Why would you do that? In your data, a "character" is 4 bytes. But you can simply cast to an array of bytes if you want (or, better, use bitwise operations to extract the actual bytes, if the order matters).